It's probably because the CSV export converts it to numbers. Try changing the format in Splunk to a string. Something like this:
.... rex mode=sed field=ssn "s/(\d{3})(\d{2})(\d{4})/\1-\2-\3/g"
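For anyone who wants to see what that sed-mode rex does, here is a small Python sketch of the same substitution (the function name `format_ssn` is just illustrative, not part of Splunk):

```python
import re

def format_ssn(ssn: str) -> str:
    # Same pattern as the rex above: split a 9-digit value into 3-2-4 groups
    # and rejoin with dashes, so it no longer looks like a plain number.
    return re.sub(r"(\d{3})(\d{2})(\d{4})", r"\1-\2-\3", ssn)

print(format_ssn("041234567"))  # 041-23-4567
```

Once the value contains dashes, Excel treats it as text and the leading zero survives.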
In the testing I've done, the limitation is mostly (if not exclusively) within Excel. Splunk correctly preserves the leading zeros in the CSV it generates, e.g. 041234567. Testing also shows that Splunk does not (and cannot, even when forced with tostring()) wrap those values in quotes in the CSV output, e.g. "041234567". The reason I say this is an Excel limitation is that whether or not the CSV value is wrapped in quotes (041234567 vs. "041234567"), Excel converts both to a numeric type and trims the leading zero. I'm fairly certain this behavior is consistent regardless of the source system that generates the CSV, Splunk or anything else.
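To illustrate the quoting point: a standards-compliant CSV writer can be forced to quote every field, but that quoting is CSV syntax, not part of the value, so Excel still coerces the cell to a number. A quick sketch with Python's csv module:

```python
import csv, io

# Force quotes around every field, as some exporters can.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["ssn"])
writer.writerow(["041234567"])

# The file contains "041234567" with quotes, yet Excel will still
# parse it as the number 41234567 and drop the leading zero.
print(buf.getvalue())
```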
With all that being said, presuming the ultimate goal is to get the CSV to not trim the leading zeros, you have only a few options:
1. Force the data in Splunk to look like an SSN string by reinserting the dashes, as proposed in the sed command above
2. Use Excel's Text Import Wizard to open the CSV. This gives you the option to explicitly mark a column as text, in which case the leading zeros will display
3. Apply column formatting in Excel to re-display the leading zeros. The simplest option is to highlight the column -> Format Cells -> Special -> Social Security Number
Admittedly, option #3 is partially flawed in that it will display 041234567 and 41234567 in exactly the same format (041-23-4567), which is arguably wrong: you can't easily tell that the second value is actually a bogus SSN because it has only 8 digits, not 9.
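The 8-digit ambiguity in option #3 is easy to catch before any display formatting hides it. A hypothetical pre-check (the helper name `is_valid_ssn` is mine, not from Splunk or Excel):

```python
def is_valid_ssn(raw: str) -> bool:
    # A value that already lost its leading zero will fail the length check.
    return len(raw) == 9 and raw.isdigit()

print(is_valid_ssn("041234567"))  # True
print(is_valid_ssn("41234567"))   # False: only 8 digits, a bogus SSN
```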
If anyone else can find a better solution, or thinks I've missed or misstated something above, I'm all ears for a better option.
jwiedemann, very kind of you to explain the issue so clearly.
The bigger question is about data typing especially with Hunk, where strong typing exists via Hive. I opened a distinct thread for it at data typing and hunk .
Any feedback would be appreciated.
It's probably because the CSV export converts it to numbers. Try changing the format in Splunk to a string. Something like this:
.... rex mode=sed field=ssn "s/(\d{3})(\d{2})(\d{4})/\1-\2-\3/g"
@ddrillic Assuming this worked for you, I converted it to an answer so you can accept it.
Great. I really wondered about it... so, we can convert a comment to an answer, right?
I wonder also whether we can add quotes and "force" the number to be a string in the csv file. Any ideas?
You could try a | eval ssn=tostring(ssn)
and see if CSV sees that as a string. It might. Then you don't have to deal with formatting.
Interesting - the | eval ssn=tostring(ssn)
doesn't have any effect on the data type. The UI still shows the ssn as numerical values.
When doing | eval ssn="\"" + tostring(ssn) + "\""
, I get quotes around the ssn in the csv file.
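One caveat worth noting about that approach: when the quote characters become part of the value itself, a standards-compliant CSV writer will escape them by doubling, so downstream tools may see literal quote characters in the cell. A sketch of that behavior with Python's csv module:

```python
import csv, io

# Mimic | eval ssn="\"" + tostring(ssn) + "\"" : the quotes are now
# part of the field's value, not CSV syntax.
buf = io.StringIO()
csv.writer(buf).writerow(['"041234567"'])

# The writer quotes the field and doubles the embedded quotes.
print(buf.getvalue())  # """041234567"""
```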
Interesting -
I ran -
basesearch | eval rex mode=sed field=ces_ssn "s/(\d{5})(\d{2})(\d{4})/\1-\2-\3/g"
It says -
Error in 'eval' command: The expression is malformed.
There should not be an eval before the rex
Gorgeous - thank you!!!