Solved: Exporting SSNs to a CSV file trims leading zeros. ...

ddrillic · ‎04-28-2016

When we export Social Security Numbers to a csv file, the leading zeros of the SSN are being trimmed. We wonder how we can preserve these leading zeros.

Any ideas?

sundareshr · ‎04-28-2016

Its probably because csv converts it to numbers. Try changing the format in splunk to string. Something like this
.... rex mode=sed field=ssn "s/(\d{3})(\d{2})(\d{4})/\1-\2-\3/g"

View solution in original post

jwiedemann_splu · ‎05-02-2016

In the testing I've done, the limitation is mostly (if not exclusively) with in Excel. Splunk is correctly preserving the leading zeros in the CSV that is generated... for example 041234567. However, through testing, it can be seen that Splunk is not/cannot (even by force using a tostring()) output those values to CSV with quotes... for example "041234567". However, the reason I say that this is an Excel limitation is because whether your CSV data is or is not wrapped with quotes... 041234567 vs "041234567" Excel will convert both these to numeric and trim the leading zero. I'm fairly certain that this behavior will be consistent regardless of the source system that generates the CSV, Splunk or anything else.

With all that being said, presuming the ultimate goal is to get the CSV to not trim the leading zeros, you have only a few options:
1. Force the data in Splunk to look like an SSN/String by reinserting the dashes as proposed in the SED command above
2. Use the text import wizard of Excel and open the CSV that way. This gives you the option to forcibly specify a column as text in which case the leading zeros will display
3. Apply column formatting in excel to the column to re-display the leading zeros. Simplest option is to highlight the column -> format cells -> special -> social security number

Admittedly option #3 is partially flawed in that it will display 041234567 and 41234567 in exactly the same format (041-23-4567) which is arguably bad or wrong as you can't easily tell that the second value is actually a bogus SSN because it only has 8 digits... not 9.

If anyone else can find a better solution, or think's I've missed or misstated something above, I'm all ears for a better option.

ddrillic · ‎05-02-2016

jwiedemann, very kind of you to explain the issue so clearly.

The bigger question is about data typing especially with Hunk, where strong typing exists via Hive. I opened a distinct thread for it at data typing and hunk .

Any feedback would be appreciated.

sundareshr · ‎04-28-2016

Its probably because csv converts it to numbers. Try changing the format in splunk to string. Something like this
.... rex mode=sed field=ssn "s/(\d{3})(\d{2})(\d{4})/\1-\2-\3/g"

sundareshr · ‎04-28-2016

@ddrillic Assuming this worked for you, I converted it to an answer so you can accept it.

ddrillic · ‎04-28-2016

Great. I really wondered about it... so, we can convert a comment to an answer, right?

ddrillic · ‎04-28-2016

I wonder also whether we can add quotes and "force" the number to be a string in the csv file. Any ideas?

sundareshr · ‎04-28-2016

You could try a | eval ssn=tostring(ssn) see if CSV sees that as string. I might. Then you dont have to deal with formatting.

ddrillic · ‎04-29-2016

Interesting - the | eval ssn=tostring(ssn) doesn't have any effect on the data type. The UI still shows the ssn as numerical values.

When doing | eval ssn="\"" + tostring(ssn) + "\"", I get quotes around the ssn in the csv file.

ddrillic · ‎04-28-2016

Interesting -

I ran -

basesearch | eval rex mode=sed field=ces_ssn "s/(\d{5})(\d{2})(\d{4})/\1-\2-\3/g"

It says -

Error in 'eval' command: The expression is malformed.

sundareshr · ‎04-28-2016

There should not be an eval before the rex

ddrillic · ‎04-28-2016

Gorgeous - thank you!!!

Exporting SSNs to a CSV file trims leading zeros. How do we preserve them?

Enterprise Security Content Update (ESCU) | New Releases

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

Index This | What are the 12 Days of Splunk-mas?