Splunk Search

How to extract fields from a CSV file that has commas in the fields?

renems
Communicator

Hi There!

I have an issue with a field extraction. I have a Windows CSV file, that has several fields that have commas in the fields itself, forcing it to break at the wrong position.
Is there any clever way I could fix this? Unfortunately I'm not in the position to edit the source.

Here's an example of the so-called-csv:

"ssrv0052,true,,,,,,,unix,,,""Sunday, February 7, 2016 8:05:15 PM CET"",Integratie Service Manager : DS_Integratie Service Manager _A02 - SM9 Linux-Unix-Windows Servers - Full,,IPv4,,,,,,,,,,,,false,,,5de7125b74af5305076f03648a900aea,false,""Sunday, February 7, 2016 8:05:15 PM CET"",""Sunday, February 7, 2016 8:05:15 PM CET"",,ssrv0052,,,false,,,,,,,,,,,,[server],,,,,,,,,,,,,,,Integratie Service Manager : DS_Integratie Service Manager _A02 - SM9 Linux-Unix-Windows Servers - Full,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"

Take a close look at the date field. It shows the issue I have.

Thank you in advance!

0 Karma
1 Solution

s2_splunk
Splunk Employee
Splunk Employee

Well, if you can't change the source and get it cleaned up (and you should probably still try to do that), you'll have to do some cleanup using props/transforms. Try this:

props.conf

[dirtycsv]
SEDCMD-cleanupMess = s/^\"//g s/\"$//g s/\"\"/"/g
REPORT-cleanFields = cleancsv_fields 
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true

transforms.conf

[cleancsv_fields]
DELIMS = ","
FIELDS = f01,f02,f03,f04,f05,f06,f07,f08,f09,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40

Change your FIELDS list to the field names you want to assign to each of the field names in the - now structurally cleaned up - csv file and you should be on a happier path.

The SEDCMD above contains a space-separated list of three SED replacements:

  1. Remove double quote at beginning of line
  2. Remove double quote at end of line
  3. Replace double-double quotes around timestamp fields with single double quote

There may be a more efficient way to do this with RegEx, but this should work. The most efficient way would be to clean up the source before it is indexed into Splunk.... 😉

I tested this on my local 6.3.x install:
alt text

props/transforms need to be on your first parsing tier, either indexer or heavy forwarder, not on universal forwarder.

View solution in original post

Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...