Hi There!
I have an issue with a field extraction. I have a Windows CSV file, that has several fields that have commas in the fields itself, forcing it to break at the wrong position.
Is there any clever way I could fix this? Unfortunately I'm not in the position to edit the source.
Here's an example of the so-called-csv:
"ssrv0052,true,,,,,,,unix,,,""Sunday, February 7, 2016 8:05:15 PM CET"",Integratie Service Manager : DS_Integratie Service Manager _A02 - SM9 Linux-Unix-Windows Servers - Full,,IPv4,,,,,,,,,,,,false,,,5de7125b74af5305076f03648a900aea,false,""Sunday, February 7, 2016 8:05:15 PM CET"",""Sunday, February 7, 2016 8:05:15 PM CET"",,ssrv0052,,,false,,,,,,,,,,,,[server],,,,,,,,,,,,,,,Integratie Service Manager : DS_Integratie Service Manager _A02 - SM9 Linux-Unix-Windows Servers - Full,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
Take a close look at the date field. It shows the issue I have.
Thank you in advance!
Well, if you can't change the source and get it cleaned up (and you should probably still try to do that), you'll have to do some cleanup using props/transforms. Try this:
props.conf
[dirtycsv]
SEDCMD-cleanupMess = s/^\"//g s/\"$//g s/\"\"/"/g
REPORT-cleanFields = cleancsv_fields
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
transforms.conf
[cleancsv_fields]
DELIMS = ","
FIELDS = f01,f02,f03,f04,f05,f06,f07,f08,f09,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40
Change your FIELDS list to the field names you want to assign to each of the field names in the - now structurally cleaned up - csv file and you should be on a happier path.
The SEDCMD above contains a space-separated list of three SED replacements:
There may be a more efficient way to do this with RegEx, but this should work. The most efficient way would be to clean up the source before it is indexed into Splunk.... 😉
I tested this on my local 6.3.x install:
props/transforms need to be on your first parsing tier, either indexer or heavy forwarder, not on universal forwarder.