
CSV File with inline field data extracting headers incorrectly

burras
Communicator

I have a CSV file from an ALU application that is proving incredibly difficult to work with. The file contains metrics collected from various parts of the system and application, all melded into a single file. Each "group" (metrics collection) within the file has a different number of fields, and those fields are not always in the same order. Working with the entire file has seemed impossible, so I've split it into separate CSV files, one per group. However, even at the group level there are inconsistencies.

Here's a sample of the parsed csv data for one group:

Start_Time_In_MS=1468990800004,Start_Time_Local=Wed_Jul_20_01:00:00_EDT_2016,End_Time_In_MS=1468991700003,End_Time_Local=Wed_Jul_20_01:15:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=Proprietary,Command=PAR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222224691360768,Average_Latency=14.5,
Start_Time_In_MS=1469078100003,Start_Time_Local=Thu_Jul_21_01:15:00_EDT_2016,End_Time_In_MS=1469079000004,End_Time_Local=Thu_Jul_21_01:30:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=NS,Command=PNR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222219753089163,Average_Latency=28.5,
Start_Time_In_MS=1469078100003,Start_Time_Local=Thu_Jul_21_01:15:00_EDT_2016,End_Time_In_MS=1469079000004,End_Time_Local=Thu_Jul_21_01:30:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=GS,Command=PAR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222219753089163,Average_Latency=18.5,
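Each record above is a run of comma-separated key=value pairs ending in a trailing comma, so every value carries its own field name. If records are parsed by key rather than by column position, the varying field count and field order stop mattering. A minimal Python sketch, assuming values never contain a comma or an equals sign (true for this sample); `parse_record` is a hypothetical helper, not part of Splunk:

```python
def parse_record(line: str) -> dict[str, str]:
    """Turn one 'key=value,key=value,' record into a dict.

    Hypothetical helper; assumes values contain no ',' or '='.
    """
    fields: dict[str, str] = {}
    for pair in line.strip().split(","):
        if not pair:  # the trailing comma yields an empty final item
            continue
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


# The same fields in a different order produce the same dict.
a = parse_record("Application=NS,Command=PNR,Count=2,")
b = parse_record("Count=2,Command=PNR,Application=NS,")
print(a == b, a["Application"])  # True NS
```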

The main problem is that, even though the file has no header line, Splunk insists on treating the first line as the header. Even if I don't use the INDEXED_EXTRACTIONS = CSV option and instead rely on default settings with manual configuration, Splunk still identifies fields incorrectly. For example, in the data above there should be a field called "Application" with three values: Proprietary, NS, and GS. Instead, Splunk returns a field named "Application_Proprietary" with the values "Application=Proprietary", "Application=NS", and "Application=GS". I suspect that if I can make this work for one field, I can apply the same fix to the remaining fields and finally get this data ingested properly. I've tried various options in props.conf to get the fields identified correctly, with no luck. Any help would be greatly appreciated!
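A plausible explanation for the "Application_Proprietary" name: if the first line is read as a CSV header, each cell such as "Application=Proprietary" becomes a column name, and the "=" (not valid in a field name) is replaced with an underscore. The Python sketch below reproduces that effect. The cleaning rule (every character outside letters, digits, and underscore becomes "_") is an assumption that approximates the observed behavior; it is not Splunk's code, and `header_as_field_names` is a hypothetical helper.

```python
import csv
import io
import re


def header_as_field_names(text: str) -> list[str]:
    """Read the first CSV row as a header and clean each cell into a field name.

    Assumed cleaning rule: replace every character outside [A-Za-z0-9_] with '_'.
    Empty cells (from the trailing comma) are skipped.
    """
    header = next(csv.reader(io.StringIO(text)))
    return [re.sub(r"[^A-Za-z0-9_]", "_", cell) for cell in header if cell]


first_line = "Site=tmpafl,Group=Diameter,Application=Proprietary,Command=PAR,\n"
print(header_as_field_names(first_line))
# ['Site_tmpafl', 'Group_Diameter', 'Application_Proprietary', 'Command_PAR']
```

Under this reading, each later line's "Application=..." cell lands in that column as a raw value, which is consistent with the "Application=NS" and "Application=GS" values described above.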


sundareshr
Legend

Try this in your props.conf:

[your sourcetype]
BREAK_ONLY_BEFORE = (Start)
DATETIME_CONFIG = 
KV_MODE = auto
MAX_TIMESTAMP_LOOKAHEAD = 25
NO_BINARY_CHECK = true
TIME_FORMAT = %a_%b_%d_%H:%M:%S
TIME_PREFIX = Start_Time_Local=
category = Custom
pulldown_type = true
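The TIME_PREFIX/TIME_FORMAT pair above reads only the leading "Wed_Jul_20_01:00:00" part of Start_Time_Local; the zone and year that follow it are not covered by the format. One way to sanity-check the format against the sample is to parse Start_Time_Local with the same directives and compare the result with Start_Time_In_MS, which holds the same instant as epoch milliseconds. This Python sketch assumes Python's strptime handles %a, %b, %d, %H, %M and %S the way Splunk does for this input, assumes an English locale, and treats EDT as a fixed UTC-4 offset; `matches_epoch` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%a_%b_%d_%H:%M:%S"    # TIME_FORMAT from the stanza above
EDT = timezone(timedelta(hours=-4))  # assumption: EDT as a fixed UTC-4 offset


def matches_epoch(start_local: str, start_ms: int) -> bool:
    """Check that Start_Time_Local and Start_Time_In_MS name the same second."""
    # "Wed_Jul_20_01:00:00_EDT_2016" -> "Wed_Jul_20_01:00:00", "EDT", "2016"
    clock, zone, year = start_local.rsplit("_", 2)
    if zone != "EDT":
        raise ValueError(f"unexpected zone {zone!r}")
    # Append the year so strptime does not have to guess it.
    local = datetime.strptime(f"{clock}_{year}", TIME_FORMAT + "_%Y")
    from_epoch = datetime.fromtimestamp(start_ms // 1000, tz=EDT)
    return local.replace(tzinfo=EDT) == from_epoch


print(matches_epoch("Wed_Jul_20_01:00:00_EDT_2016", 1468990800004))  # True
print(matches_epoch("Thu_Jul_21_01:15:00_EDT_2016", 1469078100003))  # True
```

The format consumes 19 characters after TIME_PREFIX, which fits within the MAX_TIMESTAMP_LOOKAHEAD of 25 set above.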

burras
Communicator

That looks like it may have fixed the field issue, but it has definitely caused problems with line breaking: all of the individual lines are now being pulled into a single event. I tried removing the BREAK_ONLY_BEFORE setting and replacing it with SHOULD_LINEMERGE = false, but events still aren't breaking properly. Here's what I have in props.conf right now:

[diameter_metrics]
SHOULD_LINEMERGE = false
KV_MODE=auto
MAX_TIMESTAMP_LOOKAHEAD=48
NO_BINARY_CHECK=true
TIME_PREFIX=Start_Time_Local=
TIME_FORMAT=%a_%b_%e_%H:%M:%S
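With SHOULD_LINEMERGE = false, Splunk breaks events only where LINE_BREAKER matches, and LINE_BREAKER defaults to ([\r\n]+): the text captured by the first group is discarded, the previous event ends where that group starts, and the next event begins where it ends. The Python sketch below approximates that rule so a copy of the raw file can be checked offline; it is not Splunk's implementation, and `break_events` is a hypothetical helper. On newline-separated records like the sample, each line should come out as its own event.

```python
import re


def break_events(raw: str, line_breaker: str = r"([\r\n]+)") -> list[str]:
    """Split raw text into events following the LINE_BREAKER rule.

    The first capturing group marks each break: its text is dropped, the
    previous event ends at its start, and the next event begins at its end.
    """
    pattern = re.compile(line_breaker)
    events, start = [], 0
    for match in pattern.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return [event for event in events if event]  # drop empty trailing pieces


raw = "Start_Time_In_MS=1,Count=2,\nStart_Time_In_MS=3,Count=4,\r\n"
print(break_events(raw))
# ['Start_Time_In_MS=1,Count=2,', 'Start_Time_In_MS=3,Count=4,']
```

Line breaking is applied at index time, so a props.conf change only affects data indexed after the change, and the stanza has to be in place on the instance that parses the data.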
