CSV File with inline field data extracting headers incorrectly

I have a CSV file that we're getting from an ALU application that is proving incredibly difficult to work with. The file contains metrics collected from various pieces of the system and application, all melded into a single file. Each "group" (metrics collection) within the file has a different number of fields, and the fields that are present are not always in the same order. Working with the entire file has seemed impossible, so I've had to split it into multiple individual CSV files, one per group. However, even at the group level there are inconsistencies.

Here's a sample of the parsed csv data for one group:

Start_Time_In_MS=1468990800004,Start_Time_Local=Wed_Jul_20_01:00:00_EDT_2016,End_Time_In_MS=1468991700003,End_Time_Local=Wed_Jul_20_01:15:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=Proprietary,Command=PAR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222224691360768,Average_Latency=14.5,
Start_Time_In_MS=1469078100003,Start_Time_Local=Thu_Jul_21_01:15:00_EDT_2016,End_Time_In_MS=1469079000004,End_Time_Local=Thu_Jul_21_01:30:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=NS,Command=PNR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222219753089163,Average_Latency=28.5,
Start_Time_In_MS=1469078100003,Start_Time_Local=Thu_Jul_21_01:15:00_EDT_2016,End_Time_In_MS=1469079000004,End_Time_Local=Thu_Jul_21_01:30:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=GS,Command=PAR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222219753089163,Average_Latency=18.5,

The main problem I'm having is that, despite the file having no header row, Splunk insists on treating the first line as the header. Even if I skip the INDEXED_EXTRACTIONS = CSV option and use default settings with manual configuration, Splunk still identifies fields incorrectly. As an example from the above data, there should be a field called "Application" with three values: Proprietary, NS, and GS. Instead, Splunk returns a field named "ApplicationProprietary" with the values "Application=Proprietary", "Application=NS", and "Application=GS". I have a feeling that if I can make it work for one field, I can apply the same fix to all of the remaining fields and finally get this ingest working. I've tried various options in props.conf to get the fields identified properly, but I've had no luck. Any help would be greatly appreciated!
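To make the expected result concrete, here is a minimal Python sketch (not Splunk code) that treats each line as comma-separated key=value pairs rather than as CSV with a header row. The function name `parse_kv_line` is made up for illustration, and the sample lines are shortened to a few of the fields shown above.

```python
def parse_kv_line(line: str) -> dict[str, str]:
    """Split one comma-delimited line of key=value pairs into a dict."""
    fields = {}
    for pair in line.strip().split(","):
        if not pair:
            continue  # skip the empty piece left by the trailing comma
        key, sep, value = pair.partition("=")
        if sep:  # keep only pieces that actually contain "="
            fields[key] = value
    return fields


# Abbreviated versions of the three sample lines (a subset of their fields).
SAMPLE_LINES = [
    "Start_Time_In_MS=1468990800004,Site=tmpafl,Group=Diameter,"
    "Application=Proprietary,Command=PAR,Count=2,Average_Latency=14.5,",
    "Start_Time_In_MS=1469078100003,Site=tmpafl,Group=Diameter,"
    "Application=NS,Command=PNR,Count=2,Average_Latency=28.5,",
    "Start_Time_In_MS=1469078100003,Site=tmpafl,Group=Diameter,"
    "Application=GS,Command=PAR,Count=2,Average_Latency=18.5,",
]

events = [parse_kv_line(line) for line in SAMPLE_LINES]
applications = [event["Application"] for event in events]
print(applications)
```

Read this way, every line is its own event, the field name is whatever sits to the left of "=", and no line is ever consumed as a header.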

Re: CSV File with inline field data extracting headers incorrectly

Try this in your props.conf:

[your sourcetype]
BREAK_ONLY_BEFORE = (Start)
DATETIME_CONFIG = 
KV_MODE = auto
MAX_TIMESTAMP_LOOKAHEAD = 25
NO_BINARY_CHECK = true
TIME_FORMAT = %a_%b_%d_%H:%M:%S
TIME_PREFIX = Start_Time_Local=
category = Custom
pulldown_type = true
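As a rough illustration of which part of each event TIME_PREFIX points at, the Python sketch below locates the text after `Start_Time_Local=` in the first sample event and parses it. This is not how Splunk parses timestamps internally, and `extract_local_start` is a hypothetical helper. Note that the TIME_FORMAT string in the stanza ends at the seconds, while the sample value continues with `_EDT_2016`; the sketch reads the year from that tail and ignores the zone.

```python
import re
from datetime import datetime

TIME_PREFIX = r"Start_Time_Local="  # same prefix as in the stanza above

# First sample event from the question, shortened to its first three fields.
EVENT = (
    "Start_Time_In_MS=1468990800004,"
    "Start_Time_Local=Wed_Jul_20_01:00:00_EDT_2016,"
    "End_Time_In_MS=1468991700003,"
)


def extract_local_start(event: str) -> datetime:
    """Return the local start time of an event as a naive datetime."""
    match = re.search(TIME_PREFIX + r"([^,]+)", event)
    if match is None:
        raise ValueError("no Start_Time_Local field found")
    # Pieces of the value: weekday, month, day, clock, zone, year.
    _weekday, month, day, clock, _zone, year = match.group(1).split("_")
    # The zone abbreviation is dropped, so the result carries no tzinfo.
    return datetime.strptime(f"{month} {day} {year} {clock}", "%b %d %Y %H:%M:%S")


start = extract_local_start(EVENT)
print(start.isoformat())
```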

Re: CSV File with inline field data extracting headers incorrectly

That looks like it might have fixed the field issue, but it definitely caused problems with line breaking. It's now pulling all of the individual lines into a single event. I tried removing the BREAK_ONLY_BEFORE statement and replacing it with SHOULD_LINEMERGE = false, but it's still not breaking properly. Here's what I've got right now in props.conf:

[diametermetrics]
SHOULD_LINEMERGE = false
KV_MODE=auto
MAX_TIMESTAMP_LOOKAHEAD=48
NO_BINARY_CHECK=true
TIME_PREFIX=Start_Time_Local=
TIME_FORMAT=%a_%b_%e_%H:%M:%S
