Splunk Search

CSV File with inline field data extracting headers incorrectly

burras
Communicator

I have a csv file that we're getting from an ALU application that is proving incredibly difficult to work with. This csv file represents metrics collected from various pieces of the system and application all melded into a single file. Each "group" or metrics collection within the file has a differing number of fields and the fields that are there are not always represented in the same order. In general, it has seemed impossible to work with the entire file so I've had to parse it out into multiple individual csv files representing each group. However, even at the group level there are inconsistencies.

Here's a sample of the parsed csv data for one group:

Start_Time_In_MS=1468990800004,Start_Time_Local=Wed_Jul_20_01:00:00_EDT_2016,End_Time_In_MS=1468991700003,End_Time_Local=Wed_Jul_20_01:15:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=Proprietary,Command=PAR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222224691360768,Average_Latency=14.5,
Start_Time_In_MS=1469078100003,Start_Time_Local=Thu_Jul_21_01:15:00_EDT_2016,End_Time_In_MS=1469079000004,End_Time_Local=Thu_Jul_21_01:30:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=NS,Command=PNR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222219753089163,Average_Latency=28.5,
Start_Time_In_MS=1469078100003,Start_Time_Local=Thu_Jul_21_01:15:00_EDT_2016,End_Time_In_MS=1469079000004,End_Time_Local=Thu_Jul_21_01:30:00_EDT_2016,Site=tmpafl,Group=Diameter,Application=GS,Command=PAR,Destination_Host=csb.tmpaflpcrf.prod,Destination_Realm=tmpaflpcrf.prod,Egress_Peer_Origin_Host=csb.tmpaflpcrf.prod,Egress_Peer_Origin_Realm=tmpaflpcrf.prod,Origin_Host=pcrf1.tmpaflpcrf.prod,Origin_Realm=tmpaflpcrf.prod,Result=DIAMETER_SUCCESS,Role=Client,Count=2,Rate=0.002222219753089163,Average_Latency=18.5,

The main problem I'm having is that despite the file having no header row, Splunk is insisting on trying to represent the first line as the header. Even if I don't attempt to use the INDEXED_EXTRACTIONS=CSV option and instead use default settings with manual configurations, Splunk still identifies fields improperly. As an example from the above data, there should be a field called "Application" with three resulting values: Proprietary, NS, and GS. However, Splunk returns the field as "Application_Proprietary" with values of "Application=Proprietary, Application=NS, Application=GS". I have a feeling that if I can make it work on one field, I can apply the same approach to all of the remaining fields and finally make this ingest work properly. I've tried various options inside of props.conf to try to get it to identify the fields properly, but I've had no luck. Any help would be greatly appreciated!


sundareshr
Legend

Try this in your props.conf:

[your sourcetype]
BREAK_ONLY_BEFORE = (Start)
DATETIME_CONFIG = 
KV_MODE = auto
MAX_TIMESTAMP_LOOKAHEAD = 25
NO_BINARY_CHECK = true
TIME_FORMAT = %a_%b_%d_%H:%M:%S
TIME_PREFIX = Start_Time_Local=
category = Custom
pulldown_type = true

burras
Communicator

That looks like it might have fixed the field issue, but it definitely caused problems with line breaking. It's now pulling all of the individual lines into a single event. I tried removing the BREAK_ONLY_BEFORE statement and replacing it with SHOULD_LINEMERGE = false, but it's still not breaking properly. Here's what I've got right now in props.conf:

[diameter_metrics]
SHOULD_LINEMERGE = false
KV_MODE=auto
MAX_TIMESTAMP_LOOKAHEAD=48
NO_BINARY_CHECK=true
TIME_PREFIX=Start_Time_Local=
TIME_FORMAT=%a_%b_%e_%H:%M:%S
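
One direction that may be worth trying, offered as an untested sketch rather than a confirmed fix: break events explicitly on newlines with LINE_BREAKER, and extend TIME_FORMAT so it covers the timezone and year that appear in the sample timestamps. The stanza name is taken from the config above. The assumptions are that every record ends in a newline, that the index-time settings are deployed on the instance that actually parses the data (an indexer or heavy forwarder), and that the data is re-indexed afterwards, since changes to line breaking and timestamp settings do not alter events that are already indexed.

```
[diameter_metrics]
# Index-time settings: assumed to live on the indexer or heavy forwarder.
SHOULD_LINEMERGE = false
# Assumption: one record per line, so break on any run of CR/LF characters.
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIME_PREFIX = Start_Time_Local=
# Matches the full sample value, e.g. Wed_Jul_20_01:00:00_EDT_2016.
TIME_FORMAT = %a_%b_%d_%H:%M:%S_%Z_%Y
# The sample timestamp after the prefix is 28 characters long.
MAX_TIMESTAMP_LOOKAHEAD = 28
# Search-time setting: assumed to also be present on the search head.
KV_MODE = auto
```

If events still arrive merged after a change like this, checking which instance the stanza is deployed on, and whether the sourcetype on the input really is diameter_metrics, would be a reasonable next step.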
