How to extract fields from a log that is comma del...

rhaarmann · ‎08-05-2015

Ok, complex extraction. I have a log that is comma delimited, but they have key,value,key,value,key,value, etc. It's key-value structured like JSON, but in CSV format. I have figured out how to work with this in Python, but not familiar with writing splunk apps and scripts that will do a stream on a file.

I found this Splunk Answers post (http://answers.splunk.com/answers/204994/complex-kv-extraction.html ), but it doesn't completely answer my problem as I will have to hardcode each field, but all my fields are unknown and can change as developers make changes.

Log Example:

Key.Value,Timestamp,$timestamp,ms,$ts_ms,key1,value1,key2,value2,key3,value3
Key.Value,Timestamp,$timestamp,ms,$ts_ms,key1,value1,key4,value4
Key.Value,Timestamp,$timestamp,ms,$ts_ms,key2,value2,key3,value3,key5,value5,key6,value6,key4,value4

woodcock · ‎08-05-2015

Like this:

... | rex field=_raw "(?:[^,]+,){5}(?<KVPs>.*)" | streamstats current=t count AS serial | rex max_match=0 field=KVPs "(?<KVPNAME>[^,]+,[^,]+)(?:,|$)" | mvexpand KVPNAME | rex field=KVPNAME "(?<_KEY_1>[^,]+),(?<_VAL_1>[^,]+)" | eval {_KEY_1}=_VAL_1 | stats values(_*) AS _* values(*) AS * BY serial | fields - serial

rhaarmann · ‎02-15-2016

This worked for the case of when the raw data is already indexed, but I would like to setup a transform that converts it to a better format then index the transformed data. I am currently testing out the solution you provided with searches and summary indexes and it is very expensive on the search heads, especially since the amount of the log data is around 25GB/day. I tried doing a transform line that contained SEDCMD, but it would not work, not sure why yet.

How to extract fields from a log that is comma delimited?

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Join the Conversation

How to extract fields from a log that is comma delimited?

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits