I have a raw data set that goes like this:
Logtime: 20181010_15:30:34
ID: V12
ArrivalTime: 15:30:33
No OFFSET DIRECTION LOAD
1 14.3 Counter 100
2 14.5 Reverse100
ExitTime: 15:30:34
Max: 1000
MIN: 900
What will be the best way to parse this data using Splunk?
Yeah, that's one terrible-looking log file. If you have any control over its format, change it to something a bit more Splunk-friendly. If not, then maybe try something like the config below.
I didn't test this at all and I'm sure the regexes can be better... the example is just to give you an idea of how I'd parse it.
Essentially, I'd grab the whole thing as one event (at parse time) and then extract the fields I need from each event (at search time). Of course, if the format changes from event to event or is inconsistent in general, then I'd have to modify the extractions accordingly.
props.conf
[your_sourcetype]
# PARSE-TIME SETTINGS
SHOULD_LINEMERGE = false
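# break the stream before each Logtime: line (the lookahead keeps Logtime: with the new event)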
LINE_BREAKER = ([\r\n]+)(?=Logtime:)
TIME_PREFIX = Logtime:\s*
TIME_FORMAT = %Y%m%d_%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
# SEARCH-TIME SETTINGS
EXTRACT-arrival_time = (?i)ArrivalTime:\s*(?<arrival_time>\S+)
EXTRACT-exit_time = (?i)ExitTime:\s*(?<exit_time>\S+)
EXTRACT-id = (?i)ID:\s*(?<id>\S+)
EXTRACT-max = (?i)Max:\s*(?<max>\S+)
EXTRACT-min = (?i)Min:\s*(?<min>\S+)
# (?im) = case-insensitive + multiline, so ^ matches the start of each line within the event
EXTRACT-no1_offset = (?im)^1\s+(?<no1_offset>\S+)\s*(?<no1_direction>counter|reverse)\s*(?<no1_load>\d+)
EXTRACT-no2_offset = (?im)^2\s+(?<no2_offset>\S+)\s*(?<no2_direction>counter|reverse)\s*(?<no2_load>\d+)
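To sanity-check the extractions (again untested; the index and sourcetype names here are just placeholders), a search along these lines should lay the fields out per event:
index=your_index sourcetype=your_sourcetype
| table _time id arrival_time exit_time no1_offset no1_direction no1_load no2_offset no2_direction no2_load max min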
This works great! I learnt a lot from the regex pattern, as I was primarily stuck on how to extract the huge chunk of table data. I would love to change the raw format to something Splunk-friendly, but unfortunately that is out of my control.
Which values are the relevant ones for you?
You could use the : as a separator, but you would need to modify the log file.
And what exactly do you mean by parsing in this case? Timestamping and linebreaking, or field extractions (or both)?
My apologies, I realised I did not think deeply enough about how it will appear in Splunk, since I was originally working on it in Excel. In Excel I can just fit it into columns, but I forgot that in Splunk everything is associated with time. My end state is to be able to run a search and create a time series chart of all the No = 1 rows, e.g.
| No = 1
and it should return something like:
15:30:33 OFFSET = 14.3 <- From first log file
15:30:33 OFFSET = 15.2 <- From second log file of similar format
After which I can append | timechart avg(OFFSET) by No to see all the OFFSET values.
I hope it makes sense.
I figured out that I can extract most of the fields out of the box via Splunk. I would like the key-value pairs to be something along the lines of:
15:30:33 No1_OFFSET = 14.3
15:30:33 No2_OFFSET = 14.5
15:30:33 No1_Direction = Counter
15:30:33 No2_Direction = Reverse
Is this possible?
It is a damn ugly log file format to get into Splunk directly as separate events. It would be much easier if you had a timestamp on each line and no footer.
Your best bet now might be to get the whole chunk in as one event and then further extract the contents and split it up with search commands.
Use rex to extract the actual data lines into a multivalued field, split that into individual results, and then pull the individual fields out of each data row.
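Something roughly like this (untested sketch; the index/sourcetype names are placeholders and the regexes will need tuning to your exact format):
index=your_index sourcetype=your_sourcetype
| rex max_match=0 "(?m)^(?<data_line>\d+\s+[\d.]+\s+\S+.*)$"
| mvexpand data_line
| rex field=data_line "^(?<No>\d+)\s+(?<OFFSET>[\d.]+)\s+(?<DIRECTION>[A-Za-z]+)\s*(?<LOAD>\d+)"
| timechart avg(OFFSET) by No
The first rex pulls every data row into a multivalued data_line field, mvexpand splits those into separate results, the second rex extracts No/OFFSET/DIRECTION/LOAD from each row, and the timechart gives you the average OFFSET per No over time.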