I have a raw data set that goes like this:
Logtime: 20181010_15:30:34
ID: V12
ArrivalTime: 15:30:33
No OFFSET DIRECTION LOAD
1 14.3 Counter 100
2 14.5 Reverse100
ExitTime: 15:30:34
Max: 1000
MIN: 900
What will be the best way to parse this data using Splunk?
Yeah, that's one terrible-looking log file. If you have any control over its format, change it to something a bit more Splunk-friendly. If not, then maybe try something like the config below.
I didn't test this at all and I'm sure the regexes can be better... the example is just to give you an idea of how I'd parse it.
Essentially, I'd grab the whole thing as one event (at parse time) and then extract the fields I need from each event (at search time). Of course, if the format changes from event to event or is inconsistent in general, then I'd have to modify the extractions accordingly.
props.conf
[your_sourcetype]
# PARSE-TIME SETTINGS
SHOULD_LINEMERGE = false
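# break the stream before each Logtime: line (the lookahead keeps Logtime: with the new event)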
LINE_BREAKER = ([\r\n]+)(?=Logtime:)
TIME_PREFIX = Logtime:\s*
TIME_FORMAT = %Y%m%d_%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
# SEARCH-TIME SETTINGS
EXTRACT-arrival_time = (?i)ArrivalTime:\s*(?<arrival_time>\S+)
EXTRACT-exit_time = (?i)ExitTime:\s*(?<exit_time>\S+)
EXTRACT-id = (?i)ID:\s*(?<id>\S+)
EXTRACT-max = (?i)Max:\s*(?<max>\S+)
EXTRACT-min = (?i)Min:\s*(?<min>\S+)
# (?im) = case-insensitive + multiline, so ^ matches the start of each line within the event
EXTRACT-no1_offset = (?im)^1\s+(?<no1_offset>\S+)\s*(?<no1_direction>counter|reverse)\s*(?<no1_load>\d+)
EXTRACT-no2_offset = (?im)^2\s+(?<no2_offset>\S+)\s*(?<no2_direction>counter|reverse)\s*(?<no2_load>\d+)
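To sanity-check the extractions (again untested; the index and sourcetype names here are just placeholders), a search along these lines should lay the fields out per event:
index=your_index sourcetype=your_sourcetype
| table _time id arrival_time exit_time no1_offset no1_direction no1_load no2_offset no2_direction no2_load max min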
This works great! I learnt a lot from the regex pattern, as I was primarily stuck on how to extract the huge chunk of table data. I would love to change the raw format to something Splunk-friendly, but unfortunately that is out of my control.
Which values are the relevant ones for you?
You could use the : as a separator, but you would need to modify the log file.
And what exactly do you mean by parsing in this case? Timestamping and linebreaking, or field extractions (or both)?
My apologies, I realised I did not think deeply enough about how it will appear in Splunk, since I was originally working on it in Excel. In Excel I can just fit it into columns, but I forgot that in Splunk everything is associated with time. My end state is to be able to run a search and create a time series chart of all the No = 1 rows, e.g.
| No = 1
and it should return something like:
15:30:33 OFFSET = 14.3 <- From first log file
15:30:33 OFFSET = 15.2 <- From second log file of similar format
After which I can append | timechart avg(OFFSET) by No to see all the OFFSET values.
I hope it makes sense.
I figured out that I can extract most of the fields out of the box via Splunk. I would like the key-value pairs to be something along the lines of:
15:30:33 No1_OFFSET = 14.3
15:30:33 No2_OFFSET = 14.5
15:30:33 No1_Direction = Counter
15:30:33 No2_Direction = Reverse
Is this possible?
It is a damn ugly log file format to get into Splunk directly as separate events. It would be much easier if you had a timestamp on each line and no footer.
Your best bet now might be to get the whole chunk in as one event and then further extract the contents and split it up with search commands.
Use rex to extract the actual data lines into a multivalued field, split that into individual results, and then pull the individual fields out of each data row.
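Something roughly like this (untested sketch; the index/sourcetype names are placeholders and the regexes will need tuning to your exact format):
index=your_index sourcetype=your_sourcetype
| rex max_match=0 "(?m)^(?<data_line>\d+\s+[\d.]+\s+\S+.*)$"
| mvexpand data_line
| rex field=data_line "^(?<No>\d+)\s+(?<OFFSET>[\d.]+)\s+(?<DIRECTION>[A-Za-z]+)\s*(?<LOAD>\d+)"
| timechart avg(OFFSET) by No
The first rex pulls every data row into a multivalued data_line field, mvexpand splits those into separate results, the second rex extracts No/OFFSET/DIRECTION/LOAD from each row, and the timechart gives you the average OFFSET per No over time.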