Splunk Search

Attempting to extract multiple fields out of a large logfile and create a stats chart

New Member

I have a logfile containing multiple log entries, each of which spans roughly 700 lines. I am trying to extract only the following values from each entry and would like to create a chart or graph from the results.

The lines I am trying to extract are always at the start of each log entry; a sample extract is below. The rest of the entry holds all the other data.

====================================

01/29/2018  23:10:40
Monitors:
Monitor Data:

Monitor System Tasks:
    Status: OK , severity: Terrible , Last Update Time: 01/29/2018 23:10:40
    Jobs Types:
        Name: Log Sync, Status: OK , severity: Good , Last Update Time: 01/29/2018 15:45:15
            Log Sync , ignore = 507830 , jobs execution count = 6529 , jobs exection average = 6.26 sec
        Name: Collection, Status: N/A , severity: N/A , Last Update Time: 01/29/2018 15:45:15
            Collection , ignore = 0 , jobs execution count = 0 , jobs exection average = 0 ms
    Long Jobs Times:
        Name: server1-sample-binarylogs server1b, Status: N/A , severity: Terrible , Last Update Time: 01/27/2018 07:34:00
            fxccspricing-bubble-binlogs server1b , type = nog Sync , ignore = 0 , job execution count = 7 , job exection average = 11.96 min
        Name: server4-sample-core server1e, Status: N/A , severity: Terrible , Last Update Time: 01/27/2018 09:34:00
            server4-sample-core server1e , type = nog Sync , ignore = 0 , job execution count = 4 , job exection average = 3.67 min
        Name: server5-sample-binarylogs server1f, Status: N/A , severity: Poor , Last Update Time: 01/27/2018 10:34:00
            server5-sample-binarylogs server1f , type = nog Sync , ignore = 0 , job execution count = 4 , job exection average = 7.27 min

====================================
For the log line below:

Name: server5-sample-binarylogs server1f, Status: N/A , severity: Poor , Last Update Time: 01/27/2018 10:34:00
            server5-sample-binarylogs server1f , type = nog Sync , ignore = 0 , job execution count = 4 , job execution average = 7.27 min

When I run:

index=abc source=/data/logs/abc/xyz/* "job execution average =" | rex "job execution average =(?[^ ]*)" 

I do get results; however, the whole massive log entry is still returned, and I can't extract ONLY the meaningful lines from it.
How can I extract only the three values below and make a chart out of them?

Value 1 - Name: server4-sample-core
Value 2 - server1f
Value 3 - job execution average = 7.27 min (these mins value would change)

Can someone please help me with this? Thanks!

Champion

A few things. First, I just grabbed your first sample up there, so my rex that keeps only the Long Jobs Times section won't be the same as what you would need in production, because it grabs everything from that heading to the end of the event (in my sample, that section was the entire rest of the event).

Also, I used sed here to replace the raw data before extracting the values. You could instead capture the Long Jobs Times section into a field and then use that field for the extractions. But in case you didn't know, I wanted to show you that you can replace the raw log at search time if you want it to look less busy when reviewing results.

And finally, I'm not sure if this is a typo or not, but your sample data says "job exection average" (missing the "u"), while your examples have the "u" in there. So I'm not sure which to use, but I used the one without the "u" because that's what I was working with.
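
If the spelling really does vary between events, one option is a pattern that accepts both. The snippet below is a minimal Python sketch for checking such a pattern outside Splunk, not part of the search itself. The names EXEC_AVG and exec_average are made up for illustration, and the sample lines are copied from this thread.

```python
import re

# Hypothetical helper, not part of the original search.
# "jobs?" and "u?" make the trailing "s" and the "u" optional, so every
# spelling seen in the thread matches.
EXEC_AVG = re.compile(r"jobs? execu?tion average\s*=\s*(?P<exec>[\d.]+\s*\w+)")


def exec_average(line):
    """Return the average as text (e.g. '7.27 min'), or None if absent."""
    match = EXEC_AVG.search(line)
    return match.group("exec") if match else None


samples = [
    "type = nog Sync , ignore = 0 , job execution count = 4 , job exection average = 7.27 min",
    "type = nog Sync , ignore = 0 , job execution count = 4 , job execution average = 7.27 min",
    "Log Sync , ignore = 507830 , jobs execution count = 6529 , jobs exection average = 6.26 sec",
]
for sample in samples:
    print(exec_average(sample))
```

The `jobs? execu?tion average` fragment uses only basic regex syntax, so the same fragment should also work inside a rex command.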

So not sure if this is what you're after, but maybe something like it?

index=abc source=/data/logs/abc/xyz/*
|  rex mode=sed field=_raw "s/[\s\S]+(Long Jobs Times:[\s\S]+)/\1/g"
|  rex max_match=0 "Name:\s+(?<name>\S+)\s+(?<server>[^,]+)[\s\S]+?job exection average\s+=\s+(?<exec>.+)"
|  table name, server, exec
|  eval temp = mvzip(name,server), temp = mvzip(temp,exec)
|  fields temp
|  mvexpand temp
|  rex field=temp "^(?<name>[^,]+),(?<server>[^,]+),(?<exec>.+)"
|  fields name, server, exec
  1. So the rex/sed command will replace the event with just the Long Jobs Times
  2. Then the next rex will extract all of the iterations of the fields you wanted
  3. Because there are multiple, each of those will be mv fields
  4. Next use mvzip to put those mv fields together in one big mv field (they should match up correctly)
  5. Then mvexpand that field so they're now each their own event
  6. And then rex the fields back out of each
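
The six steps above can be sanity-checked outside Splunk. The following is a rough Python sketch of the same logic, assuming a shortened copy of the sample event; it illustrates the regexes and is not how Splunk runs the search. Python writes named groups as (?P<name>...) where rex uses (?<name>...), and finditer stands in for the combination of max_match=0, mvzip and mvexpand.

```python
import re

# Shortened copy of the sample from the question; a real event runs on
# for hundreds of lines.
EVENT = """\
Monitor System Tasks:
    Jobs Types:
        Name: Log Sync, Status: OK , severity: Good , Last Update Time: 01/29/2018 15:45:15
            Log Sync , ignore = 507830 , jobs execution count = 6529 , jobs exection average = 6.26 sec
    Long Jobs Times:
        Name: server4-sample-core server1e, Status: N/A , severity: Terrible , Last Update Time: 01/27/2018 09:34:00
            server4-sample-core server1e , type = nog Sync , ignore = 0 , job execution count = 4 , job exection average = 3.67 min
        Name: server5-sample-binarylogs server1f, Status: N/A , severity: Poor , Last Update Time: 01/27/2018 10:34:00
            server5-sample-binarylogs server1f , type = nog Sync , ignore = 0 , job execution count = 4 , job exection average = 7.27 min
"""

# Step 1: keep only the text from "Long Jobs Times:" onward (the sed step).
section = re.sub(r"[\s\S]+(Long Jobs Times:[\s\S]+)", r"\1", EVENT)

# Steps 2-3: find every name/server/exec triple in the trimmed text.
ROW = re.compile(
    r"Name:\s+(?P<name>\S+)\s+(?P<server>[^,]+)[\s\S]+?"
    r"job exection average\s+=\s+(?P<exec>.+)"
)

# Steps 4-6: one dict per match, which is the row-per-entry shape the
# search reaches after mvzip, mvexpand and the final rex.
rows = [m.groupdict() for m in ROW.finditer(section)]
for row in rows:
    print(row)
```

Because the text is trimmed first, the "Log Sync" entry under Jobs Types never reaches the second pattern, which is the point of step 1.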

New Member

Thanks maciep, that was really great, and I really appreciate the effort you have put into my problem.
About your first question:

could you elaborate on what you want to see? I get that you want to extract those 3 values, but do you want to just see those values?

Yes, I am basically trying to get a real-time dashboard chart that constantly checks for these values:

Value 1 - Name: server4-sample-core
Value 2 - server1f
Value 3 - job execution average = 7.27 min (these mins value would change)

  1. From this line: Name: server5-sample-binarylogs server5, Status: N/A
     I am trying to capture the server value, e.g. "server5" in this case, and run the search in real time so I can see which hosts currently have problems.
  2. From the same line, I am trying to capture only the "server5-sample-binarylogs" value across the entire logfile, again in real time, so I can see which such entries currently have problems.

And do you literally just care about server4-sample-core, or do you want the same info for the others that might show up there, e.g. server1 and server5 in your first log sample above?

From this entire log file I need to pick up only these values (server names):

  • server1-sample-binarylogs
  • server1f

Basically, I am trying to capture both of those values/fields and make two separate graphs for them so I can quickly see the current top problem entries.

What comes after the Long Job Times section?

After the Long Jobs Times section, each log entry has on average another 300-500 lines, with entries like the ones below.
However, none of them hold any meaningful data for me.

Working Jobs:
Name: Logs Group, Status: N/A , severity: N/A , Last Update Time: 01/29/2018 15:45:15
Logs Sync Group , type = logsync , created = 01/29/2018 15:45:15 , Jobs Count = 13 , Complete Jobs Count = 0 , Working Jobs Count = 4 , Ignore Jobs Count = 0
Name: Logs Sync Group, Status: N/A , severity: N/A , Last Update Time: 01/29/2018 15:45:15
Logs Sync Group , type = logsync , created = 01/29/2018 15:45:15 , Jobs Count = 20 , Complete Jobs Count = 15 , Working Jobs Count = 3 , Ignore Jobs Count = 0

I hope this helps to set up a custom filter that gets those two values out of the logs in real time and puts them in a chart.

Champion

Ok, here is an updated search. It expects that the next line in your sample would be the "Working Jobs" section. It seemed to work with sample logs I created...

index=abc source=/data/logs/abc/xyz/*
| rex mode=sed "s/[\s\S]+(Long Jobs Times:[\s\S]+?)[\r\n]+\s+Working Jobs[\s\S]+/\1/g"
| rex max_match=0 "Name:\s+(?<name>\S+)\s+(?<server>[^,]+)[\s\S]+?job exection average\s+=\s+(?<exec>.+)"
| table _time name, server, exec
| eval temp = mvzip(name,server), temp = mvzip(temp,exec)
| fields _time temp
| mvexpand temp
| rex field=temp "^(?<name>[^,]+),(?<server>[^,]+),(?<exec>.+)"
| fields _time name, server, exec
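
For charting, the exec values ("7.27 min", "6.26 sec", "0 ms") would need to become numbers in a single unit. Below is a small Python sketch of the trimming step plus that conversion, meant only for testing the logic offline. The function names and the TO_MINUTES table are assumptions for illustration, and the trimming pattern uses \s* rather than \s+ before "Working Jobs" so that an unindented heading still matches, which is a guess about the real log layout.

```python
import re

# Assumed conversion factors for the units seen in the sample log.
TO_MINUTES = {"min": 1.0, "sec": 1 / 60, "ms": 1 / 60000}


def trim_to_long_jobs(event):
    """Keep "Long Jobs Times:" up to, but not including, "Working Jobs"."""
    return re.sub(
        r"[\s\S]+(Long Jobs Times:[\s\S]+?)[\r\n]+\s*Working Jobs[\s\S]+",
        r"\1",
        event,
    )


def exec_minutes(text):
    """Convert a value such as '7.27 min' to a float number of minutes."""
    value, unit = text.split()
    return float(value) * TO_MINUTES[unit]


# Made-up miniature event that reuses lines from the thread.
EVENT = (
    "Monitor System Tasks:\n"
    "    Long Jobs Times:\n"
    "        Name: server5-sample-binarylogs server1f, Status: N/A , severity: Poor\n"
    "            server5-sample-binarylogs server1f , job exection average = 7.27 min\n"
    "    Working Jobs:\n"
    "        Name: Logs Group, Status: N/A , severity: N/A\n"
)

trimmed = trim_to_long_jobs(EVENT)
print(trimmed)
print(exec_minutes("7.27 min"), exec_minutes("6.26 sec"), exec_minutes("0 ms"))
```

If the pattern does not match at all, re.sub returns the event unchanged, so a boundary that never matches leaves the whole event in place for the later extraction.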

New Member

Thanks maciep. However, I am getting some unusual results or stats after using this search string.
I am currently working on modifying it to see if I can get the desired output.

I will update ASAP. Thank you very much for all the help so far! I really appreciate it.

Champion

Could you elaborate on what you want to see? I get that you want to extract those 3 values, but do you want to just see those values? And do you literally just care about server4-sample-core, or do you want the same info for the others that might show up there, e.g. server1 and server5 in your first log sample above?

What comes after the Long Job Times section?
