How do you create a report from the access log (between 00:00:00 and 23:59:59) in the format shown?

saifullakhalid
Explorer

I would like to search the logs for the keywords mentioned below and create a report in the format shown.

Every keyword has a different pattern and appears in the middle of requests that start with ?ptActivity=

?ptActivity=...............................................PreActivity=DCBClaimSearch&HeaderButtonSectionName.................HTTP/1.1" 200 4502

?ptActivity=...........................LanguageCode=&CountryCode=&PRODUCT_XXXX=XXXX=&LOB=&XXXXXCD=&Count=..........HTTP/1.1" 200 3402

?ptActivity=xxxxxxxxxxxxxxxxxxxxxxxxxxxxx%20&Request_Type=&xxxxxxxxxxxxxx_xxxxxxxxxxxxxxx&ELEMENT_CD=&LanguageCode=&CountryCode=&PRODUCT_LINE_CDXXXX=&LOB=&LOB_XXX_CD=&Count= HTTP/1.1" 200 5092

log format :

1x.xx.xxx.xxx - - 11xxxxx4 [03/Oct/2017:08:01:54 -0400] - /pxxx/Gxxxxt/uxxxxxxxxx4[/!TABTHREAD1 HTTP/1.1 oxxx-xxx.xxx.net TIME:0/123717 "POST /pxxxb/Gxxxxt/uxxxxxxxxxxxxxxxxx4%5B/!TABTHREAD1?ptActivity=Cxxxxxxxxx-xxxx.xxxxxx%20&Request_Type=&xxxxxTYPE_CD=COUNTRY&Exxxxxxxx_CD=&LanguageCode=&CountryCode=&PRODUCT_LINE_CD=&REGION_CD=&LOB=&LOB_SUB_CD=&Count= HTTP/1.1" 200 4011

1x.xx.xxx.xxx - - - [03/Oct/2017:08:01:54 -0400] - /pddddb/Gdddd/xxxxxxxxxxxxxxxxxx[/themeimages/h1expand_theme_ccddd.gif!!.gif HTTP/1.1 oxxxxxxxxxxx.aig.net TIME:0/12758 "GET / /pddddb/Gdddd/xxxxxxxxxxxxxxxxxx[/themeimages/h1expand_theme_ccddd.gif!!.gif HTTP/1.1" 200 69

1x.xx.xxx.xxx- - 1ssssss4 [03/Oct/2017:08:02:09 -0400] - /pxxxx/Gxxxxxt/uxxxxxxxxxxxxxxxxx4[/!TABTHREAD1 HTTP/1.1 oxxx-xxx.xx.net TIME:0/117091 "POST /pxxxb/Gxxxt/xxxxxxxxxxxxxxxxxxxxB/!TABTHREAD1?ptActivity=ReloadSection&pzIxxxd=xxxxxxxxxxxxxxxxxxx&pzFromFrame=pyxxxx&pzxxxxxxxxxxxe=pyxxxxxxxxe&pzxxxxxxx=false&StreamName=AddPropertyDetails&BaseReference=xxxxxxxxxx.xxxxxxxxxxe.Prxxxxxxx&Stxxxxxxxxxxxss=xxxxxxx-Section&bClientValidation=true&FieldError=ERRORTEXT&PreActivity=&xxxxxxxxxge=true&HexxxxxxxxnName=SubxxxxxxorkObjectHeaderB&inStandardsMode=true&AJAXTrackID=5&pzHarnessID=HIDxxxxxxxxx HTTP/1.1" 200 4512

reports to be generated:

Report 1 :

| User | Time | Protocol | server | Elapsed Time (Seconds) | Call | Status | Size | logName |
|---|---|---|---|---|---|---|---|---|
| 1ssssss4 | 17/Oct/04 01:15:00 | HTTP/1.1 | oxxxxxxxxxxx.net | 0.201185 | ptActivity=ReloadSection&pzIxxxd=xxxxxxxxxxxxxxxxxxx&pzFromFrame=pyxxxx&pzxxxxxxxxxxxe=pyxxxxxxxxe&pzxxxxxxx=false&StreamName=AddPropertyDetails&BaseReference=xxxxxxxxxx.xxxxxxxxxxe.Prxxxxxxx&Stxxxxxxxxxxxss=xxxxxxx-Section&bClientValidation=true&FieldError=ERRORTEXT&PreActivity=&xxxxxxxxxge=true&HexxxxxxxxnName=SubxxxxxxorkObjectHeaderB&inStandardsMode=true&AJAXTrackID=5&pzHarnessID=HIDxxxxxxxxxx HTTP/1.1 | 200 | 6188 | \508\access_log_10_04_2017 |

maciep
Champion

This seems to work for me on those two events, except for the elapsed bit; I'm still not sure how that is being calculated. Also, in your second example event, the first dash (-) is right up against the IP. I'm assuming there is actually a space there, like in the first event.

... |  rex "^(?<ip>\S+)(?:\s+\S+){2}\s+(?<user>\S+)\s+\[(?<time>[^\]]+)\](?:\s+\S+){2}\s+(?<protocol>\S+)\s+(?<server>\S+)\s+(?<elapsed>\S+)\s+\"(?<request>[^\"]+)\"\s+(?<status>\d+)\s+(?<bytes>\S+)" 
|  rex field=request "ptActivity=(?<call>.+)$"
|  table user,time,protocol,server,call,status,bytes,source

If this is what you want, you could also put these field extractions in props.conf for whatever sourcetype you use for this data in your search. That way the fields will be extracted automatically, and you wouldn't need the rex commands to create them.
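
For anyone who wants to sanity-check the pattern outside Splunk first, here is a minimal Python sketch of the same two extractions. It assumes Python's `re` module, where named groups are spelled `(?P<name>...)` instead of `(?<name>...)`; the `parse_line` helper is a hypothetical name, and the sample line is the first event from the question with its query string shortened.

```python
import re

# Same pattern as the rex above, in Python's named-group syntax.
ACCESS_LINE = re.compile(
    r'^(?P<ip>\S+)(?:\s+\S+){2}\s+(?P<user>\S+)\s+\[(?P<time>[^\]]+)\]'
    r'(?:\s+\S+){2}\s+(?P<protocol>\S+)\s+(?P<server>\S+)\s+(?P<elapsed>\S+)'
    r'\s+"(?P<request>[^"]+)"\s+(?P<status>\d+)\s+(?P<bytes>\S+)'
)
# Second rex: everything after ptActivity= inside the request field.
CALL = re.compile(r'ptActivity=(?P<call>.+)$')


def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = ACCESS_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    call = CALL.search(fields["request"])
    fields["call"] = call.group("call") if call else None
    return fields


# First sample event from the question (query string shortened for readability).
SAMPLE = (
    '1x.xx.xxx.xxx - - 11xxxxx4 [03/Oct/2017:08:01:54 -0400] - '
    '/pxxx/Gxxxxt/uxxxxxxxxx4[/!TABTHREAD1 HTTP/1.1 oxxx-xxx.xxx.net '
    'TIME:0/123717 "POST /pxxxb/Gxxxxt/uxxxxxxxxxxxxxxxxx4%5B/!TABTHREAD1'
    '?ptActivity=Cxxxxxxxxx-xxxx.xxxxxx%20&Request_Type=&Count= HTTP/1.1" 200 4011'
)

if __name__ == "__main__":
    for name, value in parse_line(SAMPLE).items():
        print(f"{name}: {value}")
```

Note that `elapsed` comes out as the raw token (for example `TIME:0/123717`), so it would still need to be converted to a number before any min/max/avg is meaningful.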

saifullakhalid
Explorer

Thanks for your previous answer. I also need reports like the ones below.

Report2: summary report

sample:

| keyword | Start Time | End Time | Total # of executions | Avg # of Executions per Hour | Min Resp Time | Max Resp Time | Avg Resp Time | 90th percentile Resp Time | Std Dev Of Resp Time | Min Size of Response | Max Size of Response | Avg Size of Response | 90th percentile Size of Response | Std Dev of Size of Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Keyword1 | 17/Oct/04 00:11:46 | 17/Oct/04 23:24:05 | 2398 | 104 | 0.02 | 27.35 | 0.108 | 0.109 | 0.594 | 82 | 10342 | 4302.94 | 4543 | 424.21 |
| Keyword2 | 17/Oct/04 00:11:46 | 17/Oct/04 23:24:05 | 2398 | 103 | 0.03 | 22.35 | 0.119 | 0.107 | 0.583 | 89 | 10332 | 43394 | 4523 | 4324.21 |

Report3: 24 hours

sample:

| keyword | Start Time | End Time | Total # of executions | Avg # of Executions per Hour | Min Resp Time | Max Resp Time | Avg Resp Time | 90th percentile Resp Time | Std Dev Of Resp Time | Min Size of Response | Max Size of Response | Avg Size of Response | 90th percentile Size of Response | Std Dev of Size of Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| keyword_1 | 17/Oct/04 00:00:00 | 17/Oct/04 00:59:59 | 4 | 4 | 0.056125 | 0.070999 | 0.0613225 | 0.070999 | 0.00671778 | 3617 | 4533 | 3886.75 | 4533 | 437.5083809 |
| keyword_1 | 17/Oct/04 01:00:00 | 17/Oct/04 01:59:59 | 3 | 3 | 0.058215 | 0.080105 | 0.066264 | 0.080105 | 0.012039662 | 3780 | 4548 | 4036 | 4548 | 443.4050067 |
| keyword_1 | 17/Oct/04 02:00:00 | 17/Oct/04 02:59:59 | 9 | 9 | 0.039571 | 0.083275 | 0.058887778 | 0.083275 | 0.015465193 | 3628 | 4549 | 4018.777778 | 4549 | 400.1539634 |
| keyword_1 | 17/Oct/04 03:00:00 | 17/Oct/04 03:59:59 | 8 | 8 | 0.038187 | 0.062873 | 0.053408625 | 0.062873 | 0.009202517 | 3615 | 4545 | 3834 | 4545 | 296.6532367 |
| ... | | | | | | | | | | | | | | |
| keyword_1 | 17/Oct/04 23:00:00 | 17/Oct/04 23:59:59 | 5 | 5 | 0.040078 | 0.07862 | 0.0598834 | 0.07862 | 0.013636071 | 3616 | 3628 | 3618.6 | 3628 | 5.272570531 |

Similarly for keyword_2, keyword_3, and so on.

Note: my log format is the same as shown in the original question.

maciep
Champion

Are you familiar with the stats command? If not, you might want to play around with it. I'm not going to type all of these out, but hopefully this will give you the right idea:

... | rex "^(?<ip>\S+)(?:\s+\S+){2}\s+(?<user>\S+)\s+\[(?<time>[^\]]+)\](?:\s+\S+){2}\s+(?<protocol>\S+)\s+(?<server>\S+)\s+(?<elapsed>\S+)\s+\"(?<request>[^\"]+)\"\s+(?<status>\d+)\s+(?<bytes>\S+)"
    | rex field=request "ptActivity=(?<call>.+)$"
    | bucket _time span=1h
    | stats count min(elapsed) as min_resp, max(elapsed) as max_resp, min(bytes) as min_size, max(bytes) as max_size by call _time
    | stats sum(count) as total_events, avg(count) as avg_per_hour, min(min_resp) as min_resp, max(max_resp) as max_resp, min(min_size) as min_size, max(max_size) as max_size by call

The bucket command will essentially floor all of the timestamps to the hour. Next we get all of our stats by the keyword and hour, because we need to calculate avg events per hour. Now that we have those counts by the hour/keyword, we can get the average per hour and then all of the remaining numbers grouped to just the keyword with another stats command.

Hopefully that helps. Note, you could pretty much eliminate that last step for your last report, since it looks like you do want the data by hour. And in that case, I'm guessing total events and avg per hour would be the same number, if you're aggregating over the hour.
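
To make the two-step aggregation concrete, here is a rough Python sketch of the same idea outside Splunk. Everything in it is illustrative: the `EVENTS` data is made up, `hourly_stats` and `summary_stats` are hypothetical names, and elapsed is assumed to already be a number of seconds.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical, already-extracted events: (call, timestamp, elapsed seconds, bytes).
EVENTS = [
    ("keyword_1", datetime(2017, 10, 4, 0, 5), 0.05, 3600),
    ("keyword_1", datetime(2017, 10, 4, 0, 40), 0.07, 4500),
    ("keyword_1", datetime(2017, 10, 4, 1, 10), 0.06, 3800),
    ("keyword_2", datetime(2017, 10, 4, 0, 15), 0.20, 900),
]


def hourly_stats(events):
    """Step 1: floor each timestamp to the hour, then aggregate by (call, hour)."""
    groups = defaultdict(list)
    for call, ts, elapsed, size in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        groups[(call, hour)].append((elapsed, size))
    rows = {}
    for key, values in groups.items():
        resp = [v[0] for v in values]
        sizes = [v[1] for v in values]
        rows[key] = {
            "count": len(values),
            "min_resp": min(resp), "max_resp": max(resp),
            "min_size": min(sizes), "max_size": max(sizes),
        }
    return rows


def summary_stats(hourly):
    """Step 2: roll the hourly rows up to one row per call."""
    per_call = defaultdict(list)
    for (call, _hour), row in hourly.items():
        per_call[call].append(row)
    return {
        call: {
            "total_events": sum(r["count"] for r in rows),
            "avg_per_hour": mean([r["count"] for r in rows]),
            "min_resp": min(r["min_resp"] for r in rows),
            "max_resp": max(r["max_resp"] for r in rows),
            "min_size": min(r["min_size"] for r in rows),
            "max_size": max(r["max_size"] for r in rows),
        }
        for call, rows in per_call.items()
    }


if __name__ == "__main__":
    for call, row in summary_stats(hourly_stats(EVENTS)).items():
        print(call, row)
```

One thing this makes visible: the average per hour is taken only over hours that actually had events for that keyword, because hours with no events never produce a row in the first step.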

saifullakhalid
Explorer

It worked fine, thanks.

saifullakhalid
Explorer

If I have a set of keywords for which I need to obtain the above results, is there a way in Splunk to automatically read the keywords from a CSV file, one at a time, and generate the output in the format shown above?

maciep
Champion

If the CSV file is a lookup in Splunk, then that could be doable using a subsearch, I believe, but you'd probably want to create field extractions in props.conf for that field, as opposed to using rex in the search.

Or if it's a pretty static list, you can just filter for those keywords in the base search too. Or you could create a dashboard with a dropdown of keywords and have the search update as you select a different keyword.

So you have a few different options, but as far as just "looping through a csv", that's not really how splunk works, no.
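
As a rough illustration of "filter, don't loop", here is a Python sketch that reads a keyword list once and tags every event in a single pass, instead of rerunning the whole job per keyword. The CSV content, its `keyword` column name, and the helper names are all assumptions made up for the example; this is not how a Splunk lookup is implemented.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a keyword file with a single "keyword" column.
KEYWORDS_CSV = "keyword\nReloadSection\nDCBClaimSearch\n"

# Hypothetical extracted call values (the part after ptActivity=).
CALLS = [
    "ReloadSection&StreamName=AddPropertyDetails HTTP/1.1",
    "X&PreActivity=DCBClaimSearch&HeaderButtonSectionName=Y HTTP/1.1",
    "SomethingElse HTTP/1.1",
    "ReloadSection&AJAXTrackID=5 HTTP/1.1",
]


def load_keywords(text):
    """Read the keyword column from CSV text."""
    return [row["keyword"] for row in csv.DictReader(io.StringIO(text))]


def match_keyword(call, keywords):
    """Return the first keyword contained in the call string, or None."""
    for keyword in keywords:
        if keyword in call:
            return keyword
    return None


def count_by_keyword(calls, keywords):
    """One pass over the events; calls matching no keyword are dropped."""
    counts = Counter()
    for call in calls:
        keyword = match_keyword(call, keywords)
        if keyword is not None:
            counts[keyword] += 1
    return counts


if __name__ == "__main__":
    print(count_by_keyword(CALLS, load_keywords(KEYWORDS_CSV)))
```

Once each event carries its matching keyword, the per-keyword statistics are just a group-by on that field.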

saifullakhalid
Explorer

Yes, you're correct. We need to convert the timestamp to the format (start_time, "%d/%m/%Y %I:%M:%S:%p").


Yes, you're right.
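
For reference, a small Python sketch of what that format string does to one of the timestamps from the sample events. The input pattern `%d/%b/%Y:%H:%M:%S %z` is an assumption based on how the bracketed timestamp looks in the log, and `reformat_timestamp` is a hypothetical helper name. The month name and AM/PM marker depend on the locale; an English or C locale is assumed here.

```python
from datetime import datetime


def reformat_timestamp(raw):
    """Parse an access-log timestamp and render it in the requested format."""
    # Assumed input layout, e.g. "03/Oct/2017:08:01:54 -0400".
    parsed = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
    # Requested output layout: day/month/year, 12-hour clock, AM/PM marker.
    return parsed.strftime("%d/%m/%Y %I:%M:%S:%p")


if __name__ == "__main__":
    print(reformat_timestamp("03/Oct/2017:08:01:54 -0400"))
```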

maciep
Champion

Have you done any of these field extractions yet? If not, can you share the log format? We might be able to guess from the examples, but I see some instances where the user is a "-". But then there are also other dashes that probably represent something?

Also, is the timestamp in splunk for these events (_time) the same as the timestamp in the event?

saifullakhalid
Explorer

By "timestamp in Splunk", do you mean the format?

maciep
Champion

I meant the timestamp itself. I was just wondering whether these events in Splunk already have the correct timestamp or if that needs to be extracted as well.

I won't have time tonight, but I can try to put together the regex to pull the fields out of this data so you can create the report. I think that's really all you need, right?

And if this is a common format like apache or something, then there is already probably an add-on that knows how to parse the events.

saifullakhalid
Explorer

The log format is typically the same as shown above.
- - 11xxxxx4 is the same for all lines.
- - - appears only for static content such as CSS, JS, and images, where the user column is a dash (-).

maciep
Champion

When I ask about the log format, this is what I have in mind: not an example of the log, but exactly which parameters are being used to create the log.

http://httpd.apache.org/docs/current/mod/mod_log_config.html

Also, not sure how you are calculating elapsed time with those examples. Does it come from this part: "TIME:0/117091" ??
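
If that field does turn out to be elapsed time, converting it is simple. The sketch below is purely a guess at the layout: it assumes the log format writes `TIME:%T/%D`, where mod_log_config's `%T` is the time taken in whole seconds and `%D` is the same time in microseconds. Only the actual LogFormat line can confirm that. `elapsed_seconds` is a hypothetical helper name.

```python
import re

# ASSUMPTION: the token looks like TIME:<seconds>/<microseconds>.
TIME_FIELD = re.compile(r'^TIME:(?P<seconds>\d+)/(?P<micros>\d+)$')


def elapsed_seconds(field):
    """Return elapsed time in seconds from the microsecond part, or None."""
    match = TIME_FIELD.match(field)
    if match is None:
        return None
    # The microsecond value carries the full precision, so use it alone.
    return int(match.group("micros")) / 1_000_000


if __name__ == "__main__":
    print(elapsed_seconds("TIME:0/117091"))
```

Under that assumption, `TIME:0/117091` would be roughly 0.117 seconds, which is at least in the same range as the elapsed values in the report samples.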
