topic Re: Using head command in Splunk Search

Using head command

nidhiagrawal — Tue, 29 Sep 2020 08:13:02 GMT

I have the query with stats, and I want to use head command to retrieve limited events for everyday. But head command is limiting the events for whole query.

index=myindex "searchQuery" | rex "<messageId>(?<myMsgId>[^<]+)" | rex "refToMessageId>(?<myMsgId>[^<]+)" | rex field=_raw "(?<fldDay>[\d-]{10}).*\s[\s[a-zA-Z0-9-:.]" stats earliest(_time) AS startTime, latest(_time) AS endTime, count as TotalEvents by fldDay , myMsgId | eval responseTime=endTime-startTime | where TotalEvents = 2 | stats avg(responseTime) as avgResponseTime by fldDay

Re: Using head command

javiergn — Thu, 24 Dec 2015 08:34:21 GMT

Try top instead and use the date_mday built-in field (you can group by date including month and year too, it's just an example):

YOURQUERY | top limit=10 yourfield by date_mday

You can then pipe a stats avg after that if you want a daily average for the top 10.

Re: Using head command

AshimaE — Mon, 10 Jul 2017 12:10:24 GMT

This is taking too much time and fails for events of just a few hours. Any other alternative for queries that have longer timespans. My data is very huge and unable to give any significant results with this.

Re: Using head command

cmerriman — Mon, 10 Jul 2017 13:24:24 GMT

you could try:

...|sort 0 fldDay - avgResponseTime |streamstats count by fldDay|search count<=5

i'm not sure if that'll be quicker than the answer @javiergn gave, however it is another method. the sort should sort each fldDay and descend the avgResponseTime. streamstats will give a rowcount, essentially, for each event by fldDay and then you can search for the first 5 (or any set of number)

Re: Using head command

DalJeanis — Mon, 10 Jul 2017 14:48:38 GMT

Your code is invalid. At the very least, there's a pipe missing before the stats command in the middle.

Please mark your code as code, for instance using the 101 010 button.

Re: Using head command

DalJeanis — Mon, 10 Jul 2017 15:00:53 GMT

@AshimaE -

This code assumes that message IDs are relatively unique across the time you will be running the query:

index=myindex "searchQuery" ("<messageId>" OR "<refToMessageId>")
   [search index=myindex "searchQuery" "<messageId>"
   | rex "\<messageId\>(?<myMsgId>[^\<]+)" 
   | where isnotnull(myMsgId)
   | rex field=_raw "(?<fldDay>[\d-]{10}).*\s[\s[a-zA-Z0-9-\:.]" 
   | dedup 5 fldDay
   | table myMsgId
   | format "(" "" "" "" "OR" ")" 
   | rex field=search mode=sed "s/myMsgId=//g"
   ]
| rename COMMENT as "The above subsearch checks only one type of record and grabs the first five MsgIds for each day."
| rename COMMENT as "then only records with that MsgId somewhere in them will be returned from the main search." 

| stats earliest(_time) AS startTime, latest(_time) AS endTime, count as TotalEvents by fldDay, myMsgId 
| where TotalEvents = 2 
| eval responseTime=endTime-startTime 
| stats avg(responseTime) as avgResponseTime by fldDay

However, if you are going to be running this kind of search across any great span of time, then you probably should consider creating a summary index. Sampling the first five events for each day -- which in practice will be the LAST five events, since splunk retrieves in reverse chron order -- is a VERY blunt instrument, not at all valid statistically.

At the very least, I'd consider some kind of sampling protocol, like this. The number 769 can be just any reasonably large number, less than abut 15% of your expected daily volume if you want 5 samples, but I happen to prefer odd primes...

   [search index=myindex "searchQuery" "<messageId>"
   | eval my1in769sample= random()%769
   | where my1in769sample==0
   | rex "\<messageId\>(?<myMsgId>[^\<]+)" 
   | where isnotnull(myMsgId)
   | rex field=_raw "(?<fldDay>[\d-]{10}).*\s[\s[a-zA-Z0-9-\:.]" 
   | dedup 5 fldDay
   | table myMsgId
   | format "(" "" "" "" "OR" ")" 
   | rex field=search mode=sed "s/myMsgId=//g"
   ]