Splunk Search

Why is stats count not working correctly in my search?

byu168168
Path Finder

Hi all, below is the search that I'm working with

index=main source=mysql-pipe sourcetype=pipeline_logs AND (message="bank* per-node-processing" AND msg_type="BEGIN") | rex field=field7 "tag=\"(?.*)\"" | eval run_id=run_name.".".tag_value | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") | join run_id [search index=main source=mysql-pipe sourcetype=pipeline_logs AND (message="bank* per-node-processing" AND msg_type="COMPLETE") | rex field=field7 "tag=\"(?.*)\"" | eval run_id=run_name.".".tag_value | eval etime=strptime(time_at, "%Y-%m-%d %H:%M:%S")] | stats count by date_mday

When I run this either across All Time or the last 30 days, the count only returns for the days 14-22. However, if I choose a specific day, say Nov 8th, the count returns the accurate date_mday. It gets stranger too, if I search for a specific run that doesn't occur in the 14-22 range, a result is returned with the accurate date_mday even when run on an All Time basis.

I ran a stats count by date_mday on both separate searches and both return an accurate count with the range 1-31, it's when they are joined that this occurs. Any insight would be much appreciated.

0 Karma
1 Solution

somesoni2
Revered Legend

I've seem expensive command like join, transactions etc, dropping events silently when running of huge amount of data. In your case you can definitely remove join with a stats implementation. Try something like this. (check the rex is correct or not).

index=main source=mysql-pipe sourcetype=pipeline_logs AND (message="bank* per-node-processing" AND (msg_type="BEGIN" OR msg_type="COMPLETE") ) | rex field=field7 "tag=\"(?<tag_value>.*)\"" | eval run_id=run_name.".".tag_value | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
| stats dc(msg_type) as msg_types by run_id date_mday | where msg_types=2 | stats count by date_mday

NOTE: There is a code sample button on type of the text box (saying 101 010). Please select your query/configurations and click on that button OR press Ctrl+K key to format it and retains field extractions, asterisks etc.

Updated

Yup

index=main source=mysql-pipe sourcetype=pipeline_logs AND (message="bank* per-node-processing" AND (msg_type="BEGIN" OR msg_type="COMPLETE") ) | rex field=field7 "tag=\"(?<tag_value>.*)\"" | eval run_id=run_name.".".tag_value | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
| eval temp=date_mday."##".run_id
| chart values(stime) over temp by msg_type 
| eval TimeDiff=(COMPLETE-BEGIN)/3600
| eval date_mday=mvindex(split(temp,"##"),0)
| stats sum(TimeDiff) AS Sum by date_mday

View solution in original post

0 Karma

somesoni2
Revered Legend

I've seem expensive command like join, transactions etc, dropping events silently when running of huge amount of data. In your case you can definitely remove join with a stats implementation. Try something like this. (check the rex is correct or not).

index=main source=mysql-pipe sourcetype=pipeline_logs AND (message="bank* per-node-processing" AND (msg_type="BEGIN" OR msg_type="COMPLETE") ) | rex field=field7 "tag=\"(?<tag_value>.*)\"" | eval run_id=run_name.".".tag_value | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
| stats dc(msg_type) as msg_types by run_id date_mday | where msg_types=2 | stats count by date_mday

NOTE: There is a code sample button on type of the text box (saying 101 010). Please select your query/configurations and click on that button OR press Ctrl+K key to format it and retains field extractions, asterisks etc.

Updated

Yup

index=main source=mysql-pipe sourcetype=pipeline_logs AND (message="bank* per-node-processing" AND (msg_type="BEGIN" OR msg_type="COMPLETE") ) | rex field=field7 "tag=\"(?<tag_value>.*)\"" | eval run_id=run_name.".".tag_value | eval stime=strptime(time_at,"%Y-%m-%d %H:%M:%S") 
| eval temp=date_mday."##".run_id
| chart values(stime) over temp by msg_type 
| eval TimeDiff=(COMPLETE-BEGIN)/3600
| eval date_mday=mvindex(split(temp,"##"),0)
| stats sum(TimeDiff) AS Sum by date_mday
0 Karma

byu168168
Path Finder

Sorry, I was using a stats count command just for testing purposes. Instead of the stats count command it should be replaced with this

| eval TimeDiff = (etime-stime)/60/60 | stats sum(TimeDiff) AS Sum by date_mday

Can this be done without using a join command?

Also, what would be considered a large amount of data?

0 Karma

somesoni2
Revered Legend

Try the updated command. The join/transaction commands are expensive in terms of system resources, so a large amount is a number which can't be processed with available system resources. I don't believe you can benchmark something as large amount, it depends upon your server configurations, load at time of searching etc. So, I generally avoid something which may or may not work all the time.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...