Hi, I am getting logs from two servers; the logs are exactly the same unless there is a failure on one of them. We have to group the events by an Id and treat each group as a single event for reporting, so I used the 'transaction' command. When I run the query standalone it gives the correct count as expected, but when it is written to a summary index (SI) the results are wrong. The SI is populated every hour.
index=test | fields content
| rex field=content "\n*Id:(?P<Id>\d[^~]+)"
| rex field=content "\n*Path\:(?<path>[^~|?]+)"
| transaction Id keepevicted=true
| fillnull value=NA path
| replace "" with "NA" in path
| bucket _time span=1h
| stats count by _time,path
content from hostA
time1 Id:A Path:AB1
time1 Id:A Path:AB2
time2 Id:B Path:AC1
time2 Id:C Path:AC1
content from hostB
time1 Id:A Path:AB1
time1 Id:A Path:AB2
time2 Id:B Path:AC1
time2 Id:C Path:AC1
Output when running standalone (this is what I expect to be written to the summary):
time1 AB1 1
time1 AB2 1
time2 AC1 2
Output when written to the summary index (it is counting events from both servers):
time1 AB1 2
time1 AB2 2
time2 AC1 4
Give this a try:
index=test | fields content
| rex field=content "\n*Id:(?P<Id>\d[^~]+)"
| rex field=content "\n*Path\:(?<path>[^~|?]+)"
| fillnull value=NA path
| replace "" with "NA" in path
| dedup Id path
| bucket _time span=1h
| stats count by _time,path
Update
Try this:
your base search returning results from both hosts with all 5 fields
| table _time Id Path otherfield1 otherfield2 otherfield3...
| fillnull value=NA path
| replace "" with "NA" in path
| stats values(*) as * by _time Id
| bucket _time span=1h
| stats count by _time,path
Thanks for the answer. There are multiple other conditions in this data which I have not explained, so I can't use dedup: even if the Id+path combination is unique, there are other fields which can differ. Based on certain conditions, we extract the required fields from these events after using the transaction command, so removing duplicates based on these two fields might remove data that is required.
I've seen Splunk behave differently when using the transaction command (it's a resource-intensive command, and since scheduled searches have lower priority than ad-hoc ones, it has to work with fewer available resources). Consider replacing it with stats or something similar. If you can add your full search to the question, the answers community can help you with a solution.
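For example, a stats-based version of the search in the question might look like the sketch below. It assumes that collapsing the rows per Id with values() is acceptable for your data and that each Id falls within a single hourly bucket (values() also de-duplicates the identical events coming from the two hosts, and stats splits a multivalue path so each path is counted once per Id):

```
index=test | fields content
| rex field=content "\n*Id:(?P<Id>\d[^~]+)"
| rex field=content "\n*Path\:(?<path>[^~|?]+)"
| fillnull value=NA path
| stats values(path) as path earliest(_time) as _time by Id
| bucket _time span=1h
| stats count by _time,path
```

Unlike transaction, stats is not affected by memory-based eviction, so it behaves the same in scheduled and ad-hoc searches.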
This is my requirement 🙂 Hope this helps.
hostA:
Field1 | Field2 | Field3 | Field4 | Field5
time1  | Id1    | path1  | dog1   |
time1  | Id1    | path1  |        | cat1
time1  | Id2    | path1  | dog1   |
time1  | Id2    | path1  |        | cat1
time2  | Id3    | path2  | dog2   |
time2  | Id3    | path2  |        | cat2
time2  | Id4    | path2  | dog2   |
hostB:
Field1 | Field2 | Field3 | Field4 | Field5
time1  | Id1    | path1  | dog1   |
time1  | Id1    | path1  |        | cat1
time1  | Id2    | path1  | dog1   |
time1  | Id2    | path1  |        | cat1
time2  | Id3    | path2  | dog2   |
time2  | Id3    | path2  |        | cat2
time2  | Id5    | path2  | dog2   |
I want the output to look like this, and I want it stored in a summary index.
Field1 | Field3 | Field4 | Field5 | Count
time1  | path1  | dog1   | cat1   | 1
time1  | path1  | dog1   | cat1   | 1
time2  | path2  | dog2   | cat2   | 1
time2  | path2  | dog2   | NA     | 2
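One way to get close to this is to collapse the rows per Id first and then count. This is only a sketch using the field names from the tables above; it assumes Field1 is the time bucket you want to report on and that values() is an acceptable way to merge the partially filled rows:

```
your base search returning events from both hosts
| stats values(Field3) as Field3 values(Field4) as Field4 values(Field5) as Field5 by Field1 Field2
| fillnull value=NA Field4 Field5
| stats count by Field1 Field3 Field4 Field5
```

Note that the final stats counts Ids per unique combination of fields, which matches the last row of your expected output (Id4 and Id5 share time2/path2/dog2/NA, giving Count 2). If you instead need one row per Id, add Field2 to the final by clause.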