Hi, I am getting logs from two servers; the logs are exactly the same unless there is a failure on one of them. We have to group the events by an Id and treat each group as a single event for reporting, so I used the 'transaction' command. When I run the query standalone it gives the correct count as expected, but when it is written to a summary index (SI) the results are wrong. The SI is populated every hour.
index=test | fields content
| rex field=content "\n*Id:(?P<Id>\d[^~]+)"
| rex field=content "\n*Path\:(?<path>[^~|?]+)"
| transaction Id keepevicted=true
| fillnull value=NA path
| replace "" with "NA" in path
| bucket _time span=1h
| stats count by _time,path
content from hostA
time1 Id:A Path:AB1
time1 Id:A Path:AB2
time2 Id:B Path:AC1
time2 Id:C Path:AC1
content from hostB
time1 Id:A Path:AB1
time1 Id:A Path:AB2
time2 Id:B Path:AC1
time2 Id:C Path:AC1
Output when running standalone (this is what I expect to be written to the summary):
time1 AB1 1
time1 AB2 1
time2 AC1 2
Output when written to the summary index (it is counting events from both servers):
time1 AB1 2
time1 AB2 2
time2 AC1 4
Give this a try:
index=test | fields content
| rex field=content "\n*Id:(?P<Id>\d[^~]+)"
| rex field=content "\n*Path\:(?<path>[^~|?]+)"
| fillnull value=NA path
| replace "" with "NA" in path
| dedup Id path
| bucket _time span=1h
| stats count by _time,path
Update
Try this:
your base search returning results from both hosts with all 5 fields
| table _time Id Path otherfield1 otherfield2 otherfield3...
| fillnull value=NA path
| replace "" with "NA" in path
| stats values(*) as * by _time Id
| bucket _time span=1h
| stats count by _time,path
Thanks for the answer. There are multiple other conditions in this data which I have not explained, so I can't use dedup: even if the Id+path combination is unique, there are other fields which can differ. Based on certain conditions, we extract the required fields from these events after using the transaction command, so removing duplicates based on these two fields might remove data that is required.
I've seen Splunk behave differently when using the transaction command (it's a resource-intensive command, and since scheduled searches have lower priority than ad-hoc ones, it has to work with fewer available resources). Consider replacing it with stats or something similar. If you can add your full search to the question, the answers community can help you with a solution.
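For example, a stats-based version of the search in the question might look like the sketch below. It assumes that collapsing the rows per Id with values() is acceptable for your data and that each Id falls within a single hourly bucket (values() also de-duplicates the identical events coming from the two hosts, and stats splits a multivalue path so each path is counted once per Id):

```
index=test | fields content
| rex field=content "\n*Id:(?P<Id>\d[^~]+)"
| rex field=content "\n*Path\:(?<path>[^~|?]+)"
| fillnull value=NA path
| stats values(path) as path earliest(_time) as _time by Id
| bucket _time span=1h
| stats count by _time,path
```

Unlike transaction, stats is not affected by memory-based eviction, so it behaves the same in scheduled and ad-hoc searches.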
This is my requirement 🙂 Hope this helps.
hostA:
Field1 | Field2 | Field3 | Field4 | Field5
time1  | Id1    | path1  | dog1   |
time1  | Id1    | path1  |        | cat1
time1  | Id2    | path1  | dog1   |
time1  | Id2    | path1  |        | cat1
time2  | Id3    | path2  | dog2   |
time2  | Id3    | path2  |        | cat2
time2  | Id4    | path2  | dog2   |
hostB:
Field1 | Field2 | Field3 | Field4 | Field5
time1  | Id1    | path1  | dog1   |
time1  | Id1    | path1  |        | cat1
time1  | Id2    | path1  | dog1   |
time1  | Id2    | path1  |        | cat1
time2  | Id3    | path2  | dog2   |
time2  | Id3    | path2  |        | cat2
time2  | Id5    | path2  | dog2   |
I want the output to look like this, and I want it stored in a summary index.
Field1 | Field3 | Field4 | Field5 | Count
time1  | path1  | dog1   | cat1   | 1
time1  | path1  | dog1   | cat1   | 1
time2  | path2  | dog2   | cat2   | 1
time2  | path2  | dog2   | NA     | 2
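One way to get close to this is to collapse the rows per Id first and then count. This is only a sketch using the field names from the tables above; it assumes Field1 is the time bucket you want to report on and that values() is an acceptable way to merge the partially filled rows:

```
your base search returning events from both hosts
| stats values(Field3) as Field3 values(Field4) as Field4 values(Field5) as Field5 by Field1 Field2
| fillnull value=NA Field4 Field5
| stats count by Field1 Field3 Field4 Field5
```

Note that the final stats counts Ids per unique combination of fields, which matches the last row of your expected output (Id4 and Id5 share time2/path2/dog2/NA, giving Count 2). If you instead need one row per Id, add Field2 to the final by clause.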