I am new to splunk queries and was trying to combine results from multiple queries without using subsearches due to its limitation of restricting subsearches to 50000 results but our dataset has more than 50000 records to be considered. Below is the query I was trying
(index="B" logType="REQUEST") OR( index="B" logType="TRACES" message="searchString1*") OR (index="B" logType="TRACES" message="searchString2*")
| stats latest(*) as * by Id
All above queries have the id field in the result which match and correspond to some kind of a correlation id between these logs. I would like to have the end result show all the common fields which has same values, but also with message field having the consolidated message content from the individual queries made on the same index B. The message field alone can have different values between the queries and need to be consolidated on the result. Can someone help on how this can be done ?
It sounds like you need the values function.
(index="B" logType="REQUEST") OR( index="B" logType="TRACES" message="searchString1*") OR (index="B" logType="TRACES" message="searchString2*")
| stats values(message) as messages, latest(*) as * by Id
It sounds like you need the values function.
(index="B" logType="REQUEST") OR( index="B" logType="TRACES" message="searchString1*") OR (index="B" logType="TRACES" message="searchString2*")
| stats values(message) as messages, latest(*) as * by Id
thanks @richgalloway for the response! This indeed helps. Can I extend the question also to understand how I can enforce that the individual searches between the OR conditions return result for sure and only then combine the results (similar to inner join) using Id field?
(index="B" logType="REQUEST") OR( index="B" logType="TRACES" message="searchString1*") OR (index="B" logType="TRACES" message="searchString2*")
| stats values(message) as messages, latest(*) as * by Id
| where like(messages, "searchString1%") and like(messages, "searchString2%")
I'm not sure what you're looking for, but it sounds like you want the stats command to return the _raw field, perhaps like this.
(index="B" logType="REQUEST") OR( index="B" logType="TRACES" message="searchString1*") OR (index="B" logType="TRACES" message="searchString2*")
| stats values(_raw) as raw by Id
If I understand your question here, I believe adding something like this to your stats aggregation can give you additional fields you can use to filter on and only include the Ids that have events occurring from each 3 of the scenarios you have separated with ORs in the original search.
index="B" AND (logType="REQUEST" OR (logType="TRACES" AND message IN ("searchString1*", "searchString2*")))
```
In the below stats aggregation the max(eval(if())) functions are checking if a specific event matches a condition inside your if statement.
If there is at least a single event that matches the criteria for a specific 'Id' then this value will be 1.
If the condition is not met for an 'Id' then it will be a 0.
```
| stats
max(eval(if('logType'=="REQUEST", 1, 0))) as has_request_log,
max(eval(if('logType'=="TRACES" AND like(message, "searchString1%"), 1, 0))) as has_trace_type_1,
max(eval(if('logType'=="TRACES" AND like(message, "searchString2%"), 1, 0))) as has_trace_type_2,
values(message) as messages,
latest(*) as *
by Id
``` Only include the Ids where there were events from all 3 of these search criteria ```
| where 'has_request_log'==1 AND 'has_trace_type_1'==1 AND 'has_trace_type_2'==1
Alternatively, you can just classify the log types before the stats aggregation and do your filtration based off of that field.
index="B" AND (logType="REQUEST" OR (logType="TRACES" AND message IN ("searchString1*", "searchString2*")))
```
Eval to classify the logs that are returned from your search to a field named 'event_category'
```
| eval
event_category=case(
'logType'=="REQUEST", "Request",
'logType'=="TRACES" AND LIKE(message, "searchString1%"), "Traces_1",
'logType'=="TRACES" AND LIKE(message, "searchString2%"), "Traces_2"
)
```
Group all unique values of 'event_category' seen for each Id
```
| stats
values(event_category) as event_category
values(message) as messages,
latest(*) as *
by Id
```
Only include the Ids where there were events from all 3 of these search criteria.
mvcount() function checks how many values are contained withing the field, since we used a
values(event_category) as event_category
we only want the Ids that have all 3 unique classifications
```
| where mvcount(event_category)>=3