OK, I'm trying to improve performance by replacing some join queries with stats, but struggling on a filter.
I have the below query, two source types where the common field between events is 'Correlator' . In source_one I have fields 'Correlator', 'sysplex' and 'servername'.
In source_detail I have 'Correlator', 'sysplex' and multiple other fields, the one for this data is Sample_NAME. 'servername' in source_one could have multiple values and I want to filter on a match so search servername=xyz*
I've tried a number of ways and I can't seem to manage to limit results to a filter on 'servername' without losing everything else, 'sysplex' which is in both sourcetypes filters just fine.
Any thoughts would be appreciated.
index=my_index sourcetype=source_one OR sourcetype=source_detail sysplex=ABC* | stats values(SAMPLE_NAME) AS SampleName values(SAMPLE_TIME) AS SampleTime by Correlator,SampleTime | eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N") | timechart span=1m count by SampleName
I don't understand your stats
|stats values(SAMPLE_NAME) AS SampleName values(SAMPLE_TIME) AS SampleTime by Correlator,SampleTime
You have values(SAMPLE_TIME) as SampleTime which you are trying to split by the same field (SampleTime) again. Its hard to understand your requirement, but your usage of SampleTime is the stats is a bit like doing something like
|stats values(host) by source,host
Is it possible to get an extract of your data and the desired output?
In your primary search you are doing
index=my_index sourcetype=source_one OR sourcetype=source_detail sysplex=ABC*
How have you tried to filter for servername=xyz*? If you use brackets you should be able to filter your results with a servername match, i.e.
index=my_index ((sourcetype=source_one servername=xyz*) OR sourcetype=source_detail) sysplex=ABC*
As for the rest of the search, there is no point in doing values(SampleTime) as you are splitting by time anyway
| stats values(SAMPLE_NAME) AS SampleName by Correlator, SampleTime
| eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N")
| timechart span=1m count by SampleName
However, your original post is not clear on a couple of points
Is SampleTime in both sources?
What are you actually wanting to count? Is it the count of source1+source2 rows for each Correlator?
When you filter by servername in source1, do you then want that to exclude all source2 events where there is no common Correlator? You are not carrying any source1 fields through the stats command, so you cannot exclude the source2 events, so you would have to carry servername through, e.g. with
| stats values(servername) as servername....
Then post the stats you can do
| where match(servername,"^xyz")
to filter out source2 results where there is no common Correlator for the servername events you want
Thanks,
SampleTime is only in source_detail.
The common field is Correlator and I only want events where the Correlator is for servername=xyz*.
I then want to count the number of events for SAMPLE_NAME per minute in source_detail where Correlator matches what we returned from source_one.
so what I want out is :
_time (derived from SampleTime) name1 name2 etc
2021-11-29 20:04:00 0 5
2021-11-29 20:05:00 2 1
etc.
One additional bit of information, the Correlator is a 1 to many relationship, so one entry in source_one and many entries in source_detail
Probably You have already read this https://conf.splunk.com/files/2020/slides/TRU1761C.pdf ?There are some other useful conf presentations too about this subject.
r. Ismo
Hi,
Yes thanks, read through that, or at least a copy of the same from a different conference I suspect. I think if I can work this one query out then I should be ok to convert some other joins which make a dashboard quite expensive.
Just struggling on building the right stats results to chart correctly.
Regards
Steve
Here is another answer about replacing joins https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi... check woodcock’s answer with examples.
oh that's a good one, will tuck that away for future reference, thanks.
I'm close, I just can't get the span to work in the timechart, will span correctly for periods where there are no events showing 0, but events appear individually rather than counted over the span period.
index=ti-p_ibmzos (sourcetype=source_one servername=ABC*) OR (sourcetype=source_detail) sysplex=XYZ
| stats dc(sourcetype) AS sourcetypeCount values(*) AS * BY Correlator
| where sourcetype="source_one"
| eval NewTime=strptime(sample_time,"%Y-%m-%d %H:%M:%S")
| eval _time=NewTime
| timechart span=5m count by sample_name
_time name_one name_two
2021-12-01 08:36:28 1 1
2021-12-01 08:36:29 1 1
2021-12-01 08:44:16 1 1
2021-12-01 08:44:18 1 1
2021-12-01 08:49:59 1 1
2021-12-01 08:50:01 1 1
2021-12-01 09:01:31 1 1
Hi @mcaulsc,
you have to put:
In few words, something like this:
index=my_index (sourcetype=source_one OR sourcetype=source_detail) sysplex=ABC*
| stats
values(SAMPLE_NAME) AS SampleName
values(SAMPLE_TIME) AS SampleTime
by Correlator sysplex
| eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N")
| timechart span=1m count by SampleName
Ciao.
Giuseppe
Thanks,
That doesn't filter by 'servername' at any point though? Also I need to use the time in the BY clause otherwise it doesn't group correctly.
regards
Steve
Hi @mcaulsc,
if you want to use the time in BY clause, remember to use the bin command to have groupable values.
If you need another field:
Ciao.
Giuseppe