join replacement with stats

mcaulsc · ‎11-29-2021

OK, I'm trying to improve performance by replacing some join queries with stats, but struggling on a filter.

I have the below query, two source types where the common field between events is 'Correlator' . In source_one I have fields 'Correlator', 'sysplex' and 'servername'.

In source_detail I have 'Correlator', 'sysplex' and multiple other fields, the one for this data is Sample_NAME. 'servername' in source_one could have multiple values and I want to filter on a match so search servername=xyz*

I've tried a number of ways and I can't seem to manage to limit results to a filter on 'servername' without losing everything else, 'sysplex' which is in both sourcetypes filters just fine.

Any thoughts would be appreciated.

index=my_index sourcetype=source_one OR sourcetype=source_detail sysplex=ABC* | stats values(SAMPLE_NAME) AS SampleName values(SAMPLE_TIME) AS SampleTime by Correlator,SampleTime | eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N") | timechart span=1m count by SampleName

Sukisen1981 · ‎12-01-2021

I don't understand your stats

 |stats values(SAMPLE_NAME) AS SampleName values(SAMPLE_TIME) AS SampleTime by Correlator,SampleTime

You have values(SAMPLE_TIME) as SampleTime which you are trying to split by the same field (SampleTime) again. Its hard to understand your requirement, but your usage of SampleTime is the stats is a bit like doing something like

|stats values(host) by source,host

Is it possible to get an extract of your data and the desired output?

bowesmana · ‎11-29-2021

In your primary search you are doing

index=my_index sourcetype=source_one OR sourcetype=source_detail sysplex=ABC*

How have you tried to filter for servername=xyz*? If you use brackets you should be able to filter your results with a servername match, i.e.

index=my_index ((sourcetype=source_one servername=xyz*) OR sourcetype=source_detail) sysplex=ABC*

As for the rest of the search, there is no point in doing values(SampleTime) as you are splitting by time anyway

| stats values(SAMPLE_NAME) AS SampleName by Correlator, SampleTime 
| eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N") 
| timechart span=1m count by SampleName

However, your original post is not clear on a couple of points

Is SampleTime in both sources?

What are you actually wanting to count? Is it the count of source1+source2 rows for each Correlator?

When you filter by servername in source1, do you then want that to exclude all source2 events where there is no common Correlator? You are not carrying any source1 fields through the stats command, so you cannot exclude the source2 events, so you would have to carry servername through, e.g. with

| stats values(servername) as servername....

Then post the stats you can do

| where match(servername,"^xyz")

to filter out source2 results where there is no common Correlator for the servername events you want

mcaulsc · ‎11-30-2021

Thanks,

SampleTime is only in source_detail.

The common field is Correlator and I only want events where the Correlator is for servername=xyz*.

I then want to count the number of events for SAMPLE_NAME per minute in source_detail where Correlator matches what we returned from source_one.

so what I want out is :

_time (derived from SampleTime) name1 name2 etc

2021-11-29 20:04:00 0 5

2021-11-29 20:05:00 2 1

etc.

mcaulsc · ‎12-01-2021

One additional bit of information, the Correlator is a 1 to many relationship, so one entry in source_one and many entries in source_detail

isoutamo · ‎11-29-2021

Probably You have already read this https://conf.splunk.com/files/2020/slides/TRU1761C.pdf ?There are some other useful conf presentations too about this subject.

r. Ismo

mcaulsc · ‎11-30-2021

Hi,

Yes thanks, read through that, or at least a copy of the same from a different conference I suspect. I think if I can work this one query out then I should be ok to convert some other joins which make a dashboard quite expensive.

Just struggling on building the right stats results to chart correctly.

Regards

Steve

isoutamo · ‎11-30-2021

Here is another answer about replacing joins https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi... check woodcock’s answer with examples.

mcaulsc · ‎12-01-2021

oh that's a good one, will tuck that away for future reference, thanks.

I'm close, I just can't get the span to work in the timechart, will span correctly for periods where there are no events showing 0, but events appear individually rather than counted over the span period.

index=ti-p_ibmzos (sourcetype=source_one servername=ABC*) OR (sourcetype=source_detail) sysplex=XYZ
| stats dc(sourcetype) AS sourcetypeCount values(*) AS * BY Correlator
| where sourcetype="source_one"
| eval NewTime=strptime(sample_time,"%Y-%m-%d %H:%M:%S")
| eval _time=NewTime
| timechart span=5m count by sample_name

_time name_one name_two
2021-12-01 08:36:28 1 1
2021-12-01 08:36:29 1 1
2021-12-01 08:44:16 1 1
2021-12-01 08:44:18 1 1
2021-12-01 08:49:59 1 1
2021-12-01 08:50:01 1 1
2021-12-01 09:01:31 1 1

gcusello · ‎11-29-2021

Hi @mcaulsc,

you have to put:

in the main search both the searches,
in the BY clause the common keys in both searches,
before the BY clause the fields you need using a function as value or max or etc...

In few words, something like this:

index=my_index (sourcetype=source_one OR sourcetype=source_detail) sysplex=ABC* 
| stats 
     values(SAMPLE_NAME) AS SampleName 
     values(SAMPLE_TIME) AS SampleTime 
     by Correlator sysplex
| eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N") 
| timechart span=1m count by SampleName

Ciao.

Giuseppe

mcaulsc · ‎11-29-2021

Thanks,

That doesn't filter by 'servername' at any point though? Also I need to use the time in the BY clause otherwise it doesn't group correctly.

regards

Steve

gcusello · ‎11-29-2021

Hi @mcaulsc,

if you want to use the time in BY clause, remember to use the bin command to have groupable values.

If you need another field:

if it's in both the sources, you can use in the BY clause,
otherwise in a values option.

Ciao.

Giuseppe

join replacement with stats

join

stats

Introducing the Splunk Community Dashboard Challenge!

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...