Splunk Search

join replacement with stats

mcaulsc
Path Finder

OK, I'm trying to improve performance by replacing some join queries with stats, but struggling on a filter.

I have the below query, two source types where the common field between events is 'Correlator' . In source_one I have fields 'Correlator', 'sysplex' and 'servername'.

In source_detail I have 'Correlator', 'sysplex' and multiple other fields, the one for this data is Sample_NAME. 'servername' in source_one could have multiple values and I want to filter on a match so search servername=xyz*

I've tried a number of ways and I can't seem to manage to limit results to a filter on 'servername' without losing everything else, 'sysplex' which is in both sourcetypes filters just fine.

Any thoughts would be appreciated.

index=my_index sourcetype=source_one OR sourcetype=source_detail sysplex=ABC* | stats values(SAMPLE_NAME) AS SampleName values(SAMPLE_TIME) AS SampleTime by Correlator,SampleTime | eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N") | timechart span=1m count by SampleName

Labels (2)
0 Karma

Sukisen1981
Champion

I don't understand your stats

 |stats values(SAMPLE_NAME) AS SampleName values(SAMPLE_TIME) AS SampleTime by Correlator,SampleTime

You have values(SAMPLE_TIME) as SampleTime which you are trying to split by the same field (SampleTime) again. Its hard to understand your requirement, but your usage of SampleTime is the stats is a bit like doing something like 

|stats values(host) by source,host

Is it possible to get an extract of your data and the desired output?

0 Karma

bowesmana
SplunkTrust
SplunkTrust

In your primary search you are doing

index=my_index sourcetype=source_one OR sourcetype=source_detail sysplex=ABC*

How have you tried to filter for servername=xyz*?  If you use brackets you should be able to filter your results with a servername match, i.e.

index=my_index ((sourcetype=source_one servername=xyz*) OR sourcetype=source_detail) sysplex=ABC*

As for the rest of the search, there is no point in doing values(SampleTime) as you are splitting by time anyway

| stats values(SAMPLE_NAME) AS SampleName by Correlator, SampleTime 
| eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N") 
| timechart span=1m count by SampleName

However, your original post is not clear on a couple of points

Is SampleTime in both sources?

What are you actually wanting to count? Is it the count of source1+source2 rows for each Correlator?

When you filter by servername in source1, do you then want that to exclude all source2 events where there is no common Correlator? You are not carrying any source1 fields through the stats command, so you cannot exclude the source2 events, so you would have to carry servername through, e.g. with

| stats values(servername) as servername....

Then post the stats you can do

| where match(servername,"^xyz")

to filter out source2 results where there is no common Correlator for the servername events you want

 

 

0 Karma

mcaulsc
Path Finder

Thanks,

SampleTime is only in source_detail.

The common field is Correlator and I only want events where the Correlator is for servername=xyz*.

I then want to count the number of events for SAMPLE_NAME per minute in source_detail where Correlator matches what we returned from source_one.

 so what I want out   is :

_time (derived from SampleTime)     name1            name2   etc

2021-11-29 20:04:00                                  0                        5

2021-11-29 20:05:00                                  2                       1

etc.

 

0 Karma

mcaulsc
Path Finder

One additional bit of information, the Correlator is a 1 to many relationship, so one entry in source_one and many entries in source_detail

0 Karma

isoutamo
SplunkTrust
SplunkTrust

Probably You have already read this https://conf.splunk.com/files/2020/slides/TRU1761C.pdf ?There are some other useful conf presentations too about this subject.

r. Ismo

0 Karma

mcaulsc
Path Finder

Hi,

Yes thanks, read through that, or at least a copy of the same from a different conference I suspect. I think if I can work this one query out then I should be ok to convert some other joins which make a dashboard quite expensive.

Just struggling on building the right stats results to chart correctly.

Regards

Steve

0 Karma

isoutamo
SplunkTrust
SplunkTrust

Here is another answer about replacing joins https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi... check woodcock’s answer with examples.

0 Karma

mcaulsc
Path Finder

oh that's a good one, will tuck that away for future reference, thanks.

I'm close, I just can't get the span to work in the timechart, will span correctly for periods where there are no events showing 0, but events appear individually rather than counted over the span period.

index=ti-p_ibmzos (sourcetype=source_one servername=ABC*) OR (sourcetype=source_detail) sysplex=XYZ
| stats dc(sourcetype) AS sourcetypeCount values(*) AS * BY Correlator
| where sourcetype="source_one"
| eval NewTime=strptime(sample_time,"%Y-%m-%d %H:%M:%S")
| eval _time=NewTime
| timechart span=5m count by sample_name

 

_time                                   name_one      name_two
2021-12-01 08:36:28        1                         1
2021-12-01 08:36:29        1                         1
2021-12-01 08:44:16        1                         1
2021-12-01 08:44:18        1                         1
2021-12-01 08:49:59        1                         1
2021-12-01 08:50:01        1                         1
2021-12-01 09:01:31        1                         1

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @mcaulsc,

you have to put:

  • in the main search both the searches,
  • in the BY clause the common keys in both searches,
  • before the BY clause the fields you need using a function as value or max or etc...

In few words, something like this:

index=my_index (sourcetype=source_one OR sourcetype=source_detail) sysplex=ABC* 
| stats 
     values(SAMPLE_NAME) AS SampleName 
     values(SAMPLE_TIME) AS SampleTime 
     by Correlator sysplex
| eval _time=strptime(SampleTime,"%Y-%m-%d %H:%M:%S.%N") 
| timechart span=1m count by SampleName

Ciao.

Giuseppe

0 Karma

mcaulsc
Path Finder

Thanks,

That doesn't filter by 'servername' at any point though? Also I need to use the time in the BY clause otherwise it doesn't group correctly.

regards

Steve

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @mcaulsc,

if you want to use the time in BY clause, remember to use the bin command to have groupable values.

If you need another field:

  • if it's in both the sources, you can use in the BY clause,
  • otherwise in a values option.

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...