Splunk Search

Issue when joining results of large tstats queries over time.

moystard
New Member

Hello everyone,

I am currently struggling with Splunk limitations when joining two queries over very large datasets (the two sets are too large to fit within the limits.conf subsearch limits, so I only get partial results).

I am trying to determine, for each entity and each hour, whether a first property is true; for those valid hours, I then need the value of a second property, summed per entity. Each property is stored in a dedicated data model, which I need to join on the entity id plus the time (to get hourly results). Unfortunately, the query gives me only partial results, since the first data model contains more than 2 million events and the second approximately 35 million. The computation covers one week.

The results of the two queries are then combined to count, per entity, the valid hours in which the second property occurs.

Below is what I am trying to achieve:

| tstats values(main.is_valid) as is_valid from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| rename main.entity as entity
| eval is_valid=if(is_valid LIKE "%true%", 1, 0)
| fields _time, entity, is_valid
| join type=left _time, entity [
    | tstats count as second_events from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
    | rename main.entity as entity
    | eval is_true=if(second_events > 0, 1, 0)
    | fields _time, entity, is_true
]
| fillnull is_true
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity

I might not be using the best possible approach or might be missing something. I would be extremely glad if you could provide me with suggestions to achieve what I am trying to do.

Thank you very much!

0 Karma

woodcock
Esteemed Legend

Have you tried tstats append=t?
If that won't work for you (I am not sure about the limits there), then there is only one way to losslessly join datasets; do it like this:

| tstats values(main.is_valid) as is_valid from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| rename main.entity as entity
| eval is_valid=if(is_valid LIKE "%true%", 1, 0)
| eval DATASET="1"
| fields _time, entity, is_valid, DATASET
| appendpipe [
    | tstats count as second_events from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
    | rename main.entity as entity
    | eval is_true=if(second_events > 0, 1, 0)
    | eval DATASET="2"
    | fields _time, entity, is_true, DATASET
]
| fillnull is_true
| stats values(*) AS * dc(DATASET) AS numDatasets BY _time entity
| rename COMMENT AS "put your join logic here: use '|search DATASET=1' for left join, '|search DATASET=2' for right join, '|search numDatasets=1' for disunion, '|search numDatasets=2' for intersection"
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity
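
To match the left-join semantics of the original query, the COMMENT placeholder above would become a DATASET filter — a minimal sketch following the template's own convention (field names taken from the question, not verified against the actual data models):

| fillnull is_true
| stats values(*) AS * dc(DATASET) AS numDatasets BY _time entity
| search DATASET="1"
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity

Because stats collects values(*), DATASET is multivalued on rows present in both datasets, so DATASET="1" keeps both "dataset 1 only" rows and "both" rows — i.e. a left join on _time and entity.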

prats84
Explorer

You might want to remove the prestats=t from the second tstats.


woodcock
Esteemed Legend

It is required when using append=t.


moystard
New Member

Thank you very much for your answer.

Unfortunately, your query does not work: when I search DATASET=2, it produces no results (it looks like only the first dataset is kept). I tried running each query separately and both output results properly; it is only when appendpipe is used that it fails.

I have also tried using prestats as below, but it does not output any properties other than _time and mount_id:

| tstats prestats=t values(main.valid_event) as is_valid_hour from datamodel=hour_validity where nodename=main AND main.is_valid_hour=true groupby _time span=1h, main.entity 
| tstats prestats=t append=t count as true_events from datamodel=true_events where nodename=main groupby _time span=1h, main.entity 
| table _time, entity, is_valid_hour, true_events
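
One likely reason the fields do not appear: with prestats=t, tstats emits results in an internal pre-aggregated form intended to feed a later stats command, and the "as" aliases are not applied at that stage, so table cannot see the fields. A hedged sketch of the same two searches closed with a materializing stats (data-model and field names copied from the post above; the final stats must repeat the same aggregation functions):

| tstats prestats=t values(main.valid_event) from datamodel=hour_validity where nodename=main AND main.is_valid_hour=true groupby _time span=1h, main.entity
| tstats prestats=t append=t count from datamodel=true_events where nodename=main groupby _time span=1h, main.entity
| stats values(main.valid_event) as is_valid_hour, count as true_events by _time, main.entity
| rename main.entity as entity
| table _time, entity, is_valid_hour, true_events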

woodcock
Esteemed Legend

Your situation makes no sense to me. Are you sure that you are describing everything accurately?
