<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Issue when joining results of large tstats queries over time. in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289331#M87541</link>
    <description>&lt;P&gt;Have you tried &lt;CODE&gt;tstats append=t&lt;/CODE&gt;?&lt;BR /&gt;
If that won't work for you (I am not sure about the limits there), then the only way to losslessly join the datasets is like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats values(main.is_valid) as is_valid from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| rename main.entity as entity
| eval is_valid=if(is_valid LIKE "%true%", 1, 0)
| eval DATASET="1"
| fields _time, entity, is_valid, DATASET
| appendpipe [
    | tstats count as second_events from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
    | rename main.entity as entity
    | eval is_true=if(second_events &amp;gt; 0, 1, 0)
    | eval DATASET="2"
    | fields _time, entity, is_true, DATASET
]
| fillnull is_true
| stats values(*) AS * dc(DATASET) AS numDatasets BY _time entity
| rename COMMENT AS "put your join logic here: use '|search DATASET=1' for left join, '|search DATASET=2' for right join, '|search numDatasets=1' for disunion, '|search numDatasets=2' for intersection"
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 12 May 2017 03:34:54 GMT</pubDate>
    <dc:creator>woodcock</dc:creator>
    <dc:date>2017-05-12T03:34:54Z</dc:date>
    <item>
      <title>Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289330#M87540</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;

&lt;P&gt;I am currently struggling with Splunk limitations when joining two queries over very large datasets (the two sets are too large to fit within the limits.conf subsearch limits, so I only get partial results).&lt;/P&gt;

&lt;P&gt;For each entity, I am trying to determine whether a first property is true in each hour, and, for those valid hours, to retrieve the value of a second property so I can finally sum it per entity. Each property lives in a dedicated data model, and I need to join the two on entity id plus time (to get the hourly result). Unfortunately, the query gives me only partial results, since the first data model contains more than 2 million events while the second contains approximately 35 million. The computation is done over a week.&lt;/P&gt;

&lt;P&gt;The results of these two queries are then summed to provide, per entity, the count of hours that contain the second property.&lt;/P&gt;

&lt;P&gt;Below is what I am trying to achieve:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats values(main.is_valid) as is_valid from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| rename main.entity as entity
| eval is_valid=if(is_valid LIKE "%true%", 1, 0)
| fields _time, entity, is_valid
| join type=left _time, entity [
    | tstats count as second_events from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
    | rename main.entity as entity
    | eval is_true=if(second_events &amp;gt; 0, 1, 0)
    | fields _time, entity, is_true
]
| fillnull is_true
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I might not be using the best possible approach or might be missing something. I would be extremely glad if you could provide me with suggestions to achieve what I am trying to do.&lt;/P&gt;

&lt;P&gt;Thank you very much!&lt;/P&gt;</description>
      <pubDate>Thu, 11 May 2017 23:28:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289330#M87540</guid>
      <dc:creator>moystard</dc:creator>
      <dc:date>2017-05-11T23:28:58Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289331#M87541</link>
      <description>&lt;P&gt;Have you tried &lt;CODE&gt;tstats append=t&lt;/CODE&gt;?&lt;BR /&gt;
If that won't work for you (I am not sure about the limits there), then the only way to losslessly join the datasets is like this:&lt;/P&gt;
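&lt;P&gt;(An untested sketch of the &lt;CODE&gt;append=t&lt;/CODE&gt; route first, reusing the same data models and field names as below; note that &lt;CODE&gt;prestats=t&lt;/CODE&gt; is required on both &lt;CODE&gt;tstats&lt;/CODE&gt; calls when using &lt;CODE&gt;append=t&lt;/CODE&gt;, and a final &lt;CODE&gt;stats&lt;/CODE&gt; must resolve the prestats output:)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats prestats=t values(main.is_valid) from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| tstats prestats=t append=t count from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
| stats values(main.is_valid) AS is_valid count AS second_events BY _time, main.entity
| rename main.entity AS entity
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And here is the lossless join:&lt;/P&gt;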

&lt;PRE&gt;&lt;CODE&gt;| tstats values(main.is_valid) as is_valid from datamodel=validity where nodename=main AND main.is_valid=true groupby _time span=1h, main.entity
| rename main.entity as entity
| eval is_valid=if(is_valid LIKE "%true%", 1, 0)
| eval DATASET="1"
| fields _time, entity, is_valid, DATASET
| appendpipe [
    | tstats count as second_events from datamodel=secondary where nodename=main groupby _time span=1h, main.entity
    | rename main.entity as entity
    | eval is_true=if(second_events &amp;gt; 0, 1, 0)
    | eval DATASET="2"
    | fields _time, entity, is_true, DATASET
]
| fillnull is_true
| stats values(*) AS * dc(DATASET) AS numDatasets BY _time entity
| rename COMMENT AS "put your join logic here: use '|search DATASET=1' for left join, '|search DATASET=2' for right join, '|search numDatasets=1' for disunion, '|search numDatasets=2' for intersection"
| fields _time, entity, is_valid, is_true
| stats sum(is_true) by entity
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 12 May 2017 03:34:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289331#M87541</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-05-12T03:34:54Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289332#M87542</link>
      <description>&lt;P&gt;Thank you very much for your answer.&lt;/P&gt;

&lt;P&gt;Unfortunately, your query does not work: when I search DATASET=2, it produces no results (it looks like only the first dataset is kept). I did try running that query separately, and it outputs results properly; it is only when appendpipe is used that it fails.&lt;/P&gt;

&lt;P&gt;I have also tried to use prestats as below, but it does not output any properties other than _time and mount_id:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| tstats prestats=t values(main.valid_event) as is_valid_hour from datamodel=hour_validity where nodename=main AND main.is_valid_hour=true groupby _time span=1h, main.entity 
| tstats prestats=t append=t count as true_events from datamodel=true_events where nodename=main groupby _time span=1h, main.entity 
| table _time, entity, is_valid_hour, true_events
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 29 Sep 2020 14:02:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289332#M87542</guid>
      <dc:creator>moystard</dc:creator>
      <dc:date>2020-09-29T14:02:49Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289333#M87543</link>
      <description>&lt;P&gt;Your situation makes no sense to me. Are you sure that you are describing everything accurately?&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jul 2017 21:17:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289333#M87543</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-07-07T21:17:08Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289334#M87544</link>
      <description>&lt;P&gt;You might want to remove the prestats=t from the second tstats.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 02:35:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289334#M87544</guid>
      <dc:creator>prats84</dc:creator>
      <dc:date>2017-07-18T02:35:17Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when joining results of large tstats queries over time.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289335#M87545</link>
      <description>&lt;P&gt;It is required when using &lt;CODE&gt;append=t&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Dec 2017 05:55:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Issue-when-joining-results-of-large-tstats-queries-over-time/m-p/289335#M87545</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-12-05T05:55:54Z</dc:date>
    </item>
  </channel>
</rss>

