Splunk Search

Combining Fields into a Table

TylerJVitale
Explorer

I want to combine the data from a prediction algorithm on crashed applications with additional data about the crashed applications, namely the count of affected users. Right now, index=index sourcetype="sourcetype" APPLICATION="app*"| timechart span=1d sum(VOLUME) | predict "sum(VOLUME)" as prediction algorithm="LLP5" future_timespan="30" holdback="0" period=7 lower"95"=lower"95" upper"95"=upper"95" | eval isOutlier = if(prediction!="" AND 'sum(VOLUME)' !="" AND ('sum(VOLUME)' < 'lower95(prediction)' OR 'sum(VOLUME)' > 'upper95(prediction)'), 1, 0) | where isOutlier=1 | fields - isOutlier|rename sum(VOLUME) as "Actual Crashes", prediction as "Predicted Crashes" gives me what I want for the prediction data,

and index=index sourcetype="sourcetype" APPLICATION="app*"| timechart span=1d dc(USERNAME) as "Users"gives me the count of users,

but index=index sourcetype="sourcetype" APPLICATION="app*"| timechart span=1d sum(VOLUME) | predict "sum(VOLUME)" as prediction algorithm="LLP5" future_timespan="30" holdback="0" period=7 lower"95"=lower"95" upper"95"=upper"95" | eval isOutlier = if(prediction!="" AND 'sum(VOLUME)' !="" AND ('sum(VOLUME)' < 'lower95(prediction)' OR 'sum(VOLUME)' > 'upper95(prediction)'), 1, 0) | where isOutlier=1 | fields - isOutlier|rename sum(VOLUME) as "Actual Crashes", prediction as "Predicted Crashes"|join[search index=index sourcetype="index" APPLICATION="app*"| timechart span=1d dc(USERNAME) as "Users"] doesn't give me the correct count of users in the table.

Any ideas on how to combine these searches properly?

Tags (2)
0 Karma
1 Solution

rbechtold
Communicator

Hey Tyler!

I've recreated what I believe your dataset may look like (copy and paste into any Splunk instance):

| gentimes start=6/18/2019 end=7/18/2019 increment=10m 
| eval modifier = random()%5 
| eval VOLUME = random()%400 + 100 
| eval VOLUME = if(modifier = 1 OR modifier = 2, VOLUME * -1, VOLUME) 
| eval _time = starttime 
| timechart span=1d sum(VOLUME) AS sumv 
| predict sumv as prediction algorithm="LLP5" future_timespan=30 holdback=0 period=7 lower95=lower95 upper95=upper95 
| streamstats count as test 
| eval sumv = if(test = 20 OR test = 13, sumv + 20000, if(test = 4 OR test = 25, sumv - 24000 , sumv)) 
| fields - test 
| eval isOutlier = if(prediction!="" AND 'sumv' !="" AND ('sumv' < 'lower95(prediction)' OR 'sumv' > 'upper95(prediction)'), 1, 0) 
| where isOutlier=1 
| fields - isOutlier 
| rename sumv as "Actual Crashes", prediction as "Predicted Crashes"
| join [| gentimes start=6/18/2019 end=7/18/2019 increment=1d
| eval USERNAME = random()%200
| eval _time = starttime
| timechart span=1d values(USERNAME) AS "Users"]

If this is correct, I think I understand your problem.

Without specifying a field to join on or a join type, the automatically defaults to an inner join and joins on any fields present in both datasets. For some reason, I don't believe the join command is using the "_time" field as default field to look at when joining the two searches. My best guess is that maybe Splunk doesn't automatically look at internal fields when using the join command.

In order to get around this, we can explicitly state that we want to join on the _time field like this:

index=index sourcetype="sourcetype" APPLICATION="app*" 
| timechart span=1d sum(VOLUME) 
| predict "sum(VOLUME)" as prediction algorithm="LLP5" future_timespan="30" holdback="0" period=7 lower"95"=lower"95" upper"95"=upper"95" 
| eval isOutlier = if(prediction!="" AND 'sum(VOLUME)' !="" AND ('sum(VOLUME)' < 'lower95(prediction)' OR 'sum(VOLUME)' > 'upper95(prediction)'), 1, 0) 
| where isOutlier=1 
| fields - isOutlier 
| rename sum(VOLUME) as "Actual Crashes", prediction as "Predicted Crashes" 
| join _time
    [ search index=index sourcetype="index" APPLICATION="app*" 
    | timechart span=1d dc(USERNAME) as "Users"]

Although I would strongly recommend adding a join type= argument to your joins in the future as they can greatly effect your results.

If you would like to know more about join types, check out the documentation on the join command here under the type subheading:
https://docs.splunk.com/Documentation/Splunk/7.3.0/SearchReference/Join

Let me know if you're still having any problems!

View solution in original post

0 Karma

rbechtold
Communicator

Hey Tyler!

I've recreated what I believe your dataset may look like (copy and paste into any Splunk instance):

| gentimes start=6/18/2019 end=7/18/2019 increment=10m 
| eval modifier = random()%5 
| eval VOLUME = random()%400 + 100 
| eval VOLUME = if(modifier = 1 OR modifier = 2, VOLUME * -1, VOLUME) 
| eval _time = starttime 
| timechart span=1d sum(VOLUME) AS sumv 
| predict sumv as prediction algorithm="LLP5" future_timespan=30 holdback=0 period=7 lower95=lower95 upper95=upper95 
| streamstats count as test 
| eval sumv = if(test = 20 OR test = 13, sumv + 20000, if(test = 4 OR test = 25, sumv - 24000 , sumv)) 
| fields - test 
| eval isOutlier = if(prediction!="" AND 'sumv' !="" AND ('sumv' < 'lower95(prediction)' OR 'sumv' > 'upper95(prediction)'), 1, 0) 
| where isOutlier=1 
| fields - isOutlier 
| rename sumv as "Actual Crashes", prediction as "Predicted Crashes"
| join [| gentimes start=6/18/2019 end=7/18/2019 increment=1d
| eval USERNAME = random()%200
| eval _time = starttime
| timechart span=1d values(USERNAME) AS "Users"]

If this is correct, I think I understand your problem.

Without specifying a field to join on or a join type, the automatically defaults to an inner join and joins on any fields present in both datasets. For some reason, I don't believe the join command is using the "_time" field as default field to look at when joining the two searches. My best guess is that maybe Splunk doesn't automatically look at internal fields when using the join command.

In order to get around this, we can explicitly state that we want to join on the _time field like this:

index=index sourcetype="sourcetype" APPLICATION="app*" 
| timechart span=1d sum(VOLUME) 
| predict "sum(VOLUME)" as prediction algorithm="LLP5" future_timespan="30" holdback="0" period=7 lower"95"=lower"95" upper"95"=upper"95" 
| eval isOutlier = if(prediction!="" AND 'sum(VOLUME)' !="" AND ('sum(VOLUME)' < 'lower95(prediction)' OR 'sum(VOLUME)' > 'upper95(prediction)'), 1, 0) 
| where isOutlier=1 
| fields - isOutlier 
| rename sum(VOLUME) as "Actual Crashes", prediction as "Predicted Crashes" 
| join _time
    [ search index=index sourcetype="index" APPLICATION="app*" 
    | timechart span=1d dc(USERNAME) as "Users"]

Although I would strongly recommend adding a join type= argument to your joins in the future as they can greatly effect your results.

If you would like to know more about join types, check out the documentation on the join command here under the type subheading:
https://docs.splunk.com/Documentation/Splunk/7.3.0/SearchReference/Join

Let me know if you're still having any problems!

0 Karma
Get Updates on the Splunk Community!

Maximize the Value from Microsoft Defender with Splunk

<P style=" text-align: center; "><span class="lia-inline-image-display-wrapper lia-image-align-center" ...

This Week's Community Digest - Splunk Community Happenings [6.27.22]

<FONT size="5"><FONT size="5" color="#FF00FF">Get the latest news and updates from the Splunk Community ...