Solved: Re: Finding devices that are "off" at a date, and ...

ALevin123 · ‎08-27-2016

Hi,

I'm a first time splunk user trying to figure out how to do the following:
I have data describing devices, the devices are either "on" or "off" on any given day. I want to search for the ID's of the devices that are "off" on a specific day (let's call them my "new_set") and then track those (and only those) devices over time and output a count of how many of those devices in new_set are "on" and how many are "off" on each subsequent day.

Illustration, on day 0 I might find 200,000 devices that are "off" out of a total of 500,000 devices; then I want to count on each subsequent day how many of those specific 200,000 devices are "on" and how many are "off". It may look like this
Day 0: 200,000 off; 300,000 on
Then, I find the ID's of those 200,000 and call this new_set and then track only them ( I no longer care about the devices that are not in "new_set"):
Day 0: 200,000 off; 0 on
Day 1: 199,000 off: 1,000 on
Day 2: 197,000 off: 3,000 on
Day n: .....

I have tried using a Join command (inner), which eventually works, however the search is very slow / inefficient and I think there must be a more efficient way?

Does anyone have some recommendations or thoughts?

sundareshr · ‎08-28-2016

I think you have couple of option

*Option 1 (preferred/more efficient)*
Get a list of all devices off on Day 0 and store them in a lookup csv (http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Usefieldlookups). Your search for this will look something like this

index=xyz earliest=day0 latest=day0 state="off" | stats count by id | fields - count | outputlookup listofdevicesoffonday0.csv

Once you have this file, you can use it to filter, like this

index=xyz earliest=day1 [| inputlookup listofdevicesoffonday0.csv ] | timechart span=1d count by state | fields - on

*Option 2, less efficient*
Use sub-search (http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Useasubsearch) to get the filter list. So your search will look like this

index=xyz earliest=day1 [search  index=xyz earliest=day0 latest=day0 state="off" | stats count by id | table id ] | timechart span=1d count by state | fields - on

View solution in original post

sundareshr · ‎08-28-2016

I think you have couple of option

*Option 1 (preferred/more efficient)*
Get a list of all devices off on Day 0 and store them in a lookup csv (http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Usefieldlookups). Your search for this will look something like this

index=xyz earliest=day0 latest=day0 state="off" | stats count by id | fields - count | outputlookup listofdevicesoffonday0.csv

Once you have this file, you can use it to filter, like this

index=xyz earliest=day1 [| inputlookup listofdevicesoffonday0.csv ] | timechart span=1d count by state | fields - on

*Option 2, less efficient*
Use sub-search (http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Useasubsearch) to get the filter list. So your search will look like this

index=xyz earliest=day1 [search  index=xyz earliest=day0 latest=day0 state="off" | stats count by id | table id ] | timechart span=1d count by state | fields - on

ALevin123 · ‎08-28-2016

Hey,

Thank you for the help, that definitely got me further. However, it appears that my output is giving me the total "On"/"Off" counts from the original population ( the 500,000 in the above example) rather than from the sample of "new_set" (where "new_set" are those devices that were "off" on day=0 i.e. the 200,000). Any idea what I'm doing wrong?

Also, in my data there appear to be duplicates of the device ID's - so I would need to remove duplicates, would I use a "dedup" command and if so, where in the query?

sundareshr · ‎08-28-2016

Can you share your query

ALevin123 · ‎08-28-2016

Thank you!

I changed some of your recommendations; for example, timechart was giving me "0" for weekend days, so I used the bin and sort instead. And "fields - on" wasn't giving me the comparison I needed of seeing "off" and "on" together for a day.

sundareshr · ‎08-28-2016

To limit your search, you need at add 'earliest=' in your base search. So in your example you should add day1 as the earliest OR timestamp>day0

Also, I notice you have two subsearches, not sure you need both. Combine them for faster searches

To eliminate duplicates, you can do dedup, or stats dc(id) as distinc_ids

ALevin123 · ‎08-28-2016

Can I ask why you recommended field lookup.csv rather than a KV store in this scenario?

sundareshr · ‎08-28-2016

KV store will work as well. .csv is the first thing that came to mind.

If this helped you with your query, please mark it as answered to close it out.

ALevin123 · ‎08-28-2016

Thank you.

ALevin123 · ‎08-28-2016

Thank you, appreciate your help.

Finding devices that are "off" at a date, and then counting how many of them turn "on" on each subsequent day

Enter the Splunk Community Dashboard Challenge for Your Chance to Win!

.conf24 | Session Scheduler is Live!!

Introducing the Splunk Community Dashboard Challenge!