Splunk Search

How to search the count of adds/removes (new hosts vs host decoms) month on month?

smudge797
Path Finder

What I want is to many adds/removes (new hosts vs host decoms) month on month

index=* | stats dc(Host_Name) by date_month 

So take Jan ... there are 160226 distinct hosts
Feb number is 162359
That could be something like 190 hosts were removed and 2323 were added in Feb

Thanks

Tags (3)
0 Karma

woodcock
Esteemed Legend

First of all, DO NOT use the "free" (but wrong) date_* values; calculate your own or use | bucket _time span=1mon.

You can only calculate changes in distinctness by comparing each entire list against other lists (not by subtracting dc values) so I would use a monthly Summary Index to store a list of distinct hosts and then build your search to pull from that aggregate store. You can use sistats to help you with this:

http://docs.splunk.com/Documentation/Splunk/6.0.4/SearchReference/Sistats

0 Karma

smudge797
Path Finder

Hi Greg,
Still struggling with this can you provide anymore instructions? The data is summarized but im struggling with the query.

| bucket _time span=1mon | sistats dc(Host_Name) by _time

Thanks

0 Karma

woodcock
Esteemed Legend

That should be pretty much it. What problem are you having, exactly? What is your SI-populating search?

0 Karma

smudge797
Path Finder

Here is the SI search:

... source=*.csv | eval IMT = substr(Cost,2)| convert rmcomma(IMT) | eval Total=round(IMT,2) | stats sum("Total") as Cost by Date "Cost Center" "Org L5" "Org L6" Org_Description IIGL1 IIGL2 IIGL3 IIGL4 Product Feed_Name
0 Karma

woodcock
Esteemed Legend

I do not see sistats in your search and this search looks nothing like your original search.. If you are planning to us the dc function coming back out of your SI, you really need to use sistats (like I said in my answer) because it does all the crazy magic for you:

http://docs.splunk.com/Documentation/Splunk/6.4.1/SearchReference/sistats

 index=* | sistats dc(Host_Name) by date_month 

index=YourSummaryIndex | stats dc(Host_Name) by date_month

That is the answer for THIS question. However it appears that your needs have morphed completely away from the original question so perhaps you should start another question.

0 Karma

woodcock
Esteemed Legend

do you just need the delta between the columns or do you need an accurate number? If you need accuracy, then you need to track the distinct values of Host_Name kept for each column and you need to compare these sets. The former is pretty easy, the latter, depending on the order of magnitude of your sets, much less so.

0 Karma

smudge797
Path Finder

an accurate number so movement + or - is tracked.

0 Karma

sundareshr
Legend

Try this

index=* earliest=-1mon@mon | eval when=if(_time<relative_time(now(), "@mon"), "Prev", "Current")  | chart dc(host) as h over host by when | eval action=case(Current>Prev, "Added", Current<Prev, "Removed", 1=1, "No Change") | stats count by action
0 Karma

Richfez
SplunkTrust
SplunkTrust

Try the following:

index=* | timechart span=1m dc(Host_Name) as Count_Of_Hosts 
| streamstats window=2 last(Count_Of_Hosts) AS Last, first(Count_Of_Hosts) AS First 
| eval Delta=Last-First

So, we have our base search then feed that to timechart splitting things by month. You can TOTALLY use bin+stats, which would be more efficient, but I think as an example this is more clear yet is only slightly slower.

That goes into streamstats which takes those events two at a time ( window=2 ) and sets Last and First to the count from the existing event and the preceding event. This makes the last line a simple subtraction on each event.

As it stands, all those fields will be output, so you can see what it does yourself and confirm. If you don't need those extra fields, you can always | table ... the ones you want or | fields - ... to get rid of the ones you don't want.

For reference in case I fat fingered the changes to fit your own use case better, here's the test I ran on my laptop (which has only local logs and miscellaneous stuff I'm testing for answers):

index=* | timechart span=1w dc(eventtype) as EventTypes | streamstats window=2 last(EventTypes) AS Last, first(EventTypes) AS First | eval Delta=Last-First
0 Karma

smudge797
Path Finder

This seems to be getting there!

| timechart span=1m dc(Host_Name) as Count_Of_Hosts 
 | streamstats window=2 last(Count_Of_Hosts) AS Last, first(Count_Of_Hosts) AS First 
 | eval Delta=Last-First

I'm seeing this in the UI

The specified span would result in too many (>50000) rows.

Could be faster without span and timechart? Maybe just stats?
Thanks!

0 Karma

smudge797
Path Finder

Also if I add 1 host and remove another host in a month, the stats will be the same and the delta zero but we had movement...

0 Karma

smudge797
Path Finder

Hi Rich,
What do you think? Is there away to show that movement of host?

Cheers

0 Karma

somesoni2
Revered Legend

Assuming you're just comparing host count for (last) two consecutive month, give this a shot

(Based on your example query, you're not using splunk's default host field, so this will be slower)
Updated

index=* earliest=-2mon@mon latest=@mon | stats values(date_month) as month by host | eval Type=case(mvcount(month)=1 AND month=relative_time(now(),"-2mon@mon"),"Removed",mvcount(month)=1 AND month=relative_time(now(),"@mon"),"Added",1=1,"Same" ) | chart count by Type
0 Karma

smudge797
Path Finder

Error in 'eval' command: The expression is malformed. Expected ).

0 Karma
Get Updates on the Splunk Community!

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...