Reporting

Can you create an accelerated report that doesn't include the _time field, but can then also be used in a timechart wit?

thisissplunk
Builder

I have a simple accelerated report that looks like this:

 

index=hosts
| stats count by hostname ip

 

I now want to dashboard that but use a timechart. However, timechart doesn't work because _time wasn't included in the stats count by section.

index=hosts
| stats count by hostname ip
| timechart span=1d count

I can't just add _time to the stats section in the accelerated report because it increases the number of rows 1000-fold. I assume creating the accelerated report with timechart will cause the same 1000x issue.

So is there a way around this with accelerated reports that won't cause my accelerated summary to grow 1000x? How can I access _time?

I'm feeling like it's not possible and that this is a better scenario for traditional summary indexing. Am I wrong?


bowesmana
SplunkTrust
SplunkTrust

In your initial search, you are getting a table with hostname and ip.

What is your time range for the search and what should _time represent if it is to store that time as

_time hostname ip

If your search is only spanning 1d, then adding _time to the results will not add any rows to the data, but you say adding _time will increase the number of rows 1000 fold.

If you are searching more than 1 day, what time do you want to be stored?

If you just want the search time to be added to the data, you can just do

index=hosts
| stats count by hostname ip
| eval _time=now()

which will add the same time value to every row of your results, but is that what you want?

thisissplunk
Builder

Let's say the timeframe is 24h.

So let's say that "stats count by hostname ip" would give me 10 rows: 10 unique hostnames with their IPs

Now if I do "stats count by hostname ip _time" I will end up with 100,000 rows because there are multiple entries coming in for each hostname + ip combo over the day.

The whole point of summarizing or accelerating this search was to hold less data on disk, longer. Speed was a nice addition too.

What I'm thinking I can add is "stats earliest(_time) as earliestTime" to give all of the (low amount) of results an initial timestamp. However, if I run this every minute or so, it's not very different than "stats count by ... _time"... I think. 
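A minimal sketch of that idea, assuming the same index and fields as above (copying earliestTime into _time is my own addition so that a later timechart has something to work with):

```spl
index=hosts
| stats count earliest(_time) as earliestTime by hostname ip
| eval _time=earliestTime
```

This keeps one row per hostname and ip pair per run, so the row count only stays low if each run covers a reasonably long window.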

Unrelated, but does it matter what my schedule and timespan is in the scheduled report? Should I run it every minute? Every day? Does it matter like it does for summary indexing?


bowesmana
SplunkTrust
SplunkTrust

You just need to use the 'bin' command to bin by time, so your 24 hour time range is just showing a time of 1 day

index=hosts
| bin _time span=1d
| stats count by _time hostname ip

in this case you will get your 10 rows with the date (no time because you have binned by 1d)

NB: if your search time range is -24h, the results will span two days, because yesterday's events are binned to yesterday's date and today's events to today's date. So you should run the search with the time window set to yesterday, or

earliest=-d@d latest=@d
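With _time binned into the report like this, a timechart can be layered on top. A minimal sketch, assuming the dashboard should show the daily event total per hostname (the split by hostname and the sum(count) aggregation are assumptions about what you want to chart):

```spl
index=hosts earliest=-d@d latest=@d
| bin _time span=1d
| stats count by _time hostname ip
| timechart span=1d sum(count) as count by hostname
```

Widening the time range to several whole days should give one point per day per hostname, while the stats stage still produces only one row per hostname, ip and day.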

 

thisissplunk
Builder

Is there a way to summarize data to an index and have it be up to date as of the last minute, while deduping itself as we move forward? For the sake of lower size on disk and speed?

As far as I understand, if I want to use dedup to lower the overall data, I'm going to need to run the summary index saved report a couple times a day instead of every minute. Meaning it will always be behind.


bowesmana
SplunkTrust
SplunkTrust

The only way to have the summary index update to the last minute is to run the summary every minute. 

What is your goal and what outputs do you want to show?

Are you looking to show a rolling 24-hour window of counts rather than specific calendar days? Are you looking to show finer-grained output than 1 day or 24 hours? That is, if you want up-to-the-minute liveliness, do you want to report on that data at 1-minute granularity?

How important is disk/speed of search in the equation, i.e. what volumes of data are you talking about - do you have millions of hosts/ips per day or just a handful? High or low cardinality of hosts/ips to the number of events?

Is disk space really an issue?

You can always run the saved search more often than every 12 hours but less often than every minute, and then in your reporting search combine the data from the summary with data from the main index to get your 1 minute liveliness, e.g.

search_from_summary earliest=-24h@h latest=@h
| append [
  | search_from_main earliest=@h latest=now
  | stats count by hostname ip
]
| stats sum(count) as count by hostname ip

Insert any bin options you may want for _time in there
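As a more concrete sketch of the same pattern with hourly bins: the summary index name (summary) and source name (hosts_hourly_summary) below are placeholders, and the sketch assumes each summary row already carries an hourly _time plus hostname, ip and count fields.

```spl
index=summary source="hosts_hourly_summary" earliest=-24h@h latest=@h
| append [
    search index=hosts earliest=@h latest=now
    | bin _time span=1h
    | stats count by _time hostname ip
  ]
| timechart span=1h sum(count) as count by hostname
```

The final stage sums the count field rather than counting rows, since each row coming out of the summary or the subsearch is already an aggregate.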

Append is never the best option, as it's not so efficient and will always force the search to run on the search head, but this would work.

There's almost always a solution to everything in Splunk, but the best one will depend on what your main goals are.

thisissplunk
Builder

I've had it set to 1 minute just to generate data constantly while I test. Ultimately, it will probably be around 4h time chunks which will also help with speed because we can dedup more events with each feed into the summary index.

So yeah, our requirements are:

  1. Save data for a year (vs 3 months on the normal index)
  2. Make querying a year fast vs the full data on the normal index
  3. Have it be as up to date as possible (100% up to date if possible, hence 1 minute runs right now)
  4. Nice to have is self repairing like accelerated reports do

What I'd really want is an accelerated report summary that also happens to stick around longer than the actual indexed data does. Because that would only hold specific data, be self repairing,  be 100% accurate and be fast.

That said, summary indexing is our only option as accelerated report data ages off with the index age off as I'm told.

That stinks because I really liked the speed and self healing of accelerated reports. If we accelerate the summary index we can help with speed, but we will never get the self healing, and we may drop data over time.

What happens if we run a summary index feeding report every 4 hours but look back 48 hours? Is it smart enough to not replicate data already in the summary index or not?


bowesmana
SplunkTrust
SplunkTrust

I am assuming the historical data cannot change, so after 3 months, the data from 3-12 months is forever static, so another option is to manually (via search) roll the data from the accelerated report to a lookup (file or kv store) at 90 days. 

So each day, this search would run and search the time period -90d@d --> -89d@d and add this to the lookup file or kvstore. Again, much depends on your data volume as to whether this is practical and of course will make searches more complex and add to things like search head replication or kv store replication.
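A rough sketch of such a daily roll-off search, assuming a CSV lookup named hosts_history.csv (a made-up name) and daily bins; it re-runs the report's search over the one day that is about to age out and appends the result to the lookup:

```spl
index=hosts earliest=-90d@d latest=-89d@d
| bin _time span=1d
| stats count by _time hostname ip
| outputlookup append=true hosts_history.csv
```

A reporting search should then be able to read the older rows back with inputlookup and combine them with the results of the accelerated report for the most recent 90 days.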

You could also use a summary index, dealing with the 3-12 month aged data.

Gives you the best of both worlds, but with added complexity. YMMV.

 

 

 

 


thisissplunk
Builder

I think we'll just summary index the data immediately and then accelerate that, if that's possible.
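A rough sketch of the populating search I have in mind, assuming a summary index named hosts_summary (a made-up name) and a schedule of every 4 hours over the previous 4 whole hours:

```spl
index=hosts earliest=-4h@h latest=@h
| bin _time span=1h
| stats count by _time hostname ip
| collect index=hosts_summary
```

Snapping both ends of the window to the hour should keep consecutive runs from overlapping, as long as the schedule really does fire every 4 hours.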

I still don't understand the difference between running a saved search to fill a summary index every 4 hours looking -4h back, vs every 4 hours looking -48h back. Does the second one take up more space, or is the data somehow deduped? I assume it's the former, but it would be great if it was the latter, for filling in missing data due to downtime.
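One way I could test this myself, assuming the summary lands in a hypothetical index called hosts_summary with _time, hostname and ip on every row, is to look for keys that were written more than once:

```spl
index=hosts_summary
| stats count as copies by _time hostname ip
| where copies > 1
```

If overlapping runs are not deduped, the -48h variant should produce rows here and the -4h variant should not.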
