Summary index for non-aggregated data: How to read only delta data each time?

maayan
Path Finder

Hi,

I'm working with a large amount of data.
I wrote a main report that extracts all events (let's call them events A, B, C, D) from the last 30 days and does some manipulations on the fields.
Then I wrote 5 reports that filter the main saved report by event type and keep only the relevant fields for each event.
For example, the report for event A contains all fields relevant for event A,
the report for event B contains all fields relevant for event B, and so on.

My dashboard contains 5 tabs, one for each event (tab 1 for report A, tab 2 for report B, ...), and each tab triggers the relevant saved search report (reports A/B/C, ...).

Problem: all the reports run very slowly.

My questions:
1. How can I read only the delta data each time? I mean, how do I avoid reading all 30 days at once every time? If the query has already run today and I execute it again, it should read only the new data and reuse the historical data that was already read in the previous run.

2. I read a bit about summary indexes. My reports extract all fields and do not aggregate data. How do I create my 6 reports (main + 5 others) with a summary index? As I said, I use the table command and not functions like top, count, etc. in my query (my reports just extract the relevant fields with some naming manipulations).

* If you would recommend using a summary index, I would appreciate some example code, because I have 6 reports and I'm not sure how to work with a summary index.

thanks,
Maayan

gcusello
SplunkTrust

Hi @maayan,

Yes, the solution could be summary indexes or data models.

In both cases, you have to schedule a search, e.g. for report 1:

index=index1
| table _time field1 field2 field3
| collect index=summary1

The frequency depends on your requirements.
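
To read only the delta on each run, one common approach is to bound the populating search to the time window since the previous run. A minimal sketch, assuming the search is scheduled to run once per hour (the hourly window is an assumption, and the index and field names are the same placeholders as above):

index=index1 earliest=-1h@h latest=@h ```assumed hourly window; match it to the actual schedule```
| table _time field1 field2 field3
| collect index=summary1

With this shape, each run reads only the last full hour of raw events, and the dashboard searches read 30 days from summary1 instead of from the raw index.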

Then you can run a search on this index, e.g. to calculate the sum of field2 for each field1:

index=summary1
| stats sum(field2) AS field2_sum BY field1
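
Since the reports in the question only extract fields rather than aggregate them, the per-event reports could read from the summary index in the same way. A sketch, assuming the populating search kept a field that identifies the event type (event_type is a hypothetical name, not taken from the question):

index=summary1 event_type="A" ```event_type is a hypothetical field name```
| table _time field1 field2

The other tabs would use the same search with a different event type value and field list.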

Ciao.

Giuseppe

maayan
Path Finder

OK, I will try, thanks!

Regarding my first question, is reading only the delta data something that I can do in Splunk?

And which method do you recommend in my case: data model or summary index?

thanks

maayan
Path Finder

Hi, I tried to implement the summary index as you suggested, but I had a problem extracting the original fields from the main query. I read that I might use stats and stats. I posted a new post; maybe you can help. Thanks.

gcusello
SplunkTrust

Hi @maayan,

Yes, you can calculate delta, global and partial sums, etc.

The main job is building the scheduled search that extracts the requested data.

In my opinion, I'd use a summary index, scheduling the populating search with the frequency you need (e.g. every month or every night).

Ciao.

Giuseppe

maayan
Path Finder

Thanks Gcusello!
Can you explain more about "you can calculate delta, global and partial sum, etc..."?
I didn't find any documentation, and I also asked in other communities, but nobody knew.

gcusello
SplunkTrust

Hi @maayan,

It all depends on the data you have (which I don't know). For example, if field1 is the hostname and field2 is the CPU utilization, the scheduled search can save the min, max and avg CPU utilization day by day:

index=index1
| stats 
   min(CPU) AS min_CPU 
   max(CPU) AS max_CPU 
   avg(CPU) AS avg_CPU 
   BY host
| collect index=summary1

Then you can calculate (using normal commands such as stats or timechart) the max, avg and min over a month:

index=summary1
| stats 
   min(min_CPU) AS min_CPU 
   max(max_CPU) AS max_CPU 
   avg(avg_CPU) AS avg_CPU 
   BY host

As I said, it depends on the data that you added to your summary index.
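
One caveat with the example above: averaging the daily averages weights every day equally, which can differ from the true average when days have different numbers of events. A sketch of one way around that, assuming the scheduled search also stores the sum and the count (sum_CPU and count_CPU are names introduced here for illustration):

index=index1
| stats min(CPU) AS min_CPU max(CPU) AS max_CPU sum(CPU) AS sum_CPU count(CPU) AS count_CPU BY host
| collect index=summary1

The monthly search can then rebuild the average from the stored totals:

index=summary1
| stats min(min_CPU) AS min_CPU max(max_CPU) AS max_CPU sum(sum_CPU) AS sum_CPU sum(count_CPU) AS count_CPU BY host
| eval avg_CPU = sum_CPU / count_CPU ```weighted average rebuilt from the stored sum and count```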

Ciao.

Giuseppe
