Hi,
I'm working with a large amount of data.
I wrote a main report that extracts all events (let's call them events A, B, C, D) from the last 30 days and does some manipulations on the fields.
Then I wrote 5 reports that filter the main saved report by event type and keep only the fields relevant for each event:
For example, the report for event A contains all the fields relevant for event A,
the report for event B contains all the fields relevant for event B, and so on.
My dashboard contains 5 tabs, one for each event (tab 1 for report A, tab 2 for report B, ...), and each tab triggers the relevant saved search report (report A/B/C, ...).
Problem: all the reports run very slowly.
My questions:
1. How can I read only the delta data each time? I mean, how do I avoid reading all 30 days at once on every run: if the query already ran today and I execute it again, it should read only the new data and reuse the historical data that was already read in the previous run.
2. I read a bit about summary indexes. My reports extract all fields and do not aggregate data. How do I build my 6 reports (main + 5 others) with a summary index? As I said, I use the table command and not functions like top, count, ... in my queries (my reports just extract the relevant fields with some naming manipulations).
* In case you recommend using a summary index, I would appreciate example code, because I have 6 reports and I'm not sure how to work with a summary index.
thanks,
Maayan
Hi @maayan,
yes, the solution could be summary indexes or data models.
In both cases, you have to schedule a search, e.g. for report 1:
index=index1
| table _time field1 field2 field3
| collect index=summary1
the frequency depends on your requirements.
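Regarding the delta question, one common pattern (a sketch, assuming the population search is scheduled to run once per day) is to let it cover only the previous day, so each run reads only the new data and appends it to summary1, while the dashboard searches query the summary index over the full 30 days:
index=index1 earliest=-1d@d latest=@d
| table _time field1 field2 field3
| collect index=summary1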
then you can run a search on this index, e.g. to calculate the sum of field2 for each value of field1:
index=summary1
| stats sum(field2) AS field2_sum BY field1
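And for the five per-event reports (a sketch, assuming the raw events carry a field such as event_type and that this field is kept in the population search; fieldA1 and fieldA2 are placeholder names), each dashboard tab can simply filter the summary index and table the fields it needs, without any aggregation:
index=summary1 event_type=A
| table _time fieldA1 fieldA2
| rename fieldA1 AS field_A1_display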
Ciao.
Giuseppe
OK, I will try, thanks!
And regarding my first question, is that something I can do in Splunk (read only the delta data)?
And which method do you recommend in my case: data model or summary index?
thanks
Hi, I tried to implement the summary index as you suggested, but I had a problem extracting the original fields from the main query. I read that I might need to use stats. I posted a new question; maybe you can help. Thanks.
Hi @maayan,
yes, you can calculate deltas, global and partial sums, etc.
the main job is building the scheduled search to extract the requested data.
in my opinion, I'd use a summary index, scheduling the population search with the frequency you need (e.g. every month or every night).
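For example, deltas and running (partial) sums can be computed on top of the summary index with standard commands like streamstats and delta (a sketch reusing field2 from the earlier example):
index=summary1
| bin _time span=1d
| stats sum(field2) AS daily_sum BY _time
| streamstats sum(daily_sum) AS running_total
| delta daily_sum AS day_over_day_delta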
Ciao.
Giuseppe
Thanks Gcusello!
Can you explain more about: "you can calculate delta, global and partial sum, etc..." ?
I didn't find any documentation, and I also asked in other communities and nobody knows.
Hi @maayan,
it all depends on the data you have (which I don't know). For example, if field1 is the hostname and field2 is the CPU utilization, the scheduled search saves the min, max and avg CPU utilization day by day:
index=index1
| stats min(CPU) AS min_CPU max(CPU) AS max_CPU avg(CPU) AS avg_CPU BY host
| collect index=summary1
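If this population search is scheduled to run once per day over the previous day only, it already produces one row per host per day. If the scheduled window is ever longer than a day, an explicit daily bucket keeps the day-by-day granularity (a sketch of the same search with bin added):
index=index1
| bin _time span=1d
| stats min(CPU) AS min_CPU max(CPU) AS max_CPU avg(CPU) AS avg_CPU BY _time host
| collect index=summary1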
then you can calculate (using normal commands such as stats or timechart) the max, the avg and the min over a month:
index=summary1
| stats min(min_CPU) AS min_CPU max(max_CPU) AS max_CPU avg(avg_CPU) AS avg_CPU BY host
As I said, it depends on the data that you added to your summary index.
Ciao.
Giuseppe