This search is looking back one month over a large dataset. I would like it to be accelerated and to run on its own once a month, but at the moment it times out.
I have added comments to clarify my thinking. Thanks for any comments!
index="myIndex" sourcetype=mySourcetype source!=*test* (msg=REQ* OR msg=RSP)
> Getting the total time: I can eliminate this if it causes a lot of overhead
| addinfo
| eval totalSeconds = round(max(info_max_time) - min(info_min_time))
| fields data.CompanyNumber, data.ElapsedSeconds, HostName, data.IsApproved, totalSeconds, _time, cid
| stats values(data.CompanyNumber) as CompanyNumber values(data.ElapsedSeconds) as ElapsedSeconds values(HostName) as HostName values(data.IsApproved) as isApproved max(totalSeconds) as totalSeconds by _time,cid
| eval declines=if(isApproved="false",1,0)
| stats dc(cid) as txnsPerHost avg(ElapsedSeconds) as avgElapsedSeconds max(ElapsedSeconds) as maxElapsedSeconds sum(declines) as declines max(totalSeconds) as totalSeconds by HostName
| eval tps=round((txnsPerHost/totalSeconds),2)
> These next two are obviously just pretty-printing -- they can be eliminated if it's too much overhead
| eval avgElapsedSeconds = round(avgElapsedSeconds,2)
| eval maxElapsedSeconds = round(maxElapsedSeconds,2)
> I've not renamed anything for fear of overhead -- renaming txnsPerHost to "transactions per host" would be nice (see the sketch after the search)
| table HostName txnsPerHost tps avgElapsedSeconds maxElapsedSeconds declines
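If the rename really is negligible on the final handful of aggregated rows, the tail would look something like this (just a sketch of what I have in mind):
| rename txnsPerHost as "transactions per host"
| table HostName "transactions per host" tps avgElapsedSeconds maxElapsedSeconds declines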
First, if you want to end up with data by host, you cannot run stats without including the host in the by field list. As long as a cid only occurs on one host, it would still work, but it's better practice to break it out. Also, values() can incur some overhead, so if you only expect one value, then either include the field in the by clause or use max(), min(), first() or last().
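For example, rather than something like
| stats values(HostName) as HostName values(data.ElapsedSeconds) as ElapsedSeconds by _time, cid
it is usually cheaper (assuming there is only one value of each per cid) to write
| stats max(data.ElapsedSeconds) as ElapsedSeconds by _time, cid, HostName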
Here are my assumptions:
1) cid is a transaction id of some kind that uniquely identifies a transaction taking place on one HostName, for one CompanyNumber, and with one IsApproved result.
2) You aren't using _time for anything except your initial aggregation, so it has no other significance to you.
3) You aren't using isApproved for anything except calculating the number of declines.
Try this over a very short period, and see if it produces the expected counts...
index="myIndex" sourcetype=mySourcetype source!=test (msg=REQ* OR msg=RSP)
| fields data.CompanyNumber, data.ElapsedSeconds, HostName, data.IsApproved, cid
| stats
max(data.ElapsedSeconds) as ElapsedSeconds,
max(eval(if('data.IsApproved'="false",1,0))) as declines,
by cid, HostName, data.CompanyNumber
| stats dc(cid) as txnsPerHost
avg(ElapsedSeconds) as avgElapsedSeconds
max(ElapsedSeconds) as maxElapsedSeconds
sum(declines) as declines
by HostName
| addinfo
| eval totalSeconds = round(max(info_max_time) - min(info_min_time))
| eval tps=round((txnsPerHost/totalSeconds),2)
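If the numbers look off, you can sanity-check dc(cid) against a raw distinct count over the same short window, for example:
index="myIndex" sourcetype=mySourcetype source!=*test* (msg=REQ* OR msg=RSP)
| stats dc(cid) as uniqueCids count as rawEvents by HostName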
The following version of this search runs successfully over 7-10 days. I want to build a summary index instead, so I can reliably get a month's worth of data. Am I doing this correctly?
index="myIndex" sourcetype=mySourcetype (msg=REQ* OR msg=RSP) (source="E:\\Logs\\Path-*" source!="E:\\Logs\\Path-Test*") (data.HostType<1000 AND data.HostType!=0)
| fields data.CompanyNumber, data.ElapsedSeconds, HostName, data.IsApproved, cid, msg
| stats
max(data.ElapsedSeconds) as ElapsedSeconds,
max(eval(if('data.IsApproved'="false",1,0))) as declines,
by cid, HostName
| stats dc(cid) as txnsPerHost
avg(ElapsedSeconds) as avgElapsedSeconds
max(ElapsedSeconds) as maxElapsedSeconds
sum(declines) as declines
by HostName
| addinfo
| eval totalSeconds = round(info_max_time - info_min_time)
| eval tps=round((txnsPerHost/totalSeconds),2)
| eval avgElapsedSeconds = round(avgElapsedSeconds,2)
| eval maxElapsedSeconds = round(maxElapsedSeconds,2)
| fields HostName txnsPerHost tps avgElapsedSeconds maxElapsedSeconds declines
| sort 0 -txnsPerHost
Summary-creating search:
index="myIndex" sourcetype=mySourcetype (msg=REQ* OR msg=RSP) (source="E:\\Logs\\Path-*" source!="E:\\Logs\\Path-Test*") (data.HostType<1000 AND data.HostType!=0)
| fields data.CompanyNumber, data.ElapsedSeconds, HostName, data.IsApproved, cid, msg
| stats
max(data.ElapsedSeconds) as ElapsedSeconds,
max(eval(if('data.IsApproved'="false",1,0))) as declines,
by cid, HostName
| collect index=mySummaryIndex
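To have this populate on its own, I'm assuming I'd schedule it as a saved search, roughly along these lines (the stanza name and schedule are placeholders, and with action.summary_index enabled the trailing | collect presumably isn't needed):
[populate_mySummaryIndex]
enableSched = 1
cron_schedule = 30 0 * * *
dispatch.earliest_time = -1d@d
dispatch.latest_time = @d
action.summary_index = 1
action.summary_index._name = mySummaryIndex
search = <the summary-creating search above, minus the trailing | collect>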
Search using the summary:
index=mySummaryIndex
| stats dc(cid) as txnsPerHost
avg(ElapsedSeconds) as avgElapsedSeconds
max(ElapsedSeconds) as maxElapsedSeconds
sum(declines) as declines
by HostName
| addinfo
| eval totalSeconds = round(info_max_time - info_min_time)
| eval tps=round((txnsPerHost/totalSeconds),2)
| eval avgElapsedSeconds = round(avgElapsedSeconds,2)
| eval maxElapsedSeconds = round(maxElapsedSeconds,2)
| fields HostName txnsPerHost tps avgElapsedSeconds maxElapsedSeconds declines
| sort 0 -txnsPerHost
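Then the monthly report would just point at the summary over the previous calendar month, something like:
index=mySummaryIndex earliest=-1mon@mon latest=@mon
| stats dc(cid) as txnsPerHost avg(ElapsedSeconds) as avgElapsedSeconds max(ElapsedSeconds) as maxElapsedSeconds sum(declines) as declines by HostName
...with the same addinfo/tps tail as above.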
Do post a screenshot of your job inspector.
Your two job inspector images are missing a crucial part in between, where command.stats lives. That's probably taking up a big chunk of time.
Small improvement: Drop data.CompanyNumber, totalSeconds, and msg from the initial fields... in fact, you can drop the initial fields altogether and let Splunk figure out what fields it needs to load based on the first stats.
Potential small improvement: Assuming data.IsApproved only contains "true" or "false", speed up your first stats by replacing max(eval(if(...))) with min('data.IsApproved') as isApproved and translating true/false to 1/0 once per cid after the first stats (see the sketch below).
Potential big improvement: Why are you splitting the first stats by _time? I see no reason to, and if there are many _time values per cid, this will make the first stats run extremely slowly, either by crossing the threshold for in-memory sorting or, if you raise that threshold, by blowing your available memory.
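Putting the last two improvements together, the first stats would look something like this (a sketch; the true/false translation happens once per cid and HostName, after the heavy lifting):
index="myIndex" sourcetype=mySourcetype (source="E:\\Logs\\Path-*" source!="E:\\Logs\\Path-Test*") (msg=REQ* OR msg=RSP)
| stats
    max(data.ElapsedSeconds) as ElapsedSeconds
    min(data.IsApproved) as isApproved
    by cid, HostName
| eval declines=if(isApproved="false",1,0)
Your existing second stats, addinfo, and tps calculation then stay the same.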
Thanks to @DalJeanis, I came up with the following version of the search -- it runs much faster for a while, but still seems to die before all the values come back (specifically, the tps column comes back incomplete).
index="myIndex" sourcetype=mySourcetype (source="pathToSource*" source!="pathToSource-Test*") (msg=REQ* OR msg=RSP)
| fields data.CompanyNumber, data.ElapsedSeconds, HostName, data.IsApproved, totalSeconds, _time, cid, msg
| stats
max(data.ElapsedSeconds) as ElapsedSeconds,
max(eval(if('data.IsApproved'="false",1,0))) as declines,
by cid,_time, HostName
| stats dc(cid) as txnsPerHost
avg(ElapsedSeconds) as avgElapsedSeconds
max(ElapsedSeconds) as maxElapsedSeconds
sum(declines) as declines
by HostName
| addinfo
| eval totalSeconds = round(max(info_max_time) - min(info_min_time))
| eval tps=round((txnsPerHost/totalSeconds),2)
| eval avgElapsedSeconds = round(avgElapsedSeconds,2)
| eval maxElapsedSeconds = round(maxElapsedSeconds,2)
| fields HostName txnsPerHost tps avgElapsedSeconds maxElapsedSeconds declines
Current status of the job inspector:
![alt text](https://imgur.com/a/24aEuwI)
At this point, one of my search heads will max out memory, and two others will "complete" the search, but the tps column is not fully populated (a bunch of 0.00s).