I have a market data feed indexing into Splunk.
The logs look like the following -
Security: "HDFC", FIELDS: {"PRICE", "ASK", "HIGH"}, receivedTime: <time-string>
Security: "YESBANK", FIELDS: {"PRICE", "HIGH"}, receivedTime: <time-string>
Security: "HDFC", FIELDS: {"ASK", "HIGH"}, receivedTime: <time-string>
Security: "HDFC", FIELDS: {"PRICE"}, receivedTime: <time-string>
Security: a single-value field
FIELDS: a multi-value field
receivedTime: a string, can be different from _time
We want to identify the SECURITY:FIELD pairs that are logging less frequently than their usual input frequency.
So, for a SECURITY:FIELD pair -
diff_time = receivedTime (current) - receivedTime (previous)
This diff_time varies from one SECURITY:FIELD pair to another. Some pairs log every second, others log only once a day.
The challenge is to come up with an alert (or alerts) that dynamically calculates the optimum frequency (diff_time) for each SECURITY:FIELD pair and then compares it with its current value.
Now let's say we assume that the optimum frequency is the average of the last 7 diff_time values for the same SECURITY:FIELD pair.
In order to calculate this value I would have to run the query over the last 7 days (because some pairs log only once a day), and with a large amount of data and the use of the mvexpand command, this is not viable.
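To make the intent concrete, here is the check I want expressed as pseudocode (not runnable SPL, just the shape of the logic):

```
For each SECURITY:FIELD pair, with events ordered by receivedTime:
    diff_time  = receivedTime(current) - receivedTime(previous)
    usual_diff = average of the last 7 diff_time values for this pair
    alert if diff_time is much larger than usual_diff
```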
How do you suggest I achieve this goal? Please suggest an algorithm for it.
Hi @iparitosh,
Your algorithm should be something like this:
1- Fetch all the data you need --> index=yourindex sourcetype=yoursourcetype filter=yourfilter
2- Make sure your multi-value field is extracted, either via props/transforms or using the rex command with the max_match option.
More info here: https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Rex
3- To avoid using mvexpand for that multi-value field, run a stats command to convert your data into tabular form:
...|stats values(requiredFields) as requiredFields by SECURITY, FIELD, RECEIVEDTIME
4- Once you have that table, use it to calculate the delta and frequency; it shouldn't be too resource-intensive at this point anymore.
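A rough sketch of steps 3 and 4 combined, using streamstats to get the previous event's timestamp per pair (untested; the strptime format string and the field names are assumptions you will need to adapt to your data):

```
index=yourindex sourcetype=yoursourcetype
| stats values(requiredFields) as requiredFields by SECURITY, FIELD, RECEIVEDTIME
| eval rt = strptime(RECEIVEDTIME, "%Y-%m-%d %H:%M:%S")
| sort 0 SECURITY FIELD rt
| streamstats current=f window=1 last(rt) as prev_rt by SECURITY, FIELD
| eval diff_time = rt - prev_rt
| streamstats window=7 avg(diff_time) as usual_diff by SECURITY, FIELD
| where diff_time > 2 * usual_diff
```

The first streamstats with current=f window=1 carries the previous event's timestamp forward within each pair, the second averages the last 7 gaps; the factor of 2 is an arbitrary threshold to tune.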
Cheers,
David
You probably need to use a dynamic outlier model. Try using this: https://docs.splunk.com/Documentation/MLApp/4.2.0/User/DNOlegacyassist
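If the ML Toolkit is not an option, a hand-rolled version of the same idea in plain SPL is a moving average plus a standard-deviation band over the per-pair gaps (a sketch, assuming a diff_time field has already been computed; the window size and multiplier are values to tune):

```
... | streamstats current=f window=7 avg(diff_time) as avg_diff stdev(diff_time) as sd_diff by SECURITY, FIELD
| where diff_time > avg_diff + 2 * sd_diff
```

Here current=f keeps the current gap out of its own baseline, so a sudden slowdown doesn't inflate the average it is compared against.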
Thank you for your response. I am reading more about it to check if it can solve my problem.