
How to find low variation in a field which has unique values?

howyagoin
Contributor

Hi,

I've got a field in some sales data which is a serial number. I'd like to look for "clusters" of serial numbers sold in the data to determine if a large number of nearly sequential serial numbers were sold - I'm aware of the "cluster" command, but that's not quite telling me what I am looking for.

Assume that each event logged has a time/date of sale, a product name, maybe the customer's name, and the serial number of the product sold.

During the average day, the serial number of products sold would be pretty distributed across time. What I'm trying to find is if someone came in and bought, say, a dozen hair dryers, all of which happen to have serial numbers more or less near one another in value. All of the serial numbers are simple integers, such as 7366391743, 7366391745, 7366391749, 7366391755, and so on.

I can't rely on a small time window; I typically need to look at the data over a 24-hour period. Still, it seems to me there must be a way to find events where the variation in the serial number value is relatively low: nearly sequential, but not quite. So a value of 2865549318 would not show up, but the four values from the list above would...

Thanks for any thoughts on this...


sundareshr
Legend

Try this

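index=foo sourcetype=bar earliest=@d | sort 0 productname serialnum | streamstats window=1 current=f last(serialnum) as prev_sno by productname | where abs(serialnum-prev_sno)<=10 | table _time productname serialnum

Sorting by serial number first means each event is compared with its nearest neighbor in value rather than its neighbor in time.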
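To make the idea behind a query like this concrete, here is a minimal Python sketch of the same logic outside Splunk: sort the serial numbers, look at the gap to the previous one, and keep runs of neighbors that stay within a threshold. The function name `find_serial_clusters` and the `max_gap` and `min_size` parameters are hypothetical, chosen only for illustration; the sample values are the ones from the question.

```python
from typing import Iterable, List


def find_serial_clusters(
    serials: Iterable[int], max_gap: int = 10, min_size: int = 3
) -> List[List[int]]:
    """Group serial numbers into runs whose neighboring gaps are <= max_gap."""
    ordered = sorted(set(serials))  # order by value, not by time of sale
    clusters: List[List[int]] = []
    current: List[int] = []
    for sn in ordered:
        # A gap larger than max_gap closes the current run.
        if current and sn - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(sn)
    # Do not forget the final run.
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Serial numbers from the question, deliberately out of order.
sales = [7366391749, 2865549318, 7366391743, 7366391755, 7366391745]
print(find_serial_clusters(sales))
# [[7366391743, 7366391745, 7366391749, 7366391755]]
```

The isolated value 2865549318 is dropped because it never forms a run of at least `min_size` members. Both thresholds are guesses and would need tuning against real sales data.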

inventsekar
SplunkTrust

Regarding the variation itself, I can't find a good solution at the moment.

Let's check this idea. (Update: previously I suggested rex, but regex is the right command.)
Maybe we can use regex to match on the first 8 digits of the serial number, here 73663917 (7366391743, 7366391745, 7366391749, 7366391755), and list the matching events. That will give us approximately the time frame and the events.

assuming logs with this format
SerialNumber=7366391743,
SerialNumber=7366391745,
SerialNumber=7366391749,
SerialNumber=7366391755

your base search | regex SerialNumber="^73663917\d{2}$"
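If the prefix is not known in advance, the same idea can be applied to every prefix at once. The following Python sketch (not Splunk code) groups serial numbers by their leading digits and keeps only the groups with several members. The function name `bucket_by_prefix` and its parameters are hypothetical, and it assumes the serial numbers are plain integers as described in the question.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_by_prefix(
    serials: Iterable[int], drop_digits: int = 2, min_size: int = 3
) -> Dict[int, List[int]]:
    """Group serials that share everything except their last drop_digits digits."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for sn in serials:
        # Integer division strips the trailing digits, leaving the prefix.
        buckets[sn // 10 ** drop_digits].append(sn)
    return {
        prefix: sorted(members)
        for prefix, members in buckets.items()
        if len(members) >= min_size
    }


sales = [7366391749, 2865549318, 7366391743, 7366391755, 7366391745]
print(bucket_by_prefix(sales))
# {73663917: [7366391743, 7366391745, 7366391749, 7366391755]}
```

One limitation of any prefix match: two serial numbers can be very close and still have different prefixes. For example, 7366391799 and 7366391801 differ by 2 but land in different groups, so this only gives an approximate picture.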

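Second thought:
You could use the delta command to find the differences between consecutive serial numbers. Sort by serial number first so that neighbors in value are adjacent, and compare the absolute difference so that large negative jumps are not matched:

your base search | sort 0 SerialNumber | delta SerialNumber as SerialNumberVariation p=1 | where abs(SerialNumberVariation) < 10 | table _time _raw SerialNumberVariation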

I tested this on Windows event log EventCodes and it works fine:

sourcetype="WinEventLog:System" | delta EventCode as EventCodeDifference p=1 | where EventCodeDifference < 5 | table _raw EventCode EventCodeDifference
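One thing to watch with any difference-based filter: a signed difference "less than 10" also lets through large negative jumps, and differences between events in arrival order say little about closeness in value. A small Python sketch (illustrative only, with a hypothetical `deltas` helper) shows the effect using values from the question:

```python
from typing import List


def deltas(values: List[int]) -> List[int]:
    """Difference between each value and the one before it."""
    return [cur - prev for prev, cur in zip(values, values[1:])]


# Two nearby serials separated in time by an unrelated sale.
arrival_order = [7366391743, 2865549318, 7366391745]

signed = deltas(arrival_order)
print(signed)                                 # [-4500842425, 4500842427]
print([d for d in signed if d < 10])          # [-4500842425]  (false match)
print([d for d in signed if abs(d) < 10])     # []             (near pair missed)

# Sorting by value first makes the near pair adjacent.
by_value = deltas(sorted(arrival_order))
print([d for d in by_value if abs(d) < 10])   # [2]
```

So the filter is more reliable when the events are ordered by serial number and the absolute difference is compared against the threshold.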


thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !