How to find low variation in a field which has unique values?

howyagoin
Contributor

Hi,

I've got a field in some sales data which is a serial number. I'd like to look for "clusters" of serial numbers sold in the data to determine if a large number of nearly sequential serial numbers were sold - I'm aware of the "cluster" command, but that's not quite telling me what I am looking for.

Assume that each event logged has a time/date of sale, a product name, maybe the customer's name, and the serial number of the product sold.

During the average day, the serial number of products sold would be pretty distributed across time. What I'm trying to find is if someone came in and bought, say, a dozen hair dryers, all of which happen to have serial numbers more or less near one another in value. All of the serial numbers are simple integers, such as 7366391743, 7366391745, 7366391749, 7366391755, and so on.

I can't rely upon a small time window; I typically need to look at the data over a 24-hour period. But it seems to me there must be a way to find events where the variation in the value of the serial number is relatively low: nearly sequential, for example, but not quite. So a value of 2865549318 would not show up from the list above, but the other four would...

Thanks for any thoughts on this...

sundareshr
Legend

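Try this. It sorts by serial number first, so that each event is compared with its nearest neighbour in value rather than in time. Note that the lowest serial number of each run is not returned, because its own predecessor is far away.

index=foo sourcetype=bar earliest=@d | sort 0 productname serialnum | streamstats window=1 current=f last(serialnum) as prev_sno by productname | where abs(serialnum-prev_sno)<=10 | table _time productname serialnum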

inventsekar
Super Champion

Regarding the variations, I'm not able to find a good solution at this time.

Let's check this idea. (Update: previously I thought of rex, but regex is the correct command.)
Maybe we can use regex to match on the first 8 digits of the serial number, here 73663917 (7366391743, 7366391745, 7366391749, 7366391755), and list the matching events. That will approximately give us the time frame and the events.

Assuming logs with this format:
SerialNumber=7366391743,
SerialNumber=7366391745,
SerialNumber=7366391749,
SerialNumber=7366391755

your base search | regex SerialNumber="^73663917\d{2}$"

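Second thought:
you could use the delta command to find the differences between consecutive serial numbers. Sort by serial number first so that neighbouring events are also neighbours in value, and compare the absolute difference so the sign does not matter:

your base search | sort 0 SerialNumber | delta SerialNumber as SerialNumberVariation p=1 | where abs(SerialNumberVariation) < 10 | table _time _raw SerialNumberVariation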

I tested the delta approach on Windows event log EventCodes and it works fine:

sourcetype="WinEventLog:System" | delta EventCode as EventCodeDifference p=1 | where EventCodeDifference < 5 | table _raw EventCode EventCodeDifference
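
The idea behind the delta approach (compare each serial number with its neighbour in value and keep the small differences) can also be sketched outside Splunk. The following is only an illustration in Python, not SPL: the function name `find_runs` and the `max_gap` and `min_size` thresholds are assumptions made for this example, and the sample values are the ones from the question.

```python
def find_runs(serials, max_gap=10, min_size=3):
    """Group serial numbers into runs whose neighbouring values
    differ by at most max_gap; keep runs with at least min_size members."""
    runs = []
    current = []
    # Sort by value (not by sale time) so neighbours are numerically adjacent.
    for s in sorted(set(serials)):
        if current and s - current[-1] > max_gap:
            # Gap too large: close the current run and start a new one.
            if len(current) >= min_size:
                runs.append(current)
            current = []
        current.append(s)
    # Close the final run.
    if len(current) >= min_size:
        runs.append(current)
    return runs


# Sample values from the question, in arbitrary (time) order.
sample = [7366391743, 2865549318, 7366391749, 7366391745, 7366391755]
print(find_runs(sample))
```

Unlike a plain difference filter, grouping into runs keeps the first serial number of each cluster and lets you require a minimum cluster size (for example, a dozen units) before flagging anything.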
