How to find low variation in a field which has unique values?

howyagoin
Contributor

Hi,

I've got a field in some sales data which is a serial number. I'd like to look for "clusters" of serial numbers sold in the data to determine if a large number of nearly sequential serial numbers were sold - I'm aware of the "cluster" command, but that's not quite telling me what I am looking for.

Assume that each event logged has a time/date of sale, a product name, maybe the customer's name, and the serial number of the product sold.

During the average day, the serial number of products sold would be pretty distributed across time. What I'm trying to find is if someone came in and bought, say, a dozen hair dryers, all of which happen to have serial numbers more or less near one another in value. All of the serial numbers are simple integers, such as 7366391743, 7366391745, 7366391749, 7366391755, and so on.

I can't rely on a small time window; I typically need to look at the data over a 24-hour period. Still, it seems to me there must be a way to find events where the variation in the serial number values is relatively low: nearly sequential, but not quite. So a value of 2865549318 would not show up alongside the list above, but the other four would.

Thanks for any thoughts on this...

sundareshr
Legend

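Try this. It sorts by serial number first, so that each event is compared with its nearest neighbour in serial order rather than in time order:

index=foo sourcetype=bar earliest=@d | sort 0 serialnum | streamstats window=1 current=f last(serialnum) as prev_sno by productname | where abs(serialnum-prev_sno)<=10 | table _time productname serialnum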

inventsekar
SplunkTrust

Regarding the variations, I haven't been able to find a good solution so far.

Here's one idea, though. (Update: I previously suggested rex, but regex is the right command.)
Maybe we could use regex to match on the first 8 digits of the serial number (73663917 in 7366391743, 7366391745, 7366391749, 7366391755) and list those events. That would give us the approximate time frame and the events.

assuming logs with this format
SerialNumber=7366391743,
SerialNumber=7366391745,
SerialNumber=7366391749,
SerialNumber=7366391755

your base search | regex SerialNumber="^73663917"

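Second thought: you could use the delta command to find the differences between consecutive serial numbers. Sort by serial number first so that neighbouring values are compared, and take the absolute value so the direction of the difference doesn't matter:

your base search | sort 0 SerialNumber | delta SerialNumber as SerialNumberVariation p=1 | where abs(SerialNumberVariation) < 10 | table _time _raw SerialNumberVariation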

I tested this on Windows event log EventCodes and it works fine:

sourcetype="WinEventLog:System" | delta EventCode as EventCodeDifference p=1 | where EventCodeDifference < 5 | table _raw EventCode EventCodeDifference
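
Building on the difference idea, here is a rough sketch of how nearby serial numbers could be grouped into "clusters" and only the larger ones kept. It is untested against real sales data. The field name SerialNumber, the gap threshold of 10, and the minimum cluster size of 4 are all assumptions to adjust; prev_serial, gap, new_cluster and cluster_id are just illustrative names.

```spl
your base search earliest=-24h
| sort 0 SerialNumber
| streamstats current=f window=1 last(SerialNumber) as prev_serial
| eval gap = SerialNumber - prev_serial
| eval new_cluster = if(isnull(prev_serial) OR gap > 10, 1, 0)
| streamstats sum(new_cluster) as cluster_id
| stats count as sold, min(SerialNumber) as first_serial, max(SerialNumber) as last_serial, min(_time) as first_sale, max(_time) as last_sale by cluster_id
| where sold >= 4
```

A new cluster starts whenever the gap to the previous serial number is larger than the threshold, and the running sum of those starts gives each run its own cluster_id. With the example values from the question, 7366391743, 7366391745, 7366391749 and 7366391755 should end up in one cluster, while 2865549318 would sit alone and be filtered out by the final where.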
