Splunk Search

Find rare Transaction flow

indeed_2000
Motivator

Hi

Is there any way to find transaction flows like this?

I have a log file containing 50 million transactions like this:

16:30:53:002 moduleA:[C1]L[143]F[10]ID[123456]
16:30:54:002 moduleA:[C2]L[143]F[20]ID[123456]
16:30:55:002 moduleB:[C5]L[143]F[02]ID[123456]
16:30:56:002 moduleC:[C12]L[143]F[30]ID[123456]
16:30:57:002 moduleD:[C5]L[143]F[7]ID[123456]
16:30:58:002 moduleE:[C1]L[143]F[10]ID[123456]
16:30:59:002 moduleF:[C1]L[143]F[11]ID[123456]
16:30:60:002 moduleZ:[C1]L[143]F[11]ID[123456]

 
I need to find the module flow for each transaction and then find the rare flows.

 

Challenges:

1- There is no specific key-value pair that exists on all lines belonging to a transaction.

2- The only key-value pair that exists on every line is ID[123456].

3- An ID might be duplicated and might belong to several transactions.

4- Module names do not follow a specific pattern, and there are lots of module names.

5- The ID is not at a fixed position (it comes at the end of each line).

Any ideas?

Thanks


PickleRick
SplunkTrust

Ok. So you have some data which might have some format but then again it might not and you want us to find for you something but you don't tell us what it is. How are we supposed to be able to do so if we neither understand the data nor know what you're looking for?


indeed_2000
Motivator

@PickleRick You're right. After several workarounds I finally figured out how to extract the list of modules and solved all the challenges.

Now I have a list of modules like this (grouped by ID):

Txn1
16:30:53:002 moduleA
16:30:54:002 moduleA
16:30:55:002 moduleB
16:30:56:002 moduleC
16:30:57:002 moduleD
16:30:58:002 moduleE
16:30:59:002 moduleF
16:30:60:002 moduleZ
Txn2
16:30:54:002 moduleD
16:30:55:002 moduleE
16:30:56:002 moduleY

 
How can I use Splunk to find patterns (flows) of modules? I want to find the most common patterns and the rare patterns.


PickleRick
SplunkTrust

What do you mean by "patterns"? The answer will greatly depend on how you define it. Depending on your needs, you can just flatten the module list to a string and do a summary of which string occurs most often, or you can try some other techniques, up to and including the MLTK app.

indeed_2000
Motivator

@PickleRick I have lots of microservices that work together.

When a user runs a search in my product, the log shows something like this: the flow of modules that processed the user's request.

e.g. front-end > search > db > report

 


PickleRick
SplunkTrust

The easiest (although I still have no idea if that's what you need) approach will probably be something like this

<your_search>
| stats list(module) as modules by transactionID
| eval modules=mvjoin(modules," ")
| stats count by modules
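
A fuller sketch of the same idea, for the case where the fields are not extracted yet. The two rex patterns are written against the sample lines earlier in this thread, and the field names module and transactionID, the " > " separator, and limit=20 are all assumptions to adapt. The sort 0 _time puts events in chronological order so that list() returns modules in the order they ran, and rare returns the least common flows.

```spl
<your_search>
| rex "^\S+\s+(?<module>[^:\s]+):"
| rex "ID\[(?<transactionID>\d+)\]"
| sort 0 _time
| stats list(module) as modules by transactionID
| eval modules=mvjoin(modules, " > ")
| rare limit=20 modules
```

Note that list() keeps at most 100 values per group by default, so very long transactions may come out truncated.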

indeed_2000
Motivator

@PickleRick 

It works great, thanks. I have another key-value pair on each line, T[001], which means “Type”.

I need to add it to the last line so that it shows in the result, so I tried:

1- Adding it to the last stats, but it returns nothing for T (because T is removed by the first stats).

2- Adding it after “by” in the first stats, which did not work.

3- Using evenstream, but it counts all lines that contain the module, while it should return 1.

<your_search>
| stats list(module) as modules by transactionID
| eval modules=mvjoin(modules," ")
| stats count by modules

Any idea? 


PickleRick
SplunkTrust

Can't help you without knowing what is the relationship between this field and the rest of the data.


indeed_2000
Motivator

@PickleRick

About the “relationship between this field” question:

I have three fields: id, type, and node (or module).

1- id is a unique numeric field.

2- type is the category of each transaction.

3- module is the name of each module that a transaction passes through.

These fields exist on all lines, and transactions are separated by “id”.

Also, each transaction has its own “type”.

Each transaction might span several lines.


Here is the example:

16:30:53:002 moduleA:[C1]L[143]T[10]ID[123456]
16:30:54:002 moduleA:[C2]L[143]T[10]ID[123456]
16:30:59:002 moduleF:[C1]L[143]T[11]ID[123456]
16:30:60:002 moduleZ:[C1]L[143]T[11]ID[123456]

 

16:30:53:002 moduleB:[C1]L[143]T[20]ID[987654]
16:30:54:002 moduleD:[C2]L[143]T[20]ID[987654]
16:30:59:002 moduleE:[C1]L[143]T[21]ID[987654]

 

Expected output:

flow                           Id       T    C
moduleA > moduleF > moduleZ    123456   11   1
moduleB > moduleD > moduleE    987654   21   1

 

FYI: the latest value of T (11 here) is the one that matters to me.

FYI: C is the number of times this flow was detected.

FYI: I need something like an APM tool that draws a transaction trace, but without creating a graph; I just want to find rare transaction patterns or flows.

 

Any idea?

Thanks


PickleRick
SplunkTrust

Ok, from what I understand you need something like

<your_search>
| stats list(module) as modules last(T) as T by transactionID
| eval modules=mvjoin(modules," ")
| stats count by modules T

Depending on your sort order you might want first(T) instead of last(T)
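
The expected output in the question also shows moduleA only once even though it logs two consecutive lines. A possible variation, offered only as a sketch: it assumes module, transactionID and T are already extracted and that _time is parsed correctly, and the names last_T, prev_module, flow, C and Id are made up for illustration. It drops an event when its module equals the previous module of the same transaction, takes the latest T per transaction before any lines are dropped, and sorts so the rarest flows come first.

```spl
<your_search>
| sort 0 transactionID _time
| eventstats latest(T) as last_T by transactionID
| streamstats current=f window=1 global=f last(module) as prev_module by transactionID
| where isnull(prev_module) OR module!=prev_module
| stats list(module) as flow first(last_T) as T by transactionID
| eval flow=mvjoin(flow, " > ")
| stats count as C values(transactionID) as Id by flow T
| sort 0 C
```

When a flow occurs more than once, Id becomes a multivalue field holding every transactionID that followed it.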

indeed_2000
Motivator

@PickleRick Thanks, it works perfectly.

But on some lines, because of poor logging, I can see another transaction with the same transactionID!

“transactionID” is not unique for some transactions, but they can be told apart by “timestamp” and “Type”.

e.g. I can see transactionID 12345 detected at 00:00:01:000,

and a second later another transaction with the same transactionID 12345 is detected at 00:00:02:000.

FYI: there are not many of them, but they affect the result. Is there any way to separate them in Splunk?

Thanks


PickleRick
SplunkTrust

Yeah, you can probably fiddle with grouping by the T value or binning _time to some value. I suppose not all parts of a single transaction will have exactly the same timestamp (they would probably differ by some fraction of a second or even whole seconds), so you'd have to bin _time and then use it for grouping.
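
A minimal sketch of the binning idea, assuming module, transactionID and T are already extracted. The span of 10s is only a placeholder: it has to be longer than one transaction usually takes and shorter than the gap before an ID is reused. Events are sorted by the original _time before binning so that list() still returns modules in order.

```spl
<your_search>
| sort 0 _time
| bin _time span=10s
| stats list(module) as modules last(T) as T by transactionID _time
| eval modules=mvjoin(modules," ")
| stats count by modules T
```

A transaction that straddles a bin boundary will be split in two with this approach, so the rarest flows in the result are worth checking by hand.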
