Solved: Find rare Transaction flow

indeed_2000 · ‎10-30-2023

Hi

Is there any way to find transaction flows like this?

I have a log file containing 50 million transactions like this:

16:30:53:002 moduleA:[C1]L[143]F[10]ID[123456]
16:30:54:002 moduleA:[C2]L[143]F[20]ID[123456]
16:30:55:002 moduleB:[C5]L[143]F[02]ID[123456]
16:30:56:002 moduleC:[C12]L[143]F[30]ID[123456]
16:30:57:002 moduleD:[C5]L[143]F[7]ID[123456]
16:30:58:002 moduleE:[C1]L[143]F[10]ID[123456]
16:30:59:002 moduleF:[C1]L[143]F[11]ID[123456]
16:30:60:002 moduleZ:[C1]L[143]F[11]ID[123456]

I need to find the module flow for each transaction and identify the rare flows.

Challenges:

1- There is no specific “key value” that exists on all lines belonging to a transaction.

2- The only key value that exists on every line is ID[123456].

3- An ID might be duplicated and might belong to several transactions.

4- Module names do not follow a specific naming scheme, and there are lots of module names.

5- The ID is not at a fixed position (it is at the end of each line).

Any idea?

Thanks
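
A possible starting point for the extraction, assuming every event looks like the sample above: a timestamp with no spaces, one space, the module name followed by a colon, and ID[...] somewhere later on the line. The field names module and transactionID are chosen here only to match the replies further down; they are not part of the original data. Using two separate rex calls means the position of ID[...] on the line does not matter.

```
<your_search>
| rex "^\S+\s+(?<module>[^:\s]+):"
| rex "ID\[(?<transactionID>\d+)\]"
| table _time module transactionID
```

Because the module pattern matches any run of characters up to the first colon after the timestamp, it should not need a list of known module names.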

PickleRick · ‎10-30-2023

Ok. So you have some data which might have some format but then again it might not and you want us to find for you something but you don't tell us what it is. How are we supposed to be able to do so if we neither understand the data nor know what you're looking for?

indeed_2000 · ‎10-31-2023

@PickleRick You're right. After several workarounds I finally figured out how to extract the list of modules, and solved all the challenges.

Now I have a list of modules like this (grouped by ID):

Txn1
16:30:53:002 moduleA
16:30:54:002 moduleA
16:30:55:002 moduleB
16:30:56:002 moduleC
16:30:57:002 moduleD
16:30:58:002 moduleE
16:30:59:002 moduleF
16:30:60:002 moduleZ

Txn2
16:30:54:002 moduleD 
16:30:55:002 moduleE
16:30:56:002 moduleY

How can I use Splunk to find patterns (flows) of modules, both the most common patterns and the rare ones?

PickleRick · ‎10-31-2023

What do you mean by "patterns"? The answer will greatly depend on how you define it. Depending on your needs, you can just flatten the module list to a string and summarize which strings occur most often, or you can try some other techniques, up to and including the MLTK app.

indeed_2000 · ‎10-31-2023

@PickleRick I have lots of microservices that work together.

When a user searches in my product, it logs something like this, showing the flow of modules that process the user's request:

e.g. front-end > search > db > report

PickleRick · ‎10-31-2023

The easiest (although I still have no idea if that's what you need) approach will probably be something like this

<your_search>
| stats list(module) as modules by transactionID
| eval modules=mvjoin(modules," ")
| stats count by modules
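
To surface the rare flows from that summary, one option is to sort the counts in ascending order. This is a sketch, assuming the module and transactionID fields are already extracted. The explicit time sort is there because list() keeps values in the order the events arrive, and the " > " separator is only cosmetic.

```
<your_search>
| sort 0 _time
| stats list(module) as modules by transactionID
| eval modules=mvjoin(modules, " > ")
| stats count by modules
| sort 0 count
```

Replacing the last two lines with | rare limit=20 modules would be a shorter alternative. Two caveats: sort 0 over tens of millions of events can be expensive, and stats list() keeps at most 100 values per group by default, so very long transactions would be truncated.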

indeed_2000 · ‎11-01-2023

@PickleRick

It works great, thanks. I have another key value on each line called T[001], which means “Type”.

I need to add it to the last line so that it shows in the result, so I tried:

1- Adding it to the last stats, but it returns nothing for T (because the first stats removes it).

2- Adding it after “by” in the first stats, which does not work.

3- Using evenstream, but it counts all lines that contain the module, while it should return 1.

<your_search>
| stats list(module) as modules by transactionID
| eval modules=mvjoin(modules," ")
| stats count by modules

Any idea?

PickleRick · ‎11-02-2023

Can't help you without knowing what the relationship is between this field and the rest of the data.

indeed_2000 · ‎11-02-2023

@PickleRick

About the “relationship between this field” question:

I have three fields: id, type, and node (or module).

1- id is a unique numeric field.

2- type is the category of each transaction.

3- module is the name of each module that a transaction passes through.

These fields exist on all lines, and transactions are separated by “id”.

Also, each transaction has its own “type”.

Each transaction might span several lines.

Here is the example:

16:30:53:002 moduleA:[C1]L[143]T[10]ID[123456]
16:30:54:002 moduleA:[C2]L[143]T[10]ID[123456]
16:30:59:002 moduleF:[C1]L[143]T[11]ID[123456]
16:30:60:002 moduleZ:[C1]L[143]T[11]ID[123456]

16:30:53:002 moduleB:[C1]L[143]T[20]ID[987654]
16:30:54:002 moduleD:[C2]L[143]T[20]ID[987654]
16:30:59:002 moduleE:[C1]L[143]T[21]ID[987654]

Expected output:

flow                          ID      T   C
moduleA > moduleF > moduleZ   123456  11  1
moduleB > moduleD > moduleE   987654  21  1

FYI: the latest value of T (here 11) is the one that matters to me.

FYI: C is the number of times this flow was detected.

FYI: I need something like an APM tool that draws the trace of a transaction, but without creating a graph; I just want to find rare transaction patterns or flows.

Any idea?

Thanks

PickleRick · ‎11-02-2023

Ok, from what I understand you need something like

<your_search>
| stats list(module) as modules last(T) as T by transactionID
| eval modules=mvjoin(modules," ")
| stats count by modules T

Depending on your sort order you might want first(T) instead of last(T)
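
The expected output earlier in the thread lists each module once per flow (moduleA > moduleF > moduleZ) even though moduleA is logged twice in a row. One way to get that is sketched below. It assumes module, T and transactionID are already extracted and that only consecutive repeats should be collapsed. prev_module is a helper field introduced here for illustration.

```
<your_search>
| sort 0 transactionID _time
| streamstats current=f last(module) as prev_module by transactionID
| where isnull(prev_module) OR module!=prev_module
| stats list(module) as modules last(T) as T by transactionID
| eval modules=mvjoin(modules, " > ")
| stats count by modules T
```

With the events sorted oldest first, last(T) picks the latest T of each transaction. If a module should appear only once regardless of its position in the flow, applying mvdedup(modules) before mvjoin would be an alternative to the streamstats and where lines.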

indeed_2000 · ‎11-03-2023

@PickleRick Thanks, it works perfectly.

But on some lines, because of poor logging, I can see another transaction with the same transactionID!

“transactionID” is not unique for some transactions, but they can be told apart by “timestamp” and “Type”.

e.g. I can see transactionID 12345 detected at 00:00:01:000,

and a second later another transaction with the same transactionID 12345 is detected at 00:00:02:000.

FYI: there are not many of these, but they affect the result. Is there any way to separate them in Splunk?

Thanks

PickleRick · ‎11-03-2023

Yeah, you can probably fiddle with grouping by the T value or binning _time to some value. I suppose not all parts of a single transaction will have exactly the same timestamp (they would probably differ by some fraction of a second or even whole seconds), so you'd have to bin the _time and then use it for grouping.
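
A sketch of a gap-based variant of that idea, which starts a new sub-transaction whenever the pause since the previous event with the same transactionID exceeds a threshold. The 5-second threshold is a placeholder that would have to be tuned against the real data, since the thread does not say how large the gap between two reused IDs is. prev_time, new_txn and txn_seq are helper fields introduced here.

```
<your_search>
| sort 0 transactionID _time
| streamstats current=f last(_time) as prev_time by transactionID
| eval new_txn=if(isnull(prev_time) OR _time-prev_time>5, 1, 0)
| streamstats sum(new_txn) as txn_seq by transactionID
| stats list(module) as modules last(T) as T by transactionID txn_seq
| eval modules=mvjoin(modules, " > ")
| stats count by modules T
```

The simpler binning variant would be | bin _time span=10s followed by "by transactionID _time" in the first stats, at the cost of splitting any transaction that straddles a bucket boundary.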
