Solved: How do i create a 3-event transaction, starting wi...

djklitz · ‎11-18-2021

I have 2 types of events that come in the following, random, format:

AAAAAAABAAAAAABAAAAAAAAABAABAAA

B's never repeat, and they are always surrounded by an A.

The A prior to B has source information, and the A after the B has result/destination information. The information in B has command information. I am trying to compare the values of fields in the A events both before and after the B, as well as correlate with what the B event contains.

Pseudo example:
Event A: mac_address, ip_address, new_session_flag
Event B: mac_address, new_ip_address
Event A: mac_address, ip_address, new_session_flag

I need to know the source IP address (a.ip_address before), the IP address it was attempted to move to (b.ip_address), and the resulting IP address (a.ip_address after) and session flag status (a.new_session_flag after)

How can i do this? i am working w/ millions of records so i cannot use append/join/etc.

Example of a search where the new_session_flag begins with 0 and ends with 1, but I dont want to filter based on new_session_flag. I want all events where B is the 2nd event, regardless of new_session_flag status

(event=A OR event=B)
| transaction mac_address startswith=new_session_flag=0 endswith=new_session_flag=1 maxevents=3 unifyends=true mvlist=true
| fields event mac_address new_ip_address ip_address new_session_flag
| where mvindex(event,1) == "B"

ITWhisperer · ‎11-23-2021

This seems to work too, it effectively creates the reset_after and is giving exactly the same results as my solution without the use of reset_after which, to be fair, can be tricky to get right. I changed random() to row to get the same dummy data on every run

| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("ABC",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",row%10,null())
| eval ip="10.".(row%10).".".(row%10).".".(row%256)
| table event mac ip flag
| streamstats count as row 
| append
    [| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("DEF",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",row%10,null())
| eval ip="10.".(row%10).".".(row%10).".".(row%256)
| table event mac ip flag
| streamstats count as row 
]
| sort 0 row
| fields - row




| eval aip=if(event="A",ip,null())
| eval bip=if(event="B",ip,null())
| streamstats last(aip) as previousaip last(flag) as previousflag last(bip) as previousbip by mac current=f
| streamstats count(eval(event=="B")) as reset_count by mac
| streamstats count as row by mac reset_count
| where row=2 AND isnotnull(previousaip) AND event="A" AND reset_count != 0
| table event mac ip previousaip previousbip previousflag

View solution in original post

djklitz · ‎11-22-2021

This issue seems to be directly related https://community.splunk.com/t5/Splunk-Search/Why-is-streamstats-resetting-incorrectly-twice-in-the-...

ITWhisperer · ‎11-19-2021

Using random data generated by the first part, you could try something like this

| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("ABC",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",random()%10,null())
| eval ip="10.".(random()%10).".".(random()%10).".".(random()%256)
| table event mac ip flag




| eval aip=if(event="A",ip,null())
| streamstats last(aip) as previousaip by mac
| reverse
| streamstats last(aip) as resultantaip last(flag) as resultantflag by mac
| where event="B"
| table event mac ip previousaip resultantaip resultantflag

djklitz · ‎11-19-2021

Thank you ITWhisperer.

This does seem to produce the correct behavior, however it breaks down when i want to do things at scale. If i look at 7 days worth of data it is a 2GB search and looks like it hits data limits somewhere and gives me "partial results"

Is there a way to break this down to be more efficient?

ITWhisperer · ‎11-19-2021

Try it this way without the reverse (assuming that this is what is causing the issue)

| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("ABC",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",random()%10,null())
| eval ip="10.".(random()%10).".".(random()%10).".".(random()%256)
| table event mac ip flag




| eval aip=if(event="A",ip,null())
| eval bip=if(event="B",ip,null())
| streamstats last(aip) as previousaip last(bip) as previousbip by mac
| streamstats count as row by mac reset_after="("match(event,\"B\")")"
| streamstats values(previousaip) as prioraip window=1 current=f by mac
| where row=1 AND isnotnull(prioraip)
| table event mac ip prioraip previousbip flag

djklitz · ‎11-19-2021

I forgot to mention, in your first example, the i had to flip the streamstats order around as it was picking up the previous as the resultant and vice versa.

Anyways, regarding the second example. Something is not adding up. When I use your first example, i get 3000 results in the last 60 minutes. When i use the 2nd example, i only get three total results on the same data. The 3000 is the more accurate result.

ITWhisperer · ‎11-19-2021

Since your events are already in latest first order, try it this way (essentially the same as before except the flag is copied from the A event before the B event, which actually happens after in time)

| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("ABC",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",random()%10,null())
| eval ip="10.".(random()%10).".".(random()%10).".".(random()%256)
| table event mac ip flag




| eval aip=if(event="A",ip,null())
| eval bip=if(event="B",ip,null())
| streamstats last(aip) as previousaip last(flag) as previousflag last(bip) as previousbip by mac
| streamstats count as row by mac reset_after="("match(event,\"B\")")"
| streamstats values(previousaip) as prioraip values(previousflag) as priorflag window=1 current=f by mac
| where row=1 AND isnotnull(prioraip)
| table event mac ip prioraip previousbip priorflag

djklitz · ‎11-19-2021

ITWhisperer. I should also add, and I'm not sure if this affects your search.

The IP's of all A events prior to event B are repeating for each mac address.

Essentially I am tracking IP addresses of a modem (event A). Moving the modem to a new IP address (event B) and checking the resulting IP address after the move (event A that is immediately after event B). If there was a new session that occurred, the session flag will be either a 1 (new session) or 0 (successful move).

I need to know what the IP address was before the move, the IP address it was intended to move to (event B) and i am trying to quantify the number of successful moves versus the ones that required a new session.

djklitz · ‎11-19-2021

I am not getting the desired results. I'm only getting 3-6 results when i should be getting several thousand.

EDIT: If i count the number of B's i have right after the equivalent of this:

| table event mac ip flag

i have 2800 Bs, i would expect a row record for every instance of B

ITWhisperer · ‎11-20-2021

OK It looks like my dummy data was too bland - this caused the windowing to work when it reality it doesn't - I have interleaved another set of dummy results to and reworked without the windowing. Does this work any better?

| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("ABC",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",random()%10,null())
| eval ip="10.".(random()%10).".".(random()%10).".".(random()%256)
| table event mac ip flag
| streamstats count as row 
| append
    [| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("DEF",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",random()%10,null())
| eval ip="10.".(random()%10).".".(random()%10).".".(random()%256)
| table event mac ip flag
| streamstats count as row 
]
| sort 0 row
| fields - row




| eval aip=if(event="A",ip,null())
| eval bip=if(event="B",ip,null())
| streamstats last(aip) as previousaip last(flag) as previousflag last(bip) as previousbip by mac current=f
| streamstats count as row by mac reset_after="("match(event,\"B\")")"
| where row=1 AND isnotnull(previousaip) AND event="A"
| table event mac ip previousaip previousbip previousflag

djklitz · ‎11-22-2021

@ITWhisperer I think I got it. I'm getting exactly as many records as I have # of B events. Can you just confirm that my logic makes sense?

I added 1 more streamstats and changed the one with the reset_after:

| streamstats count(eval(event=="B")) as reset_count by mac
| streamstats count as row by mac reset_count
| where row=2 AND isnotnull(previousaip) AND event="A" AND reset_count != 0

Making the full search, using your dummy data:

| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("ABC",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",random()%10,null())
| eval ip="10.".(random()%10).".".(random()%10).".".(random()%256)
| table event mac ip flag
| streamstats count as row 
| append
    [| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("DEF",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",random()%10,null())
| eval ip="10.".(random()%10).".".(random()%10).".".(random()%256)
| table event mac ip flag
| streamstats count as row 
]
| sort 0 row
| fields - row




| eval aip=if(event="A",ip,null())
| eval bip=if(event="B",ip,null())
| streamstats last(aip) as previousaip last(flag) as previousflag last(bip) as previousbip by mac current=f
| streamstats count(eval(event=="B")) as reset_count by mac
| streamstats count as row by mac reset_count
| where row=2 AND isnotnull(previousaip) AND event="A" AND reset_count != 0
| table event mac ip previousaip previousbip previousflag

ITWhisperer · ‎11-23-2021

This seems to work too, it effectively creates the reset_after and is giving exactly the same results as my solution without the use of reset_after which, to be fair, can be tricky to get right. I changed random() to row to get the same dummy data on every run

| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("ABC",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",row%10,null())
| eval ip="10.".(row%10).".".(row%10).".".(row%256)
| table event mac ip flag
| streamstats count as row 
| append
    [| gentimes start=-1 increment=1h 
| rename starttime as _time 
| streamstats count as row
| eval event=mvindex(split("B",""),(row+2)%5)
| eval mac="mac".mvindex(split("DEF",""),floor((row-1)/10)%3)
| fillnull value="A" event
| eval flag=if(event="A",row%10,null())
| eval ip="10.".(row%10).".".(row%10).".".(row%256)
| table event mac ip flag
| streamstats count as row 
]
| sort 0 row
| fields - row




| eval aip=if(event="A",ip,null())
| eval bip=if(event="B",ip,null())
| streamstats last(aip) as previousaip last(flag) as previousflag last(bip) as previousbip by mac current=f
| streamstats count(eval(event=="B")) as reset_count by mac
| streamstats count as row by mac reset_count
| where row=2 AND isnotnull(previousaip) AND event="A" AND reset_count != 0
| table event mac ip previousaip previousbip previousflag

djklitz · ‎11-29-2021

Thank you @ITWhisperer . i've marked your solution as solved. It is still a very slow query, but I think that's due to the number of records I'm using. No way around that. I'm able to look much farther back than I was with Transaction though (3 weeks versus 3 hours) so thank you for that ! I'm now hitting my 2GB user data limit on the search.

ITWhisperer · ‎11-20-2021

Alternatively, add the global=f option to the windowing streamstats

| eval aip=if(event="A",ip,null())
| eval bip=if(event="B",ip,null())
| streamstats last(aip) as previousaip last(flag) as previousflag last(bip) as previousbip by mac
| streamstats count as row by mac reset_after="("match(event,\"B\")")"
| streamstats values(previousaip) as prioraip values(previousflag) as priorflag window=1 global=f current=f by mac
| where row=1 AND isnotnull(prioraip)
| table event mac ip prioraip previousbip priorflag

djklitz · ‎11-22-2021

I've highlighted in red the issue. The 1 should be a 3.

time	event	mac	ip	flag	aip	bip	previousaip	previousbip	previousflag	row
11/21/21 1:00	A	macD	10.6.4.174	1	10.6.4.174		10.1.2.116		1	2
11/21/21 2:00	B	macA	10.7.9.74			10.7.9.74	10.6.2.170		1	3
11/21/21 2:00	B	macD	10.9.6.21			10.9.6.21	10.6.4.174		1	1
11/21/21 3:00	A	macA	10.3.5.116	1	10.3.5.116		10.6.2.170	10.7.9.74	1	1

If the next row after the first B was an A event, it's count would falsely be 1 for that mac address. That is what I'm seeing in my data. You dont see it in your data because all of the B's are back to back, or surrounded by A events of the same mac address.

This is causing the

| where row=1 AND isnotnull(previousaip) AND event="A"

to return incomplete/incorrect rows where previousbip is empty or referring to an older B event. Is there a way to guarantee the reset_after occurs on the correct sequence? Ie key reset_after on mac address + event B?

djklitz · ‎11-22-2021

Thank you @ITWhisperer Unfortunately, I'm still not getting the desired results. I have been breaking down your search and I am uncertain if it is actually working in the correct manner. I think the reset_after needs to match both the mac address and event B. I am seeing the count reset for macD when it sees the event B for macA. I'm getting character limited for some reason. I'll try to show an example in a 2nd reply

How do i create a 3-event transaction, starting with event A, ending with event A, but always has event B inbetween?

transaction

A Season of Skills: New Splunk Courses to Light Up Your Learning Journey

Announcing the Migration of the Splunk Add-on for Microsoft Azure Inputs to ...

Splunk Observability for AI

Join the Conversation

How do i create a 3-event transaction, starting with event A, ending with event A, but always has event B inbetween?

transaction

A Season of Skills: New Splunk Courses to Light Up Your Learning Journey

Announcing the Migration of the Splunk Add-on for Microsoft Azure Inputs to ...

Splunk Observability for AI