Splunk Search

Count as 1 if two rows have two identical field but in the opposite order

ngerosa
Path Finder

Hello!
I extract everyday from db a list of events that have the following fields:

NODO_A NODO_Z DELTA TIMESTAMP

I want to count as 1 event if there is a couple of NODO_A NODO_Z but in the opposite order with a max difference of 5 minutes in TIMESTAMP field.

Example:

MILAN ROME 14.6 2017-06-28 11:32:02
ROME MILAN 17.2 2017-06-28 11:32:15
TURIN MILAN 11.1 2017-06-21 17:11:19
MILAN TURIN 10.4 2017-06-21 17:12:03

As the result I want:

MILAN ROME 1
MILAN TURIN 1

Can anyone help me?

Thanks!

Tags (1)
0 Karma
1 Solution

somesoni2
Revered Legend

Give this a try
Updated Turn out there were several typo in my original answer. Try this one.

Your current search giving field NODO_A NODO_Z DELTA TIMESTAMP
| eval commonfield=mvsort(split(NODO_A."#".NODO_Z,"#")) | nomv commonfield | eval epochtime=round(strptime(TIMESTAMP,"%Y-%m-%d %H:%M:%S"))
 | sort commonfield epochtime | streamstats current=f window=1 values(epochtime) as prevtime by commonfield
 | eval duration=epochtime-prevtime | where duration<300 
 | stats count by commonfield

This is runanywhere sample search with your sample data.

| gentimes start=-1 | eval temp="MILAN ROME 14.6 2017-06-28 11:32:02#ROME MILAN 17.2 2017-06-28 11:32:15#TURIN MILAN 11.1 2017-06-21 17:11:19#MILAN TURIN 10.4 2017-06-21 17:12:03" | table temp | makemv temp delim="#" | mvexpand temp | rex field=temp "(?<NODO_A>\S+) (?<NODO_Z>\S+) (?<DELTA>\S+) (?<TIMESTAMP>.+)" | fields - temp | rename COMMENT as "This is just to generate sample data" 
| eval commonfield=mvsort(split(NODO_A."#".NODO_Z,"#")) | nomv commonfield | eval epochtime=round(strptime(TIMESTAMP,"%Y-%m-%d %H:%M:%S"))
 | sort commonfield epochtime | streamstats current=f window=1 values(epochtime) as prevtime by commonfield
 | eval duration=epochtime-prevtime | where duration<300 
 | stats count by commonfield

View solution in original post

0 Karma

somesoni2
Revered Legend

Give this a try
Updated Turn out there were several typo in my original answer. Try this one.

Your current search giving field NODO_A NODO_Z DELTA TIMESTAMP
| eval commonfield=mvsort(split(NODO_A."#".NODO_Z,"#")) | nomv commonfield | eval epochtime=round(strptime(TIMESTAMP,"%Y-%m-%d %H:%M:%S"))
 | sort commonfield epochtime | streamstats current=f window=1 values(epochtime) as prevtime by commonfield
 | eval duration=epochtime-prevtime | where duration<300 
 | stats count by commonfield

This is runanywhere sample search with your sample data.

| gentimes start=-1 | eval temp="MILAN ROME 14.6 2017-06-28 11:32:02#ROME MILAN 17.2 2017-06-28 11:32:15#TURIN MILAN 11.1 2017-06-21 17:11:19#MILAN TURIN 10.4 2017-06-21 17:12:03" | table temp | makemv temp delim="#" | mvexpand temp | rex field=temp "(?<NODO_A>\S+) (?<NODO_Z>\S+) (?<DELTA>\S+) (?<TIMESTAMP>.+)" | fields - temp | rename COMMENT as "This is just to generate sample data" 
| eval commonfield=mvsort(split(NODO_A."#".NODO_Z,"#")) | nomv commonfield | eval epochtime=round(strptime(TIMESTAMP,"%Y-%m-%d %H:%M:%S"))
 | sort commonfield epochtime | streamstats current=f window=1 values(epochtime) as prevtime by commonfield
 | eval duration=epochtime-prevtime | where duration<300 
 | stats count by commonfield
0 Karma

ngerosa
Path Finder

Hi somesoni2,
the search retrieve no results.

0 Karma

somesoni2
Revered Legend

Try the updated answer.

0 Karma

ngerosa
Path Finder

It partially works.
The search have basically two problems:
1) If there is a couple of NODO_A NODO_Z that don't have any value in common with other couple, the duration field is null. If in the where clause I include the null value I also include duplicate in commonfield value
2)With where duration<300 I exclude also the couple NODO_A NODO_Z with a difference in timestamp more than 300s

MILAN ROME 14.6 2017-06-28 11:32:02
MILAN TURIN 10.4 2017-06-21 17:12:03
ROME MILAN 17.2 2017-06-28 13:32:15
TURIN MILAN 10.4 2017-06-21 17:18:03

In this case I want
MILAN ROME 2
MILAN TURIN 1

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to January Tech Talks, Office Hours, and Webinars!

What are Community Office Hours? Community Office Hours is an interactive 60-minute Zoom series where ...

[Puzzles] Solve, Learn, Repeat: Reprocessing XML into Fixed-Length Events

This challenge was first posted on Slack #puzzles channelFor a previous puzzle, I needed a set of fixed-length ...

Data Management Digest – December 2025

Welcome to the December edition of Data Management Digest! As we continue our journey of data innovation, the ...