Splunk Search

Rex problem

New Member

Hi, I have the following data set:

(x,y,z could be any number in the following data sets)

(All IPs are in the IPField, separated by spaces)

NameField IPField

A 10.x.x.x 10.y.y.y

B 10.x.x.x 10.y.y.y 10.z.z.z 

C 10.x.x.x 10.y.y.y 12.z.z.z 13.x.x.x

D 166.z.z.z 166.x.x.x

E 20.y.y.y 35.x.x.x

Only C and E in the list above have IPs whose first numbers (first octets) differ within the same row. I want to extract C and E from this example.

With that said, the desired output is:

C 10.x.x.x 10.y.y.y 12.z.z.z 13.x.x.x

E 20.y.y.y 35.x.x.x

How do I do that? Thanks!

// UPDATED

Thanks for answering! Sorry for any confusion.

To make it more clear:

There are several events (results).

Each event has 2 fields: NameField and IPField

NameField has a single value: A or B or C....

Each NameField is linked to an IPField.

An IPField has multiple IP values, separated by spaces. Here are 3 examples:

IPField 1: 10.1.1.1 10.2.2.2 10.3.3.3

IPField 2: 10.1.1.1 20.1.1.1 30.1.1.1 10.2.2.2

IPField 3: 50.1.1.1 50.2.2.2 

From the 3 examples above, if we check the first octet of every IP address in every IPField, we'll get

IPField 1: 10 10 10

IPField 2: 10 20 30 10

IPField 3: 50 50

I want to find events similar to IPField 2: those whose IPs do not all share the same first octet. Thanks!
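
To pin the requirement down outside Splunk, here is a minimal Python sketch of the check described above. It is only a model of the logic, not a Splunk search, and the function names `first_octets` and `has_mixed_first_octets` are made up for illustration.

```python
def first_octets(ip_field: str) -> list[str]:
    """Return the first octet of each space-separated IP in ip_field."""
    return [ip.split(".", 1)[0] for ip in ip_field.split()]


def has_mixed_first_octets(ip_field: str) -> bool:
    """True when the IPs in ip_field do not all share one first octet."""
    return len(set(first_octets(ip_field))) > 1


# The three example IPField values from the question.
examples = {
    "IPField 1": "10.1.1.1 10.2.2.2 10.3.3.3",
    "IPField 2": "10.1.1.1 20.1.1.1 30.1.1.1 10.2.2.2",
    "IPField 3": "50.1.1.1 50.2.2.2 ",
}
wanted = [name for name, ips in examples.items() if has_mixed_first_octets(ips)]
print(wanted)  # ['IPField 2']
```

Only IPField 2 is kept, because its first octets (10 20 30 10) contain more than one distinct value.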


SplunkTrust

Hehe. Can you add more clarity to the question? Gerald and I managed to read it in very different ways.


Splunk Employee

Eh, you can also do this with a plain old regex. Note that Splunk's regex (PCRE) isn't limited to mathematical regular expressions, so backreferences are available and you can simply say:

... | regex _raw="^\S+\s+(\d++)\.\d+\.\d+\.\d+.*?\g{1}\.\d+\.\d+\.\d+"

or:

... | where match(_raw,"^\S+\s+(\d++)\.\d+\.\d+\.\d+.*?\g{1}\.\d+\.\d+\.\d+")

where \g{1} matches whatever the first capture group matched.


Oops. Re-read the question. You want rows where a later IP has a different first octet from the first IP, so the regex is a little more complicated than that:

... | where match(_raw,"^\S+\s+(\d++)\.\d+\.\d+\.\d+.*?\s(?!\g{1}\.)\d+\.\d+\.\d+\.\d+")
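
To see how the negative lookahead behaves, here is a small Python sketch of the same idea. It is an approximation rather than the Splunk search itself: Python's `re` module writes the backreference as `\1` instead of `\g{1}`, a plain `\d+` stands in for the possessive `\d++`, and the sample rows use made-up numbers in place of x, y and z.

```python
import re

MIXED_FIRST_OCTET = re.compile(
    r"^\S+\s+(\d+)\.\d+\.\d+\.\d+"      # name, then first IP; group 1 = its first octet
    r".*?\s(?!\1\.)\d+\.\d+\.\d+\.\d+"  # a later IP that does not start with group 1
)

# Hypothetical raw events: name followed by space-separated IPs.
rows = [
    "A 10.1.1.1 10.2.2.2",
    "B 10.1.1.1 10.2.2.2 10.3.3.3",
    "C 10.1.1.1 10.2.2.2 12.3.3.3 13.1.1.1",
    "D 166.3.3.3 166.1.1.1",
    "E 20.2.2.2 35.1.1.1",
]
matched = [row.split()[0] for row in rows if MIXED_FIRST_OCTET.search(row)]
print(matched)  # ['C', 'E']
```

The literal `\.` inside the lookahead matters: without it, a later IP such as 100.1.1.1 would be treated as starting with the same octet as 10.1.1.1.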

SplunkTrust

This is an interesting search language question, but I don't think it has anything to do with the rex command. Let me see if I can restate your question to make sure I have it right.

You need to find all the values of NameField for which there is an IPField value that does not occur against any other NameField value. Stated backwards, find the IPField values that are only present in a single NameField value across the set, and you want all those NameField values.

To do this, you would run a subsearch to find all those NameField values -- the ones where there's at least one IPField value that is never associated with any other NameField. The subsearch will then return out into the 'main search' a sort of dynamic search term that will look like ( NameField="C" OR NameField="E" ).

The subsearch syntax uses square brackets, and although normally the subsearch yields its results out into a 'main search' that has some other terms, in your particular case there are no other terms out there.

I assume also that the IPField is NOT already a multivalue field, but rather that it is an ordinary single-value field that has literal space characters in it. If it is already a multivalue field then you should take out the 'makemv' command below.

[ <your search> | makemv delim=" " IPField | mvexpand IPField | stats dc(NameField) as nameCount values(NameField) as NameField by IPField | where nameCount=1 | fields NameField | mvexpand NameField ]

This is pretty complex, so let me talk through it a bit in English.

makemv delim=" " IPField

Take the big "10.x.x.x 10.y.y.y" field, and turn it into a multivalued field by splitting on space.

mvexpand IPField

For each of the N values of the multivalued field IPField, replace the entire result row with N result rows, where the other values are cloned, and the IPField value is now that one single value.

stats dc(NameField) as nameCount values(NameField) as NameField by IPField

For each IPField value, count the distinct names associated with it as 'nameCount', and keep all the values of NameField as a multivalued field called NameField.

where nameCount=1

Now throw away any rows where nameCount is greater than 1, that is, where the IP is associated with more than one distinct NameField value.

fields NameField

Throw away all of the fields except for NameField.

mvexpand NameField

For each result row, take the multivalued field NameField, and take the N values and replace the entire row with N rows, each with just one of the values.

So when this subsearch comes out of the square brackets, it'll get turned into

NameField=C OR NameField=E
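
The pipeline above can be modelled outside Splunk with a short Python sketch. This is only an illustration of the steps under this answer's reading of the question (an IP that belongs to exactly one name). The events are hypothetical, and the final string is only roughly what a subsearch hands back, not Splunk's exact formatting.

```python
from collections import defaultdict

# Hypothetical events: (NameField, IPField) with space-separated IPs.
events = [
    ("A", "10.1.1.1 10.2.2.2"),
    ("B", "10.1.1.1 10.2.2.2"),
    ("C", "10.1.1.1 12.3.3.3"),
    ("E", "10.2.2.2 35.1.1.1"),
]

# makemv + mvexpand: one (name, ip) row per IP value.
expanded = [(name, ip) for name, ip_field in events for ip in ip_field.split()]

# stats dc(NameField) as nameCount values(NameField) as NameField by IPField
names_by_ip = defaultdict(set)
for name, ip in expanded:
    names_by_ip[ip].add(name)

# where nameCount=1, then fields NameField and mvexpand NameField
unique_names = sorted(
    {name for names in names_by_ip.values() if len(names) == 1 for name in names}
)

# Roughly the search term the subsearch returns to the outer search.
search_term = " OR ".join(f'NameField="{name}"' for name in unique_names)
print(search_term)  # NameField="C" OR NameField="E"
```

In this made-up data, 12.3.3.3 appears only under C and 35.1.1.1 only under E, so those two names survive the nameCount=1 filter.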


SplunkTrust

No, I just completely misinterpreted what he was asking and answered a much harder question. 😃 Thanks for catching.


Splunk Employee

A less complicated solution can be done purely with regexes; see the other answer.
