Splunk Search

Regex for multi-value field in which some values are listed and then aren't

jwalzerpitt
Motivator

I am trying to create a regex for a multivalue field (Message) in which some values are listed and sometimes aren't listed depending on the event. We are ingesting Shibboleth logs via _json format, and I am trying to extract three values from the Message field: URL, username, and src_ip (in bold in each event).

There are three different events for Shibboleth.
alt text

Is it possible to create a regex that would apply to all three events?

I have one regex that covers the first event and extracts the three fields.

alt text

Thx

0 Karma

inventsekar
Super Champion

maybe, the user name and src_ip are looking good. maybe, club these two in a single rex and use a separate rex for URL.

(photo is fine for reading) maybe, Can you please copy the logs and your rex as a text, so that we test it.

0 Karma

jwalzerpitt
Motivator

I've been struggling with the frustrating code tag markdown as I selected the code button, which adds the tickmarks to the beginning and end of the code, but the page still yells at me when I go to post it

0 Karma

inventsekar
Super Champion

posting it on the comment would be difficult. maybe, please post it as a separate answer or edit your question and add the text please.

0 Karma

jwalzerpitt
Motivator

I believe I figured it out - I had to create three separate regexes, one for each field, and when evaluating, I did not see any Non-Matches for each regex. Regexes are as follows:

^(?:[^|\n]|){13}(?P[^|]+)
^(?:[^|\n]
|){3}(?P[^|]+)
^(?:[^|\n]*|){8}(?P[^|]+)

0 Karma

sundareshr
Legend

It would be safer to create three separate regexes. That way extraction is not affected by minor changes to the log format. I would suggest, in the field extraction UI, create three separate field extraction rules.

*UPDATED*

props.conf
[stanza_name]
REPORT-extract_mv_fields: extract_url extract_src_ip extract_user

transforms.conf
[extract_url]
REGEX=(?<url>http[^\|]+)
MV_ADD=true

[extract_src_ip]
REGEX=(?<url>\d+\.\d+\.\d+\.\d+)
MV_ADD=true

[extract_user]
REGEX=\|{4}(?<user>\w+)\|{5}"
MV_ADD=true
0 Karma

jwalzerpitt
Motivator

I tried to do three separate regexes, but Splunk yelled when I tried to reuse the field extracted names (url, username, src_ip) for the second regex (logout with username).

Thx

0 Karma

sundareshr
Legend

Try the updated ans

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!