Need to understand Regular Expression

abilann · ‎04-07-2020

Team,

Can anyone please help me to understand the below regular expression used in field extraction?

(?i)CPU_COUNT\s+(?P[^ \n]*)?

Thanks,
Abilan

MuS · ‎04-08-2020

Hi abilann,

The regex is looking for a case insensitive match for CPU_COUNT followed by one or more whitespace and puts the following characters that are not a new line in a field called cpu_cores(in a greedy mode).

This is a literal translation of the regex.

Hope this helps ...

cheers, MuS

MuS · ‎04-08-2020

Hi abilann,

Can you also please post some sample events, because with just the regex it hard to answer.
Also, this posted regex is not correct because you have an incomplete group structure and the last ? does not have a preceding token.

cheers, MuS

abilann · ‎04-08-2020

Hi ,

Actually this is the default field extract (hardware : EXTRACT-cpu_cores) used in "Splunk App for AWS". Am trying to understand how they are extracting CPU_Cores from the events. Because I could not find any keyword like "CPU" in the events.

Thanks,
Abilan

gcusello · ‎04-07-2020

Hi @abilann,
your regex isn't readable, please use Code Sample button (the one with 101010) to display your regex.

In addition, I suggest to put your regex and a sample of your logs in regex101.com site, you can test your regex and there's (on the right side) a description of the regex.

Ciao.
Giuseppe

abilann · ‎04-07-2020

(?i)CPU_COUNT\s+(?P<cpu_cores>[^ \n]*)?

gcusello · ‎04-08-2020

Hi @abilann,
This is the explanation of your regex by regex101.com:

(?i) match the remainder of the pattern with the following effective flags: gmi
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
CPU_COUNT matches the characters CPU_COUNT literally (case insensitive)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Named Capture Group cpu_cores (?P<cpu_cores>[^ \n]*)?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
Match a single character not present in the list below [^ \n]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  matches the character   literally (case insensitive)
\n matches a line-feed (newline) character (ASCII 10)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

That you can see by yourself at https://regex101.com/r/xfUL8y/1 .

Ciao.
Giuseppe

Need to understand Regular Expression

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Casting Call: Compete in Cyber Games

Announcing Modern Navigation: A New Era of Splunk User Experience

How Edge Processor's Durable Queue Works

Join the Conversation

Need to understand Regular Expression

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Casting Call: Compete in Cyber Games

Announcing Modern Navigation: A New Era of Splunk User Experience

How Edge Processor's Durable Queue Works