like in the subject, i am looking at events with different fields and delimiters
i want to say if the event contains thisword then rex blah blah blah elseif the event contains thisotherword then rex blah blah blah
i suspect this is simple but thought to ask
Just include the word you are testing for (call it X, or Y for the other case) in the rex pattern, with the correct relationship to the anchors for your field extraction
| rex "X.*anchor1(?<field1>pattern1)"
| rex "Y.*anchor2(?<field2>pattern2)"
ok i get this, but little experience with rex and especially anchors
is the anchor the word i am looking to match?
No, the anchor is the pattern for the text that you expect to appear before and/or after the field you want to extract. For example, if your event contains "Event of type X with user id: abc123" and you wanted to extract the user id, your regex might be something like "X.* user id: (?<userid>\w+)". The "user id: " part would be the anchor for the field you are going to extract. You could also argue that the "X" is an anchor as well, since it ensures that the pattern will only match if the field being extracted from contains "X".
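To see the anchor idea in action, here is a small sketch you could paste into a search bar. The makeresults and eval lines only fabricate one test event (using the example text above), and the table command is just there for display; neither is part of the extraction itself.

```spl
| makeresults
| eval _raw="Event of type X with user id: abc123"
| rex "X.* user id: (?<userid>\w+)"
| table _raw userid
```

With that test event, userid should come out as abc123. If you change the X in the event text to something else, the rex no longer matches and userid stays empty, which is how the leading "X" limits the extraction to the events you care about.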
what i want to say is:
if _raw contains the word "Dog" then rex "(?<field1>([^\s]+))\s(?<field2>([^\s]+))\s(?<field3>([^\s]+))\s"
if _raw contains the word "Cat" then rex "(?<field1>([^\|]+))\|(?<field2>([^\|]+))\|(?<field3>([^\|]+))\|"
because if the line contains Dog, fields are delimited by spaces, but if it contains Cat, fields are delimited by the pipe symbol. I want the same field names, I just need to use a different rex based on the delimiter. I can't formulate one rex that handles both delimiters
Streaming languages generally do not use command branching. However, SPL has plenty of instruments to obtain the result you want. So, let me rephrase your requirement.
What I want to extract from events is a vector of three components: field1, field2, field3. The method of extraction depends on whether the event contains Dog or Cat.
To illustrate, given this dataset

_raw
a b c |i|j|k| Dog woofs
l m n |x|y|z| Cat meows
e f g |o|p|q| What does fox say?

(This is based on reverse engineering your regex. As I do not know your real data, I have to make the format more rigid to make the illustration simpler.)

I want the following results

_raw                                field1  field2  field3
a b c |i|j|k| Dog woofs             a       b       c
l m n |x|y|z| Cat meows             x       y       z
e f g |o|p|q| What does fox say?
Let me demonstrate a conventional method to achieve this in SPL.
| rex "(?<field1_dog>\S+)\s(?<field2_dog>\S+)\s(?<field3_dog>\S+)\s"
| rex "\|(?<field1_cat>[^\|]+)\|(?<field2_cat>[^|]+)\|(?<field3_cat>[^|]+)\|"
| foreach field1 field2 field3
[eval <<FIELD>> = case(searchmatch("Dog"), <<FIELD>>_dog, searchmatch("Cat"), <<FIELD>>_cat)]
| fields - *_dog *_cat
As you can see, the idea is to apply both regexes, then use the case function to selectively populate the final vector. This idea can be implemented in many ways.
Here is the emulation that generates my mock data. Play with it and compare with real data.
| makeresults format=csv data="_raw
a b c |i|j|k| Dog woofs
l m n |x|y|z| Cat meows
e f g |o|p|q| What does fox say?"
In many traditional languages, the requirement can also be expressed as conditional evaluation. While this is less conventional, you can also do this in SPL, usually with more cumbersome code.
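One possible sketch of that conditional style, for comparison. It copies _raw into a helper field only when the condition holds, then points each rex at its helper field, so each pattern only ever sees the events it is meant for. The helper names dog_text and cat_text are made up for this illustration. It also relies on my understanding that rex leaves existing field values alone when its pattern does not match, so please verify that against your own data. An event that contains both words would end up with the Cat values.

```spl
| makeresults format=csv data="_raw
a b c |i|j|k| Dog woofs
l m n |x|y|z| Cat meows
e f g |o|p|q| What does fox say?"
| eval dog_text = if(searchmatch("Dog"), _raw, null())
| eval cat_text = if(searchmatch("Cat"), _raw, null())
| rex field=dog_text "(?<field1>\S+)\s(?<field2>\S+)\s(?<field3>\S+)\s"
| rex field=cat_text "\|(?<field1>[^|]+)\|(?<field2>[^|]+)\|(?<field3>[^|]+)\|"
| fields - dog_text cat_text
```

On the mock data this should give a, b, c for the Dog event, x, y, z for the Cat event, and nothing for the fox event. It needs two extra eval commands and two throwaway fields for what is logically a single if/else, which is the cumbersome part.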
Hi @darkins ,
It is actually simple, as long as you are comfortable with regex syntax.
It will be like this:
| eval condition=case(match(_raw, "thisword"), "first_condition", match(_raw, "thisotherword"), "second_condition", 1=1,"default_condition")
| rex field=_raw "<rex_pattern>" if condition=="first_condition"
| rex field=_raw "<rex_pattern>" if condition=="second_condition"
| rex field=_raw "<rex_pattern>" if condition=="default_condition"
Give it a try and let me know how it goes.
@victor_menezes Which version of Splunk are you using that supports this syntax of rex?
yeah i am getting syntax error Invalid Argument on rex