Re: Regex from source

thinksplunk · ‎09-23-2013

If I need to extract "num" from source=c:/documents/app/test1/test12/controlnum34/12.log and tag it as a field, how do I go about doing it? Thanks.

thinksplunk · ‎09-30-2013

If I need to extract two fields from the string below, how do I do it?
"source=/app/cups-drink/test/iron13-machine5a-43machine.log"
The first field name is "item" and value is "cups"
The second field name is "system" and value is "43machine"

dwaddle · ‎09-26-2013

This really isn't an answer, but more of a comment that applies to all of these great solutions. An approach using the rex command will work great. But, if you try to put this into a configuration file as a permanent field extraction ( props.conf or transforms.conf ) and want to use it in a base search, you will probably not get the result you're looking for. The reason for this is when you do a search for something like

sourcetype=mysourcetype myfieldfromsource=123

Splunk will look for the token "123" within the raw text of the event; it will not look in the source field.

If you want to extract a regular expression from source and have it searchable as a field name in a base search, then you will need to make it an indexed field. Indexed fields are not recommended for a variety of very good reasons, not the least of which is that they must be defined in advance and are very inflexible. But if this is what you need to solve your problem, it is available to you.

lukejadamec · ‎09-26-2013

If regex was that easy, then I would have answered.:)

kristian_kolb · ‎09-26-2013

... | rex field=source "^/[^/]+/(?<animal>[a-zA-Z]+)"

Which means, from the start of the string in the field called source, find a single slash, followed by one or more non-slash characters, followed by a single slash - then take all (but at least one) uppercase or lowercase letters you find, and put them in the field 'animal'.

As you'll find, the field will only contain 'dog' in this scenario, as the dash between 'dog' and 'focus' is not a letter.
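For anyone who wants to try this outside Splunk, here is a minimal Python sketch of the same idea. The sample path is hypothetical (the original path is not quoted in this reply), and Python's re module writes named groups as (?P<name>...) where rex uses (?<name>...).

```python
import re

# Hypothetical sample path, chosen only to illustrate the 'dog-focus' case.
source = "/pets/dog-focus/app.log"

# Start of string, slash, first path segment, slash, then letters only.
pattern = re.compile(r"^/[^/]+/(?P<animal>[a-zA-Z]+)")

match = pattern.search(source)
animal = match.group("animal") if match else None
print(animal)  # the capture stops at the dash, since '-' is not a letter
```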

You can probably benefit from reading up on regular expressions if you want to make more dynamic extractions.

/K

lukejadamec · ‎09-23-2013

How about:

Search | rex field=source "control(?<NUM>num)34/12\.log$"

kristian_kolb · ‎09-24-2013

faster 🙂

... | eval num="num" | ...

thinksplunk · ‎09-23-2013

I am trying to extract the word "NUM" from source=c:/documents/app/test1/test12/controlNUM34/12.log.

kristian_kolb · ‎09-23-2013

You can do field extractions dynamically in the search with the rex command;

your_base_search | rex field=source "your regex with a capture group here"

to capture "34" and put it in a field called num;

your_base_search | rex field=source "(?<num>\d+)/[^/]+$"

which is to be read as, capture one or more digits (and call them num) that are followed by one slash, which is followed by one or more non-slash characters, followed by the end-of-line.
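A quick way to sanity-check the pattern before putting it into a search is to run it against the sample path in Python. This is only a sketch: Python's re module is not the engine Splunk uses, and it needs (?P<num>...) instead of (?<num>...), but for a pattern this simple the behaviour should be the same.

```python
import re

# Sample source path from the question.
source = "c:/documents/app/test1/test12/controlnum34/12.log"

# Digits, then one slash, then non-slash characters up to the end of the string.
pattern = re.compile(r"(?P<num>\d+)/[^/]+$")

match = pattern.search(source)
num = match.group("num") if match else None
print(num)
```

The digits in test1 and test12 are skipped because more slashes follow them, so only the digits in the last directory name can match.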

Once you're happy with your regex field extraction, you should probably make it 'permanent' by adding the extraction rule to props.conf as an EXTRACT.

See more here:

http://docs.splunk.com/Documentation/Splunk/5.0.4/Knowledge/Addfieldsatsearchtime
http://docs.splunk.com/Documentation/Splunk/5.0.4/Knowledge/Createandmaintainsearch-timefieldextract...
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Rex

/K

kristian_kolb · ‎10-01-2013

Given your question here and in other posts, I suggest that you read up on regex in general.

e.g. http://www.regular-expressions.info
http://gskinner.com/RegExr/

In this case (one of) the answer(s) is;

rex field=source "/app/(?<item>[a-z]+)([^/]+/){2}.+-(?<system>[^-]+)\.log$"

Which is; find '/app/', then take any a-z characters and call them item. Then jump over any non-slash characters followed by a slash, twice. Then skip through any characters up to the last dash, and take the non-dash characters that follow it, up to .log at the end of the string. Call these non-dash characters system. The literal dash before the second capture group matters: without it, the greedy .+ swallows everything except the final character before .log.
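Here is a minimal Python sketch for checking an item/system extraction against the sample path from the question. It assumes the system value is whatever follows the last dash in the file name, so the pattern has a literal dash in front of the second group. Named groups are written (?P<name>...) in Python.

```python
import re

# Sample source path from the question.
source = "/app/cups-drink/test/iron13-machine5a-43machine.log"

# item: letters right after /app/; system: text between the last dash and .log
pattern = re.compile(
    r"/app/(?P<item>[a-z]+)([^/]+/){2}.+-(?P<system>[^-]+)\.log$"
)

match = pattern.search(source)
fields = match.groupdict() if match else {}
print(fields)
```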

/K

kristian_kolb · ‎09-24-2013

I'm guessing that you want to extract XXX in the following scenario, where XXX is a string that follows 'control' and 'yy' is one or more digits. Not the literal string 'num', right?

/controlXXXyy/zzz.log

In that case;

rex field=source "/control(?<XXX>[a-zA-Z]+)\d+/[^/]+$"
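To see what this captures, here is a small Python sketch run against both spellings of the path that appear in this thread. The helper name extract is made up for the example, and the group syntax is Python's (?P<name>...).

```python
import re

# Letters after '/control', followed by digits, a slash, and the file name.
pattern = re.compile(r"/control(?P<XXX>[a-zA-Z]+)\d+/[^/]+$")

def extract(source):
    """Return the letters between 'control' and the digits, or None."""
    match = pattern.search(source)
    return match.group("XXX") if match else None

sources = [
    "c:/documents/app/test1/test12/controlnum34/12.log",
    "c:/documents/app/test1/test12/controlNUM34/12.log",
]
values = [extract(s) for s in sources]
print(values)
```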

rturk · ‎09-23-2013

Hi Thinksplunk - can you give a few more samples? Are you trying to extract:

source=c:/documents/app/test1/test12/control*num*34/12.log

or:

source=c:/documents/app/test1/test12/controlnum*34*/12.log?
