Hi All,
I need a regex that extracts specific key-value pairs from proxy events. There are different types of events with similar KVs, and I am looking for a single rex that works on every event and extracts the following:
| rex mode=sed "s/(?<cip>c-ip=\S+)\s.*(?<csbytes>cs-bytes=\S+)\s.*(?<cscategories>cs-categories=\S+)\s.*(?<cshost>cs-host=\S+)\s.*(?<csip>cs-ip=\S+)\s.*(?<csmethod>cs-method=\S+)\s.*(?<csuriport>cs-uri-port=\S+)\s.*(?<csurischeme>cs-uri-scheme=\S+)\s.*(?<csusername>cs-username=\S+)\s.*(?<action>s-action=\S+)\s.*(?<sip>s-ip=\S+)\s.*(?<scbytes>sc-bytes=\S+)\s.*(?<status>sc-status=\S+)\s.*(?<timetaken>time-taken=\S+)\s.*(?<url>c-url=\S+)\s.*(?<csreferer>cs-Referer=\S+)\s.*(?<rip>r-ip=\S+)\s.*(?<ssourceport>s-source-port=\S+)\s.*/\1 \2 \3 \4 \5 \6 \7 \8 \9 \10 \11 \12 \13 \14 \15 \16 \17 \18/g"
Events with Action=Allowed work to some extent, but as soon as any of the fields is missing, for example cs-Referer, the event does not get stripped as expected and the regex is ignored.
Any help is much appreciated.
Thank you!
Try this:
| rex mode=sed "s/(?<cip>c-ip=\S+).*(?<csbytes>cs-bytes=\S+).*(?<cscategories>cs-categories=\S+).*(?<cshost>cs-host=\S+).*(?<csip>cs-ip=\S+).*(?<csmethod>cs-method=\S+).*(?<csuriport>cs-uri-port=\S+).*(?<csurischeme>cs-uri-scheme=\S+).*(?<csusername>cs-username=\S+).*(?<action>s-action=\S+).*(?<sip>s-ip=\S+).*(?<scbytes>sc-bytes=\S+).*(?<status>sc-status=\S+).*(?<timetaken>time-taken=\S+).*(?<url>c-url=\S+).*(?<csreferer>cs-Referer=\S+|).*(?<rip>r-ip=\S+).*(?<ssourceport>s-source-port=\S+).*/\1 \2 \3 \4 \5 \6 \7 \8 \9 \10 \11 \12 \13 \14 \15 \16 \17 \18/g"
Because you are using \S+, you don't need the \s anchors as well as the .* between groups. Apart from removing the \s, you can add | to the match strings you want to be optional. (If you didn't remove the \s, your anchor could have ended up requiring two spaces instead of one.)
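One caveat worth checking before relying on the empty alternative: with a greedy .* in front of it, the optional group may end up capturing nothing even when cs-Referer is present. The following is a minimal Python sketch of that behaviour, not SPL. It assumes Python's re module backtracks like the PCRE engine for these constructs, uses three keys instead of eighteen, and the sample events and the names STRICT, GREEDY, LAZY and strip_event are made up for illustration.

```python
import re

# Hypothetical, shortened events; real proxy events carry many more pairs.
WITH_REFERER = "c-url=http://a/ x=1 cs-Referer=http://b/ y=2 r-ip=10.0.0.1 z=3"
NO_REFERER = "c-url=http://a/ x=1 y=2 r-ip=10.0.0.1 z=3"

# Original style: every pair is mandatory.
STRICT = re.compile(r"(c-url=\S+)\s.*(cs-Referer=\S+)\s.*(r-ip=\S+)\s.*")

# Style from the reply: greedy .* plus an empty alternative in the group.
GREEDY = re.compile(r"(c-url=\S+).*(cs-Referer=\S+|).*(r-ip=\S+).*")

# Lazy variant: the optional group is tried at each position before moving on.
LAZY = re.compile(r"(c-url=\S+).*?(?:(cs-Referer=\S+).*?)?(r-ip=\S+).*")


def strip_event(pattern, event):
    """Keep only the captured pairs; return the event unchanged on no match."""
    return pattern.sub(
        lambda m: " ".join(g for g in m.groups() if g), event, count=1
    )


print(strip_event(STRICT, NO_REFERER))   # unchanged: the pattern never matches
print(strip_event(GREEDY, WITH_REFERER)) # c-url=http://a/ r-ip=10.0.0.1
print(strip_event(LAZY, WITH_REFERER))   # c-url=http://a/ cs-Referer=http://b/ r-ip=10.0.0.1
print(strip_event(LAZY, NO_REFERER))     # c-url=http://a/ r-ip=10.0.0.1
```

In this sketch the greedy form drops the Referer because .* first runs to the end of the event and only backs up as far as r-ip, where the empty alternative already satisfies the group. Whether the lazy form carries over unchanged to rex mode=sed or SEDCMD should be verified against real events, and a plain \1 \2 \3 replacement would leave a double space where an optional group is empty.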
Hi @DanAlexander,
could you share some samples of your logs? It's difficult to check your regex without them.
Anyway, you don't need "mode=sed" for field extraction.
In addition, are you sure that the structure of your logs is always the same?
Because if something is different, you lose all the field extractions, so it could be better to use a separate rex command for each field. That way you can see which regex isn't correct.
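To illustrate the one-pattern-per-field idea outside Splunk, here is a small Python sketch. The key list is an assumed subset of the keys in the question, and extract_pairs is a hypothetical helper, not a Splunk function. Each key is searched for independently, so a missing pair or a different order does not affect the others.

```python
import re

# Assumed subset of the keys from the question, in the desired output order.
WANTED = ["c-ip", "cs-ip", "s-ip", "cs-Referer", "r-ip"]


def extract_pairs(event, keys=WANTED):
    """Return {key: value} for every wanted key present in the event."""
    found = {}
    for key in keys:
        # (?<!\S) requires start of string or whitespace before the key,
        # so "s-ip" cannot match inside "cs-ip".
        m = re.search(r"(?<!\S)" + re.escape(key) + r"=(\S+)", event)
        if m:
            found[key] = m.group(1)
    return found


# Hypothetical event without cs-Referer and with the pairs out of order.
event = "cs-ip=10.2.2.2 c-ip=10.1.1.1 r-ip=10.4.4.4 s-ip=10.3.3.3"
print(extract_pairs(event))
```

The boundary check matters once the patterns are split up: an unanchored s-ip=\S+ would also match the tail of cs-ip=10.2.2.2, which the single long regex only avoids through the fixed order of its groups.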
Ciao.
Giuseppe
Thanks for the reply @gcusello,
The "Allow" proxy events have a similar structure. However, a small difference, like the cs-Referer pair missing, renders the regex useless. I cannot imagine having to build a separate regex for every event that differs slightly.
Looking for a unified regex that would work and strip pairs as defined in the regex for each event.
Regards,
Dan
Hi @DanAlexander ,
Regexes depend on the logs you have, and they must fit all of them, or at least most of them.
If you share a sample of your logs, I could try to analyze your regex.
Ciao.
Giuseppe
Hi
If you are just extracting KV-paired data, can you use the extract / kv command for that?
If you need to do it with rex, then with your current approach the pairs must be in the same order in every event. You could add ? after a group to mark it as optional if needed, but the regex will then be quite hard to read. Maybe a better option is to use several rex rows and always extract only one value per rex?
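As a rough picture of the generic key-value route, the sketch below parses every key=value token first and then keeps only the wanted keys. It is plain Python rather than the extract / kv command, and it assumes pairs are separated by whitespace and that values contain no spaces; KEEP, parse_kv and keep_selected are hypothetical names.

```python
import re

# Assumed subset of the keys to keep, in the desired output order.
KEEP = ("c-ip", "cs-host", "cs-Referer", "s-action")


def parse_kv(event):
    """Turn each whitespace-separated key=value token into a dict entry."""
    return dict(re.findall(r"(?<!\S)([^\s=]+)=(\S*)", event))


def keep_selected(event, keep=KEEP):
    """Rebuild the event from the wanted pairs only, skipping missing keys."""
    pairs = parse_kv(event)
    return " ".join(f"{k}={pairs[k]}" for k in keep if k in pairs)


# Hypothetical event: no cs-Referer, one unwanted pair, arbitrary order.
event = "s-action=TCP_HIT c-ip=10.1.1.1 extra=1 cs-host=example.com"
print(keep_selected(event))  # c-ip=10.1.1.1 cs-host=example.com s-action=TCP_HIT
```

Because nothing here depends on the position of a pair, a missing cs-Referer simply leaves a shorter result instead of breaking the whole match.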
r. Ismo
Hi, thanks for the reply @isoutamo
Could you please give me some tips on how to implement what you suggested, based on the below?
SEDCMD-Cleaning_allowed_logs=s/(?<cip>c-ip=\S+)\s.*(?<csbytes>cs-bytes=\S+)\s.*(?<cscategories>cs-categories=\S+)\s.*(?<cshost>cs-host=\S+)\s.*(?<csip>cs-ip=\S+)\s.*(?<csmethod>cs-method=\S+)\s.*(?<csuriport>cs-uri-port=\S+)\s.*(?<csurischeme>cs-uri-scheme=\S+)\s.*(?<csusername>cs-username=\S+)\s.*(?<action>s-action=\S+)\s.*(?<sip>s-ip=\S+)\s.*(?<scbytes>sc-bytes=\S+)\s.*(?<status>sc-status=\S+)\s.*(?<timetaken>time-taken=\S+)\s.*(?<url>c-url=\S+)\s.*(?<csreferer>cs-Referer=\S+)\s.*(?<rip>r-ip=\S+)\s.*(?<ssourceport>s-source-port=\S+)\s.*/\1 \2 \3 \4 \5 \6 \7 \8 \9 \10 \11 \12 \13 \14 \15 \16 \17 \18/g
The optional ? might be the solution for that. The SEDCMD and its regex are designed to strip each event so that only the selected data is indexed, which is why this regex is so specific.
Thank you!