rex - matching everything until a tab

wsw70 — Mon, 28 Nov 2011 10:54:51 GMT

Hello,

I am trying to parse a log from a Tipping Point IPS. Here is an example line (truncated for clarity; there is normally more on the line):

Nov 28 07:37:50 10.22.250.151 8 4   dab8b814-b100-11e0-06b9-e527e93f10b7    00000001-0001-0001-0001-000000004270    4270: HTTP: PHP Code Injection  4270

Everything is OK when parsing it via

rex "[a-zA-Z]+\s+\d+\s+\d+:\d+:\d+\s+\d+\.\d+\.\d+\.\d+\s+(?P<ACTION>\d+)\s+(?P<CRIT>\d+)\s+[0-9-]+\s+[0-9-]+\s+(?P<ATTACKID>\d+):"

and I get the ACTION, CRIT and ATTACKID fields. So far so good.

I then wanted to get the next piece of information, which is the attack description (HTTP: PHP Code Injection). Fields are separated by a tab, so I tried:

rex "[a-zA-Z]+\s+\d+\s+\d+:\d+:\d+\s+\d+\.\d+\.\d+\.\d+\s+(?P<ACTION>\d+)\s+(?P<CRIT>\d+)\s+[0-9-]+\s+[0-9-]+\s+(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>.+)\t\d+"

The idea is to match every character up to the tab. Instead, I end up capturing the rest of the line (i.e. the match does not stop at the tab).

I ran this through Rubular with the source data copied and pasted from Splunk, and it works there (which confirms that the separator really is a tab; I also see this in the search window). It looks like either there is a specific way to match the tab character, or .+ matches everything until the end of the line.

Thanks a lot for any pointers (and sorry, as the answer must be obvious to someone used to regex) -- WoJ

Re: rex - matching everything until a tab

Ayn — Mon, 28 Nov 2011 11:39:52 GMT

You need to use a non-greedy match. The current greedy one looks like this:

(?P<ATTACKNAME>.+)\t

which tells the regex engine to return the longest possible match that satisfies the conditions. The corresponding non-greedy match would be (note the "?"):

(?P<ATTACKNAME>.+?)\t

This tells the regex engine to return the shortest possible match, i.e. only match up until the first tab character it finds.
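
To make the difference concrete, here is a minimal sketch in Python rather than Splunk (Python's re module accepts the same (?P<name>...) syntax, but this is not how rex is implemented). The sample line is hypothetical: only its first two fields come from the log above, and the trailing "tcp" and "8080" fields are invented to stand in for the part of the line that was cut.

```python
import re

# Hypothetical tail of a log line. Only "4270: HTTP: PHP Code Injection" and
# the following "4270" come from the thread; "tcp" and "8080" are invented.
line = "4270: HTTP: PHP Code Injection\t4270\ttcp\t8080"

# Greedy: .+ grabs as much as it can, then backtracks to the LAST tab
# that is followed by digits.
greedy = re.search(r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>.+)\t\d+", line)

# Non-greedy: .+? grows one character at a time and stops at the FIRST tab
# that is followed by digits.
lazy = re.search(r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>.+?)\t\d+", line)

greedy_name = greedy.group("ATTACKNAME")
lazy_name = lazy.group("ATTACKNAME")

print(repr(greedy_name))  # includes the extra tab-separated fields
print(repr(lazy_name))    # only the attack description
```

With this sample, the greedy version captures the description plus the following fields, while the non-greedy version captures only "HTTP: PHP Code Injection".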

Re: rex - matching everything until a tab

wsw70 — Mon, 28 Nov 2011 11:44:34 GMT

Thanks Ayn for the answer.
I also managed to get the same result by replacing .+ with [^\t]+
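
For reference, a small Python sketch of the negated character class variant (again only an illustration outside Splunk; both sample lines are hypothetical, with invented trailing fields). It also shows one way the two approaches can differ: [^\t]+ can never cross a tab, whereas .+? will cross one if that is the only way for the rest of the pattern to match.

```python
import re

pattern_class = r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>[^\t]+)\t\d+"
pattern_lazy = r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>.+?)\t\d+"

# Hypothetical line: the description is followed by a tab and a number.
line = "4270: HTTP: PHP Code Injection\t4270\ttcp\t8080"
name_class = re.search(pattern_class, line).group("ATTACKNAME")

# Hypothetical line where the first tab is NOT followed by digits.
odd_line = "4270: HTTP: Foo\tbar\t4270"
odd_class = re.search(pattern_class, odd_line)  # no match: cannot cross the tab
odd_lazy = re.search(pattern_lazy, odd_line)    # matches by crossing the first tab

print(repr(name_class))
print(odd_class)
print(repr(odd_lazy.group("ATTACKNAME")))
```

On well-formed lines both patterns extract the same value; the negated class is the stricter of the two because it fails outright instead of silently swallowing an extra field.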
