Splunk Search

Transforms REGEX will not search past first line of multi line event. What gives?

neusse
Path Finder

I am trying to match text inside a large multi-line event. Indexing itself works fine, but the REGEX in transforms.conf fails to match anything past the first line. I verified this by matching . and then by using a regex that matches the first line; both work. When I try to match the last line instead, the regex fails and the data still ends up in the index.

Here are my props.conf and my transforms.conf:


props.conf

[mod_security]
LEARN_MODEL = false
sourcetype = mod_security
TRUNCATE = 0
#SHOULD_LINEMERGE     = true
MUST_NOT_BREAK_AFTER = (--[a-z0-9]+-A--)
MUST_BREAK_AFTER     = (--[a-z0-9]+-Z--)
TRANSFORMS-nomore     = nomore


transforms.conf

[nomore]
REGEX=(m?)--[a-z0-9]+-Z--
DEST_KEY=queue
FORMAT=nullQueue

As a test I tried to match the last line. It fails. But if I match the first line of the event, or ., it works. It makes no sense. I have also tried both with and without SHOULD_LINEMERGE.

Thanks in advance for any help.

Here is an example Event as requested.


--6c7cd57c-A--
[12/Jan/2011:10:59:29 --0600] TS3d8UijBKywIAAABu  53047 99.99.99.99 80
--6c7cd57c-B--
GET /pcgi-bin/sreg2/register/0 HTTP/1.1
X-NATPath: 99.99.99.99:53047, 99.99.99.99, 99.99.99.99:60136
Host: www.neusse.com
X-Forwarded-For: 99.99.99.99, 99.99.99.99
Accept: */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.1; MS-RTC LM 8)
True-Client-IP: 99.99.99.99
Pragma: no-cache
X-Akamai-CONFIG-LOG-DETAIL: true
Accept-Encoding: gzip
Akamai-Origin-Hop: 2
Via: 1.1 v1-akamaitech.net(ghost) (AkamaiGHost), 1.1 akamai.net(ghost) (AkamaiGHost)
Cache-Control: no-cache, max-age=0
SM_AUTHTYPE: Auto
SM_SDOMAIN: .neusse.com
Max-Forwards: 10
X-Forwarded-Host: www.neusse.com
X-Forwarded-Server: www.neusse.com
--6c7cd57c-F--
HTTP/1.1 404 Not Found
Last-Modified: Wed, 25 Oct 2006 03:10:45 GMT
ETag: "3cdd"
Accept-Ranges: bytes
Content-Length: 15581
CDCHOST: neusse-prod1-203
Content-Type: text/html
X-Pad: avoid browser bug
CDCWPB: neusse-prod1-02
CDCXRP: neusse-prod1-01
Connection: close
--6c7cd57c-H--
Apache-Handler: proxy-server
Stopwatch: 1294851569931732 32492 (168 9176 -)
Producer: ModSecurity v2.1.7 (Apache 2.x)
Server: Apache/2.2
--6c7cd57c-Z--

1 Solution

neusse
Path Finder

Yes, there is a solution. I found nothing describing this anywhere on splunk.com.

My problem of not being able to match deep enough into the event at index time was solved by setting LOOKAHEAD in transforms.conf. It turns out Splunk does not look very far into the event when applying REGEX at index time (4096 characters by default, according to the transforms.conf spec). It seems to only be looking for the end of the transaction as a priority.

Here are my working props.conf and transforms.conf:


transforms.conf

[nomore]
LOOKAHEAD = 100000
REGEX=(?m)(404\sNot\sFound)
DEST_KEY=queue
FORMAT=nullQueue


props.conf

[mod_security]
SHOULD_LINEMERGE = true
MUST_NOT_BREAK_AFTER = (--[a-z0-9]+-A--)
MUST_BREAK_AFTER = (--[a-z0-9]+-Z--)
TRUNCATE = 0
TRANSFORMS-notfounderror = nomore

At least it is working now!
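
The effect can be modelled outside Splunk. The sketch below is Python, not Splunk code. It assumes that an index-time transform behaves as if REGEX were applied only to the first LOOKAHEAD characters of the event, which is how the spec describes the setting. The synthetic event and the helper name `transform_matches` are made up for illustration.

```python
import re

DEFAULT_LOOKAHEAD = 4096  # default value given in the transforms.conf spec


def transform_matches(event: str, pattern: str,
                      lookahead: int = DEFAULT_LOOKAHEAD) -> bool:
    """Hypothetical model: search only the first `lookahead` characters."""
    return re.search(pattern, event[:lookahead]) is not None


# Synthetic multi-line event: a long request section pushes the
# "404 Not Found" status line past the 4096-character mark.
padding = "\n".join(f"X-Filler-{i}: {'a' * 60}" for i in range(80))
event = (
    "--6c7cd57c-A--\n"
    "--6c7cd57c-B--\n"
    f"{padding}\n"
    "--6c7cd57c-F--\n"
    "HTTP/1.1 404 Not Found\n"
    "--6c7cd57c-Z--\n"
)

pattern = r"404\sNot\sFound"
first_line = r"--[a-z0-9]+-A--"

# The status line sits beyond the default window.
print(event.index("404 Not Found") > DEFAULT_LOOKAHEAD)  # True
# A pattern for the first line is inside the window, so it matches.
print(transform_matches(event, first_line))              # True
# The 404 pattern is outside the default window, so it does not match.
print(transform_matches(event, pattern))                 # False
# With a larger window, as in LOOKAHEAD = 100000, it matches.
print(transform_matches(event, pattern, 100000))         # True
```

This reproduces the symptom from the question: the first line matches, anything deep inside a long event does not, and raising the window fixes it.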


wojtek_emca
New Member

I confirm this solution.

I recently faced the same problem with REGEX in transforms.conf.

I had to route all events containing the string "PROD" to another index. Below is the stanza:

[index_f5]
SOURCE_KEY = _raw
REGEX = (PROD)
DEST_KEY = _MetaData:Index
FORMAT = f5

The above was working, but not for all events.
The problem was big events (i.e. > 10000 characters): for those, the REGEX did not match and no transform was applied.

According to this thread I added:
LOOKAHEAD=..

And it is working now.

Below is the relevant information from the spec:

LOOKAHEAD =
* NOTE: This option is valid for all index time transforms, such as
index-time field creation, or DEST_KEY modifications.
* Optional. Specifies how many characters to search into an event.
* Defaults to 4096.
* You may want to increase this value if you have event line lengths that
exceed 4096 characters (before linebreaking).
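
One way to pick a value is to measure how deep into a set of sample events the pattern first matches. The Python sketch below does that under the same assumption as the spec text above (the regex only sees the first LOOKAHEAD characters). `required_lookahead` and the sample events are hypothetical, and offsets are counted in characters.

```python
import re
from typing import Iterable, Optional


def required_lookahead(events: Iterable[str], pattern: str) -> Optional[int]:
    """Return the largest end offset of the first match across all events,
    or None if the pattern matches none of them."""
    compiled = re.compile(pattern)
    deepest: Optional[int] = None
    for event in events:
        match = compiled.search(event)
        if match is not None:
            end = match.end()
            deepest = end if deepest is None else max(deepest, end)
    return deepest


samples = [
    "short event PROD",        # match ends at offset 16
    "x" * 12000 + " PROD",     # match ends at offset 12005
    "no marker here",          # no match, ignored
]

print(required_lookahead(samples, r"PROD"))  # 12005
```

A LOOKAHEAD at or above the returned number would cover these samples; real data may contain longer events, so some headroom is sensible.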

Thanks and regards,
Wojtek



cesaccenturefed
Path Finder

You are a lifesaver. I tried depth_limit and match_limit, but they were no help. I had to do a little Google-fu to find this answer. It seems silly to pick 4k as the limit when 8k is the default for syslog-ng and 10k is used for other Splunk limits.

Having our events cascade through multiple metadata modifications had not been an issue, but then we noticed that long events were being singled out and not routed where they were supposed to go.


gkanapathy
Splunk Employee

You don't need all this. Your props.conf should just need to be:

[mod_security]
SHOULD_LINEMERGE = false
TRUNCATE = 0
LINE_BREAKER = ([\r\n]+)--\w+-A--
TRANSFORMS-notfound = four_oh_four

and transforms:

[four_oh_four]
REGEX = \vHTTP/1.1 404
DEST_KEY = queue
FORMAT = nullQueue
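
To see what this LINE_BREAKER does, here is a rough Python model. It assumes the behaviour described in the props.conf spec: the event boundary sits at the first capturing group, the text matched by that group is discarded, and whatever follows the group starts the next event. `break_events` is a hypothetical helper, and Python's `re` stands in for the PCRE engine Splunk uses. Python's `\v` does not mean "vertical whitespace" as it does in PCRE, so the drop check uses `[\r\n]` instead.

```python
import re
from typing import List

LINE_BREAKER = r"([\r\n]+)--\w+-A--"


def break_events(raw: str, line_breaker: str = LINE_BREAKER) -> List[str]:
    """Split raw text into events at each LINE_BREAKER match (model only)."""
    events: List[str] = []
    start = 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])  # text before group 1 ends the event
        start = m.end(1)                      # text after group 1 starts the next
    tail = raw[start:]
    if tail:
        events.append(tail)
    return events


raw = (
    "--aaaa1111-A--\n"
    "--aaaa1111-F--\n"
    "HTTP/1.1 404 Not Found\n"
    "--aaaa1111-Z--\n"
    "--bbbb2222-A--\n"
    "--bbbb2222-F--\n"
    "HTTP/1.1 200 OK\n"
    "--bbbb2222-Z--\n"
)

events = break_events(raw)
print(len(events))  # 2

# Stand-in for the four_oh_four REGEX: which events would go to nullQueue?
dropped = [bool(re.search(r"[\r\n]HTTP/1\.1 404", e)) for e in events]
print(dropped)  # [True, False]
```

The model breaks only before a new `-A--` marker, so each audit record stays in one event and the transform sees the whole record, subject to the LOOKAHEAD window discussed elsewhere in this thread.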


Also, I don't know about your line breaking rules (I kind of think the above is simpler, and it is definitely more efficient). But if the events are breaking correctly, your problem is simply that your REGEX is wrong, and it should probably start with (?m) instead of (m?).
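
The difference between the two spellings is easy to check in any Perl-style regex engine. The sketch below uses Python's `re` as a stand-in for PCRE (an assumption; the two agree on these constructs). `(m?)` is a capturing group that optionally matches a literal "m", while `(?m)` is the inline multiline flag, which only changes how `^` and `$` behave. The sample text is abbreviated from the event in the question.

```python
import re

event = (
    "--6c7cd57c-A--\n"
    "--6c7cd57c-F--\n"
    "HTTP/1.1 404 Not Found\n"
    "--6c7cd57c-Z--"
)

# (m?) is a capture group, not a flag; here it captures the empty string.
m = re.search(r"(m?)--[a-z0-9]+-Z--", event)
print(m is not None, repr(m.group(1)))  # True ''

# Without (?m), ^ only matches at the very start of the text.
print(re.search(r"^HTTP/1\.1 404", event) is not None)      # False

# With (?m), ^ also matches right after each newline.
print(re.search(r"(?m)^HTTP/1\.1 404", event) is not None)  # True

# An unanchored pattern finds the text with or without the flag.
print(re.search(r"404\sNot\sFound", event) is not None)     # True
```

Since the patterns in this thread use no `^` or `$` anchors, neither spelling by itself would stop a match beyond the first line, which fits the finding that the search window was the real limit.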

neusse
Path Finder

I replaced all my settings with yours. No go. It still indexes ALL events. It is not matching past the first line. If I change the match from 404 to -A--, as on the first line, then nothing is indexed. But with anything other than the first line it never matches.

I wish it was as simple as this or I would have been done a week ago.


e2eadmin
Explorer

Can you please explain the MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER? I'm not sure how Splunk natively ends lines. What do the flags mean? Thank you.


neusse
Path Finder

The above was a test. If it could match the last section then I would have no data indexed. But it is indexing ALL data.


neusse
Path Finder

I want to throw away the entire event. What I really want is to eliminate events that have "404 Not Found" inside them. But I can't get the regex to match past the first line.

The events are Mod_Security logs.
Each event is made up of sections, and each section starts with a marker such as "--ab123cd-A--", with a letter from A to Z identifying the section.
The last section has no data but marks the end of the event: "--ab123cd-Z--".

Within section F (the response headers in the example event) there is the "404 Not Found" status line I would like to match.

I thought I could just match the error text, but the regex only matches against the first line. (m?) has no effect.

I am stumped at this point.


bwooden
Splunk Employee

Are you trying to break your events & throw away just the last line of each event?


bwooden
Splunk Employee

Could you share a few events worth of data?
