I am trying to match text inside a large multi-line event. Indexing works fine, but the regex in transforms.conf fails to match anything past the first line. I verified this by matching . and then by using a regex that matched the first line. When I tried to match the last line instead, it failed, and the data still ended up in the index.
Here are my props.conf and my transforms.conf:
props.conf
[mod_security]
LEARN_MODEL = false
sourcetype = mod_security
TRUNCATE = 0
#SHOULD_LINEMERGE = true
MUST_NOT_BREAK_AFTER = (--[a-z0-9]+-A--)
MUST_BREAK_AFTER = (--[a-z0-9]+-Z--)
TRANSFORMS-nomore = nomore
transforms.conf
[nomore]
REGEX=(m?)--[a-z0-9]+-Z--
DEST_KEY=queue
FORMAT=nullQueue
As a test I tried to match the last line. It fails. But if I match the first line of the event, or ., it works. It makes no sense. I have also tried both with and without SHOULD_LINEMERGE.
Thanks in advance for any help.
Here is an example Event as requested.
--6c7cd57c-A--
[12/Jan/2011:10:59:29 --0600] TS3d8UijBKywIAAABu 53047 99.99.99.99 80
--6c7cd57c-B--
GET /pcgi-bin/sreg2/register/0 HTTP/1.1
X-NATPath: 99.99.99.99:53047, 99.99.99.99, 99.99.99.99:60136
Host: www.neusse.com
X-Forwarded-For: 99.99.99.99, 99.99.99.99
Accept: */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.1; MS-RTC LM 8)
True-Client-IP: 99.99.99.99
Pragma: no-cache
X-Akamai-CONFIG-LOG-DETAIL: true
Accept-Encoding: gzip
Akamai-Origin-Hop: 2
Via: 1.1 v1-akamaitech.net(ghost) (AkamaiGHost), 1.1 akamai.net(ghost) (AkamaiGHost)
Cache-Control: no-cache, max-age=0
SM_AUTHTYPE: Auto
SM_SDOMAIN: .neusse.com
Max-Forwards: 10
X-Forwarded-Host: www.neusse.com
X-Forwarded-Server: www.neusse.com
--6c7cd57c-F--
HTTP/1.1 404 Not Found
Last-Modified: Wed, 25 Oct 2006 03:10:45 GMT
ETag: "3cdd"
Accept-Ranges: bytes
Content-Length: 15581
CDCHOST: neusse-prod1-203
Content-Type: text/html
X-Pad: avoid browser bug
CDCWPB: neusse-prod1-02
CDCXRP: neusse-prod1-01
Connection: close
--6c7cd57c-H--
Apache-Handler: proxy-server
Stopwatch: 1294851569931732 32492 (168 9176 -)
Producer: ModSecurity v2.1.7 (Apache 2.x)
Server: Apache/2.2
--6c7cd57c-Z--
Yes, there is a solution. I found nothing describing this anywhere on splunk.com.
My issue of not being able to search deep enough into the event at index time was solved by setting LOOKAHEAD in transforms.conf. It turns out Splunk does not look very far into an event when applying REGEX at index time. It seems to treat finding the end of the transaction as the priority.
Here are my working props.conf and transforms.conf:
transforms.conf
[nomore]
LOOKAHEAD = 100000
REGEX=(?m)(404\sNot\sFound)
DEST_KEY=queue
FORMAT=nullQueue
props.conf
[mod_security]
SHOULD_LINEMERGE = true
MUST_NOT_BREAK_AFTER = (--[a-z0-9]+-A--)
MUST_BREAK_AFTER = (--[a-z0-9]+-Z--)
TRUNCATE = 0
TRANSFORMS-notfounderror = nomore
At least it is working now!
I confirm this solution.
I recently faced the same problem with REGEX in transforms.conf.
I had to route all events containing the string "PROD" to another index. Below is the stanza:
[index_f5]
SOURCE_KEY = _raw
REGEX = (PROD)
DEST_KEY = _MetaData:Index
FORMAT = f5
The above was working, but not for all events.
The problem was big events (i.e. > 10000 characters): for those, the REGEX rule did not match and no transform was applied.
Following this thread I added:
LOOKAHEAD=..
And it is working now.
Below is the relevant info from the spec:
LOOKAHEAD = <integer>
* NOTE: This option is valid for all index time transforms, such as
index-time field creation, or DEST_KEY modifications.
* Optional. Specifies how many characters to search into an event.
* Defaults to 4096.
* You may want to increase this value if you have event line lengths that
exceed 4096 characters (before linebreaking).
Thanks and regards,
Wojtek
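The effect of the limit described in the spec can be modelled with a short Python sketch. This is only an illustration of the behaviour, not Splunk's implementation: the function name transform_matches, the filler header, and the use of Python's re module in place of Splunk's regex engine are all assumptions made for the example.

```python
import re

def transform_matches(raw_event, pattern, lookahead=4096):
    """Model of an index-time transform: the regex only sees the
    first `lookahead` characters of the raw event (hypothetical helper)."""
    window = raw_event[:lookahead]
    return re.search(pattern, window) is not None

# Synthetic long event: 300 filler lines (5400 characters) push the
# "404 Not Found" text past the 4096-character mark.
long_event = (
    "--6c7cd57c-A--\n"
    + "X-Filler: padding\n" * 300
    + "HTTP/1.1 404 Not Found\n"
    + "--6c7cd57c-Z--\n"
)

# The first line is inside the window, so it matches.
first_line_hit = transform_matches(long_event, r"--[a-z0-9]+-A--")
# The 404 text lies beyond the default window, so it is never seen.
deep_hit_default = transform_matches(long_event, r"404\sNot\sFound")
# Widening the window, as LOOKAHEAD = 100000 does, makes it visible.
deep_hit_widened = transform_matches(long_event, r"404\sNot\sFound", 100000)

print(first_line_hit, deep_hit_default, deep_hit_widened)
```

Under this model the symptom in the thread follows directly: patterns that hit near the start of the event work, and anything deeper is silently skipped until the window is enlarged.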
You are a lifesaver. I tried depth_limit and match_limit, but they were no help. I had to do a little Google-fu to find this answer. It seems silly to pick 4k as the limit when 8k is the default for syslog-ng and 10k is used for other Splunk limits.
Having our events cascade through multiple metadata modifications had not been an issue. Then we noticed that long events were being singled out and not routed where they were supposed to go.
You don't need all this. Your props.conf should just need to be:
[mod_security]
SHOULD_LINEMERGE = false
TRUNCATE = 0
LINE_BREAKER = ([\r\n]+)--\w+-A--
TRANSFORMS-notfound = four_oh_four
and transforms:
[four_oh_four]
REGEX = \vHTTP/1.1 404
DEST_KEY = queue
FORMAT = nullQueue
Also, I don't know about your line breaking rules (I kind of think the above is simpler, and it is definitely more efficient), but if the events are breaking correctly, your problem is simply that your REGEX is wrong, and should probably start with (?m) instead of (m?).
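To make the difference between the two spellings concrete, here is a small sketch using Python's re module as a stand-in for Splunk's regex engine (an assumption; the two constructs shown are standard syntax that both engines share). The shortened event string is made up for the example.

```python
import re

event = "--6c7cd57c-A--\nHTTP/1.1 404 Not Found\n--6c7cd57c-Z--"

# (m?) is an ordinary capturing group that optionally matches a
# literal "m". It sets no flag at all.
capture = re.search(r"(m?)--[a-z0-9]+-Z--", event)
group_text = capture.group(1)  # the group matched the empty string

# (?m) is the inline multiline flag. It only changes how ^ and $
# behave, letting them match at every line start and end.
anchored_no_flag = re.search(r"^--[a-z0-9]+-Z--$", event)
anchored_multiline = re.search(r"(?m)^--[a-z0-9]+-Z--$", event)

print(capture is not None, repr(group_text))
print(anchored_no_flag is None, anchored_multiline is not None)
```

Note that the unanchored pattern still finds the last line even with the misspelled (m?), because it contains no ^ or $. So in this sketch the flag only matters once anchors are involved, which suggests the typo alone may not explain a pattern that never matches past the first line.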
I replaced all my stuff with yours. No go. It still indexes ALL events. It is not matching past the first line. If I change the match from 404 to -A--, as in the first line, then nothing is indexed. But with anything other than the first line, it never matches.
I wish it was as simple as this or I would have been done a week ago.
Can you please explain the MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER? I'm not sure how Splunk natively ends lines. What do the flags mean? Thank you.
The above was a test. If it could match the last section then I would have no data indexed. But it is indexing ALL data.
I want to throw away the entire event. What I really want is to eliminate events that have "404 Not Found" inside them. But I can't get the regex to match past the first line.
The events are Mod_Security logs.
Each event has sections that start with a marker like "--ab123cd-A--", with a letter from A to Z identifying the section.
The last section has no data but marks the end of the event: "--ab123cd-Z--".
Within section H there is the error for a "404 Not Found" I would like to match.
I thought I could just match the error text, but the regex only matches against the first line. (m?) has no effect.
I am stumped at this point.
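The event layout described above can be sketched in Python as follows. This is only an illustration for checking a pattern against sample data outside Splunk. The helper name split_sections and the shortened sample are made up for the example, and the marker regex is an assumption based on the markers shown in this thread.

```python
import re

# Section markers look like --6c7cd57c-A--: an id, then one letter.
SECTION_MARKER = re.compile(r"^--([a-z0-9]+)-([A-Z])--$", re.MULTILINE)

def split_sections(event):
    """Return {section letter: section body} for one event (hypothetical helper)."""
    markers = list(SECTION_MARKER.finditer(event))
    sections = {}
    # Pair each marker with the one after it; the last is paired with None.
    for current, following in zip(markers, markers[1:] + [None]):
        end = following.start() if following else len(event)
        sections[current.group(2)] = event[current.end():end].strip()
    return sections

sample = (
    "--6c7cd57c-A--\n"
    "[12/Jan/2011:10:59:29 --0600] TS3d8UijBKywIAAABu 53047 99.99.99.99 80\n"
    "--6c7cd57c-F--\n"
    "HTTP/1.1 404 Not Found\n"
    "--6c7cd57c-Z--\n"
)

sections = split_sections(sample)
# True when any section body contains the text to be filtered out.
has_404 = any("404 Not Found" in body for body in sections.values())
print(sorted(sections), has_404)
```

Running a check like this over a few exported events shows which section holds the text and roughly how many characters into the event it sits.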
Are you trying to break your events & throw away just the last line of each event?
Could you share a few events worth of data?