I want to remove from _raw any string that appears as the value of a Splunk field, say host. For example, if I have the _raw message:
<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.
2013-08-15 14:25:48 Setting hostname hype362: [ OK ]
The following search strips dates, times, and any other digits from _raw, along with month and day names (longer names are listed first so that "January" is not cut down to "uary"):
| rex mode=sed "s/\d{1,}//g" | rex mode=sed "s/(January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec|Mon|Tue|Wed|Thu|Fri|Sat|Sun)//g" | rename _raw AS msgdigest
So the msgdigest then becomes:
<ConMan> Console [hype] log at -- :: PDT.
-- :: Setting hostname hype: [ OK ]
Starting from that digest, and given that hype is, say, a type of host, I want to end up with:
<ConMan> Console [] log at -- :: PDT.
-- :: Setting hostname: [ OK ]
The final goal here is to create a digest of _raw that has more detail than punct, as I find that errors that are not actually similar sometimes have the same punct. So I am making a hybrid of _raw and punct, so to speak. I may try to make this available as an app in the long run.
I feel as though Splunk needs an easy way to refer to the values of a field inside a regex (as an addition to plain Perl-style regular expressions). This would make a lot of things easier, or at least give us more options.
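To make the target concrete, here is a minimal Python sketch of the digest I am after, run outside Splunk. It is not SPL; the function name msgdigest and the list of field values are made up for illustration, and swallowing the whitespace in front of a removed value is only an assumption chosen to reproduce the desired output above.

```python
import re

# Longer names first so "January" is not reduced to "uary".
MONTHS_AND_DAYS = (
    "January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|"
    "August|Aug|September|Sep|October|Oct|November|Nov|December|Dec|"
    "Mon|Tue|Wed|Thu|Fri|Sat|Sun"
)


def msgdigest(raw, field_values):
    """Strip digits, month/day names, and known field values from raw."""
    digest = re.sub(r"\d+", "", raw)
    digest = re.sub(MONTHS_AND_DAYS, "", digest)
    for value in field_values:
        # Field values get the same digit stripping, so "hype33" -> "hype".
        token = re.sub(r"\d+", "", value)
        if token:
            # Assumption: also drop whitespace directly before the token.
            digest = re.sub(r"\s*" + re.escape(token), "", digest)
    return digest


raw = (
    "<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.\n"
    "2013-08-15 14:25:48 Setting hostname hype362: [ OK ]"
)
print(msgdigest(raw, ["hype33", "hype362"]))
```

Note that this removes the token wherever it occurs, including inside longer words, so a real version would probably need word boundaries or a list of exact host names.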
You will need to use transforms.conf and props.conf. You will do a capture that excludes the values you don't want and apply it at search time with REPORT. I didn't check my regex, but this should give you some ideas.
# transforms.conf
[data-anonymizer]
REGEX = (?m)^(.*\[[^\d\]]*)\d+(\].*)
FORMAT = $1$2
DEST_KEY = _raw
#props.conf
[yoursource]
REPORT-anonymizer=data-anonymizer
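As a quick way to check the two-capture-group idea outside Splunk, here is a small Python sketch. The pattern is an assumed variant of the REGEX above (non-digits inside the brackets, then the digits to drop), and the function name strip_bracket_digits is made up for illustration. It says nothing about how Splunk itself applies the transform.

```python
import re

# Assumed pattern: keep everything through the non-digit text inside [...],
# drop the digits that follow, and keep the rest of the line.
PATTERN = re.compile(r"^(.*\[[^\d\]]*)\d+(\].*)", re.MULTILINE)


def strip_bracket_digits(raw):
    # \1\2 plays the role of FORMAT = $1$2.
    return PATTERN.sub(r"\1\2", raw)


raw = (
    "<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.\n"
    "2013-08-15 14:25:48 Setting hostname hype362: [ OK ]"
)
result = strip_bracket_digits(raw)
print(result)
```

On the sample event this only changes [hype33] to [hype]; hype362 on the second line is left alone because it is not inside brackets.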
Hope this helps or gives you some ideas. Don't forget to vote and accept answers that help.
Cheers
I tried to create two capture groups.
$1 =
2013-08-15 14:25:48 Setting hostname hype362: [ OK ]. The two capture groups exclude the 33 value. Using FORMAT = $1$2 to replace _raw, the result should contain the whole event except for the 33. I know this works at index time, and it should work at search time too.
Can you explain a little more what you are doing with the regex?
Fear not, I'm in the process of getting access to those files so it may take a day or two.
If I understand correctly, you want to remove whatever is between the []. In your example of
Console [hype] log at -- :: PDT.
you want to get rid of the word hype.
I ran the following regex for you against the above line on http://gskinner.com/RegExr/
(?<=\[).*?(?=\])
This uses a positive lookbehind and a positive lookahead to search for the first [ and the first ] symbol and select everything in between. You could use this to do a find/replace and replace the text selected by the regex with nothing to get rid of it.
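A minimal Python sketch of that find/replace, assuming the same lookbehind/lookahead pattern and an empty replacement string; the function name blank_brackets is made up for illustration.

```python
import re

# Lazily match everything between a "[" and the next "]".
BETWEEN_BRACKETS = re.compile(r"(?<=\[).*?(?=\])")


def blank_brackets(text):
    # Replace the bracket contents with nothing, keeping the brackets.
    return BETWEEN_BRACKETS.sub("", text)


print(blank_brackets("Console [hype] log at -- :: PDT."))
print(blank_brackets("-- :: Setting hostname hype: [ OK ]"))
```

The second call shows the trade-off: [ OK ] is emptied as well, while the unbracketed hype is left in place.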
I'm afraid this will only cover a few cases. The [] do not always have anything to do with the field, and I also want to get rid of things like user names, which are never in brackets. Thanks for trying.