I have a directory full of .html files. I'd like Splunk to index only a specific string of text from each file, and only if the file contains it. So far I have Splunk indexing the entire file when it contains the string, but how can I get it to index just that portion of the file? I've done this before but can't remember how. I recall using SEDCMD to remove everything except the specific portion. What am I missing?
Here is the configuration I'm using, followed by the data I'm trying to parse:
props.conf:
[html]
TRANSFORMS-set = setnull,keepBuildFiles
SEDCMD-removeLines = s/[\r\n]+//g
transforms.conf:
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[keepBuildFiles]
REGEX = (Total\stime\:\s.*?\<br\>)
DEST_KEY = queue
FORMAT = indexQueue
Data (note: answers.splunk.com is decoding the HTML, so the raw version is here: https://pastebin.com/ee3srkPM):
======================================================================<br/>
2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br/>
make[1]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src'<br/>
2019-05-07_02:58:29 --- src/Makefile@abcbuild2 (src): idl(for_release)<br/>
make[2]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/idl'<br/>
2019-05-07_02:58:29 --- src/idl/Makefile@abcbuild2 (src/idl): idl(for_release)<br/>
Buildfile: /abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/java/build.xml<br/>
<br/>
build_idl:<br/>
<br/>
find_modified_idl:<br/>
[exec] New/Updated IDL:<br/>
[echo] No Modified IDL detected... skipping code generation<br/>
<br/>
BUILD SUCCESSFUL<br/>
Total time: 2 seconds<br/>
Like this:
props.conf:
[html]
SEDCMD-removeLines = s/\<br\>//g
SHOULD_LINEMERGE = false
LINE_BREAKER = ((?:\<br\>)*[\r\n\s]+)(?=\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d_%H:%M:%S
TRANSFORMS-set = setnull,keepBuildFiles
transforms.conf:
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[keepBuildFiles]
REGEX = (Total\stime\:\s.*?\<br\>)
DEST_KEY = queue
FORMAT = indexQueue
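To make the routing above concrete, here is a rough Python sketch of how the two transforms interact. It assumes the transforms run in the order listed and that the last matching one decides the queue, which is how the usual nullQueue filtering pattern is understood to behave. The `route` helper and the sample events are made up for illustration; this is not Splunk code.

```python
import re

# (name, REGEX, FORMAT) in the same order as TRANSFORMS-set = setnull,keepBuildFiles
TRANSFORMS = [
    ("setnull", re.compile(r"."), "nullQueue"),
    ("keepBuildFiles", re.compile(r"(Total\stime\:\s.*?\<br\>)"), "indexQueue"),
]


def route(event, default="indexQueue"):
    """Return the queue an event ends up in (hypothetical helper)."""
    queue = default
    for _name, regex, dest in TRANSFORMS:
        # Every transform whose REGEX matches overwrites the queue key,
        # so the last match in the list wins.
        if regex.search(event):
            queue = dest
    return queue


# Made-up events, one per line of build output.
events = [
    "BUILD SUCCESSFUL<br>",
    "Total time: 2 seconds<br>",
    "Total time: 2 seconds<br/>",  # self-closing tag, as in the pasted sample
]

for event in events:
    print(route(event), "<-", event)

kept = [e for e in events if route(e) == "indexQueue"]
```

Two things stand out in this sketch: the decision is made per event, so a matching event is kept whole rather than trimmed to the matched text, and the regex as written expects a literal `<br>`, so an event ending in `<br/>` is not matched here.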
I was able to get the sed expression working at search time with:
| rex field=_raw mode=sed "s/.*?(\<br\>\d+-\d+-\d+_\d+\:\d+\:\d+\s\-\-\-\sMakefile.*?Buildfile\:\s.*?Total\stime\:\s\d+\s\w+\<br\>).*/\1/g"
But it doesn't seem to work at index time with props/transforms. Does anyone see anything wrong with my confs?
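For anyone who wants to sanity-check the sed expression outside Splunk, here is a minimal Python sketch of the same substitution. It assumes the event is a single line (newlines already stripped) and that the tags appear as `<br>`; the sample event and its shortened path are made up for illustration, and Python's `re` is only standing in for Splunk's regex engine.

```python
import re

# Hypothetical single-line event with unrelated text on both sides.
event = (
    "noise before<br>"
    "2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br>"
    "Buildfile: /abc/builds/src/java/build.xml<br>"
    "BUILD SUCCESSFUL<br>"
    "Total time: 2 seconds<br>"
    "noise after"
)

# Same pattern as the rex mode=sed expression above.
pattern = re.compile(
    r".*?(\<br\>\d+-\d+-\d+_\d+\:\d+\:\d+\s\-\-\-\sMakefile.*?"
    r"Buildfile\:\s.*?Total\stime\:\s\d+\s\w+\<br\>).*"
)

# Equivalent of s/<pattern>/\1/g: keep only the captured build section.
trimmed = pattern.sub(r"\1", event)
print(trimmed)
```

In this sketch everything before the timestamped Makefile line and after the "Total time" line is dropped. If the real events still contain newlines or use `<br/>`, the pattern would need adjusting before it could match.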