
How do I filter data with props and transforms, and how can I index only a specific string?

dpanych
Communicator

I have a directory full of .html webpages. I'd like Splunk to index those HTML files, but only a specific string of text (if the file contains it). I got as far as having Splunk index the entire file when it contains the string, but how can I get Splunk to index only a portion of that file? I've done this in the past but can't remember how. I remember using SEDCMD to remove everything except the specific portion. What am I missing?

Trying to parse out these:

props.conf

[html]
TRANSFORMS-set = setnull,keepBuildFiles
SEDCMD-removeLines = s/[\r\n]+//g

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepBuildFiles]
REGEX = (Total\stime\:\s.*?\<br\>)
DEST_KEY = queue
FORMAT = indexQueue
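To show what this routing does and does not do, here is a simplified Python model of the two transforms (not Splunk's implementation). Each transform with DEST_KEY = queue only picks the destination queue for a whole event, so a matching event is indexed in full, never trimmed to the matched string. The event list is hypothetical and assumes the files contain <br>, as the regexes expect.

```python
import re

# Simplified model of the transforms above. A DEST_KEY = queue transform only
# decides which queue an *event* goes to; it never edits the event text.
SETNULL = re.compile(r".")                                    # [setnull] REGEX
KEEP_BUILD_FILES = re.compile(r"(Total\stime\:\s.*?\<br\>)")  # [keepBuildFiles] REGEX


def route(event: str) -> str:
    """Apply the transforms in the order listed in TRANSFORMS-set."""
    queue = "indexQueue"          # default destination (simplified)
    if SETNULL.search(event):
        queue = "nullQueue"       # setnull: send every non-empty event to nullQueue
    if KEEP_BUILD_FILES.search(event):
        queue = "indexQueue"      # keepBuildFiles: rescue events that match
    return queue


def index(events: list[str]) -> list[str]:
    """Return the events that would reach the index, with their text unchanged."""
    return [e for e in events if route(e) == "indexQueue"]


# Hypothetical events after line breaking.
sample_events = [
    "2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br>",
    "BUILD SUCCESSFUL<br>\nTotal time: 2 seconds<br>",
]
kept = index(sample_events)
print(kept)
```

The kept event still contains "BUILD SUCCESSFUL", which is the point: routing keeps or drops, it does not cut text out of an event.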

Sample data (answers.splunk.com decodes the HTML, so here is the raw version: https://pastebin.com/ee3srkPM):

======================================================================<br/>
2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br/>
make[1]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src'<br/>
2019-05-07_02:58:29 --- src/Makefile@abcbuild2 (src): idl(for_release)<br/>
make[2]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/idl'<br/>
2019-05-07_02:58:29 --- src/idl/Makefile@abcbuild2 (src/idl): idl(for_release)<br/>
Buildfile: /abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/java/build.xml<br/>
<br/>
build_idl:<br/>
<br/>
find_modified_idl:<br/>
     [exec] New/Updated IDL:<br/>
     [echo] No Modified IDL detected... skipping code generation<br/>
<br/>
BUILD SUCCESSFUL<br/>
Total time: 2 seconds<br/>
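The sample above shows the tags as <br/> (HTML-escaped in the post), while every regex in this thread escapes the tag as \<br\>, which only matches the literal text <br>. The thread doesn't show which spelling the raw files actually use (the working search-time rex further down suggests <br>), so this quick Python check only illustrates the difference. The tolerant pattern is a hypothetical variant, not something from the thread.

```python
import re

# Pattern as written in this thread: matches only the literal "<br>".
THREAD_PATTERN = re.compile(r"Total\stime\:\s.*?\<br\>")
# Hypothetical tolerant variant: optional whitespace and optional "/" before ">".
TOLERANT_PATTERN = re.compile(r"Total\stime\:\s.*?<br\s*/?>")

lines = ["Total time: 2 seconds<br>", "Total time: 2 seconds<br/>"]

for line in lines:
    print(f"{line!r}: thread={bool(THREAD_PATTERN.search(line))}, "
          f"tolerant={bool(TOLERANT_PATTERN.search(line))}")
```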

woodcock
Esteemed Legend

Like this:

props.conf:

[html]
SEDCMD-removeLines = s/\<br\>//g
SHOULD_LINEMERGE = false
LINE_BREAKER = ((?:\<br\>)*[\r\n\s]+)(?=\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d_%H:%M:%S
TRANSFORMS-set = setnull,keepBuildFiles
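To see how these settings would split a file, here is a simplified Python model of LINE_BREAKER (with SHOULD_LINEMERGE = false) and TIME_FORMAT. It follows the documented LINE_BREAKER rule that the text matched by the first capture group is discarded and marks the boundary between events. It does not model SEDCMD or the queue routing, and it treats TIME_PREFIX = ^ simply as "timestamp at the start of the event". The input is an abbreviated copy of the sample data and assumes the tags are <br>.

```python
import re
from datetime import datetime

# LINE_BREAKER from the answer: the first capture group is discarded, and a
# new event starts where it ends (only when a timestamp follows).
LINE_BREAKER = re.compile(
    r"((?:\<br\>)*[\r\n\s]+)(?=\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})"
)
TIME_FORMAT = "%Y-%m-%d_%H:%M:%S"  # same strptime directives as the setting


def break_events(raw: str) -> list[str]:
    """Simplified model of LINE_BREAKER with SHOULD_LINEMERGE = false."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])  # text before the breaker
        start = match.end(1)                      # skip the discarded group
    events.append(raw[start:])
    return events


# Abbreviated sample file, assuming the tags are "<br>".
raw = (
    "=====<br>\n"
    "2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br>\n"
    "make[1]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src'<br>\n"
    "2019-05-07_02:58:29 --- src/Makefile@abcbuild2 (src): idl(for_release)<br>\n"
    "BUILD SUCCESSFUL<br>\n"
    "Total time: 2 seconds<br>\n"
)

events = break_events(raw)
for event in events:
    try:
        # TIME_PREFIX = ^ : look for the timestamp at the very start.
        print(datetime.strptime(event[:19], TIME_FORMAT), repr(event))
    except ValueError:
        print("no timestamp at start:", repr(event))
```

In this model the header line becomes its own event with no timestamp at its start, and "BUILD SUCCESSFUL" and "Total time" land in the same event as the last Makefile line, so keepBuildFiles would keep that whole event rather than just the "Total time" string.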

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepBuildFiles]
REGEX = (Total\stime\:\s.*?\<br\>)
DEST_KEY = queue
FORMAT = indexQueue

dpanych
Communicator

I was able to get the sed replacement working at search time with:

| rex field=_raw mode=sed "s/.*?(\<br\>\d+-\d+-\d+_\d+\:\d+\:\d+\s\-\-\-\sMakefile.*?Buildfile\:\s.*?Total\stime\:\s\d+\s\w+\<br\>).*/\1/g"

But it doesn't seem to work at index time with props/transforms. Does anyone see anything wrong with my confs?
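For comparison, the search-time expression can be reproduced with Python's re.sub. Unlike queue routing, the sed replacement rewrites the text down to capture group 1, but it only matches when the Makefile line, the Buildfile line and the Total time line are all in the same string. The one-line raw string is hypothetical and assumes newlines were already removed by SEDCMD-removeLines, because "." does not match newlines by default in either Python or PCRE.

```python
import re

# The search-time sed expression, rewritten for Python's re module. It keeps
# only capture group 1 (from "<br>" + timestamp + "Makefile" through
# "Total time: N unit<br>") and drops everything around it.
PATTERN = re.compile(
    r".*?(\<br\>\d+-\d+-\d+_\d+\:\d+\:\d+\s\-\-\-\sMakefile.*?"
    r"Buildfile\:\s.*?Total\stime\:\s\d+\s\w+\<br\>).*"
)


def keep_build_section(raw: str) -> str:
    """Equivalent of rex mode=sed "s/<pattern>/\\1/g" on one event."""
    return PATTERN.sub(r"\1", raw)


# Hypothetical single-line _raw (newlines already stripped).
raw = (
    "=====<br>"
    "2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br>"
    "Buildfile: /abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/java/build.xml<br>"
    "BUILD SUCCESSFUL<br>"
    "Total time: 2 seconds<br>"
    "</body></html>"
)
print(keep_build_section(raw))
```

Text that doesn't match is returned unchanged, so a trimming expression like this would presumably still need something like the setnull routing to drop files without a build section. Applied at index time, it would also only have something to match if the whole file arrives as a single event, since SEDCMD and TRANSFORMS each work on one event's _raw at a time; that is an inference from the settings in this thread, not something confirmed here.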
