
How do I filter data with props and transforms, and how can I index only a specific string?

dpanych
Communicator

I have a directory full of .html web pages. I'd like Splunk to index those HTML files, but only a specific string of text (if the file contains it). I've got Splunk indexing the entire file when it contains the string, but how can I get it to index only a portion of that file? I've done this in the past but can't remember how. I remember using SEDCMD to remove everything but the specific portion. What am I missing?

Trying to parse out these:

props.conf

[html]
TRANSFORMS-set = setnull,keepBuildFiles
SEDCMD-removeLines = s/[\r\n]+//g

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepBuildFiles]
REGEX = (Total\stime\:\s.*?\<br\>)
DEST_KEY = queue
FORMAT = indexQueue
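The two transforms above route whole events between queues. As a rough model (a simplified Python sketch, not Splunk's implementation), assume each REGEX is searched unanchored against the event text, the transforms run in the order listed in `TRANSFORMS-set`, and a later match overrides an earlier one, which is why `setnull` has to come first. The `route` function and the sample strings are illustrative only.

```python
import re

# Transforms in the order given by TRANSFORMS-set = setnull,keepBuildFiles:
# (stanza name, REGEX, FORMAT written to DEST_KEY = queue)
TRANSFORMS = [
    ("setnull", r".", "nullQueue"),
    ("keepBuildFiles", r"(Total\stime\:\s.*?\<br\>)", "indexQueue"),
]


def route(raw: str) -> str:
    """Simplified model: which queue does one event end up in?"""
    queue = "indexQueue"  # default when no transform matches
    for _name, regex, fmt in TRANSFORMS:
        if re.search(regex, raw):  # unanchored search over the event text
            queue = fmt  # a later match overrides an earlier one
    return queue


samples = [
    "BUILD SUCCESSFUL<br>",                             # no "Total time": dropped
    "BUILD SUCCESSFUL<br>\nTotal time: 2 seconds<br>",  # kept, as one whole event
    "Total time: 2 seconds<br/>",                       # \<br\> needs a literal <br>
]
for raw in samples:
    print(route(raw), repr(raw))
```

The kept event comes back unchanged: `DEST_KEY = queue` only decides whether an event is indexed or discarded, so trimming text out of an event has to happen elsewhere (for example with a `SEDCMD`). The last sample also shows that `\<br\>` matches only a literal `<br>`; if the raw files end their lines in `<br/>`, the pattern does not match and the event is discarded.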

Sample data (answers.splunk.com interprets the HTML, so the raw version is here: https://pastebin.com/ee3srkPM):

======================================================================<br/>
2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br/>
make[1]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src'<br/>
2019-05-07_02:58:29 --- src/Makefile@abcbuild2 (src): idl(for_release)<br/>
make[2]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/idl'<br/>
2019-05-07_02:58:29 --- src/idl/Makefile@abcbuild2 (src/idl): idl(for_release)<br/>
Buildfile: /abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/java/build.xml<br/>
<br/>
build_idl:<br/>
<br/>
find_modified_idl:<br/>
     [exec] New/Updated IDL:<br/>
     [echo] No Modified IDL detected... skipping code generation<br/>
<br/>
BUILD SUCCESSFUL<br/>
Total time: 2 seconds<br/>

woodcock
Esteemed Legend

Like this:

props.conf:

[html]
SEDCMD-removeLines = s/\<br\>//g
SHOULD_LINEMERGE = false
LINE_BREAKER = ((?:\<br\>)*[\r\n\s]+)(?=\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d_%H:%M:%S
TRANSFORMS-set = setnull,keepBuildFiles

transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepBuildFiles]
REGEX = (Total\stime\:\s.*?\<br\>)
DEST_KEY = queue
FORMAT = indexQueue
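`LINE_BREAKER` decides where one event ends and the next begins: wherever the regex matches, the text captured by the first group is treated as the boundary and discarded, the text before it ends the previous event, and the text after it starts the next. With `SHOULD_LINEMERGE = false`, as set here, those pieces are the final events. The Python sketch below mimics that rule with the pattern from this answer; Python's `re` only approximates Splunk's regex engine, and `break_events` plus the shortened sample excerpt are illustrative assumptions.

```python
import re
from typing import List

# LINE_BREAKER from the answer above. Group 1 is the boundary text that gets discarded.
LINE_BREAKER = re.compile(
    r"((?:\<br\>)*[\r\n\s]+)(?=\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})"
)


def break_events(text: str) -> List[str]:
    """Split text into events at each LINE_BREAKER match, dropping group 1."""
    events = []
    start = 0
    for m in LINE_BREAKER.finditer(text):
        events.append(text[start:m.start(1)])  # previous event ends where group 1 starts
        start = m.end(1)  # next event starts where group 1 ends
    events.append(text[start:])
    return [e for e in events if e]  # drop empty pieces


# Shortened excerpt of the question's data, written with <br> endings.
sample = (
    "2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br>\n"
    "make[1]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src'<br>\n"
    "2019-05-07_02:58:29 --- src/Makefile@abcbuild2 (src): idl(for_release)<br>\n"
    "BUILD SUCCESSFUL<br>\n"
    "Total time: 2 seconds<br>"
)

for event in break_events(sample):
    print(repr(event))

# With <br/> endings, (?:\<br\>)* matches nothing, so the tag stays on the previous event.
print(break_events("a<br/>\n2019-05-07_02:58:29 x"))
```

Each piece still starts at a timestamp, so the "Total time" line ends up inside the last event together with everything since the previous timestamp, and the transforms then keep or drop that event as a whole.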

dpanych
Communicator

I was able to get sed working at search time with:

| rex field=_raw mode=sed "s/.*?(\<br\>\d+-\d+-\d+_\d+\:\d+\:\d+\s\-\-\-\sMakefile.*?Buildfile\:\s.*?Total\stime\:\s\d+\s\w+\<br\>).*/\1/g"

But it doesn't seem to work at index time with props/transforms. Does anyone see anything wrong with my confs?
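The search-time expression keeps only part of the event because it matches the whole event and replaces it with capture group 1. The Python sketch below reproduces that substitution with `re.sub`. It is an approximation: Python's `re` stands in for Splunk's regex engine, the event is first flattened the way `SEDCMD-removeLines = s/[\r\n]+//g` would flatten it (since `.` does not match a newline by default), and the excerpt is written with `<br>` endings because that is what `\<br\>` matches. `keep_portion` is a hypothetical helper name, and the sketch says nothing about the order in which Splunk applies `SEDCMD` and `TRANSFORMS` at index time.

```python
import re

# The pattern from the search-time rex above, unchanged. Group 1 is the part to keep.
KEEP = re.compile(
    r".*?(\<br\>\d+-\d+-\d+_\d+\:\d+\:\d+\s\-\-\-\sMakefile.*?Buildfile\:\s"
    r".*?Total\stime\:\s\d+\s\w+\<br\>).*"
)


def keep_portion(raw: str) -> str:
    """Flatten the event like SEDCMD-removeLines, then replace it with group 1."""
    flat = re.sub(r"[\r\n]+", "", raw)  # s/[\r\n]+//g
    return KEEP.sub(r"\1", flat)  # s/.*?(...).*/\1/g


# Shortened excerpt of the sample data, assuming lines end in <br>.
raw = "\n".join([
    "======================================================================<br>",
    "2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br>",
    "Buildfile: /abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/java/build.xml<br>",
    "BUILD SUCCESSFUL<br>",
    "Total time: 2 seconds<br>",
])

print(keep_portion(raw))
# Without flattening, "." cannot cross "\n", so nothing matches.
print(KEEP.sub(r"\1", raw) == raw)
```

The second check prints `True`: on the unflattened text nothing matches, so the substitution leaves the event as it was.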
