How do I filter data with props and transforms, and how can I only index a specific string?

dpanych
Communicator

I have a directory full of .html webpages. I'd like Splunk to index those files, but only a specific string of text from each one (if the file contains it). So far I can get Splunk to index the entire file when it contains the string, but how can I get Splunk to index only that portion of the file? I've done this in the past but can't remember how. I remember using SEDCMD to remove everything but the specific portion. What am I missing?

Trying to parse out these (screenshot omitted):

props.conf

[html]
TRANSFORMS-set = setnull,keepBuildFiles
SEDCMD-removeLines = s/[\r\n]+//g

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepBuildFiles]
REGEX = (Total\stime\:\s.*?\<br\>)
DEST_KEY = queue
FORMAT = indexQueue

Data (answers.splunk.com is decoding the HTML; the raw version is at https://pastebin.com/ee3srkPM):

======================================================================&lt;br/&gt;
2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)&lt;br/&gt;
make[1]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src'&lt;br/&gt;
2019-05-07_02:58:29 --- src/Makefile@abcbuild2 (src): idl(for_release)&lt;br/&gt;
make[2]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/idl'&lt;br/&gt;
2019-05-07_02:58:29 --- src/idl/Makefile@abcbuild2 (src/idl): idl(for_release)&lt;br/&gt;
Buildfile: /abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/java/build.xml&lt;br/&gt;
&lt;br/&gt;
build_idl:&lt;br/&gt;
&lt;br/&gt;
find_modified_idl:&lt;br/&gt;
     [exec] New/Updated IDL:&lt;br/&gt;
     [echo] No Modified IDL detected... skipping code generation&lt;br/&gt;
&lt;br/&gt;
BUILD SUCCESSFUL&lt;br/&gt;
Total time: 2 seconds&lt;br/&gt;

woodcock
Esteemed Legend

Like this:

props.conf:

[html]
SEDCMD-removeLines = s/\<br\>//g
SHOULD_LINEMERGE = false
LINE_BREAKER = ((?:\<br\>)*[\r\n\s]+)(?=\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d_%H:%M:%S
TRANSFORMS-set = setnull,keepBuildFiles

transforms.conf

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keepBuildFiles]
REGEX = (Total\stime\:\s.*?\<br\>)
DEST_KEY = queue
FORMAT = indexQueue
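
To see what this configuration keeps, the regexes can be tried outside Splunk. The following is a rough Python sketch, not Splunk itself: it assumes Python's re module behaves like Splunk's PCRE for these patterns, that the raw file contains literal <br> tags (as the regexes expect), and it does not model SEDCMD or its ordering relative to the transforms. The helper names break_events and route are made up for illustration.

```python
import re

# Patterns copied from the props.conf / transforms.conf above.
LINE_BREAKER = re.compile(
    r"((?:\<br\>)*[\r\n\s]+)(?=\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})"
)
SETNULL = re.compile(r".")
KEEP_BUILD_FILES = re.compile(r"(Total\stime\:\s.*?\<br\>)")

# Assumption: literal <br> tags, condensed from the sample data in the question.
SAMPLE = (
    "2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br>\n"
    "make[1]: Entering directory `/abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src'<br>\n"
    "2019-05-07_02:58:29 --- src/idl/Makefile@abcbuild2 (src/idl): idl(for_release)<br>\n"
    "BUILD SUCCESSFUL<br>\n"
    "Total time: 2 seconds<br>\n"
)


def break_events(raw):
    """Split raw text where LINE_BREAKER matches, dropping capture group 1."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return [event for event in events if event.strip()]


def route(event):
    """Apply the transforms in listed order; the last matching one wins."""
    queue = "indexQueue"
    if SETNULL.search(event):
        queue = "nullQueue"
    if KEEP_BUILD_FILES.search(event):
        queue = "indexQueue"
    return queue


events = break_events(SAMPLE)
for event in events:
    print(route(event), repr(event[:60]))
```

In this sketch the sample breaks into two events, and only the one containing "Total time" is routed to indexQueue. That event is kept whole, from its leading timestamp line through "Total time", because queue routing keeps or drops entire events and does not trim them.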

dpanych
Communicator

I was able to get SED working during search time with:

| rex field=_raw mode=sed "s/.*?(\<br\>\d+-\d+-\d+_\d+\:\d+\:\d+\s\-\-\-\sMakefile.*?Buildfile\:\s.*?Total\stime\:\s\d+\s\w+\<br\>).*/\1/g"

But it doesn't seem to work at index time with props/transforms. Does anyone see anything wrong with my confs?
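
The search-time expression can also be checked offline. This is a minimal Python sketch under a few assumptions: Python's re stands in for Splunk's PCRE, the events are hypothetical ones condensed from the sample data, and the function name trim is invented here. It is one thing worth checking, not a confirmed diagnosis of the index-time problem.

```python
import re

# The sed expression from the rex command above, split into pattern and replacement.
PATTERN = re.compile(
    r".*?(\<br\>\d+-\d+-\d+_\d+\:\d+\:\d+\s\-\-\-\sMakefile.*?"
    r"Buildfile\:\s.*?Total\stime\:\s\d+\s\w+\<br\>).*"
)


def trim(raw):
    """Replace the whole event with capture group 1, like s/.../\\1/g."""
    return PATTERN.sub(r"\1", raw)


# Hypothetical event with newlines already removed (assumption).
ONE_LINE = (
    "=====<br>"
    "2019-05-07_02:58:29 --- Makefile@abcbuild2 (): src(for_release)<br>"
    "Buildfile: /abc/builds/accrued/zzz/accu_ws/ZZZ_19.4.0_CM_CI/src/java/build.xml<br>"
    "BUILD SUCCESSFUL<br>"
    "Total time: 2 seconds<br>"
    "trailing text"
)
# The same event with a newline after every <br>.
MULTI_LINE = ONE_LINE.replace("<br>", "<br>\n")

trimmed = trim(ONE_LINE)
print(trimmed)
print(trim(MULTI_LINE) == MULTI_LINE)
```

On the single-line event the expression keeps only the span from the first Makefile line through "Total time: 2 seconds<br>". On the multi-line variant nothing matches and the text comes back unchanged, since the pattern expects each <br> to be followed directly by the date and "." does not cross newlines by default. So the expression depends on the event already being one line at the point where it is applied.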
