topic Re: Ingest only rows containing certain text from log file in Getting Data In

Ingest only rows containing certain text from log file

joesrepsolc — Wed, 30 Sep 2020 03:10:30 GMT

Have a very large log file (20,000+ lines per log file) and I only need the rows that contain "tell_group.pl" in them. Some start the line with that text, others have a "+ " before it. Hoping to build a props.conf that only ingest these lines from the log into a single event (1 log file = 1 event). So for each source file, I need all the lines (full line) that contain "tell_group.pl"

ROWS
ROWS
ROWS
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
ROWS
ROWS
ROWS

THANKS IN ADVANCE!

Joe

Re: Ingest only rows containing certain text from log file

darrenfuller — Mon, 02 Dec 2019 20:30:26 GMT

Is there a timestamp anywhere in the file or should the props just use the index time?

Re: Ingest only rows containing certain text from log file

woodcock — Mon, 02 Dec 2019 21:09:59 GMT

If this is a one-time effort, use the add oneshot command and filter it first, something like this:

grep "tell_group.pl" /Your/Source/Path/And/Filname/Here > /tmp/ERASEME.txt
$SPLUNK_HOME/bin/splunk add oneshot /tmp/ERASEME.txt -sourcetype YourSourcetypeHere -index YourIndexHere -rename-source "/Your/Source/Path/And/Filname/Here"
rm -f /tmp/ERASEME.txt

Re: Ingest only rows containing certain text from log file

darrenfuller — Mon, 02 Dec 2019 22:53:59 GMT

Try this:

[answers786699]
disabled = false
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail
TRUNCATE = 10000

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g
SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g

Explanation:

1) ingest the whole file as a single event...

This is done with this line:
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail

Which tells splunk to only break when it reaches a carriage return followed by the exact string "****xxxfail" . If your files could be larger than 10000 lines, then also adjust the "TRUNCATE =" to be larger than your largest file (and probably include a buffer above that)... In the unlikely event that you do have ****xxxfail in your data, just change this to be an even more ridiculous and unlikely string... like It\sturns\sout\sthat\sthe\searth\sis\sflat or something

2) Remove all lines that don't have "tell_group.pl" somewhere in the line.

This is accomplished with the three SEDCMD lines .. they operate as follows:

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g

This removes all lines from the file that do not have tell_group.pl in them ... When this line is applied by itself, the above file ingests as so:

ROWS




tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245


tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350


tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164


tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0

That first regex will work on all lines except the first line in the file (and it leaves a bunch of empty lines as well). To get rid of those, i used a variation of the first SEDCMD, only with the [\r\n]+ at the end of the match.

SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g

after this is done, we are left with:

tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0

Which i believe answers your requirements. Hope this helps
./Darren

Re: Ingest only rows containing certain text from log file

supreet — Tue, 06 Feb 2024 16:52:02 GMT

Nice, I tried this and looks like it is working. Question: Does this mean only a part of my log file will be ingested so I am not using the whole log's disk space in my License ? Actually I only want to ingest a part of my debug logs (which are huge). Also, can we line break the events after this conversion so we have different events again after ingestion. @darrenfuller @woodcock

Re: Ingest only rows containing certain text from log file

supreet — Tue, 06 Feb 2024 16:52:49 GMT

For me, it is going to be ongoing thing and not a one time effort. So wondering if there is a way to achieve this