Getting Data In

Ingest only rows containing certain text from log file

joesrepsolc
Communicator

I have a very large log file (20,000+ lines per file) and I only need the rows that contain "tell_group.pl". Some lines start with that text; others have a "+ " before it. I'm hoping to build a props.conf that ingests only these lines from the log into a single event (1 log file = 1 event). So for each source file, I need every full line that contains "tell_group.pl".

ROWS
ROWS
ROWS
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
ROWS
ROWS
ROWS

THANKS IN ADVANCE!

Joe


darrenfuller
Contributor

Try this:

[answers786699]
disabled = false
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail
TRUNCATE = 10000

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g
SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g

Explanation:

1) ingest the whole file as a single event...

This is done with this line:
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail

This tells Splunk to break only when it reaches a line break followed by the exact string "****xxxfail". Note that TRUNCATE is measured in bytes, not lines, so if your files could be larger than 10,000 bytes, adjust "TRUNCATE =" to be larger than your largest file (and probably include a buffer above that). In the unlikely event that you do have ****xxxfail in your data, just change it to an even more ridiculous and unlikely string, like It\sturns\sout\sthat\sthe\searth\sis\sflat or something.
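If you want to convince yourself of the "one file = one event" behavior outside Splunk, here is a minimal Python sketch (an approximation, not Splunk itself): splitting on a breaker pattern that never occurs in the data leaves the whole text as a single chunk, which is exactly why the file lands as one event.

```python
import re

# Sample data similar to the log in the question.
log = "ROWS\ntell_group.pl MSG A\n+ tell_group.pl MSG B\nROWS\n"

# Same pattern as the LINE_BREAKER setting: a line break followed
# by the literal string ****xxxfail.
pieces = re.split(r"[\r\n]+\*\*\*\*xxxfail", log)

# The breaker never matches, so nothing is split: one event.
assert len(pieces) == 1
```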

2) Remove all lines that don't have "tell_group.pl" somewhere in the line.

This is accomplished with the two SEDCMD lines. They operate as follows:

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g

This removes all lines from the file that do not have tell_group.pl in them. When this SEDCMD is applied by itself, the above file ingests like so:

ROWS




tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245


tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350


tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164


tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0

That first regex works on every line except the first line in the file (and it leaves a bunch of empty lines as well). To get rid of those, I used a variation of the first SEDCMD, anchored at the start and with the [\r\n] moved to the end of the match.

SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g

After this is done, we are left with:

tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
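For anyone who wants to sanity-check the two substitutions outside Splunk, here is a small Python sketch that applies the same regexes with re.sub against a shortened version of the sample log (note this is plain Python, not Splunk's SEDCMD engine; re.MULTILINE is used to approximate the per-line anchoring of the second pattern):

```python
import re

# Shortened version of the log from the question.
sample = (
    "ROWS\n"
    "# --------\n"
    'tell_group.pl MSG "UPDATES : $MEDSA_AMINF"\n'
    "+ tell_group.pl MSG UPDATES : 1245\n"
    "# --------\n"
    "ROWS"
)

# SEDCMD-01: remove newline-prefixed lines lacking tell_group.pl.
step1 = re.sub(r"[\r\n]+(?!.*tell_group\.pl).*", "", sample)

# SEDCMD-02: remove a line at the start that lacks tell_group.pl.
step2 = re.sub(r"^(?!.*tell_group\.pl).*[\r\n]", "", step1,
               flags=re.MULTILINE)

# Only the tell_group.pl lines survive.
assert all("tell_group.pl" in line for line in step2.splitlines())
```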

I believe that answers your requirements. Hope this helps!
./Darren


supreet
Explorer

Nice, I tried this and it looks like it is working. Question: does this mean only part of my log file will be ingested, so I am not using the whole log's disk space against my license? I actually only want to ingest a part of my debug logs (which are huge). Also, can we line-break the events after this conversion so we have separate events again after ingestion? @darrenfuller @woodcock


woodcock
Esteemed Legend

If this is a one-time effort, use the add oneshot command and filter the file first, something like this:

grep "tell_group.pl" /Your/Source/Path/And/Filename/Here > /tmp/ERASEME.txt
$SPLUNK_HOME/bin/splunk add oneshot /tmp/ERASEME.txt -sourcetype YourSourcetypeHere -index YourIndexHere -rename-source "/Your/Source/Path/And/Filename/Here"
rm -f /tmp/ERASEME.txt

supreet
Explorer

For me, it is going to be an ongoing thing, not a one-time effort, so I'm wondering if there is a way to achieve this.


darrenfuller
Contributor

Is there a timestamp anywhere in the file or should the props just use the index time?
