Ingest only rows containing certain text from log file

joesrepsolc
Communicator

I have very large log files (20,000+ lines each) and I only need the rows that contain "tell_group.pl". Some lines start with that text; others have a "+ " before it. I'm hoping to build a props.conf that ingests only these lines from the log into a single event (1 log file = 1 event). So for each source file, I need every full line that contains "tell_group.pl".

ROWS
ROWS
ROWS
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
ROWS
ROWS
ROWS

THANKS IN ADVANCE!

Joe

darrenfuller
Contributor

Try this:

[answers786699]
disabled = false
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail
TRUNCATE = 10000

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g
SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g

Explanation:

1) Ingest the whole file as a single event.

This is done with this line:
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail

This tells Splunk to break only when it reaches one or more carriage-return or newline characters followed by the exact string "****xxxfail". TRUNCATE is measured in bytes, not lines, so if your files could be larger than 10000 bytes (a 20,000-line file will be), also raise "TRUNCATE =" above the size of your largest file, with some buffer on top. In the unlikely event that you do have ****xxxfail in your data, just change this to an even more ridiculous and unlikely string, like It\sturns\sout\sthat\sthe\searth\sis\sflat or something.
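
As a rough illustration of why this yields one event per file, the Python sketch below applies the same pattern to a short sample. It is only an approximation: split_events is a hypothetical helper, not a Splunk API, and Python's re module stands in for Splunk's regex engine. It assumes the documented LINE_BREAKER behavior in which the text matched by the first capture group is discarded between events.

```python
import re

# Same pattern as the LINE_BREAKER value above.
LINE_BREAKER = re.compile(r"([\r\n]+)\*\*\*\*xxxfail")


def split_events(text):
    """Split text at each LINE_BREAKER match, dropping capture group 1.

    Hypothetical stand-in for event breaking, for illustration only.
    """
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(text):
        events.append(text[start:match.start(1)])  # event ends before the newlines
        start = match.end(1)                       # next event starts after them
    events.append(text[start:])
    return events


# Shortened sample: no line is followed by the sentinel string.
sample = "ROWS\ntell_group.pl MSG a\n+ tell_group.pl MSG a\nROWS\n"

# The pattern never matches, so the whole text stays together as one event.
print(len(split_events(sample)))
```

If the sentinel string did appear after a line break, the same helper would return more than one event, which is why the sentinel should be something that cannot occur in the data.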

2) Remove all lines that don't have "tell_group.pl" somewhere in the line.

This is accomplished with the two SEDCMD lines. They operate as follows:

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g

This removes every line that does not contain tell_group.pl, apart from the first line of the file. When this line is applied by itself, the above file is ingested like so:

ROWS




tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245


tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350


tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164


tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0

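That first regex works on every line except the first line in the file, because the pattern needs a line break in front of the line it removes (and it leaves a bunch of empty lines as well). To get rid of those, I used a variation of the first SEDCMD, anchored at the start of the event with ^ and with the [\r\n] at the end of the match instead of the beginning.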

SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g

After this is done, we are left with:

tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0

Which I believe answers your requirements. Hope this helps.
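
To sanity-check the two substitutions outside Splunk, a small Python sketch like the following can help. It is an approximation under stated assumptions: Python's re module stands in for the regex engine Splunk uses, apply_sedcmds is a hypothetical helper name, and the sample is a shortened version of the log above (separator lines and messages abbreviated). Whitespace details in a real ingest may differ.

```python
import re

# The two sed-style expressions from the stanza, as regex patterns.
# Both replace their match with an empty string, globally.
PASS_1 = r"[\r\n]+(?!.*(tell_group\.pl)).*"   # line break + a line lacking the text
PASS_2 = r"^(?!.*(tell_group\.pl)).*[\r\n]"   # first line of the event, if it lacks the text


def apply_sedcmds(event, patterns=(PASS_1, PASS_2)):
    """Apply each pattern in order, deleting every match (hypothetical helper)."""
    for pattern in patterns:
        event = re.sub(pattern, "", event)
    return event


# Shortened stand-in for one log file ingested as a single event.
sample = "\n".join([
    "ROWS",
    "ROWS",
    "# ----",
    'tell_group.pl MSG "UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"',
    "+ tell_group.pl MSG UPDATES REQUIRED THIS RUN : 1245",
    "# ----",
    "ROWS",
])

# After the first pass alone, the leading "ROWS" line is still present.
after_first = apply_sedcmds(sample, patterns=(PASS_1,))

# After both passes, only the lines containing tell_group.pl remain.
after_both = apply_sedcmds(sample)
print(after_both)
```

Running each pass separately, as with after_first here, is a convenient way to see which pass is responsible for removing which lines before committing the stanza to props.conf.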
./Darren

supreet
Explorer

Nice, I tried this and it looks like it is working. Question: does this mean only part of my log file will be ingested, so that the whole log's size does not count against my license? I only want to ingest part of my debug logs (which are huge). Also, can we line-break the events after this conversion so that we have separate events again after ingestion? @darrenfuller @woodcock

woodcock
Esteemed Legend

If this is a one-time effort, use the add oneshot command and filter it first, something like this:

grep "tell_group.pl" /Your/Source/Path/And/Filename/Here > /tmp/ERASEME.txt
$SPLUNK_HOME/bin/splunk add oneshot /tmp/ERASEME.txt -sourcetype YourSourcetypeHere -index YourIndexHere -rename-source "/Your/Source/Path/And/Filename/Here"
rm -f /tmp/ERASEME.txt

supreet
Explorer

For me, this is going to be an ongoing thing, not a one-time effort, so I'm wondering if there is a way to achieve this.

darrenfuller
Contributor

Is there a timestamp anywhere in the file or should the props just use the index time?
