I have a very large log file (20,000+ lines per log file), and I only need the rows that contain "tell_group.pl". Some of those lines start with that text; others have a "+ " before it. I'm hoping to build a props.conf that ingests only these lines from the log into a single event (1 log file = 1 event). So for each source file, I need every full line that contains "tell_group.pl":
ROWS
ROWS
ROWS
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : 1245
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN : 350
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN : 164
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN : 0
ROWS
ROWS
ROWS
THANKS IN ADVANCE!
Joe
Try this:
[answers786699]
disabled = false
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail
TRUNCATE = 10000
SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g
SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g
Explanation:
1) ingest the whole file as a single event...
This is done with this line:
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail
This tells Splunk to break only where one or more line-ending characters (CR or LF) are followed by the exact string "****xxxfail". Since that string should never occur, the whole file becomes a single event. Note that TRUNCATE is measured in bytes, not lines, and the 10000 used above (which is also Splunk's default) is far smaller than a 20,000-line file. Set TRUNCATE higher than the size in bytes of your largest file (with some headroom), or set TRUNCATE = 0 to disable truncation entirely. In the unlikely event that you do have ****xxxfail in your data, just change it to an even more ridiculous and unlikely string, like It\sturns\sout\sthat\sthe\searth\sis\sflat or something.
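To sanity-check these two settings before deploying, here is a minimal Python sketch. It is only an illustration, not part of Splunk. It counts how many times the LINE_BREAKER pattern matches in a file and compares the file's size in bytes with TRUNCATE, which caps event length in bytes (0 disables it). The helper check_file and the synthetic test file are hypothetical, and Python's re module stands in for Splunk's regex engine.

```python
import re

# LINE_BREAKER and TRUNCATE as set in the [answers786699] stanza.
# Splunk breaks events only where LINE_BREAKER matches; TRUNCATE caps
# the event size in bytes (0 disables truncation).
LINE_BREAKER = re.compile(r"([\r\n]+)\*\*\*\*xxxfail")
TRUNCATE = 10000


def check_file(raw: bytes) -> dict:
    """Hypothetical helper: estimate how the two settings treat one file.

    This only imitates the settings with Python's re module; it does not
    run Splunk's parsing pipeline.
    """
    text = raw.decode("utf-8", errors="replace")
    breaks = len(LINE_BREAKER.findall(text))
    return {
        "events": breaks + 1,  # no match anywhere -> the whole file is one event
        "size_bytes": len(raw),
        "truncated": TRUNCATE != 0 and len(raw) > TRUNCATE,
    }


# A synthetic 20,000-line file of 5-byte lines ("ROWS\n") is 100,000 bytes.
print(check_file(b"ROWS\n" * 20_000))
# {'events': 1, 'size_bytes': 100000, 'truncated': True}
```

Real log lines are longer than five bytes, so an actual 20,000-line file will be larger still.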
2) Remove all lines that don't have "tell_group.pl" somewhere in the line.
This is accomplished with the two SEDCMD lines, which operate as follows:
SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g
This removes the lines that do not have tell_group.pl in them. When this setting is applied by itself, the sample file above is ingested as:
ROWS
tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : 1245
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN : 350
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN : 164
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN : 0
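To see exactly what SEDCMD-01 matches, here is a minimal Python sketch that applies the same regex to a shortened copy of the sample. It assumes Python's re module behaves like Splunk's PCRE for this pattern. Both engines treat "." as not matching a newline and both support the (?!...) negative lookahead. The variable names are hypothetical.

```python
import re

# SEDCMD-01 from the stanza, written for Python's re module.
# [\r\n]+      : a line break
# (?!.*(...))  : ...not followed, on the same line, by tell_group.pl
# .*           : then consume the rest of that line
# re.sub replaces every match, like the /g flag.
SEDCMD_01 = re.compile(r"[\r\n]+(?!.*(tell_group\.pl)).*")

TELL_A = 'tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"'
PLUS_A = "+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : 1245"

# A shortened copy of the sample log, with LF line endings.
sample = "\n".join(["ROWS", "ROWS", "# ---", TELL_A, PLUS_A, "# ---", "ROWS"])

after_01 = SEDCMD_01.sub("", sample)
print(after_01)
# ROWS
# tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"
# + tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : 1245
```

The first "ROWS" survives because no line break precedes it, so the pattern has nothing to anchor on.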
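That first regex works on every line except the first one in the file, because it only matches a line that follows a line break. To remove the first line as well, I used a variation of the first SEDCMD that is anchored with ^ at the start and has the line break ([\r\n]) at the end of the match instead of the beginning. Without a multiline flag, ^ matches only at the very start of the event, so this pass removes at most the first line, and only if that line does not contain tell_group.pl.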
SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g
After this runs as well, we are left with:
tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : 1245
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN : 350
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN : 164
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN : 0
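Putting both passes together, the following minimal Python sketch mimics the full filter on a shortened, LF-terminated sample. It applies the patterns in name order (01, then 02), which is presumably what the numeric prefixes in the stanza are for. filter_event and the sample strings are hypothetical, and Python's re stands in for Splunk's PCRE.

```python
import re

# SEDCMD-01: remove every line, preceded by a line break, that lacks tell_group.pl.
SEDCMD_01 = re.compile(r"[\r\n]+(?!.*(tell_group\.pl)).*")
# SEDCMD-02: no MULTILINE flag, so "^" anchors only at the start of the whole
# event; this removes the first line (plus one line break) if it lacks tell_group.pl.
SEDCMD_02 = re.compile(r"^(?!.*(tell_group\.pl)).*[\r\n]")


def filter_event(event: str) -> str:
    """Hypothetical helper: keep only lines that contain tell_group.pl."""
    event = SEDCMD_01.sub("", event)
    return SEDCMD_02.sub("", event)


TELL_A = 'tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"'
PLUS_A = "+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : 1245"

raw = "\n".join(["ROWS", "ROWS", "# ---", TELL_A, PLUS_A, "# ---", "ROWS"])
print(filter_event(raw))
# tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"
# + tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN : 1245
```

This is only a convenient way to try regex changes locally before editing props.conf. The actual filtering still happens inside Splunk when the data is parsed.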
I believe this meets your requirements. Hope this helps!
./Darren
Nice, I tried this and it looks like it is working. Question: does this mean only part of my log file will be ingested, so the whole log does not count against my license? I only want to ingest part of my debug logs (which are huge). Also, can we line-break the events again after this conversion, so that we have separate events after ingestion? @darrenfuller @woodcock
If this is a one-time effort, filter the file first and then load it with the add oneshot command, something like this:
grep -F "tell_group.pl" /Your/Source/Path/And/Filename/Here > /tmp/ERASEME.txt
$SPLUNK_HOME/bin/splunk add oneshot /tmp/ERASEME.txt -sourcetype YourSourcetypeHere -index YourIndexHere -rename-source "/Your/Source/Path/And/Filename/Here"
rm -f /tmp/ERASEME.txt
For me, this is going to be an ongoing thing, not a one-time effort, so I'm wondering if there is a way to achieve it.
Is there a timestamp anywhere in the file or should the props just use the index time?