<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Ingest only rows containing certain text from log file in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495791#M84542</link>
    <description>&lt;P&gt;Try this: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[answers786699]
disabled = false
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail
TRUNCATE = 10000

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g
SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Explanation: &lt;/P&gt;

&lt;P&gt;1) Ingest the whole file as a single event.&lt;BR /&gt;&lt;BR /&gt;
This is done with this line: &lt;BR /&gt;
   &lt;CODE&gt;LINE_BREAKER =  ([\r\n]+)\*\*\*\*xxxfail&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This tells Splunk to break events only where a line ending is followed by the exact string "****xxxfail". That string should never appear, so the whole file becomes one event. Note that TRUNCATE is measured in bytes, not lines, so set it above the size of your largest file with some headroom (a 20,000-line file will be far larger than 10000 bytes); TRUNCATE = 0 disables truncation entirely. In the unlikely event that you do have ****xxxfail in your data, just change this to an even more ridiculous and unlikely string, like &lt;CODE&gt;It\sturns\sout\sthat\sthe\searth\sis\sflat&lt;/CODE&gt; or something.&lt;/P&gt;

&lt;P&gt;2) Remove all lines that don't have "tell_group.pl" somewhere in the line. &lt;/P&gt;

&lt;P&gt;This is accomplished with the two SEDCMD lines, which operate as follows:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This removes the lines that do not have tell_group.pl in them. When this setting is applied by itself, the above file ingests like so: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;ROWS




tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245


tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350


tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164


tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That first regex works on every line except the first line in the file, and it leaves a bunch of empty lines as well. To get rid of those, I used a variation of the first SEDCMD, with the [\r\n] at the end of the match instead of the start.&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;After this is done, we are left with: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I believe this meets your requirements. Hope this helps. &lt;BR /&gt;
./Darren&lt;/P&gt;</description>
    <pubDate>Mon, 02 Dec 2019 22:53:59 GMT</pubDate>
    <dc:creator>darrenfuller</dc:creator>
    <dc:date>2019-12-02T22:53:59Z</dc:date>
    <item>
      <title>Ingest only rows containing certain text from log file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495788#M84539</link>
      <description>&lt;P&gt;I have very large log files (20,000+ lines each) and I only need the rows that contain "tell_group.pl". Some lines start with that text; others have a "+ " before it. I'm hoping to build a props.conf that ingests only these lines from the log into a single event (1 log file = 1 event). So for each source file, I need every full line that contains "tell_group.pl".&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;ROWS
ROWS
ROWS
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
ROWS
ROWS
ROWS
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;THANKS IN ADVANCE!  &lt;/P&gt;

&lt;P&gt;Joe&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 03:10:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495788#M84539</guid>
      <dc:creator>joesrepsolc</dc:creator>
      <dc:date>2020-09-30T03:10:30Z</dc:date>
    </item>
    <item>
      <title>Re: Ingest only rows containing certain text from log file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495789#M84540</link>
      <description>&lt;P&gt;Is there a timestamp anywhere in the file or should the props just use the index time?&lt;/P&gt;</description>
      <pubDate>Mon, 02 Dec 2019 20:30:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495789#M84540</guid>
      <dc:creator>darrenfuller</dc:creator>
      <dc:date>2019-12-02T20:30:26Z</dc:date>
    </item>
    <item>
      <title>Re: Ingest only rows containing certain text from log file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495790#M84541</link>
      <description>&lt;P&gt;If this is a one-time effort, use the &lt;CODE&gt;add oneshot&lt;/CODE&gt; command and filter it first, something like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;grep "tell_group.pl" /Your/Source/Path/And/Filname/Here &amp;gt; /tmp/ERASEME.txt
$SPLUNK_HOME/bin/splunk add oneshot /tmp/ERASEME.txt -sourcetype YourSourcetypeHere -index YourIndexHere -rename-source "/Your/Source/Path/And/Filname/Here"
rm -f /tmp/ERASEME.txt
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 02 Dec 2019 21:09:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495790#M84541</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-12-02T21:09:59Z</dc:date>
    </item>
    <item>
      <title>Re: Ingest only rows containing certain text from log file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495791#M84542</link>
      <description>&lt;P&gt;Try this: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[answers786699]
disabled = false
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail
TRUNCATE = 10000

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g
SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Explanation: &lt;/P&gt;

&lt;P&gt;1) Ingest the whole file as a single event.&lt;BR /&gt;&lt;BR /&gt;
This is done with this line: &lt;BR /&gt;
   &lt;CODE&gt;LINE_BREAKER =  ([\r\n]+)\*\*\*\*xxxfail&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This tells Splunk to break events only where a line ending is followed by the exact string "****xxxfail". That string should never appear, so the whole file becomes one event. Note that TRUNCATE is measured in bytes, not lines, so set it above the size of your largest file with some headroom (a 20,000-line file will be far larger than 10000 bytes); TRUNCATE = 0 disables truncation entirely. In the unlikely event that you do have ****xxxfail in your data, just change this to an even more ridiculous and unlikely string, like &lt;CODE&gt;It\sturns\sout\sthat\sthe\searth\sis\sflat&lt;/CODE&gt; or something.&lt;/P&gt;
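
&lt;P&gt;Since TRUNCATE is a byte count, one way to pick a value is to measure a few representative files. A minimal Python sketch, assuming the files are readable locally; &lt;CODE&gt;suggest_truncate&lt;/CODE&gt; and its 25% headroom are hypothetical choices for illustration, not anything Splunk provides:&lt;/P&gt;

```python
import os


def suggest_truncate(paths, headroom_percent=25):
    """Return the largest file size in bytes plus headroom, as a TRUNCATE candidate."""
    largest = max(os.path.getsize(path) for path in paths)
    return largest + largest * headroom_percent // 100


if __name__ == "__main__":
    import sys

    # Pass sample log files on the command line; prints a props.conf line.
    if sys.argv[1:]:
        print("TRUNCATE =", suggest_truncate(sys.argv[1:]))
```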

&lt;P&gt;2) Remove all lines that don't have "tell_group.pl" somewhere in the line. &lt;/P&gt;

&lt;P&gt;This is accomplished with the two SEDCMD lines, which operate as follows:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This removes the lines that do not have tell_group.pl in them. When this setting is applied by itself, the above file ingests like so: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;ROWS




tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245


tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350


tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164


tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That first regex works on every line except the first line in the file, and it leaves a bunch of empty lines as well. To get rid of those, I used a variation of the first SEDCMD, with the [\r\n] at the end of the match instead of the start.&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;After this is done, we are left with: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
&lt;/CODE&gt;&lt;/PRE&gt;
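
&lt;P&gt;To experiment with the two patterns outside Splunk, here is a rough Python sketch. It assumes Python's &lt;CODE&gt;re&lt;/CODE&gt; module is a close enough stand-in for Splunk's SEDCMD handling on this input, uses a shortened, made-up sample with \n line endings, and the function names are hypothetical; intermediate results may differ from what Splunk shows:&lt;/P&gt;

```python
import re

# Hypothetical sample with "\n" line endings, shortened from the log in the question.
SAMPLE = "\n".join([
    "ROWS",
    "ROWS",
    "# ----",
    'tell_group.pl MSG "UPDATES REQUIRED THIS RUN : $MEDSA_AMINF"',
    "+ tell_group.pl MSG UPDATES REQUIRED THIS RUN : 1245",
    "# ----",
    "ROWS",
]) + "\n"

# The two patterns from the SEDCMD settings, applied in the same order.
PART_1 = r"[\r\n]+(?!.*(tell_group\.pl)).*"
PART_2 = r"^(?!.*(tell_group\.pl)).*[\r\n]"


def apply_sedcmds(event):
    """Run both substitutions over one event held as a single string."""
    after_part_1 = re.sub(PART_1, "", event)
    return re.sub(PART_2, "", after_part_1)


def filter_lines(text):
    """Return the lines of text that survive both substitutions."""
    return apply_sedcmds(text).splitlines()


if __name__ == "__main__":
    for line in filter_lines(SAMPLE):
        print(line)
```

&lt;P&gt;On that sample, only the two tell_group.pl lines remain.&lt;/P&gt;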

&lt;P&gt;I believe this meets your requirements. Hope this helps. &lt;BR /&gt;
./Darren&lt;/P&gt;</description>
      <pubDate>Mon, 02 Dec 2019 22:53:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/495791#M84542</guid>
      <dc:creator>darrenfuller</dc:creator>
      <dc:date>2019-12-02T22:53:59Z</dc:date>
    </item>
    <item>
      <title>Re: Ingest only rows containing certain text from log file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/676766#M113187</link>
      <description>&lt;P&gt;Nice, I tried this and it looks like it is working. Question: does this mean only part of my log file will be ingested, so that I am not using the whole log's size against my license? I only want to ingest part of my debug logs (which are huge). Also, can we line-break the events after this conversion so that we have separate events again after ingestion? @darrenfuller &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/1406"&gt;@woodcock&lt;/a&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Feb 2024 16:52:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/676766#M113187</guid>
      <dc:creator>supreet</dc:creator>
      <dc:date>2024-02-06T16:52:02Z</dc:date>
    </item>
    <item>
      <title>Re: Ingest only rows containing certain text from log file</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/676767#M113188</link>
      <description>&lt;P&gt;For me, this is going to be an ongoing thing, not a one-time effort, so I am wondering if there is a way to achieve this.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Feb 2024 16:52:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Ingest-only-rows-containing-certain-text-from-log-file/m-p/676767#M113188</guid>
      <dc:creator>supreet</dc:creator>
      <dc:date>2024-02-06T16:52:49Z</dc:date>
    </item>
  </channel>
</rss>

