<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why are log events indexed from GZip archive with a specified source type missing extracted fields? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439128#M124945</link>
    <description>&lt;P&gt;Yes, you don't need to change the sourcetype.&lt;/P&gt;

&lt;P&gt;Just add an extraction like this (expand it out to cover the remaining fields)&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;EXTRACT-all = (?&amp;lt;id&amp;gt;[^\t]+)\t(?&amp;lt;generated_at&amp;gt;[^\t]+)\t(?&amp;lt;received_at&amp;gt;[^\t]+)\t(?&amp;lt;source_id&amp;gt;[^\t]+)\t&lt;/CODE&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 31 Jan 2019 19:53:35 GMT</pubDate>
    <dc:creator>chrisyounger</dc:creator>
    <dc:date>2019-01-31T19:53:35Z</dc:date>
    <item>
      <title>Why are log events indexed from GZip archive with a specified source type missing extracted fields?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439122#M124939</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;

&lt;P&gt;I have a custom source type (papertrail) for a tab-delimited source and have verified that it works correctly. I manually imported a local directory holding a month's worth of log data directly from a .tsv file; see the screenshot below:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="good"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/6473iE03999D925D60554/image-size/large?v=v2&amp;amp;px=999" role="button" title="good" alt="good" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Since that worked as expected, I then set up an AWS SQS-Based S3 import to move Papertrail's nightly archives into Splunk automatically. These archives, however, are gzipped daily files. Splunk does index the gzip file and reports the source type as papertrail, &lt;EM&gt;but&lt;/EM&gt; the fields aren't extracted as they are in the first picture. Any ideas?&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="bad"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/6474i84C4DA48720D88FE/image-size/large?v=v2&amp;amp;px=999" role="button" title="bad" alt="bad" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jan 2019 16:08:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439122#M124939</guid>
      <dc:creator>statmuse</dc:creator>
      <dc:date>2019-01-31T16:08:16Z</dc:date>
    </item>
    <item>
      <title>Re: Why are log events indexed from GZip archive with a specified source type missing extracted fields?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439123#M124940</link>
      <description>&lt;P&gt;Can you please share the &lt;CODE&gt;props.conf&lt;/CODE&gt; settings for the sourcetype &lt;CODE&gt;papertrail&lt;/CODE&gt;?&lt;/P&gt;

&lt;P&gt;cheers, MuS&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jan 2019 19:01:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439123#M124940</guid>
      <dc:creator>MuS</dc:creator>
      <dc:date>2019-01-31T19:01:55Z</dc:date>
    </item>
    <item>
      <title>Re: Why are log events indexed from GZip archive with a specified source type missing extracted fields?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439124#M124941</link>
      <description>&lt;P&gt;Yes, here you go:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[papertrail]
BREAK_ONLY_BEFORE_DATE =
DATETIME_CONFIG =
FIELD_DELIMITER = tab
FIELD_NAMES = id, generated_at, received_at, source_id, source_name, source_ip, facility_name, severity_name, program, message
HEADER_FIELD_DELIMITER = tab
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = papertrail archive format
disabled = false
pulldown_type = 1
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 31 Jan 2019 19:06:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439124#M124941</guid>
      <dc:creator>statmuse</dc:creator>
      <dc:date>2019-01-31T19:06:55Z</dc:date>
    </item>
    <item>
      <title>Re: Why are log events indexed from GZip archive with a specified source type missing extracted fields?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439125#M124942</link>
      <description>&lt;P&gt;Hi @statmuse &lt;/P&gt;

&lt;P&gt;I am not sure why you are seeing this, because I can't spot anything wrong in that config.&lt;/P&gt;

&lt;P&gt;Are you using a sourcetype rename or something like that? Anything unusual in your indexes.conf?&lt;/P&gt;

&lt;P&gt;That said, if I had this problem I would personally fix it by moving away from &lt;CODE&gt;INDEXED_EXTRACTIONS&lt;/CODE&gt; and just doing regular search-time extractions. Splunk's strength is in search-time extractions, so I always use those where possible. If you need a hand with this, I would be happy to help.&lt;/P&gt;

&lt;P&gt;All the best&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jan 2019 19:33:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439125#M124942</guid>
      <dc:creator>chrisyounger</dc:creator>
      <dc:date>2019-01-31T19:33:57Z</dc:date>
    </item>
    <item>
      <title>Re: Why are log events indexed from GZip archive with a specified source type missing extracted fields?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439126#M124943</link>
      <description>&lt;P&gt;Can you set the search mode to verbose and check whether the fields show up?&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jan 2019 19:43:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439126#M124943</guid>
      <dc:creator>ssadanala1</dc:creator>
      <dc:date>2019-01-31T19:43:11Z</dc:date>
    </item>
    <item>
      <title>Re: Why are log events indexed from GZip archive with a specified source type missing extracted fields?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439127#M124944</link>
      <description>&lt;P&gt;Nothing out of the ordinary in indexes.conf. I'm thinking that it might have something to do with the AWS add-on:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;If you want to ingest custom logs other than the natively supported AWS log types, you must set s3_file_decoder = CustomLogs. This lets you ingest custom logs into Splunk but does not parse the data. To process custom logs into meaningful events, you need to perform additional configurations in props.conf and transforms.conf to parse the collected data to meet your specific requirements.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Maybe I just need to find out what that additional configuration is?&lt;/P&gt;

&lt;P&gt;Is there a way to do search-time extractions that still use the papertrail sourcetype, so that I don't need to rename fields every time and can use the column names already defined in the sourcetype?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 23:02:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439127#M124944</guid>
      <dc:creator>statmuse</dc:creator>
      <dc:date>2020-09-29T23:02:02Z</dc:date>
    </item>
    <item>
      <title>Re: Why are log events indexed from GZip archive with a specified source type missing extracted fields?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439128#M124945</link>
      <description>&lt;P&gt;Yes, you don't need to change the sourcetype.&lt;/P&gt;

&lt;P&gt;Just add an extraction like this (expand it out to cover the remaining fields)&lt;/P&gt;
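
&lt;P&gt;As a rough sketch of what the expanded form might look like, assuming the ten column names from the &lt;CODE&gt;FIELD_NAMES&lt;/CODE&gt; setting posted earlier in the thread, that columns are separated by single tabs, that none of the first nine columns is empty, and that &lt;CODE&gt;message&lt;/CODE&gt; is the last column (none of this is confirmed against real Papertrail data):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[papertrail]
# Assumption: ten tab-separated columns in FIELD_NAMES order, message last.
# Change + to * for any column that can be empty.
EXTRACT-all = ^(?&amp;lt;id&amp;gt;[^\t]+)\t(?&amp;lt;generated_at&amp;gt;[^\t]+)\t(?&amp;lt;received_at&amp;gt;[^\t]+)\t(?&amp;lt;source_id&amp;gt;[^\t]+)\t(?&amp;lt;source_name&amp;gt;[^\t]+)\t(?&amp;lt;source_ip&amp;gt;[^\t]+)\t(?&amp;lt;facility_name&amp;gt;[^\t]+)\t(?&amp;lt;severity_name&amp;gt;[^\t]+)\t(?&amp;lt;program&amp;gt;[^\t]+)\t(?&amp;lt;message&amp;gt;.*)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The shorter pattern below, covering the first four columns, is the starting point for that expansion.&lt;/P&gt;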

&lt;P&gt;&lt;CODE&gt;EXTRACT-all = (?&amp;lt;id&amp;gt;[^\t]+)\t(?&amp;lt;generated_at&amp;gt;[^\t]+)\t(?&amp;lt;received_at&amp;gt;[^\t]+)\t(?&amp;lt;source_id&amp;gt;[^\t]+)\t&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jan 2019 19:53:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439128#M124945</guid>
      <dc:creator>chrisyounger</dc:creator>
      <dc:date>2019-01-31T19:53:35Z</dc:date>
    </item>
    <item>
      <title>Re: Why are log events indexed from GZip archive with a specified source type missing extracted fields?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439129#M124946</link>
      <description>&lt;P&gt;After some more research, I came across &lt;A href="https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions"&gt;https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions&lt;/A&gt; (shared by a colleague), which also advocates search-time extraction over indexed extraction. Going with your suggestion! Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 31 Jan 2019 21:41:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Why-are-log-events-indexed-from-GZip-archive-with-a-specified/m-p/439129#M124946</guid>
      <dc:creator>statmuse</dc:creator>
      <dc:date>2019-01-31T21:41:24Z</dc:date>
    </item>
  </channel>
</rss>

