<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1 in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160687#M32577</link>
    <description>&lt;P&gt;Since you want to use the first part of the filename as $1, you need to change your regex. Your current one is:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;REGEX=\w+(?=_\w+_\w+\.log$)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You want the \w+ characters that precede _number_number.log, so you have to make them a capturing group (without one, $1 has nothing to refer to), like so:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;REGEX=(\w+)(?=_\w+_\w+\.log$)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Now $1 is ABCDE123, XYZABC456, and so on.&lt;/P&gt;</description>
    <pubDate>Mon, 28 Sep 2020 17:52:24 GMT</pubDate>
    <dc:creator>jrodman</dc:creator>
    <dc:date>2020-09-28T17:52:24Z</dc:date>
    <item>
      <title>Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160682#M32572</link>
      <description>&lt;P&gt;Apologies in advance: my attempt to explain my problem will most likely show my inexperience with Splunk.&lt;/P&gt;

&lt;P&gt;To start off, I'm running Splunk 6 on Red Hat Enterprise Linux 5.&lt;/P&gt;

&lt;P&gt;I'm attempting to ingest many application log files into Splunk whose filenames contain the application subsystem, a date and time string, and a process ID. Suffice it to say, each log is created once by a job triggered from the application and is never used by any subsequent job.&lt;/P&gt;

&lt;P&gt;I've based my research on suggestions from blogs and other posts in this forum such as:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://blogs.splunk.com/2010/02/11/sourcetypes-gone-wild/" target="_blank"&gt;http://blogs.splunk.com/2010/02/11/sourcetypes-gone-wild/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="http://blogs.splunk.com/2010/02/11/sourcetypes-gone-wild/" target="_blank"&gt;http://answers.splunk.com/answers/25560/field-names-from-file-including-source-and-host.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="http://blogs.splunk.com/2010/02/11/sourcetypes-gone-wild/" target="_blank"&gt;http://answers.splunk.com/answers/83619/source-sourcetype-defined-by-folder-names.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Here is a sample of the log files I want to ingest:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;/app/prod/app_1/logs/ABCDE123_20141013163738_24772.log
/app/prod/app_1/logs/XYZABC456_20141013093007_16799.log
/app/prod/app_1/logs/EFGHIJK789_20141013093007_16799.log
/app/prod/app_1/logs/123ABC_20141013093007_16799.log
&lt;/CODE&gt;&lt;/PRE&gt;
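
&lt;P&gt;Each sample name has the form JOBNAME_YYYYMMDDHHMMSS_PID.log (subsystem or job name, date-time string, process ID). As an illustration only, and not part of the Splunk configuration, the short Python sketch below splits the sample paths into those three parts. &lt;CODE&gt;FILENAME_RE&lt;/CODE&gt; and &lt;CODE&gt;split_log_name&lt;/CODE&gt; are hypothetical names, and the sketch assumes the timestamp is always 14 digits and the process ID is all digits, as in the samples.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import os
import re

# Hypothetical pattern for JOBNAME_YYYYMMDDHHMMSS_PID.log:
# group 1 = job/subsystem name, group 2 = 14-digit timestamp,
# group 3 = process ID. Assumes the layout of the sample files.
FILENAME_RE = re.compile(r"^(\w+)_(\d{14})_(\d+)\.log$")


def split_log_name(path):
    """Return (job, timestamp, pid) for a matching filename, else None."""
    m = FILENAME_RE.match(os.path.basename(path))
    return m.groups() if m else None


samples = [
    "/app/prod/app_1/logs/ABCDE123_20141013163738_24772.log",
    "/app/prod/app_1/logs/XYZABC456_20141013093007_16799.log",
    "/app/prod/app_1/logs/EFGHIJK789_20141013093007_16799.log",
    "/app/prod/app_1/logs/123ABC_20141013093007_16799.log",
]

for path in samples:
    print(split_log_name(path))
&lt;/CODE&gt;&lt;/PRE&gt;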

&lt;P&gt;On my universal forwarder, I have an inputs.conf file with the following entry:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///app/prod/app_1/logs/*.log]
disabled = false
followTail = 1
index = app_index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;On my indexer, I have a props.conf file with the following entry:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[source::/app/prod/app_1/logs/*.log]
TRANSFORMS-set_sourcetype_app_logs = set_sourcetype_app_logs
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Also on my indexer, I have a transforms.conf file with the following entry:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[set_sourcetype_app_logs]
DEST_KEY=MetaData:Sourcetype
SOURCE_KEY=MetaData:Source
REGEX=\w+(?=_\w+_\w+\.log$)
FORMAT=sourcetype::$1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I expect indexed logs to have a source like "&lt;STRONG&gt;/app/prod/app_1/logs/ABCDE123_20141013163738_24772.log&lt;/STRONG&gt;" and a sourcetype like "&lt;STRONG&gt;ABCDE123&lt;/STRONG&gt;".&lt;/P&gt;

&lt;P&gt;However, once the logs are ingested and indexed, a search shows that all of the data has the literal sourcetype '&lt;STRONG&gt;$1&lt;/STRONG&gt;' instead of the value extracted from the filename by the regex.&lt;/P&gt;

&lt;P&gt;Do I have a problem with my transforms.conf regex or is my configuration completely off the mark?&lt;/P&gt;

&lt;P&gt;Any help would be greatly appreciated. &lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
Bobby&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:51:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160682#M32572</guid>
      <dc:creator>bobmacks</dc:creator>
      <dc:date>2020-09-28T17:51:39Z</dc:date>
    </item>
    <item>
      <title>Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160683#M32573</link>
      <description>&lt;P&gt;Are you sure &lt;CODE&gt;(?=)&lt;/CODE&gt; acts as a capturing group? I think it's just a zero-width assertion that doesn't capture any text. Why not drop the &lt;CODE&gt;?=&lt;/CODE&gt;?&lt;/P&gt;

&lt;P&gt;I'm a little skittish about a sourcetype like 20141013093007_16799.log, though. That doesn't sound like a data format, which is what sourcetypes are; it sounds like the time of day at which the data was produced. I would typically want to call this data something like "app_1".&lt;/P&gt;

&lt;P&gt;Aside: you probably want to disable followTail. It's not really reasonable or safe, and Splunk only picks up new data anyway. followTail is only useful when first setting up a forwarder.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:51:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160683#M32573</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2020-09-28T17:51:42Z</dc:date>
    </item>
    <item>
      <title>Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160684#M32574</link>
      <description>&lt;P&gt;Make that an answer and I'll upvote it.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Oct 2014 21:58:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160684#M32574</guid>
      <dc:creator>sowings</dc:creator>
      <dc:date>2014-10-13T21:58:06Z</dc:date>
    </item>
    <item>
      <title>Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160685#M32575</link>
      <description>&lt;P&gt;Hi jrodman, &lt;/P&gt;

&lt;P&gt;Thanks for your feedback.&lt;BR /&gt;
Just to be clear, the sourcetype I wanted was "XYZABC456", not "20141013093007_16799.log".&lt;/P&gt;

&lt;P&gt;Regarding the regex, I tested it on &lt;A href="http://www.regexr.com" target="_blank"&gt;www.regexr.com&lt;/A&gt; and it seemed to work fine there. According to regexr, "(?=)" is a positive lookahead, so &lt;CODE&gt;"(?=_\w+_\w+\.log$)"&lt;/CODE&gt; looks for the pattern "_word_word.log", and the preceding &lt;CODE&gt;"\w+"&lt;/CODE&gt; matches the word before the lookahead pattern.&lt;/P&gt;
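
&lt;P&gt;A regex tester such as regexr shows the overall match, which is why the pattern looks correct there. What matters for &lt;CODE&gt;FORMAT=sourcetype::$1&lt;/CODE&gt; is whether the pattern defines a capturing group. The Python sketch below uses the standard &lt;CODE&gt;re&lt;/CODE&gt; module as a stand-in for Splunk's PCRE engine (both treat &lt;CODE&gt;(?=...)&lt;/CODE&gt; as a non-capturing lookahead) to show that the original REGEX matches ABCDE123 as group 0 but defines no group 1, which is consistent with the literal '$1' sourcetype described in the question.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import re

# The REGEX from the transforms.conf stanza in the question, unchanged.
lookahead_only = re.compile(r"\w+(?=_\w+_\w+\.log$)")

source = "/app/prod/app_1/logs/ABCDE123_20141013163738_24772.log"
m = lookahead_only.search(source)

# Group 0, the whole match, is what a regex tester highlights.
print(m.group(0))             # ABCDE123

# The lookahead is zero-width and non-capturing, so the pattern
# defines no numbered groups: there is no group 1 for $1 to use.
print(lookahead_only.groups)  # 0
print(m.groups())             # ()
&lt;/CODE&gt;&lt;/PRE&gt;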

&lt;P&gt;I did consider sticking with "app_1" at one point, but we have so many different job types (the examples are only a very small subset) that extracting the job name from the filename as the sourcetype would be more useful.&lt;/P&gt;

&lt;P&gt;Cheers,&lt;BR /&gt;
Bobby&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:52:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160685#M32575</guid>
      <dc:creator>bobmacks</dc:creator>
      <dc:date>2020-09-28T17:52:21Z</dc:date>
    </item>
    <item>
      <title>Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160686#M32576</link>
      <description>&lt;P&gt;Are you sure it's the answer to the question? I still can't tell.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Oct 2014 01:17:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160686#M32576</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2014-10-14T01:17:45Z</dc:date>
    </item>
    <item>
      <title>Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160687#M32577</link>
      <description>&lt;P&gt;Since you want to use the first part of the filename as $1, you need to change your regex. Your current one is:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;REGEX=\w+(?=_\w+_\w+\.log$)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You want the \w+ characters that precede _number_number.log, so you have to make them a capturing group (without one, $1 has nothing to refer to), like so:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;REGEX=(\w+)(?=_\w+_\w+\.log$)
&lt;/CODE&gt;&lt;/PRE&gt;
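
&lt;P&gt;As a quick check outside Splunk, the following Python sketch runs the corrected pattern against the four sample paths from the question and builds the string that &lt;CODE&gt;FORMAT=sourcetype::$1&lt;/CODE&gt; would be expected to produce. It uses Python's &lt;CODE&gt;re&lt;/CODE&gt; module as a stand-in for Splunk's PCRE engine; for this pattern the two should agree, but indexing a sample file remains the authoritative test.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import re

# Corrected REGEX: (\w+) is capturing group 1, and the lookahead
# still requires _\w+_\w+.log to follow it at the end of the path.
capturing = re.compile(r"(\w+)(?=_\w+_\w+\.log$)")

samples = [
    "/app/prod/app_1/logs/ABCDE123_20141013163738_24772.log",
    "/app/prod/app_1/logs/XYZABC456_20141013093007_16799.log",
    "/app/prod/app_1/logs/EFGHIJK789_20141013093007_16799.log",
    "/app/prod/app_1/logs/123ABC_20141013093007_16799.log",
]

for source in samples:
    m = capturing.search(source)
    # Mimic FORMAT=sourcetype::$1 by prefixing group 1.
    print("sourcetype::" + m.group(1))
&lt;/CODE&gt;&lt;/PRE&gt;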

&lt;P&gt;Now $1 is ABCDE123, XYZABC456, and so on.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 17:52:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160687#M32577</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2020-09-28T17:52:24Z</dc:date>
    </item>
    <item>
      <title>Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160688#M32578</link>
      <description>&lt;P&gt;Oh sorry, reading comprehension fail on my part.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Oct 2014 01:20:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160688#M32578</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2014-10-14T01:20:32Z</dc:date>
    </item>
    <item>
      <title>Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160689#M32579</link>
      <description>&lt;P&gt;Perfect! Worked like a charm. Thanks for your help!&lt;/P&gt;</description>
      <pubDate>Tue, 14 Oct 2014 04:19:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160689#M32579</guid>
      <dc:creator>bobmacks</dc:creator>
      <dc:date>2014-10-14T04:19:42Z</dc:date>
    </item>
    <item>
      <title>Re: Creating custom sourcetype from part of log filename results with field sourcetype=$1</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160690#M32580</link>
      <description>&lt;P&gt;Incidentally, if you wanted to use the entire regex match, you could have used $0, but I encourage the explicit capture-group approach.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Oct 2014 06:14:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Creating-custom-sourcetype-from-part-of-log-filename-results/m-p/160690#M32580</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2014-10-14T06:14:11Z</dc:date>
    </item>
  </channel>
</rss>

