<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: regex file names from path and/or url in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84155#M21475</link>
    <description>&lt;P&gt;2013-01-10 16:01:55,411 INFO  [1357833716802] [775] ts=2013-01-10T21:01:55Z aid=&lt;A href="http://access.auth.sp1.internal.net/data/Account/2189541263"&gt;http://access.auth.sp1.internal.net/data/Account/2189541263&lt;/A&gt; id=1357833716802 t=Encoder.Task.CreateProfileJob rt=77 c=1 tm="&amp;lt;ENCODER.TASK.CREATEPROFILEJOB xmlns=\"http://xml.sp1.internal.net/rmp/2.0/plugin/a039d6d6489548e1b27c99670c2de75c\"&amp;gt;&amp;lt;SOURCEFILES&amp;gt;&amp;lt;FILE&amp;gt;&amp;lt;URL&amp;gt;file://strg.cp03.internal.net/data/file/ingest/TestProcessFile.mov&amp;lt;/URL&amp;gt;&amp;lt;/FILE&amp;gt;&amp;lt;/SOURCEFILES&amp;gt;&amp;lt;/ENCODER.TASK.CREATEPROFILEJOB&amp;gt;&lt;/P&gt;</description>
    <pubDate>Fri, 11 Jan 2013 01:21:17 GMT</pubDate>
    <dc:creator>marquiselee</dc:creator>
    <dc:date>2013-01-11T01:21:17Z</dc:date>
    <item>
      <title>regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84146#M21466</link>
      <description>&lt;P&gt;I need to extract file names so that I can run transactions across many logs of different types.&lt;/P&gt;

&lt;P&gt;Some logs have full URLs - &lt;A href="http://www.test1.com/43/test.txt"&gt;http://www.test1.com/43/test.txt&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Some logs have only paths - /43/test.txt &lt;/P&gt;

&lt;P&gt;Some logs look like standard logs, and some are actually XML data dumps that were indexed as a "standard log" - &amp;lt;\url&amp;gt;&lt;A href="http://www.test1.com/43/test.txt"&gt;http://www.test1.com/43/test.txt&lt;/A&gt;&amp;lt;\/url&amp;gt;&lt;/P&gt;

&lt;P&gt;Sometimes the whole path may be enclosed in parentheses or quotes too - "/43/test.txt"&lt;/P&gt;

&lt;P&gt;The basic principle is that I need to extract file names (filename.ext).&lt;/P&gt;

&lt;P&gt;I don't have access to the file system and can only use "Extract Fields" in the web interface.&lt;/P&gt;

&lt;P&gt;Any thoughts? &lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2013 18:30:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84146#M21466</guid>
      <dc:creator>marquiselee</dc:creator>
      <dc:date>2013-01-10T18:30:17Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84147#M21467</link>
      <description>&lt;P&gt;You could do this in your search:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source=*test.txt
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;and it will find events from the &lt;CODE&gt;test.txt&lt;/CODE&gt; file, whether or not it has a URL or a path or nothing at all.&lt;BR /&gt;
If you really need a regular expression, you can even do that with the &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Regex"&gt;regex&lt;/A&gt; command.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;yoursearchhere | regex "yourregexhere"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I don't think you need to do any field extractions at all. But perhaps I misunderstood the question. If this doesn't work, can you post a few lines of your data?&lt;/P&gt;

&lt;P&gt;Are you talking about the actual name of the log file? If yes, then there is already a field extracted. Its name is &lt;CODE&gt;source&lt;/CODE&gt;. You don't need to do a "join" - the first search will work.&lt;/P&gt;

&lt;P&gt;Are you talking about a file name that is contained within your event data? If yes, then I need to see some of the data to help you with the field extraction.&lt;/P&gt;

&lt;P&gt;Finally, do you want to summarize the data based on the file name? If yes, then this should work:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;yoursearchhere source=*test.txt
| rex field=source "(?&amp;lt;filename&amp;gt;[^/]+)$"
| stats count by filename
&lt;/CODE&gt;&lt;/PRE&gt;
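
&lt;P&gt;If you want to sanity-check the "everything after the last slash" idea outside Splunk, here is a minimal Python sketch. It is only an illustration: the helper name last_segment and the first sample value are made up for this example, and Python's re module is standing in for rex.&lt;/P&gt;

```python
import re

# One or more non-slash characters anchored at the end of the string,
# i.e. whatever follows the last slash (or the whole string if no slash).
LAST_SEGMENT = re.compile(r"[^/]+$")


def last_segment(source):
    """Return the trailing path segment of source, or None if there is none."""
    match = LAST_SEGMENT.search(source)
    return match.group(0) if match else None


# Hypothetical source values, for illustration only.
samples = [
    "/var/log/app/test.txt",
    "http://www.test1.com/43/test.txt",
    "test.txt",
]
for value in samples:
    print(value, "gives", last_segment(value))
```

&lt;P&gt;A value that ends in a slash has no trailing segment, so the helper returns None for it.&lt;/P&gt;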

&lt;P&gt;Of course, you might need to modify the stats command and the initial search, etc.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2013 18:53:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84147#M21467</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2013-01-10T18:53:37Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84148#M21468</link>
      <description>&lt;P&gt;test.txt was only an example. There are thousands of files that are uniquely named but appear in different logs. The file names themselves aren't what's important, but in many cases they are the only thing I'll be able to join on.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2013 19:03:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84148#M21468</guid>
      <dc:creator>marquiselee</dc:creator>
      <dc:date>2013-01-10T19:03:37Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84150#M21470</link>
      <description>&lt;P&gt;You can use the rex command. This will find anything after a slash, then anything except a period, then the period and a three-character \w extension. &lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;your_search | rex field=_raw "/(?&amp;lt;filename&amp;gt;[^\.]*\.\w{3})"&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2013 19:38:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84150#M21470</guid>
      <dc:creator>alacercogitatus</dc:creator>
      <dc:date>2013-01-10T19:38:50Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84151#M21471</link>
      <description>&lt;P&gt;Thanks. This is heading in the right direction, but the paths are much longer than in my example and the directory structure is not uniform...&lt;/P&gt;

&lt;P&gt;/mnt/mezzanine/mezzanine/provider/business.doc&lt;BR /&gt;
or &lt;BR /&gt;
/pac/output/brand/media.mov&lt;BR /&gt;
or&lt;BR /&gt;
http://&lt;/P&gt;

&lt;P&gt;Also, some files have 4-letter extensions.&lt;/P&gt;

&lt;P&gt;If it helps, the file extension will always be followed only by a space or one of the following characters: ' " &amp;lt; &amp;gt; ()&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2013 21:37:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84151#M21471</guid>
      <dc:creator>marquiselee</dc:creator>
      <dc:date>2013-01-10T21:37:59Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84152#M21472</link>
      <description>&lt;P&gt;Is there a pattern within the event that can be used to identify the file name? What does an actual event look like?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2013 23:05:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84152#M21472</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2013-01-10T23:05:16Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84153#M21473</link>
      <description>&lt;P&gt;The source is not the file name I'm trying to extract. The various logs (sources) contain references to hundreds of thousands of files, so a log line may look like this...&lt;/P&gt;

&lt;P&gt;"2013-01-10 11:24:17,345 DEBUG [1357817043844] [649] 439 : FAILURE : 100% : Exception encountered in plugin [Encoder.Task.CreateJob]!  Plugin Terminated. Encode operations failed: 10102013-01-10T11:24:05-05:00Unable to open input file [/pac/output/lcvtv/testmedia.mov] : 4110"&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jan 2013 01:03:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84153#M21473</guid>
      <dc:creator>marquiselee</dc:creator>
      <dc:date>2013-01-11T01:03:50Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84154#M21474</link>
      <description>&lt;P&gt;No pattern, other than that it's obviously a file name: path/file.ext &lt;/P&gt;

&lt;P&gt;Here are a few examples, all from the same sourcetype:&lt;/P&gt;

&lt;P&gt;2013-01-10 16:02:27,033 DEBUG [1357833733497] [775] 404 : FAILURE : 100% : Exception encountered in plugin [Encoder.Task.CreateProfileJob]!  Plugin Terminated. Encode operations failed: 10102013-01-10T16:02:15-05:00Unable to open input file [/mnt/ops/file/ingest/TestProcessFile.mov] : 3951&lt;/P&gt;

&lt;P&gt;2013-01-10 16:02:17,601 DEBUG [1357833739023] [742] --- input #0: sourceFile=file://strg.cp03.internal.net/data/file/ingest/TestProcessFile.mov&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jan 2013 01:21:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84154#M21474</guid>
      <dc:creator>marquiselee</dc:creator>
      <dc:date>2013-01-11T01:21:07Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84155#M21475</link>
      <description>&lt;P&gt;2013-01-10 16:01:55,411 INFO  [1357833716802] [775] ts=2013-01-10T21:01:55Z aid=&lt;A href="http://access.auth.sp1.internal.net/data/Account/2189541263"&gt;http://access.auth.sp1.internal.net/data/Account/2189541263&lt;/A&gt; id=1357833716802 t=Encoder.Task.CreateProfileJob rt=77 c=1 tm="&amp;lt;ENCODER.TASK.CREATEPROFILEJOB xmlns=\"http://xml.sp1.internal.net/rmp/2.0/plugin/a039d6d6489548e1b27c99670c2de75c\"&amp;gt;&amp;lt;SOURCEFILES&amp;gt;&amp;lt;FILE&amp;gt;&amp;lt;URL&amp;gt;file://strg.cp03.internal.net/data/file/ingest/TestProcessFile.mov&amp;lt;/URL&amp;gt;&amp;lt;/FILE&amp;gt;&amp;lt;/SOURCEFILES&amp;gt;&amp;lt;/ENCODER.TASK.CREATEPROFILEJOB&amp;gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jan 2013 01:21:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84155#M21475</guid>
      <dc:creator>marquiselee</dc:creator>
      <dc:date>2013-01-11T01:21:17Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84156#M21476</link>
      <description>&lt;P&gt;Look, if you can't find a pattern that uniquely identifies the data you're after, then neither can Splunk. So what you need is simply to go through all the different encountered variants of filenames in your logs and find a common pattern that catches them all - or, failing that, a set of different patterns that catch them all separately.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jan 2013 09:35:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84156#M21476</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2013-01-11T09:35:23Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84157#M21477</link>
      <description>&lt;P&gt;&lt;CODE&gt;your_search | rex field=_raw "/(?&amp;lt;filename&amp;gt;[^\./]*\.\w{3,4})[\s'\"&amp;lt;&amp;gt;\(\)]"&lt;/CODE&gt; This should grab anything after the slash, with an extension 3 or 4 &lt;CODE&gt;\w&lt;/CODE&gt; in length, followed by the characters you described earlier.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jan 2013 13:45:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84157#M21477</guid>
      <dc:creator>alacercogitatus</dc:creator>
      <dc:date>2013-01-11T13:45:01Z</dc:date>
    </item>
    <item>
      <title>Re: regex file names from path and/or url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84158#M21478</link>
      <description>&lt;P&gt;Thanks,  I finally got something to work using your rex as the foundation and by specifying extensions.&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;[^/]/(?&amp;lt;filename&amp;gt;[\w-]+.(?:[A-Z]{3}|mpeg|mpg|mp4|mov|ism|ismv|isma|ts|flv|sami|scc|vtt|ttml))&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jan 2013 19:51:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/regex-file-names-from-path-and-or-url/m-p/84158#M21478</guid>
      <dc:creator>marquiselee</dc:creator>
      <dc:date>2013-01-15T19:51:14Z</dc:date>
    </item>
  </channel>
</rss>

