<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Custom Logfile extract fields help in Splunk Dev</title>
    <link>https://community.splunk.com/t5/Splunk-Dev/Custom-Logfile-extract-fields-help/m-p/24301#M263</link>
    <description>&lt;P&gt;You do not need to program anything specific to get these fields out of your data. These look like SYSLOG style messages. It should be noted that Splunk recommends using the &lt;A href="http://docs.splunk.com/Documentation/Splunk/5.0.3/Knowledge/UnderstandandusetheCommonInformationModel"&gt;Common Information Model&lt;/A&gt; to standardize the naming convention for fields extracted from your data. You are not mandated to do this but it is a best practice recommendation.&lt;/P&gt;

&lt;P&gt;In your case, the following will extract most fields appropriately. If necessary, just rename them.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;^(?&amp;lt;date&amp;gt;\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(?&amp;lt;hostname&amp;gt;[a-zA-Z0-9]+)\s(?&amp;lt;message_type&amp;gt;\w+)\[(?&amp;lt;message_id&amp;gt;\d+)\]\:\s+(?&amp;lt;epoch&amp;gt;\d{10}\.\d{3})\s+\d+\s(?&amp;lt;src&amp;gt;\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(?&amp;lt;action&amp;gt;[A-Z_]+?)\/(?&amp;lt;status&amp;gt;\d{3})\s+(?&amp;lt;bytes_in&amp;gt;\d+)\s+(?&amp;lt;method&amp;gt;[A-Z]+)\s+(?&amp;lt;url&amp;gt;.+?)\s+(?&amp;lt;user&amp;gt;[a-z]+|\-)\s+(?&amp;lt;other&amp;gt;[A-Z_]+)/(?&amp;lt;http_user_agent&amp;gt;\w+|-)\s+(?&amp;lt;http_content_type&amp;gt;.+?)$&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;So, you could run a search using this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=squid | rex "^(?&amp;lt;date&amp;gt;\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(?&amp;lt;hostname&amp;gt;[a-zA-Z0-9]+)\s(?&amp;lt;message_type&amp;gt;\w+)\[(?&amp;lt;message_id&amp;gt;\d+)\]\:\s+(?&amp;lt;epoch&amp;gt;\d{10}\.\d{3})\s+\d+\s(?&amp;lt;src&amp;gt;\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(?&amp;lt;action&amp;gt;[A-Z_]+?)\/(?&amp;lt;status&amp;gt;\d{3})\s+(?&amp;lt;bytes_in&amp;gt;\d+)\s+(?&amp;lt;method&amp;gt;[A-Z]+)\s+(?&amp;lt;url&amp;gt;.+?)\s+(?&amp;lt;user&amp;gt;[a-z]+|\-)\s+(?&amp;lt;other&amp;gt;[A-Z_]+)/(?&amp;lt;http_user_agent&amp;gt;\w+|-)\s+(?&amp;lt;http_content_type&amp;gt;.+?)$"&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That will give you something like this:&lt;/P&gt;

&lt;P&gt;&lt;IMG src="http://splunk-base.splunk.com//storage/Untitled1400.png" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;Once you've confirmed that this is what you need, automate the extraction by navigating SplunkWeb to &lt;/P&gt;

&lt;P&gt;Manager &amp;gt;&amp;gt; Fields &amp;gt;&amp;gt; Field Extractions&lt;BR /&gt;
&lt;BR /&gt;Click "New"&lt;BR /&gt;
&lt;BR /&gt;Fill in the blanks&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;&lt;IMG src="http://splunk-base.splunk.com//storage/Untitled1401.png" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;And enjoy your automatic extractions. There are multiple ways to accomplish this, but this is the most straightforward.&lt;/P&gt;

&lt;P&gt;--gc&lt;/P&gt;</description>
    <pubDate>Mon, 05 Aug 2013 06:10:20 GMT</pubDate>
    <dc:creator>Gilberto_Castil</dc:creator>
    <dc:date>2013-08-05T06:10:20Z</dc:date>
    <item>
      <title>Custom Logfile extract fields help</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Custom-Logfile-extract-fields-help/m-p/24299#M261</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am very new to Splunk and a total greenhorn in regex. I have a log file with the following format:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Jul 31 12:23:32 BALTHAZAR squid[7415]: 1375237412.537     93 10.110.40.144 TCP_MISS/200 1214 GET somewebsite ftropea FIRST_UP_PARENT/content1 application/x-javascript
Jul 30 23:59:13 BALTHAZAR squid[7415]: 1375192753.517      0 10.110.40.113 TCP_DENIED/407 3646 GET somewebsite - NONE/- text/html
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;It is a firewall/proxy access.log. When I import the data I choose access.log as the type, but then I need to customize it, since Splunk gets only the date/time part right and treats everything else as the event.&lt;/P&gt;

&lt;P&gt;I would like to extract the following fields:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Date = Jul 31 12:23:32 
Servername = BALTHAZAR 
IP = 10.110.40.144 
Code= TCP_MISS/200 
RequestType = GET 
Website = somewebsite includes the http://
User = ftropea 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I also have to note that the username is sometimes empty and sometimes filled out.&lt;BR /&gt;
I used the inbuilt field extractor and could extract almost all of the fields above except the User.&lt;/P&gt;

&lt;P&gt;What I have so far is something like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(?:[^ \n]* ){3}(?P&amp;lt;_Servername_&amp;gt;[^ ]+)[^\.\n]*\.\d+\s+\d+\s+(?P&amp;lt;_IP_&amp;gt;[^ ]+)\s+(?P&amp;lt;_Code_&amp;gt;[^ ]+)\s+\d+\s+(?P&amp;lt;_RequestType_&amp;gt;[^ ]+)\s+(?P&amp;lt;_Website_&amp;gt;[^ ]+)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;and I think even this is not correct... Any idea what I could do?&lt;/P&gt;

&lt;P&gt;Or do I have to write my own app/plugin with a parser (PHP/C# or whatever) for this file?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Aug 2013 03:18:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Custom-Logfile-extract-fields-help/m-p/24299#M261</guid>
      <dc:creator>tomeki</dc:creator>
      <dc:date>2013-08-05T03:18:34Z</dc:date>
    </item>
    <item>
      <title>Re: Custom Logfile extract fields help</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Custom-Logfile-extract-fields-help/m-p/24300#M262</link>
      <description>&lt;P&gt;Nope, you don't need a special parser for this.&lt;BR /&gt;
I think you might be trying to do two things at once, so I'll address the extraction first.&lt;BR /&gt;
I notice you have line break characters in your regex, so it looks like you're trying to make a multiline event.&lt;BR /&gt;
First, that's not how you do that... so let's set that aside for a moment.&lt;/P&gt;

&lt;P&gt;Extracting fields the way you've begun will pull the fields out and make them available for you to use in searching. It will not populate your index with visible field=value pairs.&lt;/P&gt;

&lt;P&gt;Here is the extraction syntax you're looking for to pull the fields you've indicated:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(?i)(?P&amp;lt;DATE&amp;gt;\w+\s+\d+\s+\d+:\d+:\d+)\s+(?P&amp;lt;SERVERNAME&amp;gt;\w+[^ ]+)\s+\S+\s+\S+\s+\S+\s+(?P&amp;lt;IPADDRESS&amp;gt;\S+[^ ]+)\s+(?P&amp;lt;CODE&amp;gt;\S+[^ ]+)\s+\S+\s+(?P&amp;lt;REQUESTTYPE&amp;gt;\S+[^ ]+)\s+(?P&amp;lt;WEBSITE&amp;gt;\S+[^ ]+)\s+(?P&amp;lt;USER&amp;gt;\S+)&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That will work in the field extractor (which is creating a search time field extraction) or you can put it in your props.conf for the same effect preceded like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;EXTRACT-all_fields = (?i)(?P&amp;lt;DATE&amp;gt;\S+\s+\d+\s+\d+:\d+:\d+)\s+(?P&amp;lt;SERVERNAME&amp;gt;\S+[^ ]+)\s+\S+\s+\S+\s+\S+\s+(?P&amp;lt;IPADDRESS&amp;gt;\d+.\d+.\d+.\d+[^ ]+)\s+(?P&amp;lt;CODE&amp;gt;\S+[^ ]+)\s+\S+\s+(?P&amp;lt;REQUESTTYPE&amp;gt;\S+[^ ]+)\s+(?P&amp;lt;WEBSITE&amp;gt;\S+[^ ]+)\s+(?P&amp;lt;USER&amp;gt;\S+)\s+(?P&amp;lt;FILLER&amp;gt;\S+.[^ ]+)&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The second iteration has a slightly different regex. I took out the \w and replaced them with \S just for the heck of it... and I added an extra field to grab what's left at the end of the string, also just for the heck of it.&lt;/P&gt;

&lt;P&gt;If you are trying to create a multiline event with value pairs like your example, so that it looks more like a Windows log and the index looks like that too, that would be a whole different question.&lt;/P&gt;

&lt;P&gt;Ask it again separately.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Aug 2013 05:08:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Custom-Logfile-extract-fields-help/m-p/24300#M262</guid>
      <dc:creator>rsennett_splunk</dc:creator>
      <dc:date>2013-08-05T05:08:40Z</dc:date>
    </item>
    <item>
      <title>Re: Custom Logfile extract fields help</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Custom-Logfile-extract-fields-help/m-p/24301#M263</link>
      <description>&lt;P&gt;You do not need to program anything specific to get these fields out of your data. These look like SYSLOG style messages. It should be noted that Splunk recommends using the &lt;A href="http://docs.splunk.com/Documentation/Splunk/5.0.3/Knowledge/UnderstandandusetheCommonInformationModel"&gt;Common Information Model&lt;/A&gt; to standardize the naming convention for fields extracted from your data. You are not mandated to do this but it is a best practice recommendation.&lt;/P&gt;

&lt;P&gt;In your case, the following will extract most fields appropriately. If necessary, just rename them.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;^(?&amp;lt;date&amp;gt;\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(?&amp;lt;hostname&amp;gt;[a-zA-Z0-9]+)\s(?&amp;lt;message_type&amp;gt;\w+)\[(?&amp;lt;message_id&amp;gt;\d+)\]\:\s+(?&amp;lt;epoch&amp;gt;\d{10}\.\d{3})\s+\d+\s(?&amp;lt;src&amp;gt;\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(?&amp;lt;action&amp;gt;[A-Z_]+?)\/(?&amp;lt;status&amp;gt;\d{3})\s+(?&amp;lt;bytes_in&amp;gt;\d+)\s+(?&amp;lt;method&amp;gt;[A-Z]+)\s+(?&amp;lt;url&amp;gt;.+?)\s+(?&amp;lt;user&amp;gt;[a-z]+|\-)\s+(?&amp;lt;other&amp;gt;[A-Z_]+)/(?&amp;lt;http_user_agent&amp;gt;\w+|-)\s+(?&amp;lt;http_content_type&amp;gt;.+?)$&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;So, you could run a search using this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=squid | rex "^(?&amp;lt;date&amp;gt;\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(?&amp;lt;hostname&amp;gt;[a-zA-Z0-9]+)\s(?&amp;lt;message_type&amp;gt;\w+)\[(?&amp;lt;message_id&amp;gt;\d+)\]\:\s+(?&amp;lt;epoch&amp;gt;\d{10}\.\d{3})\s+\d+\s(?&amp;lt;src&amp;gt;\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(?&amp;lt;action&amp;gt;[A-Z_]+?)\/(?&amp;lt;status&amp;gt;\d{3})\s+(?&amp;lt;bytes_in&amp;gt;\d+)\s+(?&amp;lt;method&amp;gt;[A-Z]+)\s+(?&amp;lt;url&amp;gt;.+?)\s+(?&amp;lt;user&amp;gt;[a-z]+|\-)\s+(?&amp;lt;other&amp;gt;[A-Z_]+)/(?&amp;lt;http_user_agent&amp;gt;\w+|-)\s+(?&amp;lt;http_content_type&amp;gt;.+?)$"&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That will give you something like this:&lt;/P&gt;

&lt;P&gt;&lt;IMG src="http://splunk-base.splunk.com//storage/Untitled1400.png" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;Once you've confirmed that this is what you need, automate the extraction by navigating SplunkWeb to &lt;/P&gt;

&lt;P&gt;Manager &amp;gt;&amp;gt; Fields &amp;gt;&amp;gt; Field Extractions&lt;BR /&gt;
&lt;BR /&gt;Click "New"&lt;BR /&gt;
&lt;BR /&gt;Fill in the blanks&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;&lt;IMG src="http://splunk-base.splunk.com//storage/Untitled1401.png" alt="alt text" /&gt;&lt;/P&gt;

&lt;P&gt;And enjoy your automatic extractions. There are multiple ways to accomplish this, but this is the most straightforward.&lt;/P&gt;
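
&lt;P&gt;For reference, the same search-time extraction can also be kept in a configuration file rather than created through the UI. This is a minimal sketch, not a tested configuration: it assumes your events carry the sourcetype &lt;CODE&gt;squid&lt;/CODE&gt; used in the search above, the class name &lt;CODE&gt;squid_fields&lt;/CODE&gt; is only an illustrative label, and the file location in the comment is one common choice.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf, for example in $SPLUNK_HOME/etc/system/local/ (assumed location)
# The stanza name must match the sourcetype of your events (assumed: squid)
[squid]
# EXTRACT-&amp;lt;class&amp;gt; defines a search-time field extraction; squid_fields is a hypothetical class name
EXTRACT-squid_fields = ^(?&amp;lt;date&amp;gt;\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(?&amp;lt;hostname&amp;gt;[a-zA-Z0-9]+)\s(?&amp;lt;message_type&amp;gt;\w+)\[(?&amp;lt;message_id&amp;gt;\d+)\]\:\s+(?&amp;lt;epoch&amp;gt;\d{10}\.\d{3})\s+\d+\s(?&amp;lt;src&amp;gt;\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(?&amp;lt;action&amp;gt;[A-Z_]+?)\/(?&amp;lt;status&amp;gt;\d{3})\s+(?&amp;lt;bytes_in&amp;gt;\d+)\s+(?&amp;lt;method&amp;gt;[A-Z]+)\s+(?&amp;lt;url&amp;gt;.+?)\s+(?&amp;lt;user&amp;gt;[a-z]+|\-)\s+(?&amp;lt;other&amp;gt;[A-Z_]+)/(?&amp;lt;http_user_agent&amp;gt;\w+|-)\s+(?&amp;lt;http_content_type&amp;gt;.+?)$&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Either route should leave you with the same fields at search time, so pick whichever is easier to maintain in your environment.&lt;/P&gt;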

&lt;P&gt;--gc&lt;/P&gt;</description>
      <pubDate>Mon, 05 Aug 2013 06:10:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Custom-Logfile-extract-fields-help/m-p/24301#M263</guid>
      <dc:creator>Gilberto_Castil</dc:creator>
      <dc:date>2013-08-05T06:10:20Z</dc:date>
    </item>
  </channel>
</rss>

