<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Multi-valued Index-time key extraction not working, please help! in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76818#M19436</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I've been struggling with Splunk for weeks now (and had Developer training!) and I still can't get it to do what I want it to do, so here begins the first of many questions....&lt;/P&gt;

&lt;P&gt;I'm attempting to build an app that does a single parse of some static data. Basically it's designed to read in lots of files and then using a dashboard, display the data in a meaningful way.&lt;/P&gt;

&lt;P&gt;As such I'm attempting to do Index-time field extraction, as I want the displays to be as fast as possible for the end user. I've tried this a thousand ways and I can't get it working &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;All of the data is in XML format, and a large chunk of it features multiple field values, which is where I'm getting stuck. I can extract multi-valued fields with no problem using REX, but the config files seem to refuse to do it. I've compiled the following example to show you what I mean. It uses just one file, but I'm having the same problem with every file I'm pulling in:&lt;/P&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nessus]
SHOULD_LINEMERGE = False
LINE_BREAKER = (?&amp;lt;=&amp;lt;/ReportHost&amp;gt;)([\r\n]+)
TRUNCATE = 0
TRANSFORMS-nessus_high_vulnerbility = nessus_high_vulnerbility
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nessus_high_vulnerbility]
REGEX = &amp;lt;ReportItem.*severity=\"3\".*pluginName=\"([^"]+)\"
FORMAT = nessus_high_vulnerbility::"$1"
LOOKAHEAD = 10000000000
WRITE_META = true
REPEAT_MATCH = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;fields.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nessus_high_vulnerbility]
INDEXED = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Example data&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;Report name="1.1.1.1"&amp;gt;
&amp;lt;ReportHost name="1.1.1.1"&amp;gt;&amp;lt;HostProperties&amp;gt;
&amp;lt;tag name="HOST_END"&amp;gt;Tue Nov 22 12:06:01 2011&amp;lt;/tag&amp;gt;
&amp;lt;tag name="system-type"&amp;gt;general-purpose&amp;lt;/tag&amp;gt;
&amp;lt;tag name="operating-system"&amp;gt;Linux Kernel 2.6.9-101.ELsmp on Red Hat Enterprise Linux ES release 4 (Nahant Update 9)&amp;lt;/tag&amp;gt;
&amp;lt;tag name="mac-address"&amp;gt;00:00:00:00:00:00&amp;lt;/tag&amp;gt;
&amp;lt;ReportItem port="1234" svc_name="snmp?" protocol="udp" severity="3" pluginID="51160" pluginName="SNMP Agent Default Community Name (public)" pluginFamily="SNMP"&amp;gt;
&amp;lt;/ReportItem&amp;gt;
&amp;lt;ReportItem port="0" svc_name="general" protocol="tcp" severity="3" pluginID="21157" pluginName="Unix Compliance Checks" pluginFamily="Policy Compliance"&amp;gt;
&amp;lt;/ReportItem&amp;gt;
&amp;lt;/ReportHost&amp;gt;
&amp;lt;/Report&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Now if I search for * it tells me that the &lt;CODE&gt;"nessus_high_vulnerbility"&lt;/CODE&gt; field has one result.&lt;/P&gt;

&lt;P&gt;But if I do the following search, the &lt;CODE&gt;"high_vulnerbility"&lt;/CODE&gt; field has 2 results, the correct number.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;* | rex "\&amp;lt;ReportItem.*severity=\"3\".*pluginName=\"(?&amp;lt;high_vulnerbility&amp;gt;[^\"]+)\"" max_match=100000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I've tried everything I can think of, been through the documentation a hundred times, and still can't figure it out. Please help!&lt;/P&gt;

&lt;P&gt;(PS, apologies if the above doesn't come out right, I'm struggling with getting Markdown to play nicely with the pasted code)&lt;/P&gt;</description>
    <pubDate>Mon, 01 Oct 2012 20:30:44 GMT</pubDate>
    <dc:creator>jonaubrey</dc:creator>
    <dc:date>2012-10-01T20:30:44Z</dc:date>
    <item>
      <title>Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76818#M19436</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I've been struggling with Splunk for weeks now (and had Developer training!) and I still can't get it to do what I want it to do, so here begins the first of many questions....&lt;/P&gt;

&lt;P&gt;I'm attempting to build an app that does a single parse of some static data. Basically it's designed to read in lots of files and then using a dashboard, display the data in a meaningful way.&lt;/P&gt;

&lt;P&gt;As such I'm attempting to do Index-time field extraction, as I want the displays to be as fast as possible for the end user. I've tried this a thousand ways and I can't get it working &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;All of the data is in XML format, and a large chunk of it features multiple field values, which is where I'm getting stuck. I can extract multi-valued fields with no problem using REX, but the config files seem to refuse to do it. I've compiled the following example to show you what I mean. It uses just one file, but I'm having the same problem with every file I'm pulling in:&lt;/P&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nessus]
SHOULD_LINEMERGE = False
LINE_BREAKER = (?&amp;lt;=&amp;lt;/ReportHost&amp;gt;)([\r\n]+)
TRUNCATE = 0
TRANSFORMS-nessus_high_vulnerbility = nessus_high_vulnerbility
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nessus_high_vulnerbility]
REGEX = &amp;lt;ReportItem.*severity=\"3\".*pluginName=\"([^"]+)\"
FORMAT = nessus_high_vulnerbility::"$1"
LOOKAHEAD = 10000000000
WRITE_META = true
REPEAT_MATCH = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;fields.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nessus_high_vulnerbility]
INDEXED = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Example data&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;Report name="1.1.1.1"&amp;gt;
&amp;lt;ReportHost name="1.1.1.1"&amp;gt;&amp;lt;HostProperties&amp;gt;
&amp;lt;tag name="HOST_END"&amp;gt;Tue Nov 22 12:06:01 2011&amp;lt;/tag&amp;gt;
&amp;lt;tag name="system-type"&amp;gt;general-purpose&amp;lt;/tag&amp;gt;
&amp;lt;tag name="operating-system"&amp;gt;Linux Kernel 2.6.9-101.ELsmp on Red Hat Enterprise Linux ES release 4 (Nahant Update 9)&amp;lt;/tag&amp;gt;
&amp;lt;tag name="mac-address"&amp;gt;00:00:00:00:00:00&amp;lt;/tag&amp;gt;
&amp;lt;ReportItem port="1234" svc_name="snmp?" protocol="udp" severity="3" pluginID="51160" pluginName="SNMP Agent Default Community Name (public)" pluginFamily="SNMP"&amp;gt;
&amp;lt;/ReportItem&amp;gt;
&amp;lt;ReportItem port="0" svc_name="general" protocol="tcp" severity="3" pluginID="21157" pluginName="Unix Compliance Checks" pluginFamily="Policy Compliance"&amp;gt;
&amp;lt;/ReportItem&amp;gt;
&amp;lt;/ReportHost&amp;gt;
&amp;lt;/Report&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Now if I search for * it tells me that the &lt;CODE&gt;"nessus_high_vulnerbility"&lt;/CODE&gt; field has one result.&lt;/P&gt;

&lt;P&gt;But if I do the following search, the &lt;CODE&gt;"high_vulnerbility"&lt;/CODE&gt; field has 2 results, the correct number.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;* | rex "\&amp;lt;ReportItem.*severity=\"3\".*pluginName=\"(?&amp;lt;high_vulnerbility&amp;gt;[^\"]+)\"" max_match=100000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I've tried everything I can think of, been through the documentation a hundred times, and still can't figure it out. Please help!&lt;/P&gt;

&lt;P&gt;(PS, apologies if the above doesn't come out right, I'm struggling with getting Markdown to play nicely with the pasted code)&lt;/P&gt;</description>
      <pubDate>Mon, 01 Oct 2012 20:30:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76818#M19436</guid>
      <dc:creator>jonaubrey</dc:creator>
      <dc:date>2012-10-01T20:30:44Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76819#M19437</link>
      <description>&lt;P&gt;I fixed your formatting a bit - please check that it came out as you originally intended. Code blocks should be indented with 4 spaces at the beginning of the line in order to be correctly interpreted.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Oct 2012 21:34:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76819#M19437</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2012-10-01T21:34:55Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76820#M19438</link>
      <description>&lt;P&gt;First off, I highly doubt you really want to use index-time field extractions unless you really, really know what you are doing and why. Index-time extractions will in fact most often decrease performance rather than increase it. Indexed fields do not work the same way as they do in traditional RDBMSs, so if you're trying to apply that kind of thinking to Splunk, it won't carry over. Use search-time field extractions: the performance is better, and they make Splunk's behaviour less confusing and more flexible. So, I would advise you to change your TRANSFORMS directive in props.conf to a REPORT directive instead.&lt;/P&gt;

&lt;P&gt;That said, I think the issue here is that Splunk will match your regex only once unless you specify &lt;CODE&gt;MV_ADD = true&lt;/CODE&gt;, which makes Splunk continue looking for matches in the event even after it's found the first one. &lt;CODE&gt;MV_ADD&lt;/CODE&gt; is only valid for search-time extractions, so you should consider using that kind instead...did I make myself clear enough on what kind of extraction you should be using? &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;As a sidenote, I'm assuming you've seen that there's a Nessus app for Splunk? Don't know if it supports the XML report format though. &lt;A href="http://splunk-base.splunk.com/apps/52460/nessus-in-splunk"&gt;http://splunk-base.splunk.com/apps/52460/nessus-in-splunk&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Oct 2012 21:48:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76820#M19438</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2012-10-01T21:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76821#M19439</link>
      <description>&lt;P&gt;Many thanks for your reply.&lt;/P&gt;

&lt;P&gt;Yes I've seen the "Nessus In Splunk" app, but it relies on non-XML format, and I'm attempting to standardise all of the outputs from various tools to a single format, XML being the most common.&lt;/P&gt;

&lt;P&gt;Unfortunately I was attempting to do search-time extractions previously and it failed, which is why I swapped over to Index-time extractions.&lt;/P&gt;

&lt;P&gt;Just to confirm, I've changed my config files back to how I had them originally, but with the same result. I've posted them below:&lt;/P&gt;</description>
      <pubDate>Tue, 02 Oct 2012 09:18:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76821#M19439</guid>
      <dc:creator>jonaubrey</dc:creator>
      <dc:date>2012-10-02T09:18:00Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76822#M19440</link>
      <description>&lt;P&gt;Props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nessus]
SHOULD_LINEMERGE = False
LINE_BREAKER = (?&amp;lt;=&amp;lt;/ReportHost&amp;gt;)([\r\n]+)
TRUNCATE = 0
REPORT-nessus_high_vulnerbility = nessus_high_vulnerbility
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nessus_high_vulnerbility]
REGEX = &amp;lt;ReportItem.*severity=\"3\".*pluginName=\"([^"]+)\"
FORMAT = nessus_high_vulnerbility::"$1"
LOOKAHEAD = 10000000000
WRITE_META = true
MV_ADD = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Any other ideas?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Oct 2012 09:18:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76822#M19440</guid>
      <dc:creator>jonaubrey</dc:creator>
      <dc:date>2012-10-02T09:18:25Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76823#M19441</link>
      <description>&lt;P&gt;That looks pretty OK. What are the current results?&lt;/P&gt;

&lt;P&gt;btw, &lt;CODE&gt;WRITE_META&lt;/CODE&gt; is only valid for index-time extractions. As such it should just be ignored in your current config anyway, but just to simplify things you might as well remove it.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Oct 2012 09:25:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76823#M19441</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2012-10-02T09:25:22Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76824#M19442</link>
      <description>&lt;P&gt;Same as before, the REX extraction pulls out two values, but the transforms extraction only pulls out a single value, "Unix Compliance Checks".&lt;/P&gt;</description>
      <pubDate>Tue, 02 Oct 2012 09:28:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76824#M19442</guid>
      <dc:creator>jonaubrey</dc:creator>
      <dc:date>2012-10-02T09:28:23Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76825#M19443</link>
      <description>&lt;P&gt;Ah! One idea - since the information is represented as key=value pairs, you might be hitting some issues with Splunk's default key=value extraction mechanism. Basically, Splunk tries to be smart about generating fields and their values automatically when it sees stuff delimited by &lt;CODE&gt;=&lt;/CODE&gt; signs, using the left-hand side as the field name and the right-hand side as the value. I don't believe this extraction has &lt;CODE&gt;MV_ADD = true&lt;/CODE&gt; set.&lt;/P&gt;

&lt;P&gt;Try setting &lt;CODE&gt;KV_MODE = none&lt;/CODE&gt; in your &lt;CODE&gt;props.conf&lt;/CODE&gt; settings. This will ensure that automatic key/value extraction is not performed for that stanza.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Oct 2012 09:34:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76825#M19443</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2012-10-02T09:34:55Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76826#M19444</link>
      <description>&lt;P&gt;Great idea!&lt;/P&gt;

&lt;P&gt;Unfortunately, it didn't change anything. I'm still only extracting a single value.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Oct 2012 10:07:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76826#M19444</guid>
      <dc:creator>jonaubrey</dc:creator>
      <dc:date>2012-10-02T10:07:01Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76827#M19445</link>
      <description>&lt;P&gt;I've changed the extraction back to an index-time extraction and run "walklex" against the index. It shows only a single value in the index instead of multiple values, so something definitely isn't getting pulled out right.&lt;/P&gt;

&lt;P&gt;Interestingly, the TRANSFORMS extraction pulls out the value "SNMP Agent Default Community Name (public)" and the REPORT extraction pulls out the value "Unix Compliance Checks", even though it's the same REGEX. I guess Splunk is discarding all but one entry, and depending on whether it's a search-time or index-time extraction, it keeps either the first or the last entry.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Oct 2012 11:17:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76827#M19445</guid>
      <dc:creator>jonaubrey</dc:creator>
      <dc:date>2012-10-02T11:17:52Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76828#M19446</link>
      <description>&lt;P&gt;Would you mind changing the &lt;CODE&gt;.*&lt;/CODE&gt; in your regex to a non-greedy &lt;CODE&gt;.*?&lt;/CODE&gt; and seeing if that makes a difference?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Oct 2012 16:22:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76828#M19446</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2012-10-17T16:22:17Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-valued Index-time key extraction not working, please help!</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76829#M19447</link>
      <description>&lt;P&gt;Interestingly, when I replace your transforms.conf with the following:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[getPlugin]
REGEX = severity=\"3\".*?pluginName=\"([^\"]+)
FORMAT = pluginName::$1
MV_ADD = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I get a multi-value field pluginName with 2 values SNMP &amp;amp; UNIX.  So it's something to do with the extended regex you're using.&lt;/P&gt;

&lt;P&gt;To be honest, I'd be hesitant to use the regex to filter data. Instead, I'd aim to extract all the fields and then filter using Splunk's native search capabilities.  You never know when you might need to search using different criteria, and by hard-coding your results you limit that flexibility.&lt;/P&gt;
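
&lt;P&gt;A rough sketch of what "extract everything, then filter in the search" could look like. I haven't tested it against this data. The &lt;CODE&gt;sourcetype=nessus&lt;/CODE&gt; term assumes the &lt;CODE&gt;[nessus]&lt;/CODE&gt; stanza in props.conf is the sourcetype name, &lt;CODE&gt;item&lt;/CODE&gt; is just a scratch field name I made up, and the regex assumes &lt;CODE&gt;severity&lt;/CODE&gt; always appears before &lt;CODE&gt;pluginName&lt;/CODE&gt; within a ReportItem, as it does in the sample:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=nessus
| rex max_match=0 "severity=\"(?&amp;lt;severity&amp;gt;\d+)\".*?pluginName=\"(?&amp;lt;pluginName&amp;gt;[^\"]+)\""
| eval item=mvzip(severity, pluginName, "|")
| mvexpand item
| eval severity=mvindex(split(item, "|"), 0)
| eval pluginName=mvindex(split(item, "|"), 1)
| where severity="3"
| table host severity pluginName
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The idea is that &lt;CODE&gt;mvzip&lt;/CODE&gt; keeps each severity paired with its plugin name before &lt;CODE&gt;mvexpand&lt;/CODE&gt; splits them into separate rows, so changing the &lt;CODE&gt;where&lt;/CODE&gt; clause is all it takes to look at a different severity. It would misbehave if a plugin name ever contained a &lt;CODE&gt;|&lt;/CODE&gt; character, so pick a different delimiter if that's a possibility.&lt;/P&gt;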

&lt;P&gt;As an aside - the XML as written is broken.  The HostProperties tag doesn't seem to be closed.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Oct 2012 16:21:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Multi-valued-Index-time-key-extraction-not-working-please-help/m-p/76829#M19447</guid>
      <dc:creator>ahattrell_splun</dc:creator>
      <dc:date>2012-10-30T16:21:55Z</dc:date>
    </item>
  </channel>
</rss>

