Splunk Search

Why am I unable to extract multivalue fields from XML events with my current props.conf and transforms.conf?

Contributor

I am trying to extract multivalue fields from XML events by using transforms.conf and props.conf.

<Event><System><Provider Name="Netapp-Security-Auditing"/><EventID>4624</EventID><EventName>Logon Attempt</EventName><Version>101.1</Version><Source>CIFS</Source><Level>0</Level><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><Result>Audit Success</Result><TimeCreated SystemTime="2016-02-02T15:47:18.408488000Z"/><Correlation/><Channel>Security</Channel><Computer>CN</Computer><Security/></System><EventData><Data Name="IpAddress" IPVersion="4">xx.xx.xx.xx</Data><Data Name="IpPort">39963</Data><Data Name="TargetUserSid">UID</Data><Data Name="TargetUserName">UN</Data><Data Name="TargetUserIsLocal">false</Data><Data Name="TargetDomainName">US</Data><Data Name="AuthenticationPackageName">NTLM_V1</Data><Data Name="LogonType">3</Data></EventData></Event>

props.conf

[parse_xml]
TIME_PREFIX=SystemTime
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
KV_MODE=XML
TRANSFORMS-null1 = null_queue_nasxml_filter
SEDCMD-fillnull = s/><\/Data>/>NoValue<\/Data>/g
REPORT-xmlext = xml-extr

Transforms.conf

[null_queue_nasxml_filter]
REGEX = ^<(Events\sxmlns|/Events>)
DEST_KEY = queue
FORMAT = nullQueue
#MV_ADD = true

[xml-extr]
#REGEX = <(\w+)>([^>]*)</\1>
REGEX = <Data Name="([^>]+)">([^<]+)</Data>
FORMAT = $1::$2
MV_ADD = true
#REPEAT_MATCH = true (tried with and without this option)

I have used the above conf settings, but they are not extracting the multivalue fields enclosed within tags like TargetDomainName.
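As a sanity check outside Splunk, the [xml-extr] REGEX can be run repeatedly against the sample event with Python's `re.findall` (a rough simulation only; Splunk's REPORT extraction is not literally `re.findall`). It does match every Data element, but notice that the Name capture `[^>]+` does not stop at the closing quote, so it swallows extra attributes such as IPVersion:

```python
import re

# Abbreviated EventData portion of the sample event from the question.
event = ('<EventData><Data Name="IpAddress" IPVersion="4">xx.xx.xx.xx</Data>'
         '<Data Name="IpPort">39963</Data>'
         '<Data Name="TargetDomainName">US</Data></EventData>')

# The REGEX from the [xml-extr] transform, unchanged.
pattern = re.compile(r'<Data Name="([^>]+)">([^<]+)</Data>')

matches = pattern.findall(event)
for key, value in matches:
    print(key, '=', value)
# The first key comes out as 'IpAddress" IPVersion="4' because [^>]+
# runs past the closing quote of the Name attribute.
```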

What am I missing here?


Influencer

Hi. It looks like you are dealing with NetApp audit data - I am currently working with it as well, so please do get in touch if you want to compare notes, because it's tricky (and please let me know if you have an answer to this: https://answers.splunk.com/answers/350252/netapp-xml-audit-data-makes-the-file-monitor-stop.html)

I am currently using the following props and transforms. For your exact problem, look at [extract_xml_netapp_1]. The _KEY_1/_VAL_1 syntax makes Splunk keep matching, so it will extract all of the Data Name= fields. Have a look at the REGEX section in http://docs.splunk.com/Documentation/Splunk/latest/Admin/Transformsconf

(Note that I am abbreviating the source name, as we have to index the rolled log files rather than audit_last.xml - see my open question above for why!)

props.conf

[xml_netapp]
DATETIME_CONFIG = /etc/datetime.xml
MAX_TIMESTAMP_LOOKAHEAD = 200
TIME_PREFIX = \<TimeCreated\sSystemTime="
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%Z
LINE_BREAKER =  (<Event>)
SHOULD_LINEMERGE = false
TRANSFORMS-1_source = abbreviate_netapp_audit_source
TRANSFORMS-filter = filterdiscard_xml_netapp
REPORT-eventdata = extract_xml_netapp_1, extract_xml_netapp_2
KV_MODE = xml

transforms.conf

[extract_xml_netapp_1]
REGEX = Name="(?<_KEY_1>\w+?)".*?>(?<_VAL_1>[^<]+)    

[extract_xml_netapp_2]
REGEX = Name="SubjectUnix" Uid="(?<SubjectUnix_Uid>\w+?)" Gid="(?<SubjectUnix_Gid>\w+?)" Local="(?<SubjectUnix_Local>\w+?)"    

[filterdiscard_xml_netapp]
REGEX = ^<\/|^<Events
DEST_KEY = queue
FORMAT = nullQueue    

[abbreviate_netapp_audit_source]
SOURCE_KEY = MetaData:Source
REGEX = \/([^_]+)_([^_]+)
DEST_KEY = MetaData:Source
FORMAT = source::/$1_$2.xml


Explorer

Hello Jplumsdaine22,
Thanks for sharing this.

Would you mind helping me understand how to use your solution? Where should I create those files? In /etc/apps/customNetapp (something like that)? Do I have to have those files in place before importing the XML, or can I add them later?

Thanks


Influencer

The REPORT stanza is a search-time setting, so you can add that at any time. The TRANSFORMS I am using are index-time settings, so they will need to be in place before you start indexing the data.

You have some options regarding where you place the files (http://docs.splunk.com/Documentation/Splunk/6.4.0/Admin/Wheretofindtheconfigurationfiles). For example, you could just create them in $SPLUNK_HOME/etc/system/local/ on your HF and indexer. But as stated above, best practice is to create an app and put the props.conf and transforms.conf files in its 'default' directory.
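A minimal sketch of that app layout (the app name netapp_audit is hypothetical - pick your own; SPLUNK_HOME falls back to ./splunk here so the sketch runs anywhere):

```shell
# Minimal Splunk app skeleton for the props/transforms in this thread.
# "netapp_audit" is a made-up app name; adjust SPLUNK_HOME to your install.
SPLUNK_HOME="${SPLUNK_HOME:-./splunk}"
APP="$SPLUNK_HOME/etc/apps/netapp_audit"

mkdir -p "$APP/default"
# Paste the stanzas from this thread into these two files:
touch "$APP/default/props.conf" "$APP/default/transforms.conf"

ls "$APP/default"
```

Then restart Splunk (or deploy the app) so the new configuration is read.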

Have a read here for more info, or google 'make a splunk app' 🙂
http://dev.splunk.com/view/get-started/SP-CAAAESC


Contributor

Thanks. Do these settings only work on the search head to extract properly, or will they also work if I put them on a heavy forwarder?


Influencer

** updated **

All the index-time settings need to be on the HF and indexer, and all the search-time settings need to be on the search head. However, best practice would be to create an app with the one set of files and deploy that to all three.

Thanks @Jeremiah for the correction
