Hi,
I need help with field extraction. I have a sourcetype that carries compliance-related information for our use case. The data has a field named "Text", and its contents arrive in many variations; two of them are shown below. I need a regex-based extraction that detects the fields inside the tags and parses them out. The data will be keyed by this field:
<cm:compliance-check-id>36c4d07cc410439bf3bf79f7f5942672</cm:compliance-check-id>
Sample: 1
<cm:compliance-result>WARNING</cm:compliance-result>
<cm:compliance-actual-value>Error -- evaluation period has ended</cm:compliance-actual-value>
<cm:compliance-check-id>36c4d07cc410439bf3bf79f7f5942672</cm:compliance-check-id>
<cm:compliance-policy-value>WARNING</cm:compliance-policy-value>
<cm:compliance-check-name>Connection error</cm:compliance-check-name>
Sample: 2
<compliance>true</compliance>
<cm:compliance-check-name>WN10-00-000005 - Domain-joined systems must use Windows 10 Enterprise Edition 64-bit version - 64-bit</cm:compliance-check-name>
<cm:compliance-audit-file>DISA_STIG_Windows_10_v1r20.audit</cm:compliance-audit-file>
<cm:compliance-check-id>55aeff4f26d6b8307f6f9672750a5548</cm:compliance-check-id>
<cm:compliance-actual-value>'64-bit'</cm:compliance-actual-value>
<cm:compliance-policy-value>'64-bit'</cm:compliance-policy-value>
<cm:compliance-info> Features such as Credential Guard use virtualization based security to protect information that could be used in credential theft attacks if compromised. There are a number of system requirements that must be met in order for Credential Guard to be configured and enabled properly. Virtualization based security and Credential Guard are only available with Windows 10 Enterprise 64-bit version. </cm:compliance-info>
<cm:compliance-result>PASSED</cm:compliance-result>
<cm:compliance-reference>800-171|3.4.1,800-53|CM-8,CAT|II,CCI|CCI-000366,CN-L3|8.1.10.2(a),CN-L3|8.1.10.2(b),CSF|DE.CM-7,CSF|ID.AM-1,CSF|ID.AM-2,CSF|PR.DS-3,ISO/IEC-27001|A.8.1.1,ITSG-33|CM-8,NESA|T1.2.1,NESA|T1.2.2,NIAv2|NS35,Rule-ID|SV-77809r3_rule,STIG-ID|WN10-00-000005,Vuln-ID|V-63319</cm:compliance-reference>
<cm:compliance-see-also>https://dl.dod.cyber.mil/wp-content/uploads/stigs/zip/U_MS_Windows_10_V1R20_STIG.zip</cm:compliance-see-also>
Thanks in advance!
Hi @mbasharat
Ah OK. Have you looked at the xpath command? It should be able to do this for you automatically.
https://docs.splunk.com/Documentation/Splunk/8.0.6/SearchReference/Xpath
Otherwise, a transforms.conf and props.conf configuration on your search head can extract these fields automatically.
For example, on the search head(s):
transforms.conf
...
[xml-extract]
REGEX = ^<(?:cm:)*([^\>]+)>([^<]+)
FORMAT = $1::$2
props.conf (references the transforms rule)
...
[...your sourcetype...]
REPORT-extractXMLfields = xml-extract
This can be done via the search head UI too.
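If it helps to sanity-check the pattern before deploying it, here is a minimal Python sketch that runs the same regex over Sample 1 from the question. It is only an approximation of what Splunk does: the names TAG_PATTERN, SAMPLE_1 and extract_fields are made up for illustration, and it assumes each tag starts on its own line within the event. In PCRE, "^" only matches at the start of the text unless multiline mode is on, so the sketch sets re.MULTILINE; the Splunk-side equivalent would probably be a leading (?m) in REGEX if your events hold several tags.

```python
import re

# Same pattern as the REGEX line in the [xml-extract] stanza.
# Assumption: tags sit one per line, so MULTILINE lets "^" match at every line start.
TAG_PATTERN = re.compile(r"^<(?:cm:)*([^\>]+)>([^<]+)", re.MULTILINE)

# Sample 1 from the question, joined with newlines (assumed event layout).
SAMPLE_1 = """<cm:compliance-result>WARNING</cm:compliance-result>
<cm:compliance-actual-value>Error -- evaluation period has ended</cm:compliance-actual-value>
<cm:compliance-check-id>36c4d07cc410439bf3bf79f7f5942672</cm:compliance-check-id>
<cm:compliance-policy-value>WARNING</cm:compliance-policy-value>
<cm:compliance-check-name>Connection error</cm:compliance-check-name>"""


def extract_fields(text):
    """Return a dict of tag name -> tag value, mimicking FORMAT = $1::$2."""
    return {name: value for name, value in TAG_PATTERN.findall(text)}


fields = extract_fields(SAMPLE_1)
for name, value in fields.items():
    print(f"{name} = {value}")
```

The closing tags are skipped because they never sit at the start of a line, and the "cm:" prefix is dropped from the field names by the non-capturing group.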
Hi @mbasharat
I believe you're asking for a regex to just extract the compliance-check-id for the event. Is this correct? There are a few ways to do this but if you just want that one field then this will work for you.
...
| rex "id>(?<complianceCheckID>[a-fA-F0-9]+)\<"
...
Hope it helps.
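For anyone who wants to try that pattern outside Splunk first, here is a rough Python sketch of the same idea. The names CHECK_ID and compliance_check_id are hypothetical, and note that Python spells named groups as (?P<name>...) rather than the (?<name>...) form that rex accepts.

```python
import re

# Python equivalent of the rex pattern: hex characters between "id>" and the next "<".
CHECK_ID = re.compile(r"id>(?P<complianceCheckID>[a-fA-F0-9]+)<")


def compliance_check_id(text):
    """Return the first compliance check ID found in text, or None if absent."""
    match = CHECK_ID.search(text)
    return match.group("complianceCheckID") if match else None


line = "<cm:compliance-check-id>55aeff4f26d6b8307f6f9672750a5548</cm:compliance-check-id>"
print(compliance_check_id(line))
```

Because the pattern keys on "id>" rather than the full tag name, it would also fire on any other tag whose name ends in "id" and whose value is hexadecimal, so it is worth checking against your real data.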
Hi @yeahnah,
First of all, I appreciate your support. I need every field that arrives inside tags to be extracted, for example:
<cm:sample>sample</cm:sample>
Thank you!!!