I have a data set with multiple key-value pairs that share the same key name.
The data source is Websense proxy logs authenticated by Active Directory.
=1 category=227 user=LDAP://1.2.3.4 OU=Sub Department,OU=IS,OU=HQ,OU=Employee,OU=WidgetCo,DC=subdomain,DC=TLDdomain,DC=local/Splunk Nerd src_host=192.168.100.200
=1 category=227 user=LDAP://1.2.3.4 OU=Sub Department,OU=IS,OU=HQ,OU=Employee,OU=WidgetCo,DC=subdomain,DC=TLDdomain,DC=local/Splunk Nerd src_host=192.168.100.200
=7 category=1526 user=LDAP://1.2.3.4 OU=Sub Department,OU=IS,OU=HQ,OU=Employee,OU=WidgetCo,DC=subdomain,DC=TLDdomain,DC=local/Splunk Nerd src_host=192.168.100.200
By default, Splunk parses the first OU= and the first DC= pair, but not the remaining OU and DC pairs. I tried using | eval NextOU=mvindex(OU,1), but that does not seem to work. I wonder if that is because the OU= pairs are all on the same line?
I have a working regex that lets me parse out the username: | rex field=_raw "^.local\/(?P.?)src_host.*$" I could write a regex for each OU and DC pair. However, a particular user may be nested under more or fewer OUs.
Not sure what I am missing here. I looked into the extract command, but I think Splunk is working as expected.
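For reference, the username extraction described above can be sketched outside Splunk. This is only a rough illustration in Python's re module, not the exact rex expression: the group name username and the helper extract_username are hypothetical, and the event is one of the sample lines from this post.

```python
import re

# One of the sample events from this post (leading fields trimmed).
EVENT = (
    "category=227 user=LDAP://1.2.3.4 OU=Sub Department,OU=IS,OU=HQ,"
    "OU=Employee,OU=WidgetCo,DC=subdomain,DC=TLDdomain,"
    "DC=local/Splunk Nerd src_host=192.168.100.200"
)

# Assumption: the username is whatever sits between "local/" and "src_host".
# "username" is a hypothetical capture-group name chosen for this sketch.
USER_RE = re.compile(r"^.*local/(?P<username>.*?)\s*src_host.*$")


def extract_username(event):
    """Return the text between 'local/' and 'src_host', or None if absent."""
    match = USER_RE.match(event)
    return match.group("username") if match else None


print(extract_username(EVENT))
```

Because the pattern anchors on local/ and src_host rather than on the OU list, it should not care how many OUs a user is nested under.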
Try this and see... carefully copy the exact spacing, etc.
In props.conf
[yoursourcetypehere]
REPORT-websense_ext=websense_extraction
In transforms.conf
[websense_extraction]
DELIMS = ", ", "="
MV_ADD = true
Hi lquinn,
The configuration you provided works, thank you. However, additional event fields are now created with garbage data. I think this may be because other key-value pairs in the same event are separated by spaces. I am reviewing the props.conf and transforms.conf documentation to better understand what is occurring.
What do you think? Can Splunk parse key-value pairs separated by spaces and by commas in the same event?
Here is an entire event.
Dec 3 16:48:31 101.1.1.42 vendor=Websense product=Security product_version=3.2.1 action=permitted severity=1 category=17 user=LDAP://1.2.3.4 OU=Sub Department,OU=IS,OU=HQ,OU=Employee,OU=WidgetCo,DC=subdomain,DC=TLDDomain,DC=local/Splunk Nerd src_host=1.2.3.4 src_port=60608 dst_host=context.bestbuy.com dst_ip=172.226.16.62 dst_port=80 bytes_out=2102 bytes_in=768 http_response=200 http_method=GET http_content_type=image/gif http_user_agent=Mozilla/5.0_(compatible;_MSIE_9.0;_Windows_NT_6.1;_WOW64;_Trident/5.0) http_proxy_status_code=200 reason=- disposition=1048 policy=Web Surfer role=8 duration=3 url=http://context.bestbuy.com/
Other things to try:
Leave out the DELIMS attribute, but keep the MV_ADD. Don't change anything else in the answer above and see what happens.
Replace the transforms.conf stanza with
[websense_extraction_ou]
REGEX=(OU)=(\S+?)(?:\s|,)
FORMAT = $1::$2
MV_ADD = true
[websense_extraction_dc]
REGEX=(DC)=(\S+?)(?:\s|,)
FORMAT = $1::$2
MV_ADD = true
and props.conf becomes
[yoursourcetypehere]
REPORT-websense_ext=websense_extraction_ou,websense_extraction_dc
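To see roughly what these two transforms pull out of the sample event, here is a small sketch. It uses Python's re module rather than Splunk's regex engine, so treat it as an approximation. The trailing group is written as a non-capturing (?:\s|,), and the "loose" patterns are hypothetical alternatives, not part of the configuration above.

```python
import re

# Fragment of the full sample event posted earlier in the thread.
EVENT = (
    "category=17 user=LDAP://1.2.3.4 OU=Sub Department,OU=IS,OU=HQ,"
    "OU=Employee,OU=WidgetCo,DC=subdomain,DC=TLDDomain,"
    "DC=local/Splunk Nerd src_host=1.2.3.4 src_port=60608"
)


def extract(pattern, text):
    """Collect group 2 of every match, similar to a multivalue field."""
    return [m.group(2) for m in re.finditer(pattern, text)]


# Patterns modeled on the transforms.conf stanzas: the value ends at the
# first whitespace character or comma.
ou_strict = extract(r"(OU)=(\S+?)(?:\s|,)", EVENT)
dc_strict = extract(r"(DC)=(\S+?)(?:\s|,)", EVENT)

# Hypothetical looser variants (an assumption, not from the answer):
# an OU value runs up to the next comma, a DC value up to a comma or slash.
ou_loose = extract(r"(OU)=([^,]+)", EVENT)
dc_loose = extract(r"(DC)=([^,/]+)", EVENT)

print("OU strict:", ou_strict)
print("DC strict:", dc_strict)
print("OU loose: ", ou_loose)
print("DC loose: ", dc_loose)
```

In this sketch the strict patterns stop at the first space, so "Sub Department" comes back as "Sub" and the last DC value picks up the start of the username ("local/Splunk"). If that matters for your searches, a character class along the lines of the loose variants may be worth testing in your own transforms.conf.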
This works perfectly. I may have been wrong about the additional key-value pairs being created by the props.conf and transforms.conf modification. What appears to be happening is that Splunk is parsing additional event fields out of really long URL strings that contain sometext=sometext. Depending on the results of my search, I sometimes see more or fewer of these goofy event fields.
Thank you again for the help.
Yes, I have seen that problem with URL strings, too. There isn't much you can do about it, except just ignore the weird fields.