Splunk Search

How do you extract key/value pairs when the data is verbose and sometimes inconsistent?

joemiller
Path Finder

I'm having trouble extracting key/value pairs from a set of data. I think there are two separate problems that are making this difficult.

  1. The key/value data has redundant descriptors. For example rather than {"foo":"bar"}, the data looks like {"Name":"foo","Value":"bar"}.
  2. The data is sometimes coming in with the value before the key, e.g. {"Value":"bar","Name":"foo"}

Each event also has several of these pairs (I'll include some examples below). I'm looking for a way to consistently extract from this data such that I get a new field (e.g. foo) with the corresponding value (e.g. bar) for each key/value pair in each event.

_raw examples (data has been anonymized, but I haven't changed the structure):

Key/Value:

{"OrganizationName": "example.com", "Parameters": [{"Name": "Identity", "Value": "user@example.com"}, {"Name": "AccessRights", "Value": "ReadPermission"}, {"Name": "User", "Value": "Example-Username"}], "OrganizationId": "012-345-6789", "Operation": "Get-Something", "SessionId": "", "Workload": "Exchange", "CreationTime": "2018-10-12T22:08:13", "UserKey": "000011122233334445555", "ExternalAccess": false, "Version": 1, "Id": "00000000-0000-0000-0000-0000000000", "ObjectId": "targetusername", "ClientIP": "192.168.0.1:12345", "UserId": "admin@example.com", "RecordType": 1, "ResultStatus": "True", "UserType": 2, "OriginatingServer": "ABCD00000 (00.00.0000.000)"}

Value/Key:

{"OrganizationName": "example.com", "Parameters": [{"Value": "user@example.com", "Name": "Identity"}, {"Value": "Example-Username", "Name": "User"}, {"Value": "FullAccess", "Name": "AccessRights"}, {"Value": "All", "Name": "InheritanceType"}], "OrganizationId": "012-345-6789", "Operation": "Get-Something", "SessionId": "", "Workload": "Exchange", "CreationTime": "2018-10-12T22:08:13", "UserKey": "000011122233334445555", "ExternalAccess": false, "Version": 1, "Id": "00000000-0000-0000-0000-0000000000", "ObjectId": "targetusername", "ClientIP": "192.168.0.1:12345", "UserId": "admin@example.com", "RecordType": 1, "ResultStatus": "True", "UserType": 2, "OriginatingServer": "ABCD00000 (00.00.0000.000)"} 

How can I consistently extract all of the key/value pairs within "Parameters": [] ?

0 Karma
1 Solution

kamlesh_vaghela
SplunkTrust
SplunkTrust

Hi @joemiller,

Can you please try the following search?

YOUR_SEARCH
| rename Parameters{}.* as * 
| eval temp=mvzip(Name,Value) 
| stats values(*) as * by _time Id temp 
| eval Name=mvindex(split(temp,","),0),Value=mvindex(split(temp,","),1) 
| eval {Name}=Value 
| stats values(*) as * by _time,Id 
| fields - temp Name Value

My Sample Search:

| makeresults 
| eval _raw="{\"OrganizationName\": \"example.com\", \"Parameters\": [{\"Name\": \"Identity\", \"Value\": \"user@example.com\"}, {\"Name\": \"AccessRights\", \"Value\": \"ReadPermission\"}, {\"Name\": \"User\", \"Value\": \"Example-Username\"}], \"OrganizationId\": \"012-345-6789\", \"Operation\": \"Get-Something\", \"SessionId\": \"\", \"Workload\": \"Exchange\", \"CreationTime\": \"2018-10-12T22:08:13\", \"UserKey\": \"000011122233334445555\", \"ExternalAccess\": false, \"Version\": 1, \"Id\": \"00000000-0000-0000-0000-0000000000\", \"ObjectId\": \"targetusername\", \"ClientIP\": \"192.168.0.1:12345\", \"UserId\": \"admin@example.com\", \"RecordType\": 1, \"ResultStatus\": \"True\", \"UserType\": 2, \"OriginatingServer\": \"ABCD00000 (00.00.0000.000)\"}" 
| append 
    [| makeresults 
    | eval _raw="{\"OrganizationName\": \"example.com\", \"Parameters\": [{\"Value\": \"user@example.com\", \"Name\": \"Identity\"}, {\"Value\": \"Example-Username\", \"Name\": \"User\"}, {\"Value\": \"FullAccess\", \"Name\": \"AccessRights\"}, {\"Value\": \"All\", \"Name\": \"InheritanceType\"}], \"OrganizationId\": \"012-345-6789\", \"Operation\": \"Get-Something\", \"SessionId\": \"\", \"Workload\": \"Exchange\", \"CreationTime\": \"2018-10-12T22:08:13\", \"UserKey\": \"000011122233334445555\", \"ExternalAccess\": false, \"Version\": 1, \"Id\": \"00000000-0000-0000-0000-0000000001\", \"ObjectId\": \"targetusername\", \"ClientIP\": \"192.168.0.1:12345\", \"UserId\": \"admin@example.com\", \"RecordType\": 1, \"ResultStatus\": \"True\", \"UserType\": 2, \"OriginatingServer\": \"ABCD00000 (00.00.0000.000)\"} "] 
| kv 
| rename Parameters{}.* as * 
| eval temp=mvzip(Name,Value) 
| stats values(*) as * by _time Id temp 
| eval Name=mvindex(split(temp,","),0),Value=mvindex(split(temp,","),1) 
| eval {Name}=Value 
| stats values(*) as * by _time,Id 
| fields - temp Name Value

Thanks

View solution in original post

kamlesh_vaghela
SplunkTrust
SplunkTrust

Hi @joemiller,

Can you please try the following search?

YOUR_SEARCH
| rename Parameters{}.* as * 
| eval temp=mvzip(Name,Value) 
| stats values(*) as * by _time Id temp 
| eval Name=mvindex(split(temp,","),0),Value=mvindex(split(temp,","),1) 
| eval {Name}=Value 
| stats values(*) as * by _time,Id 
| fields - temp Name Value

My Sample Search:

| makeresults 
| eval _raw="{\"OrganizationName\": \"example.com\", \"Parameters\": [{\"Name\": \"Identity\", \"Value\": \"user@example.com\"}, {\"Name\": \"AccessRights\", \"Value\": \"ReadPermission\"}, {\"Name\": \"User\", \"Value\": \"Example-Username\"}], \"OrganizationId\": \"012-345-6789\", \"Operation\": \"Get-Something\", \"SessionId\": \"\", \"Workload\": \"Exchange\", \"CreationTime\": \"2018-10-12T22:08:13\", \"UserKey\": \"000011122233334445555\", \"ExternalAccess\": false, \"Version\": 1, \"Id\": \"00000000-0000-0000-0000-0000000000\", \"ObjectId\": \"targetusername\", \"ClientIP\": \"192.168.0.1:12345\", \"UserId\": \"admin@example.com\", \"RecordType\": 1, \"ResultStatus\": \"True\", \"UserType\": 2, \"OriginatingServer\": \"ABCD00000 (00.00.0000.000)\"}" 
| append 
    [| makeresults 
    | eval _raw="{\"OrganizationName\": \"example.com\", \"Parameters\": [{\"Value\": \"user@example.com\", \"Name\": \"Identity\"}, {\"Value\": \"Example-Username\", \"Name\": \"User\"}, {\"Value\": \"FullAccess\", \"Name\": \"AccessRights\"}, {\"Value\": \"All\", \"Name\": \"InheritanceType\"}], \"OrganizationId\": \"012-345-6789\", \"Operation\": \"Get-Something\", \"SessionId\": \"\", \"Workload\": \"Exchange\", \"CreationTime\": \"2018-10-12T22:08:13\", \"UserKey\": \"000011122233334445555\", \"ExternalAccess\": false, \"Version\": 1, \"Id\": \"00000000-0000-0000-0000-0000000001\", \"ObjectId\": \"targetusername\", \"ClientIP\": \"192.168.0.1:12345\", \"UserId\": \"admin@example.com\", \"RecordType\": 1, \"ResultStatus\": \"True\", \"UserType\": 2, \"OriginatingServer\": \"ABCD00000 (00.00.0000.000)\"} "] 
| kv 
| rename Parameters{}.* as * 
| eval temp=mvzip(Name,Value) 
| stats values(*) as * by _time Id temp 
| eval Name=mvindex(split(temp,","),0),Value=mvindex(split(temp,","),1) 
| eval {Name}=Value 
| stats values(*) as * by _time,Id 
| fields - temp Name Value

Thanks

joemiller
Path Finder

This is great! Thank you very much! It works for both of the examples I gave in my original post, so I will mark it as the answer.

I just got another event where one of the parameters has a blank value, and in that case, this doesn't quite work. I'm going to play around with it a bit and see if I can make it work for that case too. If you have any suggestions, I would also appreciate that very much! Here's that example, in the makeresults format you used (I just used one of the existing ones and replaced the "Parameters[]" value, and incremented the "Id" value):

[| makeresults | eval _raw="{\"OrganizationName\": \"example.com\", \"Parameters\": [{\"Value\": \"\", \"Name\": \"DomainController\"}, {\"Value\": \"example.com/Microsoft Exchange Hosted Organizations/example.com/DiscoverySearchMailbox{000000-00000-00000-0000-00000}\", \"Name\": \"Identity\"}, {\"Value\": \"ABC000000.example.com/Microsoft Exchange Hosted Organizations/example.com/Discovery Management\", \"Name\": \"User\"}, {\"Value\": \"FullAccess\", \"Name\": \"AccessRights\"}], \"OrganizationId\": \"012-345-6789\", \"Operation\": \"Get-Something\", \"SessionId\": \"\", \"Workload\": \"Exchange\", \"CreationTime\": \"2018-10-12T22:08:13\", \"UserKey\": \"000011122233334445555\", \"ExternalAccess\": false, \"Version\": 1, \"Id\": \"00000000-0000-0000-0000-0000000003\", \"ObjectId\": \"targetusername\", \"ClientIP\": \"192.168.0.1:12345\", \"UserId\": \"admin@example.com\", \"RecordType\": 1, \"ResultStatus\": \"True\", \"UserType\": 2, \"OriginatingServer\": \"ABCD00000 (00.00.0000.000)\"} "]

The thing that is different in this event is that the "Name":"DomainController" has a "Value":"" (blank). This causes your search to produce AccessRights="" (blank) and User="FullAccess" (which , should be the value of AccessRights).

0 Karma

joemiller
Path Finder

Okay, I ended up adding this line:

| rex mode=sed "s/Value\": \"\"/Value\": \"empty\"/"

right before all the other commands you suggested, and that fixed the problem related to empty values of "Value". But please let me know if you suggest something different. Thanks again for your help!

0 Karma

joemiller
Path Finder

Okay actually, I still have a small problem.. when I do this on the test search using makeresults, I have to do the rex and then do | kv to get the Parameters field to use the modified version of _raw. But if I do that rex on a "live" search and then do | kv, I get duplicates of every field..

0 Karma

joemiller
Path Finder

And I think I fixed that last issue by adding a | fields _raw before the | kv..

0 Karma

kamlesh_vaghela
SplunkTrust
SplunkTrust

Hi @joemiller,
Glad to help you. It seems you did some workarounds with my solutions to get a better result. That's a great thing to learning and improving our knowledge. Keep it up.

Happy Splunking

0 Karma
Get Updates on the Splunk Community!

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...

At .conf24, we shared that we were in the process of integrating Cisco Talos threat intelligence into Splunk ...

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector

Agent Saturation What and Whys In application performance monitoring, saturation is defined as the total load ...