Splunk Search

Extracting key-value pairs using regex at search time

jmartens
Path Finder

I am trying to extract multiple key value pairs from data like this:

 

Image |Loading |\path\to\obfuscated\\CT_384.dcm
------------------------------------------------------------------------------------------------
Image |Photometric Interpretation
|MONOCHROME2 |
Image |Compression |EXPLICIT_LITTLE_ENDIAN (with metaheader)
Series |Creation... | |
Image |Image width [mm] |512 |
Image |Image height [mm] |512 |
Image |Bit-Depth |BLuint16 |
Image |Row Vector [mm] |(1/0/0) |
Image |Column Vector [mm] |(0/1/0) |
Image |Image Position [mm] |(-249.512/-504.512/-226.5) |
Image |PixelsizeX [mm] |0.976563 |
Image |PixelsizeY [mm] |0.976563 |
Image |Img Orientation Info |axial; RL: 0.000°, AP: 0.000°, HF: -90.000°; normal: 0.000°; Orientation supported
Image |Horizontal flip |1 |
Image |Vertical flip |0 |
Image |Brainlab Subsystem |[OK] |
Image |InstNumber/BLScanNum |170 |
Image |Series number |0 |
Image |Instance UID |1.3.12.2.1107.5.1.4.53031.000000000000000000001
Image |Study UID |1.3.12.2.1107.5.1.4.53031.000000000000000000001
Image |Series UID |1.3.12.2.1107.5.1.4.53031.000000000000000000000
Image |BitsAllocated |16 |
Image |PatientOrientation |Supine |
Image |HeadFeetOrientation |HeadFirst |
Image |Modality |CT |
Image |SliceThickness [mm] |1 |
Image |AcquisitionID |2 |
Image |FrameOfRefUID |1.3.12.2.1107.5.1.4.53031.000000000000000000009
Image |ScanDate |10-AUG-2009 |
Image |ScanTime |093806.140000 |
Image |Manufacturer |SIEMENS |
Image |Manufacturer model |Sensation 10 |
Image |Institution name |Maastro Clinic |
Image |Station name |MEDPC |
Image |Software Versions |syngo CT 2006G |
Image |Comment |Head^ST_Head1mmCONTRAST (Adult); CT10082009093806:Head 1.0 B31s; ST_Head1mmCONTRAST; CHEST
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------

 

So far I have tried the following:

 

index="test" 
| rex max_match=0 field=_raw "(?<k1>[^\|\s]+)\s+\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]" 
| eval z=mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "~") 
| mvexpand z 
| rex field=z "(?<key>[^~]+)~(?<value>.*)" 
| eval {key} = value

 

It yields the desired key-value pairs, but each pair is assigned to its own copy of the event, so I end up with a cross product of events and extracted keys.

I don't want that. I want all extractions to be tied to the unique event they are extracted from.
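To make the duplication concrete, here is a minimal Python sketch of what the first search does to a single event. It is a toy model, not SPL: the `mvexpand` function, the `event` dict, and the placeholder `_raw` text are all hypothetical stand-ins, and only two key-value pairs are used.

```python
def mvexpand(event, field):
    """Toy model of mvexpand: one copy of the event per value of `field`."""
    return [{**event, field: value} for value in event[field]]

# One event whose multivalue field z holds "key~value" strings (as built by mvzip).
event = {
    "_raw": "<original event text>",
    "z": ["Image Modality~CT", "Image ScanDate~10-AUG-2009"],
}

rows = []
for row in mvexpand(event, "z"):
    key, _, value = row["z"].partition("~")  # rex field=z "(?<key>[^~]+)~(?<value>.*)"
    row[key] = value                         # eval {key} = value
    rows.append(row)

# Each copy of the event carries exactly one extracted field.
for row in rows:
    print(row)

# The goal is one event that carries every extracted field.
merged = {"_raw": event["_raw"]}
for row in rows:
    merged.update({k: v for k, v in row.items() if k not in ("_raw", "z")})
print(merged)
```

The two printed rows are the unwanted copies. The final `merged` dict is the shape I am after: a single event with all of its own fields.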

So I tried this:

 

index="test" 
| rex max_match=0 field=_raw "(?<k1>[^\|]+)\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]+" 
| eval key=mvzip(trim(k1), trim(k2), " ") 
| eval {key}=v

 

This extracts the keys and values properly, but when the key=value statement is evaluated, all keys are merged into a single field name and every value is assigned to that one field. So now I have separate values but no unique keys, and my key looks like this:

 

Image Loading Image Photometric Interpretation Image Compression Image Image width [mm] Image Image height [mm] Image Bit-Depth Image Row Vector [mm] Image Column Vector [mm] Image Image Position [mm] Image PixelsizeX [mm] Image PixelsizeY [mm] Image Img Orientation Info Image Horizontal flip Image Vertical flip Image Brainlab Subsystem Image InstNumber/BLScanNum Image Series number Image Instance UID Image Study UID Image Series UID Image BitsAllocated Image PatientOrientation Image HeadFeetOrientation Image Modality Image SliceThickness [mm] Image AcquisitionID Image FrameOfRefUID Image ScanDate Image ScanTime Image Manufacturer Image Manufacturer model Image Institution name Image Station name Image Software Versions Image Comment
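The collapse can be reproduced outside Splunk with a small Python sketch. This is a toy model based only on the field name shown above; the helper `eval_dynamic_field` is hypothetical. It assumes that a multivalue `key` is flattened into one space-joined name, which is what the output suggests, not documented Splunk behavior.

```python
def eval_dynamic_field(key, value):
    """Toy model of `eval {key}=value`: one field name is built from `key`."""
    # Assumption: a multivalue key is flattened into a single space-joined name,
    # matching the merged field name observed in the post.
    name = " ".join(key) if isinstance(key, list) else key
    return {name: value}

keys = ["Image Image width [mm]", "Image Modality"]
values = ["512", "CT"]

# Multivalue key with multivalue value: a single merged field holding all values.
merged = eval_dynamic_field(keys, values)
print(merged)

# Pairing by position instead gives one field per key.
paired = {}
for k, v in zip(keys, values):
    paired.update(eval_dynamic_field(k, v))
print(paired)
```

In this model `{key}` is resolved once per event rather than once per value, so the pairing by position that `mvzip` would provide is lost.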

 

How can I get the best of both?

  • evaluated keys from search time extraction
  • value assigned to the proper key
  • all key-value pairs connected to the unique individual event

jmartens
Path Finder

Eventually I was able to solve it by temporarily replacing the _raw event with a custom string and parsing that with extract, making sure to use custom delimiters that are not present in my dataset, like this:

 

index="test"
| eval temp=_raw 
| rex max_match=0 field=_raw "(?<k1>[^\|]+)\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]+" 
| eval _raw=mvjoin(mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "#"), "@") 
| extract pairdelim="@" kvdelim="#" clean_keys=f 
| eval _raw=temp
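For readers who want to trace the data flow step by step, the following is a rough Python model of the search above, applied to three lines of the sample data. It is a sketch under assumptions, not a description of Splunk internals: `PAIR`, `extract_pairs`, and `RAW` are hypothetical names, Python's `re` stands in for rex (named groups are written `(?P<name>...)` instead of `(?<name>...)`), and the save and restore of _raw through `temp` is not modeled.

```python
import re

# Same pattern as the rex call, in Python's named-group syntax.
PAIR = re.compile(r"(?P<k1>[^|]+)\|(?P<k2>[^|]+)\|(?P<v>[^|]+)\|?[\r\n]+")

def extract_pairs(raw, kvdelim="#", pairdelim="@"):
    """Model of the rex / mvzip / mvjoin / extract steps for one event."""
    # rex max_match=0: collect every match in the event.
    matches = list(PAIR.finditer(raw))
    # mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "#")
    zipped = [
        f"{m['k1'].strip()} {m['k2'].strip()}{kvdelim}{m['v'].strip()}"
        for m in matches
    ]
    # mvjoin(..., "@"): the temporary string written to _raw.
    temp_raw = pairdelim.join(zipped)
    if not temp_raw:
        return temp_raw, {}
    # extract pairdelim="@" kvdelim="#": split the string back into fields.
    fields = {}
    for pair in temp_raw.split(pairdelim):
        key, _, value = pair.partition(kvdelim)
        fields[key] = value
    return temp_raw, fields

RAW = (
    "Image |Image width [mm] |512 |\n"
    "Image |Image height [mm] |512 |\n"
    "Image |Modality |CT |\n"
)

temp_raw, fields = extract_pairs(RAW)
print(temp_raw)  # Image Image width [mm]#512@Image Image height [mm]#512@Image Modality#CT
print(fields)
```

All fields end up on the one event they came from, because nothing is expanded into separate rows. The sketch also shows why the delimiters must not occur in the data: a "#" or "@" inside a key or value would split in the wrong place.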

 

