
Extracting key-value pairs using regex at search time

jmartens

I am trying to extract multiple key-value pairs from data like this:

 

Image |Loading |\path\to\obfuscated\\CT_384.dcm
------------------------------------------------------------------------------------------------
Image |Photometric Interpretation
|MONOCHROME2 |
Image |Compression |EXPLICIT_LITTLE_ENDIAN (with metaheader)
Series |Creation... | |
Image |Image width [mm] |512 |
Image |Image height [mm] |512 |
Image |Bit-Depth |BLuint16 |
Image |Row Vector [mm] |(1/0/0) |
Image |Column Vector [mm] |(0/1/0) |
Image |Image Position [mm] |(-249.512/-504.512/-226.5) |
Image |PixelsizeX [mm] |0.976563 |
Image |PixelsizeY [mm] |0.976563 |
Image |Img Orientation Info |axial; RL: 0.000°, AP: 0.000°, HF: -90.000°; normal: 0.000°; Orientation supported
Image |Horizontal flip |1 |
Image |Vertical flip |0 |
Image |Brainlab Subsystem |[OK] |
Image |InstNumber/BLScanNum |170 |
Image |Series number |0 |
Image |Instance UID |1.3.12.2.1107.5.1.4.53031.000000000000000000001
Image |Study UID |1.3.12.2.1107.5.1.4.53031.000000000000000000001
Image |Series UID |1.3.12.2.1107.5.1.4.53031.000000000000000000000
Image |BitsAllocated |16 |
Image |PatientOrientation |Supine |
Image |HeadFeetOrientation |HeadFirst |
Image |Modality |CT |
Image |SliceThickness [mm] |1 |
Image |AcquisitionID |2 |
Image |FrameOfRefUID |1.3.12.2.1107.5.1.4.53031.000000000000000000009
Image |ScanDate |10-AUG-2009 |
Image |ScanTime |093806.140000 |
Image |Manufacturer |SIEMENS |
Image |Manufacturer model |Sensation 10 |
Image |Institution name |Maastro Clinic |
Image |Station name |MEDPC |
Image |Software Versions |syngo CT 2006G |
Image |Comment |Head^ST_Head1mmCONTRAST (Adult); CT10082009093806:Head 1.0 B31s; ST_Head1mmCONTRAST; CHEST
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------

 

So far I have tried the following:

 

index="test" 
| rex max_match=0 field=_raw "(?<k1>[^\|\s]+)\s+\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]" 
| eval z=mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "~") 
| mvexpand z 
| rex field=z "(?<key>[^~]+)~(?<value>.*)" 
| eval {key} = value

 

It yields the desired key-value pairs. However, mvexpand creates a separate copy of the event for every value of z, so each key-value pair is assigned to its own copy of the event. The result is a cross product of events and extracted keys.

I don't want that. I want all extractions to be tied to the unique event they are extracted from.
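
A rough Python model of the first search shows where the extra events come from. It is only a sketch, not Splunk itself. The three input lines are a shortened excerpt of the sample event. Python's re spells named groups (?P<name>...) rather than (?<name>...). The helpers mvzip and first_attempt are hypothetical stand-ins for the SPL commands.

```python
import re

# Shortened excerpt in the same shape as the sample event (assumed for illustration).
RAW = (
    "Image |Modality |CT |\n"
    "Image |ScanDate |10-AUG-2009 |\n"
    "Image |Manufacturer |SIEMENS |\n"
)

# The first rex pattern, rewritten with Python's (?P<name>...) group syntax.
PAIR_RE = re.compile(r"(?P<k1>[^\|\s]+)\s+\|(?P<k2>[^\|]+)\|(?P<v>[^\|]+)[\|]?[\r\n]")


def mvzip(left, right, delim):
    """Hypothetical stand-in for eval's mvzip(): pairwise join of two lists."""
    return [f"{a}{delim}{b}" for a, b in zip(left, right)]


def first_attempt(raw):
    """Model of the first search for one event; returns the expanded copies."""
    matches = list(PAIR_RE.finditer(raw))  # rex max_match=0 keeps every match
    k1 = [m["k1"].strip() for m in matches]
    k2 = [m["k2"].strip() for m in matches]
    v = [m["v"].strip() for m in matches]
    z = mvzip(mvzip(k1, k2, " "), v, "~")
    event = {"_raw": raw, "z": z}
    # mvexpand z: one full copy of the event per value of z
    copies = [{**event, "z": item} for item in z]
    for copy in copies:
        key, value = copy["z"].split("~", 1)  # rex "(?<key>[^~]+)~(?<value>.*)"
        copy[key] = value  # eval {key} = value
    return copies


results = first_attempt(RAW)
for r in results:
    print({k: val for k, val in r.items() if k not in ("_raw", "z")})
```

Each printed dict belongs to a separate copy of the same event and carries a single extracted field. That is the duplication described above.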

So I tried this:

 

index="test" 
| rex max_match=0 field=_raw "(?<k1>[^\|]+)\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]+" 
| eval key=mvzip(trim(k1), trim(k2), " ") 
| eval {key}=v

 

This extracts the keys and values properly. However, key is a multivalue field, so eval {key}=v does not create one field per key. Instead, all key values are merged into a single field name, and every individual value is assigned to that one field. Now I have separate values but no unique keys, and my key looks like this:

 

Image Loading Image Photometric Interpretation Image Compression Image Image width [mm] Image Image height [mm] Image Bit-Depth Image Row Vector [mm] Image Column Vector [mm] Image Image Position [mm] Image PixelsizeX [mm] Image PixelsizeY [mm] Image Img Orientation Info Image Horizontal flip Image Vertical flip Image Brainlab Subsystem Image InstNumber/BLScanNum Image Series number Image Instance UID Image Study UID Image Series UID Image BitsAllocated Image PatientOrientation Image HeadFeetOrientation Image Modality Image SliceThickness [mm] Image AcquisitionID Image FrameOfRefUID Image ScanDate Image ScanTime Image Manufacturer Image Manufacturer model Image Institution name Image Station name Image Software Versions Image Comment
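
The same kind of toy model applies to the second search. Python cannot faithfully reproduce what eval {key}=v does, so the line standing in for it only mimics the behaviour observed above: one field named after all keys joined by spaces, holding all values. The last line builds the one-field-per-key shape that is actually wanted. The sample lines and variable names are assumed for illustration.

```python
import re

# Shortened excerpt in the same shape as the sample event (assumed).
RAW = (
    "Image |Modality |CT |\n"
    "Image |ScanDate |10-AUG-2009 |\n"
    "Image |Manufacturer |SIEMENS |\n"
)

# The second rex pattern, rewritten with Python's (?P<name>...) syntax.
PAIR_RE = re.compile(r"(?P<k1>[^\|]+)\|(?P<k2>[^\|]+)\|(?P<v>[^\|]+)[\|]?[\r\n]+")

matches = list(PAIR_RE.finditer(RAW))
# eval key=mvzip(trim(k1), trim(k2), " ")
key = [f"{m['k1'].strip()} {m['k2'].strip()}" for m in matches]
v = [m["v"] for m in matches]  # v is not trimmed in the second attempt

# Mimics the reported result of "eval {key}=v" with a multivalue key:
# a single field named after all keys joined by spaces, holding every value.
observed = {" ".join(key): v}

# The desired shape: one field per key, all on the same event.
wanted = dict(zip(key, (x.strip() for x in v)))

print(observed)
print(wanted)
```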

 

How can I get the best of both?

  • keys evaluated dynamically from the search-time extraction
  • each value assigned to its proper key
  • all key-value pairs attached to the unique event they were extracted from

jmartens

Eventually I was able to solve it by temporarily replacing _raw with a custom string built from the extracted pairs and parsing that string with extract. The key is to use custom delimiters that do not occur anywhere in my dataset, like this:

 

index="test"
| eval temp=_raw 
| rex max_match=0 field=_raw "(?<k1>[^\|]+)\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]+" 
| eval _raw=mvjoin(mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "#"), "@") 
| extract pairdelim="@" kvdelim="#" clean_keys=f 
| eval _raw=temp
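
The following rough Python model of the search above, for a single event, shows what the intermediate _raw looks like. It is a sketch, not Splunk. The three input lines are a shortened excerpt, and Python's re uses (?P<name>...) for named groups. The names mvzip and extract_pairs are hypothetical stand-ins for the SPL commands. The ValueError guard is an addition that enforces the rule that the delimiters must never occur in the data.

```python
import re

# Shortened excerpt in the same shape as the sample event (assumed).
RAW = (
    "Image |Modality |CT |\n"
    "Image |ScanDate |10-AUG-2009 |\n"
    "Image |Manufacturer model |Sensation 10 |\n"
)

# The rex pattern from the solution, in Python's (?P<name>...) syntax.
PAIR_RE = re.compile(r"(?P<k1>[^\|]+)\|(?P<k2>[^\|]+)\|(?P<v>[^\|]+)[\|]?[\r\n]+")
PAIR_DELIM, KV_DELIM = "@", "#"  # must not occur anywhere in the data


def mvzip(left, right, delim):
    """Hypothetical stand-in for eval's mvzip(): pairwise join of two lists."""
    return [f"{a}{delim}{b}" for a, b in zip(left, right)]


def extract_pairs(raw):
    """Model of the accepted search for a single event."""
    if PAIR_DELIM in raw or KV_DELIM in raw:
        raise ValueError("delimiter occurs in the data; pick different ones")
    event = {"_raw": raw}
    event["temp"] = event["_raw"]  # eval temp=_raw
    matches = list(PAIR_RE.finditer(event["_raw"]))  # rex max_match=0
    k1 = [m["k1"].strip() for m in matches]
    k2 = [m["k2"].strip() for m in matches]
    v = [m["v"].strip() for m in matches]
    # eval _raw=mvjoin(mvzip(mvzip(k1, k2, " "), v, "#"), "@")
    event["_raw"] = PAIR_DELIM.join(mvzip(mvzip(k1, k2, " "), v, KV_DELIM))
    # extract pairdelim="@" kvdelim="#" clean_keys=f (keys kept exactly as built)
    for pair in filter(None, event["_raw"].split(PAIR_DELIM)):
        key, value = pair.split(KV_DELIM, 1)
        event[key] = value
    event["_raw"] = event["temp"]  # eval _raw=temp
    return event


print(extract_pairs(RAW))
```

Unlike the mvexpand version, every field lands in the same dict, i.e. the same event. The leftover temp field can be dropped afterwards if it is not needed (in SPL, for example, with fields - temp).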
