Splunk Search

Extracting key-value pairs using regex at search time

jmartens
Path Finder

I am trying to extract multiple key value pairs from data like this:

Image |Loading |\path\to\obfuscated\\CT_384.dcm
------------------------------------------------------------------------------------------------
Image |Photometric Interpretation
|MONOCHROME2 |
Image |Compression |EXPLICIT_LITTLE_ENDIAN (with metaheader)
Series |Creation... | |
Image |Image width [mm] |512 |
Image |Image height [mm] |512 |
Image |Bit-Depth |BLuint16 |
Image |Row Vector [mm] |(1/0/0) |
Image |Column Vector [mm] |(0/1/0) |
Image |Image Position [mm] |(-249.512/-504.512/-226.5) |
Image |PixelsizeX [mm] |0.976563 |
Image |PixelsizeY [mm] |0.976563 |
Image |Img Orientation Info |axial; RL: 0.000°, AP: 0.000°, HF: -90.000°; normal: 0.000°; Orientation supported
Image |Horizontal flip |1 |
Image |Vertical flip |0 |
Image |Brainlab Subsystem |[OK] |
Image |InstNumber/BLScanNum |170 |
Image |Series number |0 |
Image |Instance UID |1.3.12.2.1107.5.1.4.53031.000000000000000000001
Image |Study UID |1.3.12.2.1107.5.1.4.53031.000000000000000000001
Image |Series UID |1.3.12.2.1107.5.1.4.53031.000000000000000000000
Image |BitsAllocated |16 |
Image |PatientOrientation |Supine |
Image |HeadFeetOrientation |HeadFirst |
Image |Modality |CT |
Image |SliceThickness [mm] |1 |
Image |AcquisitionID |2 |
Image |FrameOfRefUID |1.3.12.2.1107.5.1.4.53031.000000000000000000009
Image |ScanDate |10-AUG-2009 |
Image |ScanTime |093806.140000 |
Image |Manufacturer |SIEMENS |
Image |Manufacturer model |Sensation 10 |
Image |Institution name |Maastro Clinic |
Image |Station name |MEDPC |
Image |Software Versions |syngo CT 2006G |
Image |Comment |Head^ST_Head1mmCONTRAST (Adult); CT10082009093806:Head 1.0 B31s; ST_Head1mmCONTRAST; CHEST
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------

So far I have tried the following:

index="test" 
| rex max_match=0 field=_raw "(?<k1>[^\|\s]+)\s+\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]" 
| eval z=mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "~") 
| mvexpand z 
| rex field=z "(?<key>[^~]+)~(?<value>.*)" 
| eval {key} = value

It yields the desired key-value pairs, but mvexpand assigns each pair to its own copy of the event, so I get a cross product of events and extracted keys.

I don't want that. I want all extractions tied to the single event they were extracted from.
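To make the duplication concrete, here is a rough Python emulation of the first attempt (not Splunk itself). It assumes a line-anchored variant of the rex pattern, rewritten with Python's (?P<name>...) group syntax, and a hypothetical three-line sample taken from the event above; the helper name mvexpand_style is made up for illustration. Each extracted pair becomes its own row carrying a full copy of _raw, which is the one-row-per-key effect described above.

```python
import re

# Hypothetical sample: three lines copied from the event in the question.
RAW = (
    "Image |Modality |CT |\n"
    "Image |ScanDate |10-AUG-2009 |\n"
    "Image |Manufacturer |SIEMENS |\n"
)

# Assumed line-anchored variant of the rex pattern; Python's re needs
# (?P<name>...) where Splunk's PCRE accepts (?<name>...).
PAIR = re.compile(
    r"^(?P<k1>[^|\n]+)\|(?P<k2>[^|\n]+)\|(?P<v>[^|\n]*)\|?[ \t]*$",
    re.MULTILINE,
)


def mvexpand_style(raw):
    """Emulate rex max_match=0 + mvzip + mvexpand: one row per extracted pair."""
    rows = []
    for m in PAIR.finditer(raw):
        key = f"{m['k1'].strip()} {m['k2'].strip()}"
        # Every row is a copy of the same event that carries a single key.
        rows.append({"_raw": raw, key: m["v"].strip()})
    return rows


rows = mvexpand_style(RAW)
print(len(rows))  # 3 rows for a single event
for row in rows:
    # Prints ['Image Modality'], then ['Image ScanDate'], then ['Image Manufacturer']
    print([k for k in row if k != "_raw"])
```

The point of the sketch is the shape of the output: N pairs in one event turn into N rows, each holding one key.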

So I tried this:

index="test" 
| rex max_match=0 field=_raw "(?<k1>[^\|]+)\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]+" 
| eval key=mvzip(trim(k1), trim(k2), " ") 
| eval {key}=v

This extracts the keys and values correctly, but evaluating eval {key}=v merges all of the key values into a single field name and assigns every value to that one field. So now I have the separate values but no unique keys, and my field name looks like this:

Image Loading Image Photometric Interpretation Image Compression Image Image width [mm] Image Image height [mm] Image Bit-Depth Image Row Vector [mm] Image Column Vector [mm] Image Image Position [mm] Image PixelsizeX [mm] Image PixelsizeY [mm] Image Img Orientation Info Image Horizontal flip Image Vertical flip Image Brainlab Subsystem Image InstNumber/BLScanNum Image Series number Image Instance UID Image Study UID Image Series UID Image BitsAllocated Image PatientOrientation Image HeadFeetOrientation Image Modality Image SliceThickness [mm] Image AcquisitionID Image FrameOfRefUID Image ScanDate Image ScanTime Image Manufacturer Image Manufacturer model Image Institution name Image Station name Image Software Versions Image Comment
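The gap between what happened and what was wanted can be sketched in Python. The first function only mimics the behaviour reported above (a multivalue key flattened with spaces into one field name that receives all values); that behaviour is taken from this post, not from Splunk's documentation. The second pairs keys and values by position, which is the per-key result being asked for. Both function names are hypothetical.

```python
def flattened_field(keys, values):
    """Mimic the reported result of eval {key}=v with a multivalue key:
    one field, named by the space-joined keys, holding every value."""
    return {" ".join(keys): list(values)}


def paired_fields(keys, values):
    """The wanted result: the i-th key gets the i-th value.
    Note that Python's zip silently stops at the shorter list."""
    return dict(zip(keys, values))


keys = ["Image Modality", "Image ScanDate", "Image Manufacturer"]
values = ["CT", "10-AUG-2009", "SIEMENS"]

print(flattened_field(keys, values))
# {'Image Modality Image ScanDate Image Manufacturer': ['CT', '10-AUG-2009', 'SIEMENS']}
print(paired_fields(keys, values))
# {'Image Modality': 'CT', 'Image ScanDate': '10-AUG-2009', 'Image Manufacturer': 'SIEMENS'}
```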

How can I get the best of both?

  • evaluated keys from search time extraction
  • value assigned to the proper key
  • all key-value pairs connected to the unique individual event

jmartens
Path Finder

Eventually I solved it by temporarily overwriting _raw with a custom string and parsing that with extract, making sure to use custom delimiters that are not present in my dataset (@ between pairs, # between key and value):

index="test"
| eval temp=_raw 
| rex max_match=0 field=_raw "(?<k1>[^\|]+)\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]+" 
| eval _raw=mvjoin(mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "#"), "@") 
| extract pairdelim="@" kvdelim="#" clean_keys=f 
| eval _raw=temp
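The search above round-trips the pairs through a delimited string. Below is a rough Python emulation of that idea, not Splunk itself. It assumes a line-anchored variant of the rex pattern (Python's re needs (?P<name>...) groups), and the helper names serialize, extract_pairs and parse_event are hypothetical: serialize plays the role of the mvzip/mvjoin step, extract_pairs stands in for extract pairdelim="@" kvdelim="#" clean_keys=f (only its splitting, not Splunk's full parsing rules), and parse_event keeps the original _raw, like the temp save and restore. All pairs land in one record per event, and the explicit check shows why @ and # must not occur in the data.

```python
import re

# Assumed line-anchored variant of the rex pattern from the answer.
PAIR = re.compile(
    r"^(?P<k1>[^|\n]+)\|(?P<k2>[^|\n]+)\|(?P<v>[^|\n]*)\|?[ \t]*$",
    re.MULTILINE,
)
PAIR_DELIM = "@"  # pairdelim in the answer
KV_DELIM = "#"    # kvdelim in the answer


def serialize(raw):
    """Mimic mvzip + mvjoin: build 'key#value@key#value@...'."""
    pairs = []
    for m in PAIR.finditer(raw):
        key = f"{m['k1'].strip()} {m['k2'].strip()}"
        value = m["v"].strip()
        # The trick breaks if a delimiter occurs inside a key or value.
        for text in (key, value):
            if PAIR_DELIM in text or KV_DELIM in text:
                raise ValueError(f"delimiter found in {text!r}")
        pairs.append(f"{key}{KV_DELIM}{value}")
    return PAIR_DELIM.join(pairs)


def extract_pairs(serialized):
    """Split on the pair delimiter, then on the first key/value delimiter."""
    fields = {}
    for pair in serialized.split(PAIR_DELIM):
        if pair:  # skip the empty string produced by an event with no pairs
            key, _, value = pair.partition(KV_DELIM)
            fields[key] = value
    return fields


def parse_event(raw):
    """One output record per event: the original _raw plus every pair."""
    return {"_raw": raw, **extract_pairs(serialize(raw))}


# Hypothetical sample: three lines copied from the event in the question.
RAW = (
    "Image |Modality |CT |\n"
    "Image |ScanDate |10-AUG-2009 |\n"
    "Image |Manufacturer model |Sensation 10 |\n"
)
event = parse_event(RAW)
print({k: v for k, v in event.items() if k != "_raw"})
# {'Image Modality': 'CT', 'Image ScanDate': '10-AUG-2009', 'Image Manufacturer model': 'Sensation 10'}
```

Raising on a delimiter collision is a design choice for the sketch; the SPL search has no such guard, which is why picking delimiters absent from the dataset matters there.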