I am trying to extract multiple key value pairs from data like this:
Image |Loading |\path\to\obfuscated\\CT_384.dcm
------------------------------------------------------------------------------------------------
Image |Photometric Interpretation |MONOCHROME2 |
Image |Compression |EXPLICIT_LITTLE_ENDIAN (with metaheader)
Series |Creation... | |
Image |Image width [mm] |512 |
Image |Image height [mm] |512 |
Image |Bit-Depth |BLuint16 |
Image |Row Vector [mm] |(1/0/0) |
Image |Column Vector [mm] |(0/1/0) |
Image |Image Position [mm] |(-249.512/-504.512/-226.5) |
Image |PixelsizeX [mm] |0.976563 |
Image |PixelsizeY [mm] |0.976563 |
Image |Img Orientation Info |axial; RL: 0.000°, AP: 0.000°, HF: -90.000°; normal: 0.000°; Orientation supported
Image |Horizontal flip |1 |
Image |Vertical flip |0 |
Image |Brainlab Subsystem |[OK] |
Image |InstNumber/BLScanNum |170 |
Image |Series number |0 |
Image |Instance UID |1.3.12.2.1107.5.1.4.53031.000000000000000000001
Image |Study UID |1.3.12.2.1107.5.1.4.53031.000000000000000000001
Image |Series UID |1.3.12.2.1107.5.1.4.53031.000000000000000000000
Image |BitsAllocated |16 |
Image |PatientOrientation |Supine |
Image |HeadFeetOrientation |HeadFirst |
Image |Modality |CT |
Image |SliceThickness [mm] |1 |
Image |AcquisitionID |2 |
Image |FrameOfRefUID |1.3.12.2.1107.5.1.4.53031.000000000000000000009
Image |ScanDate |10-AUG-2009 |
Image |ScanTime |093806.140000 |
Image |Manufacturer |SIEMENS |
Image |Manufacturer model |Sensation 10 |
Image |Institution name |Maastro Clinic |
Image |Station name |MEDPC |
Image |Software Versions |syngo CT 2006G |
Image |Comment |Head^ST_Head1mmCONTRAST (Adult); CT10082009093806:Head 1.0 B31s; ST_Head1mmCONTRAST; CHEST
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
So far I have tried the following:
index="test"
| rex max_match=0 field=_raw "(?<k1>[^\|\s]+)\s+\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]"
| eval z=mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "~")
| mvexpand z
| rex field=z "(?<key>[^~]+)~(?<value>.*)"
| eval {key} = value
This yields the desired key-value pairs, but mvexpand assigns each pair to its own copy of the event, so I end up with a cross product of events and extracted keys.
I don't want that. I want all extractions to stay attached to the single event they were extracted from.
So I tried this:
index="test"
| rex max_match=0 field=_raw "(?<k1>[^\|]+)\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]+"
| eval key=mvzip(trim(k1), trim(k2), " ")
| eval {key}=v
This extracts the keys and values properly, but the eval {key}=v statement merges all of the keys into a single field name and assigns every individual value to it. So now I have separate values but no unique keys, and my field name looks like this:
Image Loading Image Photometric Interpretation Image Compression Image Image width [mm] Image Image height [mm] Image Bit-Depth Image Row Vector [mm] Image Column Vector [mm] Image Image Position [mm] Image PixelsizeX [mm] Image PixelsizeY [mm] Image Img Orientation Info Image Horizontal flip Image Vertical flip Image Brainlab Subsystem Image InstNumber/BLScanNum Image Series number Image Instance UID Image Study UID Image Series UID Image BitsAllocated Image PatientOrientation Image HeadFeetOrientation Image Modality Image SliceThickness [mm] Image AcquisitionID Image FrameOfRefUID Image ScanDate Image ScanTime Image Manufacturer Image Manufacturer model Image Institution name Image Station name Image Software Versions Image Comment
How can I get the best of both?
Eventually I was able to solve it by temporarily replacing _raw with a custom string and parsing that with extract, making sure to use custom delimiters that do not appear in my dataset, like this:
index="test"
| eval temp=_raw
| rex max_match=0 field=_raw "(?<k1>[^\|]+)\|(?<k2>[^\|]+)\|(?<v>[^\|]+)[\|]?[\r\n]+"
| eval _raw=mvjoin(mvzip(mvzip(trim(k1), trim(k2), " "), trim(v), "#"), "@")
| extract pairdelim="@" kvdelim="#" clean_keys=f
| eval _raw=temp
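For anyone who wants to check the pattern against a few sample lines outside Splunk, here is a minimal Python sketch of what the rex / mvzip / extract steps are meant to produce for one event. It is only an offline illustration, not SPL: the function name extract_pairs and the three-line sample are made up for this example, and Python's re module may not behave exactly like Splunk's regex engine in every case.

```python
import re

# Same pattern as in the rex command, written with Python's (?P<name>...) syntax.
PAIR_RE = re.compile(r"(?P<k1>[^|]+)\|(?P<k2>[^|]+)\|(?P<v>[^|]+)\|?[\r\n]+")


def extract_pairs(raw: str) -> dict:
    """Return one dict of 'k1 k2' -> value for a single raw event (hypothetical helper)."""
    fields = {}
    for m in PAIR_RE.finditer(raw):
        # Equivalent of mvzip(trim(k1), trim(k2), " ") for the key and trim(v) for the value.
        key = f"{m.group('k1').strip()} {m.group('k2').strip()}"
        fields[key] = m.group("v").strip()
    return fields


# Hypothetical sample built from three lines of the data above.
sample = (
    "Image |Modality |CT |\n"
    "Image |Manufacturer |SIEMENS |\n"
    "Image |SliceThickness [mm] |1 |\n"
)

fields = extract_pairs(sample)
for key, value in fields.items():
    print(f"{key} = {value}")
```

All pairs end up in one mapping that belongs to one event, which is the behaviour I wanted from the search: no copies of the event and no merged field name.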