Extracting Fields from Structured HL7 Data

dmbreton · ‎08-07-2014

I am trying to figure out how to extract structured data from an HL7 2.x message

The entire message is wrapped in a hl7 mlp wrapper, <VT><payload><FS><CR>, which I am using in the source type I created to extract individual messages. The grammar of this message is MSH PID PV1 OBR { OBX }. Essentially what this means is that the message will have 4 segments(strings) delimited by a <CR> followed by 1 to n OBX segments each delimited by a <CR>. Each segment represents a different set of information:

MSH => Message Header
PID => Patient Info
PV1 => Patient Visit/Encounter Info
OBR => Observation Request
OBX => Observation/Result

Because the first 4 segments are required and in order I was able to extract all fields using a regex.

Example:

Message(excluding message wrapper)

MSH|^~\&|Sending Application|N|||20140731105559||ORU^R01|47311055594607d|P|2.3||||||8859/1
PID|||MRN19||PV1^19||19000101|M||||||||||CSN19
PV1||I|SNGH GICU||||||||||||||||ECN123456
OBR|||||||20140731105559
OBX|1|ST|<Observation_Identifier>||<Observation_Value>|<Observation_Units>|||||<Observation_Status>|||<Observation_Time>||
OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||
OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||

Regex to extract all fields from the MSH segment

(?m).*MSH\|(?:(?:(?:$)|(?:\n)|(?<encoding_characters>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<date_time_of_message>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<security>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_control_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<processing_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<version_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sequence_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<continuation_pointer>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<accept_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<application_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<country_code>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<character_set>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<principal_language_of_message>[^|\n]*)|(?:\|))(?:\|?))

In the message above message each OBX segment represents a measurement.

OBX 1 => Example with field names
OBX 2 => Temperature Measurement
OBX 3 => Heart Rate Measurement

So for any given message I need to be able to extract each measurement plus the attributes of the measurement, Value, Units, Time, .... and there can be 1 to n instances of the OBX segments or even of the same measurement type at a different time.

The only way I have been able to get this to work so far is to deconstruct the message before injecting it into splunk and generating a new message for each measurement. This is a less than ideal solution and I would prefer to get this to work using splunk.

Any suggestions would be greatly appreciated.

dstuder · ‎09-19-2016

There is now a TA for parsing HL7 that was released subsequent to this question being asked.

https://splunkbase.splunk.com/app/3283/

somesoni2 · ‎08-07-2014

Something like this (just for OBX, assuming there are 15 fields after the keyword OBX)

[YourSourceType]
REPORT-mv_obx = xf-obx

TRANSFORMS.CONF:

[xf-obx]
REGEX = ^OBX\|(?<field1>.*)\|(?<field2>.*)\|(?<field3>.*)\|.....write others...\|(?<field15>.*)\|
MV_ADD = true

somesoni2 · ‎08-07-2014

You can setup multivalue field extraction using transforms.conf.

Reference:
http://answers.splunk.com/answers/112311/multi-value-field-extraction
http://answers.splunk.com/answers/11777/field-extraction-into-multivalue-field

Extracting Fields from Structured HL7 Data

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability

Join the Conversation

Extracting Fields from Structured HL7 Data

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Best Practices: Splunk auto adjust pipeline queue

Laser Bananas and Edge Hubs: Exploring Operational Technology (OT) Data Through a ...

Event Series: Mastering AI Tokenomics and Splunk Agent Observability