Extracting Fields from Structured HL7 Data

dmbreton
New Member

I am trying to figure out how to extract structured data from an HL7 2.x message.

The entire message is wrapped in an HL7 MLLP envelope, <VT><payload><FS><CR>, which I use in the source type I created to break the stream into individual messages. The grammar of this message is MSH PID PV1 OBR { OBX }. This means the message has 4 segments (strings), each delimited by a <CR>, followed by 1 to n OBX segments, each also delimited by a <CR>. Each segment represents a different set of information:

  • MSH => Message Header
  • PID => Patient Info
  • PV1 => Patient Visit/Encounter Info
  • OBR => Observation Request
  • OBX => Observation/Result
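To make the framing concrete, here is a minimal Python sketch of unwrapping the <VT><payload><FS><CR> envelope and splitting the payload into segments. The function names and the shortened sample payload are hypothetical, chosen only for illustration; this is not how Splunk itself breaks events.

```python
# Control characters used by the envelope described above.
VT, FS, CR = "\x0b", "\x1c", "\r"


def split_mllp_frames(stream: str) -> list[str]:
    """Return the payload of every complete <VT>payload<FS><CR> frame."""
    frames = []
    pos = 0
    while True:
        start = stream.find(VT, pos)
        if start == -1:
            break
        end = stream.find(FS + CR, start + 1)
        if end == -1:
            break  # incomplete frame: stop rather than guess
        frames.append(stream[start + 1:end])
        pos = end + 2
    return frames


def split_segments(payload: str) -> list[str]:
    """Split one message payload into its <CR>-delimited segments."""
    return [segment for segment in payload.split(CR) if segment]


# Shortened, hypothetical version of the sample message.
sample = (
    VT
    + "MSH|^~\\&|Sending Application|N\r"
    + "PID|||MRN19\r"
    + "PV1||I|SNGH GICU\r"
    + "OBR|||||||20140731105559\r"
    + "OBX|2|NM|Temperature||98.6|Celsius\r"
    + FS
    + CR
)

frames = split_mllp_frames(sample)
segments = split_segments(frames[0])
print([segment[:3] for segment in segments])
```

The first three characters of each segment give the segment ID, which is what the MSH PID PV1 OBR { OBX } grammar is expressed in.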

Because the first 4 segments are required and always appear in order, I was able to extract all of their fields using a regex.

Example:

Message (excluding the message wrapper):

MSH|^~\&|Sending Application|N|||20140731105559||ORU^R01|47311055594607d|P|2.3||||||8859/1
PID|||MRN19||PV1^19||19000101|M||||||||||CSN19
PV1||I|SNGH GICU||||||||||||||||ECN123456
OBR|||||||20140731105559
OBX|1|ST|<Observation_Identifier>||<Observation_Value>|<Observation_Units>|||||<Observation_Status>|||<Observation_Time>||
OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||
OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||

Regex to extract all fields from the MSH segment

(?m).*MSH\|(?:(?:(?:$)|(?:\n)|(?<encoding_characters>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<date_time_of_message>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<security>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_control_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<processing_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<version_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sequence_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<continuation_pointer>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<accept_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<application_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<country_code>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<character_set>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<principal_language_of_message>[^|\n]*)|(?:\|))(?:\|?))
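For comparison, the same positional mapping can be sketched outside Splunk by splitting the segment on the pipe. This is only an illustrative Python sketch: the field names are copied from the capture groups in the regex above, `parse_msh` is a hypothetical helper, and it assumes the field separator is always `|` as in the sample.

```python
# Field names taken, in order, from the capture groups of the MSH regex.
MSH_FIELD_NAMES = [
    "encoding_characters", "sending_application", "sending_facility",
    "receiving_application", "receiving_facility", "date_time_of_message",
    "security", "message_type", "message_control_id", "processing_id",
    "version_id", "sequence_id", "continuation_pointer",
    "accept_acknowledge_type", "application_acknowledge_type",
    "country_code", "character_set", "principal_language_of_message",
]


def parse_msh(segment: str) -> dict[str, str]:
    """Map the pipe-delimited values of an MSH segment to field names."""
    parts = segment.split("|")
    if parts[0] != "MSH":
        raise ValueError("not an MSH segment")
    # zip stops at the shorter sequence, so missing trailing fields are omitted.
    return dict(zip(MSH_FIELD_NAMES, parts[1:]))


msh = r"MSH|^~\&|Sending Application|N|||20140731105559||ORU^R01|47311055594607d|P|2.3||||||8859/1"
fields = parse_msh(msh)
print(fields["message_type"], fields["version_id"], fields["character_set"])
```

Trailing fields that the sender leaves off (here `principal_language_of_message`) are absent from the result rather than empty.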

In the message above, each OBX segment represents a measurement.

  • OBX 1 => Example with field names
  • OBX 2 => Temperature Measurement
  • OBX 3 => Heart Rate Measurement

So for any given message, I need to extract each measurement along with its attributes (value, units, time, and so on). There can be 1 to n OBX segments, including several of the same measurement type taken at different times.

The only way I have gotten this to work so far is to deconstruct the message before ingesting it into Splunk and generate a new message for each measurement. This is less than ideal, and I would prefer to do the extraction in Splunk itself.
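As a rough illustration of the desired result, the following Python sketch turns every OBX segment of a payload into one record. The field positions come from the labeled OBX 1 example above; the dictionary keys and the `parse_obx_segments` helper are hypothetical names, and the sketch assumes `|` is the only field separator.

```python
def parse_obx_segments(payload: str) -> list[dict[str, str]]:
    """Return one dict per OBX segment found in a message payload."""
    measurements = []
    # Segments are <CR>-delimited; tolerate <LF> in case the text was normalized.
    for segment in payload.replace("\n", "\r").split("\r"):
        parts = segment.split("|")
        if parts[0] != "OBX":
            continue
        parts += [""] * (15 - len(parts))  # pad segments that are cut short
        measurements.append({
            "set_id": parts[1],
            "value_type": parts[2],
            "identifier": parts[3],
            "value": parts[5],
            "units": parts[6],
            "status": parts[11],
            "time": parts[14],
        })
    return measurements


payload = "\r".join([
    "OBR|||||||20140731105559",
    "OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||",
    "OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||",
])

measurements = parse_obx_segments(payload)
for m in measurements:
    print(m["identifier"], m["value"], m["units"], m["time"])
```

Each record keeps the identifier, value, units, status, and time together, so repeated measurements of the same type at different times remain distinguishable.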

Any suggestions would be greatly appreciated.

dstuder
Communicator

A TA (technology add-on) for parsing HL7 was released after this question was asked:

https://splunkbase.splunk.com/app/3283/

somesoni2
Revered Legend

Something like this (just for OBX, assuming there are 15 fields after the keyword OBX):

PROPS.CONF:

[YourSourceType]
REPORT-mv_obx = xf-obx

TRANSFORMS.CONF:

[xf-obx]
REGEX = (?:^|[\r\n])OBX\|(?<field1>[^|\r\n]*)\|(?<field2>[^|\r\n]*)\|(?<field3>[^|\r\n]*)\|.....write others...\|(?<field15>[^|\r\n]*)\|
MV_ADD = true
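A pattern of this shape can be checked outside Splunk before it goes into transforms.conf. The sketch below uses Python's `re` module, where `findall` returning one row per OBX segment is only a rough analogy for what MV_ADD does with repeated matches. It is an assumption-laden test harness, not Splunk behavior: the groups are unnamed, each field is matched with a negated character class so a capture cannot cross a pipe or run into the next segment, and the event text is taken from the sample message.

```python
import re

# One field: any run of characters that is not a pipe or a line terminator.
FIELD = r"([^|\r\n]*)"

# "OBX|" at the start of the event or right after a segment terminator,
# then 15 pipe-separated fields and a closing pipe.
OBX_PATTERN = re.compile(
    r"(?:^|[\r\n])OBX\|" + r"\|".join([FIELD] * 15) + r"\|"
)

event = "\r".join([
    "OBR|||||||20140731105559",
    "OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||",
    "OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||",
])

# Each row is a 15-tuple: row[0] is field1, row[14] is field15.
rows = OBX_PATTERN.findall(event)
for row in rows:
    print(row[2], row[4], row[5], row[13])
```

With the sample layout, field3 holds the observation identifier, field5 the value, field6 the units, and field14 the observation time.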