Splunk Search

Extracting Fields from Structured HL7 Data

New Member

I am trying to figure out how to extract structured data from an HL7 2.x message

The entire message is wrapped in a hl7 mlp wrapper, <VT><payload><FS><CR>, which I am using in the source type I created to extract individual messages. The grammar of this message is MSH PID PV1 OBR { OBX }. Essentially what this means is that the message will have 4 segments(strings) delimited by a <CR> followed by 1 to n OBX segments each delimited by a <CR>. Each segment represents a different set of information:

  • MSH => Message Header
  • PID => Patient Info
  • PV1 => Patient Visit/Encounter Info
  • OBR => Observation Request
  • OBX => Observation/Result

Because the first 4 segments are required and in order I was able to extract all fields using a regex.

Example:

Message(excluding message wrapper)

MSH|^~\&|Sending Application|N|||20140731105559||ORU^R01|47311055594607d|P|2.3||||||8859/1
PID|||MRN19||PV1^19||19000101|M||||||||||CSN19
PV1||I|SNGH GICU||||||||||||||||ECN123456
OBR|||||||20140731105559
OBX|1|ST|<Observation_Identifier>||<Observation_Value>|<Observation_Units>|||||<Observation_Status>|||<Observation_Time>||
OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||
OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||

Regex to extract all fields from the MSH segment

(?m).*MSH\|(?:(?:(?:$)|(?:\n)|(?<encoding_characters>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<date_time_of_message>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<security>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_control_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<processing_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<version_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sequence_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<continuation_pointer>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<accept_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<application_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<country_code>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<character_set>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<principal_language_of_message>[^|\n]*)|(?:\|))(?:\|?))

In the message above message each OBX segment represents a measurement.

  • OBX 1 => Example with field names
  • OBX 2 => Temperature Measurement
  • OBX 3 => Heart Rate Measurement

So for any given message I need to be able to extract each measurement plus the attributes of the measurement, Value, Units, Time, .... and there can be 1 to n instances of the OBX segments or even of the same measurement type at a different time.

The only way I have been able to get this to work so far is to deconstruct the message before injecting it into splunk and generating a new message for each measurement. This is a less than ideal solution and I would prefer to get this to work using splunk.

Any suggestions would be greatly appreciated.

0 Karma

Path Finder

There is now a TA for parsing HL7 that was released subsequent to this question being asked.

https://splunkbase.splunk.com/app/3283/

0 Karma

Revered Legend

Something like this (just for OBX, assuming there are 15 fields after the keyword OBX)

[YourSourceType]
REPORT-mv_obx = xf-obx

TRANSFORMS.CONF:

[xf-obx]
REGEX = ^OBX\|(?<field1>.*)\|(?<field2>.*)\|(?<field3>.*)\|.....write others...\|(?<field15>.*)\|
MV_ADD = true
0 Karma

Revered Legend
0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!