Extracting a value from MQTT string before parsing to JSON

luminousplumz
Engager

I have a requirement to extract a value from an MQTT string before I parse it to JSON.
Initially I was using the MQTT Modular Input app to pull each of the topics with its own input.

I found that with more than 3 inputs/topics enabled I was dropping some, if not all, of the data.
So I decided to pull all the topics in a single input. This works well, except I still need to be able to separate the topics for searches.

I managed to get this working using multiple transforms, but I changed something and now I can't get it to work again.


Using transforms I can parse to JSON with no issues (mqtttojson).

Transforms.conf

[mqtttojson]
REGEX = msg\=(.+)$
FORMAT = $1
DEST_KEY = _raw

[mqtttopic]
CLEAN_KEYS = 0
FORMAT = Topic::"$1"
REGEX = tgw\/data\/0x155f\/(?<Topic>\S*?)\/

Props.conf

[mqtttojson_ubnpfc_all]
DATETIME_CONFIG = 
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIME_PREFIX = \"ts\":
TZ = Europe/London
category = Custom
pulldown_type = 1
TRANSFORMS-mqtttopic = mqtttojson, mqtttopic


In the example below I need the 4th topic level, i.e. "TransportContextTracking".

Thu Apr 24 12:42:15 GMT 2025 name=mqtt_msg_received event_id= topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 msg={"data":{"destination":{"locationAddress":"/UrbanUK/PCOTS13/Exit"},"errorCode":null,"event":"Started","loadCarrierId":"0278494","source":{"locationAddress":"/UrbanUK/PCOTS13/Pick"},"transportId":"f0409b2a-e9d4-407c-bd65-48ccea17b520","transportType":"Transport"},"dbid":8104562815,"ts":1745498528217}

 

 

What am I missing?


livehybrid
Ultra Champion

Hi @luminousplumz 

You need to apply the mqtttopic transform before the mqtttojson transform overwrites the _raw field. The order in TRANSFORMS-* matters. Also, adjust the mqtttopic regex and format for correct field extraction.

transforms.conf:

[mqtttojson]
REGEX = msg\=(.+)
FORMAT = $1
DEST_KEY = _raw

[mqtttopic]
# Extract from the original _raw field containing 'topic='
REGEX = topic=tgw\/data\/0x155f\/([^\/]+)
FORMAT = Topic::$1
WRITE_META = true

props.conf:

[mqtttojson_ubnpfc_all]
# Apply mqtttopic first, then mqtttojson
TRANSFORMS-topic_then_json = mqtttopic, mqtttojson
# The rest of your props.conf settings remain the same
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIME_PREFIX = \"ts\":
TZ = Europe/London
category = Custom
pulldown_type = 1
# KV_MODE is a search-time setting and does not affect index-time transforms.
# To have the rewritten JSON _raw parsed at search time, consider:
# KV_MODE = json
# INDEXED_EXTRACTIONS = json is applied before TRANSFORMS run, so it is unlikely
# to see the rewritten _raw; test before relying on it.

  1. Transform Order: The TRANSFORMS-topic_then_json line in props.conf should list mqtttopic first. This ensures it runs on the original event data before mqtttojson overwrites _raw with the JSON payload.
  2. mqtttopic REGEX: The regex topic=tgw\/data\/0x155f\/([^\/]+) looks for the topic= string, skips the known prefix tgw/data/0x155f/, and captures the next run of characters that are not a forward slash (/) into capture group 1.
  3. mqtttopic FORMAT: FORMAT = Topic::$1 creates a new field named Topic containing the value captured by the regex (the desired topic segment, e.g., "TransportContextTracking").
  4. mqtttopic WRITE_META: WRITE_META = true ensures the extracted field (Topic) is written to the index as an indexed field, making it available for searching even though the original _raw is later overwritten.
  5. mqtttojson: This transform runs second. It extracts the JSON part after msg= (which is still present in _raw at this stage) and overwrites _raw with just the JSON content. The new _raw can then be parsed as JSON at search time (for example with KV_MODE = json or spath). INDEXED_EXTRACTIONS = json is unlikely to help here, because structured-data parsing happens before index-time transforms run.
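
To see why the order matters, the two transforms can be mimicked outside Splunk. The following is only a rough Python sketch, not how Splunk implements transforms: the event dict, the function names, and the shortened JSON payload are assumptions made for illustration. Each function applies its regex to the current _raw, in the same way the transforms in the list above are described.

```python
import re

# Sample event from the question; the JSON payload is shortened here for
# readability (an assumption for illustration only).
RAW = (
    'Thu Apr 24 12:42:15 GMT 2025 name=mqtt_msg_received event_id= '
    'topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 '
    'msg={"data":{"event":"Started","loadCarrierId":"0278494"},'
    '"dbid":8104562815,"ts":1745498528217}'
)

TOPIC_RE = re.compile(r"topic=tgw/data/0x155f/([^/]+)")
MSG_RE = re.compile(r"msg=(.+)")


def mqtttopic(event):
    """Mimic the mqtttopic transform: add a Topic field if the regex matches _raw."""
    m = TOPIC_RE.search(event["_raw"])
    if m:
        event["fields"]["Topic"] = m.group(1)
    return event


def mqtttojson(event):
    """Mimic the mqtttojson transform: overwrite _raw with the text after msg=."""
    m = MSG_RE.search(event["_raw"])
    if m:
        event["_raw"] = m.group(1)
    return event


def run(transforms):
    """Apply the transforms left to right, like the list in TRANSFORMS-*."""
    event = {"_raw": RAW, "fields": {}}
    for transform in transforms:
        event = transform(event)
    return event


good = run([mqtttopic, mqtttojson])  # order suggested in this answer
bad = run([mqtttojson, mqtttopic])   # order from the original props.conf

print(good["fields"])
print(bad["fields"])
```

With mqtttopic first, the Topic field is captured before _raw is rewritten. With the original order, topic= is already gone from _raw by the time mqtttopic runs, so nothing is captured.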

Some useful tips:

  • Restart the Splunk instance or reload the configuration for changes in props.conf and transforms.conf to take effect.
  • Ensure the sourcetype mqtttojson_ubnpfc_all is correctly assigned to your MQTT data input.
  • Test the regex using Splunk's rex command in search or on regex testing websites against your raw event data to confirm it captures the correct value.
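  • KV_MODE is a search-time setting, so it cannot interfere with index-time transforms. If automatic key-value extraction produces unwanted fields at search time, you can set KV_MODE = none in props.conf.
  • If Splunk isn't parsing the final JSON _raw at search time, set KV_MODE = json in props.conf or use spath in the search. INDEXED_EXTRACTIONS = json is unlikely to help, because it is applied before the transforms rewrite _raw.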


tscroggins
Influencer

Hi @luminousplumz,

For index-time field extractions, you want something like this (note the order of the transforms in the TRANSFORMS-mqtt setting):

# fields.conf

[sourcetype::mqtttojson_ubnpfc_all::Topic]
INDEXED = true

# props.conf

[mqtttojson_ubnpfc_all]
TRANSFORMS-mqtt = mqtttopic,mqtttojson

# transforms.conf

[mqtttojson]
CLEAN_KEYS = 0
DEST_KEY = _raw
FORMAT = $1
REGEX = msg=(.+)

[mqtttopic]
CLEAN_KEYS = 0
FORMAT = Topic::$1
REGEX = topic=(?:[^/]*/){3}([^/]+)
WRITE_META = true
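
The mqtttopic regex here counts topic levels instead of hard-coding the tgw/data/0x155f/ prefix. A small Python check, offered as a sketch only, compares the two styles. The other_prefix and four_levels strings and the GUARDED pattern are hypothetical and do not come from the thread.

```python
import re

# Fixed-prefix pattern (earlier answer) and position-based pattern (this answer).
FIXED = re.compile(r"topic=tgw/data/0x155f/([^/]+)")
GENERIC = re.compile(r"topic=(?:[^/]*/){3}([^/]+)")
# Hypothetical variant that also stops at whitespace (not from the thread).
GUARDED = re.compile(r"topic=(?:[^/\s]*/){3}([^/\s]+)")


def capture(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


# Topic taken from the sample event in the question.
sample = "topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 msg={}"
# Hypothetical topics, invented to exercise the patterns.
other_prefix = "topic=tgw/data/0x9999/SomeOtherTopic/MFC/1 msg={}"
four_levels = 'topic=tgw/data/0x155f/Short msg={"path":"/a/b"}'

for name, text in [("sample", sample),
                   ("other_prefix", other_prefix),
                   ("four_levels", four_levels)]:
    print(name,
          repr(capture(FIXED, text)),
          repr(capture(GENERIC, text)),
          repr(capture(GUARDED, text)))
```

Both patterns return TransportContextTracking for the sample event. The position-based pattern also handles a different third level. One caveat: if a topic had exactly four levels, [^/]+ would run past the space into the msg= payload until the next slash, which is what the whitespace-aware variant guards against. Whether such topics occur in this feed is unknown.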

For search-time field extractions, you want something like this:

[mqtttojson_ubnpfc_all]
EXTRACT-Topic = topic=(?:[^/]*/){3}(?<Topic>[^/]+)
EVAL-_raw = replace(_raw, ".*? msg=", "")

However, in the search-time configuration, you'll need to extract the JSON fields in a search, as automatic key-value field extraction happens before calculated fields (EVAL-*):

sourcetype=mqtttojson_ubnpfc_all
| spath

You'll note that the original name, event_id, topic, and msg (value possibly truncated) fields are automatically extracted before the full value of msg is assigned to _raw.
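
As a rough illustration of what the search-time variant does to the sample event, here is a Python sketch. It is an approximation, not Splunk itself: re.sub stands in for the replace() eval function, json.loads stands in for spath, and Python needs (?P<Topic>...) where Splunk's PCRE accepts (?<Topic>...).

```python
import json
import re

# Sample event from the question, split across lines for readability.
RAW = (
    'Thu Apr 24 12:42:15 GMT 2025 name=mqtt_msg_received event_id= '
    'topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 '
    'msg={"data":{"destination":{"locationAddress":"/UrbanUK/PCOTS13/Exit"},'
    '"errorCode":null,"event":"Started","loadCarrierId":"0278494",'
    '"source":{"locationAddress":"/UrbanUK/PCOTS13/Pick"},'
    '"transportId":"f0409b2a-e9d4-407c-bd65-48ccea17b520",'
    '"transportType":"Transport"},"dbid":8104562815,"ts":1745498528217}'
)

# Counterpart of EXTRACT-Topic: runs against the original _raw.
topic_match = re.search(r"topic=(?:[^/]*/){3}(?P<Topic>[^/]+)", RAW)
topic = topic_match.group("Topic") if topic_match else None

# Counterpart of EVAL-_raw: strip everything up to and including " msg=".
new_raw = re.sub(r".*? msg=", "", RAW)

# Counterpart of "| spath": parse the remaining JSON payload.
payload = json.loads(new_raw)

print(topic)
print(payload["data"]["event"])
print(payload["ts"])
```

The Topic extraction still sees topic= because field extractions (EXTRACT-*) are applied before calculated fields (EVAL-*) at search time.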
