Getting Data In

Extracting a value from MQTT string before parsing to JSON

luminousplumz
Engager

I have a requirement to extract a value from an MQTT string before I parse it to JSON.
Initially I was using the MQTT Modular Input app to pull each of the topics with its own input.

I found that with more than 3 inputs/topics enabled I am dropping some, if not all, data.
So I decided to pull all the topics in a single input. This works well, except I still need to be able to separate the topics for searches.

I managed to get this working using multiple transforms, but I changed something and now I can't get it to work again.

Using transforms I can parse to JSON with no issues (mqtttojson).

transforms.conf

[mqtttojson]
REGEX = msg\=(.+)$
FORMAT = $1
DEST_KEY = _raw

[mqtttopic]
CLEAN_KEYS = 0
FORMAT = Topic::"$1"
REGEX = tgw\/data\/0x155f\/(?<Topic>\S*?)\/

Props.conf

[mqtttojson_ubnpfc_all]
DATETIME_CONFIG = 
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIME_PREFIX = \"ts\":
TZ = Europe/London
category = Custom
pulldown_type = 1
TRANSFORMS-mqtttopic = mqtttojson, mqtttopic


In the example below I need the 4th topic level, i.e. "TransportContextTracking".

Thu Apr 24 12:42:15 GMT 2025 name=mqtt_msg_received event_id= topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 msg={"data":{"destination":{"locationAddress":"/UrbanUK/PCOTS13/Exit"},"errorCode":null,"event":"Started","loadCarrierId":"0278494","source":{"locationAddress":"/UrbanUK/PCOTS13/Pick"},"transportId":"f0409b2a-e9d4-407c-bd65-48ccea17b520","transportType":"Transport"},"dbid":8104562815,"ts":1745498528217}

 

 

What am I missing?


livehybrid
Super Champion

Hi @luminousplumz 

You need to apply the mqtttopic transform before the mqtttojson transform overwrites the _raw field. The order in TRANSFORMS-* matters. Also, adjust the mqtttopic regex and format for correct field extraction.

transforms.conf:

[mqtttojson]
REGEX = msg\=(.+)
FORMAT = $1
DEST_KEY = _raw

[mqtttopic]
# Extract from the original _raw field containing 'topic='
REGEX = topic=tgw\/data\/0x155f\/([^\/]+)
FORMAT = Topic::$1
WRITE_META = true

props.conf:

[mqtttojson_ubnpfc_all]
# Apply mqtttopic first, then mqtttojson
TRANSFORMS-topic_then_json = mqtttopic, mqtttojson
# The rest of your props.conf settings remain the same
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
TIME_PREFIX = \"ts\":
TZ = Europe/London
category = Custom
pulldown_type = 1
# Search-time key-value extraction is controlled by KV_MODE.
# Use json to have Splunk parse the rewritten JSON _raw at search time:
# KV_MODE = json
# INDEXED_EXTRACTIONS = json is unlikely to help here: structured parsing
# generally runs before index-time transforms, so it would see the original
# non-JSON event rather than the rewritten _raw.

  1. Transform order: The TRANSFORMS-topic_then_json line in props.conf lists mqtttopic first, so it runs on the original event data before mqtttojson overwrites _raw with the JSON payload.
  2. mqtttopic REGEX: The regex topic=tgw\/data\/0x155f\/([^\/]+) looks for the topic= string, skips the known prefix tgw/data/0x155f/, and captures the next run of characters that are not a forward slash (/) into capture group 1.
  3. mqtttopic FORMAT: FORMAT = Topic::$1 creates a new field named Topic containing the value captured by the regex (the desired topic segment, e.g. "TransportContextTracking").
  4. mqtttopic WRITE_META: WRITE_META = true writes Topic to the index as an indexed field, so it stays searchable even though the original _raw is later overwritten.
  5. mqtttojson: This transform runs second. It matches the JSON after msg= (which is still present in _raw at this stage) and overwrites _raw with just the JSON content. With KV_MODE = json (or | spath in the search), Splunk can then parse this new _raw at search time.
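
To see why the order matters, here is a rough Python model of the two transforms. It is only a sketch: the functions mqtttopic, mqtttojson and run are hypothetical stand-ins, not Splunk's implementation, and the JSON payload is shortened. Python's re module is close to, but not identical to, the PCRE engine Splunk uses.

```python
import re

# Sample event from the question (JSON payload shortened for readability).
RAW = ('Thu Apr 24 12:42:15 GMT 2025 name=mqtt_msg_received event_id= '
       'topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 '
       'msg={"data":{"event":"Started"},"dbid":8104562815,"ts":1745498528217}')

TOPIC_RE = re.compile(r"topic=tgw\/data\/0x155f\/([^\/]+)")
JSON_RE = re.compile(r"msg\=(.+)")


def mqtttopic(event):
    """Stand-in for the mqtttopic transform: add a Topic field from _raw."""
    m = TOPIC_RE.search(event["_raw"])
    if m:
        event["Topic"] = m.group(1)
    return event


def mqtttojson(event):
    """Stand-in for the mqtttojson transform: overwrite _raw with the payload."""
    m = JSON_RE.search(event["_raw"])
    if m:
        event["_raw"] = m.group(1)
    return event


def run(transforms):
    """Apply the transforms left to right, like a TRANSFORMS-<class> list."""
    event = {"_raw": RAW}
    for transform in transforms:
        event = transform(event)
    return event


good = run([mqtttopic, mqtttojson])  # order used in this answer
bad = run([mqtttojson, mqtttopic])   # order used in the question

print(good.get("Topic"))  # TransportContextTracking
print(bad.get("Topic"))   # None: "topic=" was gone before mqtttopic ran
```

In the second run the topic regex never matches, because _raw has already been reduced to the JSON payload.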

Some useful tips:

  • Restart the Splunk instance that parses the data (indexer or heavy forwarder) after changing index-time settings in props.conf and transforms.conf. The changes apply only to events indexed afterwards; existing events are not reprocessed.
  • Ensure the sourcetype mqtttojson_ubnpfc_all is correctly assigned to your MQTT data input.
  • Test the regex using Splunk's rex command in search or on regex testing websites against your raw event data to confirm it captures the correct value.
  • KV_MODE only affects search-time extraction, so it cannot interfere with index-time transforms. Set KV_MODE = none in props.conf only if you want to disable automatic search-time key-value extraction.
  • If Splunk isn't parsing the final JSON _raw at search time, set KV_MODE = json in your props.conf stanza on the search head, or use | spath in the search.
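
If you want to check the two regexes outside Splunk first, a quick script like the one below works against the sample event from the question. Treat it as a sanity check only: Python's re is not the same engine as Splunk's PCRE, although these particular patterns use nothing engine-specific.

```python
import json
import re

# Full sample event from the question.
EVENT = (
    'Thu Apr 24 12:42:15 GMT 2025 name=mqtt_msg_received event_id= '
    'topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 '
    'msg={"data":{"destination":{"locationAddress":"/UrbanUK/PCOTS13/Exit"},'
    '"errorCode":null,"event":"Started","loadCarrierId":"0278494",'
    '"source":{"locationAddress":"/UrbanUK/PCOTS13/Pick"},'
    '"transportId":"f0409b2a-e9d4-407c-bd65-48ccea17b520",'
    '"transportType":"Transport"},"dbid":8104562815,"ts":1745498528217}'
)

# Same patterns as the mqtttopic and mqtttojson stanzas above.
topic_match = re.search(r"topic=tgw\/data\/0x155f\/([^\/]+)", EVENT)
payload_match = re.search(r"msg\=(.+)", EVENT)

topic = topic_match.group(1)
payload = json.loads(payload_match.group(1))  # raises if the capture is not valid JSON

print(topic)
print(payload["data"]["event"], payload["ts"])
```

If json.loads raises here, the msg regex is capturing more or less than the JSON object, and the rewritten _raw would not parse in Splunk either.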


tscroggins
Influencer

Hi @luminousplumz,

For index-time field extractions, you want something like this (note the order of the transforms in the TRANSFORMS-mqtt setting):

# fields.conf

[sourcetype::mqtttojson_ubnpfc_all::Topic]
INDEXED = true

# props.conf

[mqtttojson_ubnpfc_all]
TRANSFORMS-mqtt = mqtttopic,mqtttojson

# transforms.conf

[mqtttojson]
CLEAN_KEYS = 0
DEST_KEY = _raw
FORMAT = $1
REGEX = msg=(.+)

[mqtttopic]
CLEAN_KEYS = 0
FORMAT = Topic::$1
REGEX = topic=(?:[^/]*/){3}([^/]+)
WRITE_META = true
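
The mqtttopic regex above skips three topic levels instead of hard-coding tgw/data/0x155f/, so it also works when the prefix differs. A small Python comparison, as a sketch: the second and third sample strings and the STRICT pattern are my own assumptions for illustration, not part of the configuration above.

```python
import re

GENERIC = re.compile(r"topic=(?:[^/]*/){3}([^/]+)")          # regex from this answer
FIXED = re.compile(r"topic=tgw\/data\/0x155f\/([^\/]+)")      # hard-coded prefix
STRICT = re.compile(r"topic=(?:[^/\s]*/){3}([^/\s]+)")        # assumed variant: no whitespace


def fourth_level(pattern, text):
    """Return the captured topic level, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


real = "topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 msg={}"
other_prefix = "topic=tgw/data/0x9999/HypotheticalTopic/MFC/1 msg={}"  # hypothetical
short_topic = 'topic=a/b msg={"p":"/x/y/z"}'                           # hypothetical

for sample in (real, other_prefix, short_topic):
    print(fourth_level(GENERIC, sample),
          fourth_level(FIXED, sample),
          fourth_level(STRICT, sample))
```

One caveat the last sample shows: [^/]* also matches spaces, so a topic with fewer than four levels can let the generic pattern run on into the msg payload. If your topics can be that short, excluding whitespace as in the STRICT variant is one way to avoid it.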

For search-time field extractions, you want something like this in props.conf:

[mqtttojson_ubnpfc_all]
EXTRACT-Topic = topic=(?:[^/]*/){3}(?<Topic>[^/]+)
EVAL-_raw = replace(_raw, ".*? msg=", "")

However, in the search-time configuration, you'll need to extract the JSON fields in a search, because automatic key-value field extraction happens before calculated fields (EVAL-*):

sourcetype=mqtttojson_ubnpfc_all
| spath

You'll note that the original name, event_id, topic, and msg (value possibly truncated) fields are automatically extracted before the full value of msg is assigned to _raw.
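
For anyone who wants to follow the search-time steps outside Splunk, here is a rough Python analogue. It is a sketch under assumptions: the flatten helper is hypothetical and only approximates the dotted field names spath produces for nested objects, the payload is shortened, and Python needs (?P<Topic>...) where Splunk accepts (?<Topic>...).

```python
import json
import re

RAW = ('Thu Apr 24 12:42:15 GMT 2025 name=mqtt_msg_received event_id= '
       'topic=tgw/data/0x155f/TransportContextTracking/MFC/0278494 '
       'msg={"data":{"event":"Started","loadCarrierId":"0278494"},'
       '"dbid":8104562815,"ts":1745498528217}')

# Analogue of EXTRACT-Topic (runs against the original _raw).
m = re.search(r"topic=(?:[^/]*/){3}(?P<Topic>[^/]+)", RAW)
topic = m.group("Topic") if m else None

# Analogue of EVAL-_raw = replace(_raw, ".*? msg=", "").
new_raw = re.sub(r".*? msg=", "", RAW)


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, roughly like spath field names."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, name))
        else:
            out[name] = value
    return out


# Analogue of | spath on the rewritten _raw.
fields = flatten(json.loads(new_raw))

print(topic)
print(fields)
```

The Topic extraction has to see the original event, which is why it is an EXTRACT-* setting rather than something derived from the rewritten _raw.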
