Getting Data In

Remove first part of string before creating a JSON source type

robertlynch2020
Influencer

HI

I have used the below answer to get me 95% to a full solution, but i just cant get the last bit.
https://answers.splunk.com/answers/567087/how-to-split-data-into-separate-sourcetypes-with-t.html

I take in one file with multiple JSON and splits it into multiple source types.
However i have a sub issue, one of the source types is like below Text + JSON trace.

2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid"

I am looking to get only the JSON and removing the other data (at the start).

So, i think i need a SED in the props? but not sure. I am trying not to use a heavy forwarder.

props.conf
[AMBER_RAW]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
TRANSFORMS-sourcetye_routing = AMBER_RAW_json_METRIC

[AMBER_RAW:METRIC]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON
SEDCMD-REGEX_ONLY = s/^.({"v".).*$/\1/

Transforms.conf
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW:METRIC

Thanks in Advance

1 Solution

jconger
Splunk Employee
Splunk Employee

Something like this should work in props.conf to remove the header text:

SEDCMD-remove_header = s/.*?\{/{/g

This matches everything up to (and including) the first {. Then, it replaces it all with just a {.

Note: this is an index-time extraction.

View solution in original post

jconger
Splunk Employee
Splunk Employee

Something like this should work in props.conf to remove the header text:

SEDCMD-remove_header = s/.*?\{/{/g

This matches everything up to (and including) the first {. Then, it replaces it all with just a {.

Note: this is an index-time extraction.

robertlynch2020
Influencer

hi

Thanks for your help here, hower i cant get this to work.
I have tried this as the JSON is more complex

[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/.*?{"v/{"v/g
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid":12483,"src":{"c":"authn-app","d":"auth"},"mtr":{"counters":{"process":{"cpu":{"time_cumulated_s":36},"memory":{"gc":{"ps_marksweep":{"total_duration_ms":814},"ps_scavenge":{"total_duration_ms":539}}}}},"gauges":{"com.murex.serviceframework.rest.datalayer.DataSourceMetrics.datasources.authn-authn-app-1":{"availableConnectionCount":1,"borrowedConnectionCount":0,"currPoolSize":1,"maxPoolSize":50,"poolName":"authn-authn-app-1"},"process":{"cpu":{"percentage":0.04184450581638631},"files":{"open_files":37},"memory":{"jvm":{"heap":{"committed_kb":195072,"used_kb":115080},"nonheap":{"committed_kb":91456,"used_kb":89860}},"rss_kb":32880864,"vsz_kb":2301276}}},"histograms":{},"meters":{},"timers":{"process":{"memory":{"gc":{"ps_marksweep":{"events":{"count":1,"rate_1m":0.0014267037570722622,"rate_5m":0.002038710931305469,"rate_15m":9.431526926661993E-4,"rate_mean":0.006158257067925256},"duration_ms":{"max":620.0,"mean":620.0,"median":620.0,"min":620.0,"percentile_75":620.0,"percentile_95":620.0,"percentile_98":620.0,"percentile_99":620.0,"percentile_999":620.0,"standard_deviation":0.0}},"ps_scavenge":{"events":{"count":32,"rate_1m":0.18094822353052323,"rate_5m":1.237424759817615,"rate_15m":1.7042656064654065,"rate_mean":0.19706273351906517},"duration_ms":{"max":18.0,"mean":9.125,"median":6.5,"min":3.0,"percentile_75":13.0,"percentile_95":18.0,"percentile_98":18.0,"percentile_99":18.0,"percentile_999":18.0,"standard_deviation":5.014495118187132}}}}}}}}
0 Karma

jconger
Splunk Employee
Splunk Employee

Try this for your SEDCMD. It anchors the regex to the beginning of the line and sets the replace flag:

SEDCMD-remove_header = s/^.*?\{/{/1

ti123
Engager

thank you, it helped me a lot

0 Karma

robertlynch2020
Influencer

Hi

Thanks for your help

I have applied this, but i am still getting the full line into SPLUNK, not sure why as to me it should work.

[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

0 Karma

robertlynch2020
Influencer

Hi

I can confirm this work if you use the below + take the file in without using a transform

[AMBER_RAW:METRIC_DIRECT]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

However in my case as my source is coming from a transform it does work, so i will post a separate question on this. (Below does work, however the code is exactly the same, so it is a bug or i am missing something)

transforms.conf
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW_METRIC

props.conf
[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

0 Karma

p_gurav
Champion
0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

[Puzzles] Solve, Learn, Repeat: Matching cron expressions

This puzzle (first published here) is based on matching timestamps to cron expressions.All the timestamps ...

Design, Compete, Win: Submit Your Best Splunk Dashboards for a .conf26 Pass

Hello Splunkers,  We’re excited to kick off a Splunk Dashboard contest! We know that dashboards are a primary ...

May 2026 Splunk Expert Sessions: Security & Observability

Level Up Your Operations: May 2026 Splunk Expert Sessions Whether you are refining your security posture or ...