Getting Data In

Remove first part of string before creating a JSON source type

robertlynch2020
Influencer

Hi

I have used the answer below to get 95% of the way to a full solution, but I just can't get the last bit.
https://answers.splunk.com/answers/567087/how-to-split-data-into-separate-sourcetypes-with-t.html

I take in one file containing multiple JSON formats and split it into multiple source types.
However, I have a sub-issue: one of the source types looks like the line below, text followed by a JSON trace.

2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid"

I want to keep only the JSON and remove the other data at the start of the line.

So I think I need a SEDCMD in props.conf, but I am not sure. I am trying not to use a heavy forwarder.

props.conf
[AMBER_RAW]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
TRANSFORMS-sourcetye_routing = AMBER_RAW_json_METRIC

[AMBER_RAW:METRIC]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON
SEDCMD-REGEX_ONLY = s/^.({"v".).*$/\1/

transforms.conf
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW:METRIC

Thanks in Advance

1 Solution

jconger
Splunk Employee

Something like this should work in props.conf to remove the header text:

SEDCMD-remove_header = s/.*?\{/{/g

This matches everything up to (and including) the first {. Then, it replaces it all with just a {.

Note: SEDCMD is applied at index time, so it only affects data indexed after the change.
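To see how the trailing flag changes the result on nested JSON, here is a small Python sketch. Python's `re` module stands in for Splunk's regex engine here, which is an assumption, and the log line is a hypothetical shortened version of the one in the question. The helper name `strip_header` is also made up for illustration.

```python
import re

# Hypothetical shortened log line with a nested JSON object (not taken verbatim from the thread).
line = '2018-01-10 15:52:03 [thread-1] INFO  METRIC:41 - {"v":"1.0","src":{"c":"authn-app"}}'

def strip_header(text: str, replace_all: bool) -> str:
    # Approximates the sed expression from the answer above.
    # count=0 tells re.sub to replace every match (like the g flag);
    # count=1 stops after the first match.
    return re.sub(r'.*?\{', '{', text, count=0 if replace_all else 1)

first_only = strip_header(line, replace_all=False)
every_match = strip_header(line, replace_all=True)

print(first_only)   # {"v":"1.0","src":{"c":"authn-app"}}
print(every_match)  # {{"c":"authn-app"}}
```

In this sketch, replacing every match also eats the text in front of each nested `{`, so the result is no longer the original JSON. If Splunk's `g` flag behaves the same way, replacing only the first match is the safer choice for events like the METRIC line.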

robertlynch2020
Influencer

Hi

Thanks for your help here, however I can't get this to work.
I have tried the following, as the JSON is more complex:

[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/.*?{"v/{"v/g
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid":12483,"src":{"c":"authn-app","d":"auth"},"mtr":{"counters":{"process":{"cpu":{"time_cumulated_s":36},"memory":{"gc":{"ps_marksweep":{"total_duration_ms":814},"ps_scavenge":{"total_duration_ms":539}}}}},"gauges":{"com.murex.serviceframework.rest.datalayer.DataSourceMetrics.datasources.authn-authn-app-1":{"availableConnectionCount":1,"borrowedConnectionCount":0,"currPoolSize":1,"maxPoolSize":50,"poolName":"authn-authn-app-1"},"process":{"cpu":{"percentage":0.04184450581638631},"files":{"open_files":37},"memory":{"jvm":{"heap":{"committed_kb":195072,"used_kb":115080},"nonheap":{"committed_kb":91456,"used_kb":89860}},"rss_kb":32880864,"vsz_kb":2301276}}},"histograms":{},"meters":{},"timers":{"process":{"memory":{"gc":{"ps_marksweep":{"events":{"count":1,"rate_1m":0.0014267037570722622,"rate_5m":0.002038710931305469,"rate_15m":9.431526926661993E-4,"rate_mean":0.006158257067925256},"duration_ms":{"max":620.0,"mean":620.0,"median":620.0,"min":620.0,"percentile_75":620.0,"percentile_95":620.0,"percentile_98":620.0,"percentile_99":620.0,"percentile_999":620.0,"standard_deviation":0.0}},"ps_scavenge":{"events":{"count":32,"rate_1m":0.18094822353052323,"rate_5m":1.237424759817615,"rate_15m":1.7042656064654065,"rate_mean":0.19706273351906517},"duration_ms":{"max":18.0,"mean":9.125,"median":6.5,"min":3.0,"percentile_75":13.0,"percentile_95":18.0,"percentile_98":18.0,"percentile_99":18.0,"percentile_999":18.0,"standard_deviation":5.014495118187132}}}}}}}}

jconger
Splunk Employee

Try this for your SEDCMD. It anchors the regex to the beginning of the line and uses the flag 1 so that only the first match is replaced:

SEDCMD-remove_header = s/^.*?\{/{/1
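As a rough check of the anchored expression outside Splunk, the sketch below applies the same pattern in Python and then parses the result as JSON. Python's `re` is only a stand-in for Splunk's regex engine, and the event is a shortened copy of the METRIC line with most fields dropped, so treat both as assumptions.

```python
import json
import re

# Shortened stand-in for the METRIC line in this thread; only a few fields are kept.
raw = ('2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - '
       '{"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","pid":12483,'
       '"src":{"c":"authn-app","d":"auth"}}')

# ^ anchors the match at the start of the event; count=1 plays the role of the trailing 1 flag.
cleaned = re.sub(r'^.*?\{', '{', raw, count=1)

# json.loads raises ValueError if any header text is still in front of the JSON.
event = json.loads(cleaned)

print(event["ts"])        # 2018-01-10T15:52:03.700Z
print(event["src"]["c"])  # authn-app
```

Because the pattern is anchored and only the first match is replaced, the nested objects after the first `{` are left untouched.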

ti123
Engager

Thank you, it helped me a lot.


robertlynch2020
Influencer

Hi

Thanks for your help

I have applied this, but I am still getting the full line in Splunk. I am not sure why, as it looks to me like it should work.

[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON


robertlynch2020
Influencer

Hi

I can confirm this works if you use the stanza below and take the file in without using a transform:

[AMBER_RAW:METRIC_DIRECT]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

However, in my case the source type is assigned by a transform and it does not work, so I will post a separate question on this. (The configuration below does not work, even though the SEDCMD is exactly the same, so either it is a bug or I am missing something.)

transforms.conf
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW_METRIC

props.conf
[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

