Getting Data In

Remove first part of string before creating a JSON source type

robertlynch2020
Influencer

HI

I have used the below answer to get me 95% to a full solution, but i just cant get the last bit.
https://answers.splunk.com/answers/567087/how-to-split-data-into-separate-sourcetypes-with-t.html

I take in one file with multiple JSON and splits it into multiple source types.
However i have a sub issue, one of the source types is like below Text + JSON trace.

2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid"

I am looking to get only the JSON and removing the other data (at the start).

So, i think i need a SED in the props? but not sure. I am trying not to use a heavy forwarder.

props.conf
[AMBER_RAW]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
TRANSFORMS-sourcetye_routing = AMBER_RAW_json_METRIC

[AMBER_RAW:METRIC]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON
SEDCMD-REGEX_ONLY = s/^.({"v".).*$/\1/

Transforms.conf
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW:METRIC

Thanks in Advance

1 Solution

jconger
Splunk Employee
Splunk Employee

Something like this should work in props.conf to remove the header text:

SEDCMD-remove_header = s/.*?\{/{/g

This matches everything up to (and including) the first {. Then, it replaces it all with just a {.

Note: this is an index-time extraction.

View solution in original post

jconger
Splunk Employee
Splunk Employee

Something like this should work in props.conf to remove the header text:

SEDCMD-remove_header = s/.*?\{/{/g

This matches everything up to (and including) the first {. Then, it replaces it all with just a {.

Note: this is an index-time extraction.

robertlynch2020
Influencer

hi

Thanks for your help here, hower i cant get this to work.
I have tried this as the JSON is more complex

[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/.*?{"v/{"v/g
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid":12483,"src":{"c":"authn-app","d":"auth"},"mtr":{"counters":{"process":{"cpu":{"time_cumulated_s":36},"memory":{"gc":{"ps_marksweep":{"total_duration_ms":814},"ps_scavenge":{"total_duration_ms":539}}}}},"gauges":{"com.murex.serviceframework.rest.datalayer.DataSourceMetrics.datasources.authn-authn-app-1":{"availableConnectionCount":1,"borrowedConnectionCount":0,"currPoolSize":1,"maxPoolSize":50,"poolName":"authn-authn-app-1"},"process":{"cpu":{"percentage":0.04184450581638631},"files":{"open_files":37},"memory":{"jvm":{"heap":{"committed_kb":195072,"used_kb":115080},"nonheap":{"committed_kb":91456,"used_kb":89860}},"rss_kb":32880864,"vsz_kb":2301276}}},"histograms":{},"meters":{},"timers":{"process":{"memory":{"gc":{"ps_marksweep":{"events":{"count":1,"rate_1m":0.0014267037570722622,"rate_5m":0.002038710931305469,"rate_15m":9.431526926661993E-4,"rate_mean":0.006158257067925256},"duration_ms":{"max":620.0,"mean":620.0,"median":620.0,"min":620.0,"percentile_75":620.0,"percentile_95":620.0,"percentile_98":620.0,"percentile_99":620.0,"percentile_999":620.0,"standard_deviation":0.0}},"ps_scavenge":{"events":{"count":32,"rate_1m":0.18094822353052323,"rate_5m":1.237424759817615,"rate_15m":1.7042656064654065,"rate_mean":0.19706273351906517},"duration_ms":{"max":18.0,"mean":9.125,"median":6.5,"min":3.0,"percentile_75":13.0,"percentile_95":18.0,"percentile_98":18.0,"percentile_99":18.0,"percentile_999":18.0,"standard_deviation":5.014495118187132}}}}}}}}
0 Karma

jconger
Splunk Employee
Splunk Employee

Try this for your SEDCMD. It anchors the regex to the beginning of the line and sets the replace flag:

SEDCMD-remove_header = s/^.*?\{/{/1

ti123
Engager

thank you, it helped me a lot

0 Karma

robertlynch2020
Influencer

Hi

Thanks for your help

I have applied this, but i am still getting the full line into SPLUNK, not sure why as to me it should work.

[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

0 Karma

robertlynch2020
Influencer

Hi

I can confirm this work if you use the below + take the file in without using a transform

[AMBER_RAW:METRIC_DIRECT]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

However in my case as my source is coming from a transform it does work, so i will post a separate question on this. (Below does work, however the code is exactly the same, so it is a bug or i am missing something)

transforms.conf
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW_METRIC

props.conf
[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

0 Karma

p_gurav
Champion
0 Karma
Get Updates on the Splunk Community!

Monitoring Postgres with OpenTelemetry

Behind every business-critical application, you’ll find databases. These behind-the-scenes stores power ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...