Getting Data In

Remove first part of string before creating a JSON source type

robertlynch2020
Motivator

HI

I have used the below answer to get me 95% to a full solution, but i just cant get the last bit.
https://answers.splunk.com/answers/567087/how-to-split-data-into-separate-sourcetypes-with-t.html

I take in one file with multiple JSON and splits it into multiple source types.
However i have a sub issue, one of the source types is like below Text + JSON trace.

2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid"

I am looking to get only the JSON and removing the other data (at the start).

So, i think i need a SED in the props? but not sure. I am trying not to use a heavy forwarder.

props.conf
[AMBER_RAW]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
TRANSFORMS-sourcetye_routing = AMBER_RAW_json_METRIC

[AMBER_RAW:METRIC]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON
SEDCMD-REGEX_ONLY = s/^.({"v".).*$/\1/

Transforms.conf
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW:METRIC

Thanks in Advance

1 Solution

jconger
Splunk Employee
Splunk Employee

Something like this should work in props.conf to remove the header text:

SEDCMD-remove_header = s/.*?\{/{/g

This matches everything up to (and including) the first {. Then, it replaces it all with just a {.

Note: this is an index-time extraction.

View solution in original post

jconger
Splunk Employee
Splunk Employee

Something like this should work in props.conf to remove the header text:

SEDCMD-remove_header = s/.*?\{/{/g

This matches everything up to (and including) the first {. Then, it replaces it all with just a {.

Note: this is an index-time extraction.

robertlynch2020
Motivator

hi

Thanks for your help here, hower i cant get this to work.
I have tried this as the JSON is more complex

[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/.*?{"v/{"v/g
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

2018-01-10 15:52:03 [metrics-application-1-thread-1] INFO  METRIC:41 - {"v":"1.0","t":"MTR","ts":"2018-01-10T15:52:03.700Z","h":"mx7654vm","pid":12483,"src":{"c":"authn-app","d":"auth"},"mtr":{"counters":{"process":{"cpu":{"time_cumulated_s":36},"memory":{"gc":{"ps_marksweep":{"total_duration_ms":814},"ps_scavenge":{"total_duration_ms":539}}}}},"gauges":{"com.murex.serviceframework.rest.datalayer.DataSourceMetrics.datasources.authn-authn-app-1":{"availableConnectionCount":1,"borrowedConnectionCount":0,"currPoolSize":1,"maxPoolSize":50,"poolName":"authn-authn-app-1"},"process":{"cpu":{"percentage":0.04184450581638631},"files":{"open_files":37},"memory":{"jvm":{"heap":{"committed_kb":195072,"used_kb":115080},"nonheap":{"committed_kb":91456,"used_kb":89860}},"rss_kb":32880864,"vsz_kb":2301276}}},"histograms":{},"meters":{},"timers":{"process":{"memory":{"gc":{"ps_marksweep":{"events":{"count":1,"rate_1m":0.0014267037570722622,"rate_5m":0.002038710931305469,"rate_15m":9.431526926661993E-4,"rate_mean":0.006158257067925256},"duration_ms":{"max":620.0,"mean":620.0,"median":620.0,"min":620.0,"percentile_75":620.0,"percentile_95":620.0,"percentile_98":620.0,"percentile_99":620.0,"percentile_999":620.0,"standard_deviation":0.0}},"ps_scavenge":{"events":{"count":32,"rate_1m":0.18094822353052323,"rate_5m":1.237424759817615,"rate_15m":1.7042656064654065,"rate_mean":0.19706273351906517},"duration_ms":{"max":18.0,"mean":9.125,"median":6.5,"min":3.0,"percentile_75":13.0,"percentile_95":18.0,"percentile_98":18.0,"percentile_99":18.0,"percentile_999":18.0,"standard_deviation":5.014495118187132}}}}}}}}
0 Karma

jconger
Splunk Employee
Splunk Employee

Try this for your SEDCMD. It anchors the regex to the beginning of the line and sets the replace flag:

SEDCMD-remove_header = s/^.*?\{/{/1

ti123
Engager

thank you, it helped me a lot

0 Karma

robertlynch2020
Motivator

Hi

Thanks for your help

I have applied this, but i am still getting the full line into SPLUNK, not sure why as to me it should work.

[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

0 Karma

robertlynch2020
Motivator

Hi

I can confirm this work if you use the below + take the file in without using a transform

[AMBER_RAW:METRIC_DIRECT]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

However in my case as my source is coming from a transform it does work, so i will post a separate question on this. (Below does work, however the code is exactly the same, so it is a bug or i am missing something)

transforms.conf
[AMBER_RAW_json_METRIC]
DEST_KEY = MetaData:Sourcetype
REGEX = {"v":"1.0\"
FORMAT = sourcetype::AMBER_RAW_METRIC

props.conf
[AMBER_RAW:METRIC]
SEDCMD-remove_header = s/^.*?{/{/1
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_PREFIX = \"ts\":\"
INDEXED_EXTRACTIONS = JSON

0 Karma

p_gurav
Champion
0 Karma
Get Updates on the Splunk Community!

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...