Getting Data In

Remove non-JSON data to auto-extract JSON fields

splunklearner
Path Finder

Hello all,

We currently have the following event, which contains both JSON and non-JSON data. Please help me remove the non-JSON part, and tell me where I need to set INDEXED_EXTRACTIONS or KV_MODE so that all JSON fields are extracted automatically.

Nov 9 17:34:28 128.160.82.28 [local0.warning] <132>1 2024-11-09T17:34:28.436542Z AviVantage v-epswafhic2-wdc.hc.cloud.uk.hc-443 NILVALUE NILVALUE - {"adf":true,"significant":0,"udf":false,"virtualservice":"virtualservice-4583863f-48a3-42b9-8115-252a7fb487f5","report_timestamp":"2024-11-09T17:34:28.436542Z","service_engine":"GB-DRN-AB-Tier2-se-vxeuz","vcpu_id":0,"log_id":10181,"client_ip":"128.12.73.92","client_src_port":44908,"client_dest_port":443,"client_rtt":1,"http_version":"1.1","method":"HEAD","uri_path":"/path/to/monitor/page/","host":"udg1704n01.hc.cloud.uk.hc","response_content_type":"text/html","request_length":93,"response_length":94,"response_code":400,"response_time_first_byte":1,"response_time_last_byte":1,"compression_percentage":0,"compression":"","client_insights":"","request_headers":3,"response_headers":12,"request_state":"AVI_HTTP_REQUEST_STATE_READ_CLIENT_REQ_HDR","significant_log":["ADF_HTTP_BAD_REQUEST_PLAIN_HTTP_REQUEST_SENT_ON_HTTPS_PORT","ADF_RESPONSE_CODE_4XX"],"vs_ip":"128.160.71.14","request_id":"61e-RDl6-OZgZ","max_ingress_latency_fe":0,"avg_ingress_latency_fe":0,"conn_est_time_fe":1,"source_ip":"128.12.73.92","vs_name":"v-epswafhic2-wdc.hc.cloud.uk.hc-443","tenant_name":"admin"}

And where do I need to put these configurations?

We have syslog servers with UFs installed, which send data to our deployment server. The DS pushes apps to the cluster master and the deployer, and they push the apps on from there.

As of now, we have a props.conf on the master, which is pushed to the indexers.


splunklearner
Path Finder

Hi all, I added the stanza below to props.conf and pushed it to the indexers. The JSON fields are being extracted, but the logs are getting duplicated. Please help me.

[sony_waf]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %b %d %H:%M:%S
LINE_BREAKER=([\r\n]+)
pulldown_type=true
SEDCMD-removeheader=s/^[^\{]*//g
SHOULD_LINEMERGE=false
TRUNCATE = 20000
KV_MODE=json
AUTO_KV_JSON=true

kiran_panchavat
Influencer

@splunklearner 

Verify in splunkd.log whether your Universal Forwarder (UF) or Heavy Forwarder (HF) is sending duplicate events.

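Check inputs.conf and review how crcSalt is set. Note that crcSalt = <SOURCE> makes the forwarder treat a renamed or rotated file as a new file, so it can itself cause re-indexing if it is applied to rolling log files.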

I hope this helps, if any reply helps you, you could add your upvote/karma points to that reply, thanks.

kiran_panchavat
Influencer

@splunklearner 

Please check this solution. 

Solved: Re: Why would INDEXED_EXTRACTIONS=JSON in props.co... - Splunk Community


splunklearner
Path Finder

Hi @kiran_panchavat, can you please guide me on where to add your stanza: on the indexers or on the search heads?


kiran_panchavat
Influencer

@splunklearner Yes, KV_MODE is for search time field extractions. 

KV_MODE = [none|auto|auto_escaped|multi|multi:<multikv.conf_stanza_name>|json|xml]
* Used for search-time field extractions only.
* Specifies the field/value extraction mode for the data.
* Set KV_MODE to one of the following:
  * none - Disables field extraction for the host, source, or source type.
  * auto_escaped - Extracts fields/value pairs separated by equal signs and
                   honors \" and \\ as escaped sequences within quoted
                   values. For example: field="value with \"nested\" quotes"
  * multi - Invokes the 'multikv' search command, which extracts fields from 
            table-formatted events.
  * multi:<multikv.conf_stanza_name> - Invokes a custom multikv.conf 
    configuration to extract fields from a specific type of table-formatted 
    event. Use this option in situations where the default behavior of the 
    'multikv' search command is not meeting your needs.
  * xml - Automatically extracts fields from XML data.
  * json - Automatically extracts fields from JSON data.
* Setting to 'none' can ensure that one or more custom field extractions are not
  overridden by automatic field/value extraction for a particular host,
  source, or source type. You can also use 'none' to increase search 
  performance by disabling extraction for common but nonessential fields.
* The 'xml' and 'json' modes do not extract any fields when used on data
  that isn't of the correct format (JSON or XML).
* If you set 'KV_MODE = json' for a source type, do not also set 
  'INDEXED_EXTRACTIONS = JSON' for the same source type. This causes the Splunk 
  software to extract the json fields twice: once at index time and again at 
  search time.
* When KV_MODE is set to 'auto' or 'auto_escaped', automatic JSON field 
  extraction can take place alongside other automatic field/value extractions. 
  To disable JSON field extraction when 'KV_MODE' is set to 'auto' or 
  'auto_escaped', add 'AUTO_KV_JSON = false' to the stanza. 
* Default: auto
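
As a minimal sketch of the note above, a search-time-only configuration could look like the stanza below. It assumes the sourcetype name sony_waf used earlier in this thread, that _raw already holds only the JSON object by the time it is searched, and that the file lives in an app on the search heads (in a search head cluster, such an app is normally distributed from the deployer). It has not been tested against the poster's environment.

```ini
# props.conf on the search head tier (search-time only)
[sony_waf]
# Extract JSON fields each time a search runs.
KV_MODE = json
# Do not also set INDEXED_EXTRACTIONS = JSON for this sourcetype;
# otherwise the same fields are extracted at index time and again at search time.
```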

splunklearner
Path Finder

So should I put this stanza on the deployer or on the cluster manager?


livehybrid
Influencer

Hi @splunklearner 
To have this processed at ingest time you can do a simple INGEST_EVAL on your indexers.

 

== props.conf ==
[yourStanzaName]
TRANSFORMS = stripNonJSON

== transforms.conf ==
[stripNonJSON]
INGEST_EVAL = _raw:=replace(_raw, ".*- ({.*})", "\1")

 

[Screenshot: livehybrid_0-1739100479577.png]

Please let me know how you get on and consider upvoting/karma this answer if it has helped.
Regards

Will

 

livehybrid
Influencer

@splunklearner 
If you go down the ingest time approach then you will add the props/transforms.conf within an app in your manager-apps folder on your Cluster Manager and then push out to your indexers.

No changes should be required for your searchheads if you go down that route, but feel free to evaluate the alternatives provided in this post too.
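
For illustration, the ingest-time approach might be laid out on the Cluster Manager roughly as follows. This is a sketch under assumptions: the app name sony_waf_parsing is hypothetical, the sourcetype name sony_waf is taken from earlier in the thread, and the TRANSFORMS-<class> form is used here in place of the bare TRANSFORMS key. On older Splunk versions the directory is called master-apps instead of manager-apps.

```ini
# $SPLUNK_HOME/etc/manager-apps/sony_waf_parsing/local/props.conf
# (app name is hypothetical)
[sony_waf]
# Run the transform below at index time on the indexers.
TRANSFORMS-strip_non_json = stripNonJSON

# $SPLUNK_HOME/etc/manager-apps/sony_waf_parsing/local/transforms.conf
[stripNonJSON]
# Keep only the JSON object that follows the "- " separator in the syslog header.
INGEST_EVAL = _raw:=replace(_raw, ".*- ({.*})", "\1")
```

After the files are in place, the bundle is typically pushed to the peers by running splunk apply cluster-bundle on the Cluster Manager.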

I hope this helps.

Regards

Will

splunklearner
Path Finder

Hi @livehybrid ,

I heard that search-time extractions are better than index-time extractions for performance reasons. Is that so? Please clarify.


livehybrid
Influencer

Hi @splunklearner ,

I guess the answer really is "it depends"; however, in this scenario we are overwriting the original data with just the JSON, rather than adding an additional extracted field.

Search-time field extractions, evals, and other changes are executed every time you search the data, and in some cases they need to be evaluated before the search can be filtered down. For example, if you search for "uri=/test", Splunk may need to process all events at search time to determine the uri field for each one before it can filter. Being able to search against the URI without modifying every event means the search should be faster.

The disadvantage of index-time extractions is that they do not apply retrospectively to data you already have, whereas search-time extractions apply to everything currently indexed.


kiran_panchavat
Influencer

@splunklearner I have a standalone server, so you can try these settings on your heavy forwarder or indexers.


splunklearner
Path Finder

I don't have access to the UI; I need to do it from the backend only. Where do I put this props.conf: on the cluster master or on the deployer? Is it an index-time or a search-time extraction?


kiran_panchavat
Influencer

@splunklearner 

[Screenshot: kiran_panchavat_3-1739098754209.png]

[Screenshot: kiran_panchavat_4-1739098888737.png]

 


kiran_panchavat
Influencer

@splunklearner I tried this using your sample data; please have a look. 

[Screenshot: kiran_panchavat_0-1739098542377.png]

[Screenshot: kiran_panchavat_1-1739098606746.png]

[Screenshot: kiran_panchavat_2-1739098618659.png]

 

[syslogtest]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
category=Custom
pulldown_type=true
SEDCMD-removeheader=s/^[^\{]*//g
KV_MODE=json
AUTO_KV_JSON=true
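
The stanza above was tried on a standalone server, where index-time and search-time settings can share one file. In a distributed deployment, the index-time part would usually go to the indexers (or a heavy forwarder). A rough sketch of that part follows, assuming the sourcetype name sony_waf and the timestamp settings posted earlier in the thread; the timestamp is expected to be read from the syslog header before SEDCMD strips it, but that should be verified on test data.

```ini
# props.conf for the indexers (or a heavy forwarder); index-time settings only
[sony_waf]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
TRUNCATE = 20000
# Drop everything before the first "{" so that _raw contains only the JSON object.
SEDCMD-removeheader = s/^[^\{]*//g
# KV_MODE and AUTO_KV_JSON are search-time settings, so they are left out of
# this file and belong in props.conf on the search heads instead.
```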

 


splunklearner
Path Finder

Hi @kiran_panchavat ,

Thanks for the answer.

But I read that KV_MODE = json has to be set for search-time extraction, i.e. on the search heads, whereas you are saying to set it on the indexers or heavy forwarders. Will that work? Please clarify.


splunklearner
Path Finder

Hi @kiran_panchavat ,

This is what is currently in props.conf on the Cluster Manager for this sourcetype (it was copied from another sourcetype):

[sony_waf]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
TIME_FORMAT = %b %d %H:%M:%S
SEDCMD-newline_remove = s/\\r\\n/\n/g
SEDCMD-formatxml = s/></>\n</g
LINE_BREAKER = ([\r\n]+)[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s
SHOULD_LINEMERGE = False
TRUNCATE = 10000

Now, do I need to add your settings to this props.conf and push it to the indexers? Or should I create a new props.conf on the deployer that includes your stanza and push it to the search heads?
