Timechart, timewrap, and streamstats all provide some cool time-travel tricks that let us bin and control time with less SPL effort than using stats. I think your answer lies within one or more of them, depending on the "why" of your use case. It also sounds like this should ultimately be a dashboard powered off "macros", a report vs. an ad-hoc search, so be sure to check those concepts out. If ultimate control is needed, we can do this manually with stats and "| bin _time". As the other example provided shows, Splunk lets you bend time like Doc and Marty McFly!

I always start with timewrap, because I feel most folks end up trying to implement something similar, and I can make you dangerous faster :). Plus I'm visual, and it helped me conceptualize how Splunk can use time. Here's an example. Counting over time is pretty easy using timechart vs. stats. What's the diff? Timechart implements time for us in the command. Because you care about 20m, I will use the span flag to bin 20m buckets with a time picker of "Today". Also, using "dc()" does the dedup for us. Another way to dedup would be to use a split-by in your stats vs. the "dedup" command; here's a quick sketch of those two flavors before we get going.
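This sketch just assumes the same index and field as the examples below; all three approaches (including dc()) answer "how many distinct IPs?", they just differ in where the dedup happens and how much data survives the pipe.

```split-by dedups via grouping: one row per distinct IP```
index=k8s forwarded_for="66.249.0.0/16"
| stats count BY forwarded_for
```the dedup command drops duplicate events before you count```
index=k8s forwarded_for="66.249.0.0/16"
| dedup forwarded_for
| stats count AS Count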
```Search for access events from a specific subnet. Protip: Splunk understands CIDR```
index=k8s pod="istio-ingressgateway-757f95b7d9-whsz7" forwarded_for="66.249.0.0/16"
```Use timechart to draw a time series, using span to control bins of time, and partial to instruct Splunk to only show complete buckets of time.```
| timechart span=20m partial=f dc(forwarded_for) AS Count
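If you want the "ultimate control" version mentioned up top, here's roughly the same chart built by hand with "| bin _time" and stats. Treat it as a sketch: bin + stats doesn't give you timechart's partial=f convenience, so the first and last buckets may be incomplete.

```bin chops _time into 20m buckets, then stats aggregates per bucket. Same idea as timechart, just more manual knobs.```
index=k8s pod="istio-ingressgateway-757f95b7d9-whsz7" forwarded_for="66.249.0.0/16"
| bin _time span=20m
| stats dc(forwarded_for) AS Count BY _time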
Without knowing more about why the 20-minute lookback at the top of the hour is important to you, let's just graph them all; we can filter down to time slices later in the pipe.

```Search for access events from a specific subnet, this time across all ingress gateway pods. Protip: Splunk understands CIDR```
index=k8s pod="istio-ingressgateway-*" forwarded_for="66.249.0.0/16"
```Use timechart to draw a time series, using span to control bins of time, and partial to instruct Splunk to only show complete buckets of time.```
| timechart span=20m partial=f dc(forwarded_for) AS Count
```timewrap 1h to lay the time bins over each other automagically. The "series" flag (used in the next snippet) just makes the fields easier to work with```
| timewrap 1h

As you can see, timewrap got me from timechart to a wrapped timechart for each hour of "Today" with very little effort. From here we should be able to select the time bucket that provides the number we want... in other words, the rows in the results that show the top of each hour.
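If it really is just that first 20-minute slice at the top of each hour you care about, one way to grab it (a sketch, not the only way) is to filter the 20m buckets by minute before wrapping; strftime and tonumber are stock eval functions.

```tack this onto the timechart above to keep only the bucket that starts at minute 00 of each hour```
| where tonumber(strftime(_time, "%M")) == 0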
Now... here's where it can go from 0 to 100 real quick. I can then take these time series and iterate over them. I am not saying you need this here, but just to show you how powerful this can be, especially if your next Answers post is going to be "how do I automate the analysis of these values over time" 😉

```use the short series name on this example```
| timewrap 1h series=short
```rename the first field to "now"```
| rename Count_s0 AS now
```foreach field, eval a new field by calculating a delta when compared to now```
| foreach Count_s*
[ eval d<<MATCHSTR>> = now - <<FIELD>>]
```then use the superpower called streamstats to calculate analytics on the series. I will use a window of 24 because I believe that will be my max and Splunk will probably just do the right thing... lol```
| streamstats window=24 median(d*) as median_*
```review your fields```
| table _time d* Count_*

In this example I look back 24 values and calculate a median for each series. PAUSE! That's a lot... now, before we go further: I am down a rabbit hole in my own data (default index-time fields on HTTP Event Collector data), but I digress. I am going to try the other solution provided as well, or maybe jam some of that into my answer, but let me know if this is getting close to your ultimate goal. I'll add a streamstats version later, which simply implements a more scalable split-by option; see my GitHub example.
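Since I teased it, here's a rough sketch of the shape that split-by version takes (just the direction; the real thing is in the GitHub example): keep the data long, one row per bucket per pod, so streamstats scales to however many series you have without the foreach gymnastics.

```long format: one row per time bucket per pod. global=f gives each pod its own 24-bucket window```
index=k8s pod="istio-ingressgateway-*" forwarded_for="66.249.0.0/16"
| bin _time span=20m
| stats dc(forwarded_for) AS Count BY _time, pod
| streamstats window=24 global=f median(Count) AS median_Count BY pod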