Hi all,
I'm trying to do something that seems pretty easy conceptually. I'm ingesting a .txt report into Splunk, and I want to set the host metadata to the system the report was generated on, not the host Splunk is collecting the file from. The problem is that every path I take creates a different issue that I can't (or really don't want to) deal with. I've looked through all the docs, and I'm either missing something, misconfiguring something, or it's not possible.
From what I understand, there are only a couple of ways to overwrite the host:
1) Specify a regex in the inputs.conf stanza that extracts the host from the source path. The match could be on a folder or on the filename, but either way it is run against the "source" path.
2) Specify a regex in props.conf and transforms.conf that overwrites the "host" metadata based on the hostname inside the log.
3) Force a specific hostname string through the configuration files, but then this would be a static hostname for the source or sourcetype.
I've gotten all of these solutions to work individually, but each one creates a separate issue which prevents me from using it:
1) Works well, but I end up with hundreds of "source" file paths inside Splunk, which clutters everything when browsing the datasets and confuses end users. I can get around this by declaring "source = <source>" in inputs.conf, but that changes the value Splunk runs the host regex against. So instead of extracting the hostname from "source::c:\\logs\\client1.txt", it tries to extract it from "source::<source_name>", where of course it will never find it. So for #1 it seems I can have either a clean list of sources or working host regex extraction, but not both.
2) This also seems to work, but it brings another issue. The reports I'm ingesting are pretty large, so I have set up a custom LINE_BREAKER value. I can successfully extract the hostname from inside the report using props and transforms, but for reasons I can't figure out, the hostname doesn't carry over to the rest of the events once the report is line-broken. The first event from the .txt file gets the correct host metadata (the hostname is in the first line of the file), but for every event after a line break the regex fails and the host defaults to the hostname of the system the log resides on. This really baffles me, because for the time settings, if Splunk can't extract a timestamp from a subsequent event, it copies the value from the previous one, so the correct time metadata gets applied to all the events. It doesn't do that for the host. Why wouldn't it apply the host metadata to every event as the file gets line-broken, when Splunk should know, as it's ingesting, that this is all coming from the same "event"?
3) Works, but is not really an option because the reports come from different hosts, and this would just create erroneous data.
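For #1, this is roughly how I understand the source-path extraction to behave, written as a small Python sketch rather than Splunk config: the regex's first capture group is taken from whatever the source value is, so once source is overridden to a static string there is no path left to match. The pattern and the static source value "scc_report" below are placeholders I made up for illustration, not my real settings, and Python's re only approximates Splunk's PCRE.

```python
import re

# Hypothetical host pattern: capture the file name (without .txt) from a Windows path.
HOST_FROM_PATH = re.compile(r"\\(\w+)\.txt$")


def host_from_source(source: str, default_host: str) -> str:
    """Return the first capture group matched against the source value,
    or the input's default host when the pattern does not match."""
    m = HOST_FROM_PATH.search(source)
    return m.group(1) if m else default_host


# Source left as the real file path: the hostname is recoverable.
print(host_from_source("c:\\logs\\client1.txt", "ingest01"))  # -> client1
# Source overridden to a static (hypothetical) value: nothing to extract.
print(host_from_source("scc_report", "ingest01"))  # -> ingest01
```

That matches what I'm seeing: the extraction only works as long as the per-file path survives as the source.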
Transforms.conf:
[SET_HOST_NAME]
DEST_KEY = MetaData:Host
REGEX = \,HostName\:(.\S[^,]+)
FORMAT = host::$1
DEFAULT_VALUE = bonkers
Props.conf:
[SCC_Report]
TRANSFORMS-H1 = SET_HOST_NAME
TIME_PREFIX = SessionID:
TIME_FORMAT = %Y-%m-%d_%H%M%S
LINE_BREAKER = (\s\s)Title.*\:\sV\-
SEDCMD-remove_fluff = s/([\s]+?Weight[\s\Sa-zA-Z0-9~@#$^*()_+=[\]{}|\\,.?:]*?---------------)/\n\n<REDACTED DURING INGESTION>\n/g
SHOULD_LINEMERGE = false
category = Custom
disabled = false
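To make #2 concrete, here is a rough Python simulation (not Splunk code) of what I think happens at index time with the settings above: the raw text is split on the LINE_BREAKER capture group first (the group itself is discarded, the previous event ends where it starts, the next begins where it ends), and then the SET_HOST_NAME regex runs against each event on its own. The sample report text and the fallback host "ingest01" are made up for illustration.

```python
import re
from typing import List

# Patterns copied from the props.conf / transforms.conf above.
LINE_BREAKER = re.compile(r"(\s\s)Title.*\:\sV\-")
HOST_REGEX = re.compile(r"\,HostName\:(.\S[^,]+)")


def break_events(raw: str) -> List[str]:
    """Split raw text at each LINE_BREAKER match, dropping the text of
    the first capture group between consecutive events."""
    events = []
    start = 0
    for m in LINE_BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])  # previous event ends here
        start = m.end(1)                      # next event starts here
    events.append(raw[start:])
    return [e for e in events if e]


def assign_hosts(events: List[str], input_host: str) -> List[str]:
    """Apply the host regex to each event independently; nothing is
    carried over from earlier events in the same file."""
    hosts = []
    for event in events:
        m = HOST_REGEX.search(event)
        hosts.append(m.group(1) if m else input_host)
    return hosts


# Hypothetical sample: the hostname only appears in the first line.
report = (
    "SessionID: 2021-03-04_101500,HostName:WS01,Report\n"
    "\nTitle: Rule one: V-1001 details\n"
    "\nTitle: Rule two: V-1002 details"
)

events = break_events(report)
print(assign_hosts(events, "ingest01"))  # -> ['WS01', 'ingest01', 'ingest01']
```

If that model is right, it explains the result I'm getting: only the first event contains ",HostName:", so only that event gets the extracted host, and unlike timestamps nothing is inherited from the previous event.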
Fields.conf:
[H1]
INDEXED=true
Any help is appreciated. I can't tell if I'm trying to get Splunk to do something it can't do, or if I'm just going about it the wrong way.
Preferred end state:
1) Ingest the *.txt report.
2) Set both "source" and "sourcetype" to something static (to prevent a pile of filenames from showing up under sources and sourcetypes).
3) Set the host metadata for all events created from a single .txt report to the host named in the first line of that report.