I am attempting to populate a metrics index with data from an event index. Using a search similar to:
| eval balgrp=host_lbgroup
| eval restype=case(match(host,".*[ab]$"),"user",match(host,".*c$"),"admin")
| eval prefix="res." + restype + "."
| rename host_cluster as cluster
| fields balgrp, cluster, _time, prefix, avg_sessions, max_sessions, curr_sessions, host
| meventcollect index=jvm_metrics split=true spool=true prefix_field=prefix host=host cluster balgrp
First, there is a reason for this: our application already outputs metric-type data into its log files. Rather than building something new, and likely needing application code changes to support it, I would like to put some of this already-emitted metric data into an actual metrics index for retention and faster searching.
The problem is that when running the search above, the host value for all of my metrics is set to the string literal "host" instead of the actual host value from the events being converted. I also tried renaming the host field to newhost and setting host=newhost in the meventcollect statement, but then the metric data points all have the string literal "newhost" as their host value. If I leave host= out of the search entirely, the indexer host names get assigned as the host value for my metrics.
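For what it's worth, my reading of the command syntax in the docs is that host= takes a literal string, not a field name, which would explain the behavior above (worth double-checking against your Splunk version):

```
| meventcollect [index=<string>] [split=<true|false>] [spool=<true|false>]
    [prefix_field=<string>] [host=<string>] [source=<string>]
    [sourcetype=<string>] <field-list>
```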
Additionally, every time I run the search above, it writes all values into the metrics index again with no de-duplication (even when the search is identical to a previous run). This will be challenging when attempting to backfill metrics data from my event data. There should probably be an option on meventcollect and mcollect to de-duplicate (or at least do last-write-wins) when adding data, so that identical data added to an index more than once doesn't skew the statistics calculated from it.
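In the meantime, the only de-dupe strategy I can think of for backfill is procedural: run the collect over explicit, non-overlapping time windows, so a re-run never covers the same span twice. A sketch, where the base search, index, and field names are placeholders for your own:

```
earliest=-7d@d latest=-6d@d index=app_events sourcetype=app_log
| eval balgrp=host_lbgroup
| eval restype=case(match(host,".*[ab]$"),"user",match(host,".*c$"),"admin")
| eval prefix="res." + restype + "."
| meventcollect index=jvm_metrics split=true spool=true prefix_field=prefix cluster balgrp
```

Advancing the earliest/latest window one day at a time covers the history without overlap, though it does nothing for an accidental repeat of the same window.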
So, has anyone had success with dynamically assigning the host value to metrics entries?
Does anyone know if there might be a de-dupe or last write wins option in the works?
It looks like | mcollect is just a custom command that creates metrics.csv files. The metrics documentation on importing .csv files makes no mention of how to add a custom "host" field to your .csv file. Every time I have used .csv for metrics, the host value comes from inputs.conf.
It's not elegant, but I verified that a custom host can be extracted with props/transforms. Since using a new sourcetype name incurs license usage, I'm betting you can have props reference a custom source instead.
When Splunk writes the .csv from this | mcollect query to disk, it looks like this (I happened to have an "id" dimension in my results):
|mcollect index=index_name source=source_name sourcetype=new_mcollect_stash spool=false id host
==> /var/opt/splunk/var/run/splunk/c247f3132f17807a_metrics.csv <==
metric_timestamp,id,host,"metric_name:variable1","metric_name:variable2","metric_name:variable3","metric_name:variable4","metric_name:variable5","metric_name:variable6"
1583647501,71614,my_server_hostname,0,37.0842,0,-1,-1,-1
I copied the default "mcollect_stash" sourcetype to a new one for testing. The transform extracts the third column as the hostname.
props.conf:

[new_mcollect_stash]
SHOULD_LINEMERGE = False
pulldown_type = true
INDEXED_EXTRACTIONS = csv
ADD_EXTRA_TIME_FIELDS = False
KV_MODE = none
TRANSFORMS-newmetrichost = newmetrichost
TIMESTAMP_FIELDS = metric_timestamp
TIME_FORMAT = %s.%Q

transforms.conf:

[newmetrichost]
DEST_KEY = MetaData:Host
REGEX = ^[^,]+,[^,]+,([^,]+)
FORMAT = host::$1
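To confirm the extracted host actually landed as metadata on the metric data points, listing the host values the metrics index has seen should do it (using mcatalog here is my assumption of the simplest check; substitute your own index name):

```
| mcatalog values(host) WHERE index=index_name
```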