Hi,
I've got a large list which is grouped in chronological order and I'd like to ingest it into Splunk.
The list structure is 'flexible' as we haven't defined one yet (and we're open to suggestions).
Data is:
A sample of the data:
Timestamp: 2019-01-01T01:01:01Z
12.345
30.314
52.143
.
(50k-100k values here)
.
34914.134
We have some use cases:
However, I've been coming across a few issues:
So I'm open for a solution/discussion on:
Thanks in advance
You should consider two things here:
1) Splunk natively works with events, so you get the best performance when one data point is one event.
2) Splunk now supports metrics, which might be exactly what you want to have here...
My recommendation would be either to add a timestamp to each log line and ingest it line by line (which is Splunk's default behaviour), or to leverage the automatic timestamp recognition and the fact that Splunk will fall back to the last recognized timestamp when an event has none of its own.
Thereafter you should consider converting your events to metric data, see here for reference:
https://docs.splunk.com/Documentation/Splunk/latest/Metrics/L2MOverview
If you have the chance to send the metric data directly to Splunk by other means (collectd, HEC), you should try to do so, as it will significantly boost your search performance for the mentioned use cases.
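For illustration, here is a rough Python sketch of what a batched HEC body for metric data could look like. It only builds the payload and does not send anything. The JSON shape (`"event": "metric"` with `metric_name` and `_value` under `fields`) is how I recall the Splunk docs describing metrics over HEC, so check it against your version; the metric name `datapoint.value` and the function name are made up for this example.

```python
import json
from datetime import datetime, timezone


def to_hec_metric_batch(timestamp, values, metric_name="datapoint.value"):
    """Build a newline-separated batch of HEC metric payloads.

    timestamp   -- ISO 8601 UTC string such as '2019-01-01T01:01:01Z'
    values      -- iterable of numeric data points sharing that timestamp
    metric_name -- hypothetical metric name, chosen only for this sketch
    """
    # HEC expects "time" as epoch seconds.
    epoch = (
        datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        .replace(tzinfo=timezone.utc)
        .timestamp()
    )
    payloads = (
        {
            "time": epoch,
            "event": "metric",
            "fields": {"metric_name": metric_name, "_value": value},
        }
        for value in values
    )
    # One JSON object per data point, sent together in one request body.
    return "\n".join(json.dumps(p) for p in payloads)


if __name__ == "__main__":
    print(to_hec_metric_batch("2019-01-01T01:01:01Z", [12.345, 30.314, 52.143]))
```

The resulting string would be POSTed to the collector endpoint with an `Authorization: Splunk <token>` header; that part is left out here because it depends on your environment.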
@DMohn, unfortunately, we can't use metrics as it's not a metric (it just looks like one). What I displayed is the summarised data, the actual data point is:
13,45.45623,144 is summarised to 13.45623.
We're still debating the need for precision. At present we've decided to go for one data point/event.
Is that literally the event? If so, I would probably prepend each value with the timestamp and ingest them separately:
2019-01-01T01:01:01Z 12.345
2019-01-01T01:01:01Z 30.314
2019-01-01T01:01:01Z 52.143
imo trying to do what you want with everything tossed into one event is going to be annoying and inefficient. Having each value in its own event should allow you to do everything you want with less of a headache. I mean, that is basically what mvexpand is going to do anyway, right? So just do it up front...
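A minimal sketch of doing that split up front, in Python, assuming the file really is a `Timestamp: ...` header line followed by one bare value per line (the function name is just for illustration):

```python
def expand_blocks(lines):
    """Yield one 'timestamp value' line per data point.

    Assumes each block starts with a 'Timestamp: <ts>' line and every
    following non-empty line is a single value belonging to that timestamp.
    """
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("Timestamp:"):
            # Split on the first colon only; the timestamp itself contains colons.
            current_ts = line.split(":", 1)[1].strip()
            continue
        if current_ts is None:
            raise ValueError(f"value found before any timestamp: {line!r}")
        yield f"{current_ts} {line}"


if __name__ == "__main__":
    sample = [
        "Timestamp: 2019-01-01T01:01:01Z",
        "12.345",
        "30.314",
        "52.143",
    ]
    for event in expand_blocks(sample):
        print(event)
```

Because it is a generator, it streams through the file and never holds all 50k-100k values in memory at once.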
@maciep we are moving that way, one data point/event but I just want to know if there is any other structure that I didn't think about. Agreed, having all 50k data points in one event is inefficient and inflexible.
if you expect low cardinality maybe you could aggregate some of the data first and save in a json, e.g. like an array of "int: count" objects or something...but i think that would still make your life harder once it's in Splunk.
@maciep we explored aggregation but by doing so we lose the granularity required from the data points.
@splunked38
I think we can change the limit on mvexpand. The 2nd approach seems good.
You can raise the mvexpand limit by editing the max_mem_usage_mb setting in the limits.conf file.
@vishaltaneja07011993, we'd rather not increase the limit: the number will keep growing from 50k, so this would become a maintenance task.