Getting Data In

How to store lots of metadata that would be the same for each event?

PeteRichardson
New Member

We measure 50 values every 5 seconds during each hour-long experiment. We run these experiments many times under different conditions (different ambient temp, SW build, HW type, etc.; a dozen or so different metadata parameters). Each run saves its measurements in a CSV file that I want to import into Splunk, but I'd like some advice on where to put the metadata about the experiment. The SW build number, for example, applies to the whole experiment (it doesn't change per row), and I want to be able to search on it.

Initially I thought I would just add a new column in the CSV for each metadata value. That works fine, but the values are all the same within each column, so it seems wasteful. Then I thought of encoding the metadata into a simple string and using that as the source value, but then at some point I have to parse it. Then I thought about a separate lookup table with some foreign key in the CSV. That seems too database-y. (Not that there's anything wrong with that. Some of my best friends are DBAs.)

If my goal is to filter on the metadata values and analyze the measurements for just those experiments (e.g. "For all experiments run at 30C with SW build 1234, plot values for measurement x"), where and how should I store the metadata?
Thanks for any advice.


lguinn2
Legend

Keep it simple. Your initial idea is good. Putting the values (or abbreviations) for the metadata in columns of the CSV files is clean and efficient.
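As a rough illustration of the column approach, here is a minimal Python sketch that appends one constant column per metadata parameter to a run's CSV before it is handed to Splunk. The function name `add_metadata_columns`, the field names `ambient_temp_c` and `sw_build`, and the sample rows are all made up for the example; the real files will have their own header and about 50 measurement columns.

```python
import csv
import io


def add_metadata_columns(csv_text, metadata):
    """Return csv_text with one extra constant column per metadata key.

    `metadata` maps a (hypothetical) column name to the single value that
    applies to the whole experiment, e.g. {"sw_build": 1234}.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    keys = list(metadata)
    values = [str(metadata[k]) for k in keys]

    # Extend the header row with the metadata column names.
    header = next(reader)
    writer.writerow(header + keys)

    # Repeat the same metadata values on every measurement row.
    for row in reader:
        writer.writerow(row + values)
    return out.getvalue()


# Hypothetical two-row sample with a single measurement column "x".
sample = "time,x\n2024-01-01T00:00:00,1.5\n2024-01-01T00:00:05,1.7\n"
result = add_metadata_columns(sample, {"ambient_temp_c": 30, "sw_build": 1234})
print(result)
```

With the metadata in ordinary columns, each value can become a searchable field on every event, so a filter such as "30C and build 1234" needs no lookup or string parsing.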

Splunk compresses the raw data, so some space will be saved. The fact that some variables exhibit little variety in their values might even mean a smaller than average index size.
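To get a feel for why repeated values cost little once compressed, the sketch below builds one synthetic hour of data (720 rows of 50 random measurements) with and without three constant metadata columns, then compresses both with Python's `zlib`. This is only a stand-in: it does not model Splunk's actual on-disk format or its index files, and the metadata values (`30`, `1234`, `revB`) are invented.

```python
import random
import zlib


def build_csv(with_metadata):
    """Build one synthetic hour of data: 720 rows, 50 random values each."""
    rng = random.Random(0)  # fixed seed so both variants share the same measurements
    lines = []
    for i in range(720):  # one row every 5 seconds for an hour
        row = [str(i * 5)] + [f"{rng.random():.4f}" for _ in range(50)]
        if with_metadata:
            # Hypothetical constant metadata: ambient temp, SW build, HW type.
            row += ["30", "1234", "revB"]
        lines.append(",".join(row))
    return "\n".join(lines).encode()


plain = build_csv(with_metadata=False)
tagged = build_csv(with_metadata=True)

raw_extra = len(tagged) - len(plain)
compressed_extra = len(zlib.compress(tagged)) - len(zlib.compress(plain))

print("extra bytes before compression:", raw_extra)
print("extra bytes after compression: ", compressed_extra)
```

The constant columns add a fixed number of bytes per row before compression, but a general-purpose compressor encodes each repeat as a short back-reference, so the compressed overhead should come out noticeably smaller.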

In the end, if you make it complicated, you will spend lots of your valuable time getting it set up. And you will probably have to ask Splunk to do more complicated searches, which will cost more in CPU and disk I/O. And what did you save? A few gigabytes of disk? I think it is very likely that you would not "save" when you take everything into account!

Here is Splunk's advice (not all of which seems applicable in this particular case): Logging Best Practices
