Getting Data In

How to store lots of metadata that would be the same for each event?

PeteRichardson
New Member

We measure 50 values every 5 seconds during each hour-long experiment. We run these experiments many times under different conditions (different ambient temperature, SW build, HW type, and so on: a dozen or so metadata parameters). Each run saves its measurements in a CSV file that I want to import into Splunk, but I'd like some advice on where to put the metadata about the experiment. The SW build number, for example, applies to the whole experiment (it doesn't change per row), and I want to be able to search on it.

Initially I thought I would just add a new column in the CSV for each metadata value. That works fine, but all the values in such a column are identical, so it seems wasteful. Then I thought of encoding the metadata into a simple string and using that as the source value, but then at some point I have to parse it. Then I thought about a separate lookup table with a foreign key in the CSV. That seems too database-y. (Not that there's anything wrong with that. Some of my best friends are DBAs.)
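For concreteness, the first option would produce rows like these, where the metadata columns repeat on every line; the column names and values here are invented, and only two of the 50 measurement columns are shown:

    timestamp,ambient_temp_c,sw_build,hw_type,meas_01,meas_02
    2015-06-01 12:00:00,30,1234,revB,0.92,4.71
    2015-06-01 12:00:05,30,1234,revB,0.95,4.68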

If my goal is to filter on the metadata values and analyze the measurements for just those experiments (e.g. "For all experiments run at 30C with SW build 1234, plot values for measurement x"), where and how should I store the metadata?
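To make that example concrete, I'm picturing a search along these lines, assuming the metadata ends up as searchable fields (the index, sourcetype, and field names are all placeholders I made up):

    index=experiments sourcetype=experiment_csv ambient_temp_c=30 sw_build=1234
    | timechart avg(meas_01)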
Thanks for any advice


lguinn2
Legend

Keep it simple. Your initial idea is good. Putting the values (or abbreviations) for the metadata in the CSV files is clean and efficient.

Splunk compresses the raw data, so some of that space will be reclaimed. Highly repetitive values compress well, so the limited variety in the metadata columns might even mean a smaller-than-average index size.

In the end, if you make it complicated, you will spend a lot of your valuable time getting it set up, and you will probably end up asking Splunk to run more complicated searches, which cost more in CPU and disk I/O. And what did you save? A few gigabytes of disk? I think it is very likely that you would not come out ahead once you take everything into account!

Here is Splunk's advice (not all of which seems applicable in this particular case): Logging Best Practices
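If you keep the metadata as CSV columns, the ingestion side can stay simple too. Here is a minimal props.conf sketch using Splunk's built-in CSV header extraction; the sourcetype name, the timestamp column, and its format are assumptions about your files, so adjust them to match:

    # props.conf -- sketch only; stanza name and column names are placeholders
    [experiment_csv]
    INDEXED_EXTRACTIONS = csv
    TIMESTAMP_FIELDS = timestamp
    TIME_FORMAT = %Y-%m-%d %H:%M:%S

With the header row turned into fields at index time, the search in your example is just a filter on the metadata fields followed by a timechart, with no lookups or joins to maintain.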
