Archive

Data Discovery of Index Fields

Explorer

Does anyone know how to generate a report listing all fields (from an index) and their respective info (example follows)?
% not null of each field,
different values of each field,
% of each field per value.

This would be like the information seen in the Search page, on the left hand side, that shows "selected fields" and "Interesting Fields".

However, instead of being interactive (like in Search Page), it would be a search query that can run to generate a report. And, the report could be in table format or csv file or just regular output.

Any and all help would be much appreciated.
Cheers... Al

Tags (1)
0 Karma
1 Solution

Legend

First of all, just to make it clear - there are no fields in the index itself. Well almost no fields, a few like source and sourcetype are there but pretty much all field extraction in Splunk happens at search-time, AFTER the events have been extracted from the index. Basically Splunk only indexes the data and breaks it down into segments, and when you search for, say, "myfield=foo" Splunk will look for the term "foo" in the index first and THEN check if the instances it found can be connected to the field "myfield".

Anyway, on to the rest of the question. The first thing you'll want to do is find out how many events there are in the index, so you can calculate percentages. This query will do it:

| dbinspect index=yourindex | stats sum(eventCount) as eventCount

...and to put it in a field that you can use in the rest of your search, put it in a subsearch after eval:

... | eval [| dbinspect index=yourindex | stats sum(eventCount) as eventCount | format "" "" "" "" "" ""]

This will give you an eventCount field with the total number of events in the index.

Now, as for your first example, % of non-null events, I started playing with searches to get something working and it's actually more complex than I thought at first (either that, or I forgot something obvious). The problem lies in performing calculations on an arbitrary list of fields, which is needed for calculating the percentage. I couldn't really find a solution that works on pre-6.x instances of Splunk. Splunk 6 introduces the foreach command which solves this though. Here goes:

index=yourindex | stats count(*) as * | eval [| dbinspect index=yourindex | stats sum(eventCount) as eventCount | format "" "" "" "" "" ""] | foreach * [eval <<FIELD>> = '<<FIELD>>'/eventCount*100]

This will give you a table with one column per extracted field in the search over the whole index, and a value showing how many percent of events this field was extracted in.

As for the second one, values for each field, this is much easier:

index=yourindex | stats values(*) as *

That's it.

The third example I'm not really sure exactly what you mean by, so I'll just leave that for now. Please clarify this example and I'll see if I can write something up.

Anyway the drawback to all this is that it's probably be painfully slow - it requires Splunk to read ALL EVENTS in an index in order to be able to compile the report. You're probably going to want to take just a snapshot in order to make it a bit more speedy.

View solution in original post

Legend

First of all, just to make it clear - there are no fields in the index itself. Well almost no fields, a few like source and sourcetype are there but pretty much all field extraction in Splunk happens at search-time, AFTER the events have been extracted from the index. Basically Splunk only indexes the data and breaks it down into segments, and when you search for, say, "myfield=foo" Splunk will look for the term "foo" in the index first and THEN check if the instances it found can be connected to the field "myfield".

Anyway, on to the rest of the question. The first thing you'll want to do is find out how many events there are in the index, so you can calculate percentages. This query will do it:

| dbinspect index=yourindex | stats sum(eventCount) as eventCount

...and to put it in a field that you can use in the rest of your search, put it in a subsearch after eval:

... | eval [| dbinspect index=yourindex | stats sum(eventCount) as eventCount | format "" "" "" "" "" ""]

This will give you an eventCount field with the total number of events in the index.

Now, as for your first example, % of non-null events, I started playing with searches to get something working and it's actually more complex than I thought at first (either that, or I forgot something obvious). The problem lies in performing calculations on an arbitrary list of fields, which is needed for calculating the percentage. I couldn't really find a solution that works on pre-6.x instances of Splunk. Splunk 6 introduces the foreach command which solves this though. Here goes:

index=yourindex | stats count(*) as * | eval [| dbinspect index=yourindex | stats sum(eventCount) as eventCount | format "" "" "" "" "" ""] | foreach * [eval <<FIELD>> = '<<FIELD>>'/eventCount*100]

This will give you a table with one column per extracted field in the search over the whole index, and a value showing how many percent of events this field was extracted in.

As for the second one, values for each field, this is much easier:

index=yourindex | stats values(*) as *

That's it.

The third example I'm not really sure exactly what you mean by, so I'll just leave that for now. Please clarify this example and I'll see if I can write something up.

Anyway the drawback to all this is that it's probably be painfully slow - it requires Splunk to read ALL EVENTS in an index in order to be able to compile the report. You're probably going to want to take just a snapshot in order to make it a bit more speedy.

View solution in original post

Legend

Mimicking that BEHAVIOUR is a non-trivial problem as it involves writing completely new widgets for splunkweb. I'm going to pass on that one. Sorry. 🙂

0 Karma

Explorer

Thanks... good info. First and last tips work and are helpful. But, do you have code that would mimic search page results, left hand margin, "Selected Fields" or "Interest in Fields"? Here, Splunk (by default) provides a list of fields and respective # of distinct values. Click on one, and can see the % info of each value per field, and % not-null of entire field. Regardless, previous info is useful and helps with the data profiling effort.... Thanks again!

0 Karma
Don’t Miss Global Splunk
User Groups Week!

Free LIVE events worldwide 2/8-2/12
Connect, learn, and collect rad prizes and swag!