Getting Data In

Custom-made index outside Splunk, or multivalue index in Splunk?

sbsbb
Builder

I have logs in this form:

field field field field field <verylong xml multivalued>
field field field field field <verylong xml multivalued>

If I want results within seconds across all my events (searching only a few fields), what is the best way to achieve that?

Should I have a log adapter that somehow writes the records I need through the Splunk API, or should I write only the few fields I need to a separate log file?

Or is there a way to have an index with multiple columns (field1, field3, field5) built by the indexer at index time? Then I could search that index within a second and drill down to the whole event (which could take 30 seconds or more).


lguinn2
Legend

You can create fields at index time with Splunk, but that is unnecessary and usually counter-productive. The Splunk index is unlike an RDBMS index; do not let your RDBMS experience guide you here, as it will lead you astray. Splunk creates index entries for every keyword that it finds, so searching should always be very fast.

Test this:

1 - Define a sourcetype for your data

2 - Load the data into Splunk, creating NO index-time fields. Load the entire event; do not pre-process the data.

3 - Define the fields (field1 - field5) for the sourcetype. You can do this before or after step 2.

4 - Write your search: field1=A field2=B field3=C field4=D field5=E or whatever. Run it over a fairly short time period (an hour or less); a rough sketch of these steps follows the list.

5 - Use the Search Job Inspector to see how fast the search job ran.
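For example, steps 1, 3, and 4 could look roughly like this. This is only a minimal sketch: the sourcetype name my_app_log, the index name main, and the extraction regex are placeholders, not your actual configuration.

    # props.conf -- step 1: define a sourcetype; step 3: search-time field extraction
    [my_app_log]
    SHOULD_LINEMERGE = false
    # extract the five leading header fields at search time (regex is a placeholder)
    EXTRACT-header = ^(?<field1>\S+)\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)

The step 4 search would then be something like:

    index=main sourcetype=my_app_log field1=A field2=B field3=C field4=D field5=E earliest=-1h

For step 5, open the Job Inspector (Job > Inspect Job in Splunk Web) and look at how long the job ran and how many events it scanned.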

As long as each event is under 10MB, I don't think you will have any problems. The actual speed of a search is dependent on many factors:

  • amount of data being searched
  • number of indexers
  • search operators used (for example, NOT slows a search)
  • number of keywords
  • whether the search is dense or sparse

A search that uses five simple terms (such as field1=A field2=B field3=C field4=D field5=E) is pretty trivial for Splunk and the speed should depend mostly on the amount of data being searched and how many indexers are doing the searching.
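One way to see the effect of data volume (again a sketch, with placeholder index and sourcetype names): run the same search over two different time ranges and compare the run duration and scan count that the Job Inspector reports.

    index=main sourcetype=my_app_log field1=A field2=B field3=C field4=D field5=E earliest=-1h
    index=main sourcetype=my_app_log field1=A field2=B field3=C field4=D field5=E earliest=-24h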

The community can probably give more advice if you could tell us more about your overall data volume, search time range, etc.

lguinn2
Legend

These are not too large for Splunk. I think that events over 3 MB would be a problem, but 100 KB is not that big for Splunk. (And yes, I mean per event.)

I think it is better to parse the whole file. Otherwise, how will you store and retrieve the rest of the information?

Of course, if you never want to look at the rest of the event, then sure - just index the header.

But don't separate the header from the data to make it easier for Splunk. You are not making it easier and you aren't fully utilizing Splunk, IMO.
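One practical note, as a sketch only: if the whole XML is indexed as one event, the default size limits in props.conf may truncate very large events, and they can be raised for that sourcetype. The sourcetype name and the values below are placeholders, not recommendations.

    # props.conf (placeholder sourcetype name and example limits)
    [my_xml_log]
    SHOULD_LINEMERGE = true
    # maximum line length in bytes before truncation (default is 10000)
    TRUNCATE = 500000
    # maximum number of lines merged into one event (default is 256)
    MAX_EVENTS = 2000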


sbsbb
Builder

What if the fields are in big XML files?

In that case, let's say the files come from the same source and host... wouldn't it be good to have some kind of index, so that not all of the files have to be parsed?

Actually, I have a transform on the data to separate the header of the log from the XML message.

Event size in bytes is:

  p10(esize)    avg(esize)      p90(esize)
  3350          15690.074324    29600

  p10(esize)    avg(esize)      p90(esize)
  43200         66237.096045    92370
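(For reference, numbers like these can be produced with a search along these lines; the index and sourcetype names are placeholders.)

    index=main sourcetype=my_xml_log
    | eval esize=len(_raw)
    | stats p10(esize) avg(esize) p90(esize)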
