Getting Data In

Custom-made index outside Splunk, or multivalue index in Splunk?

sbsbb
Builder

I have logs in this form:

field field field field field <verylong xml multivalued>
field field field field field <verylong xml multivalued>

If I want results within seconds across all my events (searching only a few fields), what is the best way to achieve that?

Should I have a log adapter that somehow writes the records I need through the Splunk API, or should I write only the few fields I need to a separate log file?

Or is there a way to have an index with multiple columns (field1, field3, field5) built by the indexer at index time? Then I could search that index within a second and drill down to the whole event (which could take 30 seconds or more).


lguinn2
Legend

You can create fields at index time with Splunk, but that is unnecessary and usually counter-productive. The Splunk index is unlike an RDBMS index; do not let your RDBMS experience guide you here, as it will lead you astray. Splunk creates index entries for every keyword that it finds, so searching should always be very fast.

Test this:

1 - Define a sourcetype for your data

2 - Load the data into Splunk, creating NO index-time fields. Load the entire event; do not pre-process the data.

3 - Define the fields (field1 - field5) for the sourcetype. You can do this before or after step 2.

4 - Write your search: field1=A field2=B field3=C field4=D field5=E or whatever. Run it over a fairly short time period (an hour or less); a rough sketch of these steps follows the list.

5 - Use the Search Job Inspector to see how fast the search job ran.
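For example, steps 1, 3, and 4 could look roughly like this. This is only a minimal sketch: the sourcetype name my_app_log, the index name main, and the extraction regex are placeholders, not your actual configuration.

    # props.conf -- step 1: define a sourcetype; step 3: search-time field extraction
    [my_app_log]
    SHOULD_LINEMERGE = false
    # extract the five leading header fields at search time (regex is a placeholder)
    EXTRACT-header = ^(?<field1>\S+)\s+(?<field2>\S+)\s+(?<field3>\S+)\s+(?<field4>\S+)\s+(?<field5>\S+)

The step 4 search would then be something like:

    index=main sourcetype=my_app_log field1=A field2=B field3=C field4=D field5=E earliest=-1h

For step 5, open the Job Inspector (Job > Inspect Job in Splunk Web) and look at how long the job ran and how many events it scanned.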

As long as each event is under 10MB, I don't think you will have any problems. The actual speed of a search is dependent on many factors:

  • amount of data being searched
  • number of indexers
  • search operators used (for example, NOT slows a search)
  • number of keywords
  • whether the search is dense or sparse

A search that uses five simple terms (such as field1=A field2=B field3=C field4=D field5=E) is pretty trivial for Splunk and the speed should depend mostly on the amount of data being searched and how many indexers are doing the searching.
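One way to see the effect of data volume (again a sketch, with placeholder index and sourcetype names): run the same search over two different time ranges and compare the run duration and scan count that the Job Inspector reports.

    index=main sourcetype=my_app_log field1=A field2=B field3=C field4=D field5=E earliest=-1h
    index=main sourcetype=my_app_log field1=A field2=B field3=C field4=D field5=E earliest=-24h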

The community can probably give more advice if you could tell us more about your overall data volume, search time range, etc.

lguinn2
Legend

These are not too large for Splunk. I think that events over 3 MB would be a problem, but 100 KB is not that big for Splunk. (And yes, I mean per event.)

I think it is better to parse the whole file. Otherwise, how will you store and retrieve the rest of the information?

Of course, if you never want to look at the rest of the event, then sure - just index the header.

But don't separate the header from the data to make it easier for Splunk. You are not making it easier and you aren't fully utilizing Splunk, IMO.
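One practical note, as a sketch only: if the whole XML is indexed as one event, the default size limits in props.conf may truncate very large events, and they can be raised for that sourcetype. The sourcetype name and the values below are placeholders, not recommendations.

    # props.conf (placeholder sourcetype name and example limits)
    [my_xml_log]
    SHOULD_LINEMERGE = true
    # maximum line length in bytes before truncation (default is 10000)
    TRUNCATE = 500000
    # maximum number of lines merged into one event (default is 256)
    MAX_EVENTS = 2000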


sbsbb
Builder

What if the fields are in big XML files?

In that case, let's say the files come from the same source and host... wouldn't it be good to have some kind of index, so that not all of the files have to be parsed?

Actually, I have a transform on the data to separate the header of the log from the XML message.

Event size in bytes is:

  p10(esize)    avg(esize)      p90(esize)
  3350          15690.074324    29600

  p10(esize)    avg(esize)      p90(esize)
  43200         66237.096045    92370
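(For reference, numbers like these can be produced with a search along these lines; the index and sourcetype names are placeholders.)

    index=main sourcetype=my_xml_log
    | eval esize=len(_raw)
    | stats p10(esize) avg(esize) p90(esize)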
