Splunk Search

(How) are fields stored in splunk in an index when extracted during ingest?

ericvdhout
Path Finder

Hi,

I'm quite new to Splunk and come from Elasticsearch, so my knowledge is biased. However, I did notice that Elastic performs faster on large datasets. I think one of the main reasons is the on-the-fly field extraction Splunk performs when searching.

 

Hence we created a source_type for ingest-time field extraction.
Now I would expect these fields to always be available, even when choosing Fast Mode. However, this does not seem to be the case.

 

So my questions:
(How) are fields stored in splunk in an index when extracted during ingest?

Can I tell Splunk NOT to extract extra fields for a certain index in Fast Mode or Smart Mode, and just show the fields extracted during ingest?

Thnx

gcusello
SplunkTrust

Hi @ericvdhout,

in Smart Mode, only the fields extracted at index time are calculated and used, so the other fields (extracted at search time) aren't extracted and don't add load to the search.

About the Splunk engine being slower: this is the first time I've heard this; on the contrary, I have always heard that Splunk is faster than ELK.

Could you share a sample of your searches? Maybe the problem is in the way the search is built.

Ciao.

Giuseppe

PickleRick
SplunkTrust

It's a long story. In general - unless explicitly defined as index-time fields - no fields are extracted during ingestion. That's the first important thing about Splunk.

Second thing is that, due to that fact, Splunk works differently than, for example, Elastic (although in the latest versions it is supposed to have some "schema on the fly" functionality, but I haven't seen it in action yet). If you search for a condition "field=value", Splunk doesn't - as many other solutions do - scan an index of the "field" field for an occurrence of the string "value". Instead (simplifying a bit), it scans for all occurrences of the string "value" and then checks in which events from the resulting set this value is in the proper spot within the event, so that it matches the field "field".
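
A toy Python model may make this two-step behaviour easier to picture. It is only a sketch under simplifying assumptions (tokenizing on whitespace and "=", a regex standing in for a search-time extraction, hypothetical events and function names), not Splunk's actual implementation: the lexicon maps raw tokens to event ids, so a search for user=alice first collects every event containing the token "alice" and only then checks where that token sits.

```python
import re
from collections import defaultdict

# Toy model only: hypothetical events; real Splunk segmentation, buckets
# and bloom filters are much more involved than this.
EVENTS = {
    1: "user=alice action=login status=ok",
    2: "user=bob action=logout status=alice",  # "alice", but not as user
    3: "user=carol action=login status=ok",
}

def tokenize(raw):
    # Simplified segmentation: split on whitespace and "=".
    return set(re.split(r"[\s=]+", raw))

def build_lexicon(events):
    # Inverted index: raw token -> ids of events containing it.
    # No field names or field values are stored here.
    lexicon = defaultdict(set)
    for eid, raw in events.items():
        for tok in tokenize(raw):
            lexicon[tok].add(eid)
    return lexicon

def extract_field(raw, field):
    # Stand-in for a search-time extraction, run on each candidate event.
    m = re.search(rf"(?:^|\s){re.escape(field)}=(\S+)", raw)
    return m.group(1) if m else None

def search_field_equals(events, lexicon, field, value):
    # Step 1: every event that contains the token anywhere.
    candidates = lexicon.get(value, set())
    # Step 2: keep only events where the token sits in the right spot.
    return sorted(eid for eid in candidates
                  if extract_field(events[eid], field) == value)

lexicon = build_lexicon(EVENTS)
print(sorted(lexicon["alice"]))                                # [1, 2]
print(search_field_equals(EVENTS, lexicon, "user", "alice"))  # [1]
```

Event 2 is read and then discarded in step 2, which is the extra work that grows with data volume when the searched value is common.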

With indexed fields, Splunk works more or less similarly to a "classic" database search, which is waaaaaay faster (especially with some types of searches), but at the cost of the field being immutable after the initial ingest-time extraction.
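
The same kind of toy model, extended with an index-time extraction, shows both sides of that trade-off. Assumption: the indexed field is written into the lexicon as a "field::value" term (which is how the field::value search syntax addresses it); the function names, regexes and events are hypothetical.

```python
import re
from collections import defaultdict

def ingest(events, indexed_field, pattern):
    # Toy ingest: raw tokens, plus one "field::value" term per event taken
    # from an index-time extraction (hypothetical regex `pattern`).
    lexicon = defaultdict(set)
    for eid, raw in events.items():
        for tok in re.split(r"[\s=]+", raw):
            lexicon[tok].add(eid)
        m = re.search(pattern, raw)
        if m:
            # Written once at ingest; later config changes do not touch it.
            lexicon[f"{indexed_field}::{m.group(1)}"].add(eid)
    return lexicon

def search_indexed(lexicon, field, value):
    # One term lookup: no candidate filtering, no search-time regex.
    return sorted(lexicon.get(f"{field}::{value}", set()))

events = {
    1: "user=alice action=login",
    2: "user=bob action=logout note=alice",
}

good = ingest(events, "user", r"user=(\S+)")
print(search_indexed(good, "user", "alice"))  # [1]

# A flawed extraction is baked in: this one captured only one character,
# and fixing the regex afterwards would not rewrite data already indexed.
bad = ingest(events, "user", r"user=(\w)")
print(search_indexed(bad, "user", "alice"))   # []
print(search_indexed(bad, "user", "a"))       # [1]
```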

ericvdhout
Path Finder

Thank you all for your insights; I will dig into your suggestions.

gcusello
SplunkTrust

Hi @ericvdhout,

good for you, see you next time!

please accept one answer for the benefit of other Community members.

Ciao and happy splunking.

Giuseppe

P.S.: Karma Points are appreciated by all Contributors 😉

isoutamo
SplunkTrust

When/if you have a distributed environment with separate search heads and indexers, you must add fields.conf on the SH layer to tell it that you have your own indexed fields; otherwise Splunk handles them as non-indexed. Another option is to use the format "field::value" in your search instead of "field=value".
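
The difference between the two syntaxes can be sketched with another toy model. Names and events are hypothetical, and `declared_indexed` merely stands in for the search head's fields.conf (where a real stanza would declare the field with INDEXED = true). With the field declared, field=value becomes a direct lookup of the field::value term; without it, the search falls back to a raw-token lookup plus a search-time filter; and field::value only ever matches terms that were actually written at ingest time.

```python
import re
from collections import defaultdict

def build_index(events, ingest_fields):
    # Raw tokens, plus "field::value" terms only for fields extracted at ingest.
    lexicon = defaultdict(set)
    for eid, raw in events.items():
        for tok in re.split(r"[\s=]+", raw):
            lexicon[tok].add(eid)
        for field in ingest_fields:
            m = re.search(rf"\b{re.escape(field)}=(\S+)", raw)
            if m:
                lexicon[f"{field}::{m.group(1)}"].add(eid)
    return lexicon

def run(events, lexicon, term, declared_indexed):
    # declared_indexed is a hypothetical stand-in for fields.conf on the SH.
    if "::" in term:
        # Explicit syntax: look up the ingest-time term directly.
        return sorted(lexicon.get(term, set()))
    field, value = term.split("=", 1)
    if field in declared_indexed:
        # fields.conf says "indexed": same direct lookup as field::value.
        return sorted(lexicon.get(f"{field}::{value}", set()))
    # Not declared: find the raw token, then filter with a search-time extraction.
    pattern = rf"\b{re.escape(field)}={re.escape(value)}(?:\s|$)"
    return sorted(e for e in lexicon.get(value, set())
                  if re.search(pattern, events[e]))

events = {
    1: "event_type=APIEND platform=web",
    2: "event_type=APISTART platform=web",
    3: "event_type=APIEND platform=mobile",
}
lex = build_index(events, ingest_fields={"event_type"})

print(run(events, lex, "event_type=APIEND", {"event_type"}))  # [1, 3]
print(run(events, lex, "event_type=APIEND", set()))           # [1, 3]
print(run(events, lex, "platform=web", set()))                # [1, 2]
print(run(events, lex, "platform::web", set()))               # []
```

In this model, "platform" was never extracted at ingest, so platform::web finds nothing even though platform=web does.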

Also as @VatsalJagani said you should use tstats to better utilise those fields if possible.

r. Ismo

ericvdhout
Path Finder

Does this field::value thingy also work for charts in dashboards?
I tried to change the working, but slow, query

index="aiam_apigw_app_idx" event_type="APIEND" | timechart span=1m count by platform

to

index="aiam_apigw_app_idx" event_type::"APIEND" | timechart span=1m count by platform

The first one gave a correct graph, but the second one gave no results. Any idea why?

isoutamo
SplunkTrust

The :: syntax replaces the information from fields.conf (that the field is indexed). I'm not sure whether it works with non-indexed fields.

ericvdhout
Path Finder

event_type should be an indexed field. However, I need to ask how and where the sourcetype/fields are configured for extraction at index time.

Will get back on that.

isoutamo
SplunkTrust

Those configurations must be on the first full Splunk Enterprise instance in the path from the UF/source system. This can be a HF (e.g. when you are using a HF instead of a UF, or have some HFs as intermediate forwarders) or the indexer if/when you don't have any HF between your UF and the indexer.
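
As a rough sketch of that rule, assuming each hop in the forwarding path is labelled "UF", "HF" or "indexer" (hypothetical labels, ignoring any special cases), the instance that needs the index-time configuration is simply the first non-UF hop:

```python
def parsing_instance(path):
    """Return (position, role) of the first full Splunk Enterprise
    instance, i.e. the first "HF" or "indexer", on the path from the source.
    Roles are hypothetical labels; special cases are ignored."""
    for position, role in enumerate(path):
        if role in ("HF", "indexer"):
            return position, role
    raise ValueError("no HF or indexer on this path")

print(parsing_instance(["UF", "indexer"]))              # (1, 'indexer')
print(parsing_instance(["UF", "HF", "HF", "indexer"]))  # (1, 'HF')
```

With intermediate HFs, the configuration on the indexer would not be used for that data, because parsing has already happened on the first HF.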

r. Ismo

VatsalJagani
SplunkTrust

@ericvdhout - If you are just dealing with index-time fields, use a tstats query rather than running a regular search.

For example, if you want to see the count of events per value of the source_type field (which you extracted at ingest time):

| tstats count where index=<your-index> <any other search criteria based on ingest-level fields> by source_type
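
To illustrate why this is fast, here is a toy Python model of the same query shape: a count per source_type value, optionally filtered by another indexed field. The index contents and the function name are hypothetical; the point is that the answer comes entirely from field::value terms and their event-id lists, without reading any raw events.

```python
# Hypothetical indexed terms per index: "field::value" -> event ids.
INDEXES = {
    "app_idx": {
        "source_type::gateway": {1, 2},
        "source_type::auth": {3, 4},
        "event_type::APIEND": {1, 3},
    },
    "other_idx": {
        "source_type::gateway": {5},
    },
}

def tstats_count_by(indexes, index, by, where=()):
    terms = indexes[index]
    # Start from every event id in the index, then narrow by each WHERE term.
    matching = set().union(*terms.values())
    for t in where:
        matching &= terms.get(t, set())
    # Count matching events per value of the BY field, from terms alone.
    prefix = f"{by}::"
    counts = {}
    for term, ids in terms.items():
        if term.startswith(prefix) and ids & matching:
            counts[term[len(prefix):]] = len(ids & matching)
    return counts

print(tstats_count_by(INDEXES, "app_idx", "source_type"))
# {'gateway': 2, 'auth': 2}
print(tstats_count_by(INDEXES, "app_idx", "source_type", ["event_type::APIEND"]))
# {'gateway': 1, 'auth': 1}
```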

 

https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Tstats 

If you are looking at tstats in the docs, please look for "Indexed fields in tstats searches with WHERE", as tstats can also be used with data models and other accelerations.

 

On a side note, fast-mode & smart-mode are not related to whether fields are search time or index time.

 

I hope this helps!!!

ericvdhout
Path Finder

Hmmm, funny,

 

This is by no means my experience.

I have tried Splunk on 3 environments now and keep coming to the same conclusion. However, I do have to say we have quite a lot of data (the number of events fluctuates between 25 and 50 million per half hour, depending on the time of day and the number of requests made to our platform).

As for that search: when I keep it simple and just request all events in a certain timeframe, Splunk cannot keep up with Elastic. The main issue arises when I need an ad hoc search (for example, when under pressure because of a serious incident) and have to search quickly on a certain ID, IP address, or some other field.

In Elastic, I just filter on that ID for the last 2 days and the resulting few lines are there within a few seconds. Splunk needs way more time for that, since it seems to just walk through all events and match the 'fields', or something like that?

gcusello
SplunkTrust

Hi @ericvdhout,

when you have many events, the best approach is to accelerate your searches; you're probably using this approach in ELK.

To accelerate a search in Splunk there are many methods, which you can see at https://docs.splunk.com/Documentation/Splunk/8.2.6/Knowledge/Aboutsummaryindexing

Another choice is to use accelerated Data Models, with a similar approach: in a few words, a search is scheduled with the frequency you need (e.g. 5 minutes, one hour, one day, depending on your needs), and then you search the pre-computed summaries, which are stored as tsidx files on the indexers. This approach is very performant.
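
The core idea behind these acceleration methods (aggregate once on a schedule, then answer many searches from the small result) can be sketched in Python. This is a toy model only: the 5-minute interval, the platform field and the sample events are hypothetical, and it does not reflect how Splunk actually stores summaries.

```python
from collections import Counter
from datetime import datetime

def summarize(events, minutes=5):
    """Scheduled step: count events per (interval start, platform)."""
    summary = Counter()
    for ts, platform in events:
        # Round the timestamp down to the start of its interval.
        start = ts.replace(minute=ts.minute - ts.minute % minutes,
                           second=0, microsecond=0)
        summary[(start, platform)] += 1
    return summary

def query(summary, platform, start, end):
    """Ad hoc step: answer from the summary rows only, never the raw events."""
    return sum(n for (t, p), n in summary.items()
               if p == platform and start <= t < end)

# Hypothetical sample events: (timestamp, platform).
events = [
    (datetime(2022, 5, 1, 10, 1, 30), "web"),
    (datetime(2022, 5, 1, 10, 3, 0), "web"),
    (datetime(2022, 5, 1, 10, 4, 59), "mobile"),
    (datetime(2022, 5, 1, 10, 7, 0), "web"),
]
summary = summarize(events)
print(len(summary))  # 3 summary rows instead of 4 raw events
print(query(summary, "web",
            datetime(2022, 5, 1, 10, 0), datetime(2022, 5, 1, 10, 10)))  # 3
```

The saving grows with volume: at tens of millions of events per half hour, the summary size depends on the number of intervals and field values, not on the number of raw events.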

Ciao.

Giuseppe

ericvdhout
Path Finder

Again, thanks all for the many replies and suggestions.
I chose gcusello's answer as the accepted solution, not because it is the best suggestion, but because it directly answers the question I asked about the storage/extraction of the fields in the modes.

All the other suggestions dig deeper into the problem behind my question, and I highly appreciate this.

gcusello
SplunkTrust

Hi, good for you, see you next time!

Ciao and happy splunking.

Giuseppe

P.S.: Karma Points are appreciated by all the Contributors 😉
