Correlation-Indexed Fields, Indexed Time Field Extraction, HF/UF, Deployment Server, and Performance

SplunkDash
Motivator

Hello,

I am trying to understand the relationship among Indexed Fields, Indexed Time Field Extraction, HF/UF, Deployment Server, and performance.

Do we need Indexed Time Field Extraction to create Indexed Fields? When we use Indexed Time Field Extraction, do we have to have an HF installed, and does it have to be on the deployment server? What would be the computational overhead of Indexed Time Field Extraction compared to Search Time Field Extraction, given that Splunk strongly recommends avoiding Indexed Time Field Extraction?

Thank you so much for your thoughts and support in finding this correlation.


gcusello
Esteemed Legend

Hi @SplunkDash.

I already answered some of your questions in my answers to your previous question, but evidently I wasn't clear enough.

Going in order:

Do we need to have Indexed Time Field Extraction to create Indexed Fields?

Yes, if you mean fields created before search time (that is, indexed fields).

No, if you just mean fields to use in searches; those can be extracted at search time.

When we have the Indexed Time Field Extraction, do we have to have HF installed there?

Indexed Time Field Extraction is performed by the Indexers or, when present, by Heavy Forwarders. This means that you don't need Heavy Forwarders (because you always have Indexers!). HFs are useful if you have to receive syslog or HEC, or to concentrate logs from segregated networks, but they are also useful to take some load off the Indexers: parsing, merging, filtering, and Indexed Time Field Extractions.

The correct question is: are your Indexers overloaded or not for your usual log volume?

and does it have to be on deployment server?

A Deployment Server is very useful for managing all kinds of Forwarders (UFs and HFs), especially when you have many of them.

It's also useful when you have only a few, because otherwise you have to distribute configurations manually, but it isn't mandatory if you are happy to update each Forwarder in your network by hand.

What's the problem with using a Deployment Server? It's very useful!
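As a rough illustration of what the Deployment Server takes care of, here is a minimal sketch of the two files involved. The server class name `all_forwarders`, the app name `my_parsing_app`, and the host `ds.example.com` are placeholders chosen for this example, not values from this thread.

```ini
# serverclass.conf on the Deployment Server
# (server class and app names are hypothetical)
[serverClass:all_forwarders]
# match every client that phones home; narrow this pattern in practice
whitelist.0 = *

[serverClass:all_forwarders:app:my_parsing_app]
# restart splunkd on the client after the app has been delivered
restartSplunkd = true

# deploymentclient.conf on each UF or HF
[deployment-client]

[target-broker:deploymentServer]
# management port of the Deployment Server (host is a placeholder)
targetUri = ds.example.com:8089
```

The app itself (with its props.conf and transforms.conf) would sit under `$SPLUNK_HOME/etc/deployment-apps/my_parsing_app` on the Deployment Server.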

What would be the computational overhead of Indexed Time Field Extraction compared to Search Time Field Extraction, given that Splunk strongly recommends avoiding Indexed Time Field Extraction?

Indexed Time Field Extraction is extra work for the Indexers, and if they have to index many logs it isn't a good idea to give them this additional load.

As I said in my previous answer, you have to analyze your Indexers (using the Monitoring Console) and see whether they are overloaded. If they are, you have three choices:

  • don't use Indexed Time Field Extraction and leave field extraction to Search Time (the usual solution, also suggested by Splunk),
  • move the extractions to HFs (if you have them); it makes no sense to add HFs only for field extractions,
  • give more resources to your Indexers.

Indexed Time Field Extractions are very useful if you have a very high load on your Search Heads. You can monitor this, again, using the Monitoring Console.

In this case too, the choices are similar to the previous ones:

  • use Indexed Time Field Extractions to move load from Search Time to Index Time,
  • give more resources to your SHs (the usual solution).

Indexed Time Field Extraction also requires more disk space, but this isn't the reason why Splunk says to avoid it. The reason is that you add a load that may not be necessary, because the problem with your searches usually isn't the field extraction but the search itself or, more frequently, the storage you have.

If you open the Job Inspector for your searches, you can find the time spent on field extractions; it's probably very small.

Tell me if there is something else that isn't clear.

I already discussed all these things with the teacher of the Architecting Splunk training course I attended.

Ciao.

Giuseppe


PickleRick
Ultra Champion

Ok, first things first. It's "indexed field" (because the field itself is separately indexed) but it's "index-time extraction", not "indexed time" (because the field is being extracted during the indexing process - "index time").

You need index-time extractions to create indexed fields. That's what they are for 🙂 Search-time extractions are performed when you run a search, so the fields are extracted dynamically from the data you're searching. Index-time extractions, on the other hand, are run during the indexing process as part of the ingestion pipeline, and the resulting field values are stored in a separate index.
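To make the contrast concrete, here is a minimal sketch of both kinds of extraction for the same value. The sourcetype `my_sourcetype`, the `status=...` event format, and the field names `status` and `status_idx` are assumptions made up for this example; in practice you would normally pick one approach, not both.

```ini
# props.conf (sourcetype name is hypothetical)
[my_sourcetype]
# search-time: the regex runs on the fly each time a search needs the field
EXTRACT-status = status=(?<status>\w+)
# index-time: the transform runs once, in the ingestion pipeline
TRANSFORMS-status_idx = status_indexed

# transforms.conf
[status_indexed]
REGEX = status=(\w+)
# write key::value into the index as an indexed field
FORMAT = status_idx::$1
WRITE_META = true

# fields.conf (search side): tell Splunk the field is indexed
[status_idx]
INDEXED = true
```

The search-time `EXTRACT-` line can be changed at any moment and applies to old data too, whereas the index-time transform only affects events ingested after it is in place.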

What's the difference? Well, the indexed fields are indeed faster but they are - as I already wrote in that other thread - extracted and stored once. And they are _not_ searched if you do a general search (without specifying field).

Heavy forwarders are not necessary for index-time extractions. But if you have HFs in your infrastructure in front of your indexers, the ingestion pipeline, except for the final index-writing, happens on them - all ingestion pipeline activities are performed on the first "heavy" component (HF or indexer) in the event's path from source to indexer.
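A small sketch of what that means for where the configuration lives, assuming a UF that forwards to a single HF. The host `hf1.example.com`, the group name `heavy_forwarders`, and the sourcetype and transform names are placeholders for illustration only.

```ini
# outputs.conf on the UF: send everything to the HF tier
[tcpout]
defaultGroup = heavy_forwarders

[tcpout:heavy_forwarders]
# receiving port on the HF (host and port are placeholders)
server = hf1.example.com:9997

# props.conf on the HF, the first "heavy" component in the path;
# the matching transforms.conf stanza must be on the HF as well
[my_sourcetype]
TRANSFORMS-status_idx = status_indexed
```

Without an HF in the path, the same props.conf and transforms.conf would go on the indexers instead.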

Indexed fields are not that "heavy" in terms of performance (unless you heavily abuse them). It's just that there are only very specific use cases where indexed fields make sense.

Due to how Splunk works, they mostly make sense if you have a value which you often search for within a single field, but the same value can appear in many other events in other fields. Splunk normally (with search-time fields) first searches through all the events for appearances of the value and then parses the events to see whether the value is indeed placed in the proper field.

For example, suppose you have events that contain eight fields, each of which can have a value of "on", "off" or "unknown". If you're searching for a condition of "field1=unknown", Splunk first searches for all occurrences of the word "unknown", which can produce many results that need to be parsed, analyzed and ultimately discarded because the word "unknown" was found outside of field1. In this case, if you made field1 an indexed field, you could search directly through the index of field1 to quickly find all occurrences of "unknown" in this field. The downside, however, is that normally if you simply search for "unknown" on its own, Splunk will return all events with this word. If you have field1 extracted as an indexed field, Splunk will no longer bother with its contents while searching through the events, so your search for "unknown" wouldn't find the events in which field1=unknown. If you extracted all fields as indexed fields, you'd have to explicitly search for the values in the specific fields - field1=unknown OR field2=unknown OR field3=unknown... Effectively you'd be turning your Splunk into something like Elasticsearch with a predefined schema.
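For the field1 example above, the configuration might look roughly like this. It assumes the raw events contain the literal text `field1=<value>` and that the transform is wired up in props.conf via a `TRANSFORMS-` setting for the relevant sourcetype; the stanza name `field1_indexed` is made up.

```ini
# transforms.conf (stanza name is hypothetical)
[field1_indexed]
# assumes events carry the literal text field1=on|off|unknown
REGEX = field1=(on|off|unknown)
FORMAT = field1::$1
WRITE_META = true

# fields.conf on the search head
[field1]
# resolve field1=<value> searches against the indexed field
INDEXED = true
```

With this in place, a search for `field1=unknown` can be answered from the indexed field, and `field1::unknown` is the explicit syntax for asking for the indexed term directly.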

There are times when indexed fields are useful, but they are rare. Usually other forms of acceleration are simply better - they don't interfere with Splunk's natural data indexing and processing but still give you good performance.

SplunkDash
Motivator

Hello,

Do you think the following props and transforms configuration files will work for creating three indexed fields? Your recommendation will be highly appreciated. Thank you so much as always.

[myTransformsConf]
REGEX=(?P<USERID>.*?)\,\s?\w*?\,\s?\w*?\,\s?\w*?\,\s?\w*?\,(?P<NETWORKID>.*?)\,\s?.*?\,\s(?P<RTCODE>.*?)\,\s?
FORMAT=USERID::$1 NETWORKID::$2 RTCODE::$3
WRITE_META=TRUE

[myPropsConf]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
INDEXED_EXTRACTIONS=csv
HEADER_FIELD_LINE_NUMBER=1
HEADER_FIELD_DELIMITER=,
FIELD_DELIMITER=,
TIME_FORMAT=%Y%m%d%H%M%S
TIMESTAMP_FIELDS=Timestamp
TRANSFORMS-AuditLogs = myTransformsConf

PickleRick
Ultra Champion

You're using INDEXED_EXTRACTIONS, so you should not normally need to extract additional fields manually. Just set the fields as indexed in fields.conf.
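A sketch of what that could look like, reusing the settings from the post above and dropping the regex transform. It assumes the CSV header row really contains columns named USERID, NETWORKID, RTCODE and Timestamp, which the thread does not show; the stanza name is kept from the question, although a props.conf stanza is normally the sourcetype name.

```ini
# props.conf (stanza name copied from the question)
[myPropsConf]
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
HEADER_FIELD_DELIMITER = ,
FIELD_DELIMITER = ,
TIMESTAMP_FIELDS = Timestamp
TIME_FORMAT = %Y%m%d%H%M%S
# no TRANSFORMS- line: field names come from the CSV header

# fields.conf on the search head
# (assumes these column names exist in the header)
[USERID]
INDEXED = true

[NETWORKID]
INDEXED = true

[RTCODE]
INDEXED = true
```

One thing to check: as far as I know, when the file is read by a universal forwarder, the structured-data parsing happens on the forwarder, so this props.conf stanza would need to be deployed there too.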

SplunkDash
Motivator

Hello,

Thank you so much for your quick response, truly appreciate it. Do you think it's going to work now?

[myTransformsConf]
REGEX=(?P<USERID>.*?)\,\s?\w*?\,\s?\w*?\,\s?\w*?\,\s?\w*?\,(?P<NETWORKID>.*?)\,\s?.*?\,\s(?P<RTCODE>.*?)\,\s?
FORMAT=USERID::$1 NETWORKID::$2 RTCODE::$3
WRITE_META=TRUE

[myPropsConf]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
HEADER_FIELD_LINE_NUMBER=1
HEADER_FIELD_DELIMITER=,
FIELD_DELIMITER=,
TIME_FORMAT=%Y%m%d%H%M%S
TIMESTAMP_FIELDS=Timestamp
TRANSFORMS-AuditLogs = myTransformsConf

PickleRick
Ultra Champion

To be fully honest - I have no idea. Try and see. You still have to set your fields as indexed in fields.conf. Having said that - I think using indexed extractions is simply easier (and might be more efficient with simple csv than regex-based extractions)

SplunkDash
Motivator

Hello,

Thank you so much again. I have a quick question: what changes do I have to make in the fields.conf file in this context?

SplunkDash
Motivator

Hello @PickleRick,

Your reference link and the other information you provided are really helpful for working with the fields/transforms/props configuration files. But I am stuck on how to define my fields.conf and transforms.conf files to create 3 custom Indexed Fields if I take the REGEX out of the transforms configuration file and use INDEXED_EXTRACTIONS.

The 3 configuration files are given below; please advise. Thank you so much, I truly appreciate your support.

props.conf

[myPropsConf]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
INDEXED_EXTRACTIONS=csv
HEADER_FIELD_LINE_NUMBER=1
HEADER_FIELD_DELIMITER=,
FIELD_DELIMITER=,
TIME_FORMAT=%Y%m%d%H%M%S
TIMESTAMP_FIELDS=Timestamp
TRANSFORMS-AuditLogs = myTransformsConf

transforms.conf (please make changes if possible)

[myTransformsConf]
REGEX=(?P<USERID>.*?)\,\s?\w*?\,\s?\w*?\,\s?\w*?\,\s?\w*?\,(?P<NETWORKID>.*?)\,\s?.*?\,\s(?P<RTCODE>.*?)\,\s?
FORMAT=USERID::$1 NETWORKID::$2 RTCODE::$3
WRITE_META=TRUE

fields.conf

[USERID]
INDEXED=TRUE

[NETWORKID]
INDEXED=TRUE

[RTCODE]
INDEXED=TRUE
