Getting Data In

Index-time extraction and non-indexed field

PickleRick
SplunkTrust

Hi there.

I was wondering...

All the docs and howtos regarding index-time extractions say that you need to set the field to indexed in fields.conf. Fair enough - you want your field indexed, you make it an indexed field. That's understandable.

But what really happens if I do an index-time extraction (and/or an ingest-time eval) producing a new field but I don't set it as an indexed one?

Does the indexer/HF simply parse the field, use it internally in the parsing queue, and not pass it downstream into indexing? Or is it passed down into indexing but ignored there? Or is it sent to indexing and indexed, but if the search head(s) don't know it's indexed, it's not used in search? Or is it some other option?


isoutamo
SplunkTrust

Hi

Actually, it's not set in fields.conf. To get fields indexed you must use WRITE_META = true or DEST_KEY = _meta in transforms.conf. fields.conf is used to tell the search head that the field is indexed (or not; see "Add an entry to fields.conf for the new field" in the docs). Another way to tell the Splunk SH that a field is indexed is simply to use the "field::value" syntax in SPL (which works without fields.conf). You can check this with the job inspector.
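To make the two halves concrete, here is a minimal sketch (the field name "region" and its regex are hypothetical):

```
# transforms.conf (on the indexers/HFs) - creates the indexed field
[extract_region]
REGEX = region=(\w+)
FORMAT = region::$1
WRITE_META = true

# fields.conf (on the search heads) - tells the SH the field is indexed
[region]
INDEXED = true
```

Without the fields.conf entry the data is still indexed; you just have to search it with the explicit "region::foo" syntax.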

If you are doing "field extractions" without those meta settings, it just wastes your peers' (or HFs') resources (unless you need those fields temporarily during the indexing phase). When exactly Splunk drops those unindexed fields is unknown to me.

r. Ismo

PickleRick
SplunkTrust

Ahh... I knew about WRITE_META/DEST_KEY of course, but wasn't aware that fields.conf is SH-only.

I've only done some index-time extractions in my test environment some time ago and completely forgot about the required mechanics since then 😉 Thanks for the clarification.

To provide some background - I'm simply considering the timestamp correction from https://community.splunk.com/t5/Knowledge-Management/Correcting-timestamp-with-ingest-time-eval/m-p/...

In order to perform an ingest-time eval I'll need to extract the timezone "specification" from the timestamp, and based on that field I'll have to recalculate _time. That's pretty obvious. But I wouldn't want that "+120" or whatever value I get there to be indexed anywhere, since there's no use for it.

I'll just have to test it. 🙂


isoutamo
SplunkTrust

My understanding is that when you modify _time with INGEST_EVAL, it also adjusts that %z part, so there is no "+120" left in the field anymore. No need to do anything else.


PickleRick
SplunkTrust

No, I mean that in order to recalculate this "+120" into a proper offset to add to/subtract from _time, I have to extract it into a field so I can use it in an eval.

The timestamp is in a format that's mostly parseable automatically (up to the +XXX part), but it contains the timestamp in the local timezone with an offset expressed in minutes (*facepalm*). So I can't simply use %Z or %z.

So I will parse the timestamp as UTC (set TZ = UTC) and then apply the appropriate correction by hand according to that "+XXX" value (just extract it, convert it to a number, multiply by 60, et voilà ;-)) using INGEST_EVAL. The only issue is that I don't want to keep this extracted value anywhere.
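The arithmetic is simple enough to sketch outside Splunk. This is an illustration only, not Splunk code; the helper name is made up, and the sign convention (add for "+") is an assumption - which direction you correct depends on what the offset means in your data.

```python
import re

def correct_epoch(parsed_as_utc, raw_timestamp):
    """Apply the +/-XXX minute offset found at the end of the raw timestamp.

    Assumes the raw timestamp ends in a sign and a 3-digit offset in
    MINUTES (e.g. "+120"), which is converted to seconds (multiply by 60).
    """
    m = re.search(r"([+-])(\d{3})$", raw_timestamp)
    if not m:
        return parsed_as_utc  # no offset found - leave the time untouched
    sign = 1 if m.group(1) == "+" else -1
    return parsed_as_utc + sign * int(m.group(2)) * 60

# "+120" minutes -> 7200 seconds applied to the epoch parsed as UTC
print(correct_epoch(1700000000.0, "20231114221320.000000+120"))  # 1700007200.0
```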


PickleRick
SplunkTrust

Found it. I'll provide the solution in case others need it.

It seems I can either use WRITE_META, which makes the transform index-time but writes the field as indexed, or not use WRITE_META, which makes the extraction search-time and of no use to me.

So the trick is to extract the field first and then forget it after use 🙂

props.conf:

[mysourcetype]
TRANSFORMS-fixtime = extract_timezone,fix_timezone,forget_timezone

transforms.conf:

[extract_timezone]
REGEX = \d{14}\.\d{6}([+-]\d{3})
FORMAT = tzoffset::$1
WRITE_META = true

[fix_timezone]
INGEST_EVAL = _time=_time+(if(substr(tzoffset,1,1)="+",1,-1)*tonumber(substr(tzoffset,2,3))*60)

[forget_timezone]
INGEST_EVAL = tzoffset:=null()
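As a quick sanity check outside Splunk, the REGEX above can be exercised against a hypothetical raw event (Python used purely for illustration; the sample timestamp layout - 14 digits, 6 fractional digits, then a signed 3-digit minute offset - is assumed):

```python
import re

# Same pattern as in the [extract_timezone] transform
TZ_REGEX = r"\d{14}\.\d{6}([+-]\d{3})"

# Hypothetical raw event text matching the assumed timestamp layout
sample = "20231114221320.123456+120 host=example some event text"
m = re.search(TZ_REGEX, sample)
print(m.group(1))  # +120
```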
