Getting Data In

Ingest-time lookup

tah7004
Path Finder

Hello, has anyone worked with ingest-time lookups and is familiar with how they work?

https://docs.splunk.com/Documentation/Splunk/8.1.1/Data/IngestLookups

I'm confused about where the lookup file is supposed to live. Since this is an ingest-time process, I would think it needs to be on the indexers, but the doc isn't very clear about it.

Also regarding the actual stanza syntax, I'm trying to see if this works:

Lookup command:

lookup test field1 AS new_field field2 OUTPUT field3

[lookup-extract]
INGEST_EVAL= field3=json_extract(lookup("test", json_object("field1", new_field, "field2", field2), json_array("field3")),"field3")

Any help would be appreciated.


victor1004k
Loves-to-Learn Everything

@tah7004  To use an ingest-time lookup, the field you want to look up must be an indexed field. It should work if you set up the configuration files as follows.

1. $SPLUNK_HOME/etc/apps/myapp/lookups/test.csv

field1,field2,field3
value1,value2,value3


2. $SPLUNK_HOME/etc/apps/myapp/local/props.conf

[test_ingest_lookup]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = true
TRANSFORMS-ingest_time_lookup = regex_extract_av_pairs, lookup_extract

 

3. $SPLUNK_HOME/etc/apps/myapp/local/transforms.conf

[regex_extract_av_pairs]
SOURCE_KEY = _raw
REGEX = \s([a-zA-Z][a-zA-Z0-9-]+)=([^\s"',]+)
REPEAT_MATCH = true
FORMAT = $1::"$2"
WRITE_META = true

[lookup_extract]
INGEST_EVAL= field3=json_extract(lookup("test.csv", json_object("field1", new_field, "field2", field2), json_array("field3")),"field3")
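
Because lookup_extract writes field3 at index time, searches may also need to be told that field3 is an indexed field. A minimal sketch, assuming the same myapp app is also installed on the search head; the file location is an assumption, while INDEXED is the fields.conf setting Splunk documents for custom indexed fields:

```ini
# $SPLUNK_HOME/etc/apps/myapp/local/fields.conf  (assumed location, on the search head)
[field3]
# Mark field3 as stored in the index rather than extracted at search time
INDEXED = true
```

This has not been tested against the configuration above, so treat it as a starting point rather than a required step.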

 

For another approach that uses INDEXED_EXTRACTIONS=json, see the link below.

- How to filter data at Splunk ingest time (list matching), article in Japanese
https://qiita.com/chobiyu/items/aec5ef3a75a8bab96546


PickleRick
SplunkTrust

Not necessarily. You can use the output of a function operating on _raw as an argument to the lookup() function.
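
As a rough illustration of that idea, the sketch below nests replace() calls on _raw directly inside json_object(), so only field3 is written to the index. The stanza name lookup_extract_inline is hypothetical, and the regexes assume the raw event contains field1=... and field2=... pairs made up of letters, digits, and dots; this is untested.

```ini
# transforms.conf (hypothetical stanza name; test.csv is the lookup file from the earlier reply)
[lookup_extract_inline]
# The match values are computed from _raw inline, so field1 and field2 are never stored as indexed fields
INGEST_EVAL = field3=json_extract(lookup("test.csv", json_object("field1", replace(_raw, ".*field1=([0-9A-Za-z.]+).*", "\1"), "field2", replace(_raw, ".*field2=([0-9A-Za-z.]+).*", "\1")), json_array("field3")), "field3")
```

Note that replace() returns the input string unchanged when the regex does not match, so events without those pairs would simply fail to match any row in the lookup.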


victor1004k
Loves-to-Learn Everything

@tah7004  OK! Below is the approach you are talking about.

1.  $SPLUNK_HOME/etc/apps/myapp/local/props.conf

[test_ingest_lookup]
TRANSFORMS-ingest_time_lookup = lookup_extract

 

2.  $SPLUNK_HOME/etc/apps/myapp/local/transforms.conf

[lookup_extract]
INGEST_EVAL= field1=replace(_raw, ".*field1=([0-9A-Za-z.]+).*", "\1"), field2=replace(_raw, ".*field2=([0-9A-Za-z.]+).*", "\1"), field3=json_extract(lookup("test.csv", json_object("field1", field1, "field2", field2), json_array("field3")),"field3")

 


isoutamo
SplunkTrust
One more comment: don't use .../etc/system/local for (almost) anything! Create your own app and store your conf files there. That way everything works much better in the long run.

tah7004
Path Finder

Some updates.  

I was able to get the lookup() function working in test searches. My original lookup didn't work because the file was too big at 1.5 GB, and I had to increase max_memtable_bytes in limits.conf.

Now, for the actual ingest-time lookup, I'm still not able to get it working with a test lookup file I created. I think my initial struggles were because some of the fields used for the lookup were not indexed fields.

I converted those fields to indexed fields using INGEST_EVAL and also increased ingest_max_memtable_bytes as suggested by the doc.
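
For anyone following along, the memory limits mentioned above appear to correspond to max_memtable_bytes and ingest_max_memtable_bytes in the [lookup] stanza of limits.conf. A minimal sketch, assuming the file lives in the same myapp app used elsewhere in this thread; the values are illustrative only (2 GiB, chosen to exceed a 1.5 GB lookup file):

```ini
# $SPLUNK_HOME/etc/apps/myapp/local/limits.conf  (assumed location)
[lookup]
# Size threshold for static lookup files at search time (illustrative value: 2 GiB)
max_memtable_bytes = 2147483648
# Size threshold for lookup files used by lookup() in the ingest context (illustrative value: 2 GiB)
ingest_max_memtable_bytes = 2147483648
```

Raising these limits increases memory use on the instance that loads the lookup, so size them against the actual file rather than copying these numbers.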

Are there specific internal logs to watch that would show why the ingest-time lookup failed?

I'm not having any luck digging through the _internal logs.


jpathak_splunk
Splunk Employee

You should be able to see relevant messages in splunkd.log, which is searchable in the _internal index. As you pointed out, ingest-time lookups depend on the lookup fields being present at index time. Are you sure those fields are index-time fields?


tah7004
Path Finder

Does the lookup have to be in $SPLUNK_HOME/etc/system/lookups?

I tried putting the lookup file and props.conf/transforms.conf on the indexers as an app, but that didn't work for me.

I also tried the lookup() function as an eval in test searches, but that isn't working either. I was following the lookup function guide here:

https://docs.splunk.com/Documentation/Splunk/8.1.1/SearchReference/ConditionalFunctions

 


The_Simko
Path Finder

The ingest-time lookup has to be on whichever server first performs the parsing phase. Normally that will be your indexer, but it could also be a heavy forwarder (or another Splunk Enterprise instance, if that is where the data is being ingested).

The indexer (or other parsing instance) uses its own knowledge objects, so put the lookup, props, and transforms on the server doing the parsing.
