Getting Data In

How to enrich GeoIP at index time?

BeefSupreme
New Member

I am sure this is a pretty common use case, mainly because IP addresses move: the data is not static, so for security retro-hunts, or even just searching a few days of data, the geo data needs to be stored statically in the event and can't be a search-time lookup. Honestly, I can't even think of a use case where you would want geo data to be a search-time lookup, but I'm sure some exist.

Elasticsearch has a couple of options to do this (i.e., ingest nodes or Logstash), so I am sure a million people are doing this in Splunk. If someone could point me at the documentation, I would appreciate it.

The closest thing I could find is ingest-time eval, but I'm not sure how that would do GeoIP enrichment.
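
For reference, this is roughly the shape of an ingest-time eval in transforms.conf (the stanza and field names below are just placeholders, not a GeoIP config):

    # transforms.conf
    [add_event_length]
    INGEST_EVAL = event_length=len(_raw)

    # props.conf
    [my_sourcetype]
    TRANSFORMS-ingest = add_event_length

As far as I can tell it only gives you eval functions, and I don't see anything like iplocation among them.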


VatsalJagani
SplunkTrust

There is an option for lookups at index time (https://docs.splunk.com/Documentation/Splunk/8.2.4/Data/IngestLookups), but the documentation says it supports only CSV lookups.
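
Per that doc page, the ingest-time lookup is driven by the lookup() eval function inside an INGEST_EVAL. A rough sketch of the shape (the lookup file and field names are made up for illustration, and src_ip would have to already exist as an indexed field at parse time):

    # transforms.conf
    [geo_csv_enrich]
    INGEST_EVAL = src_country=json_extract(lookup("ip_geo.csv", json_object("ip", src_ip), json_array("country")), "country")

    # props.conf
    [my_sourcetype]
    TRANSFORMS-geo = geo_csv_enrich

One catch: a plain CSV lookup does exact matching, so mapping arbitrary IPs against ranges or CIDR blocks this way may not work.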


I would suggest giving it a try with the geo lookup directly, and also writing a scripted lookup (a simple Python script that performs the geo lookup) to see if that works. That is the best option you have compared to alternatives like data models, which will take a lot of resources.
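
If you try the scripted route, the classic shape is an external lookup script that reads CSV on stdin and writes CSV on stdout. A minimal sketch, assuming the MaxMind geoip2 Python package and a local GeoLite2 database (the path and field names are assumptions):

    #!/usr/bin/env python
    # External lookup: reads CSV with an "ip" column on stdin,
    # fills in the "country" column, writes CSV back on stdout.
    import csv
    import sys

    import geoip2.database  # assumption: installed via pip

    DB_PATH = "/opt/geoip/GeoLite2-Country.mmdb"  # hypothetical location

    def main():
        reader = csv.DictReader(sys.stdin)
        writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
        writer.writeheader()
        with geoip2.database.Reader(DB_PATH) as db:
            for row in reader:
                try:
                    row["country"] = db.country(row["ip"]).country.iso_code or ""
                except Exception:
                    row["country"] = ""  # private/unresolvable IPs stay blank
                writer.writerow(row)

    main()

And the matching transforms.conf stanza would be something like:

    [geoip_external]
    external_cmd = geoip_lookup.py ip country
    fields_list = ip, country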


If that does not work, then I would suggest enriching in a middle layer between the log source and Splunk.


moliminous
Path Finder

The CSV lookup at index time only works against a CSV file, so it would not help in this case.

The iplocation command in SPL is actually a scripted lookup, but just like your scripted-lookup suggestion, it would still run at search time and would not help in this case.
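
For context, the search-time usage is just this (the index and field names are illustrative):

    index=firewall_logs
    | iplocation src_ip
    | table _time, src_ip, Country, City

It resolves against whatever GeoIP database Splunk has when you run the search, not what was true when the event was indexed.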


moliminous
Path Finder

Splunk only offers GeoIP via the iplocation command at search time.
You could add the data using a third-party product before Splunk ingests it, but as far as Splunk doing it natively, the closest options you have (that I'm aware of) are:

  • Custom Accelerated Data Model
  • Modify Existing Accelerated Data Models to store those fields
  • Create a lookup table - probably using the KV store due to the high number of records (a minimal config sketch follows this list)
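
For the lookup-table option, a minimal KV store definition might look like this (the collection and field names are illustrative):

    # collections.conf
    [geo_cache]
    field.ip = string
    field.country = string

    # transforms.conf
    [geo_cache_lookup]
    external_type = kvstore
    collection = geo_cache
    fields_list = _key, ip, country

You would then populate it on a schedule, for example with | outputlookup geo_cache_lookup at the end of a scheduled search.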

It depends on your intended use cases for it.
For the use cases you mentioned, it sounds like it would be used more for investigations in the original logs, in which case you'd have to tie it to time.

For that purpose I would recommend using an Accelerated Data Model, though it wouldn't contain the raw logs.
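
For example, if you built a custom data model whose attributes include the geo fields, acceleration effectively freezes those values as of summary-build time, and you could query it like this (the data model and field names here are hypothetical):

    | tstats count from datamodel=Geo_Events where nodename=Events by Events.src_ip, Events.src_country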

If you really need it in the raw logs, I would have either Logstash or Cribl enrich the raw events before they are ingested into Splunk, or before they go off to raw log storage, depending on your needs.
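
For the Logstash route, the geoip filter is roughly this (the source field name and the target are per your pipeline, so treat them as placeholders):

    filter {
      geoip {
        source => "src_ip"
        target => "geo"
      }
    }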
