Splunk Search

lookup data permanent?

pbenner
Explorer

I have created a CSV lookup table, successfully loaded it into splunk, and used it in a search command:

sourcetype="access_log-2" | lookup title_lookup isbn_tag OUTPUTNEW title, pub

Now when I go off and run other searches that don't use this lookup, the looked-up fields are no longer available. I have to re-issue the lookup command to have the new fields available in each search.

Do the new fields get added to the events permanently, so that I don't need to issue the lookup command in every search?

I want to enrich the event log with one day's worth of lookup data, then create a new lookup table the next day to enrich that day's events, and so on, with the looked-up data becoming a permanent part of the events so the old lookup tables are no longer needed.

Is this possible?

Another way I might do this is to nightly append the new fields (retrieved from remote DBs) onto the web log events, done in Perl outside of splunk. Since this web data is ingested into splunk in real time, would a search on the web logs see the new fields that were added after ingestion?

2 Solutions

Lowell
Super Champion

There seem to be some misconceptions about how splunk works. Let me see if I can clear any of this up for you:

  1. Once data is indexed by splunk it cannot be modified. So the lookup command doesn't change your indexed data at all; it simply augments your events with additional fields at search time.
  2. Using the "lookup" command only takes effect for your current search. (And technically, only for search commands that appear after the lookup command.) This is how all search commands work.
  3. It sounds like you want to use a date-effective lookup table.

If you want to set up automatic lookups based on your sourcetype, add a LOOKUP entry to your props.conf file. You can also do this via the web user interface: navigate to Manager » Lookups » Automatic lookups.

Your perl-based approach would likely accomplish what you are looking for too; however, once you get more familiar with what splunk can do, you will probably find such methods unnecessary. (I know I've been able to decommission quite a number of ad-hoc scripts since we started using splunk.)


Update: Just a couple of other things to think about. I don't fully understand everything you are trying to do, but splunk is really flexible and gives you a bunch of options, and (based on your question) you seem clever enough to whip up your own solutions to problems. So, if you haven't considered either of these two other options, I'd suggest you take a look and see if either gives you a good starting point:

  1. Scripted inputs -- You could use a perl script to read events from your database and simply write out textual events to stdout, which splunk then indexes. This is a very legitimate way of getting data in. (You don't have to pump data into your existing web access log. In fact, putting it into a different source/sourcetype would be much better from a splunk management perspective and may help with search efficiency.)
  2. Scripted lookups -- It's possible to write a python script that calls your database and pulls back your lookup values on the fly. (Sorry, no perl support for this at this time. You could, of course, have a python script that simply runs a perl script.) The script itself reads and writes CSV content. There are some posts on this site with examples to check out: I want sample code to connect to Oracle database, and lookup a table at search time
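To illustrate the scripted-lookup idea, here is a minimal sketch of the CSV-in/CSV-out contract such a script follows. The field names (isbn_tag, title, pub) come from the search above; the in-memory FAKE_DB dict is a hypothetical stand-in for the real Oracle query, and the demo uses in-memory streams where a real scripted lookup would read sys.stdin and write sys.stdout:

```python
import csv
import io

# Hypothetical stand-in for the real Oracle query; a production script
# would query the database here instead of this dict.
FAKE_DB = {"1111111111": {"title": "Example Book", "pub": "Example Press"}}

def enrich_csv(instream, outstream):
    """Read the lookup-request CSV splunk sends, fill in any empty
    title/pub fields by isbn_tag, and write the completed CSV back."""
    reader = csv.DictReader(instream)
    writer = csv.DictWriter(outstream, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        hit = FAKE_DB.get(row.get("isbn_tag", ""), {})
        for field in ("title", "pub"):
            if not row.get(field):
                row[field] = hit.get(field, "")
        writer.writerow(row)

# Demo round-trip with in-memory streams (a real scripted lookup would
# be wired to stdin/stdout by splunk):
request = io.StringIO("isbn_tag,title,pub\n1111111111,,\n")
response = io.StringIO()
enrich_csv(request, response)
print(response.getvalue())
```

The key point is that splunk hands the script a CSV whose rows have the input field populated and the output fields blank, and expects the same columns back with the blanks filled in.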


the_wolverine
Champion

As mentioned by Lowell, you can configure AUTOMATIC lookups, which will append your custom lookup fields to your data automatically.

The props.conf file would be configured something like the following:

[access_log-2]
LOOKUP-title_lookup = title_lookup isbn_tag AS ISBN OUTPUTNEW title pub

(Or use the UI to complete the task; a restart is not necessary.)

Once props.conf (and the corresponding transforms.conf lookup definition) is properly configured, the lookup will be automatic for your sourcetype=access_log-2.
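For reference, the matching transforms.conf definition would look something like the following (the stanza name must match the name referenced in props.conf; the filename here is an assumption):

```
[title_lookup]
filename = title_lookup.csv
```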

If you refresh your lookup table via an external script, that will keep the table data current. You may want to configure a scheduled search that triggers your external script on a daily basis to take care of this automatically.
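As one sketch of that scheduled trigger in savedsearches.conf (the stanza name, the placeholder search, and the script filename are all assumptions; the script itself must live in splunk's scripts directory):

```
[refresh_title_lookup]
search = | inputlookup title_lookup | stats count
cron_schedule = 0 1 * * *
enable_sched = 1
action.script = 1
action.script.filename = refresh_title_lookup.sh
```

The search here is just a cheap placeholder to satisfy the scheduler; the real work happens in the script the alert action runs.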



bhawkins1
Communicator

"It sounds like you want to use a date-effective lookup table." Incidentally, googling "date-effective lookup table" splunk returns only one result: a link to this answers page. For clarity, it might be worth re-wording this as a lookup table keyed on "Effective Date".


Lowell
Super Champion

In terms of managing external lookup files, you may find this post useful: http://answers.splunk.com/questions/3769/does-outputlookup-append-or-overwrite (but you may have too many events to make this option practical, I dunno).


Lowell
Super Champion

I added some additional thoughts to my post, but they may not be what you need; not sure. I do want to point out that you can update your lookup .csv file anytime you want. You don't have to restart splunk to get the updates. Of course, if you are looking at adding millions of new records to your lookup table each day, then you probably also need to consider an expiration policy, which you would probably want to manage externally as well.
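As one sketch of that external management (all paths here are demo assumptions; a real deployment would point at its actual splunk install and export location), a nightly cron job could swap in the day's file and prune old exports:

```shell
#!/bin/sh
# Demo paths; adjust SPLUNK_HOME and EXPORT_DIR for a real install.
SPLUNK_HOME=/tmp/splunk_demo
EXPORT_DIR=/tmp/exports
LOOKUP_DIR="$SPLUNK_HOME/etc/apps/search/lookups"

mkdir -p "$LOOKUP_DIR" "$EXPORT_DIR"

# Stand-in for the nightly CSV produced from the previous day's events.
TODAY="$EXPORT_DIR/title_lookup_$(date +%Y-%m-%d).csv"
printf 'isbn_tag,title,pub\n1111111111,Example Book,Example Press\n' > "$TODAY"

# Copy to a temp name, then rename over the live file, so searches
# never read a half-written table.
cp "$TODAY" "$LOOKUP_DIR/title_lookup.csv.tmp"
mv "$LOOKUP_DIR/title_lookup.csv.tmp" "$LOOKUP_DIR/title_lookup.csv"

# Simple expiration policy: delete exports older than 7 days.
find "$EXPORT_DIR" -name 'title_lookup_*.csv' -mtime +7 -delete
```

Because the lookup file is re-read at search time, no splunk restart is needed after the rename.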


pbenner
Explorer

Thanks. Since I will have a new 5-million-event CSV file every day, generated from the previous day's web events, I guess loading that into splunk, adding it to the config file, and restarting splunk is not an efficient way to do this. Real-time lookups are not practical either; it would take forever to retrieve all the data from Oracle.

Is there a way to manage the addition of a lookup file daily without manual intervention?
