Manipulating data before indexing

omerl
Path Finder

I have multiple forwarders (heavy and universal) and I want to manipulate the data they send to my indexers.
For each event I want to add a field whose value is based on the event content and other information.

I can add this field at search time, but I would prefer to do it before the event is indexed, to make searches easier.

Is it possible to do so?

nick405060
Motivator

I am also interested in doing this. My data comes in as multivalue (mv) data with each value duplicated. In all my dashboards and searches, I mvexpand and dedup each time to fix the problem, but that is rather messy and inelegant, especially with 20+ columns. (Technically, 20 mvexpands bring the total number of rows to num_rows*(2^20) before you dedup, so you have to dedup after every 5 or so mvexpands to avoid getting bogged down.)

If I could just clean the data as it comes in, that would make my life so much easier.

<<<<< Additional tags: Duo app, Duo add-on >>>>>

0 Karma

gcusello
SplunkTrust

Hi omerl,
I see three ways to do this:

  • a calculated field if the new field is a manipulation of a field already present in an event;
  • a lookup to enrich an event with data from an internal or external source; this applies when the data is static or changes rarely;
  • a join or (better) a stats across both sources.

I think the second and third are probably the solutions to your problem.

If your data isn't very dynamic, you can pull the information from the other place (the DB) by scheduling a search with an outputlookup command that stores it in a lookup. You can then use it in a manual or automatic lookup (an automatic lookup behaves as if the additional fields were present in every event that matches the condition).
I don't like automatic lookups because there's a risk of losing control when debugging searches, but that is a matter of personal style in writing searches!

If instead you have dynamic data, you can use the join command (I don't like it because it's slow) or, better, a stats across both sources. For example, to get the count of events in index A together with the name of the user from index B, you can write something like this:

index=indexA OR index=indexB
| stats values(user) AS user count(eval(index="indexA")) AS Events BY host

(pay attention to the quotes in the eval)
If you have data in a DB, you can schedule the extraction from the DB using the DB Connect app.
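
As a rough sketch of the lookup approach described above, assuming the DB Connect app is installed: the connection name my_connection, the table sessions, its columns host and user, and the lookup file host_user.csv are all hypothetical names chosen for illustration. A scheduled search could refresh the lookup from the DB:

```spl
| dbxquery connection="my_connection" query="SELECT host, user FROM sessions"
| table host user
| outputlookup host_user.csv
```

A search could then enrich the events manually at search time:

```spl
index=indexA
| lookup host_user.csv host OUTPUT user
```

This keeps the enrichment explicit in each search, which is the trade-off against an automatic lookup mentioned above.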

Bye.
Giuseppe

0 Karma

omerl
Path Finder

Thanks, I know it's possible at search time, but I want the searches to be easier and faster, by indexing the data with this field already in place.

gcusello
SplunkTrust

I don't think so, but you could use a summary index, creating a new index that contains the data the way you want it, or use a Data Model.
Bye.
Giuseppe
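
A minimal sketch of the summary index idea, assuming a scheduled search and an existing summary index: the index name summary_host_user and the lookup file host_user.csv are hypothetical, and the lookup is assumed to map host to user. The scheduled search enriches the events and writes the result to the summary index with collect:

```spl
index=indexA
| lookup host_user.csv host OUTPUT user
| collect index=summary_host_user
```

Later searches could then run against index=summary_host_user and find the user field already attached. Note that the enrichment still happens at search time inside the scheduled search, so the original events in indexA are unchanged.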

0 Karma

horsefez
Motivator

Hi omerl,

It doesn't look good when it comes to index time.
Look here: https://answers.splunk.com/answers/442921/is-there-any-way-to-do-calculated-fields-before-se.html

0 Karma

Jeremiah
Motivator

Can you provide an example? It depends on the field value you're trying to add. You can extract an indexed field from an event with a regex. But you can't perform an index time lookup of that field and attach another field like you can at search time.
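
For the regex case mentioned above, an index-time field extraction could look roughly like the following configuration. This is only a sketch: the sourcetype my_sourcetype, the stanza name add_user_field, the field name user, and the regex are hypothetical and would have to match the real events.

```ini
# transforms.conf (hypothetical stanza name and regex)
[add_user_field]
REGEX = user=(\w+)
FORMAT = user::$1
WRITE_META = true

# props.conf (hypothetical sourcetype)
[my_sourcetype]
TRANSFORMS-add_user = add_user_field

# fields.conf (marks the field as indexed for search)
[user]
INDEXED = true
```

Index-time transforms like this run where the data is parsed, which is normally a heavy forwarder or an indexer rather than a universal forwarder. The value can only come from the event itself, so this does not cover attaching a value that lives in an external DB.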

0 Karma

omerl
Path Finder

Let's say I have an event from host A.
In another place (a DB), I have the information about which user was connected to host A.
I want to add to this event a field containing the user who was connected to the host named in the event.

0 Karma