Is it possible to preserve sourcetype, host, and source when using the collect command?

Path Finder

We have an index with access logs from multiple hosts and systems with different sourcetypes.
When I try to add information from a dynamic lookup to events and save them in a summary index with the collect command, I can't keep the original source, sourcetype, and host, because the collect command's arguments take literal text values, not field values.

For example, search:

 index=access sourcetype=*_type_access
 | lookup xxx AS yyy
 | collect index=enriched_access sourcetype=sourcetype

saves the results with the sourcetype set to the literal string "sourcetype" rather than the original sourcetype.
When I rename the sourcetype field first, the result is the same.

Where am I going wrong?

Path Finder

Copy the sourcetype value into another field, such as "origSourceType", and write that field to the summary index. You can then search the summary index by the origSourceType field.
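
One way this suggestion might look, as an untested sketch. Because collect saves the _raw field as-is when events have one, the original values are appended to _raw as key=value pairs here rather than only set as search-time fields. origHost and origSource are illustrative names that do not appear in this thread, and the sketch assumes that automatic key=value extraction applies to the summary sourcetype.

```spl
index=access sourcetype=*_type_access
| lookup xxx AS yyy
| eval _raw = _raw . " origSourceType=\"" . sourcetype . "\" origHost=\"" . host . "\" origSource=\"" . source . "\""
| collect index=enriched_access
```

The summary events could then be filtered with something like index=enriched_access origSourceType=*_type_access.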

0 Karma

SplunkTrust

Totally different approach: Keep the lookup data in the lookup, enrich at search time, skip indexing things twice through collect?

What you're doing feels quite wrong, considering collect would index _raw while the lookup is just adding fields - have you checked that those lookup output fields are actually retained in the second index?

That being said, https://answers.splunk.com/answers/88926/modify-raw-collect-into-second-index-how-to-best-retain-hos...

0 Karma

Motivator

Since there are perhaps several sourcetypes, I would try the map command:

| metasearch index=access sourcetype=*_type_access | stats count by sourcetype | map [ search index=access sourcetype=$sourcetype$ | lookup xxx AS yyy | collect index=enriched_access sourcetype=$sourcetype$ ]

At least that works in theory; I haven't tested it, though it should work. I used the metasearch command for speed, and the stats command is just there to get the unique list of sourcetypes. Tstats might be a hair faster still, but I'm not spun up on that one /shrug. There are folks who are kinda anti-map, but it is a tool in the tool chest. For each result row from the initial search, map passes the sourcetype as a token to the inner search.
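
The tstats variant mentioned above might look roughly like this. It is an untested sketch: map runs at most 10 searches by default, so maxsearches is raised here, and the value 50 is an arbitrary placeholder to be sized to the number of distinct sourcetypes.

```spl
| tstats count where index=access sourcetype=*_type_access by sourcetype
| map maxsearches=50 search="search index=access sourcetype=$sourcetype$ | lookup xxx AS yyy | collect index=enriched_access sourcetype=$sourcetype$"
```

Each row of the outer result produces one inner search, so the cost grows with the number of sourcetypes.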

0 Karma

Path Finder

I tried this out with "host=$host$" in my collect statement, and no dice.
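
One possible reason, offered as a guess rather than a diagnosis: map can only substitute $host$ if the outer search outputs a host field, and the stats in the example above groups by sourcetype only. An untested sketch that carries both fields through, with maxsearches=100 as an arbitrary placeholder:

```spl
| metasearch index=access sourcetype=*_type_access
| stats count by host sourcetype
| map maxsearches=100 search="search index=access host=$host$ sourcetype=$sourcetype$ | lookup xxx AS yyy | collect index=enriched_access host=$host$ sourcetype=$sourcetype$"
```

This runs one inner search per host and sourcetype pair, which may be slow in large environments.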

Any other ideas?
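
Another option that may be worth checking, assuming a Splunk version whose collect command supports the output_format argument: with output_format=hec, collect writes HEC-style JSON, and the documentation describes that format as letting the original source, sourcetype, and host be used directly in the summary index. An untested sketch:

```spl
index=access sourcetype=*_type_access
| lookup xxx AS yyy
| collect index=enriched_access output_format=hec
```

If the argument is not available in your version, this will not apply.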

0 Karma