Splunk Search

extract key-value pairs that match or contain value, where keys vary across indices

alancalvitti
Path Finder

What's a scalable to extract key-value pairs where the value matches via exact or substring match but the field is not known ahead of time, and could be in _raw only?

Eg, search for the string "alan", which may be associated to fields as follows:

index=indexA  user=alan

index=indexB username=alan

index=indexC loginId=corp\alan

index=indexD   _raw=<unstructured text, alan : user>

 

Labels (2)
0 Karma

maciep
Champion

Ah ok.  I'm not sure how expensive it is, but i'm wondering if fieldsummary might be helpful here?  Something like this maybe?  I'm sure the match would need to be a bit more complicated....

index IN (indexA,indexB,indexC) alan
| fieldsummary
| where match(values,"(?i)alan")

 

 

 

0 Karma

alancalvitti
Path Finder

Thanks, that is close to what is needed, but the fields for the various indexes seem mixed together.

Is it possible to add a column to show the index (and other fields eg sourcetype) next to the field?

0 Karma

maciep
Champion

maybe someone else out here will have a better answer, but i can see maybe getting the index by using the map command....but if you don't know the sources/sourcetypes/etc, then that could get trickier.  But here is maybe the next step for your path.

This will create an event for each index and then use the map function to the same search over each, adding a field for index to the results.  I'm sure there is a way with tstats or eventcount to build the indexes/sourcetype you want and just expanding this.  Although, not sure how expensive this becomes either.

 

| stats count
| eval index=split("indexA,indexB,indexC",",")
| mvexpand index
| map search="search index=$index$ | fieldsummary | eval index=\"$index$\""

 

0 Karma

alancalvitti
Path Finder

I'm not getting the same results w/ this version of the query.   Although the index does show up, now the counts are way higher and many or most values seem not to match the user named before "| stats count"

0 Karma

maciep
Champion

well i did forget to the put the "alan" keyword back in the search - did you put it back in there?  If not, try adding it in the search in the map function.  And also i guess the where clause as well.  all of that needs to back into that search.....

0 Karma

alancalvitti
Path Finder

i did add the user value at the beginning, but can't get it to work when reinserting the where clause. I tried putting it after the fieldsummary and after the last eval, - getting no results.

0 Karma

maciep
Champion

ok...so I guess you didn't get that working then?  Here is an example of how maybe you could use tstats to get the index, source and sourcetypes.....just not sure if it will be too much for the map function.  But also including the where clause that you need - same as before, just have to escape the quotes.  But I'm still not sure what you tried that didn't work.....

Not sure how many combinations of index/source/sourcetype you have, so you may need to bump up the maxsearches as well.  And if it is too much, I think you should know how to go back just using the index.

| tstats count where index IN (indexA,indexB,indexC) by index source, sourcetype
| map maxsearches=100 search="search index=$index$ source=$source$ sourcetype=$sourcetype$ alan | fieldsummary | where match(values,\"(?i)alan\") | eval index=\"$index$\",source=\"$source$\",sourcetype=\"$sourcetype$\""

 

0 Karma

alancalvitti
Path Finder

Thanks, - this tstats query seems to return results that contain the desired fields and values, but it also includes "false positive" values that don't include the query account. 

For example, (and I'm having to copy/paste made up values here...) one statistic might say:

field count distinct_count index is_exact max mean min numeric_count sourcetype stdev values

event.Account_Name118051ad_index1   0ADindex [{"value":"alan","count":8874},{"value":"bob","count":2931},]

 

 There are typically other values (eg bob) other than the input query argument. For example, for the field avent.Account_Name , the "count" of 11805 is the sum of the individual "value" counts (alan 8874, bob 2931) in the "values" column.  How to interpret these results? Is it that alan matches 2931 events in which it shows up in a different field than event.Account_Name?

Also, what if the match is on the _raw event but not extracted in a kv field - would _raw be among the fields returned?

 

 

0 Karma

maciep
Champion

friday firedrills over here.....can you just toss the where after the map command?  or it might helpful if you can share what you tried that failed. 

0 Karma

maciep
Champion

Obviously, you could just add the keyword "alan" to your search to get the results you may want....but those may contain results you're not interested in.

i don't know if you would consider it scalable, but typically the approach i think is to normalize the data as you onboard it.  That could be with the Common Information Model app or just on your own.  

For example, in the scenario above, for each of the sourcetypes, create a new field called username - either alias existing fields to username or extract/eval/etc username in the case that it's just in the raw data.  Once you have a username field across all sourcetypes, then just add username="alan" to your search.

If you find yourself onboarding a new sourcetype that contains a user that you want to search in a similar way, ensure a username field exists or is created.

0 Karma

alancalvitti
Path Finder

My fault for not being specific enough.:

First, We cannot normalize the data before onboarding as it reflects hundreds of sources including 3rd party vendors and other constraints. 

Second, I'm only looking to extract the  fields (keys) that match, no other fields (unless added optionally) - ideally avoiding rex as the index volumes are large and this query needs to be repeated 10k-100k times.

 

0 Karma
Get Updates on the Splunk Community!

Splunk Enterprise Security 8.0.2 Availability: On cloud and On-premise!

A few months ago, we released Splunk Enterprise Security 8.0 for our cloud customers. Today, we are excited to ...

Logs to Metrics

Logs and Metrics Logs are generally unstructured text or structured events emitted by applications and written ...

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...