What's a scalable to extract key-value pairs where the value matches via exact or substring match but the field is not known ahead of time, and could be in _raw only?
Eg, search for the string "alan", which may be associated to fields as follows:
index=indexA user=alan
index=indexB username=alan
index=indexC loginId=corp\alan
index=indexD _raw=<unstructured text, alan : user>
Ah ok. I'm not sure how expensive it is, but i'm wondering if fieldsummary might be helpful here? Something like this maybe? I'm sure the match would need to be a bit more complicated....
index IN (indexA,indexB,indexC) alan
| fieldsummary
| where match(values,"(?i)alan")
Thanks, that is close to what is needed, but the fields for the various indexes seem mixed together.
Is it possible to add a column to show the index (and other fields eg sourcetype) next to the field?
maybe someone else out here will have a better answer, but i can see maybe getting the index by using the map command....but if you don't know the sources/sourcetypes/etc, then that could get trickier. But here is maybe the next step for your path.
This will create an event for each index and then use the map function to the same search over each, adding a field for index to the results. I'm sure there is a way with tstats or eventcount to build the indexes/sourcetype you want and just expanding this. Although, not sure how expensive this becomes either.
| stats count
| eval index=split("indexA,indexB,indexC",",")
| mvexpand index
| map search="search index=$index$ | fieldsummary | eval index=\"$index$\""
I'm not getting the same results w/ this version of the query. Although the index does show up, now the counts are way higher and many or most values seem not to match the user named before "| stats count"
well i did forget to the put the "alan" keyword back in the search - did you put it back in there? If not, try adding it in the search in the map function. And also i guess the where clause as well. all of that needs to back into that search.....
i did add the user value at the beginning, but can't get it to work when reinserting the where clause. I tried putting it after the fieldsummary and after the last eval, - getting no results.
ok...so I guess you didn't get that working then? Here is an example of how maybe you could use tstats to get the index, source and sourcetypes.....just not sure if it will be too much for the map function. But also including the where clause that you need - same as before, just have to escape the quotes. But I'm still not sure what you tried that didn't work.....
Not sure how many combinations of index/source/sourcetype you have, so you may need to bump up the maxsearches as well. And if it is too much, I think you should know how to go back just using the index.
| tstats count where index IN (indexA,indexB,indexC) by index source, sourcetype
| map maxsearches=100 search="search index=$index$ source=$source$ sourcetype=$sourcetype$ alan | fieldsummary | where match(values,\"(?i)alan\") | eval index=\"$index$\",source=\"$source$\",sourcetype=\"$sourcetype$\""
Thanks, - this tstats query seems to return results that contain the desired fields and values, but it also includes "false positive" values that don't include the query account.
For example, (and I'm having to copy/paste made up values here...) one statistic might say:
field count distinct_count index is_exact max mean min numeric_count sourcetype stdev values
event.Account_Name | 11805 | 1 | ad_index | 1 | 0 | ADindex | [{"value":"alan","count":8874},{"value":"bob","count":2931},] |
There are typically other values (eg bob) other than the input query argument. For example, for the field avent.Account_Name , the "count" of 11805 is the sum of the individual "value" counts (alan 8874, bob 2931) in the "values" column. How to interpret these results? Is it that alan matches 2931 events in which it shows up in a different field than event.Account_Name?
Also, what if the match is on the _raw event but not extracted in a kv field - would _raw be among the fields returned?
friday firedrills over here.....can you just toss the where after the map command? or it might helpful if you can share what you tried that failed.
Obviously, you could just add the keyword "alan" to your search to get the results you may want....but those may contain results you're not interested in.
i don't know if you would consider it scalable, but typically the approach i think is to normalize the data as you onboard it. That could be with the Common Information Model app or just on your own.
For example, in the scenario above, for each of the sourcetypes, create a new field called username - either alias existing fields to username or extract/eval/etc username in the case that it's just in the raw data. Once you have a username field across all sourcetypes, then just add username="alan" to your search.
If you find yourself onboarding a new sourcetype that contains a user that you want to search in a similar way, ensure a username field exists or is created.
My fault for not being specific enough.:
First, We cannot normalize the data before onboarding as it reflects hundreds of sources including 3rd party vendors and other constraints.
Second, I'm only looking to extract the fields (keys) that match, no other fields (unless added optionally) - ideally avoiding rex as the index volumes are large and this query needs to be repeated 10k-100k times.