I've got a situation that I thought I understood but clearly don't. I have logs that look like this:
2021-11-22 14:00:00 Event=InventoryComplete ComputerName=Server1 ComputerName=Server2 ComputerName=ServerN
I expected ComputerName to become a multivalue field automatically, since the event contains several copies of that Key=Value pair, and I expected to be able to search on any of its values. I also thought I had seen this work automatically before, but it isn't working now.
| search sourcetype=inventory_audit ComputerName=Server1 ```works```
| search sourcetype=inventory_audit ComputerName=Server2 ```no results```
| search sourcetype=inventory_audit "ComputerName=Server2" ``` forcing text search works```
Is there a way to make these fields multivalue implicitly? Ideally this would apply to the entire sourcetype regardless of field name, since this sourcetype covers a wide variety of audit logs with different object classes.
As you've discovered, Splunk does not create multivalue fields by default. Once a field has a value, later occurrences of the same key are neither appended to it nor allowed to replace it unless you specifically request that.
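To request it, define a transform in transforms.conf like this:
[mytransform]
REGEX = (\S+)=(\S+)
FORMAT = $1::$2
MV_ADD = true
Then invoke the transform from the sourcetype's stanza in props.conf with a REPORT-<class> = mytransform setting (a search-time extraction). MV_ADD is the setting that makes a search-time extraction append repeated values to an existing field; REPEAT_MATCH applies only to index-time extractions.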
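To make the difference concrete, here is a minimal Python sketch (not Splunk code) that applies the same REGEX to the sample event, once keeping only the first value per key and once appending every value. The mv_add flag loosely mirrors Splunk's MV_ADD transforms.conf setting; the function names extract and matches are hypothetical, and matches only approximates how a ComputerName=Server2 search term compares against field values.

```python
import re

# Sample event from the question.
EVENT = ("2021-11-22 14:00:00 Event=InventoryComplete "
         "ComputerName=Server1 ComputerName=Server2 ComputerName=ServerN")

# Same pattern as the REGEX in the transform above.
KV_PATTERN = re.compile(r"(\S+)=(\S+)")


def extract(event: str, mv_add: bool) -> dict[str, list[str]]:
    """Apply KV_PATTERN to every key=value pair in the event.

    mv_add=False: the first value for a key wins, later ones are discarded.
    mv_add=True: every value is appended, so repeated keys become multivalue.
    """
    fields: dict[str, list[str]] = {}
    for key, value in KV_PATTERN.findall(event):
        if key not in fields:
            fields[key] = [value]
        elif mv_add:
            fields[key].append(value)
    return fields


def matches(fields: dict[str, list[str]], key: str, value: str) -> bool:
    """Rough analogue of a key=value search term: true if any value matches."""
    return value in fields.get(key, [])


if __name__ == "__main__":
    default = extract(EVENT, mv_add=False)
    multivalue = extract(EVENT, mv_add=True)
    print(default["ComputerName"])     # ['Server1']
    print(multivalue["ComputerName"])  # ['Server1', 'Server2', 'ServerN']
    print(matches(default, "ComputerName", "Server2"))     # False
    print(matches(multivalue, "ComputerName", "Server2"))  # True
```

The first-value-wins branch reproduces the symptom in the question: Server1 matches, Server2 does not.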
This may not solve your problem completely, however, since not every SPL command treats multivalue fields the way you might expect. You may have to modify your queries, for example with mvexpand or the mv* eval functions.
Thanks! Thinking about it further, the cases where this seemed to work must have been searches that came after something like:
| rex max_match=0 "ComputerName=(?<ComputerName>.+?)\b"
What is the $s in the FORMAT example you gave? Was that meant to be $2?
I should also have mentioned that I control the scripts that write these logs, so I could write the computer names in a denser format. I could also just generate more events (one per computer name instead of packing them all into one event), but I'm not sure which is worse: requiring the regex in the transform or generating more log data.
This also gave me an idea: perhaps I could flag the fields I want parsed as multivalue by giving them a suffix, like ComputerName[]=Server1 ComputerName[]=Server2, unless there's a smarter way.
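The suffix idea can be sketched in Python to show the intended behavior: keys written with [] collect every value, while plain keys keep only their first value. This is a hypothetical convention from the post, not a Splunk feature, and the parse function is an illustrative name; it assumes a given key is either always flagged or never flagged.

```python
import re

# Hypothetical convention: Name[]=value marks a multivalue key,
# plain Name=value keeps first-value-wins behavior.
# Groups: key (lazy, so it stops before "[]" or "="), optional "[]" flag, value.
KV_PATTERN = re.compile(r"(\S+?)(\[\])?=(\S+)")


def parse(event: str) -> dict:
    fields: dict = {}
    for key, suffix, value in KV_PATTERN.findall(event):
        if suffix:
            # Flagged key: collect every value into a list.
            fields.setdefault(key, []).append(value)
        else:
            # Unflagged key: keep only the first value seen.
            fields.setdefault(key, value)
    return fields


if __name__ == "__main__":
    line = ("2021-11-22 14:00:00 Event=InventoryComplete "
            "ComputerName[]=Server1 ComputerName[]=Server2 Status=ok Status=late")
    print(parse(line))
```

On the Splunk side, a transform REGEX such as (\S+)\[\]=(\S+) with FORMAT = $1::$2 would, in principle, pick up only the flagged keys; treat that as an untested assumption.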
Yes, "$s" should have been "$2". Thanks for catching that. I've updated my answer.
If you control how the events are generated, I suggest generating them in whatever way best fits how you plan to use the data (without painting yourself into a corner). I prefer to have one event represent one thing that happened in one place; if the thing happens in many places, then many events are generated.
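A minimal sketch of the one-event-per-computer approach, assuming the logging script is (or could be) written in Python; inventory_events is a hypothetical name. Each line carries a single ComputerName, so the default key=value extraction is enough and no multivalue handling is needed.

```python
from datetime import datetime


def inventory_events(timestamp: datetime, computers: list[str]) -> list[str]:
    """Emit one log line per computer instead of one line listing them all."""
    stamp = timestamp.strftime("%Y-%m-%d %H:%M:%S")
    return [f"{stamp} Event=InventoryComplete ComputerName={name}"
            for name in computers]


if __name__ == "__main__":
    for line in inventory_events(datetime(2021, 11, 22, 14, 0, 0),
                                 ["Server1", "Server2", "ServerN"]):
        print(line)
```

The cost is more lines of log data; the benefit is that every search and stats command sees ordinary single-value fields.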
Another option is to log the events in JSON format, which represents multivalue fields naturally as arrays.
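A minimal sketch of the JSON option, again assuming a Python logging script; the function name and the "timestamp" key are hypothetical. The computer names become a JSON array, so the multiple values are explicit in the data rather than implied by repeated key=value pairs.

```python
import json


def inventory_event_json(timestamp: str, computers: list[str]) -> str:
    """Serialize one inventory result as a single JSON event."""
    return json.dumps({
        "timestamp": timestamp,  # hypothetical field name
        "Event": "InventoryComplete",
        "ComputerName": computers,  # written as a JSON array
    })


if __name__ == "__main__":
    print(inventory_event_json("2021-11-22 14:00:00",
                               ["Server1", "Server2", "ServerN"]))
```

When Splunk extracts fields from JSON (for example with KV_MODE = json or spath), array elements typically appear as a multivalue field named ComputerName{}, so searches would reference that name.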