I want to write transactions, each with the full list of pages accessed, into a summary index like this:
... | transaction ip maxpause=15m mvlist=page | fields _time, ip, page | fields - _raw | collect index=my_summary
However, in the resulting summary index the 'page' field is flattened into a single value; it is no longer multivalue.
Is this documented behavior? Can I force my summary index to keep fields in multivalue format, or do I need to run makemv every time I search my summary index?
One working solution is to add this to ./etc/system/local/fields.conf:
[ips]
TOKENIZER = ([^\|]+)
[uas]
TOKENIZER = ([^\|]+)
[usernames]
TOKENIZER = ([^\|]+)
This does the same job as:
... | makemv delim="|" ips | makemv delim="|" uas | makemv delim="|" usernames ...
(It doesn't seem to require MV_ADD in transforms.)
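To see what the TOKENIZER regex does to a flattened value, here is a rough Python sketch (not Splunk code): each match of ([^\|]+), that is, each run of characters other than "|", becomes one value. The sample IP string is hypothetical, and Splunk's own handling may differ in edge cases.

```python
import re

# The TOKENIZER regex from the fields.conf stanzas above. Each run of
# characters other than "|" is one match, i.e. one value.
TOKENIZER = re.compile(r"([^\|]+)")


def tokenize(flattened: str) -> list[str]:
    """Approximate how a TOKENIZER regex splits a flattened field value."""
    # findall returns the capture group of every match; empty segments
    # (e.g. "a||b") simply produce no match.
    return TOKENIZER.findall(flattened)


print(tokenize("10.0.0.1|10.0.0.2|10.0.0.3"))
# ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```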
Nitpicking point: if you're searching for johnsmith as a username value, searching for username=*johnsmith* will lead to tears if you also have a johnsmithy...
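A quick Python illustration of the pitfall, assuming a flattened, "|"-delimited usernames value (the sample names are hypothetical): a substring test behaves like *johnsmith*, while splitting first matches only whole values.

```python
def wildcard_match(flattened: str, name: str) -> bool:
    """Like usernames=*name*: a substring test on the whole flattened string."""
    return name in flattened


def exact_value_match(flattened: str, name: str, delim: str = "|") -> bool:
    """Match only if name is one complete value of the flattened field."""
    return name in flattened.split(delim)


usernames = "alice|johnsmithy|bob"  # hypothetical flattened value
print(wildcard_match(usernames, "johnsmith"))     # True (false positive)
print(exact_value_match(usernames, "johnsmith"))  # False
```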
Usually, field=value in a search is treated as "value is one of the values of field" when field is a multivalue field, so make sure it really doesn't work for you. Sample search:
| stats count | eval foo = "a b c" | makemv foo | search foo="b"
That'll keep the one row, because foo contains a value of b.
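The same idea as a rough Python sketch (not how Splunk implements it): makemv with its default single-space delimiter turns "a b c" into three values, and an equality search keeps the row if any one value matches. It ignores Splunk details such as wildcards and case handling.

```python
def makemv(value: str, delim: str = " ") -> list[str]:
    """Rough stand-in for `makemv foo`; makemv's default delimiter is a single space."""
    return value.split(delim)


def search_eq(values: list[str], target: str) -> bool:
    """Rough stand-in for `search foo="b"` on a multivalue field:
    the row is kept if any one of its values equals the target."""
    return any(v == target for v in values)


foo = makemv("a b c")
print(foo)                  # ['a', 'b', 'c']
print(search_eq(foo, "b"))  # True: the row is kept
print(search_eq(foo, "d"))  # False: the row would be dropped
```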
Worth noting: there is actually a benefit to keeping multivalues flattened and separated by some character.
Flattened values (say, 'usernames') are searchable with a query like index=logs usernames=*johnsmith* | ..., whereas multivalues are not.
So in the case above, if I need to find only events where one of the usernames is (or contains) 'johnsmith', that works nicely and reduces the number of events before the first pipe.
If usernames were stored in multivalue format, we'd need slower logic: either flatten usernames first or use functions like mvfilter across every event.
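The trade-off above can be sketched in Python as a two-stage filter: a cheap substring test on the flattened string (the analogue of usernames=*johnsmith* before the first pipe), then an exact check on the split values for the few survivors, which also drops lookalikes such as johnsmithy. The event records and the "|" delimiter are hypothetical.

```python
def find_events(events: list[dict], name: str, delim: str = "|") -> list[dict]:
    # Stage 1: cheap substring test on the flattened string, the analogue of
    # usernames=*johnsmith* before the first pipe. It may over-match.
    candidates = [e for e in events if name in e["usernames"]]
    # Stage 2: split only the surviving events and keep exact value matches,
    # which drops false positives such as "johnsmithy".
    return [e for e in candidates if name in e["usernames"].split(delim)]


# Hypothetical summary events with a "|"-delimited usernames field.
sample_events = [
    {"id": 1, "usernames": "alice|johnsmith"},
    {"id": 2, "usernames": "johnsmithy|bob"},
    {"id": 3, "usernames": "carol"},
]
print([e["id"] for e in find_events(sample_events, "johnsmith")])  # [1]
```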
Try with the table command:
... | table _time ip page | fields - _raw | collect index=my_summary
Same result: the multivalue fields are flattened into a single string.
One way to tackle this could be to un-mv your field before collecting, adding a delimiter between the values. Using that delimiter, you could then set up field extractions with MV_ADD to avoid doing the mv dance in the search itself.
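A minimal Python sketch of that round trip, assuming "|" as the delimiter: join the values before collecting (roughly what SPL's mvjoin(page, "|") eval function produces), then split them back at search time (the job makemv delim="|" or a delimiter-based extraction would do). The guard matters because a value that itself contains the delimiter would come back split into pieces. Function names and sample pages are hypothetical.

```python
DELIM = "|"  # hypothetical delimiter; pick one that never occurs inside a value


def flatten(values: list[str], delim: str = DELIM) -> str:
    """Join a multivalue field into one string before collecting."""
    for v in values:
        if delim in v:
            raise ValueError(f"value {v!r} contains the delimiter {delim!r}")
    return delim.join(values)


def restore(flattened: str, delim: str = DELIM) -> list[str]:
    """Split the stored string back into values at search time.
    An empty string is treated as 'no values'."""
    return flattened.split(delim) if flattened else []


pages = ["/home", "/search?q=a", "/cart"]
stored = flatten(pages)
print(stored)                    # /home|/search?q=a|/cart
print(restore(stored) == pages)  # True
```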
Well, the idea is to do the complicated stuff once (when collecting) and the simple stuff many times (when searching).
Right now I've ended up with this:
index=my_summary | makemv delim="|" ips | makemv delim="|" uas | makemv delim="|" usernames ...
to recreate the multivalue fields properly. This doesn't seem to impact performance much, so for now I'm happy with it.
Thanks for the tip, though; it's good to be aware of alternatives.
Hi Martin,
with the MV_ADD approach, what else do I need to do to make it happen automagically?
I've looked into the DELIM param, but I'm still not sure whether it applies to my case or whether I need to customize any other params.
Hi, try using mvlist=t:
mvlist=<bool>
Description: Flag controlling whether the multivalued fields of the transaction are (mvlist=t) a list of the original events ordered in arrival order or (mvlist=f) a set of unique field values ordered lexicographically. If a comma/space delimited list of fields is provided, only those fields are rendered as lists. Defaults to f.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction
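As a rough Python illustration of the documented mvlist behavior (a sketch, not the transaction command itself): with mvlist=t the values stay in arrival order with duplicates, with mvlist=f they are deduplicated and sorted. Python's sorted() ordering by code point stands in for "lexicographic" here, and the sample events are hypothetical.

```python
def mv_values(events: list[dict], field: str, mvlist: bool) -> list[str]:
    """Shape a multivalue field the way the quoted mvlist docs describe."""
    values = [e[field] for e in events if field in e]  # events in arrival order
    if mvlist:
        # mvlist=t: a list in arrival order, duplicates kept
        return values
    # mvlist=f (the default): unique values, sorted
    return sorted(set(values))


events = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
print(mv_values(events, "page", mvlist=True))   # ['/home', '/cart', '/home']
print(mv_values(events, "page", mvlist=False))  # ['/cart', '/home']
```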
Thanks for your effort, but my question is about a multivalue field losing its format when it is transferred into the summary index, not about the way transaction creates these fields.