
How do I prevent the collect command from flattening multivalue fields when writing into a summary index?

Communicator

I want to write transactions, with the full list of pages accessed, into a summary index like this:

... | transaction ip maxpause=15m mvlist=page | fields _time, ip, page | fields - _raw | collect index=my_summary

But the resulting summary index contains the 'page' field in flattened format, no longer multivalue.

Is this documented behavior? Can I force my summary index to keep fields in multivalue format, or do I need to do makemv every time I want to search my summary index?

1 Solution

Communicator

One working solution is to add this to ./etc/system/local/fields.conf:

[ips]
TOKENIZER = ([^\|]+)

[uas]
TOKENIZER = ([^\|]+)

[usernames]
TOKENIZER = ([^\|]+)

This does the same job as: ... | makemv delim="|" ips | makemv delim="|" uas | makemv delim="|" usernames ...
(It doesn't seem to require MV_ADD in transforms.)
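
One way to check whether the stanzas are being picked up, as a rough sketch: run a stats split on one of the fields against the summary index. The index name my_summary is taken from the question, and I'm assuming the summary events carry the "|"-delimited ips field shown above.

```
index=my_summary
| stats count by ips
```

If the tokenizer is applied, I'd expect one row per individual IP rather than one row per whole "|"-joined string.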


SplunkTrust

Nitpicking point: If you're searching for johnsmith as a username value, searching for username=*johnsmith* will lead to tears if you have a johnsmithy...

Usually, doing field=value in a search will be translated to "value is in field" if field is a multivalue field, so make sure it really doesn't work for you. Sample search:

| stats count | eval foo = "a b c" | makemv foo | search foo="b"

That'll keep the one row, because foo contains a value of b.
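
The same kind of throwaway search can illustrate the johnsmith/johnsmithy point. This is only a sketch: the usernames values are made up for illustration, and it reuses the | stats count trick from the sample above to get a single row.

```
| stats count
| eval usernames="johnsmith|johnsmithy|alice"
| makemv delim="|" usernames
| eval exact=mvfilter(match(usernames, "^johnsmith$"))
| eval loose=mvfilter(match(usernames, "johnsmith"))
```

Here exact should keep only johnsmith, while loose should also keep johnsmithy, which is the false positive a *johnsmith* wildcard would give you.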

0 Karma

Communicator

Note: there is actually a benefit to keeping multivalues flattened and separated by some character.
A flattened field (say 'usernames') is searchable with a query like index=logs usernames=*johnsmith* | ..., whereas a multivalue field is not.

So in the above case, if I need to find only events where one of the usernames is (or contains) 'johnsmith', that works nicely and reduces the number of events before the pipe.

If usernames were stored in multivalue format, we'd need slower logic: either flatten usernames first or use functions like mvfilter to search everything.

0 Karma

Motivator

Try the table command:

... | table _time ip page | fields - _raw | collect index=my_summary
0 Karma

Communicator

Same result. Multivalues flattened to single string.

0 Karma

SplunkTrust

One way to tackle this could be to un-mv your field before collecting, adding a delimiter between the values. Using that delimiter you could then set up field extractions with MV_ADD to avoid doing the mv dance in the search itself.
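
A minimal sketch of that idea, assuming "|" never occurs inside a page value (the delimiter choice is my assumption, not something collect requires). The collecting search joins the values with mvjoin before writing:

```
... | transaction ip maxpause=15m mvlist=page
| eval page=mvjoin(page, "|")
| fields _time, ip, page
| fields - _raw
| collect index=my_summary
```

Until an automatic extraction is in place, the field can be split again when searching the summary index:

```
index=my_summary | makemv delim="|" page
```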

SplunkTrust

Well, the idea is to do complicated stuff once - when collecting - and do simple stuff many times - when searching.

0 Karma

Communicator

Right now I'm ending up with this:
index=my_summary | makemv delim="|" ips | makemv delim="|" uas | makemv delim="|" usernames ...
to recreate the multivalue fields properly. It doesn't seem to impact performance much, so for now I'm happy with this.

Thanks for the tip though - good to be aware of alternatives.

0 Karma

Communicator

Hi Martin,
With the MV_ADD approach, what else do I need to do to make it happen automagically?
I've looked into the DELIM param but I'm still not sure whether it applies to my case, or whether I need to customize any other params.

0 Karma

Hi, try using mvlist=t

mvlist=<bool> |
Description: Flag controlling whether the multivalued fields of the transaction are (mvlist=t) a list of the original events ordered in arrival order or (mvlist=f) a set of unique field values ordered lexicographically. If a comma/space delimited list of fields is provided, only those fields are rendered as lists. Defaults to f.

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction

0 Karma

Communicator

Thanks for your effort, but my question is about a multivalue field losing its format when transferred into a summary index, not about the way transaction creates these fields.

0 Karma