Knowledge Management

How do I prevent the collect command from flattening multivalue fields when writing into a summary index?

gesman
Communicator

I want to write transactions, each with the full list of pages accessed, into a summary index like this:

... | transaction ip maxpause=15m mvlist=page | fields _time, ip, page | fields - _raw | collect index=my_summary

But in the resulting summary index, the 'page' field is flattened into a single value and is no longer multivalue.

Is this documented behavior? Can I force my summary index to keep fields in multivalue format, or do I need to run makemv every time I search my summary index?

1 Solution

gesman
Communicator

One working solution is to add this to ./etc/system/local/fields.conf:

[ips]
TOKENIZER = ([^\|]+)

[uas]
TOKENIZER = ([^\|]+)

[usernames]
TOKENIZER = ([^\|]+)

This does the same job as: ... | makemv delim="|" ips | makemv delim="|" uas | makemv delim="|" usernames ...
(It doesn't seem to require MV_ADD in transforms.)
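To see what the TOKENIZER regex does to a stored value, here is a small Python sketch. It only mimics the regex; it is not how Splunk applies fields.conf internally, and the IP values are made up. Each match of ([^\|]+) becomes one value, which is why this behaves like makemv delim="|" on well-formed values.

```python
import re
from typing import List

# Same regex as the TOKENIZER lines in fields.conf above:
# one or more characters that are not a pipe.
TOKENIZER = re.compile(r"([^\|]+)")

def tokenize(flattened: str) -> List[str]:
    """Return one value per regex match, mimicking how TOKENIZER yields multivalues."""
    return TOKENIZER.findall(flattened)

ips = "10.0.0.1|10.0.0.2|10.0.0.3"   # hypothetical flattened value
print(tokenize(ips))     # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
print(tokenize("a||b"))  # ['a', 'b']: an empty segment produces no match
```

Because the regex requires at least one non-pipe character, empty segments between two delimiters simply produce no match with this pattern.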

martin_mueller
SplunkTrust

Nitpicking point: If you're searching for johnsmith as a username value, searching for username=*johnsmith* will lead to tears if you have a johnsmithy...

Usually, field=value in a search is evaluated as "value is one of the values in field" when field is a multivalue field, so make sure it really doesn't work for you. Sample search:

| stats count | eval foo = "a b c" | makemv foo | search foo="b"

That'll keep the one row, because foo contains a value of b.
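The difference between a wildcard search on one flattened string and a match against a multivalue field can be sketched in Python. This is only an illustration with hypothetical usernames; it uses fnmatch-style wildcards and ignores Splunk's own case-handling rules.

```python
from fnmatch import fnmatchcase
from typing import List

def flattened_match(flattened: str, pattern: str) -> bool:
    # One string holding every username, searched with wildcards.
    return fnmatchcase(flattened, pattern)

def multivalue_match(values: List[str], pattern: str) -> bool:
    # Multivalue field: the condition holds if ANY single value matches.
    return any(fnmatchcase(v, pattern) for v in values)

usernames = ["alice", "johnsmithy", "bob"]   # hypothetical data
flattened = "|".join(usernames)

print(flattened_match(flattened, "*johnsmith*"))  # True, but only johnsmithy is present
print(multivalue_match(usernames, "johnsmith"))   # False: no exact johnsmith value
print(multivalue_match(["a", "b", "c"], "b"))     # True, like foo="b" in the sample search
```

The first call is the false positive described above; the multivalue check only succeeds when one value matches as a whole.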

gesman
Communicator

To note: there is actually a benefit to having multivalues flattened and separated by some character.
"Flattened" values (say 'usernames') are searchable via an index=logs usernames=*johnsmith* | ... query, whereas multivalue fields are not.

So in the above case, if I need to find only events where one of the usernames is (or contains) 'johnsmith', that works nicely and reduces the number of events before the first pipe.

If usernames were stored in multivalue format, we'd need slower logic: either flatten usernames first or use functions like mvfilter to search all the values.

fdi01
Motivator

Try the table command:

...| table _time ip page | fields - _raw | collect index=my_summary

gesman
Communicator

Same result: the multivalue fields are still flattened into a single string.

martin_mueller
SplunkTrust

One way to tackle this could be to un-mv your field before collecting, adding a delimiter between the values. Using that delimiter you could then set up field extractions with MV_ADD to avoid doing the mv dance in the search itself.
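A minimal Python sketch of that round trip, assuming "|" as the delimiter. On the Splunk side the collect-time step could be something like eval page=mvjoin(page, "|"); here it is only mimicked with str.join, and the page list is hypothetical. The key constraint is that the delimiter must never occur inside a value, otherwise splitting cannot restore the original list.

```python
from typing import List

DELIM = "|"  # assumed delimiter; it must never appear inside a value

def flatten(values: List[str]) -> str:
    """Collect-time side: join the values, as mvjoin(page, "|") would."""
    if any(DELIM in v for v in values):
        raise ValueError("delimiter found inside a value; the round trip would be lossy")
    return DELIM.join(values)

def restore(flattened: str) -> List[str]:
    """Search-time side: split the stored string back into separate values."""
    return flattened.split(DELIM) if flattened else []

pages = ["/index.html", "/login", "/index.html"]   # hypothetical page list
stored = flatten(pages)
print(stored)                    # /index.html|/login|/index.html
print(restore(stored) == pages)  # True
```

Raising on a value that contains the delimiter is a design choice for the sketch: it is better to pick a different delimiter than to store data that cannot be split back correctly.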

martin_mueller
SplunkTrust

Well, the idea is to do complicated stuff once - when collecting - and do simple stuff many times - when searching.

gesman
Communicator

Right now I've ended up with this:
index=my_summary | makemv delim="|" ips | makemv delim="|" uas | makemv delim="|" usernames ...
to recreate the multivalue fields properly. It doesn't seem to impact performance much, so for now I'm happy with it.

Thanks for the tip though - good to be aware of alternatives.

gesman
Communicator

Hi Martin,
with the MV_ADD approach, what else do I need to do to make it happen automagically?
I've looked into the DELIM param but I'm still not sure whether it applies to my case, or whether I need to customize any other params.

stephane_cyrill
Builder

Hi, try using mvlist=t:

mvlist=<bool> | <field-list>
Description: Flag controlling whether the multivalued fields of the transaction are (mvlist=t) a list of the original events ordered in arrival order or (mvlist=f) a set of unique field values ordered lexicographically. If a comma- or space-delimited list of fields is provided, only those fields are rendered as lists. Defaults to f.

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction
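The quoted description can be illustrated with a small Python sketch that models events as dictionaries in arrival order. This is a hypothetical representation, not Splunk's internal one, and Python's sorted() orders by code point, which may not match Splunk's lexicographic ordering in every case.

```python
from typing import Dict, List

def transaction_values(events: List[Dict[str, str]], field: str, mvlist: bool) -> List[str]:
    # Events are assumed to be given in arrival order.
    values = [e[field] for e in events if field in e]
    if mvlist:
        return values           # mvlist=t: every value, arrival order, duplicates kept
    return sorted(set(values))  # mvlist=f: unique values, sorted

events = [{"page": "/login"}, {"page": "/home"}, {"page": "/login"}]
print(transaction_values(events, "page", True))   # ['/login', '/home', '/login']
print(transaction_values(events, "page", False))  # ['/home', '/login']
```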

gesman
Communicator

Thanks for your effort, but my question is about the multivalue field losing its format when transferred into the summary index, not about the way transaction creates these fields.
