Splunk Search

How to compact a list of repeating field values but maintain the order of events?


I have a transaction in which the field mydata contains repeating values, like ("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def"). I want to compact this list so that each run of repeating elements appears only once, while preserving the order in which the runs occur. The input list is, in effect, the output of transaction mvlist=true. (By default, transaction uses mvlist=false, which yields a compact but unordered list.) Is there a list command to do this? The end goal is to illustrate a chain of events like "xyz=>ijk..=>abc..=>lmn=>def..". The closest discussion I found was http://answers.splunk.com/answers/95363/perform-transaction-for-only-repeating-values-of-field, but the answer there was partial and does not apply to my use case.
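To pin down the intended behavior outside Splunk, here is a minimal Python sketch of the compaction described above. It is only an illustration of the desired result, not an SPL solution; the function name compact_runs and the ".." marker argument are made up for this example.

```python
from itertools import groupby

def compact_runs(values, marker=".."):
    """Collapse each run of consecutive duplicates into one entry.

    A run longer than one element gets `marker` appended, mirroring the
    "ijk.." notation used in the question. The order of runs is preserved.
    """
    out = []
    for key, run in groupby(values):
        length = sum(1 for _ in run)  # number of consecutive repeats
        out.append(key + marker if length > 1 else key)
    return out

mydata = ["xyz", "ijk", "ijk", "abc", "abc", "abc", "abc", "abc", "lmn", "def", "def"]
chain = "=>".join(compact_runs(mydata))
print(chain)
```

Note that only adjacent repeats are merged: a value that reappears later, after a different value, starts a new run and shows up again in the chain.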

(Updated) Sample data look like this:

16-May-2014 00:50:10.386 type=9,mydata=2.10.8
16-May-2014 00:55:23.205 type=9,mydata=2.10.8
16-May-2014 00:59:39.760 type=9,mydata=2.10.8
16-May-2014 01:12:26.410 type=9,mydata=2.10.8
16-May-2014 01:19:55.528 type=9,mydata=2.10.8
16-May-2014 01:41:33.508 type=9,mydata=2.10.8
16-May-2014 01:43:54.872 type=9,mydata=2.10.8
16-May-2014 11:53:43.119 type=9,mydata=2.14.1
16-May-2014 11:53:44.121 type=15,mydata=2.10.8
16-May-2014 11:55:46.376 type=15,mydata=3.2.2
16-May-2014 11:57:09.548 type=15,mydata=3.2.2
16-May-2014 11:58:03.658 type=15,mydata=3.2.2
16-May-2014 11:59:03.782 type=15,mydata=3.2.2
16-May-2014 11:59:06.788 type=15,mydata=3.2.2
16-May-2014 11:59:45.870 type=15,mydata=3.2.2
16-May-2014 12:00:07.914 type=15,mydata=3.2.2
16-May-2014 12:01:25.073 type=15,mydata=2.10.8
16-May-2014 17:01:07.343 type=9,mydata=3.4.6001
16-May-2014 17:19:41.923 type=9,mydata=3.4.6001
16-May-2014 17:20:58.090 type=15,mydata=2.10.8
16-May-2014 17:21:32.159 type=15,mydata=2.10.8
16-May-2014 17:21:51.198 type=15,mydata=2.10.8
16-May-2014 19:48:41.102 type=9,mydata=3.4.6001
16-May-2014 19:49:15.172 type=9,mydata=3.4.6001
16-May-2014 20:35:44.316 type=9,mydata=3.4.6001
16-May-2014 21:15:31.373 type=9,mydata=3.4.6001

With help from the Perl community, I came up with the following string-based method. Though usable, it feels clumsy to compact a list through string manipulation, given that Splunk is list-oriented.

 | transaction type mvlist=true
 | eval flatten=mvjoin(mydata,"=>")
 | eval compact=replace(flatten,"([^=]+)(?:=>\1)+","\1..")
 | stats max(eventcount) as count by compact type

compact                          type  count
2.10.8..=>2.14.1=>3.4.6001..     9     14
2.10.8=>3.2.2..=>2.10.8..        15    12
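
For anyone who wants to try the regular expression outside Splunk, the same pattern can be exercised with Python's re module. This is a sketch under the assumption that Python's regex engine treats this particular pattern the same way Splunk's replace() does; the input strings are rebuilt by hand from the sample data, and compact_string is a hypothetical helper name.

```python
import re

# Same pattern as in the replace() call above: a token followed by one or
# more "=>token" repeats collapses to "token..".
PATTERN = re.compile(r"([^=]+)(?:=>\1)+")

def compact_string(flatten):
    """Apply the string-based compaction to an already joined list."""
    return PATTERN.sub(r"\1..", flatten)

# Per-type value sequences, transcribed from the sample data in event order.
type9 = "=>".join(["2.10.8"] * 7 + ["2.14.1"] + ["3.4.6001"] * 6)
type15 = "=>".join(["2.10.8"] + ["3.2.2"] * 7 + ["2.10.8"] * 4)

print(compact_string(type9))
print(compact_string(type15))

# Caveat: the pattern is not anchored on "=>" boundaries, so a value that is
# a prefix of the next value is treated as a repeat.
print(compact_string("abc=>abcd"))
```

The last call prints "abc..d", which shows one weakness of the string approach: it can merge values that merely share text across a "=>" boundary. The sample data does not trigger this, but other data might.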

(Update: max(eventcount) gives the correct count; sum(eventcount) does not.) The actual chain of events can easily be verified in a shell:

$ for t in type=9, type=15,; do grep -F "$t" < mydata | cut -d' ' -f3- | uniq -c; done
   7 type=9,mydata=2.10.8
   1 type=9,mydata=2.14.1
   6 type=9,mydata=3.4.6001
   1 type=15,mydata=2.10.8
   7 type=15,mydata=3.2.2
   4 type=15,mydata=2.10.8
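
The same check can be sketched in Python, which makes the per-type grouping explicit. The events list below is transcribed from the sample data in event order as (type, mydata) pairs; the variable names are only for illustration.

```python
from collections import defaultdict
from itertools import groupby

# (type, mydata) pairs in the order the sample events occur.
events = (
    [("9", "2.10.8")] * 7
    + [("9", "2.14.1")]
    + [("15", "2.10.8")]
    + [("15", "3.2.2")] * 7
    + [("15", "2.10.8")]
    + [("9", "3.4.6001")] * 2
    + [("15", "2.10.8")] * 3
    + [("9", "3.4.6001")] * 4
)

# Split into one ordered list per type (the role of grep -F in the shell loop).
by_type = defaultdict(list)
for typ, value in events:
    by_type[typ].append(value)

# Count consecutive repeats within each type (the role of uniq -c).
runs = {
    typ: [(value, len(list(run))) for value, run in groupby(values)]
    for typ, values in by_type.items()
}

for typ, typ_runs in runs.items():
    for value, count in typ_runs:
        print(f"{count:4d} type={typ},mydata={value}")
```

The run lengths for each type add up to 14 and 12, which agrees with the max(eventcount) figures in the stats output.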

Note: The original example used mvappend to simulate the output of transaction, as listed below. (source=* stands for any search that returns at least one event.) But this simulation apparently lacks some important aspect of a real transaction.

 | eval mydata=mvappend("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def")
 | eval flatten=mvjoin(mydata,"=>")
 | eval compact=replace(flatten,"([^=]+)(?:=>\1)+","\1..")
 | stats values(mydata) by compact flatten

compact                          flatten                                                  values(mydata)
xyz=>ijk..=>abc..=>lmn=>def..    xyz=>ijk=>ijk=>abc=>abc=>abc=>abc=>abc=>lmn=>def=>def    abc


I wish I understood more about what you are after and what the data means. If you want to keep the order in which things happen, you might try the following:

... | reverse | dedup type mydata

You could also bake a time element into the dedup statement, using the date you've extracted, if that is significant.



Have you tried something like this?

... | stats count by type mydata | stats sum(count) as total list(mydata) as mydata list(count) as count by type



I'm looking at software behavior, which is akin to user behavior. A user loads any number of products into a cart => takes a series of steps (such as login => open wallet), some of which are retried => then checks out or abandons the cart. Here "type" corresponds to the user session, and "mydata" to the user's steps. The final stats enumerate the paths and their respective popularity.

Unfortunately,

 | transaction type mvlist=t | mvexpand mydata
 | reverse | dedup type mydata

gives 5 transactions: 3 of type 9 and 2 of type 15. The stats are also incorrect:

compact                     type
2.10.8=>3.2.2               15...
3.4.6001=>2.14.1=>2.10.8    9...



Hi yuanliu!

You can also try something like this:

| stats count | eval mydata=mvappend("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def") | mvexpand mydata | dedup mydata | mvcombine mydata

This takes your list and makes multiple events out of it, one for every item in the list. It then uses dedup to drop the duplicates and finally recombines the results into a multivalue field.
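
One thing to keep in mind with this approach: dedup keeps the first event for each distinct value, so a value that comes back later in the sequence is dropped as well. The Python sketch below is only a stand-in for that behavior, not Splunk's implementation, and the helper names are invented; it contrasts first-occurrence deduplication with collapsing consecutive repeats.

```python
from itertools import groupby

def dedup_global(values):
    """Keep the first occurrence of each distinct value (dedup-like)."""
    return list(dict.fromkeys(values))  # dict preserves insertion order

def dedup_consecutive(values):
    """Collapse only adjacent repeats, so a returning value is kept."""
    return [key for key, _ in groupby(values)]

simulated = ["xyz", "ijk", "ijk", "abc", "abc", "abc", "abc", "abc", "lmn", "def", "def"]
# Type 15 values from the real sample data, in event order.
type15 = ["2.10.8"] + ["3.2.2"] * 7 + ["2.10.8"] * 4

print(dedup_global(simulated) == dedup_consecutive(simulated))
print("=>".join(dedup_global(type15)))
print("=>".join(dedup_consecutive(type15)))
```

On the simulated list both functions agree, because no value ever returns after a different one. On a sequence where a value does return, such as the type 15 data, first-occurrence deduplication loses the final step of the chain.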



Thanks for the help, @Tom_frosscher. Unfortunately, the simulated data in the original question turned out to be inadequate, so I updated the question with real sample data. Oddly, the test code below interferes with the transaction itself: instead of two transactions, verbose mode lists 4 transactions, two for each type.

| mvexpand mydata
| dedup mydata
| mvcombine mydata
| eval compact=mvjoin(mydata,"=>")
| stats sum(eventcount) by compact type

Stats show strange output:

compact             type                                   count
2.10.8=>3.2.2       15 15 15 15 15 15 15 15 15 15 15 15    12
2.14.1=>3.4.6001    9 9 9 9 9 9 9 9 9 9 9 9 9 9            14
