How to compact a list of repeating field values but maintain the order of events?

yuanliu · ‎09-04-2014

I have a transaction in which field mydata contains repeating values like ("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def"). I want to compact this list so that each run of repeated elements appears only once, while preserving the order in which the runs occur. The list is, in effect, the output of transaction mvlist=true. (The default, mvlist=false, produces a compact but unordered list.) Is there a list command to do this? The end goal is to illustrate a chain of events like "xyz=>ijk..=>abc..=>lmn=>def.." The closest discussion I found was http://answers.splunk.com/answers/95363/perform-transaction-for-only-repeating-values-of-field, but the answer there was partial and does not apply to my use case.

(Updated) Sample data are like:

16-May-2014 00:50:10.386 type=9,mydata=2.10.8
16-May-2014 00:55:23.205 type=9,mydata=2.10.8
16-May-2014 00:59:39.760 type=9,mydata=2.10.8
16-May-2014 01:12:26.410 type=9,mydata=2.10.8
16-May-2014 01:19:55.528 type=9,mydata=2.10.8
16-May-2014 01:41:33.508 type=9,mydata=2.10.8
16-May-2014 01:43:54.872 type=9,mydata=2.10.8
16-May-2014 11:53:43.119 type=9,mydata=2.14.1
16-May-2014 11:53:44.121 type=15,mydata=2.10.8
16-May-2014 11:55:46.376 type=15,mydata=3.2.2
16-May-2014 11:57:09.548 type=15,mydata=3.2.2
16-May-2014 11:58:03.658 type=15,mydata=3.2.2
16-May-2014 11:59:03.782 type=15,mydata=3.2.2
16-May-2014 11:59:06.788 type=15,mydata=3.2.2
16-May-2014 11:59:45.870 type=15,mydata=3.2.2
16-May-2014 12:00:07.914 type=15,mydata=3.2.2
16-May-2014 12:01:25.073 type=15,mydata=2.10.8
16-May-2014 17:01:07.343 type=9,mydata=3.4.6001
16-May-2014 17:19:41.923 type=9,mydata=3.4.6001
16-May-2014 17:20:58.090 type=15,mydata=2.10.8
16-May-2014 17:21:32.159 type=15,mydata=2.10.8
16-May-2014 17:21:51.198 type=15,mydata=2.10.8
16-May-2014 19:48:41.102 type=9,mydata=3.4.6001
16-May-2014 19:49:15.172 type=9,mydata=3.4.6001
16-May-2014 20:35:44.316 type=9,mydata=3.4.6001
16-May-2014 21:15:31.373 type=9,mydata=3.4.6001

With help from the Perl community, I came up with the following string method. Though it is usable, it feels lame to compact a list through string manipulation, given that Splunk is list oriented.

 source=mydata
 | transaction type mvlist=true
 | eval flatten=mvjoin(mydata,"=>")
 | eval compact=replace(flatten,"([^=]+)(?:=>\1)+","\1..")
 | stats max(eventcount) as count by compact type
compact                         type    count
2.10.8..=>2.14.1=>3.4.6001..    9       14
2.10.8=>3.2.2..=>2.10.8..       15      12

(Update: max(eventcount) gives the correct count, not sum(eventcount).) The actual chain of events can easily be verified in a shell:

$ for t in type=9, type=15,; do fgrep $t < mydata |cut -d\  -f3-|uniq -c; done
   7 type=9,mydata=2.10.8
   1 type=9,mydata=2.14.1
   6 type=9,mydata=3.4.6001
   1 type=15,mydata=2.10.8
   7 type=15,mydata=3.2.2
   4 type=15,mydata=2.10.8
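
As a cross-check outside Splunk, the run-length grouping that uniq -c performs can be sketched on plain lists rather than on a joined string. This is only an illustration in Python, not SPL; the function name compact_runs and the hard-coded lists (transcribed from the sample data above) are assumptions made for the example.

```python
from itertools import groupby


def compact_runs(values, sep="=>", repeat_mark=".."):
    """Collapse each run of consecutive duplicates into one element.

    Runs longer than one are tagged with repeat_mark, mirroring the
    "xyz=>ijk..=>abc.." notation used in the question.
    """
    parts = []
    for value, run in groupby(values):
        run_length = sum(1 for _ in run)
        parts.append(value + repeat_mark if run_length > 1 else value)
    return sep.join(parts)


# mydata values per type, in time order, transcribed from the sample events
type9 = ["2.10.8"] * 7 + ["2.14.1"] + ["3.4.6001"] * 6
type15 = ["2.10.8"] + ["3.2.2"] * 7 + ["2.10.8"] * 4

print(compact_runs(type9))   # 2.10.8..=>2.14.1=>3.4.6001..
print(compact_runs(type15))  # 2.10.8=>3.2.2..=>2.10.8..
```

Both lines agree with the compact column produced by the search above, and the list lengths (14 and 12) agree with max(eventcount).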

Note: The original example used mvappend to simulate output from transaction, as listed below. (source=* can be a search that returns at least one event.) But this simulation apparently lacks some important aspect of a transaction.

source=*
 | eval mydata=mvappend("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def")
 | eval flatten=mvjoin(mydata,"=>")
 | eval compact=replace(flatten,"([^=]+)(?:=>\1)+","\1..")
 | stats values(mydata) by compact flatten
compact                         flatten                                 values(mydata)
xyz=>ijk..=>abc..=>lmn=>def..    xyz=>ijk=>ijk=>abc=>abc=>abc=>abc=>abc=>lmn=>def=>def  abc
                                                                                        def
                                                                                        ijk
                                                                                        lmn
                                                                                        xyz
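
The replace() pattern used in both searches can also be exercised outside Splunk. The sketch below uses Python's re module on the simulated list; it assumes Python's handling of the backreference matches Splunk's replace() for this pattern, and the name compact_string is made up for the example.

```python
import re

# Same pattern as in the eval: a value followed by one or more "=>value" repeats
PATTERN = re.compile(r"([^=]+)(?:=>\1)+")


def compact_string(flatten):
    """Replace each run of identical, '=>'-joined values with 'value..'."""
    return PATTERN.sub(r"\1..", flatten)


mydata = ["xyz", "ijk", "ijk", "abc", "abc", "abc", "abc", "abc", "lmn", "def", "def"]
flatten = "=>".join(mydata)
compact = compact_string(flatten)
print(compact)  # xyz=>ijk..=>abc..=>lmn=>def..

# Caveat: the pattern is not anchored at a separator, so a value that is a
# suffix of the value before it is merged too.
print(compact_string("xa=>a"))  # xa..
```

The second call shows a possible pitfall of the string method: because the match may start in the middle of a value, a sequence such as "xa" followed by "a" is treated as a repetition. None of the sample values above trigger this.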

Runals · ‎09-06-2014

I wish I could understand more of what you are after/what the data means. When you say you want to keep the order in which things happen you might be able to try the following:

... | reverse | dedup type mydata

You could potentially bake a time element into the dedup statement, where you've pulled out the date, if that is significant.

Runals · ‎09-09-2014

Have you tried something like this?

... | stats count by type mydata | stats sum(count) as total list(mydata) as mydata list(count) as count by type

yuanliu · ‎09-08-2014

I'm looking at software behavior, akin to user behavior. A user loads any number of products into a cart => takes a series of steps (such as login => open wallet), some of which have retries => then checks out or abandons the cart. So "type" would be the user session, and "mydata" would be the user's steps. The final stats enumerate paths and their respective popularity.

Unfortunately, | transaction type mvlist=t | mvexpand mydata | reverse | dedup type mydata gives 5 transactions: 3 of type 9 and 2 of type 15. The stats are also incorrect:
compact                     type
2.10.8=>3.2.2               15...
3.4.6001=>2.14.1=>2.10.8    9...
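
The compact values above are what one would expect if dedup keeps only the first occurrence of each (type, mydata) pair, so the second run of 2.10.8 in type 15 disappears. A small Python sketch of the difference between global and consecutive de-duplication follows; the helper names are made up for illustration, and the lists are transcribed from the sample data. Splunk's dedup also documents a consecutive=true option, which may be worth testing, though it has not been tried against this data here.

```python
from itertools import groupby


def dedup_global(values):
    """Keep the first occurrence of each value, dropping all later ones."""
    return list(dict.fromkeys(values))


def dedup_consecutive(values):
    """Drop a value only when it repeats the value immediately before it."""
    return [value for value, _ in groupby(values)]


# mydata values per type, in time order, transcribed from the sample events
type9 = ["2.10.8"] * 7 + ["2.14.1"] + ["3.4.6001"] * 6
type15 = ["2.10.8"] + ["3.2.2"] * 7 + ["2.10.8"] * 4

# reverse followed by a global dedup reproduces the compact values reported above
print("=>".join(dedup_global(list(reversed(type15)))))  # 2.10.8=>3.2.2
print("=>".join(dedup_global(list(reversed(type9)))))   # 3.4.6001=>2.14.1=>2.10.8

# consecutive de-duplication in time order keeps the second run of 2.10.8
print("=>".join(dedup_consecutive(type15)))             # 2.10.8=>3.2.2=>2.10.8
```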

tom_frotscher · ‎09-05-2014

Hi yuanliu!

You can also try something like this:

| stats count | eval mydata=mvappend("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def") | mvexpand mydata | dedup mydata | mvcombine mydata

This takes your list and makes multiple events out of it, one for every item in the list. It then uses dedup to kick out the duplicates and finally recombines the events into a multivalue field.

yuanliu · ‎09-05-2014

Thanks for the help, @tom_frotscher. Unfortunately, the simulated data in the original question are inadequate, so I updated it with real sample data. Oddly, the test code below interferes with the transaction itself. Instead of two transactions, verbose mode lists 4 transactions, two for each type.
| mvexpand mydata | dedup mydata | mvcombine mydata | eval compact=mvjoin(mydata,"=>") | stats sum(eventcount) by compact type

Stats show strange output:
compact             type                                   count
2.10.8=>3.2.2       15 15 15 15 15 15 15 15 15 15 15 15    12
2.14.1=>3.4.6001    9 9 9 9 9 9 9 9 9 9 9 9 9 9            14
