I have a transaction in which field mydata contains repeating values like ("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def"). I want to compact this list by representing repeating elements only once, but preserving the order in which each repetition occurs. In effect, this is the output of transaction mvlist=true. (Default transaction implies mvlist=false. The output is a compact, but unordered list.) Is there a list command to do this? The end goal is to illustrate a chain of events like "xyz=>ijk..=>abc..=>lmn=>def.." The closest discussion was http://answers.splunk.com/answers/95363/perform-transaction-for-only-repeating-values-of-field. But the answer there was partial, and does not apply to my use case.
(Updated) Sample data are like:
16-May-2014 00:50:10.386 type=9,mydata=2.10.8
16-May-2014 00:55:23.205 type=9,mydata=2.10.8
16-May-2014 00:59:39.760 type=9,mydata=2.10.8
16-May-2014 01:12:26.410 type=9,mydata=2.10.8
16-May-2014 01:19:55.528 type=9,mydata=2.10.8
16-May-2014 01:41:33.508 type=9,mydata=2.10.8
16-May-2014 01:43:54.872 type=9,mydata=2.10.8
16-May-2014 11:53:43.119 type=9,mydata=2.14.1
16-May-2014 11:53:44.121 type=15,mydata=2.10.8
16-May-2014 11:55:46.376 type=15,mydata=3.2.2
16-May-2014 11:57:09.548 type=15,mydata=3.2.2
16-May-2014 11:58:03.658 type=15,mydata=3.2.2
16-May-2014 11:59:03.782 type=15,mydata=3.2.2
16-May-2014 11:59:06.788 type=15,mydata=3.2.2
16-May-2014 11:59:45.870 type=15,mydata=3.2.2
16-May-2014 12:00:07.914 type=15,mydata=3.2.2
16-May-2014 12:01:25.073 type=15,mydata=2.10.8
16-May-2014 17:01:07.343 type=9,mydata=3.4.6001
16-May-2014 17:19:41.923 type=9,mydata=3.4.6001
16-May-2014 17:20:58.090 type=15,mydata=2.10.8
16-May-2014 17:21:32.159 type=15,mydata=2.10.8
16-May-2014 17:21:51.198 type=15,mydata=2.10.8
16-May-2014 19:48:41.102 type=9,mydata=3.4.6001
16-May-2014 19:49:15.172 type=9,mydata=3.4.6001
16-May-2014 20:35:44.316 type=9,mydata=3.4.6001
16-May-2014 21:15:31.373 type=9,mydata=3.4.6001
With help from the Perl community, I came up with the following string-based method. Though usable, it feels clumsy to compact a list through string manipulation, given that Splunk is list oriented.
source=mydata
| transaction type mvlist=true
| eval flatten=mvjoin(mydata,"=>")
| eval compact=replace(flatten,"([^=]+)(?:=>\1)+","\1..")
| stats max(eventcount) as count by compact type
compact type count
2.10.8..=>2.14.1=>3.4.6001.. 9 14
2.10.8=>3.2.2..=>2.10.8.. 15 12
(Update: max(eventcount) gives the correct count, not sum(eventcount).) The actual chain of events can easily be checked in a shell:
$ for t in type=9, type=15,; do fgrep $t < mydata |cut -d\ -f3-|uniq -c; done
7 type=9,mydata=2.10.8
1 type=9,mydata=2.14.1
6 type=9,mydata=3.4.6001
1 type=15,mydata=2.10.8
7 type=15,mydata=3.2.2
4 type=15,mydata=2.10.8
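For comparison, the run collapsing that uniq -c performs above can be sketched outside Splunk as a list operation rather than a string operation. This is only an illustration in Python, not an SPL answer; the function name compact_runs and the hard-coded sequence (the type=9 values from the sample data, in time order) are assumptions made for the example.

```python
from itertools import groupby

def compact_runs(values, sep="=>", repeat_mark=".."):
    """Collapse consecutive duplicates; mark runs longer than one with repeat_mark."""
    parts = []
    counts = []
    for value, run in groupby(values):
        n = len(list(run))          # length of this run of identical values
        counts.append((n, value))   # same information as one line of `uniq -c`
        parts.append(value + repeat_mark if n > 1 else value)
    return sep.join(parts), counts

# mydata values of the type=9 events, in time order (taken from the sample data)
type9 = ["2.10.8"] * 7 + ["2.14.1"] + ["3.4.6001"] * 6

chain, counts = compact_runs(type9)
print(chain)   # 2.10.8..=>2.14.1=>3.4.6001..
print(counts)  # [(7, '2.10.8'), (1, '2.14.1'), (6, '3.4.6001')]
```

The run lengths add up to 14, which agrees with the max(eventcount) value reported for type 9.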
Note: The original example used mvappend to simulate output from transaction, as listed below. (source=* can be any search that returns at least one event.) But this simulation apparently lacks some important aspect of a transaction.
source=*
| eval mydata=mvappend("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def")
| eval flatten=mvjoin(mydata,"=>")
| eval compact=replace(flatten,"([^=]+)(?:=>\1)+","\1..")
| stats values(mydata) by compact flatten
compact flatten values(mydata)
xyz=>ijk..=>abc..=>lmn=>def.. xyz=>ijk=>ijk=>abc=>abc=>abc=>abc=>abc=>lmn=>def=>def abc
def
ijk
lmn
xyz
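The replace() pattern can also be checked outside Splunk. The sketch below uses Python's re module as a stand-in for Splunk's regex engine, on the assumption that both treat this particular pattern the same way. It reproduces the compact value above and shows a limitation not raised in this thread: because ([^=]+) is not anchored to the "=>" separators, a value that is merely a prefix of the next value is treated as a repeat. The "abc=>abcd" input is a made-up example.

```python
import re

PATTERN = r"([^=]+)(?:=>\1)+"

def compact(flatten):
    # Same pattern and replacement as the eval replace() call in the search.
    return re.sub(PATTERN, r"\1..", flatten)

flatten = "=>".join(
    ["xyz", "ijk", "ijk", "abc", "abc", "abc", "abc", "abc", "lmn", "def", "def"]
)
print(compact(flatten))       # xyz=>ijk..=>abc..=>lmn=>def..

# Hypothetical input: "abc" is only a prefix of "abcd", yet it is collapsed.
print(compact("abc=>abcd"))   # abc..d
```

With version-like values such as 2.10.8 and 2.10.80 this could matter, so the string method is safest when no value is a prefix of another.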
I wish I could understand more of what you are after and what the data means. If you want to keep the order in which things happen, you might be able to try the following:
... | reverse | dedup type mydata
If the date is significant, you could potentially extract it and bake that time element into the dedup statement.
Have you tried something like this?
... | stats count by type mydata | stats sum(count) as total list(mydata) as mydata list(count) as count by type
I'm looking at software behavior, akin to user behavior. A user loads any number of products into a cart => takes a series of steps (such as login => open wallet), some with retries => then checks out or abandons the cart. So "type" would be the user session, and "mydata" would be the user's steps. The final stats enumerate the paths and their respective popularity.
Unfortunately, |transaction type mvlist=t | mvexpand mydata gives 5 transactions, 3 of type 9, 2 of type 15. Stats are also incorrect.
|reverse |dedup type mydata
compact type
2.10.8=>3.2.2 15...
3.4.6001=>2.14.1=>2.10.8 9...
Hi yuanliu!
You can also try something like this:
| stats count | eval mydata=mvappend("xyz","ijk","ijk","abc","abc","abc","abc","abc","lmn","def","def") | mvexpand mydata | dedup mydata | mvcombine mydata
This takes your list and makes multiple events out of it, one for every item in the list. Then it uses dedup to kick out the duplicates and finally recombines the remaining values into a multivalue field.
Thanks for the help, @Tom_frosscher. Unfortunately, the simulated data in the original question are inadequate, so I updated the question with real sample data. Oddly, the test code below interferes with transaction itself. Instead of two transactions, verbose mode lists 4 transactions, two for each type.
| mvexpand mydata
| dedup mydata
| mvcombine mydata
| eval compact=mvjoin(mydata,"=>")
| stats sum(eventcount) by compact type
Stats show strange output:
compact type count
2.10.8=>3.2.2 15 15 15 15 15 15 15 15 15 15 15 15 12
2.14.1=>3.4.6001 9 9 9 9 9 9 9 9 9 9 9 9 9 9 14
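One likely reason the dedup approach cannot produce the wanted chain: dedup keeps only the first occurrence of each value, so a value that returns later (2.10.8 in the type=15 events) is dropped. Below is a minimal Python sketch of that behaviour. The function dedup_keep_first is a hypothetical stand-in for mvexpand | dedup | mvcombine applied to a single transaction; it does not model how Splunk orders or splits the events, so it does not explain the extra transactions or the repeated type values.

```python
def dedup_keep_first(values):
    """Keep the first occurrence of each value, preserving order."""
    seen = set()
    kept = []
    for value in values:
        if value not in seen:
            seen.add(value)
            kept.append(value)
    return kept

# mydata values of the type=15 events, in time order (taken from the sample data)
type15 = ["2.10.8"] + ["3.2.2"] * 7 + ["2.10.8"] * 4

print("=>".join(dedup_keep_first(type15)))  # 2.10.8=>3.2.2
```

The wanted chain for type 15 is 2.10.8=>3.2.2..=>2.10.8.., which needs the second run of 2.10.8 to survive. That makes this a run-length operation on consecutive values, not a dedup.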