How do you use the arules search command?

whl329
Engager

I can't get any output from arules. My test dataset has two fields, f1 and f2:

| inputcsv tmp1030.csv | arules f1 f2

How do I get this to work? Thanks.

tmp1030.csv:
f1 f2
a 1
a 2
a 3
a 4
b 1
b 2
c 2
c 3
c 4
d 2
d 3
e 1
e 2
e 4
f 3
f 4
g 2
g 4


Update:
I found that using table fails, but using fields works. So I added tmp1030.csv to a test index. Then:

index=test source="/opt/splunk/var/spool/splunk/dd7e0d3b0d032b1a_events.stash_new" | fields + f1 f2 | arules f1 f2 sup=2 conf=.3

Result:

Given fields    Implied fields  Strength    Given fields support    Implied fields support
1   a   0.333333    3   1
1   b   0.333333    3   1
1   e   0.333333    3   1
b   1   0.500000    2   1
b   2   0.500000    2   1
c   2   0.333333    3   1
c   3   0.333333    3   1
c   4   0.333333    3   1
d   2   0.500000    2   1
d   3   0.500000    2   1
e   1   0.333333    3   1
e   2   0.333333    3   1
e   4   0.333333    3   1
f   3   0.500000    2   1
f   4   0.500000    2   1
g   2   0.500000    2   1
g   4   0.500000    2   1
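
One way to read the columns above, inferred only from these numbers and not from the Splunk documentation: "Given fields support" looks like the number of rows containing the given value, "Implied fields support" the number of rows containing both values, and Strength their ratio. Under that assumption, sup=2 and conf=.3 act as minimum thresholds on the given support and the Strength. The sketch below (pair_rules is a hypothetical helper, not Splunk's implementation) applies that reading to the tmp1030.csv rows and yields the same 17 rules; for example, a -> 1 is dropped because 1/4 is below 0.3.

```python
from collections import Counter

# Rows of tmp1030.csv as (f1, f2) pairs.
ROWS = [
    ("a", "1"), ("a", "2"), ("a", "3"), ("a", "4"),
    ("b", "1"), ("b", "2"),
    ("c", "2"), ("c", "3"), ("c", "4"),
    ("d", "2"), ("d", "3"),
    ("e", "1"), ("e", "2"), ("e", "4"),
    ("f", "3"), ("f", "4"),
    ("g", "2"), ("g", "4"),
]


def pair_rules(rows, sup=2, conf=0.3):
    """Hypothetical helper: (given, implied, strength, given_support, implied_support)."""
    pair_count = Counter(rows)
    rules = []
    # Try both directions: f1 -> f2 and f2 -> f1.
    for given_idx, implied_idx in ((0, 1), (1, 0)):
        given_count = Counter(row[given_idx] for row in rows)
        for pair, together in pair_count.items():
            given, implied = pair[given_idx], pair[implied_idx]
            # Assumed meaning of Strength: co-occurrences / occurrences of the given value.
            strength = together / given_count[given]
            if given_count[given] >= sup and strength >= conf:
                rules.append((given, implied, round(strength, 6), given_count[given], together))
    return sorted(rules)


RULES = pair_rules(ROWS)
for rule in RULES:
    print(*rule, sep="\t")
```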

Please excuse my English.

whl329
Engager

@inventsekar

I want to use Splunk to do an arules analysis based on the data in http://www.salemmarafi.com/code/market-basket-analysis-with-r/.

First:
I downloaded the Groceries data, which looks like this. But Splunk's arules doesn't support running the analysis on a single field.

id  items
1   {citrus fruit,semi-finished bread,margarine,ready soups}
2   {tropical fruit,yogurt,coffee}
3   {whole milk}
4   {pip fruit,yogurt,cream cheese ,meat spreads}
5   {other vegetables,whole milk,condensed milk,long life bakery product}

Second:
I created a Splunk custom command.

combin.py:

import itertools
import traceback

import splunk.Intersplunk


def combinations(results):
    """Write every 2-item combination of each result's "items" field into the named fields."""
    try:
        # get list of output field names (keywords) and hash of key=value arguments
        fields, argvals = splunk.Intersplunk.getKeywordsAndOptions()

        for r in results:
            items = r["items"].split(",")
            pairs = itertools.combinations(items, 2)
            # pairs are separated by ";" so that makemv delim=";" can split them later
            joined = '; '.join(','.join(p) for p in pairs)
            for f in fields:
                r[f] = joined
        return results

    except Exception:
        # return the traceback as an error result so it is shown in the search UI
        stack = traceback.format_exc()
        return splunk.Intersplunk.generateErrorResults("Error : Traceback: " + str(stack))


results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()
# output either the modified results or the error results
splunk.Intersplunk.outputResults(combinations(results))
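
The core of the command is the pair generation, which can be tried outside Splunk. This is a minimal standalone sketch; item_pairs is a hypothetical helper name, and unlike combin.py it strips whitespace around each item. It uses basket 2 from the Groceries sample above.

```python
import itertools


def item_pairs(items_field):
    """Return every unordered 2-item combination from a comma-separated item string."""
    items = [item.strip() for item in items_field.split(",")]
    return [",".join(pair) for pair in itertools.combinations(items, 2)]


PAIRS = item_pairs("tropical fruit,yogurt,coffee")
print("; ".join(PAIRS))
# tropical fruit,yogurt; tropical fruit,coffee; yogurt,coffee
```

A basket with a single item produces no pairs, so such events end up with an empty field.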

Third:
In Splunk:

    | inputcsv Groceries.csv
    | eval top1item=if(match(items,"whole milk"),"whole milk",null()) | search top1item="whole milk"
    | eval items=replace(items,"{([^}]*)}","\1")
    | eval items=replace(items,"whole milk,","")
    | eval items=replace(items,",whole milk","")
    | combin item2c
    | makemv delim=";" item2c | fields - _time items
    | mvexpand item2c
    | collect index=test marker="id=t3"

Final:

index=test id=t3 
| arules item2c top1item sup=1
| sort 20  - "Given fields support"

Result:

    Given fields                            Implied fields  Strength  Given fields support  Implied fields support
    root vegetables,other vegetables        whole milk      1.000000  228                   228
    other vegetables,yogurt                 whole milk      1.000000  219                   219
    other vegetables,rolls/buns             whole milk      1.000000  176                   176
    tropical fruit,other vegetables         whole milk      1.000000  168                   168
    yogurt,rolls/buns                       whole milk      1.000000  153                   153
    tropical fruit,yogurt                   whole milk      1.000000  149                   149
    other vegetables,whipped/sour cream     whole milk      1.000000  144                   144
    root vegetables,yogurt                  whole milk      1.000000  143                   143
    other vegetables,soda                   whole milk      1.000000  137                   137
    pip fruit,other vegetables              whole milk      1.000000  133                   133
    citrus fruit,other vegetables           whole milk      1.000000  128                   128
    root vegetables,rolls/buns              whole milk      1.000000  125                   125
    other vegetables,domestic eggs          whole milk      1.000000  121                   121
    tropical fruit,root vegetables          whole milk      1.000000  118                   118
    other vegetables,butter                 whole milk      1.000000  113                   113
    tropical fruit,rolls/buns               whole milk      1.000000  108                   108
    yogurt,whipped/sour cream               whole milk      1.000000  107                   107
    other vegetables,bottled water          whole milk      1.000000  106                   106
    other vegetables,pastry                 whole milk      1.000000  104                   104
    other vegetables,fruit/vegetable juice  whole milk      1.000000  103                   103
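
A possible explanation for every Strength being 1.000000, offered as an assumption and not as a statement about arules internals: the search kept only events that contain whole milk and set top1item to "whole milk" on all of them. If Strength is the implied support divided by the given support (which the numbers suggest, for example 228/228), every pair then implies whole milk with a ratio of 1 by construction. The sketch below shows the effect on a few hypothetical baskets; the data and the confidence helper are made up for illustration.

```python
def confidence(baskets, lhs, rhs):
    """Fraction of the baskets containing lhs that also contain rhs."""
    with_lhs = [b for b in baskets if lhs <= b]
    return sum(1 for b in with_lhs if rhs <= b) / len(with_lhs)


# Hypothetical baskets, not taken from the Groceries data.
BASKETS = [
    {"yogurt", "rolls/buns", "whole milk"},
    {"yogurt", "rolls/buns"},
    {"yogurt", "rolls/buns", "whole milk"},
    {"yogurt", "soda"},
]
LHS = {"yogurt", "rolls/buns"}
RHS = {"whole milk"}

# Confidence over all baskets: 2 of the 3 baskets with the pair also have whole milk.
ALL_CONF = confidence(BASKETS, LHS, RHS)

# Keeping only baskets that contain whole milk forces the ratio to 1.
FILTERED = [b for b in BASKETS if RHS <= b]
FILTERED_CONF = confidence(FILTERED, LHS, RHS)

print(round(ALL_CONF, 3), FILTERED_CONF)
```

Under this reading, a more informative Strength would require indexing the baskets without whole milk as well.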

Conclusion: arules analysis in Splunk is not mature yet, so I have abandoned it for now.
The information above is for reference.

inventsekar
SplunkTrust

Well, this is not closely related to the Splunk arules command, but it is an interesting read on the arules topic.

The arules command documentation says: "Implements arules algorithm as discussed in Michael Hahsler, Bettina Gruen and Kurt Hornik (2012). arules: Mining Association Rules and Frequent Itemsets. R package version 1.0-12" (http://docs.splunk.com/Documentation/Splunk/6.4.2/SearchReference/arules)

I searched and found this:
http://www.salemmarafi.com/code/market-basket-analysis-with-r/

A little bit of Math
We already discussed the concept of Items and Item Sets.

We can represent our items as an item set as follows:
I = { i_1, i_2, …, i_n }

Therefore a transaction is represented as follows:
t_n = { i_j, i_k, …, i_n }

This gives us our rules which are represented as follows:
{ i_1, i_2 } => { i_k }

Which can be read as “if a user buys an item in the item set on the left hand side, then the user will likely buy the item on the right hand side too”.

A more human readable example is:
{coffee,sugar} => {milk}
If a customer buys coffee and sugar, then they are also likely to buy milk.

With this we can understand three important ratios: support, confidence, and lift. We describe the significance of these in the following bullet points, but if you are interested in a formal mathematical definition you can find it on Wikipedia.

Support: The fraction of transactions in our dataset that contain the item set.
Confidence: The probability that a rule is correct for a new transaction with items on the left.
Lift: The ratio by which the confidence of a rule exceeds the expected confidence.

Note: if the lift is 1 it indicates that the items on the left and right are independent.
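
To make the three ratios concrete, here is a minimal sketch using the standard definitions for the rule {coffee,sugar} => {milk}. The five transactions are hypothetical and chosen only to keep the arithmetic easy to follow; the function names are illustrative as well.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, lhs, rhs):
    """support(lhs and rhs together) / support(lhs)."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """confidence(lhs => rhs) / support(rhs); 1 means lhs and rhs are independent."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical transactions.
TRANSACTIONS = [
    {"coffee", "sugar", "milk"},
    {"coffee", "sugar", "milk"},
    {"coffee", "sugar"},
    {"milk", "bread"},
    {"bread"},
]
LHS = {"coffee", "sugar"}
RHS = {"milk"}

SUP = support(TRANSACTIONS, LHS | RHS)   # 2 of 5 transactions
CONF = confidence(TRANSACTIONS, LHS, RHS)  # 2 of the 3 with coffee and sugar
LIFT = lift(TRANSACTIONS, LHS, RHS)      # confidence / (3 of 5 with milk)

print(round(SUP, 3), round(CONF, 3), round(LIFT, 3))
```

Here the lift is slightly above 1, so in this toy data buying coffee and sugar makes milk a little more likely than its baseline rate.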

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !