Reporting

is there a way to create a new variable within a stats command?

cindygibbs_08
Communicator

Hello people!

 

I hope you are doing well... Sometimes when we work with unstructured data the only way to visualize  correlations is using the stats command, and only at this point we can " give some structure" to the data. There are some instances in which I find myself in need of creating a new variable  using the current information in the stats command

Lets suppose I have this stats and it gives me the following  (mind you that TAG1 and TAG2 come from different events/logs/indices)

code

 

| stats list(TAG1) as TAG1 list(TAG2) as TAG2 list(HOTEL) as HOTE list(REVIEW) as REVIEW count(eval(TAG1=="X01")) as total_tag1(X01) by CLIENT

 

result

CLIENTTAG1total_tag1(X01)TAG2HOTELREVIEW
ANA

X01
X01

2X05
X09
GRANT-P
HOLLIZ
5
3
LUKEX011X05HOLLIZ5
ALEXX011X05UTI3

 

at this point this stats command is the only way for me to visualize the correlation or "interaction" between TAG1 and TAG2 due to the fact that they come from different logs so... what if I wanted to create a variable named CHAIN thats just TAG1 and TAG2 joined by a "-" and after that have a count by CLIENT of the combination "X01-X05" so that in the previous table I could see for ANA the column total"X01-X05" to be equal to 1?

Thank you everyone!


Kindly,

Cindy

Labels (1)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

When the event is indexed, it will be given a timestamp. This is either the time it was indexed or splunk will try and work out a suitable value from the data, although this is usually assisted by some configuration for each sourcetype; the assumption being that each sourcetype represents a different format of data, so can be parsed to extract interesting fields e.g. timestamps. If you just do a search on your index, you will see the timestamps associated with each event. The timestamp is available in a special field called _time. So, you could do something like the following to search both your indexes and drag the latest TAG1 value down to the following TAG2 event.

 

(index="Hotel_stay" code="TAG1" AND CLIENT=* AND review=*) OR (index="Hotel_comp_pack_adult" code="TAG2" AND CLIENT=*)
| sort CLIENT _time
| eval TAG1_time=if(code=="TAG1",_time,null)
| streamstats latest(TAG1_time) as TAG1_time by CLIENT
| stats list(*) as * by CLIENT TAG1_time

 

 

View solution in original post

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Are you guaranteed to have the same number of entries in the multi-value fields e.g. your example shows 2 for TAG1, TAG2, HOTEL and REVIEW for ANA?

cindygibbs_08
Communicator

Hello my dear @ITWhisperer  To answer your question.. No. I am not guaranteed  that for each client there would be a TAG2 but... there will always be a TAG1... TAG1 and TAG2 come from different events

lets say TAG1 comes from:

index="Hotel_stay"
| search code="TAG1" AND CLIENT=* AND review=*

 while TAG2 comes from 

index="Hotel_comp_pack_adult"
| search code="TAG2" AND CLIENT=*

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

OK so your issue is going to be knowing which TAG2 belongs to which TAG1 - is there some sort of unique identify used in both events, e.g. booking reference number?

0 Karma

cindygibbs_08
Communicator

@ITWhisperer  Hello! my dear..No, the only common value would be CLIENT 😞 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Are you bothered about which TAG2 is associated with which TAG1? For example:

CLIENTTAG1total_tag1(X01)TAG2HOTELREVIEW
ANA

X01
X01
X01

3X05
X09
GRANT-P
HOLLIZ
5
3

Does X05 belong with the first X01 or the second and does X09 belong with the second or third X01? Or is it sufficient to know that there are more TAG1 than TAG2 even if you can't tell which belongs to which?

Another thing to consider is timestamps, will a TAG2 always be associated with the immediately prior TAG1 or could you get two TAG1s followed by 2 TAG2s?

cindygibbs_08
Communicator

Hello @ITWhisperer  thank you so much for your help... Yes I am interested in correlating the TAG1 with the corresponding TAG2 .. for this particular case TAG2  always comes after TAG1 , so I believe a time stamp will be usefull in this case...but where can I get these timestamps?.... thank you so much my dear

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

When the event is indexed, it will be given a timestamp. This is either the time it was indexed or splunk will try and work out a suitable value from the data, although this is usually assisted by some configuration for each sourcetype; the assumption being that each sourcetype represents a different format of data, so can be parsed to extract interesting fields e.g. timestamps. If you just do a search on your index, you will see the timestamps associated with each event. The timestamp is available in a special field called _time. So, you could do something like the following to search both your indexes and drag the latest TAG1 value down to the following TAG2 event.

 

(index="Hotel_stay" code="TAG1" AND CLIENT=* AND review=*) OR (index="Hotel_comp_pack_adult" code="TAG2" AND CLIENT=*)
| sort CLIENT _time
| eval TAG1_time=if(code=="TAG1",_time,null)
| streamstats latest(TAG1_time) as TAG1_time by CLIENT
| stats list(*) as * by CLIENT TAG1_time

 

 

0 Karma

manjunathmeti
SplunkTrust
SplunkTrust

hi @cindygibbs_08,
You zip the values of TAG1 and TAG2, then count occurrences of "X01-X05" in the zipped field.

 

| stats list(TAG1) as TAG1 list(TAG2) as TAG2 list(HOTEL) as HOTE list(REVIEW) as REVIEW count(eval(TAG1=="X01")) as total_tag1(X01) by CLIENT
| eval zip=mvzip(TAG1, TAG2, "-") 
| rex field=zip max_match=0 "(?<matchstr>X01-X05)" 
| eval count=mvcount(matchstr)

 

If this reply helps you, a like would be appreciated.

cindygibbs_08
Communicator

Hello @manjunathmeti  Thank you so much for reaching out to me and for helping me! Your code TRULLY helping me with the case of "X01-X05" but now... I would like to also use it to identify all other combinations.... specially now "X01- " that means a CX who has TAG1 = X01 but did not have a TAG2, in those cases your code return an empty value... is there a way yo modify your code to also work when TAG2 is empty? Other than that it works for combinations where "TAG1 and TAG2" are not empty.. it just turns out that the rest of combinations will be of interest for my me .. thank you so much for helping me out THANKS A LOT

0 Karma

cindygibbs_08
Communicator

@manjunathmeti  I am so sorry for bothering agai...but I just noticed that  mvzip will only that the first values... but often times a CLIENT can have multiple combination... but mvzip will only return the firts one (it seems) 

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...