Hello people!
I hope you are doing well... Sometimes when we work with unstructured data the only way to visualize correlations is using the stats command, and only at this point we can " give some structure" to the data. There are some instances in which I find myself in need of creating a new variable using the current information in the stats command
Lets suppose I have this stats and it gives me the following (mind you that TAG1 and TAG2 come from different events/logs/indices)
code
| stats list(TAG1) as TAG1 list(TAG2) as TAG2 list(HOTEL) as HOTE list(REVIEW) as REVIEW count(eval(TAG1=="X01")) as total_tag1(X01) by CLIENT
result
CLIENT | TAG1 | total_tag1(X01) | TAG2 | HOTEL | REVIEW |
ANA | X01 | 2 | X05 X09 | GRANT-P HOLLIZ | 5 3 |
LUKE | X01 | 1 | X05 | HOLLIZ | 5 |
ALEX | X01 | 1 | X05 | UTI | 3 |
at this point this stats command is the only way for me to visualize the correlation or "interaction" between TAG1 and TAG2 due to the fact that they come from different logs so... what if I wanted to create a variable named CHAIN thats just TAG1 and TAG2 joined by a "-" and after that have a count by CLIENT of the combination "X01-X05" so that in the previous table I could see for ANA the column total"X01-X05" to be equal to 1?
Thank you everyone!
Kindly,
Cindy
When the event is indexed, it will be given a timestamp. This is either the time it was indexed or splunk will try and work out a suitable value from the data, although this is usually assisted by some configuration for each sourcetype; the assumption being that each sourcetype represents a different format of data, so can be parsed to extract interesting fields e.g. timestamps. If you just do a search on your index, you will see the timestamps associated with each event. The timestamp is available in a special field called _time. So, you could do something like the following to search both your indexes and drag the latest TAG1 value down to the following TAG2 event.
(index="Hotel_stay" code="TAG1" AND CLIENT=* AND review=*) OR (index="Hotel_comp_pack_adult" code="TAG2" AND CLIENT=*)
| sort CLIENT _time
| eval TAG1_time=if(code=="TAG1",_time,null)
| streamstats latest(TAG1_time) as TAG1_time by CLIENT
| stats list(*) as * by CLIENT TAG1_time
Are you guaranteed to have the same number of entries in the multi-value fields e.g. your example shows 2 for TAG1, TAG2, HOTEL and REVIEW for ANA?
Hello my dear @ITWhisperer To answer your question.. No. I am not guaranteed that for each client there would be a TAG2 but... there will always be a TAG1... TAG1 and TAG2 come from different events
lets say TAG1 comes from:
index="Hotel_stay"
| search code="TAG1" AND CLIENT=* AND review=*
while TAG2 comes from
index="Hotel_comp_pack_adult"
| search code="TAG2" AND CLIENT=*
OK so your issue is going to be knowing which TAG2 belongs to which TAG1 - is there some sort of unique identify used in both events, e.g. booking reference number?
@ITWhisperer Hello! my dear..No, the only common value would be CLIENT 😞
Are you bothered about which TAG2 is associated with which TAG1? For example:
CLIENT | TAG1 | total_tag1(X01) | TAG2 | HOTEL | REVIEW |
ANA | X01 | 3 | X05 X09 | GRANT-P HOLLIZ | 5 3 |
Does X05 belong with the first X01 or the second and does X09 belong with the second or third X01? Or is it sufficient to know that there are more TAG1 than TAG2 even if you can't tell which belongs to which?
Another thing to consider is timestamps, will a TAG2 always be associated with the immediately prior TAG1 or could you get two TAG1s followed by 2 TAG2s?
Hello @ITWhisperer thank you so much for your help... Yes I am interested in correlating the TAG1 with the corresponding TAG2 .. for this particular case TAG2 always comes after TAG1 , so I believe a time stamp will be usefull in this case...but where can I get these timestamps?.... thank you so much my dear
When the event is indexed, it will be given a timestamp. This is either the time it was indexed or splunk will try and work out a suitable value from the data, although this is usually assisted by some configuration for each sourcetype; the assumption being that each sourcetype represents a different format of data, so can be parsed to extract interesting fields e.g. timestamps. If you just do a search on your index, you will see the timestamps associated with each event. The timestamp is available in a special field called _time. So, you could do something like the following to search both your indexes and drag the latest TAG1 value down to the following TAG2 event.
(index="Hotel_stay" code="TAG1" AND CLIENT=* AND review=*) OR (index="Hotel_comp_pack_adult" code="TAG2" AND CLIENT=*)
| sort CLIENT _time
| eval TAG1_time=if(code=="TAG1",_time,null)
| streamstats latest(TAG1_time) as TAG1_time by CLIENT
| stats list(*) as * by CLIENT TAG1_time
hi @cindygibbs_08,
You zip the values of TAG1 and TAG2, then count occurrences of "X01-X05" in the zipped field.
| stats list(TAG1) as TAG1 list(TAG2) as TAG2 list(HOTEL) as HOTE list(REVIEW) as REVIEW count(eval(TAG1=="X01")) as total_tag1(X01) by CLIENT
| eval zip=mvzip(TAG1, TAG2, "-")
| rex field=zip max_match=0 "(?<matchstr>X01-X05)"
| eval count=mvcount(matchstr)
If this reply helps you, a like would be appreciated.
Hello @manjunathmeti Thank you so much for reaching out to me and for helping me! Your code TRULLY helping me with the case of "X01-X05" but now... I would like to also use it to identify all other combinations.... specially now "X01- " that means a CX who has TAG1 = X01 but did not have a TAG2, in those cases your code return an empty value... is there a way yo modify your code to also work when TAG2 is empty? Other than that it works for combinations where "TAG1 and TAG2" are not empty.. it just turns out that the rest of combinations will be of interest for my me .. thank you so much for helping me out THANKS A LOT
@manjunathmeti I am so sorry for bothering agai...but I just noticed that mvzip will only that the first values... but often times a CLIENT can have multiple combination... but mvzip will only return the firts one (it seems)