Solved: Re: is there a way to create a new variable within...

cindygibbs_08 · ‎04-25-2021

Hello people!

I hope you are doing well... Sometimes when we work with unstructured data the only way to visualize correlations is using the stats command, and only at this point we can " give some structure" to the data. There are some instances in which I find myself in need of creating a new variable using the current information in the stats command

Lets suppose I have this stats and it gives me the following (mind you that TAG1 and TAG2 come from different events/logs/indices)

code

| stats list(TAG1) as TAG1 list(TAG2) as TAG2 list(HOTEL) as HOTE list(REVIEW) as REVIEW count(eval(TAG1=="X01")) as total_tag1(X01) by CLIENT

result

CLIENT	TAG1	total_tag1(X01)	TAG2	HOTEL	REVIEW
ANA	X01 X01	2	X05 X09	GRANT-P HOLLIZ	5 3
LUKE	X01	1	X05	HOLLIZ	5
ALEX	X01	1	X05	UTI	3

at this point this stats command is the only way for me to visualize the correlation or "interaction" between TAG1 and TAG2 due to the fact that they come from different logs so... what if I wanted to create a variable named CHAIN thats just TAG1 and TAG2 joined by a "-" and after that have a count by CLIENT of the combination "X01-X05" so that in the previous table I could see for ANA the column total"X01-X05" to be equal to 1?

Thank you everyone!

Kindly,

Cindy

ITWhisperer · ‎04-26-2021

When the event is indexed, it will be given a timestamp. This is either the time it was indexed or splunk will try and work out a suitable value from the data, although this is usually assisted by some configuration for each sourcetype; the assumption being that each sourcetype represents a different format of data, so can be parsed to extract interesting fields e.g. timestamps. If you just do a search on your index, you will see the timestamps associated with each event. The timestamp is available in a special field called _time. So, you could do something like the following to search both your indexes and drag the latest TAG1 value down to the following TAG2 event.

(index="Hotel_stay" code="TAG1" AND CLIENT=* AND review=*) OR (index="Hotel_comp_pack_adult" code="TAG2" AND CLIENT=*)
| sort CLIENT _time
| eval TAG1_time=if(code=="TAG1",_time,null)
| streamstats latest(TAG1_time) as TAG1_time by CLIENT
| stats list(*) as * by CLIENT TAG1_time

View solution in original post

ITWhisperer · ‎04-26-2021

Are you guaranteed to have the same number of entries in the multi-value fields e.g. your example shows 2 for TAG1, TAG2, HOTEL and REVIEW for ANA?

cindygibbs_08 · ‎04-26-2021

Hello my dear @ITWhisperer To answer your question.. No. I am not guaranteed that for each client there would be a TAG2 but... there will always be a TAG1... TAG1 and TAG2 come from different events

lets say TAG1 comes from:

index="Hotel_stay"
| search code="TAG1" AND CLIENT=* AND review=*

while TAG2 comes from

index="Hotel_comp_pack_adult"
| search code="TAG2" AND CLIENT=*

ITWhisperer · ‎04-26-2021

OK so your issue is going to be knowing which TAG2 belongs to which TAG1 - is there some sort of unique identify used in both events, e.g. booking reference number?

cindygibbs_08 · ‎04-26-2021

@ITWhisperer Hello! my dear..No, the only common value would be CLIENT 😞

ITWhisperer · ‎04-26-2021

Are you bothered about which TAG2 is associated with which TAG1? For example:

CLIENT	TAG1	total_tag1(X01)	TAG2	HOTEL	REVIEW
ANA	X01 X01 X01	3	X05 X09	GRANT-P HOLLIZ	5 3

Does X05 belong with the first X01 or the second and does X09 belong with the second or third X01? Or is it sufficient to know that there are more TAG1 than TAG2 even if you can't tell which belongs to which?

Another thing to consider is timestamps, will a TAG2 always be associated with the immediately prior TAG1 or could you get two TAG1s followed by 2 TAG2s?

cindygibbs_08 · ‎04-26-2021

Hello @ITWhisperer thank you so much for your help... Yes I am interested in correlating the TAG1 with the corresponding TAG2 .. for this particular case TAG2 always comes after TAG1 , so I believe a time stamp will be usefull in this case...but where can I get these timestamps?.... thank you so much my dear

ITWhisperer · ‎04-26-2021

When the event is indexed, it will be given a timestamp. This is either the time it was indexed or splunk will try and work out a suitable value from the data, although this is usually assisted by some configuration for each sourcetype; the assumption being that each sourcetype represents a different format of data, so can be parsed to extract interesting fields e.g. timestamps. If you just do a search on your index, you will see the timestamps associated with each event. The timestamp is available in a special field called _time. So, you could do something like the following to search both your indexes and drag the latest TAG1 value down to the following TAG2 event.

(index="Hotel_stay" code="TAG1" AND CLIENT=* AND review=*) OR (index="Hotel_comp_pack_adult" code="TAG2" AND CLIENT=*)
| sort CLIENT _time
| eval TAG1_time=if(code=="TAG1",_time,null)
| streamstats latest(TAG1_time) as TAG1_time by CLIENT
| stats list(*) as * by CLIENT TAG1_time

manjunathmeti · ‎04-25-2021

hi @cindygibbs_08,
You zip the values of TAG1 and TAG2, then count occurrences of "X01-X05" in the zipped field.

| stats list(TAG1) as TAG1 list(TAG2) as TAG2 list(HOTEL) as HOTE list(REVIEW) as REVIEW count(eval(TAG1=="X01")) as total_tag1(X01) by CLIENT
| eval zip=mvzip(TAG1, TAG2, "-") 
| rex field=zip max_match=0 "(?<matchstr>X01-X05)" 
| eval count=mvcount(matchstr)

If this reply helps you, a like would be appreciated.

cindygibbs_08 · ‎04-26-2021

Hello @manjunathmeti Thank you so much for reaching out to me and for helping me! Your code TRULLY helping me with the case of "X01-X05" but now... I would like to also use it to identify all other combinations.... specially now "X01- " that means a CX who has TAG1 = X01 but did not have a TAG2, in those cases your code return an empty value... is there a way yo modify your code to also work when TAG2 is empty? Other than that it works for combinations where "TAG1 and TAG2" are not empty.. it just turns out that the rest of combinations will be of interest for my me .. thank you so much for helping me out THANKS A LOT

cindygibbs_08 · ‎04-26-2021

@manjunathmeti I am so sorry for bothering agai...but I just noticed that mvzip will only that the first values... but often times a CLIENT can have multiple combination... but mvzip will only return the firts one (it seems)

is there a way to create a new variable within a stats command?

summary indexing

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Join the Conversation

is there a way to create a new variable within a stats command?

summary indexing

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits