Solved: make multivalue field from job ID matching multiple customer IDs

jonzatlmi · ‎09-29-2020

In events that we extract CID and JID from, I would like to output every JID that interacted with multiple CIDs.

JID is a job ID
CID is a customer ID

I want to know where the same job interacted with more than one customer, and would like to output it in a MV field. I achieve roughly what I want with this:

index="index-34" host="jobserver12-*" "Concurrent:" cmd="invite"
| eval _raw=log | rex "cid:\s(?<cid>\d+)\s" | rex "jid:\s(?<jid>\d+)\s"
| stats count values(cid) by jid

But I want to know how to do this directly. I tried mvcombine, but it looks like the fields have to have exactly the same values. Both JID and CID vary.

Thanks!

Richfez · ‎09-29-2020

I am not sure why you would think this isn't "doing it directly". Perhaps it would help if you described what you think that would look like, and why the stats way feels roundabout?

There may be a few tweaks to make it better, though.

index="index-34" host="jobserver12-*" "Concurrent:" cmd="invite"
| rex field=log "cid:\s+(?<cid>\d+)\s" | rex field=log "jid:\s+(?<jid>\d+)\s"
| stats dc(cid) values(cid) by jid

Try that one. It:

a) uses the `field=` parameter of `rex` instead of changing _raw.

b) also does a `dc(cid)` for a distinct count of the values of cid, per jid.
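
To narrow the output to only the job IDs that touched more than one customer, one option is to give the distinct count a name and filter on it. Here is a run-anywhere sketch; the jid and cid values are made up and stand in for the fields extracted by rex, and the names n, cid_count and cids are just for illustration:

```spl
| makeresults count=4
| streamstats count as n
| eval jid=case(n<=2, "100", n=3, "200", n=4, "300")
| eval cid=case(n=1, "7", n=2, "8", n=3, "7", n=4, "9")
| stats dc(cid) as cid_count values(cid) as cids by jid
| where cid_count > 1
```

Only jid 100 saw two different cid values in this made-up data, so it should be the only row left, with cids as a multivalue field. On the real search, the last two lines would replace the stats line.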

Happy Splunking! And be sure to let us know if that helps or not.

-Rich

jonzatlmi · ‎09-29-2020

To me, it feels like a non-direct way of getting the desired result because I couldn't specifically say that I want the results of these fields to be combined into a multivalue-style concatenation. I was hoping to learn something (which I did from your reply, thank you very much) that I could use to understand multivalue fields better. But in the end, these are the exact results I think I'm looking for, so I will pick my battles, particularly the ones where I succeed, I guess.

Thanks again!

Richfez · ‎09-29-2020

Ah yes, understood.

MV fields are sort of weird little things in Splunk-land. I love 'em, they're really useful, but they sometimes behave in a way common sense would say they shouldn't. On the other hand, I think it's mostly sensible. 🙂

So! Maybe a tiny pointer to help with understanding, and maybe how to play with some mv fields.

One thing we find ourselves doing a lot of, in order to create a run-anywhere search for examples here in Answers and in Slack, is something like the following.

| makeresults 
| eval dates="2020-04-18:2 2020-04-24:5 2020-05-02:9 2020-05-09:7 2020-05-16:11 2020-05-23:8 2020-05-30:11 2020-06-06:9 2020-06-13:14" 
| makemv delim=" " dates 
| mvexpand dates 
| makemv delim=":" dates 
| eval date=mvindex(dates,0), count=mvindex(dates,1) 
| eval _time = strptime(date, "%Y-%m-%d")

The explanation may give you a kick start here.

Do note, you can (and I recommend that you do!) run this by starting with the first line, then adding the second and running it again, and so on. That way you can see each line and what it does to the previous results.

makeresults generates an empty event.

The eval just creates a simple field called "dates" with that big string in it.

We then make the field "dates" into a multi-valued field by splitting it on spaces.

Now that it's an mv field, we can 'mvexpand' it into separate events so now I have a series of precise dates in the events.

Now comes the fun! Now we make the new split-up field into a mv field by telling it to split it on the colon.

We don't actually want to expand that again, because then we'd have dates and "counts" on separate events. Instead, we use mvindex() to pull out the first mv-value of the mv-field dates and call it 'date', and again pull out a count from the second mv-value of the mv-field.

Then last I set _time to be that date we pulled out.
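
The same splitting can also be done with eval functions instead of the makemv command, if that reads more naturally to you. A minimal sketch, assuming a shortened version of the dates string above; split(), mvcount() and tonumber() are standard eval functions, and the field name how_many is only for illustration:

```spl
| makeresults
| eval dates="2020-04-18:2 2020-04-24:5 2020-05-02:9"
| eval dates=split(dates, " ")
| eval how_many=mvcount(dates)
| mvexpand dates
| eval date=mvindex(split(dates, ":"), 0), count=tonumber(mvindex(split(dates, ":"), 1))
| eval _time = strptime(date, "%Y-%m-%d")
```

Before the mvexpand, how_many counts the values held in the single event (three here); after it, each of the three events carries one date:count pair.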

So, that's just a bit of playing around with MV stuff, thought you might find it fun or at least useful.

For what it's worth, that search was a run-anywhere example I created to illustrate a bug: if the Splunk command "predict" is fed data that is empty on the "left side" (i.e. at the earlier points in time), it goofs up the graphs.

It's pretty funny, and it happens if you trendline some data, smoothing it over 5 periods or something (sma5), before predicting it...

You can see it for yourself. Take the whole search below

| makeresults 
| eval dates="2020-04-18:2 2020-04-24:5 2020-05-02:9 2020-05-09:7 2020-05-16:11 2020-05-23:8 2020-05-30:11 2020-06-06:9 2020-06-13:14" 
| makemv delim=" " dates 
| mvexpand dates 
| makemv delim=":" dates 
| eval date=mvindex(dates,0), count=mvindex(dates,1) 
| eval _time = strptime(date, "%Y-%m-%d") 
| timechart sum(count) as count span=1w 
| trendline sma5(count) as smooth_count 
| predict smooth_count
| fields - smooth_count count

Then change to the Visualization tab, then switch it to a line chart. Follow the prediction line carefully...

Super fun. 🙂

jonzatlmi · ‎10-03-2020

Thank you for that, complex enough to dig into, and straightforward enough to figure out.

Is it right to glean that MV is mostly a world where you're working with single events, not one where you're combining fields from separate events?

In your example we are being playful with the `dates` field that we created, putting it through MV and splitting it and reassembling it in various ways. But this is a per-event result, and that's what makes me think it's not something that goes across multiple events.

Thanks again, very much!

Richfez · ‎10-03-2020

Yep, exactly right - multi-value fields are those that are in a single event. By definition, when you have more than one event, they're just separate values for the field, not multiple values inside one field's content.

Though you really can combine other events together, with a field being made multi-value too. Oh, now I think I've done it and confused it all back up again.

What you are doing is taking events that are otherwise the same, smashing them together on the field that isn't the same, and turning that one differing field into an mv field on the new single event.

Take a look at this:

| makeresults count=2
| streamstats count
| eval name = "Myrtle"
| eval occupation = "Haberdasher"
| eval favorite_foods = if(count=1, "Ice Cream", "Pizza")

Myrtle the Haberdasher likes both Ice Cream and Pizza. Two events, one for each, sort of like if you flattened a normalized database by doing a join in your select off the "people" and "favorite_foods" tables.

If you try to mvcombine favorite_foods, you'll find you can't - and the reason IMO is very enlightening. Here's the non-working try:

| makeresults count=2
| streamstats count
| eval name = "Myrtle"
| eval occupation = "Haberdasher"
| eval favorite_foods = if(count=1, "Ice Cream", "Pizza")
| mvcombine delim="," favorite_foods

This still leaves you with two events.

And that's because not all the fields are the same yet - you'll see I left the streamstats "count" field in there, and that keeps Splunk from putting those events together because it doesn't know what to do about that "other" different field. "Favorite_foods" it could make into an mv, but count?

(And unfortunately, you can't mvcombine on two fields at once. Argh, I should check for an idea to make this better... OK there wasn't so I made one, feel free to toss a vote or two onto it https://ideas.splunk.com/ideas/EID-I-595).

So if you add in a 'fields - count' it'll now work:

| makeresults count=2
| streamstats count
| eval name = "Myrtle"
| eval occupation = "Haberdasher"
| eval favorite_foods = if(count=1, "Ice Cream", "Pizza")
| fields - count
| mvcombine delim="," favorite_foods

You'll note I made the two events in an entirely different way than in the previous example, using streamstats so I could conditionally make favorite_foods be one of two things. I did this so that we had no MV-style stuff *anywhere* above that mvcombine at the bottom. I figured it would be easier to understand if I hadn't already used a bunch of mv-stuff to build the events in the first place, only to use more mv-stuff to smash them back together; who knows what tomfoolery I may have done in there? Or, in the words of a famous moose, to prove that there was "nothing up my sleeve" 🙂

Lastly, you can accomplish this as well by using stats.

| makeresults count=2
| streamstats count
| eval name = "Myrtle"
| eval occupation = "Haberdasher"
| eval favorite_foods = if(count=1, "Ice Cream", "Pizza")
| fields - count
| stats values(favorite_foods) as favorite_foods by name, occupation

And in fact, with stats you can totally throw away that "count" field by just ignoring it.

| makeresults count=2
| streamstats count
| eval name = "Myrtle"
| eval occupation = "Haberdasher"
| eval favorite_foods = if(count=1, "Ice Cream", "Pizza")
| stats values(favorite_foods) as favorite_foods by name, occupation

So, stats has it better in many ways.

But the big drawback to stats is that everything you want to include has to be mentioned either in the values() or in the 'by' clause.

Of course, when you need to mv more than one field, stats is the way to do it, because it can handle multiples. Or even all of them, like in this example.

| makeresults count=2
| streamstats count
| eval name = if(count=1, "Myrtle", "Hyacinth")
| eval occupation = if(count=1, "Haberdasher", "Homemaker")
| eval favorite_foods = if(count=1, "Ice Cream", "Pizza")
| stats values(*) as *
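
One thing to keep in mind with values(): it de-duplicates and sorts each field's values on its own, so the positions in the resulting mv fields no longer line up per person. If the pairing matters, list() may be closer to what you want, since it keeps the values in the order the events arrive, duplicates included. A sketch on the same two made-up events:

```spl
| makeresults count=2
| streamstats count
| eval name = if(count=1, "Myrtle", "Hyacinth")
| eval occupation = if(count=1, "Haberdasher", "Homemaker")
| eval favorite_foods = if(count=1, "Ice Cream", "Pizza")
| fields - count
| stats list(*) as *
```

Here name should come back in event order (Myrtle, then Hyacinth) rather than alphabetically, so the first value of each field belongs to the same person.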

Anyhow, happy Splunking, and have fun!

-Rich
