Splunk Search

Count is getting mismatched for the fields after using the mvzip, mvexpand and mvindex commands

SureshkumarD
Explorer

Hi Team,

I need to extract the values of the fields where it has multiple values. So, I used commands like mvzip, mvexpand, mvindex and eval. However the output of my spl query is not matching with the count of the interesting field. Could you please assist on this? Here is my SPL query and output screenshots below.

index="xxx" sourcetype="xxx" source=xxx events{}.application="xxx" userExperienceScore=FRUSTRATED
| rename userActions{}.application as Application, userActions{}.name as Action, userActions{}.targetUrl as Target_URL, userActions{}.duration as Duration, userActions{}.type as User_Action_Type, userActions{}.apdexCategory as useractions_experience_score
| eval x=mvzip(mvzip(Application,Action),Target_URL), y=mvzip(mvzip(Duration,User_Action_Type),useractions_experience_score)
| mvexpand x
| mvexpand y
| dedup x
| eval x=split(x,","), y=split(y,",")
| eval Application=mvindex(x,0), Action=mvindex(x,1), Target_URL=mvindex(x,2), Duration=mvindex(y,0), User_Action_Type=mvindex(y,1), useractions_experience_score=mvindex(y,2)
| eval Duration_in_Mins=Duration/60000
| eval Duration_in_Mins=round(Duration_in_Mins,2)
| table _time, Application, Action, Target_URL,Duration_in_Mins,User_Action_Type,useractions_experience_score
| sort - _time
| search useractions_experience_score=FRUSTRATED
| search Application="*"
| search Action="*"


Query Output with the statistics count:

SureshkumarD_2-1712983152463.png

 

Expected Count:

SureshkumarD_1-1712982996608.png

 

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

mvzip, mvexpand and mvindex are simply wrong tools for your data structure. (Well, mvexpand will be needed, but only after you properly handle the array in your data.)  As everybody in this post has pointed out: You need to post sample or precise mock data to reveal the structure. (In text, never screenshot.) This is extremely important when asking question about data analytics in a forum.  When you force volunteers to read your mind, not only will they get FRUSTRATED, but even if they are willing, most of the time their mind reading will be incorrect.

This said, based on your code, I kind of picture together a rough structure of your data.  I will use JSON to illustrate.  Something like

 

{
  "userActions": [
    {
      "application": "app1",
      "name": "action1",
      "targetUrl": "url1",
      "duration": 1234,
      "type": "actiontype1",
      "apdexCategory": "SATISFIED"
    },
    {
      "application": "app1",
      "name": "action2",
      "targetUrl": "url1",
      "duration": 2345,
      "type": "actiontype1",
      "apdexCategory": "DISATISFIED"
    },
    {
      "application": "app1",
      "name": "action3",
      "targetUrl": "url2",
      "duration": 3456,
      "type": "actiontype2",
      "apdexCategory": "FRUSTRATED"
    }
  ],
  "userExperienceScore": "FRUSTRATED",
  "events": [
    {"application": "xxx", "irrelevant": "aaa"},
    {"application": "yyy", "irrelevant": "bbb"}
  ]
}

 

 

Your event could be in JSON or it could be in XML, but it contains at least two arrays, events[] and userActions[].  Is this correct?  The array events[] is not what frustrates you because its elements and components are no longer needed after initial search.  Your end goal from the above three elements of userActions[] is to pick out

 

    {
      "application": "app1",
      "name": "action3",
      "targetUrl": "url2",
      "duration": 3456,
      "type": "actiontype2",
      "apdexCategory": "FRUSTRATED"
    }

 

and display it in this format:

_timeApplicationActionTarget_URLDuration_in_MinsUser_Action_Typeuseractions_experience_score
2024-04-18 22:45:22app1action3url20.06actiontype2FRUSTRATED

If the above looks close, the first thing you need to do is to forget all about Splunk's flattened fields userActions{}.*; in fact, discard them all.  Use spath to reach elements of this array, then mvexpand over the elements, no funny mvzip business.  After that, everything becomes trivial.

Using my speculated data, I can reconstruct your SPL into the following to obtain my illustrated output:

 

 

index="xxx" sourcetype="xxx" source=xxx
 events{}.application="xxx" userExperienceScore=FRUSTRATED

| fields - userActions{}.*
| spath path=userActions{}
| mvexpand userActions{}
| spath input=userActions{}
| dedup application name targetUrl
| search apdexCategory = FRUSTRATED application = * name = *
| sort - _time
| rename application as Application, name as Action, targetUrl as Target_URL,
  type as User_Action_Type, apdexCategory as useractions_experience_score
| eval Duration_in_Mins = round(duration / 60000, 2)
| table _time, Application, Action, Target_URL,Duration_in_Mins,User_Action_Type,useractions_experience_score

 

Hope this helps.

Here is an emulation of my speculated data.  Play with it and compare with real data

| makeresults
| eval _raw = "{
  \"userActions\": [
    {
      \"application\": \"app1\",
      \"name\": \"action1\",
      \"targetUrl\": \"url1\",
      \"duration\": 1234,
      \"type\": \"actiontype1\",
      \"apdexCategory\": \"SATISFIED\"
    },
    {
      \"application\": \"app1\",
      \"name\": \"action2\",
      \"targetUrl\": \"url1\",
      \"duration\": 2345,
      \"type\": \"actiontype1\",
      \"apdexCategory\": \"DISATISFIED\"
    },
    {
      \"application\": \"app1\",
      \"name\": \"action3\",
      \"targetUrl\": \"url2\",
      \"duration\": 3456,
      \"type\": \"actiontype2\",
      \"apdexCategory\": \"FRUSTRATED\"
    }
  ],
  \"userExperienceScore\": \"FRUSTRATED\",
  \"events\": [
    {\"application\": \"xxx\", \"irrelevant\": \"aaa\"},
    {\"application\": \"yyy\", \"irrelevant\": \"bbb\"}
  ]
}"
| spath
``` data speculation for
index="xxx" sourcetype="xxx" source=xxx events{}.application="xxx" userExperienceScore=FRUSTRATED
```
Tags (1)
0 Karma

SureshkumarD
Explorer

Hi @yuanliu ,

Thanks much for your response. I tried the SPL query which you shared. If I dedup only application, name and target url, then I could see duplicates in _time field (Refer screenshot 1) and here is the real data's timestamp in screenshot 2 which inturn gives us the incorrect count between the query output (count 😎 and the real data logs (count 12)

Screenshot1 

SureshkumarD_0-1713562402473.png

 

Screenshot2:

SureshkumarD_1-1713562703227.png

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

You are avoiding answer the question everybody asks: What is your actual data structure?  Is it close to what I speculated?  Further, you have never illustrated what is expected output.  So, the two screenshots with only timestamps means nothing to volunteers.  How can we help further?

Specifically,  WHY should the output NOT have multiple timestamps after deduping application, name and target url?  In your original SPL, the only dedup is on x, which is Application+Action+Target_URL.  How is this different?  Anything after mvexpand in my search is based on my reading of your intent based only on that complex SPL sample.  Instead of making volunteers to read your mind, how about expressing the actual dataset, the result you are trying to get from the data, and the logic to derive the desired result and dataset in plain language (without SPL)?

0 Karma

SureshkumarD
Explorer

Hi @yuanliu ,

Here are my responses below. Please let me know if I missed anything.

1) What is your actual data structure? - It is in JSON format.

userActions: [ [-]
{ [-]
apdexCategory: FRUSTRATED
application: xxx
cdnBusyTime: null
cdnResources: 0
cumulativeLayoutShift: 0.0006
customErrorCount: 0
dateProperties: [ [+]
]
documentInteractiveTime: 13175
domCompleteTime: 15421
domContentLoadedTime: 14261
domain: xxx
doubleProperties: [ [+]
]
duration: 15430
endTime: 1714043044710
firstInputDelay: 1
firstPartyBusyTime: null
firstPartyResources: 0
frontendTime: 14578
internalApplicationId: APPLICATION-B9BADE8D75A35E32
internalKeyUserActionId: APPLICATION_METHOD-E3EF5284923A6BA3
javascriptErrorCount: 0
keyUserAction: true
largestContentfulPaint: 15423
loadEventEnd: 15430
loadEventStart: 15430
longProperties: [ [+]
]
matchingConversionGoals: [ [+]
]
name: loading of page /home
navigationStart: 1714043029280
networkTime: 804
requestErrorCount: 0
requestStart: 630
responseEnd: 852
responseStart: 678
serverTime: 48
speedIndex: 13866
startTime: 1714043029280
stringProperties: [ [+]
]
targetUrl: xxx
thirdPartyBusyTime: null
thirdPartyResources: 0
totalBlockingTime: null
type: Load
userActionPropertyCount: 0
visuallyCompleteTime: 15416
}
]



2) Is it close to what I speculated? Further, you have never illustrated what is expected output. So, the two screenshots with only timestamps means nothing to volunteers. How can we help further? -Yes correct, it is relevant to the data what you predicted and here is the expected output

_time                                      Application      Action                   Target_URL          Duration       User_Action_Type
2024-04-25 07:39:53     xxx                      loading /home    www.abc.com    0.26                Load
2024-04-25 06:25:50     abc                     loading /wcc/ui/ www.xyz.com   3.00                 Load
2024-04-24 19:00:57     xyz                     keypress  policy   www.bdc.com  3.00                 Load
2024-04-24 17:05:11     abc                    loading /home      www.xyz.com   0.53                 Xhr
2024-04-24 10:14:47     bcd                    loading /prod        www.rst.com     0.02                 Load


3) Specifically, WHY should the output NOT have multiple timestamps after deduping application, name and target url? - I could see multiple timesteps after dedup the application, name and target url. Here is the screenshot below and see the yellow highlighted.

SureshkumarD_1-1714053761416.png

 

4) In your original SPL, the only dedup is on x, which is Application+Action+Target_URL. How is this different?. My bad, here is the below correct SPL query that I tried.

index="dynatrace" sourcetype="dynatrace:usersession" source=saas_prod events{}.application="participantmanagement.thehartford.com" userExperienceScore=FRUSTRATED
| rename userActions{}.application as Application, userActions{}.name as Action, userActions{}.targetUrl as Target_URL, userActions{}.duration as Duration, userActions{}.type as User_Action_Type, userActions{}.apdexCategory as useractions_experience_score
| eval x=mvzip(mvzip(Application,Action),Target_URL), y=mvzip(mvzip(Duration,User_Action_Type),useractions_experience_score)
| mvexpand x
| mvexpand y
| eval x=split(x,","), y=split(y,",")
| eval Application=mvindex(x,0), Action=mvindex(x,1), Target_URL=mvindex(x,2), Duration=mvindex(y,0), User_Action_Type=mvindex(y,1), useractions_experience_score=mvindex(y,2)
| eval Duration_in_Mins=Duration/60000
| eval Duration_in_Mins=round(Duration_in_Mins,2)
| table _time, Application, Action, Target_URL,Duration_in_Mins,User_Action_Type,useractions_experience_score
| search useractions_experience_score=FRUSTRATED
| sort - _time
| dedup _time
| search Application="*"
| search Action="*"
| fields - useractions_experience_score


5) Anything after mvexpand in my search is based on my reading of your intent based only on that complex SPL sample. Instead of making volunteers to read your mind, how about expressing the actual dataset, the result you are trying to get from the data, and the logic to derive the desired result and dataset in plain language (without SPL)? Here is the actual dataset below.

4/25/24
7:39:53.000 AM
{ [-]
applicationType: WEB_APPLICATION
bounce: true
browserFamily: Microsoft Edge
browserMajorVersion: Microsoft Edge 122
browserType: Desktop Browser
clientType: Desktop Browser
connectionType: UNKNOWN
dateProperties: [ [+]
]
displayResolution: HD
doubleProperties: [ [+]
]
duration: 15430
endReason: TIMEOUT
endTime: 1714043044710
errors: [ [+]
]
events: [ [-]
{ [-]
application: xxx
internalApplicationId: APPLICATION-B9BADE8D75A35E32
name: Page change
page: /index.html
pageGroup: /index.html
startTime: 1714043029280
type: PageChange
}
{ [-]
application: xxx
internalApplicationId: APPLICATION-B9BADE8D75A35E32
name: Page change
page: /home
pageGroup: /home
pageReferrer: /index.html
pageReferrerGroup: /index.html
startTime: 1714043043405
type: PageChange
}
{ [-]
application: xxx
internalApplicationId: APPLICATION-B9BADE8D75A35E32
name: Page change
page: /employee/details
pageGroup: /employee/details
pageReferrer: /employee/coverage-list
pageReferrerGroup: /employee/coverage-list
startTime: 1714043088821
type: PageChange
}
{ [-]
application: xxx
internalApplicationId: APPLICATION-B9BADE8D75A35E32
name: Page change
page: /employee/coverage-list
pageGroup: /employee/coverage-list
pageReferrer: /employee/details
pageReferrerGroup: /employee/details
startTime: 1714043403199
type: PageChange
}
]
hasCrash: false
hasError: false
hasSessionReplay: false
internalUserId: 17140430425327CQRENIQATV2OT5DV5BTJ2UB3MQF2ALH
ip: 10.215.67.0
longProperties: [ [+]
]
matchingConversionGoals: [ [+]
]
matchingConversionGoalsCount: 0
newUser: true
numberOfRageClicks: 0
numberOfRageTaps: 0
osFamily: Windows
osVersion: Windows 10
partNumber: 0
screenHeight: 720
screenOrientation: LANDSCAPE
screenWidth: 1280
startTime: 1714043029280
stringProperties: [ [+]
]
syntheticEvents: [ [+]
]
tenantId: vhz76055
totalErrorCount: 0
totalLicenseCreditCount: 0
userActionCount: 1
userActions: [ [-]
{ [-]
apdexCategory: FRUSTRATED
application: xxx
cdnBusyTime: null
cdnResources: 0
cumulativeLayoutShift: 0.0006
customErrorCount: 0
dateProperties: [ [+]
]
documentInteractiveTime: 13175
domCompleteTime: 15421
domContentLoadedTime: 14261
domain: xxx
doubleProperties: [ [+]
]
duration: 15430
endTime: 1714043044710
firstInputDelay: 1
firstPartyBusyTime: null
firstPartyResources: 0
frontendTime: 14578
internalApplicationId: APPLICATION-B9BADE8D75A35E32
internalKeyUserActionId: APPLICATION_METHOD-E3EF5284923A6BA3
javascriptErrorCount: 0
keyUserAction: true
largestContentfulPaint: 15423
loadEventEnd: 15430
loadEventStart: 15430
longProperties: [ [+]
]
matchingConversionGoals: [ [+]
]
name: loading of page /home
navigationStart: 1714043029280
networkTime: 804
requestErrorCount: 0
requestStart: 630
responseEnd: 852
responseStart: 678
serverTime: 48
speedIndex: 13866
startTime: 1714043029280
stringProperties: [ [+]
]
targetUrl: xxx
thirdPartyBusyTime: null
thirdPartyResources: 0
totalBlockingTime: null
type: Load
userActionPropertyCount: 0
visuallyCompleteTime: 15416
}
]
userExperienceScore: FRUSTRATED
userSessionId: UEFUBRDAPRDHURTCPUKFKKPJVORTPPJA-0
userType: REAL_USER

 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

OK. First things first.

As @bowesmana already pointed out - you're using dedup. This is a very tricky command. It leaves you with just the first event having a unique value of given field (or set of fields). You lose all information about data in all other fields if they differ in multiple events.

A simple run-anywhere example

| makeresults count=10
| eval constant="b"
| eval random=random()
| eventstats values(random) as all_values
| dedup constant

Here we explicitly created a summarized field called  all_values so that we can in the end see what events were generated but as we do the dedup command we're left with just one result having just one value in the random field.

That's one thing that definitely affects your results.

Another thing is that multivalued fields can be a bit unintuitive sometimes.

Another run-anywhere example

| makeresults count=10 
| eval a=split("a,b,c,d",",")
| stats count count(a)

 Since each of generated result rows has a multivalued field a with four separate values, the overall count which returns just count of results will return 10 but count(a) which will count values in the a field will return 40 because there are 40 separate values in that field.

0 Karma

SureshkumarD
Explorer

Hi @PickleRick 

Not sure If I explained my requirements in a right way. I would like to expand the multiple values present in the each row to separate rows which is having only the "Frustated" related details like the below expected output

SureshkumarD_0-1713473183933.png

 

Expected output

URLDuration TypeStatus
www.cde.com88647LoadFrustated
www.fge.com6265LoadFrustated
www.abc.com500LoadFrustated
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Judging from the SPL, where you have two searches for FRUSTRATED, it seems like you have data where multiple userExperienceScores can exist for the same event, hence all the mvexpanding out.

As @PickleRick points out, it's quite tricky to deal with multivalue fields, particularly when you have 6 MV fields that you are zipping up into 2 pairs (x and y). I assume you are using 2 pairs as there is not a 1:1 correlation between the MV's in each of the pairs.

What I would suggest is to find a reasonably complex SINGLE (or two) event where you can exhibit the problem. This will make it much easier to diagnose the issue. Then we can help explain what is going on.

If you are able to share an example of the raw data (sanitised and preferably NOT a screen image - so we can produce a working example of a solution) that would be good.

 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

We don't know your raw data but the main question is why do you go to all this trouble of mvzipping and joining all those values into a multivalued field when next you want to do is mvexpand.

Why not just filter on raw data in the initial search?

0 Karma

bowesmana
SplunkTrust
SplunkTrust

You are deduping 'x' so you need to understand the consequences of that.

Your search is not doing any aggregations, so without knowing what combinations of Application, Action and Target_URL you have, it's impossible to know what's going on here.

These 3 lines are most likely the source of your problem

| mvexpand x
| mvexpand y
| dedup x

 

KendallW
Contributor

Hi @SureshkumarD would it be possible to provide some sample data to go with the search?

0 Karma

SureshkumarD
Explorer

Hi @KendallW @bowesmana ,

Can I share you the raw data to go with the search? Please let me know.

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...