Splunk Search

Efficiently Expanding Multiple Multi-Value Fields Using mvexpand in Splunk

uagraw01
Motivator


Hello Splunkers !!

How can I efficiently use the mvexpand command to expand multiple multi-value fields, considering its high resource consumption and expensive command? Please guide me

uagraw01_0-1750242816511.png

 

Labels (1)
Tags (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

Indeed lookups often end up with multivalue.  You need to make sure that every field to include have equal number of values.

Usually I am in favor of JSON like in @livehybrid 's suggestion, although it should not be that complex; especially, one should not compose JSON with string.  More on JSON later.  There is an even simpler approach if you can enumerate fields to include: multikv.  No mvexpand needed.  Here is how:

| eval _raw = mvappend("FunctionGroup,MsgNr,alarm_severity,area,equipment",
  mvzip(mvzip(mvzip(mvzip(FunctionGroup,MsgNr, ","), alarm_severity, ","), area), equipment, ",")
  )
| multikv forceheader=1
| fields - _raw linecount

The idea is to compose a CSV table with mvzip, then extract from this table.  If composing nested mvzip is too much, or if you cannot easily enumerate fields to include, you can add foreach to your arsenal:

| rename FunctionGroup as _raw
| eval header = "FunctionGroup"
| foreach MsgNr,alarm_severity,area,equipment
    [ eval _raw = mvzip(_raw, <<FIELD>>, ","), header = header . "," . "<<FIELD>>"]
| eval _raw = mvappend(header, _raw)
| multikv forceheader=1
| fields - _raw header linecount

 Now, back to JSON - in this use case, it is more involved than multikv.  Again, with help of foreach and provided that your Splunk version is 8.1 or later, this is a semantic way to do it:

| eval jcombo = json_object()
| eval idx = mvrange(0, mvcount(FunctionGroup))
| foreach FunctionGroup MsgNr alarm_severity area equipment
    [ eval jcombo = json_set(jcombo, "<<FIELD>>", mvindex(<<FIELD>>, idx))]
| fields - FunctionGroup MsgNr alarm_severity area equipment
| mvexpand jcombo
| fields - idx jcombo

Of course, you can also do this without foreach.

Tags (1)

livehybrid
SplunkTrust
SplunkTrust

Hi @uagraw01 

Im not sure how you ended up with this list, there may be a better way, but if you really do need to split these back out then the following snippet should work - This combined the fields into a single JSON string so you can use mvexpand once.

| foreach FunctionGroup mode=multivalue [
  |eval json=mvappend(json, "{".
    "\"FunctionGroup\":\"".mvindex(FunctionGroup,<<ITER>>)."\",".
    "\"MsgNr\":\"".mvindex(MsgNr,<<ITER>>)."\",".
    "\"alarm_severity\":\"".mvindex(alarm_severity,<<ITER>>)."\",".
    "\"area\":\"".mvindex(area,<<ITER>>)."\",".
    "\"equipment\":\"".mvindex(equipment,<<ITER>>)."\"".
    "}"
    )
 ]
 | mvexpand json | eval _raw=json | fields _raw | spath

Ive tested it as best as I can:

livehybrid_0-1750247037057.png

 

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

ITWhisperer
SplunkTrust
SplunkTrust

Perhaps go back a step - how did you get these multivalue fields in the first place? Can you separate into events prior to this? What does your raw data look like?

0 Karma

uagraw01
Motivator

hi @ITWhisperer Its hard for me to share complete events.

FYI to you  Area, zone, equipment fields are coming from the index search while other fields are coming through lookups.

0 Karma

tej57
Builder

Hey @uagraw01 ,

mvexpand command works with only one field. And if you use it multiple times, it'll cause n number of duplicate values. Here's what I used in one scenario - 

Combine all the multivalue fields using mvzip and a delimeter. In your case, it would be eqvivalent to - 

| eval combined_multivalue_field = mvzip(mvzip(FunctionGroup, MsgNr, "|"),mvzip(alarm_severity,area,"|"),"|")

This will give you one single column with all the field values separated by pipe (|) delimeter. You can then use mvexpand on combined_multivalue_field

Then if you want to utilize each individual values, you can use string functions to separate out each values that you need and progress with the SPL query.

 

Thanks,
Tejas.

 

---

If the above solution helps, an upvote is appreciated.

0 Karma

uagraw01
Motivator

@tej57 Hi Tejas thanks for the answer  but I don't want chatgpt or any other AI promt answers.

0 Karma

tej57
Builder

Hey @uagraw01,

I understand that you might not have liked the solution. But it wasn't chatgpt based. If you would have read correctly, I mentioned that I used it as a solution and it worked perfectly in my scenario. Community is for helping every member and that's what I tried honestly. If you would have tried the solution, you could have observed how it would help you with the situation.  

Anyways, I hope you get the solution that you need outside of ChatGPT. Happy Splunking >.

Thanks,
Tejas.

gcusello
SplunkTrust
SplunkTrust

Hi @uagraw01 ,

could you share your search?

Ciao.

Giuseppe

0 Karma

uagraw01
Motivator

Hi @gcusello 

| datamodel Mmm_availability adapto_shuttle_alarm flat 
| lookup Alarm_list_adapto_details.csv ERROR_ID as ALARMID OUTPUTNEW DESCRIPTION Max_duration Min_duration OPERATIONAL TECHNICAL 
| rename Max_duration as MAX_DURATION Min_duration as MIN_DURATION DESCRIPTION as description ID as id 
| eval matchField = BLOCK."".SHUTTLEID 
| lookup mordc_topo modified_field as matchField OUTPUTNEW ID "Parent Description" as Name Asas as asas Weight as Weight modified_field as isc_id 
| table _time ID ALARMID description OPERATIONAL TECHNICAL Name Weight asas isc_id event_time MAX_DURATION MIN_DURATION state 
| append 
    [ search index=mess  sourcetype="EquipmentEventReport" "EquipmentEventReport.EquipmentEvent.Detail.State" IN("CAME_IN","WENT_OUT") 
    | spath input=_raw path=EquipmentEventReport.EquipmentEvent.ID.Location.PhysicalLocation.AreaID output=area 
    | spath input=_raw path=EquipmentEventReport.EquipmentEvent.ID.Location.PhysicalLocation.ZoneID output=zone 
    | spath input=_raw path=EquipmentEventReport.EquipmentEvent.ID.Location.PhysicalLocation.EquipmentID output=equipment 
    | search area=* 
    | dedup _raw 
    | lookup mordc_site_specific_scada_alarms.csv MsgNr as MsgNr OUTPUTNEW Alarmtext Functiongroup 
    | eval zone=if(len(zone)==1,"0".zone,zone), equipment=if(len(equipment)==1,"0".equipment,equipment) 
    | eval isc_id=area.".".zone.".".equipment 
    | fields _time, isc_id, area, zone, equipment start_time element error error_status description event_time state MsgNr Alarmtext Functiongroup alarm_severity 
    | fields - _raw, Alarmtext Functiongroup, MsgNr 
    | lookup isc id AS isc_id OUTPUTNEW statistical_subject mark_code 
    | eval statistical_subject = trim(statistical_subject) 
    | lookup Alarm_list_details_scada_mordc.csv component_type_id AS statistical_subject OUTPUTNEW operational_rate technical_rate maximum_duration minimum_duration alarm_severity 
    | search alarm_severity IN ("High", "Medium") 
    | lookup mordc_topology.csv modified_field as isc_id OUTPUTNEW ID "Description" as Name Asas as asas Weight as Weight 
    | rename operational_rate as OPERATIONAL technical_rate as TECHNICAL maximum_duration as MAX_DURATION minimum_duration as MIN_DURATION 
    | table _time ID description OPERATIONAL TECHNICAL Name Weight asas isc_id event_time MAX_DURATION MIN_DURATION state Alarmtext Functiongroup]


Giving below multi value field results.

uagraw01_0-1750251901091.png

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Let me jump in and offer some side notes (you're of course free to ignore me completely as this might not be related to the immediate problem at hand).

I'll leave aside for now the append command itself but the appended search might be way more efficient.

You're doing much before discarding (probably) significant portion of your data.

Firstly, you could do the two spath commands after searching for area=* and dedup. This way you'll do much less json parsing, which is quite heavy.

Secondly, you could just parse path=EquipmentEventReport.EquipmentEvent.ID.Location.PhysicalLocation as a whole and get three fields for the price of one run of spath.

Thirdly - if you want to get all events having anything in the ...area path, you could just first limit your search to results containing "area" as the search term. It might not be 100% accurate since the word can occur somewhere else in your events but it might be a pretty good way to narrow the search. (of course it won't work if you have a field named "area" in another "branch" of your jsons in 100% of your events; but it's worth checking out).

Fourthly, as I understand you have quite sizeable jsons. It's best to drop them as early as possible so you should move your fields - _raw as far up your search as possible - probably right after dedup.

And finally - your data is very very tricky to work with. You have multiple multivalued fields. I understand that the assumption is that for each of those fields first values of all those fields match the same "event" or "state" or whatever, second values of those fields create another tuple and so on. The trouble is - there is no way in Splunk to make sure of it unless you are absolutely sure that your input data is always fully populated, correct and additionally, properly ingested, parsed and so on. Otherwise a single missing value here and there squashes your values together. So relying on the order of values in multiple multivalued fields is extremely tricky.

Unfortunately sometimes the input data is simply very badly formatted and you don't have much choice but it might be worth to raise this issue with whoever or whatever produces the input data.

And of course finally it's never wrong to point out that append - especially that your appended subsearch seems quite heavy with multiple spath commands - might get silently finalized and leave you with incomplete data. You should be able to use the datamodel search with a multisearch command.

ITWhisperer
SplunkTrust
SplunkTrust

Since your multivalue fields appear to be coming from one lookup (Alarm_list_details_scada_mordc.csv) you could try something like this

| lookup Alarm_list_details_scada_mordc.csv component_type_id AS statistical_subject OUTPUTNEW operational_rate technical_rate maximum_duration minimum_duration alarm_severity
| eval row=mvrange(0,mvcount(operational_rate)) 
| mvexpand row 
| foreach operational_rate technical_rate maximum_duration minimum_duration alarm_severity
    [| eval <<FIELD>>=mvindex(<<FIELD>>,row) ]
| fields - row

 

isoutamo
SplunkTrust
SplunkTrust
I modify @uagraw01 your answer to moving your SPL inside </> block in editor. In that way it's much more readable. Can you check that it is still correct and there haven't missed anything!

Could you In future use that </> block always when you are adding some SPL, dashboards etc. With it we can ensure that what we see is what you have written, not something what editor has changed!
0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...