Splunk Search

Help with mvexpand limits: one issue is the memory limit; the other is that it only applies to one field

ITWhisperer
SplunkTrust

There are a couple of issues which often come up with the limits of mvexpand: one is the memory limit; the other is that it only applies to one field. I have developed a set of macros which go some way towards solving both of these issues.

First, I will share the single field version of the macro; then, I will try to explain what each step is doing; finally, I will share the double field version which can easily be expanded to multiple fields. Please excuse the long post.

fillnull value="$originull$"
| eval steps=mvcount($field1$)
| streamstats sum(steps) as toprow
| eval maxrow=toprow
| append [| makeresults | eval toprow=1 | fields - _time]
| eventstats min(maxrow) as firsttop
| where isnotnull(maxrow) or toprow != firsttop
| makecontinuous toprow
| reverse
| filldown
| reverse
| eval $field1$=mvindex($field1$,(steps - 1) - (maxrow - toprow))
| fields - maxrow toprow firsttop steps
| foreach * [eval <<FIELD>>=if('<<FIELD>>'="$originull$",null(),'<<FIELD>>')]

 So, what's going on?

fillnull value="$originull$"

The first thing to do is fill in the blanks. The macro takes a parameter for this because you have to choose a suitable string which doesn't clash with any data you want to keep; this string is reset to null at the end. The reason for doing this is that filldown is used later, and if there are any nulls in the original data, they would get overwritten with whatever happens to be above them.
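To see why the placeholder matters, here is a small Python sketch (not SPL) that models filldown as carrying the last non-null value forward; the placeholder string "ORIGINULL" is purely illustrative:

```python
# Model of filldown: carry the last non-null value forward.
def filldown(column):
    last = None
    out = []
    for value in column:
        if value is not None:
            last = value
        out.append(last)
    return out

# A genuine null in the data is overwritten by the value above it...
print(filldown(["A", None, "B"]))  # ['A', 'A', 'B'] - the null is lost

# ...but a placeholder survives filldown and can be reset to null afterwards.
placeholder = "ORIGINULL"
restored = [None if v == placeholder else v
            for v in filldown(["A", placeholder, "B"])]
print(restored)  # ['A', None, 'B'] - the null is preserved
```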

| eval steps=mvcount($field1$)

Next, we count how many values are in the multivalue field for each event, i.e. how many events we will need in the expansion. Note that the field name (steps) is arbitrary; if it clashes with a field already in your data, you will lose that field, so change the name used in the macro. The same goes for all the fields created by the macro.

| streamstats sum(steps) as toprow

Now we determine what will be the highest row number for each event.
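In other words, toprow is a running total of steps. A quick Python illustration (the sample counts are made up):

```python
from itertools import accumulate

# Per-event value counts: three events holding 2, 1 and 3 values.
steps = [2, 1, 3]

# streamstats sum(steps) as toprow is a running total, so each event
# gets the row number of its *last* expanded row.
toprow = list(accumulate(steps))
print(toprow)  # [2, 3, 6]
```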

| eval maxrow=toprow

We take a copy of this so it is kept for all the expanded rows.

| append [| makeresults | eval toprow=1 | fields - _time]

We add an additional event with toprow set to 1, because makecontinuous, used later, needs a place to start.

| eventstats min(maxrow) as firsttop

Now we find out what the lowest maxrow was. Note that the appended event we just added does not have a value for maxrow, so does not interfere with this calculation.

| where isnotnull(maxrow) or toprow != firsttop

If the original first event had only one value, its maxrow and toprow would both be 1, so firsttop would also be 1. In that case, the appended event duplicates an existing toprow, and this is where we remove it.

| makecontinuous toprow

Now for the magic: we create events using toprow, which is guaranteed to start at 1 and end with the total number of rows required. Note that at this point you could run into the row limit, and you will need to modify your configuration to get around this.
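Conceptually, makecontinuous fills every gap in the toprow sequence with an empty event, roughly like this Python sketch (the sample data is illustrative):

```python
# Events keyed by toprow, with gaps at 2, 4 and 5.
events = {1: {"x": "a"}, 3: {"x": "b"}, 6: {"x": "c"}}

# makecontinuous toprow: add an empty event for every missing integer
# between the smallest and largest toprow values.
continuous = {row: events.get(row, {})
              for row in range(min(events), max(events) + 1)}
print(sorted(continuous))  # [1, 2, 3, 4, 5, 6]
print(continuous[2])       # {} - a newly created, empty event
```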

| reverse

Reverse the order of events (makecontinuous will have put them into toprow order). This is so that filldown copies the right values into the extra events.

| filldown

Copy all the fields from the original events into the newly created events. You may hit a memory issue here, as this is where most of the memory is used: all the multivalue fields are copied into their respective events.

| reverse

Return the events to their original order (possibly not strictly necessary).

| eval $field1$=mvindex($field1$,(steps - 1) - (maxrow - toprow))

Select the appropriate value from the multivalue field for this row.
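The index arithmetic deserves a worked example. Suppose an event has steps=3 values and its last expanded row is maxrow=6, so its rows have toprow 4, 5 and 6; the formula (steps - 1) - (maxrow - toprow) then yields indices 0, 1 and 2, picking the values in their original order (Python sketch, not SPL):

```python
values = ["first", "second", "third"]  # the multivalue field for one event
steps = len(values)                    # 3
maxrow = 6                             # this event's last expanded row

# toprow runs over 4, 5, 6 for this event's expanded rows.
rows = range(maxrow - steps + 1, maxrow + 1)

# (steps - 1) - (maxrow - toprow) maps toprow 4, 5, 6 onto indices 0, 1, 2.
picked = [values[(steps - 1) - (maxrow - toprow)] for toprow in rows]
print(picked)  # ['first', 'second', 'third']
```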

| fields - maxrow toprow firsttop steps

Remove all the extra fields.

| foreach * [eval <<FIELD>>=if('<<FIELD>>'="$originull$",null(),'<<FIELD>>')]

Restore the nulls.
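Putting the steps together, the net effect of the macro can be mimicked in a few lines of Python (a sketch only; the sample events and field names are made up, and Splunk's actual implementation works on events, not lists):

```python
from itertools import accumulate

# Sample events, each with a multivalue field "loc" and a single-value "host".
events = [
    {"host": "a", "loc": ["x", "y"]},
    {"host": "b", "loc": ["z"]},
    {"host": "c", "loc": ["p", "q", "r"]},
]

steps = [len(e["loc"]) for e in events]   # eval steps=mvcount(...)
toprows = list(accumulate(steps))         # streamstats sum(steps) as toprow

expanded = []
for event, n, maxrow in zip(events, steps, toprows):
    # makecontinuous + filldown: one copy of the event per required row...
    for toprow in range(maxrow - n + 1, maxrow + 1):
        row = dict(event)
        # ...then mvindex selects the right value for each row.
        row["loc"] = event["loc"][(n - 1) - (maxrow - toprow)]
        expanded.append(row)

print([(r["host"], r["loc"]) for r in expanded])
# [('a', 'x'), ('a', 'y'), ('b', 'z'), ('c', 'p'), ('c', 'q'), ('c', 'r')]
```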

For the multi-field mvexpand, steps is evaluated as the maximum number of values across the multivalue fields, and each field has its corresponding entry extracted into the expanded event. This can be extended by adding more fields.

fillnull value="$originull$"
| eval steps=max(mvcount($field1$),mvcount($field2$))
| streamstats sum(steps) as toprow
| eval maxrow=toprow
| append [| makeresults | eval toprow=1 | fields - _time]
| eventstats min(maxrow) as firsttop
| where isnotnull(maxrow) or toprow != firsttop
| makecontinuous toprow
| reverse
| filldown
| reverse
| eval $field1$=mvindex($field1$,(steps - 1) - (maxrow - toprow))
| eval $field2$=mvindex($field2$,(steps - 1) - (maxrow - toprow))
| fields - maxrow toprow firsttop steps
| foreach * [eval <<FIELD>>=if('<<FIELD>>'="$originull$",null(),'<<FIELD>>')]
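The two-field version can be checked the same way. When one field has fewer values than steps, mvindex is out of range and returns null for those rows, which this Python sketch models with None (the sample data is illustrative):

```python
from itertools import accumulate

def mvindex(values, i):
    # Like Splunk's mvindex: return null (None) when the index is out of range.
    return values[i] if 0 <= i < len(values) else None

# One event whose second field is one value short.
events = [{"id": ["a", "b", "c"], "rank": [1, 2]}]

steps = [max(len(e["id"]), len(e["rank"])) for e in events]
toprows = list(accumulate(steps))

expanded = []
for event, n, maxrow in zip(events, steps, toprows):
    for toprow in range(maxrow - n + 1, maxrow + 1):
        i = (n - 1) - (maxrow - toprow)
        expanded.append((mvindex(event["id"], i), mvindex(event["rank"], i)))

print(expanded)  # [('a', 1), ('b', 2), ('c', None)]
```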


Splunk_321
Explorer

@ITWhisperer  and @FrankVl 

I am trying to expand a couple of fields (locationId, matchRank) using the method suggested above, but it only expands the last entry in the list, and the results don't contain all the entries from both lists. My matched_locations json array can have at most 10 items in it, each with both locationId and matchRank.

Below is the query which I am trying out...

#############################################################

index=app_pcf AND cf_app_name="myApp" AND message_type=OUT AND msg.logger=c.m.c.d.MatchesApiDelegateImpl
| search "msg.logMessage.numReturnedMatches">0
| spath
| fillnull value="0"
| eval steps=max(mvcount('msg.logMessage.matched_locations{}.locationId'),mvcount('msg.logMessage.matched_locations{}.matchRank'))
| streamstats sum(steps) as toprow
| eval maxrow=toprow
| append
[| makeresults
| eval toprow=1
| fields - _time]
| eventstats min(maxrow) as firsttop
| where isnotnull(maxrow) or toprow != firsttop
| makecontinuous toprow
| reverse
| filldown
| reverse
| eval locationId=mvindex('msg.logMessage.matched_locations{}.locationId',(steps - 1) - (maxrow - toprow))
| eval rank=mvindex('msg.logMessage.matched_locations{}.matchRank',(steps - 1) - (maxrow - toprow))
| fields - maxrow toprow firsttop steps
| foreach *
[ eval <<FIELD>>=if('<<FIELD>>'="0",null(),'<<FIELD>>')]
| table locationId,rank

#############################################################

I am only getting locationId and rank from the last entry of each event's list.

Below is the sample splunk data

###########################################################

cf_app_name: myApp
cf_org_name: myOrg
cf_space_name: mySpace
job: diego_cell
message_type: OUT
msg: {
application: myApp
correlationid: 0.af277368.1669261134.5eb2322
httpmethod: GET
level: INFO
logMessage: {
apiName: Matches
apiStatus: Success
clientId: oh_HSuoA6jKe0b75gjOIL32gtt1NsygFiutBdALv5b45fe4b
error: NA
matched_locations: [
{
city: PHOENIX
countryCode: USA
locationId: bef26c03-dc5d-4f16-a3ff-957beea80482
matchRank: 1
merchantName: BIG D FLOORCOVERING SUPPLIES
postalCode: 85009-1716
state: AZ
streetAddress: 2802 W VIRGINIA AVE
}
{
city: PHOENIX
countryCode: USA
locationId: ec9b385d-6283-46f4-8c9e-dbbe41e48fcc
matchRank: 2
merchantName: BIG D FLOOR COVERING 4
postalCode: 85009
state: AZ
streetAddress: 4110 W WASHINGTON ST STE 100
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
]
numReturnedMatches: 10
}
logger: c.m.c.d.MatchesApiDelegateImpl

}
origin: rep
source_instance: 1
source_type: APP/PROC/WEB
timestamp: 1669261139716063000
}

###########################################################


Splunk_321
Explorer

Great Post @ITWhisperer 

I have a scenario where I want to expand the field and show the values as individual events.

Below is my query, which works fine for smaller time intervals, but for larger intervals it is not efficient.

index=app_pcf AND cf_app_name="myApp" AND message_type=OUT AND msg.logger=c.m.c.d.MatchesApiDelegateImpl
| spath "msg.logMessage.matched_locations{}.locationId"
| search "msg.logMessage.numReturnedMatches">0
| mvexpand "msg.logMessage.matched_locations{}.locationId"
| fields "msg.logMessage.matched_locations{}.locationId"
| rename "msg.logMessage.matched_locations{}.locationId" as LocationId
| table LocationId

I have a json array called matched_locations which has the field locationId. There can be at most 10 locationIds in a matched_locations array, and I have thousands of events in the duration which will have this matched_locations json array.

Below is an example of one such event with a bunch of matched_locations.

###########################################################

cf_app_name: myApp
cf_org_name: myOrg
cf_space_name: mySpace
job: diego_cell
message_type: OUT
msg: {
application: myApp
correlationid: 0.af277368.1669261134.5eb2322
httpmethod: GET
level: INFO
logMessage: {
apiName: Matches
apiStatus: Success
clientId: oh_HSuoA6jKe0b75gjOIL32gtt1NsygFiutBdALv5b45fe4b
error: NA
matched_locations: [
{
city: PHOENIX
countryCode: USA
locationId: bef26c03-dc5d-4f16-a3ff-957beea80482
matchRank: 1
merchantName: BIG D FLOORCOVERING SUPPLIES
postalCode: 85009-1716
state: AZ
streetAddress: 2802 W VIRGINIA AVE
}
{
city: PHOENIX
countryCode: USA
locationId: ec9b385d-6283-46f4-8c9e-dbbe41e48fcc
matchRank: 2
merchantName: BIG D FLOOR COVERING 4
postalCode: 85009
state: AZ
streetAddress: 4110 W WASHINGTON ST STE 100
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
{ [+]
}
]
numReturnedMatches: 10
}
logger: c.m.c.d.MatchesApiDelegateImpl

}
origin: rep
source_instance: 1
source_type: APP/PROC/WEB
timestamp: 1669261139716063000
}

###########################################################

Can you help me with how I can expand this field efficiently?

Thank you.


FrankVl
Ultra Champion

If you're really just looking to get a list of locations, why not just do something like:

 

index=app_pcf AND cf_app_name="myApp" AND message_type=OUT AND msg.logger=c.m.c.d.MatchesApiDelegateImpl
| spath "msg.logMessage.matched_locations{}.locationId"
| search "msg.logMessage.numReturnedMatches">0
| rename "msg.logMessage.matched_locations{}.locationId" as LocationId
| stats count by _time,LocationId
| table LocationId

 

 

Splunk_321
Explorer

@FrankVl 

Looks like it is serving my purpose. Thanks a lot.
