Splunk Enterprise

mvexpand is heavy, is there another way?

robertlynch2020
Influencer

Hi

I have the following data.

I am looking to get one line per span, so I can work with the data more easily.

If I use mvexpand I hit memory limits, as I need to do it on all the fields. Is there another way?

Or perhaps I just need to increase the mvexpand memory limits!

robertlynch2020_0-1742407509545.png

host="PMC_Sample_Data" index="murex_logs" sourcetype="Market_Risk_DT"
| spath "resourceSpans{}.scopeSpans{}.spans{}.spanId"
| rename "resourceSpans{}.scopeSpans{}.spans{}.spanId" as spanId
| spath "resourceSpans{}.scopeSpans{}.spans{}.parentSpanId"
| rename "resourceSpans{}.scopeSpans{}.spans{}.parentSpanId" as parentSpanId
| spath "resourceSpans{}.scopeSpans{}.spans{}.startTimeUnixNano"
| rename "resourceSpans{}.scopeSpans{}.spans{}.startTimeUnixNano" as start
| spath "resourceSpans{}.scopeSpans{}.spans{}.endTimeUnixNano"
| rename "resourceSpans{}.scopeSpans{}.spans{}.endTimeUnixNano" as end
| spath "resourceSpans{}.scopeSpans{}.spans{}.traceId"
| rename "resourceSpans{}.scopeSpans{}.spans{}.traceId" as traceId
| table traceId spanId parentSpanId start end

Thanks in advance

1 Solution

ITWhisperer
SplunkTrust

Assuming individual scopeSpans are unique (which is likely since they contain timestamps and ids), try something like this

| spath resourceSpans{}.scopeSpans{}.spans{} output=scopeSpans
| stats count by scopeSpans
| spath input=scopeSpans
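
For reference, the same approach carried through to the table the original search was building; the field names after spath input= (startTimeUnixNano, endTimeUnixNano, traceId, etc.) are assumptions based on the OTLP span structure shown in the question:

| spath resourceSpans{}.scopeSpans{}.spans{} output=scopeSpans
| stats count by scopeSpans
| spath input=scopeSpans
| rename startTimeUnixNano as start, endTimeUnixNano as end
| table traceId spanId parentSpanId start end

Because each value of scopeSpans is a complete span object, the ids and timestamps stay together on the same row, which avoids the alignment problem of expanding the fields separately.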


robertlynch2020
Influencer

Hi

Thank you all in advance for your help on this.

Below is one example of an event I might have (Attached is the RAW data for 4 lines.). 

We have multiple spans in the data, and inside each span we have various attributes.

I want to be able to put in one traceId = XYZ and get the start, end, name, etc. of all the spans that I have.

robertlynch2020_0-1742464724601.png

So I was going to get it into a 1-to-1 table format, and then once I have that data, build tables, graphs, etc.

I can say that when traceId = XYZ, I will have access to the other data. But as you can see, I get an error if the data is too big.

robertlynch2020_0-1742465676354.png

This is the props.conf stanza I am using:


[Market_Risk_DT]
DATETIME_CONFIG = 
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 1000000
NO_BINARY_CHECK = true
TIME_FORMAT = %s%3N
TIME_PREFIX = \"startTimeUnixNano\":"
category = Custom
description = Market_Risk_DT
disabled = false
pulldown_type = true



ITWhisperer
SplunkTrust

Assuming individual scopeSpans are unique (which is likely since they contain timestamps and ids), try something like this

| spath resourceSpans{}.scopeSpans{}.spans{} output=scopeSpans
| stats count by scopeSpans
| spath input=scopeSpans

robertlynch2020
Influencer

Hi

Brilliant, and thanks :). It is working very well.

robertlynch2020_0-1742489823577.png



PickleRick
SplunkTrust

Neat trick. But it moves processing to the SH (search head). I'd go for extracting the spans, clearing other fields, possibly including _raw (to conserve memory), and then going for mvexpand on spans.
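
A minimal sketch of that approach (the output field name spans is illustrative):

| spath resourceSpans{}.scopeSpans{}.spans{} output=spans
| fields spans
| fields - _time _raw
| mvexpand spans
| spath input=spans

Since each value of spans holds one span's complete JSON object, the final spath pulls out spanId, parentSpanId, and the timestamps per row, and the fields commands keep the per-event payload that mvexpand has to copy as small as possible.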


ITWhisperer
SplunkTrust

Yes, that might be an option, but even if it works once, there is the risk that it will still hit memory problems the next time.


PickleRick
SplunkTrust

With stats you can still hit the user's disk quota, since it's not streaming but creates temp files which it later merges. 😉

But seriously - yes, since you're limited by memory constraints and mvexpand works in batches, there is a risk, but that's why I advise stripping as much data as possible before mvexpanding.


livehybrid
Influencer

Do you have some raw data you can share with us? I'm wondering if, in your case, it would be better to do the splitting before indexing the data, if possible, so that you are not relying on mvexpand.

It isn't easy (or efficient - as you've found) to expand events into multiple events at search time.

Happy to try and help you index this in separate events if this would help though!

Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards

Will

robertlynch2020
Influencer

Hi

Thanks for your interest.

I have put a larger answer below my original one, I hope this gives you what you were looking for.

Rob


bowesmana
SplunkTrust

You're not actually showing us how you are using mvexpand. There is also no guaranteed 1:1 relationship between parentSpanId and the other MV fields.

The general way to expand multiple MV fields in an event is to create composite fields and then expand that or to use stats by that field, but we'd need a better idea of what you're trying to end up with.

robertlynch2020
Influencer

Hi 

Thanks for getting back to me

I put more details below my original answer. There should be a 1-to-1 relationship, as a parent can be empty, which means that that span has no parent. (So I might need to put in logic there to set it to NA, to make sure the data lines up correctly!)

I am not sure how to "create composite fields".


Thanks in Advance


bowesmana
SplunkTrust

@robertlynch2020 to answer your composite field question:

Creating composite fields is simply a pattern to join MV fields where you have an equal correlation between those fields, i.e. for your example

...
| fields traceId spanId parentSpanId start end
| eval composite=mvzip(mvzip(mvzip(mvzip(traceId, spanId, "###"), parentSpanId, "###"), start, "###"), end, "###")
| fields composite
| mvexpand composite
| eval tmp=split(composite, "###")
| eval traceId=mvindex(tmp, 0), spanId=mvindex(tmp, 1), parentSpanId=mvindex(tmp, 2), start=mvindex(tmp, 3), end=mvindex(tmp, 4)
| fields - tmp composite

so it's just a pattern that fits the scenario where using stats will not solve your problem. Note: always use fields to keep ONLY the fields you want expanded, so as to minimise memory usage - that also means using

| fields - _time _raw

as they will remain after a positive fields statement, because they are _-prefixed fields and so are not automatically excluded. Do NOT use table before an mvexpand, as table causes the data to be sent to the search head, so the expansion is done on the SH. (There is a possibility that it will be optimised away, but don't rely on that.) Explicitly use fields so that the work remains in the indexing tier; if you have multiple indexers, the memory footprint will be distributed.



PickleRick
SplunkTrust

You obviously have some not-very-pretty JSON structures. As @ITWhisperer said - show us a sample, because for now you're extracting some fields which - as you say - are apparently multivalued.

But the values in each of them are unrelated to the values in other fields. So you're losing any connections between the values you might have had in the original json.

Another issue is that if you have four 3-valued fields and you do mvexpand on each of them, you'll get the cartesian product of those fields - 3^4 = 81 separate result rows. I'm not sure that's what you want.
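
The blow-up is easy to demonstrate with just two 3-valued fields:

| makeresults
| eval a=split("1,2,3", ","), b=split("x,y,z", ",")
| mvexpand a
| mvexpand b

The first mvexpand produces 3 rows, each still carrying all 3 values of b; the second expands each of those, giving 3x3 = 9 rows. With four 3-valued fields the same compounding gives 3^4 = 81.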

ITWhisperer
SplunkTrust

Please share some anonymised sample events (in a codeblock </> not as a picture) and a description of what you are trying to do.
