Splunk Enterprise

Splitting a large multivalue field into multiple multivalue fields based on character count?

mjones414
Contributor

I've run into a scenario where, when running stats over an index, it's possible I can generate a multivalue field with over 11K unique 38-character guid values, but it can also be as small as a single 38-character guid.  I need to pass the resulting guids as a string into something that has a character length limit of 999 characters.
Is there a way, incrementing by 38 characters, to split the field into multiple fields of up to 950 characters each (which should be 25 guids), dynamically, since I won't know how many are going to come in at any given time?


bowesmana
SplunkTrust

Interesting ...

I'm not sure there's a simple way to do this - but someone will no doubt come up with a one liner.

Here's a stab at an example

| makeresults count=10
``` Generate a random number of 38 character random guids ```
| eval joint_id=random()
| eval count=mvrange(1, random() % 500, 1)
| mvexpand count
| eval guid=printf("%08x-%08x-%08x-%08x%03x", random(), random(), random(), random(), random() % 2048)
``` Now collect these guids back to their common owner ```
| stats values(guid) as guids by joint_id
| eval guid_count=mvcount(guids), len=guid_count * 38
``` Now a nested foreach will create up to 21 groups of 25 guids each
    in block_X fields, where X is the incrementing group number ```
| eval s=0, e=24, b=0
| foreach guids [ foreach 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 [ eval block_{b}=mvindex(<<FIELD>>, s, e), b=b+1, s=e+1, e=e+25 ] ]
``` Now combine all the block_* fields into a single blocks field where
    each 38*25 sequence is joined together as a string ```
| foreach block_* [ eval blocks=mvappend(blocks, mvjoin(<<FIELD>>,"")) ]
| fields - block_*

If you wanted to cater for 11K guids, the inner foreach list would need 11000/25 = 440 entries.


The other possible alternative I can see is to use streamstats count and make "groups" of these guids, 25 per group. Depending on how you are collecting these guids into their MV equivalent, this may work for you. To demonstrate with generated data:

| makeresults count=10
``` Generate a random number of 38 character random guids ```
| eval joint_id=random()
| eval count=mvrange(1, random() % 10000, 1)
| mvexpand count
| eval guid=printf("%08x-%08x-%08x-%08x%03x", random(), random(), random(), random(), random() % 2048)
| streamstats count by joint_id
| eval group=floor((count-1)/25)
``` Now collect these guids back to their common owner ```
| stats values(guid) as guids by joint_id group
| eval guid_count=mvcount(guids)

In your case, since you already have the aggregated guids, you would need to mvexpand them, use streamstats to count by some common ID, create the group as above, and then re-stats to aggregate by group again - each row then holds up to 25 guids.
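Assuming your search already produces an aggregated multivalue field called guids keyed by some common ID called joint_id (substitute your actual field names), those steps would look something like this:

| mvexpand guids
``` Number each guid within its owner, then bucket into groups of 25 ```
| streamstats count by joint_id
| eval group=floor((count-1)/25)
``` Re-aggregate so each row has up to 25 guids, then join them
    into a string of at most 25 * 38 = 950 characters ```
| stats values(guids) as guids by joint_id group
| eval guid_string=mvjoin(guids, "")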

Hopefully this gets you somewhere
