Merge indexes with flatmap

victor_znk · 08-11-2021

Hello,

I'd like your help merging two indexes. The first index is simply composed of JSON documents. The second index also contains JSON documents, but each one holds an array of documents. For example:

First index

{
  "field1": "value1",
  "field2": "value2"
}

Second index

{
  ...other fields...
  "documents": [{
     "field1": "value1",
     "field2": "value2"
  }, {
     "field1": "value1",
     "field2": "value2"
  }]
}

I want to retrieve and flatmap the documents from the second index, then merge them with the first index so that I can run stats operations on the combined results.

Thank you

tscroggins · 08-15-2021

@victor_znk

Are you trying to emulate a flatMap() function, or are you trying to expand the objects in the second index's events' documents array into separate events?

If the latter, you can manipulate the array into individual events using rex, mvexpand, and spath:

index=a
| append
[ search index=b
| rex "\"documents\": \\[(?<documents>.*)\\]"
| rex field=documents max_match=0 ",?(?<documents>{.*?})"
| fields _time documents
| mvexpand documents
| spath input=documents
| fields - documents ]
| stats count by field1 field2

Depending on your statistical analysis requirements, you may also get the correct result by simply searching both indexes and renaming auto-extracted fields:

index IN (a b)
| rename documents{}.* as *
| stats sum(field1) sum(field2)
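
One caveat with the rename approach, sketched here on generated data rather than the real indexes: after the rename, field1 and field2 are multivalue fields on the original event, so the pairing between values from the same array element is not kept. Sums should be unaffected, but a count by both fields can pair values that never occur together in one document. The sample event and its values below are made up for illustration, and spath stands in for the automatic JSON extraction an indexed event would get.

```spl
| makeresults
| eval _raw="{\"documents\": [{\"field1\": 1, \"field2\": 10}, {\"field1\": 2, \"field2\": 20}]}"
| spath
| rename documents{}.* as *
| stats count by field1 field2
```

I would expect this to return the cross product of the values rather than one row per array element. If per-document pairing matters, the mvexpand approach is the safer choice.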

victor_znk · 08-16-2021

Hi, I was able to solve my problem using:

index="index1"
| append
[ search index="index2"
| spath path="documents{}" output=documents
| mvexpand documents
| eval _raw=documents
| kv ]
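
For anyone who wants to try the expansion step without access to these indexes, here is a minimal sketch on a generated event. The sample JSON is hypothetical, and spath input=documents is used in place of the eval _raw / kv step so the sketch does not depend on any sourcetype extraction settings.

```spl
| makeresults
| eval _raw="{\"id\": 1, \"documents\": [{\"field1\": \"a\", \"field2\": \"b\"}, {\"field1\": \"c\", \"field2\": \"d\"}]}"
| spath path="documents{}" output=documents
| mvexpand documents
| spath input=documents
| table field1 field2
```

This should return one row per element of the documents array, with field1 and field2 taken from the same element.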

Thank you

tscroggins · 08-16-2021

Nice. Much easier to read. I tend to approach problems like this with regular expressions, but the JSON parser will take care of edge cases. Not sure which performs better, though.

victor_znk · 09-22-2021

In fact, I'm now facing an issue with the mvexpand command:

[MULTISEARCH #2]command.mvexpand: output will be truncated at 25000 results due to excessive memory usage. Memory threshold of 500MB
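
The thread does not record a resolution for this. A commonly suggested mitigation, offered here as an untested sketch, is to drop everything mvexpand does not need before expanding, since each expanded row carries a copy of every remaining field, including _raw. The index names are the ones from the earlier post. The final spath input=documents replaces the eval _raw / kv step, which is an assumption about what the downstream stats needs.

```spl
index="index1"
| append
    [ search index="index2"
    | spath path="documents{}" output=documents
    | fields _time documents
    | fields - _raw
    | mvexpand documents
    | spath input=documents
    | fields - documents ]
```

Whether this is enough depends on the data volume. The other usual option is to raise the max_mem_usage_mb setting for mvexpand in limits.conf, which needs admin access.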
