I'm looking to get some json data from our anomaly detection system into the Intrusion Detection data model and thus need to map the fields to the CIM. The json events vary depending on the model being 'breached' and therefore not all events will contain the dest and src fields in the same place.
The JSON data contains many multi-value fields. Where the required data is in its own single-value field I can just alias it, but sometimes it sits in an array, and not always at the same position (that depends on the triggers that caused each model to breach). So I need to search each array for certain indicators such as "Destination Endpoint" (let's call this 'A') and then map the actual endpoint name ('B') from another field, using the array location of A. I've been looking at the mvfind eval function, but before I spend a great deal of time on this I was wondering whether my approach is correct, or whether what I want to do is even possible in the first place. E.g. can I use the output of mvfind as an input to spath?
Once I've got a search working I'll be looking to extract the values automatically, and I'm assuming search-time extraction would still be okay for the data model? The frequency of events is not very high (one or two every five minutes or less), so I don't think that an index-time extraction would put too much load on the HF/INDXR either.
I can't really share an example event due to the nature of its contents, but let me know if there's any more info that would help. Thanks very much in advance.
When you say "I need to search each array for certain indicators such as "Destination Endpoint" (lets call this 'A') and then map the actual endpoint name ('B') from another field, using the array location of A" does that mean that you have two "parallel" arrays? Assuming this to be the case, and that you have extracted these to two multi-value fields, can you mvzip them together to at least get "Destination Endpoint" and the corresponding "actual endpoint" into a single instance of the combined field?
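As a rough sketch of that idea, assuming the two arrays have already been extracted into multi-value fields: the field names trigger_name and trigger_value and all of the sample values below are made up for illustration and are not taken from a real event. mvzip pairs the values up by position, and mvfind returns the zero-based position of the first value matching a regex, which can then be passed to mvindex on the other field.

```spl
| makeresults
| eval trigger_name=split("Source IP,Destination Endpoint,Protocol", ",")
| eval trigger_value=split("10.0.0.5,host.example.com,tcp", ",")
| eval pair=mvzip(trigger_name, trigger_value, ":")
| eval n=mvfind(trigger_name, "^Destination Endpoint$")
| eval dest=mvindex(trigger_value, n)
| table pair, dest
```

This only lines up if both fields really are parallel. If some array elements have no value, the extracted multi-value fields can end up with different lengths and the positions will no longer match, so it is worth checking mvcount on both fields first.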
Hi @Dworsnop,
Not sure if this helps you, but I had fun playing around with mvfind, mvindex and spath.
Conclusion: You cannot use a field value as an "index input" for spath.
So, this does not work:
| eval n=1
| spath output=somefield path=yourarray{n}
But you can dump the whole array to a multi-value field with spath and then get the desired value with mvindex, which does accept a field value as the index.
To test it, I indexed a json with a drinks array and combined it with some meals.
(The AI picked the wrong drink for my burger, so I corrected it 🙂 )
source="json_drinks"
| eval meals="pasta burger pizza"
| makemv meals
| eval n=mvfind(meals,"burger")
| eval meal_selection=mvindex(meals, n)
| spath output=drinks path=drinks{}
| eval drink_selection=mvindex(drinks, n)
| eval noiwantbeer=n-1
| eval drink_selection_correction=mvindex(drinks, noiwantbeer)
| table meal_selection, drink_selection, drink_selection_correction
For completeness, the drinks array I used:
{
  "drinks": [
    "beer",
    "coke",
    "water"
  ]
}
BR
Ralph
Hi @rnowitzki , thanks very much for the reply, I've used @ITWhisperer 's method and it's got me halfway there now.
Thanks very much @ITWhisperer , I should have spent more time looking at the mv eval functions, that's worked a treat. Now for the hard part...
I now have an mv field containing thousands of details for each model breach, but because each model looks at different connections/activity, the src and dest information will be called different things (e.g. "Connection hostname:<value>", "Destination IP:<value>", "Internal source device name:<value>" and so on).
What would be the best way to extract those src and dest values (once I've obviously determined the correct name depending on which model is being breached - there's going to be 50+ models I'll have to do this for)? My search currently looks like this...
...
| spath output=trigger_name path=triggeredComponents{}.triggeredFilters{}.filterType
| spath output=trigger_value path=triggeredComponents{}.triggeredFilters{}.trigger.value
| eval new_trig=mvzip(trigger_name,trigger_value,":")
| stats count by model.name, new_trig
Once I've got the matching src and dest fields for each model, where & how would I perform these extractions at search/index time? I already have a TA for the data, would I put it in props.conf, inputs.conf or somewhere else?
Thanks again! 🙂