Hi,
I've got these strange XML logs, where each log has (among other things) a username and an arbitrary number of hashes, each stored in its own XML field. A simplified version of the log is shown below.
[...]<user>hettervi</user><hash1>sdflkjsdf</hash1><hash2>sdfoiujkalw</hash2>[...]<hashn>powkerldsf</hashn>
There are usually no more than around 13-14 hashes for each event, and what I'm trying to do is count by user and hash. To do this I've used the foreach and mvappend commands to combine the XML fields into a multivalue field, and then counted by that new multivalue field, as shown in the search below.
| foreach hash* [ eval hashes=mvappend(hashes, '<<FIELD>>')]
| stats count by hashes user
The problem is that this is quite slow, mostly due to the large volume of logs. I've looked into making a multivalue indexed field so that I can use tstats instead of stats, or using an accelerated datamodel with a multivalue field for the hashes, but as far as I can tell this isn't possible. Any idea how I can make this search faster, e.g. by doing some indexing and tstats magic?
Something like
```
[extract_hashes]
REGEX = <hash\d+>([^<]+)
FORMAT = hash::$1
REPEAT_MATCH = true
WRITE_META = true
```
and obviously, in props.conf: TRANSFORMS-extract_hashes = extract_hashes
then you might be able to do | tstats count where foo by user hash (with foo replaced by your own index/sourcetype filter)
give that a shot in a sandbox.
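For anyone following along, here is a slightly fuller sketch of how the pieces might fit together. The sourcetype name `my_xml_logs` is made up, and the `extract_user` transform is an assumption on my part: tstats can only group by index-time fields, so `user` would need to be indexed as well unless it already is. The fields.conf stanzas tell the search head to treat both as indexed fields. Treat this as a starting point for the sandbox, not a verified config.

```
# props.conf (on the indexers or heavy forwarders that parse the data)
# "my_xml_logs" is a placeholder sourcetype name
[my_xml_logs]
TRANSFORMS-extract_hashes = extract_hashes, extract_user

# transforms.conf (same place as props.conf)
[extract_hashes]
# matches <hash1>, <hash2>, ... and captures each value
REGEX = <hash\d+>([^<]+)
FORMAT = hash::$1
# keep matching after the first hit, so every hash in the event is written
REPEAT_MATCH = true
WRITE_META = true

# assumption: only needed if user is not already an indexed field
[extract_user]
REGEX = <user>([^<]+)</user>
FORMAT = user::$1
WRITE_META = true

# fields.conf (on the search heads)
[hash]
INDEXED = true

[user]
INDEXED = true
```

After that, a search along the lines of `| tstats count where index=<your_index> sourcetype=my_xml_logs by user hash` should work. Only events indexed after the change get the new fields, so older data would need to be reindexed to show up.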
Thanks, this worked perfectly (in the sandbox)! With this config we get indexed multivalue hash fields for the events, which I didn't even know was possible. How do the multivalue fields get stored in the metadata? Anyhow, I've requested that the config be implemented in prod, which should speed up my search drastically.
Hey @hettervi, if they solved your problem, remember to "√Accept" an answer to award karma points 🙂
I will, but it isn't solved quite yet. I'm in Europe, so expect some answer lag from my side. 🙂
I'm assuming you want to count by the value of the hash, not the name of the hash. If not, you can adjust the last line.
Try this -
| fields user hash*
| untable user hashname hashvalue
| stats count by user hashvalue
Hi. Thanks, but this wasn't quite what I was looking for. I've looked at the docs, and I can't quite get this command to fit my data; that is, I have no field for hash names. The multivalue field I created contains all the hash values, which are originally stored in independent fields (hash1, hash2, ... , hashn).
Anyhow, what I was really looking for was a way to index the fields so that I can do accelerated searches on them, like tstats. I'm already getting the right results, no worries, but it takes too long. I don't think I can just index the fields in the traditional way, as this would still require me to retrieve all the events to the search heads before doing calculations, so the field indexing would have no real effect.