Splunk Search

How to index an arbitrary number of fields and run tstats operations on them?

hettervi
Builder

Hi,

I've got these strange XML logs, where each log has (among other things) a username and an arbitrary number of hashes, each stored in its own XML field. A simplified version of the log is shown below.

[...]<user>hettervi</user><hash1>sdflkjsdf</hash1><hash2>sdfoiujkalw</hash2>[...]<hashn>powkerldsf</hashn>

There are usually no more than around 13-14 hashes per event, and what I'm trying to do is count by user and hash. To do this I've used the foreach command with mvappend to combine the XML fields into one multivalue field, and then count by that new multivalue field, as shown in the search below.

| foreach hash* [ eval hashes=mvappend(hashes, '<<FIELD>>')]
| stats count by hashes user

The problem is that this is quite slow, mostly due to the large volume of logs. I've looked into making a multivalue indexed field so that I can use tstats instead of stats, or using an accelerated data model with a multivalue field for the hashes, but as far as I can tell this isn't possible. Any ideas on how I can make this search faster, e.g. by doing some indexing and tstats magic?

1 Solution

martin_mueller
SplunkTrust

Something like this in transforms.conf:

[extract_hashes]
REGEX = <hash\d+>([^<]+)
FORMAT = hash::$1
REPEAT_MATCH = true
WRITE_META = true

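And obviously, in props.conf, under the stanza for your sourcetype:

TRANSFORMS-extract_hashes = extract_hashes

Then you might be able to do something like this, where foo stands in for your own filters:

| tstats count where foo by user hash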

give that a shot in a sandbox.
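
To see what REPEAT_MATCH = true is doing, here is a small Python sketch that applies the same regex repeatedly to a simplified event like the one in the question. It only mimics the extraction step outside Splunk: the sample event and the function name extract_hashes are made up for illustration, and Python's re module stands in for Splunk's own regex engine.

```python
import re

# Same pattern as the REGEX setting in the transform above.
HASH_PATTERN = re.compile(r"<hash\d+>([^<]+)")


def extract_hashes(raw_event):
    """Return every hash value in the event, in order of appearance.

    re.findall keeps matching after the first hit, which is roughly what
    REPEAT_MATCH = true asks Splunk to do at index time.
    """
    return HASH_PATTERN.findall(raw_event)


# Hypothetical event modelled on the simplified log in the question.
event = (
    "<user>hettervi</user>"
    "<hash1>sdflkjsdf</hash1>"
    "<hash2>sdfoiujkalw</hash2>"
    "<hash3>powkerldsf</hash3>"
)

hashes = extract_hashes(event)
# Each match becomes one hash::<value> pair, mirroring FORMAT = hash::$1.
indexed_terms = ["hash::" + value for value in hashes]
print(indexed_terms)
```

Without the repeated matching, only the first hash in each event would be picked up, so the later hashes would never become indexed values.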

hettervi
Builder

Thanks, this worked perfectly (in the sandbox)! Using this config we get indexed multivalue hash fields for the events, which I didn't even know was possible. How do the multivalue fields get stored in the metadata, though? Anyhow, I've requested that the config be implemented in prod now, which should speed up my search drastically.
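
On the storage question, one way to picture it (a toy model, not Splunk's actual on-disk format): each indexed field value is written as a separate field::value term that points back at the event, so a field with several values is simply several terms for the same event. The Python sketch below builds such a term index for three made-up events and counts user/hash pairs from the terms alone, without reading any raw text, which is the kind of work tstats does. All names and data here are hypothetical, and it assumes user is available as an indexed field too.

```python
from collections import Counter, defaultdict

# Hypothetical indexed terms per event id: one user, any number of hashes.
events = {
    1: ["user::hettervi", "hash::aaa", "hash::bbb"],
    2: ["user::hettervi", "hash::aaa"],
    3: ["user::martin", "hash::bbb"],
}

# Toy inverted index: term -> set of event ids containing that term.
term_index = defaultdict(set)
for event_id, terms in events.items():
    for term in terms:
        term_index[term].add(event_id)


def count_by_user_and_hash(index):
    """Count events per (user, hash) pair using only the term index."""
    users = {t: ids for t, ids in index.items() if t.startswith("user::")}
    hashes = {t: ids for t, ids in index.items() if t.startswith("hash::")}
    counts = Counter()
    for user_term, user_ids in users.items():
        for hash_term, hash_ids in hashes.items():
            shared = user_ids & hash_ids  # events carrying both terms
            if shared:
                user = user_term.split("::", 1)[1]
                value = hash_term.split("::", 1)[1]
                counts[(user, value)] = len(shared)
    return counts


counts = count_by_user_and_hash(term_index)
for (user, value), n in sorted(counts.items()):
    print(user, value, n)
```

Event 1 contributes to two rows because it carries two hash terms, which is the multivalue behaviour seen in the sandbox.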

lfedak_splunk
Splunk Employee

Hey @hettervi, if they solved your problem, remember to "√Accept" an answer to award karma points 🙂

hettervi
Builder

I will, but it isn't solved quite yet. I'm in Europe, so expect some answer lag from my side. 🙂

DalJeanis
SplunkTrust

I'm assuming you want a count by the value of the hash, not the name of the hash. If not, you can adjust the last line.

Try this -

| fields user hash*
| untable user hashname hashvalue
| stats count by user hashvalue

hettervi
Builder

Hi. Thanks, but this wasn't quite what I was looking for. I'm looking in the docs, and I can't quite get this command to fit my data; that is, I have no field for hash names. The multivalue field I created contains all the hash values, which are originally stored in independent fields (hash1, hash2, ... , hashn).

Anyhow, what I was really looking for was a way to index the fields so that I can do accelerated searches on them, like tstats. I'm already getting the right results, no worries, but it takes too long. I don't think I can just index the fields in the traditional way, as this would still require me to retrieve all the events to the search heads before doing calculations, so the field indexing would have no real effect.
