Anyone have any thoughts as to how to reorder a multi-valued field? Ideally I'd like to be able to do a "sort" or in my specific use case, a "reverse" would be perfect.
Say you have the following search:
my search | stats list(myfield) as myfields by id
The list() stats function preserves all values of "myfield" in the events and preserves their order, which is what I want. However, I'd really like to see the values of "myfield" in time order (not reverse time order). I know I can stick a | reverse in there, but I was trying to figure out if there is a better approach that only modifies the "myfields" field and doesn't require changing the event order.
(In my non-trivial version of this search, I'm using a transaction command as well, and it has issues when you start messing with time order. That's just one example of why reordering the events is not ideal.)
Hi Lowell, I implemented the deduplication and sorting functionality in a custom command. Since your experience is far greater than mine, you won't have any problem removing the deduplication logic (and maybe suggesting improvements 😉).
Syntax:
| mvdedup [[+|-]fieldname ]*
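Each field name may be prefixed with + or - (no space between the sign and the name): the command deduplicates that field's values and sorts them ascending (+) or descending (-). A field name without a prefix is deduplicated only, and with no arguments every multi-valued field is deduplicated. For example: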
| mvdedup -id
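The argument convention above can be sketched in plain Python, outside Splunk. The helper name parse_field_args is hypothetical; it only mirrors the argument loop in mvdedup.py, mapping each field name to '+', '-', or None:

```python
def parse_field_args(args):
    """Map each field name to its sort order: '+', '-', or None (dedup only).

    Hypothetical helper for illustration; not part of any Splunk API.
    """
    fields = {}
    for arg in args:
        arg = str(arg)
        if arg[:1] in ('+', '-'):
            # The sign is the first character, the field name is the rest.
            fields[arg[1:]] = arg[:1]
        else:
            # No sign: deduplicate without sorting.
            fields[arg] = None
    return fields


if __name__ == "__main__":
    print(parse_field_args(["-id", "+host", "source"]))
```

Because the sign is read from the first character of each argument, a space between the sign and the name would make the sign a separate argument, which is why the syntax forbids it.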
Here's the configs:
commands.conf
[mvdedup]
type = python
streaming = true
maxinputs = 500000
run_in_preview = true
enableheader = true
retainsevents = true
generating = false
generates_timeorder = false
supports_multivalues = true
supports_getinfo = true
mvdedup.py
import sys
import splunk.Intersplunk as si


def uniqfy(seq, sortorder=None):
    # Drop duplicates, keeping the first occurrence of each value.
    seen = {}
    result = []
    for item in seq:
        if item in seen:
            continue
        seen[item] = 1
        result.append(item)
    # Optionally sort: '+' ascending, '-' descending.
    if sortorder == '+':
        result.sort()
    elif sortorder == '-':
        result.sort(reverse=True)
    return result


(isgetinfo, sys.argv) = si.isGetInfo(sys.argv)
if isgetinfo:
    # outputInfo(streaming, generating, retevs, reqsop, preop, timeorder=False):
    si.outputInfo(True, False, True, False, None, False)
    sys.exit(0)

results = si.readResults(None, None, True)
fields = {}
if len(sys.argv) > 1:
    for a in sys.argv[1:]:
        a = str(a)
        if a[:1] in ['+', '-']:
            # set sorting order
            fields[a[1:]] = a[:1]
        else:
            # no sorting!
            fields[a] = None
elif results:
    # dedup on all the fields in the data
    for k in results[0].keys():
        fields[k] = None

for i in range(len(results)):
    for key in results[i].keys():
        # only multi-valued fields arrive as lists
        if isinstance(results[i][key], list):
            if key in fields:
                results[i][key] = uniqfy(results[i][key], fields[key])

si.outputResults(results)
Any suggestions are more than welcome.
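For the original question (reordering without deduplication), the dedup step in uniqfy can be dropped. A minimal sketch, assuming a hypothetical helper named reorder and an extra 'r' flag for a plain reverse; neither is part of the command as posted:

```python
def reorder(values, order=None):
    """Return a reordered copy of a multi-valued field, keeping duplicates.

    order: '+' sorts ascending, '-' sorts descending, 'r' reverses the
    existing order, anything else leaves the values untouched.
    Hypothetical helper; the 'r' flag is an assumption for illustration.
    """
    result = list(values)
    if order == '+':
        result.sort()
    elif order == '-':
        result.sort(reverse=True)
    elif order == 'r':
        result.reverse()
    return result


if __name__ == "__main__":
    # Values as list() would collect them from events in reverse time order.
    newest_first = ["logout", "view", "view", "login"]
    print(reorder(newest_first, 'r'))  # ['login', 'view', 'view', 'logout']
```

Swapping this in for uniqfy in the final loop of mvdedup.py would touch only the named fields and leave event order alone. The argument loop would also need to recognize whatever prefix is chosen for the reverse case, since as posted it only accepts + and -.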
For anyone following along at home who hasn't looked in the docs first: there is now a built-in mvdedup eval function.
Both mvdedup and mvsort were added as evaluation functions (i.e., you can use them in eval and where) in 6.2.
| eval mvfield=mvsort(mvfield)
and
| eval mvfield=mvdedup(mvfield)
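For readers more used to Python, here is a rough analogue of the two functions on a plain list. This is only a sketch, assuming mvsort orders values lexicographically as strings and mvdedup keeps the first occurrence of each value; the names mvsort_like and mvdedup_like are made up for illustration, and the eval function reference for your version has the exact rules.

```python
def mvsort_like(values):
    # Lexicographic (string) ordering, so "10" sorts before "9".
    return sorted(str(v) for v in values)


def mvdedup_like(values):
    # dict preserves insertion order (Python 3.7+), so first occurrences win.
    return list(dict.fromkeys(values))


if __name__ == "__main__":
    print(mvsort_like(["9", "10", "b", "a"]))
    print(mvdedup_like(["a", "b", "a", "c", "b"]))
```

Note that a lexicographic sort is not a numeric sort, and neither function gives a plain reverse of the existing order.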
I have the same need, and if I solve it by adding the "reverse" command before the stats, it reduces my search performance by almost 40%. I'm eager for another answer here that doesn't involve a custom Python command.
Thanks for this answer. It helped me answer my question - http://answers.splunk.com/answers/115137/joining-across-field-matrix