My use case: I want to create a timechart of the number (count) of requests to a system, split by "connection type": that is, how the requests arrived at the system.
The request type is represented in the log as a field named conn_type
containing a fixed-length string of 8 characters, where each character represents a hexadecimal digit. For example, the value "0000000A"
indicates that the request is from system XYZ.
I want the timechart legend to show descriptive labels, not these hex values. For example, instead of "0000000A", I want the legend to show something like "From system XYZ".
I could replace all of the original values with readable values before creating the timechart. For example:
... | eval conn_type=case(conn_type=="0000000A", "From system XYZ", ...) | timechart ...
but I'd prefer to use a technique that doesn't involve processing every input event for the timechart. That seems like too much processing.
I'd prefer to rename the fields after the timechart command, like this:
... | timechart count by conn_type | rename "0000000A" as "From system XYZ", "0000000B" as "Entered on the command line" ...
That works, but I wonder whether, rather than coding this rename command inline, I could use a CSV file as a lookup and dynamically build the rename command. Hence this question.
For example, given the CSV file connection_types.csv with the following structure:
conn_type,description
0000000A,From system XYZ
0000000B,Entered on the command line
...
(There are a dozen or so connection types.)
can I use a subsearch to build a rename command as a string, as done by this search string:
|inputlookup connection_types.csv | table conn_type description | eval rename_phrase=conn_type + " as " + "\"" + description + "\"" | stats values(rename_phrase) as rename_phrases | eval search="rename " + mvjoin(rename_phrases,", ") | fields search
and then - here's the trick - run that returned string as a command?
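Outside Splunk, the string-building step itself is straightforward. A minimal Python sketch of what the subsearch assembles (the file name and columns are taken from the question; the embedded CSV text here is just a stand-in for reading the real lookup file):

```python
import csv
import io

# Stand-in for connection_types.csv (structure from the question).
CSV_TEXT = """conn_type,description
0000000A,From system XYZ
0000000B,Entered on the command line
"""

def build_rename_command(csv_text):
    """Build a 'rename A as "B", ...' clause from conn_type/description rows."""
    rows = csv.DictReader(io.StringIO(csv_text))
    phrases = ['{0} as "{1}"'.format(r["conn_type"], r["description"]) for r in rows]
    return "rename " + ", ".join(phrases)

print(build_rename_command(CSV_TEXT))
# rename 0000000A as "From system XYZ", 0000000B as "Entered on the command line"
```

The open part of the question is not the string building but the final step: getting Splunk to execute a string returned by a subsearch as a command.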
For now, I'll use rename after the timechart - it works - but I'm curious to know whether what I've described here is possible.
Hi Graham,
If a lookup before timechart isn't what you're looking for (I guess you have many events but only a few conn types),
have you considered using a custom search command?
Here's a simple script that might work for you (it might still need tuning for your environment):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import splunk.Intersplunk as sis

(a, kwargs) = sis.getKeywordsAndOptions()

def main():
    results = sis.readResults(None, None, True)
    conn_types = {
        '0000000A': 'From system XYZ',
        '0000000B': 'Entered on the command line',
        '0000000C': 'from C',
        '0000000D': 'others'
    }
    for row in results:
        # Iterate over a copy of the keys, because the loop deletes keys from row.
        for key in list(row):
            stripped = key.strip()
            if conn_types.get(stripped):
                row[conn_types[stripped]] = row[key]
                del row[key]
    sis.outputResults(results)
    return 0

try:
    main()
except Exception as e:
    import traceback
    stack = traceback.format_exc()
    sis.generateErrorResults("Error '{e}'. {s}".format(e=e, s=stack))
You can place this script in $SPLUNK_HOME/etc/apps/search/bin/ as, say, renamecolumns.py,
and add/edit $SPLUNK_HOME/etc/apps/search/local/commands.conf:
[renamecolumns]
filename = renamecolumns.py
Then use this search string:
source=*test.log | timechart count by conn_type | renamecolumns
Basically, this reads the results and modifies the column names for you,
and of course you can read the mapping from a CSV file if you'd like.
But if you have many events and/or a large lookup,
you might need to test which is faster: the custom command or the lookup.
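As noted above, the hardcoded conn_types dict could instead be loaded from the lookup CSV. A minimal sketch, assuming the two-column structure shown in the question (conn_type,description) and a file path you deploy alongside the script:

```python
import csv

def load_conn_types(csv_path):
    """Load the conn_type -> description mapping from a lookup CSV.

    Assumes a header row with conn_type and description columns,
    matching connection_types.csv from the question.
    """
    with open(csv_path, newline="") as f:
        return {row["conn_type"]: row["description"] for row in csv.DictReader(f)}
```

The returned dict can then replace the literal conn_types dict in the custom command.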
HTH,
Bill
(Picks up dropped mic.) Thank you, @bchung. You just fast-forwarded my reality ;-). I knew I'd have to look at developing custom search commands in Python sooner or later, but you've handed me, on a plate, an example that I can understand, use immediately, and customize. Thank you, thank you. I've accepted your answer based on this latest comment (I acknowledge this might be dodgy etiquette, but I am not going to ask you to submit this as a separate answer).
Glad to help 🙂 Please feel free to ask if you have any issues with custom commands!
And thanks @MuS for converting this to an answer from comment.
I believe that you can use the approach listed here with some adjustments; the key is the map command:
https://answers.splunk.com/answers/386488/regex-in-lookuptable.html#answer-387536
Thanks for the tip about the map command, and for the link.
I've read and reread your answer to that question carefully, and recaffeinated ;-), but I cannot see how to adjust that approach for my use case. In the meantime, I've developed a solution for my use case that avoids this question. For details, see my comment on the answer by @richgalloway. (I'll admit, this demotivates me from spending more time right now looking into map.)
Still, I suspect that map is as close an answer as I'm going to get (to this specific question about dynamically building commands) without delving into developing a custom search command in Python.
If this is still the best answer after the weekend, I'll accept it. Thanks again!
I'm still grateful for the pointer to map, but decided to accept the answer from @bchung - not because of this original answer, but because of the example script he later added in a comment. (I'll take tips on how to better manage acceptance of answers in such a situation: I want to do the right thing; I don't want to offend anyone.)
@Graham_Hannington, I just converted the comment to an answer. Feel free to accept the correct one now.
@MuS, done. Thanks!
If you do the lookup before timechart, there's no need for rename.
... | lookup connection_types.csv conn_type OUTPUT description | timechart count by description
Thanks! And, yes, you're absolutely correct.
However, the following point from my question remains:
I could replace all of the original values with readable values before creating the timechart [...] but I'd prefer to use a technique that doesn't involve processing every input event for the timechart. That seems like too much processing.
The Splunk documentation makes the same point:
Optimizing your lookup search
If you are using the lookup command in the same pipeline as a transforming command, and it is possible to retain the field you will lookup on after the transforming command, do the lookup after the transforming command.
This morning, I came up with the following solution:
sourcetype=my_log_type | timechart count by conn_type | untable _time conn_type value | lookup connection_types.csv conn_type output description | xyseries _time description value
This uses untable and xyseries with lookup instead of renaming fields.
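Conceptually, the untable/lookup/xyseries sequence reshapes the timechart output from wide to long, maps each series name through the lookup, and pivots back to wide. A rough Python analogy using plain dicts (the data here is hypothetical, just to show the reshaping):

```python
# Wide timechart-style rows: one column per conn_type value.
wide = [
    {"_time": "t1", "0000000A": 3, "0000000B": 1},
    {"_time": "t2", "0000000A": 0, "0000000B": 2},
]
lookup = {"0000000A": "From system XYZ", "0000000B": "Entered on the command line"}

# untable: wide -> long (one row per _time/series/value triple).
long_rows = [
    {"_time": r["_time"], "conn_type": k, "value": v}
    for r in wide for k, v in r.items() if k != "_time"
]

# lookup: map each series name; xyseries: long -> wide again.
relabelled = {}
for r in long_rows:
    label = lookup.get(r["conn_type"], r["conn_type"])
    relabelled.setdefault(r["_time"], {"_time": r["_time"]})[label] = r["value"]
result = list(relabelled.values())
```

The payoff is that the lookup runs once per series per time bucket (the summarized rows), not once per raw event.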
I now have a solution for my use case, so my original question about dynamically building commands is, to me, academic (until I hit a use case that needs it).
Thanks again for your input. Your answer prompted me to think harder about how to do the lookup after timechart.
The solution I came up with (mentioned in my previous comment) was flawed, because xyseries does not produce the same "intelligent" X-axis labels as timechart. @Jeremiah provided the following improved solution (in response to a different question):
sourcetype=my_log_type | bin _time | stats count by _time, conn_type | lookup connection_types.csv conn_type output description | timechart sum(count) as count by description
Please accept an answer to help others in the future who may have a similar question.
Hi Graham,
If I understand your question correctly, you should do the lookup for conn_type before timechart, and count by the output field (i.e., description). For example, given this test.log:
Thu Apr 28 18:59:56 CST 2016 conn_type=0000000A
Thu Apr 28 19:00:06 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:48 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:51 CST 2016 conn_type=0000000C
Thu Apr 28 19:02:53 CST 2016 conn_type=0000000A
Thu Apr 28 19:02:56 CST 2016 conn_type=0000000D
Thu Apr 28 19:02:59 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:59 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:00 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:00 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:01 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:02 CST 2016 conn_type=0000000D
Thu Apr 28 19:03:03 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:04 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:05 CST 2016 conn_type=0000000D
Thu Apr 28 19:03:07 CST 2016 conn_type=0000000A
search command:
source=*test.log | lookup connection_types.csv conn_type OUTPUT description | timechart count by description
Then you'll get something like this:
_time Entered on the command line From system XYZ NULL
2016-04-29 09:02:40 0 0 0
2016-04-29 09:02:45 1 0 0
2016-04-29 09:02:50 0 1 1
2016-04-29 09:02:55 1 1 1
2016-04-29 09:03:00 3 2 1
2016-04-29 09:03:05 0 1 1
HTH,
Bill
Hi Bill,
Thanks very much for the detailed answer.
However:
you should do the lookup for conn_type before timechart
I disagree. For details, see my comment on the answer by @richgalloway.