My use case: I want to create a timechart of the number (count) of requests to a system, split by "connection type": that is, how the requests arrived at the system.
The request type is represented in the log as a field named conn_type containing a fixed-length string of 8 characters, where each character represents a hexadecimal digit. For example, the value "0000000A" indicates that the request is from system XYZ.
I want the timechart legend to show descriptive labels, not these hex values. For example, instead of "0000000A", I want the legend to show something like "From system XYZ".
I could replace all of the original values with readable values before creating the timechart. For example:
... | eval conn_type=case(conn_type=="0000000A", "From system XYZ", ...) | timechart ...
but I'd prefer to use a technique that doesn't involve processing every input event for the timechart. That seems like too much processing.
I'd prefer to rename the fields after the timechart command, like this:
... | timechart count by conn_type | rename "0000000A" as "From system XYZ", "0000000B" as "Entered on the command line" ...
That works, but I wonder: rather than coding this rename command inline, could I use a CSV file as a lookup, and dynamically build the rename command? Hence this question.
For example, given the CSV file connection_types.csv with the following structure:

conn_type,description
0000000A,From system XYZ
0000000B,Entered on the command line
...
(There are a dozen or so connection types.)
can I use a subsearch to build a rename command as a string, as done by this search string:
|inputlookup connection_types.csv | table conn_type description | eval rename_phrase=conn_type + " as " + "\"" + description + "\"" | stats values(rename_phrase) as rename_phrases | eval search="rename " + mvjoin(rename_phrases,", ") | fields search
and then - here's the trick - run that returned string as a command?
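As a cross-check of the string that subsearch would assemble, here is a pure-Python sketch (not Splunk code; the CSV content is just the two example rows from above) of the same string-building logic:

```python
import csv
import io

# The two example rows from connection_types.csv shown above.
CSV_TEXT = """conn_type,description
0000000A,From system XYZ
0000000B,Entered on the command line
"""

def build_rename_command(csv_text):
    """Mimic the subsearch: one 'HEX as "label"' phrase per row, joined by ", "."""
    rows = csv.DictReader(io.StringIO(csv_text))
    phrases = ['{} as "{}"'.format(r["conn_type"], r["description"]) for r in rows]
    return "rename " + ", ".join(phrases)

print(build_rename_command(CSV_TEXT))
# rename 0000000A as "From system XYZ", 0000000B as "Entered on the command line"
```

One detail worth noting: in the SPL version, stats values() returns the phrases in lexicographic order, not CSV-file order, so the generated rename clauses may not match the order of the lookup file.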
For now, I'll use rename after the timechart (it works), but I'm curious to know whether what I've described here is possible.
If I understand your question correctly, you should do the lookup for conn_type before timechart, and count by the output field (i.e., description). For example, given events like these:
Thu Apr 28 18:59:56 CST 2016 conn_type=0000000A
Thu Apr 28 19:00:06 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:48 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:51 CST 2016 conn_type=0000000C
Thu Apr 28 19:02:53 CST 2016 conn_type=0000000A
Thu Apr 28 19:02:56 CST 2016 conn_type=0000000D
Thu Apr 28 19:02:59 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:59 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:00 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:00 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:01 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:02 CST 2016 conn_type=0000000D
Thu Apr 28 19:03:03 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:04 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:05 CST 2016 conn_type=0000000D
Thu Apr 28 19:03:07 CST 2016 conn_type=0000000A
source=*test.log | lookup connection_types.csv conn_type OUTPUT description | timechart count by description
then you'll get something like this:
_time                Entered on the command line  From system XYZ  NULL
2016-04-29 09:02:40  0                            0                0
2016-04-29 09:02:45  1                            0                0
2016-04-29 09:02:50  0                            1                1
2016-04-29 09:02:55  1                            1                1
2016-04-29 09:03:00  3                            2                1
2016-04-29 09:03:05  0                            1                1
Thanks very much for the detailed answer.
Regarding "you should do the lookup for conn_type before timechart": I disagree. For details, see my comment on the answer by @richgalloway.
If you do the lookup before timechart, there's no need for rename.
... | lookup connection_types.csv conn_type OUTPUT description | timechart count by description
Thanks! And, yes, you're absolutely correct.
However, the following point from my question remains:
I could replace all of the original values with readable values before creating the timechart [...] but I'd prefer to use a technique that doesn't involve processing every input event for the timechart. That seems like too much processing.
The Splunk documentation makes the same point:
Optimizing your lookup search
If you are using the lookup command in the same pipeline as a transforming command, and it is possible to retain the field you will lookup on after the transforming command, do the lookup after the transforming command.
This morning, I came up with the following solution:
sourcetype=my_log_type | timechart count by conn_type | untable _time conn_type value | lookup connection_types.csv conn_type output description | xyseries _time description value
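To see why this round trip renames the columns, here is a pure-Python sketch (not SPL; the lookup table is abridged and the counts are made up) of the untable/lookup/xyseries sequence: unpivot the timechart's wide output into rows, relabel each row via the lookup, then pivot back to wide form under the new column names:

```python
# Illustrative sketch (pure Python, not SPL): rename timechart columns by
# unpivoting (untable), applying a lookup, and repivoting (xyseries).

LOOKUP = {  # connection_types.csv, abridged
    "0000000A": "From system XYZ",
    "0000000B": "Entered on the command line",
}

# Wide output as produced by `timechart count by conn_type` (made-up counts).
timechart_rows = [
    {"_time": "09:02:55", "0000000A": 1, "0000000B": 1},
    {"_time": "09:03:00", "0000000A": 2, "0000000B": 3},
]

def untable(rows, key="_time"):
    """Wide -> long: one {_time, conn_type, value} row per cell."""
    for row in rows:
        for col, val in row.items():
            if col != key:
                yield {"_time": row[key], "conn_type": col, "value": val}

def xyseries(long_rows, x="_time", y="description", value="value"):
    """Long -> wide: group by x, spread the y values into columns."""
    wide = {}
    for r in long_rows:
        wide.setdefault(r[x], {x: r[x]})[r[y]] = r[value]
    return list(wide.values())

# The `lookup` step: attach a description to each long-form row.
long_rows = [dict(r, description=LOOKUP.get(r["conn_type"], "NULL"))
             for r in untable(timechart_rows)]
renamed = xyseries(long_rows)
print(renamed)
```

The columns of the repivoted table carry the descriptive labels, with no rename command and no per-event processing before the timechart.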
It uses lookup instead of renaming fields.

I now have a solution for my use case, so my original question about dynamically building commands is, to me, academic (until I hit a use case that needs it).

Thanks again for your input. Your answer prompted me to think harder about how to do the lookup after timechart.
Please accept an answer to help others in the future who may have a similar question.
The solution I came up with (mentioned in my previous comment) was flawed, because xyseries does not produce the same "intelligent" X-axis labels as timechart. @Jeremiah provided the following improved solution (in response to a different question):
sourcetype=my_log_type | bin _time | stats count by _time, conn_type | lookup connection_types.csv conn_type output description | timechart sum(count) as count by description
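The bin/stats/lookup steps of that search can be sketched in pure Python (not SPL; the epoch timestamps, events, and 5-second span are illustrative): bucket each event's timestamp, count by (bucket, conn_type), relabel via the lookup, then sum the counts per description:

```python
from collections import Counter

LOOKUP = {  # connection_types.csv, abridged
    "0000000A": "From system XYZ",
    "0000000B": "Entered on the command line",
}
SPAN = 5  # bin span in seconds, standing in for `bin _time`

# (epoch_seconds, conn_type) pairs standing in for the raw events.
events = [(100, "0000000A"), (101, "0000000B"), (103, "0000000A"),
          (106, "0000000B"), (107, "0000000B")]

# `bin _time | stats count by _time, conn_type`
counts = Counter(((t // SPAN) * SPAN, conn) for t, conn in events)

# `lookup ...`, then `timechart sum(count) as count by description`
series = Counter()
for (bin_start, conn), n in counts.items():
    series[(bin_start, LOOKUP.get(conn, "NULL"))] += n

print(dict(series))
```

Because the stats step has already collapsed the events to one row per (bin, conn_type), the lookup runs over at most bins x connection-types rows rather than over every raw event, which is the optimization the Splunk documentation quoted above recommends.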
Thanks for the tip about the map command, and for the link.
I've read and reread your answer to that question carefully, and recaffeinated ;-), but I cannot see how to adjust that approach for my use case. In the meantime, I've developed a solution for my use case that avoids this question. For details, see my comment on the answer by @richgalloway. (I'll admit, this demotivates me from spending more time right now looking into map.)

Still, I suspect that map is as close an answer as I'm going to get (to this specific question about dynamically building commands) without delving into developing a custom search command in Python.
If this is still the best answer after the weekend, I'll accept it. Thanks again!
I'm still grateful for the pointer to map, but decided to accept the answer from @bchung - not because of this original answer, but because of the example script he later added in a comment. (I'll take tips on how to better manage acceptance of answers in such a situation: I want to do the right thing; I don't want to offend anyone.)