Splunk Search

Can a search string dynamically build commands, and then run them?

Graham_Hanningt
Builder

My use case: I want to create a timechart of the number (count) of requests to a system, split by "connection type": that is, how the requests arrived at the system.

The connection type is represented in the log as a field named conn_type containing a fixed-length string of 8 hexadecimal digits. For example, the value "0000000A" indicates that the request is from system XYZ.

I want the timechart legend to show descriptive labels, not these hex values. For example, instead of "0000000A", I want the legend to show something like "From system XYZ".

I could replace all of the original values with readable values before creating the timechart. For example:

... | eval conn_type=case(conn_type=="0000000A", "From system XYZ", ...) | timechart ...

but I'd prefer to use a technique that doesn't involve processing every input event for the timechart. That seems like too much processing.

I'd prefer to rename the fields after the timechart command, like this:

... | timechart count by conn_type | rename "0000000A" as "From system XYZ", "0000000B" as "Entered on the command line" ...

That works, but I wonder whether, rather than coding this rename command inline, I could use a CSV file as a lookup and dynamically build the rename command. Hence this question.

For example, given the CSV file connection_types.csv with the following structure:

conn_type,description
0000000A,From system XYZ
0000000B,Entered on the command line
...

(There are a dozen or so connection types.)

can I use a subsearch to build a rename command as a string, as done by this search string:

|inputlookup connection_types.csv | table conn_type description | eval rename_phrase=conn_type + " as " + "\"" + description + "\"" | stats values(rename_phrase) as rename_phrases | eval search="rename " + mvjoin(rename_phrases,", ") | fields search

and then - here's the trick - run that returned string as a command?

For now, I'll use rename after the timechart - it works - but I'm curious to know whether what I've described here is possible.

0 Karma
1 Solution

bchung_splunk
Splunk Employee

Hi Graham,

If a lookup before timechart isn't what you're looking for (I guess you have many events but only a few conn types),
have you considered using a custom search command?

Here's a simple script that might work for you (it might still need some tuning for your environment):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import splunk.Intersplunk as sis

# Read any keywords/options passed to the command (unused here).
(keywords, kwargs) = sis.getKeywordsAndOptions()

def main():
    # Read the results handed to us by the previous command in the pipeline.
    results = sis.readResults(None, None, True)
    # Map of conn_type hex values to descriptive column names.
    conn_types = {
        '0000000A': 'From system XYZ',
        '0000000B': 'Entered on the command line',
        '0000000C': 'from C',
        '0000000D': 'others'
    }
    for row in results:
        # Iterate over a copy of the keys: we add and delete keys
        # inside the loop, which is unsafe on the live dict.
        for key in list(row):
            stripped = key.strip()
            if conn_types.get(stripped):
                # Copy the value under the descriptive name,
                # then drop the hex-named column.
                row[conn_types[stripped]] = row[key]
                del row[key]

    sis.outputResults(results)
    return 0

try:
    main()
except Exception as e:
    import traceback
    stack = traceback.format_exc()
    sis.generateErrorResults("Error '{e}'. {s}".format(e=e, s=stack))

You can place this script in $SPLUNK_HOME/etc/apps/search/bin/ as, say, renamecolumns.py,
and add/edit $SPLUNK_HOME/etc/apps/search/local/commands.conf:

[renamecolumns]
filename = renamecolumns.py

then search string:

source=*test.log | timechart count by conn_type | renamecolumns

Basically, this reads the results and modifies the column names for you,
and of course you can read the mapping from a CSV file if you'd like.
But if you have many events and/or a large lookup,
you might need to test which is faster: the custom command or the lookup.
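
For example, here's a minimal sketch of loading the mapping from the lookup CSV instead of hardcoding it (the lookup path below is an assumption; adjust it to wherever connection_types.csv actually lives):

import csv
import os

def load_conn_types():
    # Assumed location of the lookup file; adjust as needed.
    lookup = os.path.join(os.environ['SPLUNK_HOME'], 'etc', 'apps',
                          'search', 'lookups', 'connection_types.csv')
    with open(lookup) as f:
        return {row['conn_type']: row['description']
                for row in csv.DictReader(f)}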

HTH,
Bill


Graham_Hanningt
Builder

(Picks up dropped mic.) Thank you, @bchung. You just fast-forwarded my reality ;-). I knew I'd have to look at developing custom search commands in Python sooner or later, but you've handed me, on a plate, an example that I can understand, use immediately, and customize. Thank you, thank you. I've accepted your answer based on this latest comment (I acknowledge this might be dodgy etiquette, but I am not going to ask you to submit it as a separate answer).

0 Karma

bchung_splunk
Splunk Employee

Glad to help 🙂 Please feel free to ask if you have any issues with custom commands!
And thanks @MuS for converting this to an answer from a comment.

0 Karma

woodcock
Esteemed Legend

I believe that you can use the approach listed here with some adjustments; the key is the map command:

https://answers.splunk.com/answers/386488/regex-in-lookuptable.html#answer-387536
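
For example, here's a rough, untested sketch of that idea against the question's connection_types.csv and sourcetype (the embedded quotes that map substitutes into $renames$ are the fragile part and may need extra escaping):

| inputlookup connection_types.csv
| eval rename_phrase="\"" + conn_type + "\" as \"" + description + "\""
| stats values(rename_phrase) as rename_phrases
| eval renames=mvjoin(rename_phrases, ", ")
| map search="search sourcetype=my_log_type | timechart count by conn_type | rename $renames$"

Because the stats leaves a single row, map runs the templated search exactly once, with the dynamically built rename clause substituted in.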

Graham_Hanningt
Builder

Thanks for the tip about the map command, and for the link.

I've read and reread your answer to that question carefully, and recaffeinated ;-), but I cannot see how to adjust that approach to my use case. In the meantime, I've developed a solution for my use case that sidesteps this question. For details, see my comment on the answer by @richgalloway. (I'll admit, this demotivates me from spending more time on map right now.)

Still, I suspect that map is as close an answer as I'm going to get (to this specific question about dynamically building commands) without delving into developing a custom search command in Python.

If this is still the best answer after the weekend, I'll accept it. Thanks again!

0 Karma

Graham_Hanningt
Builder

I'm still grateful for the pointer to map, but decided to accept the answer from @bchung - not because of this original answer, but because of the example script he later added in a comment. (I'll take tips on how to better manage acceptance of answers in such a situation: I want to do the right thing; I don't want to offend anyone.)

0 Karma

MuS
SplunkTrust

@Graham_Hannington, I just converted the comment to an answer. Feel free to accept the correct one now.

Graham_Hanningt
Builder

@MuS, done. Thanks!

0 Karma

richgalloway
SplunkTrust

If you do the lookup before timechart, there's no need for rename.

... | lookup connection_types.csv conn_type OUTPUT description | timechart count by description 
---
If this reply helps you, Karma would be appreciated.
0 Karma

Graham_Hanningt
Builder

Thanks! And, yes, you're absolutely correct.

However, the following point from my question remains:

I could replace all of the original values with readable values before creating the timechart [...] but I'd prefer to use a technique that doesn't involve processing every input event for the timechart. That seems like too much processing.

The Splunk documentation makes the same point:

Optimizing your lookup search
If you are using the lookup command in the same pipeline as a transforming command, and it is possible to retain the field you will lookup on after the transforming command, do the lookup after the transforming command.

This morning, I came up with the following solution:

sourcetype=my_log_type | timechart count by conn_type | untable _time conn_type value | lookup connection_types.csv conn_type output description | xyseries _time description value

This uses untable and xyseries with lookup instead of renaming fields.

I now have a solution for my use case, so my original question about dynamically building commands is, to me, academic (until I hit a use case that needs it).

Thanks again for your input. Your answer prompted me to think harder about how to do the lookup after timechart.

0 Karma

Graham_Hanningt
Builder

The solution I came up with (mentioned in my previous comment) was flawed, because xyseries does not produce the same "intelligent" X-axis labels as timechart. @Jeremiah provided the following improved solution (in response to a different question):

sourcetype=my_log_type | bin _time | stats count by _time, conn_type | lookup connection_types.csv conn_type output description | timechart sum(count) as count by description
0 Karma

richgalloway
SplunkTrust

Please accept an answer to help others in the future who may have a similar question.

---
If this reply helps you, Karma would be appreciated.
0 Karma

bchung_splunk
Splunk Employee

Hi Graham,

If I understand your question correctly, you should do the lookup for conn_type before timechart, and count by the output field (i.e., description), e.g.,

test.log:

Thu Apr 28 18:59:56 CST 2016 conn_type=0000000A
Thu Apr 28 19:00:06 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:48 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:51 CST 2016 conn_type=0000000C
Thu Apr 28 19:02:53 CST 2016 conn_type=0000000A
Thu Apr 28 19:02:56 CST 2016 conn_type=0000000D
Thu Apr 28 19:02:59 CST 2016 conn_type=0000000B
Thu Apr 28 19:02:59 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:00 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:00 CST 2016 conn_type=0000000A
Thu Apr 28 19:03:01 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:02 CST 2016 conn_type=0000000D
Thu Apr 28 19:03:03 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:04 CST 2016 conn_type=0000000B
Thu Apr 28 19:03:05 CST 2016 conn_type=0000000D
Thu Apr 28 19:03:07 CST 2016 conn_type=0000000A

search command:

source=*test.log | lookup connection_types.csv conn_type OUTPUT description | timechart count by description

then you'll get something like this:

_time                Entered on the command line  From system XYZ  NULL
2016-04-29 09:02:40  0                            0                0
2016-04-29 09:02:45  1                            0                0
2016-04-29 09:02:50  0                            1                1
2016-04-29 09:02:55  1                            1                1
2016-04-29 09:03:00  3                            2                1
2016-04-29 09:03:05  0                            1                1

HTH,
Bill

0 Karma

Graham_Hanningt
Builder

Hi Bill,

Thanks very much for the detailed answer.

However:

you should do the lookup for conn_type before timechart

I disagree. For details, see my comment on the answer by @richgalloway.

0 Karma