Building for the Splunk Platform

Transforms and lookups - caching, chunking

Charlie
Explorer

I'd like to understand how external_cmd lookups are invoked. For example:

  1. Are the results cached? That is, is the script called again for input rows with the same values, or only once?
  2. Is the CSV on stdin sent all at once or in chunks?

I have a state lookup defined like this in my transforms.conf:

[stateLookup]
external_cmd = state.py node eventtype state
fields_list = node, eventtype, state
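For reference, an external_cmd lookup script receives a CSV on stdin. As I understand the protocol, its header row lists the fields_list columns, and the script is expected to write the same columns back to stdout with the output field filled in. The following is a minimal sketch of what the core of state.py might look like. The EVENTTYPE_TO_STATE mapping and its eventtype names are hypothetical placeholders, not the real ones.

```python
import csv
from typing import TextIO

# Hypothetical mapping: which state each eventtype puts a node into.
EVENTTYPE_TO_STATE = {
    "node_up": "user",
    "node_down": "down",
}


def run_lookup(infile: TextIO, outfile: TextIO, output_field: str = "state") -> None:
    """Read the lookup CSV and echo it back with output_field filled in."""
    reader = csv.DictReader(infile)
    if reader.fieldnames is None:  # empty input: nothing to look up
        return
    # Echo the same header (node, eventtype, state) that came in.
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[output_field] = EVENTTYPE_TO_STATE.get(row.get("eventtype", ""), "")
        writer.writerow(row)


# In state.py itself, the entry point would call run_lookup(sys.stdin, sys.stdout).
```

Every input row gets a row back. Whether the script receives one row per event or one per distinct input combination is what the rest of this thread is about.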

My query looks like this:

tag=statechange NOT jobstart | sort _time | search node="nodename" | head 8 | lookup stateLookup node eventtype

We have server nodes (node) and an eventtype derived from events such as node up and node down. For example, a node up returns the node to a user "state".

So state is what we want as the output, but only when the state has changed, which requires tracking the previous state. Is there a good way to do this? We were using streamstats, which seems to work perfectly, but we thought we'd try tracking the previous state in the script to eliminate a step for our users.

Debug statements have led me to believe two things. First, the script is not called again for the same node with the same eventtype, i.e., the results are cached (1). Second, the Python script is called many times, i.e., the CSV data is sent in chunks rather than all at once (2).

Concern (1) matters because we track previous states in a global hash, so that we can tell from the node and eventtype whether the state changed. If calls to the script are cached, we will not get accurate results. It's curious that I don't see a row in the CSV on stdin for each line shown in my query results.

Concern (2) matters because each new invocation of the script starts without the global hash (dictionary), so a node/eventtype can no longer find its previous state.
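Both concerns can be reproduced outside Splunk. The sketch below is a plain-Python model, not Splunk code, and the node names and states are made up. The hypothetical function detect_changes reports a node's state only when it differs from that node's previous state. Its `previous` dict stands in for the global hash in state.py, and events are assumed to be sorted by _time already. A single ordered pass (roughly what streamstats gives you) behaves correctly. Splitting the events across two invocations that each start with an empty dict reports a spurious change at the start of the second chunk.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Event = Tuple[str, str]  # (node, state), assumed already sorted by _time


def detect_changes(
    events: Sequence[Event], previous: Optional[Dict[str, str]] = None
) -> Tuple[List[Optional[str]], Dict[str, str]]:
    """Return the new state where it changed (else None), plus the updated dict."""
    previous = {} if previous is None else previous
    changes: List[Optional[str]] = []
    for node, state in events:
        changes.append(state if previous.get(node) != state else None)
        previous[node] = state
    return changes, previous


events = [("n1", "user"), ("n1", "down"), ("n1", "down"), ("n1", "user")]

# One ordered pass over all events.
whole, _ = detect_changes(events)  # ['user', 'down', None, 'user']

# Two invocations, each starting with an empty dict (a new process per chunk).
first, after_first = detect_changes(events[:2])
second, _ = detect_changes(events[2:])
chunked = first + second  # ['user', 'down', 'down', 'user']: spurious change

# Carrying the dict over fixes it, but separate processes cannot share it.
carried, _ = detect_changes(events[2:], after_first)  # [None, 'user']
```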

Any ideas? Comments? Thoughts?

Thank you.


Charlie
Explorer

OK, issue (1) turns out to be caused by something I'm doing in the script. I only want to return a "state" value for a node whose state has changed. When I don't pass a value back, the query is retried just for the nodes that returned nothing. If that lookup also fails, the whole query is tried once more. That accounts for the 3 distinct calls to my script. When I return something for every row, I see only one call.
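The extra calls went away once every row got an answer. One way to keep change detection while still returning something for every row is to write a placeholder when the state has not changed. This is a minimal sketch under several assumptions: the sentinel value "unchanged", the fallback "unknown", and the EVENTTYPE_TO_STATE mapping are all hypothetical, and rows within one invocation are assumed to arrive in time order.

```python
import csv
from typing import Dict, TextIO

UNCHANGED = "unchanged"  # hypothetical sentinel; the search must ignore it later

# Hypothetical mapping from eventtype to the state it implies.
EVENTTYPE_TO_STATE = {"node_up": "user", "node_down": "down"}


def run_lookup_always_answer(infile: TextIO, outfile: TextIO) -> None:
    """Echo every input row with a non-empty state, writing UNCHANGED when the
    node's state matches the previous row seen for that node in this call."""
    reader = csv.DictReader(infile)
    if reader.fieldnames is None:  # empty input
        return
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    previous: Dict[str, str] = {}  # only lives for this one invocation
    for row in reader:
        node = row.get("node", "")
        state = EVENTTYPE_TO_STATE.get(row.get("eventtype", ""), "unknown")
        row["state"] = state if previous.get(node) != state else UNCHANGED
        previous[node] = state
        writer.writerow(row)
```

Every row now comes back with a value. However, the `previous` dict still resets with each invocation, so this does not address concern (2).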

And for (2), it's clear that the lookup is done once for a given set of parameters, but in this case I want it to run for every row. What I'm doing now is passing the _time field to the lookup, so every event gets its own lookup.
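The behavior described here, one script row per distinct set of input values, can be modeled as deduplication of the input tuples. This is only a model of what was observed in this thread, not Splunk's implementation, and the events are made up. Adding _time to the key makes every event distinct, so each one gets its own row.

```python
from typing import Dict, List, Sequence, Set, Tuple


def distinct_inputs(
    events: Sequence[Dict[str, str]], fields: Sequence[str]
) -> List[Tuple[str, ...]]:
    """Unique input tuples over `fields`, in first-seen order."""
    seen: Set[Tuple[str, ...]] = set()
    unique: List[Tuple[str, ...]] = []
    for event in events:
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


events = [
    {"node": "n1", "eventtype": "node_down", "_time": "1000"},
    {"node": "n1", "eventtype": "node_up", "_time": "1060"},
    {"node": "n1", "eventtype": "node_down", "_time": "1120"},
    {"node": "n1", "eventtype": "node_up", "_time": "1180"},
]

without_time = distinct_inputs(events, ["node", "eventtype"])
with_time = distinct_inputs(events, ["node", "eventtype", "_time"])
print(len(without_time), len(with_time))  # 2 4
```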


jrstear
Path Finder

Charlie
Explorer

Thanks. For (2), my lookup is now defined like this:

[stateLookup]
external_cmd = state.py node eventtype state
fields_list = node, eventtype, state, _time
time_field = _time

For (1), the search command is pretty cool, though it seems to mean I won't need a lookup at all.
