Solved: Can I use Splunk's built-in Python SDK in my own s...

Johnvey · ‎11-09-2009

I have existing Python scripts that pull data from various sources. I would like to use Splunk's built-in Python SDK layer in my own script so that I can run searches programmatically.

Johnvey · ‎11-09-2009

Yes, Splunk includes its own copy of Python along with modules that talk directly to the Splunk backend.

Interactive Splunk Python prompt

If you have Splunk installed already, you can use the interactive Python interpreter to try out the various modules.

1) start the Python prompt

$SPLUNK_HOME/bin/splunk cmd python

2) import the required modules

import splunk.auth, splunk.search

3) obtain a session key

key = splunk.auth.getSessionKey('admin','changeme') # replace with your credentials

The SDK will cache the session key, so you don't have to explicitly pass it while in the interactive prompt.

4) run a basic search

my_job = splunk.search.dispatch('search error | timechart span=1h count', namespace='search', earliest_time='-24h')

This command starts a search job for all occurrences of the keyword error, in the context of the search app, and counts them per hour over the last 24 hours. The job handle is the my_job object, which is an instance of the splunk.search.SearchJob class.

5) inspect the various job properties

The my_job object has a multitude of properties that describe the current job. Examples are:

>>> my_job.isDone 
True
>>> my_job.eventCount
13264
>>> my_job.resultCount
24

The complete list of properties can be enumerated by printing the job object:

>>> print my_job
createTime           2009-11-09T10:48:03.000-08:00
cursorTime           2009-09-09T01:40:20.000-07:00
delegate             None
doneProgress         1.0
dropCount            0
eai:acl              {'sharing': 'global', 'perms': {'read': ['admin'], 'write': ['admin']}, 'app': 'search', 'modifiable': 'true', 'can_write': 'true', 'owner': 'admin'}
earliestTime         2002-09-12T17:02:52.000-07:00
eventAvailableCount  100
eventCount           100
eventFieldCount      17
eventIsStreaming     True
eventIsTruncated     False
eventSearch          search error  | head 100
eventSorting         desc
isDone               True
isFailed             False
isFinalized          False
isPaused             False
isRealTimeSearch     False
isSaved              False
isSavedSearch        False
isZombie             False
keywords             error
label                None
latestTime           2009-09-09T01:40:21.000-07:00
messages             {}
modifiedTime         2009-11-09T10:48:03.000-08:00
priority             5
remoteSearch         litsearch error | fields keepcolorder=t * | prehead limit=100 null=false keeplast=false
reportSearch         None
request              {'search': 'search error | head 100'}
resultCount          100
resultIsStreaming    True
resultPreviewCount   100
runDuration          3.38
scanCount            2803
search               search error | head 100
searchProviders      ['decider.local-johnvey']
sid                  1257792483.115
statusBuckets        300
ttl                  600
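
The isDone and isFailed properties listed above can drive a simple polling loop. The following is a minimal sketch, not part of the Splunk modules: wait_for_job and its timeout and poll_interval parameters are hypothetical names, and it assumes that reading isDone and isFailed reflects the current job state on every access.

```python
import time


def wait_for_job(job, timeout=60.0, poll_interval=0.5):
    """Block until job.isDone is true; return False if the timeout expires.

    Hypothetical helper, not part of the Splunk modules. Assumes that
    job.isDone and job.isFailed reflect the live job state on every read.
    """
    deadline = time.time() + timeout
    while not job.isDone:
        if job.isFailed:
            raise RuntimeError('search job %s failed' % job.sid)
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

At the interactive prompt this could be called as wait_for_job(my_job) before reading the events or results.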

6) get the raw events

You can iterate over the raw events that the search returned:

>>> for event in my_job.events: print event

7) get the transformed results

Since the search command contains timechart, a transforming command, the relevant summarized data is contained in the results property:

>>> for result in my_job.results: print result

Each result object in this iterator carries detailed information about that result:

>>> result0 = my_job.results[0]
>>> result0.time
'2009-09-09T01:40:21-0700'
>>> result0.fields
{'count': 100, '_time': 2009-09-09T01:40:21-0700}
>>> result0['count']
100
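
If the results need to go into another tool, they can be flattened into plain tuples first. This is only a sketch: results_to_rows and field_names are hypothetical names, and the helper relies solely on the two access patterns shown above (the time attribute and lookup by field name).

```python
def results_to_rows(results, field_names):
    """Return one (time, field1, field2, ...) tuple per result.

    Hypothetical helper. Uses only the .time attribute and
    dictionary-style lookup by field name; values are converted to str.
    """
    rows = []
    for result in results:
        values = [str(result[name]) for name in field_names]
        rows.append((result.time,) + tuple(values))
    return rows
```

For the timechart search in this walkthrough, results_to_rows(my_job.results, ['count']) would give one (time, count) pair per hourly bucket.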

8) clean up

Every search job executed will retain its data for a period of time (defined by the ttl dispatch property). When you are finished with the job, mark the job for removal:

>>> my_job.cancel()
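
In a script it is easy to skip the cancel() call when an exception interrupts processing. One way to guard against that is a small context manager. This is a sketch under the assumption that cancel() is safe to call on a job in any state; cancelling is a hypothetical name, not something the Splunk modules provide.

```python
import contextlib


@contextlib.contextmanager
def cancelling(job):
    """Yield the job, then call job.cancel() even if the body raises.

    Hypothetical wrapper; cancel() is the only job method it relies on.
    """
    try:
        yield job
    finally:
        job.cancel()


# Example use inside the Splunk environment:
#
#     with cancelling(splunk.search.dispatch('search error | head 100')) as my_job:
#         for event in my_job.events:
#             print(event)
```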

Scripting against the built-in Python SDK

You can write custom Python scripts that use the built-in SDK by executing them in the Splunk environment. Assuming that you have a script called 'my_searcher.py':

$ cd $SPLUNK_HOME/bin
$ vi my_searcher.py # create your script
$ ./splunk cmd python my_searcher.py
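
As an illustration, my_searcher.py could string the interactive steps together as follows. Treat it as a sketch: run_search, handle and main are hypothetical names, the credentials are the same placeholders used earlier, and only calls that appear in this answer (getSessionKey, dispatch, results, cancel) are used.

```python
"""my_searcher.py: run one search and print its results (sketch)."""


def run_search(search_module, query, handle, **dispatch_args):
    """Dispatch query, pass each result to handle, and always cancel the job.

    search_module is expected to be splunk.search; it is a parameter so the
    function can be exercised without a Splunk installation.
    """
    job = search_module.dispatch(query, **dispatch_args)
    try:
        count = 0
        for result in job.results:
            handle(result)
            count += 1
        return count
    finally:
        job.cancel()


def print_result(result):
    print(result)


def main():
    try:
        # These imports only resolve inside the Splunk environment.
        import splunk.auth
        import splunk.search
    except ImportError:
        print('splunk modules not found; run this with "splunk cmd python"')
        return 1
    # Replace with your credentials; the session key is cached by the SDK.
    splunk.auth.getSessionKey('admin', 'changeme')
    run_search(splunk.search, 'search error | timechart span=1h count',
               print_result, namespace='search', earliest_time='-24h')
    return 0


if __name__ == '__main__':
    main()
```

Results are handled before the job is cancelled, so nothing is read from a job that has already been marked for removal.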

Installing the Python SDK on a server that doesn't have Splunk

Currently, there isn't a packaging script to easily bring the necessary Splunk Python components to a standalone machine. However, those who are familiar with Python can manually set up such an environment.

Splunk Python modules:

$SPLUNK_HOME/lib/python2.6/python-site/splunk/

Python dependencies:

1) python 2.5+
2) lxml: http://codespeak.net/lxml/
3) httplib2: http://code.google.com/p/httplib2/
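
After copying the modules and installing the dependencies, a quick import check can confirm that the standalone environment is complete. This is a sketch: check_modules is a hypothetical helper, and it assumes the copied splunk directory has been placed somewhere on the Python path.

```python
def check_modules(names):
    """Map each module name to True if it imports, else False."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == '__main__':
    # 'splunk' only imports if the copied module directory is on the path.
    report = check_modules(['lxml', 'httplib2', 'splunk'])
    for name in sorted(report):
        print('%-10s %s' % (name, 'ok' if report[name] else 'MISSING'))
```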

Generating SDK documentation

You can generate documentation on the various Splunk modules by running pydoc within the Splunk environment:

$ $SPLUNK_HOME/bin/splunk cmd pydoc -p 8800

This will start a local webserver that will serve the code documentation for Splunk. Under the site-packages header there is a link for `splunk', which contains the entire SDK tree.

psanford_splunk · ‎09-28-2011

There is also a new Splunk Python SDK on GitHub. You can access it here: https://github.com/splunk/splunk-sdk-python

Any questions - psanford@splunk.com or ping us on Twitter: @splunkdev

esachs · ‎06-01-2010

According to the latest documentation, pydoc is now at:

$SPLUNK_HOME/bin/splunk cmd $SPLUNK_HOME/lib/python2.6/pydoc.py -p 8080

Glenn · ‎03-26-2010

Is pydoc part of all Splunk distributions? I'm running a 4.0.9 indexer on RHEL5.3 x86_64, as installed by the RPM, and get this when trying to generate the docs:

$ /opt/splunk/bin/splunk cmd pydoc -p 8800
couldn't run "/opt/splunk/bin/pydoc": No such file or directory

Indeed, it doesn't exist:

$ ls -l /opt/splunk/bin/p*
parsetest pcregextest python python2.6
