Getting Data In

Collecting data via a Python script, later putting it into Splunk

tleyden
Explorer

We have some customers who are running into memory issues, and we need to provide them with a script to collect several pieces of data:

  • Netstats for a particular pid (sudo netstat -apeen | grep -i app_name)

  • Application server stats, available from our application server's REST endpoint, which returns JSON

  • Overall memory stats (e.g., top output) for a particular pid

and probably a few others.

It feels like a perfect job for Splunk! But ... it also feels a bit heavyweight to tell customers to install and configure a Splunk forwarder. So I'm planning to take a "middle ground" approach:

  1. Ship them a Python script that they can run, with little or no third-party dependencies (a single script, possibly even bundled as an exe)

  2. The Python script will collect the outputs mentioned above and write them into a directory structure (see the rough sketch after this list)

  3. The customer runs the script to collect data, zips up the directory, and ships it back to us

  4. We somehow get the data into our own Splunk server to analyze it (unzip, then load it somehow)
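
Here's a rough sketch of the kind of script I have in mind (standard library only; the app name, REST URL, and output locations below are just placeholders):

    #!/usr/bin/env python3
    """Rough sketch of the collection script -- standard library only.
    APP_NAME, STATS_URL, and OUTPUT_ROOT are placeholders."""
    import os
    import subprocess
    import time
    import urllib.request

    OUTPUT_ROOT = "diagnostics"
    APP_NAME = "app_name"                        # placeholder
    STATS_URL = "http://localhost:4985/_stats"   # placeholder REST endpoint

    def write_snapshot(subdir, content):
        """Write one timestamped snapshot file under OUTPUT_ROOT/<subdir>/."""
        dirpath = os.path.join(OUTPUT_ROOT, subdir)
        if not os.path.isdir(dirpath):
            os.makedirs(dirpath)
        filename = "%d.txt" % int(time.time())
        with open(os.path.join(dirpath, filename), "wb") as f:
            f.write(content)

    def collect_once():
        # netstat output filtered to the application
        netstat = subprocess.check_output(
            "netstat -apeen | grep -i %s" % APP_NAME, shell=True)
        write_snapshot("netstat", netstat)

        # application server stats (raw JSON from the REST endpoint)
        stats = urllib.request.urlopen(STATS_URL).read()
        write_snapshot("sync-gateway", stats)

        # one batch iteration of top
        top = subprocess.check_output(["top", "-b", "-n", "1"])
        write_snapshot("top", top)

    if __name__ == "__main__":
        collect_once()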

Here are my questions:

  • For #2 above, what is the best directory/file structure to use? Something like this?

    /netstat/
        timestamp1.txt (contains raw netstat output; anything else needed?)
        timestamp2.txt

    /sync-gateway/
        timestamp1.txt (contains raw JSON, ditto)
        timestamp2.txt

    /top/
        timestamp1.txt (contains raw top output, ditto)
        timestamp2.txt

  • For #4 above, what's the easiest way to get this data into Splunk?

Also, any general guidelines on the approach would be very helpful.

1 Solution

MuS
Legend

Hi tleyden,

Basically, there is nothing to recommend for #2 - it is your script, so lay out the directory structure however you prefer. Provide the content as JSON, CSV, or key=value pairs - Splunk can handle any of those without trouble.
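
For example, a memory snapshot written as key=value pairs might look something like this (the field names here are made up purely for illustration):

    timestamp=2014-10-01T12:00:00 pid=1234 rss_kb=524288 vsz_kb=1048576 cpu_pct=12.3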

Regarding #4:
Set up a monitor in inputs.conf for some directory (http://docs.splunk.com/Documentation/Splunk/latest/Data/Configureyourinputs) and put the zips inside that directory. Splunk will unpack them and index the data.
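
A minimal monitor stanza might look like this (the path, index, and sourcetype are placeholders you would adapt):

    # inputs.conf -- monitor the directory where the customer zips get dropped
    # (path, index, and sourcetype are placeholders)
    [monitor:///opt/customer-uploads]
    disabled = false
    index = main
    sourcetype = customer_diag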

Hope this helps ...

cheers, MuS

tleyden
Explorer

Thanks, that is helpful.

    it is your script, so lay out the directory structure however you prefer. Provide the content as JSON, CSV, or key=value pairs - Splunk can handle any of those without trouble.

Since I have three different types of information (netstat, sync-gateway, top), how can I "tag" these files so that they show up in Splunk in a way that lets me say things like "show me all the netstat readings, but ignore the other stuff"?
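
I'm guessing the answer involves one monitor stanza per subdirectory, each with its own sourcetype, something along these lines (the sourcetype names are made up)?

    # purely a guess: one stanza per subdirectory, each with its own sourcetype
    [monitor:///opt/customer-uploads/netstat]
    sourcetype = customer_netstat

    [monitor:///opt/customer-uploads/sync-gateway]
    sourcetype = customer_sync_gateway

    [monitor:///opt/customer-uploads/top]
    sourcetype = customer_top

That way I could search with something like "sourcetype=customer_netstat" to see only the netstat readings?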
