Getting Data In

What are best practices for getting data in and updating it (new project)?

neilmac64
Path Finder

I have a project where I want to use a Splunk dashboard to show how some metrics can change over time. The metrics come from a device that we log in to via CLI and run a command to show some stats. I'm new to doing this from scratch with Splunk. I would appreciate any help in understanding the best way to do it.

The workflow is as follows:

  • Using a script:
    • gain access to the target device
    • run a command that gathers the required data
    • capture the output
    • parse/edit it into a compatible format
  • Ingest/import into Splunk
  • Display in a dashboard
  • Update metrics every x minutes (provisionally 15 or 30)

Assuming we have a script that can gather the data, the questions I have are:

1) What's the best format to have the data in for Splunk?

2) What's the best way to get the data into Splunk?

3) How do I automate this process?

Thanks for any help and guidance.


neilmac64
Path Finder

I'm running a single instance in Docker. Can you point me to any resources on scripting?

richgalloway
SplunkTrust

I don't have links to resources on scripting in general.  I presumed from the question that writing a script was something you were already prepared to do,  but just didn't know how to interface it with Splunk.

I suggest writing the script in Python because Splunk ships with a Python interpreter.  You can use other languages, but you'll have to install any libraries required by that language.
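
For illustration, here is a minimal sketch of such a collector in Python. It is a sketch only: the hostname, credentials, and CLI command are placeholders, and it assumes the paramiko SSH library, which is not bundled with Splunk's Python and would need to be installed (or swapped for another SSH approach).

    # collect_stats.py - sketch: log in to a device over SSH and print its output
    import sys
    import paramiko  # assumption: third-party SSH library, installed separately

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("device.example.com", username="monitor", password="secret")  # placeholders
    _, stdout, _ = client.exec_command("show stats")  # placeholder CLI command
    raw = stdout.read().decode("utf-8")
    client.close()
    # A real script would parse 'raw' into key=value pairs here before printing
    sys.stdout.write(raw)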

richgalloway
SplunkTrust

1) This depends on the format in which the data is received from the source, but I prefer key=value format. Splunk has the easiest time parsing that.
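
To illustrate that format, each event would be a single line of space-separated key=value pairs, something like the following (the field names here are invented for the example):

    2024-12-01 10:15:00 host=device01 cpu_pct=42 mem_pct=73 status=ok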

2) Run the script as a scripted or modular input.  Whatever the script writes to stdout will be indexed.  Easy.

3) A scripted or modular input will automate the data collection, but I'm not aware of a way to automate the dashboard.
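
For reference, a scripted input is declared in inputs.conf. A minimal stanza might look like the following - the app name, script path, sourcetype, and index are placeholders, and interval = 900 matches the provisional 15-minute cadence mentioned above:

    [script://$SPLUNK_HOME/etc/apps/my_app/bin/collect_stats.py]
    interval = 900
    sourcetype = device_stats
    index = main
    disabled = 0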

neilmac64
Path Finder

Hi Rich. Thanks for taking the time to answer - I'm afraid this hasn't helped me.

Re your points:

1) The format is determined by the script that gathers the data. It needs to be parsed, as the raw output will not be useful to Splunk. We can parse it and then save it as a text file, a JSON file, or whatever else would be compatible/optimal for Splunk.

2) I am not sure what script you are referring to here - a separate one to get data into Splunk? I can import the data manually from a flat txt file (however, one problem is that every time I do it I have to define a new sourcetype).

3) The files can be stored on a Windows PC, so the Splunk forwarder would work, but I am not sure how to set this up. We would need to be able to add new records to the index every time the forwarder sends in data. Perhaps we should save the CLI output to a single file (so it looks like a log); currently we have a new file each time we run the CLI commands.

Thanks again for any further insights.

richgalloway
SplunkTrust

Yes, I understand the script controls the format of the data sent to Splunk. As I said in my original response, I recommend key=value format, but others (such as XML or JSON) can be used if the conversion is easier.
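
As a quick comparison, the same record in the two formats might look like this (field names invented for the example):

    host=device01 option=1 count=11044540

    {"host": "device01", "option": 1, "count": 11044540}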

The script to which I refer is your script that collects data and transforms it for ingestion by Splunk.

(BTW, one should not need to define a new sourcetype every time a file is onboarded manually, but that's another topic.)

There is no need to store any files. With your data retrieval script set up in Splunk as a scripted input, the output of the script will be indexed directly.  Each time the script runs, new data that is fetched will be added to the Splunk index.

neilmac64
Path Finder

Thanks Rich. 

The data isn't sent to Splunk - the script (probably Python, still in development) will log in to the device, then grab the output. A second script (probably) will parse it and create a new file on the local PC where the script is executed.

We can't run the script from Splunk (unless I am misunderstanding?) as the output will not be compatible with Splunk. The key/value pairs need work. For example, one of the key outputs is as follows:

keysample  = 1:11044540, 2:15414363, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0,

This is a counter of how many times keysample used option 1, option 2, etc.

For ingest, it would need to be something like:

keysample  1: 11044540

keysample  2: 15414363

etc...

So I am guessing we will create a script to parse the output and place it into a file (txt file?) - a rough sketch of such a parser follows the list below. In this file we will have:

  • timestamp (comes from console)
  • hostname (comes from console)
  • key/value pairs 
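
As a rough sketch (not a definitive implementation), a Python parser along these lines would turn the sample output above into one event per option; the timestamp, hostname, and field names are assumptions:

    # parse_stats.py - sketch: split "keysample = 1:11044540, 2:15414363, ..."
    # into one key=value event per option (field names are assumptions)
    def parse_line(raw, timestamp, hostname):
        key, _, pairs = raw.partition("=")
        events = []
        for pair in pairs.split(","):
            pair = pair.strip()
            if not pair:
                continue  # skip the empty item left by the trailing comma
            option, _, count = pair.partition(":")
            events.append(f"{timestamp} host={hostname} key={key.strip()} "
                          f"option={option.strip()} count={count.strip()}")
        return events

    raw = "keysample  = 1:11044540, 2:15414363, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0,"
    for event in parse_line(raw, "2024-12-01 10:15:00", "device01"):
        print(event)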

The question then is how to get this data into Splunk. I have been testing manual import using manually created txt files. This is not without problems (i.e. it asks for a new sourcetype each time I do the import). Are you saying a script within Splunk can do this? We can also use a forwarder, or import manually if need be.

The next challenge is to then ingest data if we run the script again so we have:

  • updated timestamp (comes from console)
  • hostname (comes from console)
  • updated key/value pairs 

Then we have datapoints that can be charted on a dashboard.

It sounds like it should be straightforward... 

It's early days, so if there is a better/easier way, I'm more than happy to reset.

richgalloway
SplunkTrust

> The data isn't sent to Splunk - the script (probably Python, still in development) will log in to the device, then grab the output. A second script (probably) will parse it and create a new file on the local PC where the script is executed.
>
> We can't run the script from Splunk (unless I am misunderstanding?) as the output will not be compatible with Splunk.

If the output is UTF-8 text then it's compatible with Splunk.  Any further refinement of the data is a bonus.

> The key/value pairs need work. For example, one of the key outputs is as follows:
>
> keysample  = 1:11044540, 2:15414363, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0,
>
> This is a counter of how many times keysample used option 1, option 2, etc.
>
> For ingest, it would need to be something like:
>
> keysample  1: 11044540
>
> keysample  2: 15414363
>
> etc...

Either of those formats can be used.  The choice will depend on how the information will be used later.

> So I am guessing we will create a script to parse the output and place it into a file (txt file?). In this file we will have:
>
>   • timestamp (comes from console)
>   • hostname (comes from console)
>   • key/value pairs

I envision a single script that grabs output from the device and converts it into a Splunk-friendly format.  

BTW, the hostname should also be a key/value pair (host=hostname) for simpler processing.

> The question then is how to get this data into Splunk.

The script that converts the data is configured as a Splunk scripted input; anything written to stdout by the script then gets into Splunk automatically.

> I have been testing manual import using manually created txt files. This is not without problems (i.e. it asks for a new sourcetype each time I do the import). Are you saying a script within Splunk can do this? We can also use a forwarder, or import manually if need be.

A scripted input will have a single sourcetype assigned to it. It will run at a scheduled interval to reach out to the device, collect more output, transform it, and index it in Splunk.

Please start a new thread about manually ingesting files.  I suspect there's a misunderstanding somewhere, but it's best handled separately.

> The next challenge is to then ingest data if we run the script again so we have:
>
>   • updated timestamp (comes from console)
>   • hostname (comes from console)
>   • updated key/value pairs
>
> Then we have datapoints that can be charted on a dashboard.

This is what Splunk does. Each time the script runs, new data is added to the index. The term "updated" is used above, but Splunk never changes data that's already stored - it only adds new data.
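
As a pointer for the dashboard step, once events like those sketched earlier are indexed, a panel could be driven by a search along these lines (the sourcetype and field names are the placeholders used earlier, not fixed names):

    sourcetype=device_stats key=keysample
    | timechart span=15m latest(count) BY option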

> It sounds like it should be straightforward...

It usually is.

> It's early days, so if there is a better/easier way, I'm more than happy to reset.

neilmac64
Path Finder

Thanks Rich.

Are you saying we could actually run the script from within Splunk?

It would then gather the data and add it to the index automatically?

Neil

richgalloway
SplunkTrust

Yes, that's exactly what I'm saying.  Which instance (if you have more than one) it runs on will depend on your architecture.

neilmac64
Path Finder

I have a single Splunk instance running in Docker for this project. Scripted inputs appear to require Splunk Cloud or Splunk Enterprise, which I don't have.

So I guess it's back to the original plan - a script that gathers the data, which will then need to be ingested into Splunk.
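
If the script ends up writing its parsed output to a log file on disk, one possibility is a file monitor input (on the instance itself or via a forwarder). A minimal inputs.conf stanza might look like this - the path, sourcetype, and index are placeholders:

    [monitor://C:\device_stats\stats.log]
    sourcetype = device_stats
    index = main
    disabled = 0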
