Splunk Dev

Script Indexed

dgadjov
Explorer

I have a script that collects data and writes it to a directory.
Splunk runs the script every hour, and the directory is continuously indexed.
After the script collects the data, it checks the data against a temp file to make sure there are no duplicate events and writes only new events into the indexed directory.

The problem is that the data is still indexed even though the file is not being written to.
Which Python commands will make Splunk index the script's data directly?
I do not have any print statements; the script only opens and reads files and writes its output to files.


sowings
Splunk Employee

The usual behavior of scripted inputs in Splunk is to index the STDOUT from the script. If you adapt your script to output only the new events to STDOUT, you shouldn't get that data duplication.
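
A minimal sketch of that suggestion, assuming each event is a single line of text and that previously seen events are tracked in a plain-text state file. The function name `emit_new_events` and the state file path are hypothetical, not part of Splunk or of the original script:

```python
import sys
from pathlib import Path


def emit_new_events(events, state_path, out=sys.stdout):
    """Write events not yet recorded in state_path to `out` and return them.

    Assumption: one event per line, and an exact string match means "duplicate".
    """
    state_file = Path(state_path)
    # Load the events emitted on earlier runs (empty set on the first run).
    seen = set(state_file.read_text().splitlines()) if state_file.exists() else set()

    new_events = []
    for event in events:
        if event not in seen:
            new_events.append(event)
            seen.add(event)  # also drops duplicates within this run

    # Only the new events go to the output stream.
    for event in new_events:
        out.write(event + "\n")
    out.flush()

    # Remember what was emitted so the next run skips it.
    if new_events:
        with state_file.open("a") as fh:
            for event in new_events:
                fh.write(event + "\n")
    return new_events
```

Called as `emit_new_events(collected_lines, "seen_events.txt")`, the script prints only unseen lines, so nothing needs to be written into a monitored directory at all.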


dgadjov
Explorer

I have two folders called 'temp' and 'data'. When the script runs, it collects some data and compares it with the contents of the 'temp' folder. If there is a difference, it builds a list of the differences and writes them to the 'data' folder. It then writes all of the collected data into 'temp' to represent the latest record. If no differences are found, nothing is written to either the 'temp' or the 'data' folder.
'temp' is just a temporary folder, but 'data' is the folder that is being continuously indexed.
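
For readers trying to follow the flow, here is a rough sketch of the workflow described above. It assumes line-based events, a single snapshot file inside 'temp', and that a "difference" means a collected line that is absent from the previous snapshot. The names `write_differences`, `latest.txt`, and the `events_*.log` pattern are made up for illustration and are not taken from the actual script:

```python
import uuid
from pathlib import Path


def write_differences(collected, temp_dir, data_dir):
    """Write lines missing from the last snapshot to a new file in data_dir.

    Returns the path of the new file, or None if nothing changed
    (in which case neither folder is touched).
    """
    temp_file = Path(temp_dir) / "latest.txt"  # hypothetical snapshot name
    previous = set(temp_file.read_text().splitlines()) if temp_file.exists() else set()

    # Assumption: a "difference" is a collected line not in the previous snapshot.
    differences = [line for line in collected if line not in previous]
    if not differences:
        return None

    Path(data_dir).mkdir(parents=True, exist_ok=True)
    temp_file.parent.mkdir(parents=True, exist_ok=True)

    # Each batch of differences goes into a brand-new file in the indexed folder.
    out_file = Path(data_dir) / f"events_{uuid.uuid4().hex}.log"
    out_file.write_text("\n".join(differences) + "\n")

    # The full collection becomes the latest record for the next comparison.
    temp_file.write_text("\n".join(collected) + "\n")
    return out_file
```

Returning the path (or `None`) makes it easy to log exactly which file, if any, each hourly run created, which helps when checking what the indexed folder actually received.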


sowings
Splunk Employee

You missed my point, which was a suggestion to re-work your script expressly to write its results to STDOUT.

In any event, let's figure out why it's grabbing your events. Does Splunk know about the location where the temp files are written? Could it be indexing those files?

When you write the new outputs, are you writing the whole thing out to a new file, or reusing an existing file?

When you say "The problem is that the data is still indexed even though the file is not being written to," which file do you mean? Can you give us some sample filenames to make the scenario easier to follow?


dgadjov
Explorer

The issue is that I don't write to STDOUT anywhere, yet the data is still being indexed.
