Because of network problems between my heavy forwarders (HFs) and my indexing tier, I have some "holes" in my data, i.e. missing events. These holes need to be fixed. My idea for this goes as follows:
index=newindex OR index=originalindex | eventstats count by raw | where count=1 | eval count=null
Generally this seems to work, but there is still one problem: my data comes from many different sources with many different source types. How can I export data together with source and sourcetype, and keep those fields when reimporting? I am also open to better solutions to my general problem.
Thanks in advance.
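The dedup idea in the query above — count occurrences of each raw event across both indexes and keep only those seen exactly once — can be sketched in plain Python; the event strings here are made up for illustration:

```python
from collections import Counter

# Hypothetical raw events from the two indexes being compared.
original_events = ["evt A", "evt B", "evt C"]
new_events = ["evt B", "evt C", "evt D"]

# Equivalent of: eventstats count by _raw | where count=1
counts = Counter(original_events + new_events)
missing = [e for e in original_events + new_events if counts[e] == 1]
print(missing)  # events present in only one of the two indexes
```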
I found a solution myself in the meantime, in particular for steps three and four:
3) export the search results as a JSON file
4.1) run the JSON file through my Python script (see below): ./script.py > /tmp/missingdata.txt
4.2) oneshot the output into the index: ./splunk add oneshot /tmp/missingdata.txt -index foo -sourcetype logimport
My Python 3 script:
#!/usr/bin/python3
import json

# Read the Splunk JSON export line by line; for each event print the raw
# text followed by its metadata, with "###" as an event separator.
with open('./export.json', 'r') as fp:
    for line in fp:
        parsedline = json.loads(line)
        print(parsedline["result"]["_raw"])
        print("HOST = " + parsedline["result"]["host"])
        print("SOURCE = " + parsedline["result"]["source"])
        print("SOURCETYPE = " + parsedline["result"]["sourcetype"])
        print("###")
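Assuming the export file contains one JSON object per line with the fields nested under "result" (the shape the script above expects), the transformation can be checked on a single synthetic line; all values here are made up:

```python
import json

# A hypothetical line from export.json.
sample = json.dumps({"result": {
    "_raw": "Jan 1 00:00:00 srv1 app: something happened",
    "host": "srv1",
    "source": "/var/log/app.log",
    "sourcetype": "syslog",
}})

# Same logic as the script: raw event, then metadata, then the separator.
parsed = json.loads(sample)
block = "\n".join([
    parsed["result"]["_raw"],
    "HOST = " + parsed["result"]["host"],
    "SOURCE = " + parsed["result"]["source"],
    "SOURCETYPE = " + parsed["result"]["sourcetype"],
    "###",
])
print(block)
```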
props.conf:
[logimport]
LINE_BREAKER = (###\n)
TRANSFORMS = importsource, importsourcetype, importhost, importraw
transforms.conf:
[importhost]
REGEX = \nHOST = (.*)
FORMAT = host::$1
DEST_KEY = MetaData:Host
WRITE_META = true
[importsource]
REGEX = \nSOURCE = (.*)
FORMAT = source::$1
DEST_KEY = MetaData:Source
[importsourcetype]
REGEX = \nSOURCETYPE = (.*)
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype
[importraw]
REGEX = ^(.*)\nHOST
FORMAT = $1
DEST_KEY = _raw
Way to go sharing your code. Come back and click Accept on your answer to close the question and let other people know there is a good answer.
Over the time range in question, run a search like this:
index="foo" AND sourcetype="bar" | stats count BY source
Then in the shell on your HF, do this:
for FILE in /path/to/files/in/question/*
do
    wc -l "${FILE}"
done
Then cross-reference these 2 lists. Whichever ones are wrong, do this in Splunk:
index="foo" AND sourcetype="bar" source="bad" | delete
Then in the shell on your HF, for each bad file, do this:
/opt/splunk/bin/splunk add oneshot /path/to/files/in/question/bad.csv -index foo -sourcetype bar -auth admin:changeme
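The cross-referencing step can also be scripted; a minimal Python sketch, with hypothetical counts, that flags any source whose indexed event count differs from its file's line count:

```python
# Hypothetical per-source event counts from the stats search, and line
# counts from the wc -l loop over the corresponding files.
splunk_counts = {"/var/log/a.log": 100, "/var/log/b.log": 80}
file_counts = {"/var/log/a.log": 100, "/var/log/b.log": 95}

# Any file whose indexed event count differs from its line count needs
# the delete-and-oneshot treatment.
bad = sorted(src for src in file_counts
             if splunk_counts.get(src, 0) != file_counts[src])
print(bad)
```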
Keep in mind that if the gaps are very large, doing this will erode your overall data retention, because delete does not really delete anything; it just hides it.
With this solution I would have to repeat the procedure manually for every sourcetype. Furthermore, the HF mostly just forwards data coming from UFs, to which I have no shell access. I am sorry, but this solution does not work for me.
How many events are we talking about here? Do remember that both step 1 and step 2 will be counted against your license.
There is a mistake in the SPL query: it should be eventstats count by _raw.