Activity Feed
- Got Karma for How to prevent duplicates in batch mode?. 09-16-2020 08:50 AM
- Karma Re: Routing to a dynamic index based on JSON field for amitm05. 06-05-2020 12:50 AM
- Karma Re: How to prevent duplicates in batch mode? for amitm05. 06-05-2020 12:50 AM
- Got Karma for How to prevent duplicates in batch mode?. 06-05-2020 12:50 AM
- Posted Re: How to prevent duplicates in batch mode? on Getting Data In. 06-17-2019 08:16 AM
- Posted Re: Routing to a dynamic index based on JSON field on Getting Data In. 06-14-2019 12:17 PM
- Posted Re: How to prevent duplicates in batch mode? on Getting Data In. 06-14-2019 12:14 PM
- Posted How to prevent duplicates in batch mode? on Getting Data In. 06-11-2019 07:16 AM
- Tagged How to prevent duplicates in batch mode? on Getting Data In. 06-11-2019 07:16 AM
- Posted Routing to a dynamic index based on JSON field on Getting Data In. 06-10-2019 11:41 AM
- Tagged Routing to a dynamic index based on JSON field on Getting Data In. 06-10-2019 11:41 AM
- Posted Re: How to get a list of the oldest $n$ events on Splunk Dev. 04-26-2019 10:58 AM
- Posted How to get a list of the oldest or newest $n$ events to delete and save disk space? on Splunk Dev. 04-23-2019 12:09 PM
- Tagged How to get a list of the oldest or newest $n$ events to delete and save disk space? on Splunk Dev. 04-23-2019 12:09 PM
06-17-2019 08:16 AM
I tried
crcSalt = ConstantString
and it made no difference. I also tried defining an initCrcLength of 1024; still no difference.
I am using batch instead of monitor because the files are one-offs, each containing a single event. Once ingested, a file is no longer needed. I will be getting tens of millions of files and don't want them to hang around in a monitor directory.
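For completeness, here is a sketch of the full stanza I am testing with; it simply combines the batch stanza from my original post with the two settings mentioned above, so treat it as an illustration rather than a working fix.
inputs.conf:
[batch:///data/metadata]
disabled = false
move_policy = sinkhole
sourcetype = meta
crcSalt = ConstantString
initCrcLength = 1024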
06-14-2019 12:17 PM
Thanks - I will try that. Any thoughts on how to use Splunk's JSON parsing instead of a REGEX?
06-14-2019 12:14 PM
I am manually copying the file into the monitor directory, watching it get ingested, then copying the same file a second time and watching it get ingested again. The Splunk UI now shows two events for the exact same file. I know this scenario will occur, so I am testing it. I thought (or was hoping) Splunk could detect duplicates via the CRC hash and skip re-ingestion. There is only one indexer.
06-11-2019 07:16 AM
2 Karma
I have a number of small files, each of which maps to a single event. Since these files aren't actively added to (one-shot deals), I have been using batch mode to ingest them.
I recently saw a problem during some testing. If I added the same file to the monitor directory a second time, the event in the file was ingested again, and the search now shows multiple instances of the same event. I thought Splunk avoided duplicate entries by hashing the file and doing a CRC check, but that does not seem to be happening. The file I re-added has the same contents and the same name; the only difference is the file's timestamp, since it was just copied into the monitor directory.
Here are the contents of my inputs.conf.
[batch:///data/metadata]
disabled = false
move_policy = sinkhole
sourcetype = meta
I have tried with and without
crcSalt = <SOURCE>
but it made no difference; the file is re-ingested every time I copy it into the batch directory, creating another duplicate event.
Is there a way to avoid re-ingestion? Failing that, can I do something in search so that everyone only sees the most recent instance of the event?
Thanks!
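Update: as a search-time workaround while I look for an ingestion-side fix, I am experimenting with dedup. Assuming the duplicate events have identical raw text, something like the following should show only the most recent copy of each event (a sketch, not tested at scale):
sourcetype=meta | dedup _raw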
06-10-2019 11:41 AM
I have JSON data that I am ingesting. I would like to route each event to an index based on one of the JSON fields. I've seen examples that use REGEX, but I want to avoid hard-coding the indexes, since I would need to update multiple config files whenever I start getting new types of data.
My JSON data includes the following section:
...
"collection": {
"date": "...",
"source": <Canada | US | Mexico>
},
...
I would like to have three separate indexes, one each for Canada, US, and Mexico, and have the index determined dynamically based on the input.
I've seen examples suggesting this is easy to do with REGEX, and I think I could do it as follows:
indexes.conf:
[index-Canada]
...
[index-US]
...
[index-Mexico]
...
props.conf:
[default]
TRUNCATE = 0
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = collection.date
TRANSFORMS-SetIndex = setIndex-Canada, setIndex-US, setIndex-Mexico
transforms.conf:
[setIndex-Canada]
REGEX = "source": "Canada"
DEST_KEY = _MetaData::Index
FORMAT = index-Canada
[setIndex-US]
REGEX = "source": "US"
DEST_KEY = _MetaData::Index
FORMAT = index-US
[setIndex-Mexico]
REGEX = "source": "Mexico"
DEST_KEY = _MetaData::Index
FORMAT = index-Mexico
I think this will work. However, I would like to avoid hard-coding a transforms.conf stanza for each index. One way would be the following:
props.conf:
[default]
TRUNCATE = 0
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = collection.date
TRANSFORMS-SetIndex = setIndex
transforms.conf:
[setIndex]
REGEX = "source": "(.*)"
DEST_KEY = _MetaData::Index
FORMAT = index-$1
I have a couple of questions about this:
If the data contains a source value for which I haven't configured an index, can I somehow set up a fallback so those events are not lost?
Can I somehow use SOURCE_KEY to read the value of the JSON field instead of matching it with REGEX? I would rather rely on Splunk's JSON parsing than on my REGEX skills to make sure I am getting the right field; if my REGEX pattern ever matched elsewhere in an event's contents, data could be routed to the wrong index.
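For reference, here are two partial ideas for those questions. Both are untested sketches based on my reading of the indexes.conf and transforms.conf documentation, and index-unknown is just a placeholder name. For the fallback, recent Splunk versions have a lastChanceIndex setting that catches events routed to a non-existent index. For reading the field directly, transforms.conf appears to accept SOURCE_KEY = field:<fieldname>, though I am not certain index-time transforms can see fields produced by INDEXED_EXTRACTIONS.
indexes.conf:
[default]
lastChanceIndex = index-unknown
[index-unknown]
...
transforms.conf:
[setIndex]
SOURCE_KEY = field:collection.source
REGEX = (.+)
DEST_KEY = _MetaData::Index
FORMAT = index-$1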
04-26-2019 10:58 AM
Thanks for the reply. I am not concerned about the disk space used by Splunk itself; the metadata ingested into Splunk is minimal, and the actual files on disk are orders of magnitude larger. So when the disk approaches full, I want to identify the oldest files and delete them. I was hoping to use Splunk to tell me which files are oldest, since the metadata is already ingested and includes the event time. However, it looks like I will have to stand up a MySQL database to hold the file timestamps so that I can get a timely answer about which files are oldest. Thanks!
04-23-2019 12:09 PM
Perhaps I am going about this the wrong way, so I am open to suggestions on how I can do this.
Basically, I've got a set of files on disk and some metadata about those files ingested by Splunk. As the number of these files grows, I would like to delete the oldest of these to keep the disk usage low. Ideally, I would like to query Splunk for the oldest events so that I can delete them on disk and then delete them from Splunk. Searching for the most recent events seems quick, but searching for the oldest seems very time-consuming. Is there a better way to do this?
Here are some ways I've thought of so far:
search index=myindex | reverse | head 100
search index=myindex | tail 100
Both are very slow, while the following (the opposite of what I want) is fast:
search index=myindex | head 100
Alternatively, I could look for the oldest 1%, but I doubt that will make anything easier. I want to avoid re-ingesting the metadata into something like SQL where I can make quick time-based queries, but I might have to do so if I can't find a way to use Splunk as the metadata master.
Thanks!
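One variation I plan to try (an untested sketch): bound the time range so the search only scans older buckets, then sort within that window. For example, to pull the 100 oldest events that are more than six months old:
search index=myindex latest=-6mon@mon | sort 100 _time
Whether this is actually faster than reverse or tail over the full index is something I still need to measure.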
04-09-2019 08:09 AM
For our solution, we need to index a number of events but delete them when they get too old; in our implementation, that is somewhere between six months and a year. Each event references a file on a file system, which also needs to be deleted when the event is deleted. The events that are indexed are JSON files.
I've implemented something that works using a 'one-shot' search, but I would like some feedback to see if there are alternatives that might work better for my use case.
I have a Python script that connects to my Splunk instance, and a cron job scheduled once a day that runs the following through the Splunk Python SDK:
import splunklib.client as client
import splunklib.results as results
import json

# Init connection to Splunk
service = client.connect(
    host=SPLUNK_HOST,
    port=SPLUNK_PORT,
    username=SPLUNK_USER,
    password=SPLUNK_PASSWD)

# Get all the events older than a year.
kwargs_searcholder = {"latest_time": "-1y"}
kwargs_deleteone = {"latest_time": "-1y"}  # same window for each per-event delete
search = "search index=custom"
search_results = service.jobs.oneshot(search, **kwargs_searcholder)
search_reader = results.ResultsReader(search_results)

for item in search_reader:
    if not isinstance(item, dict):  # skip diagnostic messages from the reader
        continue
    jsondata = json.loads(item['_raw'])
    id = jsondata['id']
    # Delete the file on the file system
    delete_file(id)
    # Delete the event in Splunk
    service.jobs.oneshot(search + " id=" + id + " | delete", **kwargs_deleteone)
I have a few questions about this.
Q1: Is there a way to do this within Splunk instead of creating a cron job? I need to do more than just run a search; I also need to delete a file on the file system.
Q2: If the search returns a lot of records, I am invoking the API once per record to perform the delete. Is there some way I can delete exactly the records returned by the search without specifying them all individually? I know I could just pipe the search through delete, but since this procedure takes a non-trivial amount of time, there will be events that are not old enough at the time of the first search but are old enough at the time of the second, and I would then be left with orphaned files on disk because their events were already deleted from Splunk. (See the sketch after these questions for the direction I am leaning.)
Q3: Is this a good candidate for a saved search?
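Following up on Q2, here is the adjustment I am considering. It is an untested sketch that builds on the script above: freeze the cutoff time once and pass the same absolute latest_time to both the search and the final delete, so events that cross the age threshold between the two calls are not deleted before their files are removed. delete_file is the same helper referenced above.
import time
import json
import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host=SPLUNK_HOST,
    port=SPLUNK_PORT,
    username=SPLUNK_USER,
    password=SPLUNK_PASSWD)

# Freeze the cutoff so the search and the delete cover exactly the same events.
cutoff = str(int(time.time()) - 365 * 24 * 60 * 60)
kwargs = {"latest_time": cutoff}

search = "search index=custom"
for item in results.ResultsReader(service.jobs.oneshot(search, **kwargs)):
    if isinstance(item, dict):
        # Remove the file referenced by this event (delete_file is my helper).
        delete_file(json.loads(item['_raw'])['id'])

# One delete call for everything older than the frozen cutoff.
service.jobs.oneshot(search + " | delete", **kwargs)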
01-23-2019 08:56 AM
Did you ever solve this? I think I am having the same problem.