I have previously indexed data uploaded to an s3 bucket.
I installed Splunk (full version) on an EC2 (RHEL7).
I (persistently) mounted the s3 bucket to the EC2 instance (with FUSE).
I can see all the data when I change to my_s3fs_mount_directory
(e.g. /my_s3fs_mount_directory/index_name/db_1234567_123456_1234/rawdata/journal.gz).
My question is how I should edit indexes.conf correctly, so that my new indexer sees this data and does not accidentally overwrite the existing data in that path.
Here is what I have so far (in /opt/splunk/etc/system/local/indexes.conf):
[myindex]
homePath = /my_s3fs_mount_directory/index_name/db
coldPath = /my_s3fs_mount_directory/index_name/colddb
thawedPath = /my_s3fs_mount_directory/index_name/thaweddb
maxDataSize = 10000
maxHotBuckets = 10
The index is visible, but no data appears in search results.
Is there anything else I need to do, or another .conf file I would also need to edit?
Any advice is appreciated.
Thank you
S3 over FUSE is S. L. O. W., as well as being a fake filesystem.
I would mount an EBS volume and copy the data from S3 to the EBS before doing anything else.
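If it helps, the copy itself is straightforward. A rough sketch, assuming the new EBS volume shows up as /dev/xvdf and the bucket is named my-bucket (both names are placeholders):

# format and mount the new EBS volume
sudo mkfs -t xfs /dev/xvdf
sudo mkdir -p /splunkdata
sudo mount /dev/xvdf /splunkdata

# pull the indexed buckets down from S3
aws s3 sync s3://my-bucket/index_name /splunkdata/index_name

# make sure the splunk user owns the copied buckets
sudo chown -R splunk:splunk /splunkdata/index_name

Then point homePath/coldPath/thawedPath at /splunkdata/index_name instead of the s3fs mount.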
Your suggestion is probably the best solution at this point.
My current scenario was a test to see if Splunk would read it, and apparently it will not (as you mentioned, s3fs is slow, object based, and not listed as supported).
For those interested I started another thread (title of question below) to see if Splunk 7.0 remotePath may be a solution.
"has anyone successful setup the remotePath option in indexes.conf in Splunk 7.0 to work with indexed data in s3?"
FYI, I was able to read a test file.txt from the /s3fs dir, but as a "data input".
I could read the file.txt via Data inputs > Files & directories > New (then select /s3fs/file.txt).
Of course this would need to be automated to input loads of files... I have not worked that out yet, but any suggestions are appreciated.
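One option might be a [monitor] stanza in inputs.conf instead of adding files one at a time in the UI. A minimal sketch, assuming the mount point is /s3fs and the target index is myindex (the sourcetype is a placeholder):

[monitor:///s3fs]
index = myindex
sourcetype = my_sourcetype
disabled = false

Note that this re-indexes the raw files as new events (like the file.txt test above), which is different from getting Splunk to search the already-indexed buckets, and monitoring over the s3fs mount will be slow for large volumes.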
I saw it!
I too am super interested in this, but as I noted, I suspect it will only be for archived data.
Is the data a 'copy' of the indexes which you uploaded to S3, or was the data frozen?
If the data was frozen, you need to copy the buckets to the thawed directory, not the hot/cold db.
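Thawing generally looks something like this, assuming the default index location under /opt/splunk/var/lib/splunk and reusing the bucket name from your example:

# copy the frozen bucket into the index's thaweddb directory
cp -r /my_s3fs_mount_directory/index_name/db_1234567_123456_1234 /opt/splunk/var/lib/splunk/index_name/thaweddb/

# rebuild the bucket's indexes/metadata so it becomes searchable
/opt/splunk/bin/splunk rebuild /opt/splunk/var/lib/splunk/index_name/thaweddb/db_1234567_123456_1234

# restart so the thawed bucket is picked up
/opt/splunk/bin/splunk restart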
It was a copy of warm and cold buckets.
One thing you want to be very careful with is making sure you get your frozenTimePeriodInSecs and maxTotalDataSizeMB correct before you point Splunk at an existing index location. If either is wrong, you risk Splunk thinking the data needs to be frozen (which really means deleted in most cases).
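So before pointing the stanza at the existing buckets, it is probably safest to set both well above what is already on disk, something like (the values are just illustrative):

[myindex]
homePath = /my_s3fs_mount_directory/index_name/db
coldPath = /my_s3fs_mount_directory/index_name/colddb
thawedPath = /my_s3fs_mount_directory/index_name/thaweddb
# ~10 years, so nothing immediately ages out (the default is 6 years)
frozenTimePeriodInSecs = 315360000
# total size cap in MB, set well above the data already in the buckets
maxTotalDataSizeMB = 1000000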
After reviewing some other posts...
It is quite possible that the S3 object-based data is just not compatible with Splunk without some custom code making it readable.
I am using an old version of Splunk (5.x).
I am thinking that I will try Splunk 7.x and see if it can read indexed data from a remote s3 location.
Please advise if you have any more insight on this. If/when I get results, I plan to share lessons learned.
Thank you
Is this a standalone Splunk instance (or are you trying to search directly from the instance that has the data mounted)?
Can you post the output of splunk btool indexes list --debug?
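If posting the whole thing is a problem, even just the effective settings for the one index would help, e.g. (assuming the stanza is named myindex):

# show the effective settings for a single index and which file each comes from
/opt/splunk/bin/splunk btool indexes list myindex --debug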
Sorry, sec-policy does not permit posting the actual output. Thanks.
This is a standalone splunk instance on RHEL7 on the EC2 AWS instance.
I created a custom index which points to the s3 path...
When I restarted after creating the indexes.conf file for this index, I got this error for /my_s3fs_mount_directory/...:

Checking indexes...
homePath '/my_s3fs_mount_directory/index_name/db' is in a filesystem that Splunk cannot use. (index=index_name)
Validating databases (splunkd validatedb) failed with code '1'.
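That message comes from Splunk's startup check of the filesystem behind homePath. You can see what type it is detecting with something like:

# show the filesystem type of the mount (s3fs typically reports fuse.s3fs)
df -T /my_s3fs_mount_directory

Based on that error, the FUSE-backed S3 mount is a filesystem type Splunk refuses to use for index storage, which is consistent with the earlier suggestion to copy the buckets onto an EBS volume first.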