Using Splunk 6.5.1, either with monitoring, indexing, and search all on a single machine, or with a dedicated forwarder feeding the indexer/search-head machine.
I've set up monitoring of a directory where a binary updates a CSV file all day long. Each line of that CSV file has 31 fields.
For the sourcetype, I'm using the built-in "csv", complemented with TIMESTAMP_FIELDS = SUBMITTIME.
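As a minimal sketch of the kind of configuration involved (the actual monitored path is not shown in the post, so the directory below is a placeholder):

```
# inputs.conf — hypothetical monitored directory
[monitor:///data/csv_updates]
sourcetype = csv

# props.conf — local override extending the built-in "csv" sourcetype
[csv]
TIMESTAMP_FIELDS = SUBMITTIME
```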
The data loaded into my index is corrupted: sometimes a line is only half-read, so only the first half of the fields is populated. The second half of the line is then treated as a new event, with the first half of its fields populated by values from the second half of the original line; for example, I see EXECHOST values in the PROJECT field.
I cannot find any warning of interest in splunkd.log, apart maybe from:
07-06-2017 11:36:53.585 -0700 INFO WatchedFile - Resetting fd to re-extract header.
Hello Splunk Gurus,
I've come up with a plan to back up my Splunk v6.6.1 indexes that I'd like reviewed.
I don't have a big Splunk install: just 3 physical indexers, each with 8 TB of local disk, configured in an indexer cluster with a VM for the index master.
My data rates are not that big, but I need to preserve hundreds of indexes against possible corruption for several years.
Replication Factor protects me against hardware failure, but not against operator error: somebody deletes an index, or uploads data into an index where it was not supposed to go. So I need to keep backup copies of my indexes, so that I can restore an index as it was, say, one week ago.
Hot buckets are not much of a backup problem for my application since I don't have high data-input rates: I can always force a roll from hot to warm before the once-a-day backup, or even ignore/lose the hot bucket altogether.
I've set maxHotBuckets = 1 to ensure that I lose at most one hot bucket.
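For the pre-backup roll, there is a REST endpoint to force a hot-to-warm roll per index. A sketch (host, credentials, and index name are placeholders; verify the endpoint against your Splunk version):

```
# Force the hot buckets of one index to roll to warm before the backup run
curl -k -u admin:changeme -X POST \
  https://indexsvc1:8089/services/data/indexes/example/roll-hot-buckets
```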
That way, I avoid having to set up a snapshot-capable filesystem such as ZFS on my indexers; I just use regular filesystems.
So my indexes' primary copies are going to be split across 3 different machines on their local disks. Also, since I run with Replication Factor = 3 and Search Factor = 2, I will have multiple copies of the data (db_* and rb_* buckets) mixed together in the same directories. Writing a specialized backup script sounds complex (re-assemble a full copy of all the primary buckets of an index in one place?) and fragile (how does the script get updated when I add one more indexer to my cluster?), so I'm ruling out a custom backup script.
Instead, I am relying on Splunk's internal replication mechanism to assemble that copy of all primary buckets in one place.
Instead of a simple single-site cluster, I've added another VM with some NFS disks (NFS disks can be backed up easily). My indexer cluster is configured as a multisite indexer cluster:
site1 / MAIN:
indexmaster: VM, RF=3, SF=2
indexsvc1: Physical with 8-TB local disks
indexsvc2: Physical with 8-TB local disks
indexsvc3: Physical with 8-TB local disks
site2 / BKP:
indexbkp: VM, single machine indexer with 2 x 6-TB NFS partitions (backup1 & backup2) attached
and in the server.conf of the multisite cluster master, I add the following to the [clustering] stanza:
[clustering]
replication_factor = 3
search_factor = 2
available_sites = site1,site2
multisite = true
site_replication_factor = origin:3,total:4
site_search_factor = origin:2,total:2
Each index is defined with explicit paths in indexes.conf like:
summaryHomePath = volume:example/summary/
tsidxStatsHomePath = volume:example/tsidx_stats/
tstatsHomePath = volume:example/datamodel_summary/
homePath = volume:example/hotwarm/
coldPath = volume:example/cold/
coldToFrozenDir = $SPLUNK_DB/backup1/example/frozen/
thawedPath = $SPLUNK_DB/backup1/example/thawed/
On the site1/main indexers, those backup1 & backup2 paths correspond to plain directories under $SPLUNK_DB.
On the site2/bkp indexer, backup1 & backup2 correspond to the two NFS partitions, enabling me to spread my indexes across the different NFS partitions for backup.
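For completeness, the volume:example referenced in the index paths above also has to be defined in indexes.conf; a sketch with a placeholder path:

```
# indexes.conf — hypothetical definition of the volume used above
[volume:example]
path = /opt/splunk/var/lib/splunk/example
```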
The search heads use site-affinity to just use site1's indexer cluster.
This setup gives me a multisite cluster with two sites: one with performance & capacity, and one which is used only for replicating a single copy of all the indexes.
How does that sound as a backup mechanism?
Thank you Steve G. for your answer.
I found that there is no documented way to set up a multisite indexer cluster with some indexes replicated on all sites and some restricted to a particular site.
Having an index on a single peer is supported though ( http://docs.splunk.com/Documentation/Splunk/6.6.1/Indexer/Managesinglepeerconfigurations#Add_an_index_to_a_single_peer )
So, to get a one-site-only index, what I ended up doing is:
Create a new file on the index master, etc/master-apps/_cluster/local/site1-big-indexes.conf, which is part of the configuration bundle.
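The post does not show the file's contents; as a sketch, it would hold ordinary indexes.conf stanzas for the site-local indexes (the index name and paths below are placeholders):

```
# site1-big-indexes.conf — hypothetical contents
[big_index]
homePath   = $SPLUNK_DB/big_index/db
coldPath   = $SPLUNK_DB/big_index/colddb
thawedPath = $SPLUNK_DB/big_index/thaweddb
```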
Then, I can use the usual:
splunk apply configuration-bundle
which ensures that the new revision of that file makes it to all indexers (or none).
I do get a warning:
[Not Critical]No spec file for: /indexmaster/etc/master-apps/_cluster/local/site1-big-indexes.conf
Then I went to every indexer on that particular "big" site and added a symbolic link:
cd etc/system/local; ln -sf ../../slave-apps/_cluster/local/site1-big-indexes.conf indexes.conf
The only thing which is not automatically taken care of is the rolling restart when I update the site-specific site1-big-indexes.conf.
I just manually issue a:
splunk rolling-restart cluster-peers
This seems to work: I now have some indexes replicated on all sites and some replicated on one site only.
Hello Splunk Gurus,
I have a multisite indexing cluster in Splunk 6.6.1 spanning two sites: small & big.
The "big/site1" site is configured with RF=3/SF=2.
Due to having much less disk, the "small/site2" site is configured with RF=1/SF=1.
Is there a way to define an index that would be replicated locally on "big/site1" with RF=3/SF=2, but would not be sent to "small/site2" at all?
Would changing the per-index definition from "repFactor=auto" to "repFactor=3" deliver what I am looking for (replicated, but only within the originating site)?
Could I achieve this by abandoning the index master for distributing the indexes.conf file, and instead managing by hand the copy/edit of the various index files and the rolling restart of the indexers?
Sorry if this has already been asked. It should be a common question, but I've not been able to find an answer by searching...
We would like to switch to using Splunk to serve hundreds of dashboards. The built-in Splunk web interface is great for a single person authoring their first dashboard, but it does not seem meant for team development: I don't see any change history being preserved. A new version of a dashboard overwrites the old one, and there is no record of who changed what.
So, how do teams author dashboards together?
I've used the 'vi' editor to directly edit files in splunk/etc/apps/search/local/data/ui/views, but then it requires human intervention to visit http://splunk:8000/debug/refresh to pick up the newly edited file. Is there a Unix-side command-line equivalent?
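One candidate, offered as a sketch rather than a verified answer: the debug/refresh page works by invoking the _reload action on REST configuration endpoints, and the same action can be hit from the command line over the management port (host, port, and credentials below are placeholders):

```
# Ask splunkd to re-read the view XML files from disk
curl -k -u admin:changeme \
  https://splunk:8089/servicesNS/admin/search/data/ui/views/_reload
```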
When you put those dashboards under version control, how do you set up your Splunk instances? You'll need a production one where all the users see the production version of the dashboards, maybe a staging one for testing dashboards before production, and then one or more for each of your dozen developers so they can develop in an isolated way. Any best practices? Any documents on this?