I have a data source on the local file system laid out like this:
/data/splunk/rrsearch/server-01/processed.1341878400.gz
/data/splunk/rrsearch/server-01/processed.1341964800.gz
/data/splunk/rrsearch/server-02/processed.1341878400.gz
/data/splunk/rrsearch/server-02/processed.1341964800.gz
/data/splunk/rrsearch/server-03/processed.1341878400.gz
/data/splunk/rrsearch/server-03/processed.1341964800.gz
...etc...
The Data Inputs - Files & Directories screen shows 620 files.
The problem is that none of this data ever seems to get indexed, although other data under the /data/splunk path does get indexed for other projects. I feel I'm missing one small step; can anyone throw me a bone?
Per @Lamar's request, here is my inputs.conf:
[default]
host = wsi-hub

[monitor:///data/splunk/remote]
host_segment = 4
sourcetype = syslog
blacklist = .*.gz
disabled = 0
host =

[monitor://$SPLUNK_HOME/var/log/splunk]
blacklist = *.gz
disabled = false

[monitor:///data/logs/rrsearch]
disabled = false
followTail = 0
host =
host_regex =
index = baseline_search
whitelist = .+processed.+gz$
sourcetype = Baseline Search
host_segment = 4

Index Name: baseline_search
Max Size: 500,000
Frozen Archive: None
Current Size: 3,807
Event Count: 54,237,503
Earliest Event: May 13, 2012 7:59:59 PM
Latest Event: Jul 30, 2012 7:59:59 PM
Home Path: /opt/splunk/var/lib/splunk/baseline_search/db
App: search
Without being able to see your actual input configuration, I'll take a guess: make sure you're searching with index=baseline_search, unless you've set your default indexes to include that one.
Include your inputs.conf and we may be able to get a bit further.
I'll give a nod to Lamar's answer, but I also notice that your whitelist doesn't match the filenames... You have
Which should be
First, I would clean up your input for the processed files.
There are a few issues with it:
First, the monitor stanza won't pick up the data, because the directory it monitors (/data/logs/rrsearch) is not where the files actually live (/data/splunk/rrsearch).
Additionally, I would wildcard the fourth path segment (the per-server directory) in the monitor path, so that host_segment = 4 takes the host name from it.
Lastly, I wouldn't put spaces in the sourcetype name, as Splunk doesn't respond well to spaces in sourcetypes.
[monitor:///data/splunk/rrsearch/*/]
disabled = false
index = baseline_search
whitelist = .+processed.+gz$
sourcetype = Baseline_Search
host_segment = 4
That should get you a little closer to where you want to be.
Hope it helps.
Thanks for the response. The data seems to at least be indexing now (updated in the body above), but it never appears on the Search page. Currently the only "Source type" listed is syslog, though there are seven other enabled data sources with files. Perhaps I am missing some step to get the other source types to appear in search?
Yeah, you'll probably want to enable this index 'baseline_search' to be searched by default by your user/role.
In particular, these two parameters:
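The two parameters themselves are not shown in this thread. A likely reading, and it is only an assumption, is the role-level settings srchIndexesAllowed and srchIndexesDefault in authorize.conf. A minimal sketch, using a hypothetical role name:

```ini
# authorize.conf (sketch only; the role name "baseline_analyst" is hypothetical)
[role_baseline_analyst]
# Indexes this role is permitted to search
srchIndexesAllowed = baseline_search
# Indexes searched when the query does not specify index=
srchIndexesDefault = baseline_search
```

With something like this, a user holding only that role should search baseline_search by default without typing index=baseline_search. Keep in mind that a user with several roles gets the combined index access of all of them.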
Adding a new role worked great!
Splunk will be used by manager/marketing types to build reports and such. I wanted to keep my search engine data as segregated as possible from any syslog data. The search engine data is scrubbed to disassociate individual IPs from their searches, but some of the data in syslog may contain individually identifiable information, which they are strictly forbidden from viewing.
I can view the data because I have ethical standards 🙂
Thanks a lot for taking time to help me with this.