Good Data Input .. No Indexing

Explorer

I have a data source on the local file system. The files are laid out as shown here, with the input configuration listed after them:

/data/splunk/rrsearch/server-01/processed.1341878400.gz
/data/splunk/rrsearch/server-01/processed.1341964800.gz
/data/splunk/rrsearch/server-02/processed.1341878400.gz
/data/splunk/rrsearch/server-02/processed.1341964800.gz
/data/splunk/rrsearch/server-03/processed.1341878400.gz
/data/splunk/rrsearch/server-03/processed.1341964800.gz
...etc...
  • Path: /data/logs/rrsearch
  • Set Host: Segment on Path / 4
  • Source type: Manual / Baseline Search
  • Index: baseline_search
  • Whitelist: .+processed.+gz$
  • Blacklist: left empty

The Data Inputs - Files & Directories screen shows 620 files.

The problem is that none of the data ever seems to get indexed, although other data under the /data/splunk path does get indexed for other projects. I feel I'm missing one small step. Can anyone throw me a bone?

Per @Lamar's request, here is inputs.conf:

[default]
host = wsi-hub

[monitor:///data/splunk/remote]
host_segment = 4
sourcetype = syslog
blacklist = .*.gz
disabled = 0
host = 

[monitor://$SPLUNK_HOME/var/log/splunk]
blacklist = *.gz
disabled = false

[monitor:///data/logs/rrsearch]
disabled = false
followTail = 0
host = 
host_regex = 
index = baseline_search
whitelist = .+processed.+gz$
sourcetype = Baseline Search
host_segment = 4

In indexes:

Index Name: baseline_search
Max Size: 500,000
Frozen Archive: None 
Current Size: 3,807
Event Count: 54,237,503
Earliest Event: May 13, 2012 7:59:59 PM
Latest Event: Jul 30, 2012 7:59:59 PM
Home Path: /opt/splunk/var/lib/splunk/baseline_search/db
App: search

Re: Good Data Input .. No Indexing

Splunk Employee

Without being able to see your actual input configuration, I'll take a guess: make sure you're searching on index=baseline_search, unless you've set your default indexes to include that one.

Include your inputs.conf and we may be able to get a bit further.

Re: Good Data Input .. No Indexing

Explorer

I finally got sudo access on the server and have updated the question.

Re: Good Data Input .. No Indexing

Legend

I'll give a nod to Lamar's answer, but I also notice that your whitelist doesn't match the filenames... You have

Whitelist: .+processed.+gz$

Which should be

Whitelist: .+parsed.+gz$

Re: Good Data Input .. No Indexing

Explorer

I put in the file names incorrectly. DOH

Re: Good Data Input .. No Indexing

Splunk Employee

I would start by cleaning up your input for the processed files.

There are a few issues with it:
First, the monitor stanza won't pick up the data because it points at /data/logs/rrsearch, while the files you listed live under /data/splunk/rrsearch.
Additionally, I would spell out the fourth path segment (the per-server directory) in the monitor path, so that host_segment = 4 lands on the server name.
Lastly, I wouldn't put spaces in the sourcetype, as Splunk doesn't respond well to spaces in sourcetypes.

Fixes below:

[monitor:///data/splunk/rrsearch/*/]
disabled = false
index = baseline_search
whitelist = .+processed.+gz$
sourcetype = Baseline_Search
host_segment = 4

That should get you a little closer to where you want to be.

Hope it helps.
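
For anyone who wants to sanity-check a whitelist and host_segment value before restarting Splunk, here is a rough Python sketch. It is not Splunk code. It assumes the whitelist is applied as an unanchored regular expression against the full file path, and that host_segment counts path segments from 1 after the leading slash. The helper names `host_from_segment` and `is_whitelisted` are made up for illustration; the sample paths are copied from the question.

```python
import re

# Sample paths copied from the question.
PATHS = [
    "/data/splunk/rrsearch/server-01/processed.1341878400.gz",
    "/data/splunk/rrsearch/server-02/processed.1341964800.gz",
]

# The whitelist pattern from the monitor stanza.
WHITELIST = re.compile(r".+processed.+gz$")


def host_from_segment(path, segment):
    """Return the 1-based path segment (assumed host_segment behaviour)."""
    parts = [p for p in path.split("/") if p]
    return parts[segment - 1]


def is_whitelisted(path):
    """Check the full path against the whitelist (assumed unanchored search)."""
    return WHITELIST.search(path) is not None


for path in PATHS:
    print(host_from_segment(path, 4), is_whitelisted(path))
```

With the files under /data/splunk/rrsearch/<server>/, the fourth segment is the server directory, which is what the stanza above is aiming for.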

Re: Good Data Input .. No Indexing

Explorer

Thanks for the response. The data seems to at least be indexing now (updated in the body above); it just never appears on the Search page. Currently the only "Source type" is syslog, though there are seven other enabled data sources with files. Perhaps I am missing some step to get other source types to appear in the search?

Re: Good Data Input .. No Indexing

Splunk Employee

Yeah, you'll probably want to enable this index 'baseline_search' to be searched by default by your user/role.

http://docs.splunk.com/Documentation/Splunk/4.3.3/Admin/Addandeditroles

In particular, these two parameters:

srchIndexesDefault
srchIndexesAllowed
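
To show roughly what those two parameters look like in a role definition, here is a small Python sketch that renders an authorize.conf-style stanza as text. The role name `baseline_reports` and the helper `render_role_stanza` are hypothetical, and the sketch assumes the usual `[role_<name>]` stanza form with semicolon-separated index lists; check the linked documentation for your version before copying anything.

```python
def render_role_stanza(role, indexes):
    """Build an authorize.conf-style stanza as text (role name is hypothetical)."""
    joined = ";".join(indexes)  # assumed separator for multiple indexes
    lines = [
        f"[role_{role}]",
        f"srchIndexesAllowed = {joined}",  # indexes the role may search
        f"srchIndexesDefault = {joined}",  # indexes searched when none is named
    ]
    return "\n".join(lines)


print(render_role_stanza("baseline_reports", ["baseline_search"]))
```

Until the index is in the role's default list, a search has to name it explicitly with index=baseline_search.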

Re: Good Data Input .. No Indexing

Splunk Employee

I would be curious why you decided to segment this data off from your syslog data.

Again, just curious.

Re: Good Data Input .. No Indexing

Explorer

Adding a new role worked great!

Splunk will be used by manager/marketing types making reports and such. I wanted to make my search engine data as segregated as possible from any syslog data. The search engine data is scrubbed to disassociate individual IPs from their searches. Some of the data in syslog may contain individually identifiable information which they are strictly forbidden from viewing.

I can view the data because I have ethical standards 🙂

Thanks a lot for taking time to help me with this.
