I've struggled with this issue for the past few days and I can't seem to resolve it.
I've checked and rechecked my config.
None of my data gets indexed when my application is copied (with OS path modifications) to my Linux testing environment.
The development instance of Splunk is on my laptop (Windows 7 64-bit, Splunk 4.3.1) and I'm transferring the completed app to my test environment (Linux RHEL 5.5, Splunk 4.2.5 (build 113966)).
The configuration below is what Splunk sees on both my Linux box and my Windows machine.
The only difference is the monitor line. I've tried different variations of the monitor line path (e.g. *, /, etc.).
i.e. on Windows it's
Linux config below.
    splunk cmd btool --app=rca inputs list
    [monitor:///opt/input_data/rca/*]
    crcSalt = <SOURCE>
    disabled = false
    followTail = 0
    host = rca
    index = rca
    sourcetype = rca

    splunk cmd btool --app=rca props list
    [rca]
    NO_BINARY_CHECK = 1
    REPORT-rca_log-extractions = rca_log-extractions
    SHOULD_LINEMERGE = true
    TIME_FORMAT = %Y:%m:%d:%H:%M:%S
    TIME_PREFIX = ^A5,
    TRANSFORMS-rca_log_NoHeader = rca_log-NoHeader
    pulldown_type = 1

    splunk cmd btool --app=rca transforms list
    [rca_log-NoHeader]
    DEST_KEY = queue
    FORMAT = nullQueue
    REGEX = [^A5]
    [rca_log-extractions]
    DELIMS = ","
    FIELDS = "start_Character","other fields,"IVR_end_char"
A couple of lines of sanitised sample data
I have used the tailing tool and watched the files being consumed.
I have also used a batch input + sinkhole approach on the same files and watched them be processed and disappear.
It's as if they get processed but are then not stored in the configured index ("rca").
Is there anything in this config that would not work with the older version of Splunk (4.2.5)?
Or are there any tools that would let me see where (if anywhere) these processed files are going?
edit: Added follow ups for the suggestions given
>MUS: does the index rca exists on your testing box and are you search that index by default?
Yes and no. I have to manually select it using an "index=rca" prefix for all my searches.
But I have checked the index size/number of objects via the manager. The rca index is totally empty.
rca 500,000 None 1 0 N/A N/A /opt/splunk/var/lib/splunk/rca/db system
>Kristian ">Go to Manager -> Access Controls -> Roles -> your_role and look at the bottom of the page. Verify that you have read access to the rca index (or All Non-internal indexes) "
Yes. I am logged in as "admin" and it has "all non-internal indexes" under my role.
>Kristian ">Another thing to check is to see if your timestamps are being parsed correctly. Run the following search (yes it starts with a pipe);"
This search returns zero results. The rca index has a zero event count in the manager\indexes interface.
>AYN "This blog post discusses and links to an excellent tool for checking the status of inputs: "
I ran the tool. Interestingly, it shows that it reads one of the files and seems to get stuck on it (i.e. it stays at 14.61% forever).
Output is below.
    For full status, visit:
      https://127.0.0.1:8089/services/admin/inputstatus/TailingProcessor:FileStatus
    Updated: Wed Jun 13 22:27:06 2012 (took 0.0 sec)
    Have seen 4 dirs. (+0)
    Finished with 22 tracked files. (+0)
    Currently reading 7 files.
    some open files (showing up to 5):
      /opt/splunk/var/log/splunk/metrics.log (100%)
      /opt/splunk/var/log/splunk/web_service.log (100%)
      /opt/splunk/var/log/splunk/btool.log (100%)
      /opt/splunk/var/log/splunk/splunkd_access.log (100%)
      /opt/input_data/rca/ain.321.2012053112_16 (14.61%)
    Ignoring 3 items.
    some of these files (showing up to 5 per type):
    batch processing:
      /opt/splunk/var/log/splunk/metrics.log.1
      /opt/splunk/var/log/splunk/splunkd.log.1
      /opt/splunk/var/log/splunk/splunkd.log.2
I renamed the source file to _17, yet the script didn't show that anything had changed.
So I restarted Splunk and ran it again.
    Currently reading 1 files.
    some open files (showing up to 5):
      /opt/input_data/rca/ain.321.2012053112_17 (56.35%)
Now it sees the renamed file and has started parsing it (because of the crcSalt setting), but it still doesn't reach 100%. The file is only about 10 MB, so it shouldn't take long at all to process. I'm not sure whether this is just a script bug or whether it indicates that the file is only partially being read (the file is currently static and doesn't change, as it's just sample data).
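A toy model of why the rename was picked up may help here. As far as I understand the inputs.conf documentation, Splunk identifies a monitored file by a checksum over the beginning of the file, and crcSalt = <SOURCE> mixes the full source path into that checksum. The sketch below is not Splunk's implementation: the 256-byte window, the use of zlib.crc32, the file contents and the function name `file_key` are all assumptions made for illustration.

```python
import zlib


def file_key(head: bytes, source_path: str, salt_with_source: bool) -> int:
    """Toy fingerprint: CRC32 of the first 256 bytes, optionally salted with the path."""
    data = head[:256]  # assumed window size, for illustration only
    if salt_with_source:
        # Rough stand-in for crcSalt = <SOURCE>: the path becomes part of the checksum.
        data = source_path.encode("utf-8") + data
    return zlib.crc32(data)


# Hypothetical file contents; the real sample data is not shown in this post.
head = b"A5,2012:05:31:12:00:00,example\n" * 20
old_path = "/opt/input_data/rca/ain.321.2012053112_16"
new_path = "/opt/input_data/rca/ain.321.2012053112_17"

# Without a salt the renamed file looks identical to the one already seen.
same_without_salt = file_key(head, old_path, False) == file_key(head, new_path, False)
# With the path as salt the renamed file gets a new fingerprint and is read again.
same_with_salt = file_key(head, old_path, True) == file_key(head, new_path, True)

print(same_without_salt, same_with_salt)  # True False
```

In this model a rename alone changes the fingerprint, which matches what the tool showed after the restart.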
>Kristian "Another thing to fix is to use underscores instead of dashes in REPORT and TRANSFORMS statements.
If your log is single line events, you should set SHOULD_LINEMERGE=false
Not too sure about your nullQueue REGEX."
Done, done, and removed. I restarted the Splunk instance to pick up the new options.
The nullQueue entry was there to remove the first line in the log file. It is a bogus line in the logs that isn't required.
To cut back on the possible causes of this data not being indexed, I've removed the nullQueue stanza so that it can't be a factor.
If I scrap all my configs and use the GUI (Manager / Data inputs) and just select csv as the sourcetype, the data will index (totally incorrectly, however).
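One thing worth checking about REGEX = [^A5]: inside square brackets the ^ negates the character class instead of anchoring to the start of the line, so the pattern matches any event that contains a character other than "A" or "5". The sketch below uses Python's re module rather than Splunk's PCRE engine (these two constructs behave the same in both), and both the sample lines and the alternative pattern are hypothetical.

```python
import re

samples = [
    "A5,2012:05:31:12:16:00,other,fields",  # hypothetical data line
    "some bogus header line",               # hypothetical header line
]

# The REGEX from the [rca_log-NoHeader] stanza: a negated character class.
char_class = re.compile(r"[^A5]")
# Hypothetical alternative: match only lines that do NOT start with "A5,".
not_prefixed = re.compile(r"^(?!A5,)")

char_class_hits = [bool(char_class.search(s)) for s in samples]
not_prefixed_hits = [bool(not_prefixed.search(s)) for s in samples]

print(char_class_hits)    # [True, True]  -> both lines match
print(not_prefixed_hits)  # [False, True] -> only the header matches
```

If the transform sends every matching event to the nullQueue, the first pattern would discard the data lines along with the header, which might be related to the empty index. I have not confirmed that this was the cause.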
edit 4: Finally got it working!
I still have no idea what the root issue was, but I'm tipping it was in my props.conf file (can it only have one underscore?).
My final working config.
    inputs.conf
    ::::::::::::::
    [monitor:///opt/input_data/rca/*]
    disabled = false
    followTail = 0
    index = rca
    host = rca
    sourcetype = rca
    crcSalt = <SOURCE>
    ::::::::::::::
    props.conf
    ::::::::::::::
    [rca]
    NO_BINARY_CHECK = 1
    SHOULD_LINEMERGE = false
    TIME_FORMAT =%Y:%m:%d:%H:%M:%S
    TIME_PREFIX =^A5,
    pulldown_type = 1
    REPORT-rca_extractions = rca_extractions
    ::::::::::::::
    transforms.conf
    ::::::::::::::
    [rca_extractions]
    DELIMS=","
    FIELDS="blah1","blah2"
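For anyone unfamiliar with DELIMS/FIELDS, the [rca_extractions] stanza roughly amounts to splitting each event on commas and naming the pieces in order. This is a minimal sketch of that idea, not Splunk's implementation; the function name `extract` and the sample event are hypothetical, and quoting rules are ignored.

```python
FIELDS = ["blah1", "blah2"]  # placeholder names from the transforms.conf above
DELIM = ","


def extract(event: str) -> dict:
    """Split the event on the delimiter and pair each value with a field name."""
    values = event.split(DELIM)
    # zip stops at the shorter sequence, so extra values are simply ignored here.
    return dict(zip(FIELDS, values))


fields = extract("A5,2012:05:31:12:16:00")  # hypothetical event
print(fields)  # {'blah1': 'A5', 'blah2': '2012:05:31:12:16:00'}
```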
As MuS says - have you checked that the index exists, is writable by the splunk process (i.e. file system rights) and is searched by default (or you can search for it with
Go to Manager -> Access Controls -> Roles -> your_role and look at the bottom of the page. Verify that you have read access to the rca index (or All Non-internal indexes).
Another thing to check is to see if your timestamps are being parsed correctly. Run the following search (yes it starts with a pipe);
| metadata type=hosts index=rca | eval firstTime = strftime(firstTime, "%Y-%m-%d %H:%M:%S")| eval lastTime = strftime(lastTime, "%Y-%m-%d %H:%M:%S")| eval recentTime = strftime(recentTime, "%Y-%m-%d %H:%M:%S")
It should list the number of events in the rca index, along with the time information for the events.
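If the times look wrong, the format string can also be sanity-checked outside Splunk. The sketch below is only an approximation: it assumes the timestamp follows the "A5," prefix immediately, uses a made-up sample line (the real data isn't shown), and relies on Python's strptime, which happens to accept the same directives used in this TIME_FORMAT. The function name `parse_event_time` is hypothetical.

```python
from datetime import datetime

TIME_PREFIX = "A5,"                # literal text matched by TIME_PREFIX = ^A5,
TIME_FORMAT = "%Y:%m:%d:%H:%M:%S"  # TIME_FORMAT from props.conf


def parse_event_time(line: str) -> datetime:
    """Parse the timestamp that follows the prefix at the start of the line."""
    if not line.startswith(TIME_PREFIX):
        raise ValueError("line does not start with the expected prefix")
    rest = line[len(TIME_PREFIX):]
    stamp = rest[:19]  # "YYYY:MM:DD:HH:MM:SS" is 19 characters long
    return datetime.strptime(stamp, TIME_FORMAT)


sample = "A5,2012:05:31:12:16:00,other,fields"  # hypothetical event
parsed = parse_event_time(sample)
print(parsed.isoformat())  # 2012-05-31T12:16:00
```

A line without the prefix raises ValueError here; Splunk handles events it cannot timestamp differently, so treat this purely as a check of the format string.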
Not too sure about your nullQueue REGEX. If you want to keep all events that start with 'A5' and throw away the rest you should look at the following section of the docs:
Another thing to fix is to use underscores instead of dashes in REPORT and TRANSFORMS statements.
Instead of:
REPORT-blaha-blaha = asdf-asdf
use:
REPORT-qwer_qwer = jkl_jkl
If your log is single line events, you should set SHOULD_LINEMERGE=false
Hope this helps,