Data not being indexed

Lucas_K
Motivator

I've struggled with this issue for the past few days and I can't seem to resolve it.

I've checked and rechecked my config.

None of my data gets indexed when my application is copied (with OS path modifications) to my Linux testing environment.

The development instance of Splunk on my laptop is Splunk 4.3.1 on Windows 7 64-bit, and I'm transferring the completed app to my test environment (Linux RHEL 5.5, Splunk 4.2.5 build 113966).

I have the following configuration as seen by Splunk on my Linux box and Windows machine.
The only difference is the monitor line. I've tried different variations of the monitor line path (e.g. *, /, etc.).

i.e. on Windows it's

[monitor://c:\opt\input_data\rca\*]

Linux config below.

splunk cmd btool --app=rca inputs list
[monitor:///opt/input_data/rca/*]
crcSalt = <SOURCE>
disabled = false
followTail = 0
host = rca
index = rca
sourcetype = rca

splunk cmd btool --app=rca props list
[rca]
NO_BINARY_CHECK = 1
REPORT-rca_log-extractions = rca_log-extractions
SHOULD_LINEMERGE = true
TIME_FORMAT = %Y:%m:%d:%H:%M:%S
TIME_PREFIX = ^A5,
TRANSFORMS-rca_log_NoHeader = rca_log-NoHeader
pulldown_type = 1

splunk cmd btool --app=rca transforms list
[rca_log-NoHeader]
DEST_KEY = queue
FORMAT = nullQueue
REGEX = [^A5]
[rca_log-extractions]
DELIMS = ","
FIELDS = "start_Character","other fields,"IVR_end_char"

A couple of lines of sanitised sample data

A5,2012:05:31:11:53:36,AAAA,5,6,1111,1111,2012:05:31:11:50:58,1111,1111,1111,1111,,1111,111,100,111,11111111,,,11111111,0.00,,0.000,0.00,0.00,0,,1111,1,1,DA1111,31113,1111,11,1111,111111111,"DETAILS",2,DA11111,111111,1111,11,11,1111,,0,,111,1111,1111,,5A
A5,2012:05:31:11:57:58,AAAB,5,6,1111,1111,2012:05:31:11:51:58,1111,1111,1111,1111,,1111,111,100,111,11111111,,,11111111,0.00,,0.000,0.00,0.00,0,,1111,1,1,DA1111,31113,1111,11,1111,111111111,"DETAILS",2,DA11111,111111,1111,11,11,1111,,0,,111,1111,1111,,5A

I have used the tailing tool and watched the files be consumed.

I have also used a batch input+sinkhole approach on the same files and watched them be processed and disappear.

It's as if they get processed but are then not stored in the configured index ("rca").

Is there anything in this config that would not work with the older version of Splunk (4.2.5)?

Or are there any tools to allow me to see where (if anywhere) these processed files are going?

Thanks.

edit: Added follow ups for the suggestions given

>MuS: does the index rca exist on your testing box, and are you searching that index by default?

Yes and no. I have to manually select it using an "index=rca" prefix for all my searches.
But I have checked the index size/number of objects via the manager. The rca index is totally empty.

rca | 500,000 | None | 1 | 0 | N/A | N/A | /opt/splunk/var/lib/splunk/rca/db | system

>Kristian ">Go to Manager -> Access Controls -> Roles -> your_role and look at the bottom of the page. Verify that you have read access to the rca index (or All Non-internal indexes) "

Yes. I am logged in as "admin" and it has "all non-internal indexes" under my role.

>Kristian ">Another thing to check is to see if your timestamps are being parsed correctly. Run the following search (yes it starts with a pipe);"

This search returns zero results. The rca index has a zero event count in the manager\indexes interface.

>AYN "This blog post discusses and links to an excellent tool for checking the status of inputs: "

I ran the tool. Interestingly, it shows that it reads one of the files and seems to get stuck on it (i.e. it stays at 14.61% forever).

Output is below.

For full status, visit:
  https://127.0.0.1:8089/services/admin/inputstatus/TailingProcessor:FileStatus

Updated: Wed Jun 13 22:27:06 2012 (took 0.0 sec)
Have seen 4 dirs. (+0)
Finished with 22 tracked files. (+0)

Currently reading 7 files.
  some open files (showing up to 5):
    /opt/splunk/var/log/splunk/metrics.log      (100%)
    /opt/splunk/var/log/splunk/web_service.log  (100%)
    /opt/splunk/var/log/splunk/btool.log        (100%)
    /opt/splunk/var/log/splunk/splunkd_access.log       (100%)
    /opt/input_data/rca/ain.321.2012053112_16    (14.61%)

Ignoring 3 items.
  some of these files (showing up to 5 per type):
    batch processing:
      /opt/splunk/var/log/splunk/metrics.log.1
      /opt/splunk/var/log/splunk/splunkd.log.1
      /opt/splunk/var/log/splunk/splunkd.log.2

I renamed the source file to _17, yet the script didn't show that it had changed.
So I restarted Splunk and ran it again.

Currently reading 1 files.
  some open files (showing up to 5):
    /opt/input_data/rca/ain.321.2012053112_17    (56.35%)

Now it sees the renamed file and has started parsing it (because of the crcSalt setting), but it still doesn't reach 100%. The file is only about 10 MB, so it shouldn't take long at all to process. I'm not sure whether this is just a script bug or an indication that the file is only partially being read (the file is currently static and doesn't change, as it's just sample data).

>Kristian "Another thing to fix is to use underscores instead of dashes in REPORT and TRANSFORMS statements.
Fixed

If your log is single line events, you should set SHOULD_LINEMERGE=false
Also fixed.
Not too sure about your nullQueue REGEX."

Done/done and removed. Restarted the Splunk instance to pick up the new options.

No change.

The nullQueue entry was meant to remove the first line in the log file. It is a bogus line in the logs that isn't required.
To cut down on the possible causes of this data not being indexed, I've removed the nullQueue transform, so it's no longer a factor.

edit 3:

If I totally scrap all my configs, manually use the GUI (Manager / Data inputs) and just select csv as the sourcetype, the data will index (totally incorrectly, however).

edit 4: Finally got it working!

I still have no idea what the root issue was, but I'm tipping it was in my props.conf file (can only have one underscore?).

My final working config.

::::::::::::::
inputs.conf
::::::::::::::
[monitor:///opt/input_data/rca/*]
disabled = false
followTail = 0
index = rca
host = rca
sourcetype = rca
crcSalt = <SOURCE>
::::::::::::::
props.conf
::::::::::::::
[rca]
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TIME_FORMAT =%Y:%m:%d:%H:%M:%S
TIME_PREFIX =^A5,
pulldown_type = 1
REPORT-rca_extractions = rca_extractions
::::::::::::::
transforms.conf
::::::::::::::
[rca_extractions]
DELIMS=","
FIELDS="blah1","blah2"

Lucas_K
Motivator

It's all working now.

No idea of the root cause however.


Lucas_K
Motivator

You caught a slip in my sanitisation of the data and label names 😉

kristian_kolb
Ultra Champion

The [rcacaf_extractions] in transforms.conf is not properly referenced in props.conf - where it's called rca_extractions - spelling mistake?
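
A mismatch like this can be caught mechanically by comparing the names referenced in props.conf with the stanza names in transforms.conf. The following is only a minimal sketch: the helper names `stanza_names` and `referenced_transforms` are hypothetical, the parsing is deliberately naive (it ignores comments and line continuations), and the two config strings are cut-down copies of the ones posted above.

```python
import re

def stanza_names(conf_text):
    """Return the set of [stanza] names found in the text of a .conf file."""
    return set(re.findall(r"^\[([^\]]+)\]\s*$", conf_text, flags=re.MULTILINE))

def referenced_transforms(props_text):
    """Return the transform names referenced by REPORT-* / TRANSFORMS-* settings."""
    names = set()
    for line in props_text.splitlines():
        m = re.match(r"\s*(?:REPORT|TRANSFORMS)-[^=\s]+\s*=\s*(.+)", line)
        if m:
            # A setting may list several transforms separated by commas.
            names.update(n.strip() for n in m.group(1).split(",") if n.strip())
    return names

props = """[rca]
SHOULD_LINEMERGE = false
REPORT-rca_extractions = rca_extractions
"""

transforms = """[rcacaf_extractions]
DELIMS=","
FIELDS="blah1","blah2"
"""

# Names that props.conf points at but transforms.conf never defines.
missing = referenced_transforms(props) - stanza_names(transforms)
print(sorted(missing))  # ['rca_extractions']
```

An empty result would mean every REPORT/TRANSFORMS reference has a matching stanza.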

/k

kristian_kolb
Ultra Champion

As MuS says: have you checked that the index exists, is writable by the splunk process (i.e. file system rights) and is searched by default (or that you search it explicitly with index=rca)?

Go to Manager -> Access Controls -> Roles -> your_role and look at the bottom of the page. Verify that you have read access to the rca index (or All Non-internal indexes)

Another thing to check is whether your timestamps are being parsed correctly. Run the following search (yes, it starts with a pipe):

| metadata type=hosts index=rca | eval firstTime = strftime(firstTime, "%Y-%m-%d %H:%M:%S")| eval lastTime = strftime(lastTime, "%Y-%m-%d %H:%M:%S")| eval recentTime = strftime(recentTime, "%Y-%m-%d %H:%M:%S")

It should list the number of events in the rca index, along with the time information for the events.
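
While the index is still empty, the timestamp format itself can be sanity-checked outside Splunk. The sketch below assumes that the strptime directives used in TIME_FORMAT here (%Y, %m, %d, %H, %M, %S) mean the same thing in Python, and it treats TIME_PREFIX as a plain string rather than the regex Splunk uses; `parse_event_time` is a hypothetical helper, and the sample line is shortened from the data in the question.

```python
from datetime import datetime

# Values taken from the props.conf in the question; the prefix is simplified
# to a literal string instead of the regex "^A5,".
TIME_PREFIX = "A5,"
TIME_FORMAT = "%Y:%m:%d:%H:%M:%S"

sample = "A5,2012:05:31:11:53:36,AAAA,5,6,1111,1111,2012:05:31:11:50:58"

def parse_event_time(line):
    """Parse the first timestamp after the prefix, or return None if the prefix is absent."""
    if not line.startswith(TIME_PREFIX):
        return None
    stamp = line[len(TIME_PREFIX):].split(",", 1)[0]
    return datetime.strptime(stamp, TIME_FORMAT)

parsed = parse_event_time(sample)
print(parsed)  # 2012-05-31 11:53:36
```

If this parses cleanly, the format string is at least consistent with the sample data, which points the search for the problem elsewhere.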


UPDATE:

Not too sure about your nullQueue REGEX. If you want to keep all events that start with 'A5' and throw away the rest you should look at the following section of the docs:

http://docs.splunk.com/Documentation/Splunk/4.3.1/Deploy/Routeandfilterdatad#Keep_specific_events_an...
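
To illustrate the concern with that REGEX: `[^A5]` is a character class meaning "any single character other than A or 5", not "does not start with A5". Since the transform's REGEX is matched anywhere in the raw event, it would match every event that contains, say, a comma. A minimal sketch with Python's `re` module (which behaves the same as PCRE for these two patterns); the header line is hypothetical, since the real one was never posted, and `^(?!A5,)` is just one possible alternative:

```python
import re

# Two shortened lines from the sample data, plus a hypothetical header line.
lines = [
    "HEADER,some,bogus,first,line",
    "A5,2012:05:31:11:53:36,AAAA,5,6,1111",
    "A5,2012:05:31:11:57:58,AAAB,5,6,1111",
]

original = re.compile(r"[^A5]")     # any character that is not 'A' or '5'
anchored = re.compile(r"^(?!A5,)")  # start of event NOT followed by "A5,"

# Events that match the REGEX are the ones routed to nullQueue.
dropped_by_original = [l for l in lines if original.search(l)]
dropped_by_anchored = [l for l in lines if anchored.search(l)]

print(len(dropped_by_original))  # 3
print(len(dropped_by_anchored))  # 1
```

With the original pattern all three lines match, so every event would be discarded; the anchored pattern only matches the line that does not start with A5.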

Another thing to fix is to use underscores instead of dashes in REPORT and TRANSFORMS statements.

BAD:
props.conf

REPORT-blaha-blaha = asdf-asdf

transforms.conf

[asdf-asdf]

GOOD:
props.conf

REPORT-qwer_qwer = jkl_jkl

transforms.conf

[jkl_jkl]

UPDATE 2:

If your log is single line events, you should set SHOULD_LINEMERGE=false

Hope this helps,

Kristian

Lucas_K
Motivator

awesome tips. thanks!!! Added all my responses to your suggestions to the original post.


kristian_kolb
Ultra Champion

updated - fixed some bad advice initially given..
/K

More updates.


Ayn
Legend

This blog post discusses and links to an excellent tool for checking the status of inputs: http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/

MuS
Legend

Hi koops, does the index rca exist on your testing box, and are you searching that index by default?
