Getting Data In

Why is a monitored file behaving like a batch file on a Splunk 6.2.1 Universal forwarder?

eallanjr
Explorer

I have a monitored file input for a .tsv file that gets updated via a SQL query every hour. However, the data is only showing up in the index periodically (I haven't been able to determine the frequency, but it isn't hourly like it should be). If I restart the forwarder I see the TailingProcessor add a watch, but the file subsequently gets handled by the BatchReader, as shown in the log snippet below. For other file inputs using a [monitor://...] stanza I don't see any BatchReader log entries, so why is this one being treated differently? The Universal Forwarder is version 6.2.1.

# grep metrics5.tsv /opt/splunkforwarder/var/log/splunk/splunkd.log
04-27-2015 09:46:13.653 -0400 INFO  TailingProcessor - Parsing configuration stanza: monitor:///data/log/hadoop_job_metrics/metrics5.tsv.
04-27-2015 09:46:13.653 -0400 INFO  TailingProcessor - Adding watch on path: /data/log/hadoop_job_metrics/metrics5.tsv.
04-27-2015 09:46:13.660 -0400 INFO  BatchReader - Removed from queue file='/data/log/hadoop_job_metrics/metrics5.tsv'.
04-27-2015 10:01:47.734 -0400 INFO  BatchReader - Removed from queue file='/data/log/hadoop_job_metrics/metrics5.tsv'.

The API also indicates it is being read in batch mode:
https://localhost:8089/services/admin/inputstatus/TailingProcessor%3AFileStatus

/data/log/hadoop_job_metrics/metrics5.tsv   
file position   23783042
file size   23783042
percent 100.00
type    done reading (batch)

inputs.conf:

[monitor:///data/log/hadoop_job_metrics/metrics5.tsv]
disabled = false
sourcetype = hadoop_job_metrics_v2
index = main
crcSalt = <SOURCE>

props.conf:

[hadoop_job_metrics_v2]
FIELD_DELIMITER = tab
FIELD_NAMES = JOB_ID,JOB_STATUS,JOB_FAILED_MAP_ATTEMPTS,JOB_FAILED_REDUCE_ATTEMPTS,JOB_FILE_BYTES_WRITTEN,JOB_FINISHED_MAP_TASKS,JOB_FINISHED_REDUCE_TASKS,JOB_PRIORITY,JOB_TOTAL_LAUNCHED_MAPS,JOB_TOTAL_LAUNCED_REDUCES,JOB_CPU_MILLISECONDS,MAP_CPU_MILLISECONDS,RED_CPU_MILLISECONDS,JOB_MAPRFS_BYTES_READ,MAP_MAPRFS_BYTES_READ,RED_MAPRFS_BYTES_READ,JOB_MAPRFS_BYTES_WRITTEN,MAP_MAPRFS_BYTES_WRITTEN,RED_MAPRFS_BYTES_WRITTEN,JOB_PHYSICAL_MEMORY_BYTES,MAP_PHYSICAL_MEMORY_BYTES,RED_PHYSICAL_MEMORY_BYTES,JOB_VIRTUAL_MEMORY_BYTES,MAP_VIRTUAL_MEMORY_BYTES,RED_VIRTUAL_MEMORY_BYTES,JOB_NAME,PARENT_JOB_ID,USER_SUBMITTED,TIME_SUBMITTED,TIME_STARTED,TIME_FINISHED,CLUSTER_ID,CREATED
HEADER_FIELD_DELIMITER = tab
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = CREATED
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true

balaji_venkat
Explorer

Any file larger than 20 MB (the default `min_batch_size_bytes` in limits.conf, 20971520 bytes) is automatically assigned to the BatchReader when the Universal Forwarder processes it. Your file is 23,783,042 bytes (about 22.7 MB), so it crosses that threshold.
This was already answered here:

https://answers.splunk.com/answers/109779/when-is-the-batchreader-used-and-when-is-the-tailingproces...
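If you want this file to stay with the TailingProcessor, one option is to raise the batch threshold above the file's size on the forwarder. This is a sketch based on the `[inputproc]` stanza in limits.conf; verify the setting against the limits.conf spec shipped with your forwarder version:

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf on the Universal Forwarder
[inputproc]
# Files smaller than this many bytes are tailed rather than batch-read.
# Default is 20971520 (20 MB); 52428800 (50 MB) comfortably clears
# the ~22.7 MB metrics5.tsv file.
min_batch_size_bytes = 52428800
```

Restart the forwarder for the change to take effect. That said, BatchReader vs. TailingProcessor is mainly a throughput optimization and shouldn't by itself cause hourly updates to be missed, so it's worth checking whether the hourly SQL export rewrites the file in place (which can interact badly with CRC-based change detection) rather than appending to it.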
