
Missing data in ingested log

phamxuantung
Communicator

Hello,

I have an input config that monitors the output files of Oracle GoldenGate (essentially the contents of an Oracle database written out as text files in real time). Each file is written continuously and then rolled over to a new file every 10 minutes. Both the event volume and the write rate are high (around 100 million events a day, ~600 events/sec at peak).

How I know events are missing:

I counted the number of rows in the database and the number of events in the ingested log. For example, from 11:00:00 to 12:00:00 the ingested log has 1 million events, while the database has 2 million rows.

The monitor input config is as follows:

# The log files are saved in a pattern of year/month/day/hour
[monitor://outputfile/TEXT/RDB22/ISO/*/*/*/*]
disabled = false
sourcetype = mycustomsourcetype
index = mycustomindex
ignoreOlderThan = 20m
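The `ignoreOlderThan = 20m` setting tells the monitor input not to index a file whose modification time is more than 20 minutes old when the input checks it. With files rolling every 10 minutes, a file that has not been fully read yet could end up outside that window if the forwarder falls behind. As a rough check, the Python sketch below (hypothetical helpers, not part of Splunk) lists files under the same wildcard layout whose last modification is older than a threshold. It only compares modification times and does not reproduce the forwarder's own bookkeeping.

```python
import glob
import os
import time


def iter_monitored_files(pattern):
    """Yield files matched by a monitor-style wildcard pattern.

    A match that is a directory is walked recursively, the way a monitor
    input descends into directories.
    """
    for match in sorted(glob.glob(pattern)):
        if os.path.isfile(match):
            yield match
        elif os.path.isdir(match):
            for root, dirs, names in os.walk(match):
                dirs.sort()  # visit subdirectories in a stable order
                for name in sorted(names):
                    yield os.path.join(root, name)


def files_older_than(pattern, threshold_seconds, now=None):
    """Return files whose last modification is more than threshold_seconds ago.

    Rough stand-in for the ignoreOlderThan check; it looks only at mtime.
    """
    now = time.time() if now is None else now
    return [
        path
        for path in iter_monitored_files(pattern)
        if now - os.path.getmtime(path) > threshold_seconds
    ]


if __name__ == "__main__":
    # Wildcard layout copied from the monitor stanza; point it at the real base path.
    for path in files_older_than("outputfile/TEXT/RDB22/ISO/*/*/*/*", 20 * 60):
        print(path)
```

Any file this prints was already past the 20-minute window at the moment the script ran, so it is worth comparing that list against the sources that actually appear in the index.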

The sourcetype (props.conf) is as follows:

[mycustomsourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = (\r?\n)+I\|
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = |
HEADER_FIELD_LINE_NUMBER = 0
KV_MODE = none
TRUNCATE = 999999
TIME_PREFIX = \|(?=20\d{2}-\d{2}-\d{2}\s)
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
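To see how the `LINE_BREAKER` above divides the raw text, here is a rough Python approximation. It ignores the other props.conf settings, including `INDEXED_EXTRACTIONS`. Splunk ends an event where the first capture group `(\r?\n)+` matches and starts the next event right after it, so with this regex only a line beginning with `I|` opens a new event. The sample records are made up. If the files also contain lines starting with something else (for example update or delete records, assuming the GoldenGate output includes them), those lines are appended to the previous event rather than counted as separate events.

```python
import re

# Approximation of LINE_BREAKER = (\r?\n)+I\|
# The newlines are consumed as the event boundary; "I|" is written as a
# lookahead so re.split keeps it at the start of the next event.
LINE_BREAKER = re.compile(r"(?:\r?\n)+(?=I\|)")


def simulate_events(text):
    """Split raw text into events roughly as the LINE_BREAKER above would."""
    return [event for event in LINE_BREAKER.split(text) if event.strip()]


if __name__ == "__main__":
    # Hypothetical sample: the middle record starts with "U|", not "I|".
    sample = (
        "I|ORDERS|2025-01-02 11:00:01|1001\n"
        "U|ORDERS|2025-01-02 11:00:02|1001\n"
        "I|ORDERS|2025-01-02 11:00:03|1002\n"
    )
    events = simulate_events(sample)
    print(len(sample.splitlines()), "lines ->", len(events), "events")
    for event in events:
        print(repr(event))
```

Running it prints `3 lines -> 2 events` followed by the two events; the first one contains both the `I|` and the `U|` line. A gap like this would show up as fewer events in Splunk than rows in the database.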

Any help with troubleshooting this problem would be much appreciated.


livehybrid
SplunkTrust

Hi @phamxuantung 

When you say you have checked the number of events in the ingested log, are you counting the events in Splunk or the lines in the log file?

Assuming that an event is a single line in the log file, you could do something like this to get the number of events in the log file:

wc -l <pathToLog>

This gives you the number of lines in the log. You should then do a check in Splunk:

| tstats count where index=yourIndex source=pathToLog
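Because the files roll over every 10 minutes into year/month/day/hour directories, comparing a full hour means adding up several files. The Python sketch below (hypothetical helpers, not part of Splunk; UTF-8 encoding is an assumption) totals the non-empty lines under one directory and also tallies the first pipe-delimited field of each line. That shows whether any lines start with something other than `I|`, which the `LINE_BREAKER` in the sourcetype would not treat as the start of a new event.

```python
import os
import sys
from collections import Counter


def files_under(directory):
    """Yield every file below `directory` (for example one hour directory)."""
    for root, dirs, names in os.walk(directory):
        dirs.sort()  # walk subdirectories in a stable order
        for name in sorted(names):
            yield os.path.join(root, name)


def count_records(paths, delimiter="|"):
    """Count non-empty lines and tally the first delimited field of each line.

    Returns (total_lines, Counter). If every record starts with "I|", the
    Counter has a single key "I" whose count equals total_lines.
    """
    total = 0
    first_fields = Counter()
    for path in paths:
        # Encoding is assumed to be UTF-8; undecodable bytes are replaced.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if not line.strip():
                    continue  # skip blank lines; wc -l would still count them
                total += 1
                first_fields[line.split(delimiter, 1)[0].strip()] += 1
    return total, first_fields


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_records.py <directory>
    total, by_first = count_records(files_under(sys.argv[1]))
    print("non-empty lines:", total)
    for field, count in by_first.most_common():
        print(f"{field}: {count}")
```

If the total matches the database row count but the `I` count matches the Splunk event count, the difference is in event breaking rather than in file collection.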

Please can you check these values and let us know what you get back?

