Duplicate events in Splunk but not in logs

bwouters
Path Finder

Hi all

My Splunk instance monitors a single file and indexes new data as it arrives.
From these events, I build a world map dashboard.

Example of log file entry:

2018-12-20 10:25:12,938 TRACE [233] HttpInterface - [13.14.10.116] [RequestId = ce47b6e2-ffd9-4408-9b34-5e661e9f9278] HTTP request received from 123.456.3.119:96350
Method = Post
Uri = http://name.goes.here/url/url/url/url/parameter,parameter?output=json&Id=01e363e041420b915134c592c23...
Headers = 
  X-Forwarded-For: 444.333.222.111
  X-Forwarded-Proto: http
  X-Real-IP: 444.333.222.111
  Connection: close
  Content-Length: 101
  Content-Type: application/xml
  Accept-Encoding: gzip
  Cookie: load-balancer-token=351
  Host: name.goes.here
  User-Agent: blablablabla(moreblablabla)/1.0
Body = <?xml version="1.0" encoding="utf-8"?><Create><Id>37</Id></Create>

This is my search query:

sourcetype="Logs" |
rex "X-Real-IP: (?<Real_IP>(\d|\.)+)" |
iplocation Real_IP |
lookup geo_countries latitude AS lat longitude AS lon OUTPUT featureId AS country |
stats count as input by country |
sort -input |
eval input = country + " - " + input |
geom geo_countries featureIdField=country
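
To check the rex pattern outside Splunk, here is a minimal Python sketch that applies the same expression to the headers from the sample entry. The names extract_real_ip and sample_event are illustrative only, and Python needs the (?P<name>...) spelling for the named group.

```python
import re

# Same pattern as the rex command, using Python's (?P<name>...) group syntax.
REAL_IP = re.compile(r"X-Real-IP: (?P<Real_IP>(\d|\.)+)")


def extract_real_ip(event):
    """Return the X-Real-IP value from a raw event, or None if it is absent."""
    match = REAL_IP.search(event)
    return match.group("Real_IP") if match else None


# Header lines copied from the sample log entry.
sample_event = (
    "Headers = \n"
    "  X-Forwarded-For: 444.333.222.111\n"
    "  X-Real-IP: 444.333.222.111\n"
    "  Connection: close\n"
)

print(extract_real_ip(sample_event))  # prints 444.333.222.111
```

The pattern accepts any run of digits and dots, so it does not validate that the value is a well-formed address.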

Although my log file contains only one such entry, Splunk indexes that single entry about 200 times.
Sometimes more, sometimes less.

I can't seem to figure out why that is.
Any ideas where I can start looking?

Update
I checked my source type (using the Splunk UI) and I'm breaking my events with a regex that matches only 'TRACE'.
The reason is that every separate log entry starts with a line that contains the date, the time, and TRACE. The only constant part is TRACE.
This worked fine for a long time and only started to break a few weeks ago. At first I thought the log file was the problem, but now it seems to be my configuration.
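
As a rough illustration of event breaking, the Python sketch below starts a new event only at lines that begin with the full "date time TRACE" prefix from the sample entry, so the word TRACE inside a body does not cause a split. It only models the idea; it is not Splunk's line breaker, and EVENT_START, split_events, and the sample text are assumptions made for the example.

```python
import re

# Assumed header shape, taken from the sample entry:
# "YYYY-MM-DD HH:MM:SS,mmm TRACE" at the start of a line.
EVENT_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} TRACE", re.MULTILINE
)


def split_events(text):
    """Split raw log text into events, one per header line.

    Any text before the first header line is dropped.
    """
    starts = [m.start() for m in EVENT_START.finditer(text)]
    if not starts:
        return [text] if text else []
    bounds = starts + [len(text)]
    return [text[a:b].rstrip("\n") for a, b in zip(bounds, bounds[1:])]


log = (
    "2018-12-20 10:25:12,938 TRACE [233] HttpInterface - request received\n"
    "Method = Post\n"
    "Body = <Note>TRACE appears in the payload</Note>\n"
    "2018-12-20 10:25:13,101 TRACE [233] HttpInterface - response sent\n"
)

events = split_events(log)
print(len(events))  # prints 2
```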

dkeck
Influencer

Hi,

Do you see any errors regarding this file in splunkd.log? Something like "will read entire file again"?

bwouters
Path Finder

Hi @dkeck

I grepped splunkd.log for entries from around the same timestamp as my example.

Before 10:25:12,938

12-20-2018 10:25:05.866 +0100 WARN  LineBreakingProcessor - Truncating line because limit of 10000 bytes has been exceeded with a line length >= 12728 - data_source="/opt/splunk/share/Splunk_new.log", data_host="SPLUNK", data_sourcetype="Logs"

It seems that Splunk is re-reading the same log over and over again because some lines in it exceed the 10000-byte line limit?
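
To see which lines in the monitored file are over the limit named in that warning, a small Python sketch like the following could help. It is only a rough check: the function name find_long_lines is made up for this example, and measuring the UTF-8 byte length without the line ending is an assumption about how the limit is counted.

```python
LIMIT_BYTES = 10000  # limit quoted in the LineBreakingProcessor warning


def find_long_lines(lines, limit=LIMIT_BYTES, encoding="utf-8"):
    """Yield (line_number, byte_length) for every line longer than limit."""
    for number, line in enumerate(lines, start=1):
        size = len(line.rstrip("\r\n").encode(encoding))
        if size > limit:
            yield number, size


# In-memory stand-in for the log file; the second line has 12728 bytes.
sample = ["short line\n", "x" * 12728 + "\n", "another short line\n"]
print(list(find_long_lines(sample)))  # prints [(2, 12728)]
```

For a real file, pass an open file object instead of the list, for example `open(path, encoding="utf-8", errors="replace")`.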

EDIT
I hadn't grepped for the date at first, so I had pasted data that was far too old.

dkeck
Influencer

That's what I thought.

Typically it means that your file rolled over. Splunk sees a new file that looks different from the one it had been monitoring, assumes that your log rolled, and starts reading it from the beginning.

bwouters
Path Finder

So I cleared out the whole file and removed the rolled-over log files (so now there is only one).

I ran some more tests to populate the monitored file (it is now only 10 KB) and again:

12-20-2018 15:23:57.015 +0100 INFO  WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/opt/splunk/share/Splunk_new.log'.
12-20-2018 15:23:57.016 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/opt/splunk/share/Splunk_new.log'.

So I don't think it's a problem with files rolling over. It seems that when data is appended to the same file, Splunk checks the CRC every time.
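
The two log messages suggest that a checksum tied to the saved read position is compared before reading continues. The Python sketch below is a toy model of that idea, not Splunk's actual implementation: the 256-byte WINDOW, the use of zlib.crc32, and the function names are all assumptions. It shows that a pure append keeps the saved offset valid, while any change to bytes that were already read forces a restart at offset 0.

```python
import zlib

WINDOW = 256  # illustrative window size, an assumption of this sketch


def checkpoint(data):
    """Return (offset, crc): how far we have read and a CRC of the bytes
    just before that point."""
    offset = len(data)
    return offset, zlib.crc32(data[max(0, offset - WINDOW):offset])


def resume_offset(data, saved):
    """Return the saved offset if the file still matches the checkpoint,
    otherwise 0 (re-read everything)."""
    offset, crc = saved
    if len(data) < offset:
        return 0  # file shrank, so it cannot be a pure append
    if zlib.crc32(data[max(0, offset - WINDOW):offset]) != crc:
        return 0  # bytes before the checkpoint changed
    return offset


original = b"event one\nevent two\n"
saved = checkpoint(original)

appended = original + b"event three\n"                # pure append
rewritten = b"Event one\nevent two\nevent three\n"    # earlier byte changed

print(resume_offset(appended, saved))   # prints 20
print(resume_offset(rewritten, saved))  # prints 0
```

In this model, a writer that rewrites or rearranges existing content instead of appending would trigger a full re-read on every change.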

bwouters
Path Finder

When I check the last 10 minutes, I see 26 re-reads,
and 333 already today.
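
Counts like these can also be pulled from splunkd.log with a short script. The Python sketch below tallies the "will re-read entire file" messages per monitored file, using the message format shown in the log lines quoted earlier in this thread; count_rereads is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the WatchedFile message quoted earlier and captures the file path.
REREAD = re.compile(
    r"WatchedFile - Checksum for seekptr didn't match, "
    r"will re-read entire file='([^']+)'"
)


def count_rereads(lines):
    """Count re-read messages per monitored file."""
    counts = Counter()
    for line in lines:
        match = REREAD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "12-20-2018 15:23:57.015 +0100 INFO  WatchedFile - Checksum for seekptr "
    "didn't match, will re-read entire file='/opt/splunk/share/Splunk_new.log'.",
    "12-20-2018 15:23:57.016 +0100 INFO  WatchedFile - Will begin reading at "
    "offset=0 for file='/opt/splunk/share/Splunk_new.log'.",
]

print(count_rereads(sample)["/opt/splunk/share/Splunk_new.log"])  # prints 1
```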

There are lines that are super long, but those were probably only written by development for testing purposes.
Would cleaning the files fix my problem, or will it keep recurring?

dkeck
Influencer

bwouters
Path Finder

I read through the page, but I'm not entirely sure how it can help me.

bwouters
Path Finder

Google told me that it could be related to the crcSalt setting, but I haven't configured it.

[monitor:///opt/splunk/share/Splunk_new.log]
disabled = false
index = main
sourcetype = SessionLogs

File rotation is configured, though (there are 4 files in total in the same directory):

Splunk_new.log
Splunk_new.log.1
Splunk_new.log.2
Splunk_new.log.3