Deployment Architecture

Enormous data duplication

NewMilenium
Path Finder

Hello splunk community,

I'm experiencing an incredibly destructive phenomenon on Splunk.
Symptoms: Splunk indexes more than 10 times the amount of data it's supposed to index, and in a very, very short period of time.
Details: I've looked in detail at a given _indextime (which I converted with convert ctime(_indextime) as idxtime), and I got around 335 copies of the same event. It seems Splunk duplicated this event 335 times at that moment, and repeated those duplications, giving me around 478,000 events in 4 hours.
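
Here's the kind of search I've been using to spot this (a sketch; the index name is a placeholder):

index=fw sourcetype=fortigate | convert ctime(_indextime) as idxtime | stats count by idxtime _raw | where count > 1 | sort - count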

Context: I'm going to describe here what I did lately that I think may have something to do with this.
In particular, here is how I've been working with Splunk lately; I've set props.conf and transforms.conf like this:

props.conf
[fortigate]
TRANSFORMS-fw_index = index1, index2, index3, etc...
TRANSFORMS-null = setnull

transforms.conf
;[index1]
;DEST_KEY = _MetaData:Index
;REGEX = [aDeviceID]
;FORMAT = index1

[setnull]
REGEX = [aDeviceID]|[AnotherDeviceID]|etc...
DEST_KEY = queue
FORMAT = nullQueue

So all sources go to the nullQueue and are discarded. My licence usage confirms it: Splunk indexes nothing, as intended.
When I need data to be indexed, I change transforms.conf: I uncomment the wanted indexN stanzas and remove the corresponding device ID from the REGEX of the [setnull] stanza. Then I upload transforms.conf and restart Splunk, and the data comes into the wanted indexes.
A bit too much, though...
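
To illustrate, this is roughly what transforms.conf looks like once I enable index1 for one device (a sketch; the device IDs are placeholders, and note that in regex terms [aDeviceID] is a character class, so strictly the literal ID should be written without the brackets):

[index1]
DEST_KEY = _MetaData:Index
REGEX = aDeviceID
FORMAT = index1

[setnull]
REGEX = AnotherDeviceID|YetAnotherDeviceID
DEST_KEY = queue
FORMAT = nullQueue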

krugger
Communicator

As it is a firewall log, it could be some sort of attack: any flood attack will generate what look like duplicate events, because in the same second you will have a thousand or more packets being logged. Some firewalls even protect against this type of log trashing.

Another common cause is a Splunk forwarder that is configured to re-send a whole log file to a central Splunk server. This also duplicates events and can be fixed at the forwarder.
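
If a forwarder is involved, a plain monitor input like this (a sketch; the path and sourcetype are assumptions) lets Splunk keep track of its read offset so the file isn't re-read from the start:

inputs.conf
[monitor:///var/log/fortigate/fw.log]
sourcetype = fortigate
# avoid crcSalt = <SOURCE> on rotating logs, as it forces rotated files to be re-indexed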

NewMilenium
Path Finder

Well, I've switched to commenting with "#"; the ILOVEPANDAS point convinced me.
I tested with other sources: no data duplication today... And, more importantly, I've let the previously duplicated data come in again, and it's not duplicated anymore.

I'm starting to think this was an attack on our customer... But if anyone has any idea of what ELSE it could be, maybe an error in my Splunk configuration, please tell me!

Ayn
Legend

I think it's "working" in the sense that Splunk ignores the line because it's a syntax error. You could put ILOVEPANDAS at the start of the line too and that would "work". "#" is the valid character to actually use for comments, though.
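
For reference, the commented-out stanza from your question would look like this with "#" (same content, just the documented comment character):

# [index1]
# DEST_KEY = _MetaData:Index
# REGEX = [aDeviceID]
# FORMAT = index1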

kristian_kolb
Ultra Champion

Well, if it works for you with ";", by all means use it... I just don't think it's the official way of making comments.

If the sending party re-sends the whole log file each time, Splunk would have a hard time detecting that; with a network input there is no file offset that Splunk can keep track of.

Perhaps you need to reconfigure how the log is sent from the source device.

NewMilenium
Path Finder

Dave > I had the impression that yes, it's sending from the start again, but from what I see, it's just duplicating the events at the very moment they appear... Yes, it's UDP sent on port 10000 for the example I tried, and I shut it off as soon as I discovered I was at 89% of the licence usage.
The events received by Splunk are syslog.
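
For reference, the input on my side looks something like this (a sketch; connection_host is an assumption, the port is the one above and the sourcetype comes from my props.conf):

inputs.conf
[udp://10000]
sourcetype = fortigate
connection_host = ip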

kristian > well, sorry to ask, but: why? ";" seems to be working; is there any side effect I wouldn't have noticed?

edit: yes, sadly, it's the exact SAME event all 335 times.
I'm trying with other sources now, to see if it's going to happen again...
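
This is the check I'm running against the new sources (a sketch; the index name is a placeholder):

index=fw sourcetype=fortigate | stats count by _raw | where count > 1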

kristian_kolb
Ultra Champion

You should use "#" for comments in the conf files, not ";".

Not sure that it would matter here, though.

DaveSavage
Builder

I'm not familiar with the Fortigate kit, but when you set your confs back to allow indexing, could it be that the sending equipment is forwarding (or the logs are being forwarded) from the start again? If you're using UDP, I know this wouldn't be happening.
Are there actually multiple occurrences of the same event, i.e. real duplicates with no differences at all? If that were the case, what are you using as the forwarding mechanism?
