
Can someone please explain what seekaddress and seekcrc are in CRC?

blbr123
Path Finder

Hi All,

Can someone please explain, in simple terms, what seekaddress and seekcrc are in CRC?

I tried to check the documentation, but it looks quite confusing.

I read the scenario below, but I am still a little confused.

The CRC from the file beginning in the database has no matching record, indicating a file that Splunk hasn’t seen before. Splunk picks it up and ingests its data from the start of the file and updates the database with the new CRCs and Seek Addresses as it ingests the file.


richgalloway
SplunkTrust

When Splunk monitors a file, it reads the first 256 bytes (configurable with initCrcLength in inputs.conf) and passes them through an algorithm to produce a numeric value, a CRC (not unlike a hash), which identifies the file. As the file is read, Splunk remembers the current position within the file (the "seekaddress") so it can pick up where it left off after a restart. It also stores a CRC of the data at that position (the "seekcrc") so it can check that the file hasn't been rewritten.
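
To make the "fingerprint the start of the file" idea concrete, here is a minimal Python sketch. It is only an illustration: zlib.crc32 stands in for whatever algorithm Splunk actually uses, and the function name head_crc is made up for this example.

```python
import zlib


def head_crc(path, length=256):
    """Return a CRC of the first `length` bytes of the file at `path`.

    zlib.crc32 is a stand-in here, not Splunk's real algorithm.
    """
    with open(path, "rb") as f:
        return zlib.crc32(f.read(length))
```

Two files whose first 256 bytes are identical get the same value from this function, which is why a longer length (or a salt) is sometimes needed to tell them apart.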

See https://www.splunk.com/en_us/blog/tips-and-tricks/what-is-this-fishbucket-thing.html


blbr123
Path Finder

So when we set CRC = <source> in inputs.conf, what actually happens?

I have created a monitor stanza for one source, but it isn't sending logs to Splunk.

When I checked the internal logs, they said something like: failed to read the file because it is too short, check the CRC.


PickleRick
SplunkTrust

Firstly, it's crcSalt = <SOURCE>, not <source> (the case of the letters matters here).

Secondly, it means that the file's path is added to the CRC calculation, so even if two files have the same header but different paths, the input will not treat them as the same file. Why would you want that? Because some files can share the same beginning but differ somewhere later. A typical use case: an app creates a new file every time it is restarted, and each log file starts with the same report about the app's startup (loading libraries and so on).

This option is rarely used but it's there in case you need it.
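
A small Python sketch of the salting idea, in case it helps. This is an assumption-laden illustration, not Splunk's implementation: zlib.crc32 is a stand-in algorithm, prepending the path is just one possible way to mix it in, and file_key and the example paths are hypothetical.

```python
import zlib


def file_key(header: bytes, salt: str = "") -> int:
    """CRC of the file header, optionally salted with the file's path.

    Prepending the salt is an assumption made for illustration only.
    """
    return zlib.crc32(salt.encode("utf-8") + header)


# Two log files that begin with the same startup banner.
header = b"Starting app, loading libraries...\n"

unsalted_a = file_key(header)
unsalted_b = file_key(header)

# With the path as salt, the same header yields a different key per file.
salted_a = file_key(header, "/var/log/app/run1.log")
salted_b = file_key(header, "/var/log/app/run2.log")
```

Without the salt both files produce the same key and look like one file; with the salt they are tracked separately.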


richgalloway
SplunkTrust

The crcSalt = <SOURCE> setting adds the full path of the input file to the algorithm used to compute the CRC. It helps prevent duplicate CRCs.


PickleRick
SplunkTrust

Ok. We are a monitor input. We see a new file. It might have been just created, it might have been renamed from another name within the same directory. We don't know that.

Firstly, we check whether the filename is allowed by the combination of whitelists and blacklists and by the age limit.

If so, we read the beginning of the file and calculate a CRC from the "header" of the file. We check the index of known files, the so-called fishbucket, to see if we already know this CRC.

If we know this CRC, it means we've already seen this file (maybe under another filename), so we look up the remembered position within the file where we last read its contents, and we resume reading from that position.

If we don't know it, it's a completely new file and we start reading from the beginning.

As we read the file, we update the remembered position stored in the fishbucket, so the next time we encounter a file we can repeat the process.

That's a somewhat simplified description of how it works.
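
The flow above can be sketched in a few lines of Python. Treat it as a toy model under stated assumptions: the fishbucket is a plain dict from header CRC to the byte offset already read, zlib.crc32 stands in for the real algorithm, and read_new_data is a hypothetical name. The whitelist/blacklist and age checks are left out, and so is the check of the data at the remembered position.

```python
import zlib

INIT_CRC_LENGTH = 256  # number of bytes fingerprinted at the start of a file


def read_new_data(path, fishbucket):
    """Return the bytes of `path` not read yet, updating `fishbucket` in place.

    `fishbucket` maps header CRC -> byte offset already read (a toy model).
    """
    with open(path, "rb") as f:
        crc = zlib.crc32(f.read(INIT_CRC_LENGTH))
        # Unknown CRC means a new file: start from offset 0.
        offset = fishbucket.get(crc, 0)
        f.seek(offset)
        data = f.read()
        # Remember how far we got, keyed by the header CRC, not the filename.
        fishbucket[crc] = offset + len(data)
    return data
```

Because the lookup key is the header CRC and not the filename, a file that is renamed (for example by log rotation) resumes from the remembered position instead of being ingested again. In this toy model, a file shorter than 256 bytes gets a new CRC every time it grows, so it would be re-read from the start.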
