Getting Data In

Is there a known issue with importing a large number of logs at once?

tonyparreiro
Explorer

Hi,

I have set up a file/dir input to watch a folder and ingest the contents of the log files into Splunk. There are a huge number of existing files (5000+) that I'd like to import so I can analyse history going back 10 years.

What I have noticed is that there appear to be large gaps in the data at various points over the last 10 years. When I search by source for the files covering a missing time period, no data shows up for those files, yet the system logs mark them as imported. The data in the files looks OK, so I'm not sure why it wasn't imported. The only thing I could think of was that copying a large number of files into the folder at once may have confused the indexing process, but I'd be surprised if that were the case.

I have set up another index and I'm now drip-feeding the log files into a folder to see whether it still has issues with the same time periods as before.

Is there any other info on best practice for importing a large number of existing log files?

Thanks.


gcusello
SplunkTrust

Hi tonyparreiro,
Have you verified that the imported logs don't exceed the maximum size of your index or its retention period (if you defined one)?

The second check is: did you run your search immediately, or after some time? If you acquired the logs using a Forwarder, it needs time to send all the data to the indexer.

If you want to reindex files, remember that you have to use crcSalt = <SOURCE> in your inputs.conf; otherwise Splunk won't reindex the files, even if you delete their events from the index.
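The retention check mentioned above can be sketched roughly as follows. This is only an illustration: it assumes the index still uses the shipped default of 188697600 seconds (roughly 6 years) for frozenTimePeriodInSecs, so confirm the real value in your indexes.conf. The function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed default for frozenTimePeriodInSecs (about 6 years).
# Check indexes.conf for the value your index actually uses.
ASSUMED_FROZEN_SECS = 188_697_600


def is_past_retention(event_time, now, frozen_secs=ASSUMED_FROZEN_SECS):
    """Return True if an event with this timestamp is older than the retention window."""
    return (now - event_time) > timedelta(seconds=frozen_secs)


# Hypothetical sample dates, fixed so the result does not depend on today's date.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
ten_years_old = datetime(2014, 1, 1, tzinfo=timezone.utc)
one_year_old = datetime(2023, 1, 1, tzinfo=timezone.utc)

if __name__ == "__main__":
    print("10-year-old event past retention:", is_past_retention(ten_years_old, now))
    print("1-year-old event past retention:", is_past_retention(one_year_old, now))
```

If the oldest logs fall outside the window, data from those years may be a candidate for removal from the index shortly after it is indexed, which would be worth ruling out before looking elsewhere.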

Bye.
Giuseppe


tonyparreiro
Explorer

Hi Giuseppe,

Thanks for the reply.

The index is configured as per the Splunk default, with a max size of 500 GB, and I don't believe there is a limit on retention.

The files are on the indexer. I ran the search many days after the original import and the results have not changed.

I probably should have tried the crcSalt option.

The weird thing is that there is data before and after the missing sections. I initially thought the original source files might not be formatted quite right, but I compared ones that worked with ones that didn't and couldn't really find a problem with them.

At the moment I'm dropping one file into the monitored folder every minute and it appears to be picking up everything. I just have to wait for it to finish dropping in another 6 years' worth of files.
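The drip feed described above could be scripted along these lines. This is a minimal sketch, not what was actually run here: the function name, the staging folder, and the monitored folder path are all hypothetical, and it moves files rather than copying them so that nothing is fed twice.

```python
import shutil
import time
from pathlib import Path


def drip_feed(staging_dir, monitored_dir, delay_secs=60.0, sleep=time.sleep):
    """Move files one at a time from staging_dir into monitored_dir.

    Waits delay_secs between files and returns the file names in the
    order they were moved. The sleep function is injectable for testing.
    """
    staging = Path(staging_dir)
    target = Path(monitored_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Sort by name so the feed order is predictable.
    files = sorted(p for p in staging.iterdir() if p.is_file())
    moved = []
    for i, path in enumerate(files):
        shutil.move(str(path), str(target / path.name))
        moved.append(path.name)
        if i < len(files) - 1:
            sleep(delay_secs)  # no wait needed after the last file
    return moved


if __name__ == "__main__":
    import sys

    # Usage: python drip_feed.py <staging_dir> <monitored_dir>
    if len(sys.argv) == 3:
        for name in drip_feed(sys.argv[1], sys.argv[2]):
            print("moved", name)
```

Moving from a separate staging folder on the same volume keeps each file from appearing half-written in the monitored folder.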

Tony
