Getting Data In

Losing logs from a specific host for several hours.

dendel
Observer

Hi All.

We are using Splunk to collect logs from different devices, but logs from one device on the network are not present on the Splunk server. After some hours, the logs from that device started appearing on the Splunk server again. In the period where we missed logs from this device, there were no network changes and no changes on the client. We are looking for the reason for this. The logs were missing for around 6 hours, starting early in the morning.

Could it be some memory issue on the server, or something with the indexes? If there was some work to prepare for some kind of maintenance on the backend, could this have had any effect on the Splunk server's log performance? The device we are missing logs from during these hours has been online the whole time.

Any tips on how and where to look/troubleshoot in the Splunk environment when logs are not present from one or more hosts?

Thanks in advance.

DD

 


PickleRick
SplunkTrust

There is too little information to even blindly guess.

Firstly, how are those events getting into your Splunk infrastructure? Do you have a UF installed on the remote hosts with monitor file inputs defined on them? Or maybe those are eventlog inputs? Or are you receiving syslog data over the network? Directly, or via a third-party syslog daemon?

Secondly, how did you verify that the data for those "outages" isn't ingested at all? Maybe the sources (or receivers) are getting clogged, so your ingestion process stops for a while but then resumes and catches up, but your data onboarding is incomplete so you don't have reliable timestamps?
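One rough way to check that (a sketch only; your_index and your_host are placeholders for your own values) is to compare the event timestamp with the time Splunk actually indexed each event:

index=your_index host=your_host earliest=-24h
| eval lag_seconds=_indextime - _time
| timechart span=10m max(lag_seconds), avg(lag_seconds)

A large lag spike around the "missing" window would suggest the data eventually arrived late (and possibly with unreliable timestamps) rather than being lost outright.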

There are many things that can go wrong.

dendel
Observer

Hi PickleRick.

Thank you for replying to the post.

Our devices send syslog to the Splunk server over the network (there have been no network issues).

Secondly, our supplier noticed that they were not receiving logs from one specific host. After some hours (approx. 5), the supplier started receiving logs from that host again. While the supplier was not receiving logs from this host, they were receiving a lot of logs from other hosts on our network. It happened around 04:47 (AM) local time; at that time there is no load on the network.

Our supplier maintains the indexes and does the system work.

Regarding the ingestion process stopping: could that process stop for one host (one out of many) while the other hosts are not impacted?

Brgds DD


PickleRick
SplunkTrust

Well, we can't say retroactively what happened for sure. Syslog, especially UDP-transmitted syslog, is sensitive to both network disruptions and the receiver's performance.

If the receiving Splunk infrastructure listens for syslog directly with the splunkd process, without external syslog daemons, the receiver might have been "overwhelmed" by a burst of data from other hosts and might not have processed the incoming syslog data properly.

Performance is one of the reasons why, in a production environment, you generally shouldn't listen for syslog data directly with the Splunk process. You should use an external syslog daemon. See https://docs.splunk.com/Documentation/SVA/current/Architectures/Syslog for possible syslog ingestion architectures.
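A minimal sketch of that pattern, assuming the syslog daemon writes one directory per sending host (the paths, index and sourcetype below are just examples, adjust them to your setup), is a monitor input in inputs.conf on the box running the syslog daemon:

# inputs.conf - hypothetical paths and index name
[monitor:///var/log/remote/*/syslog.log]
sourcetype = syslog
host_segment = 4
index = network

That way a burst of syslog traffic gets buffered on disk by the syslog daemon instead of being dropped by a busy splunkd network input.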


isoutamo
SplunkTrust
It's just like @PickleRick said: you should never use Splunk as the termination point for syslog! You should set up a real syslog server to avoid this kind of issue.

As said, we don't have enough information yet to make any real guesses about what caused this issue.

But as there is only one server that has had this issue, I would start looking at it first. Also check whether it is in the same network segment as the others. Has it been up and running the whole time? Any FW changes on it? Could it have a time issue, or anything else related to its clock being changed and fixed later?
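A quick way to see whether anything from that host was indexed at all during the gap (just a sketch; replace your_index and your_host with your own values) is an hourly count like:

| tstats count where index=your_index host=your_host by _time span=1h

Hours with no row mean no events from that host were indexed in that hour.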

dendel
Observer

Hi Isoutamo.

Sorry for the late response, had some time off.

If I share our Splunk environment, perhaps there are some places you could recommend for troubleshooting.

Our clients send syslog through a NetScaler environment to the syslog servers.

The syslog servers are connected to the Splunk index cluster, which is connected to the search head cluster and the heavy forwarders.

The heavy forwarders are connected to a universal forwarder. There were no changes in our network. The logs from this specific host were missing for about 5 hours. Earlier we had some issues with the indexes in our Splunk environment, which they say are now fixed. In this case our syslog servers have the NetScaler between themselves and the clients.

The logs are, in the end, sent via the Internet to a cybersecurity environment.

Does this make the picture clearer, or can I send you our diagram of the Splunk environment (some place where it is not public)?

Brgds DD


PickleRick
SplunkTrust

1. Load-balancing syslog usually doesn't work very well.

2. Your description doesn't make much sense in some points.

3. This is a public community where volunteers share their experience/expertise for the common good. If you want a private audit - well, that's a service you normally pay for. Reach out to a friendly Splunk Partner in your area and ask them to review your environment.

4. Without knowing the details of your environment and seeing what really happened within it (checking internal logs if they haven't rolled over yet, maybe verifying some other logs and external monitoring), it's impossible to say what exactly happened.

What _might have_ happened is the usual - a lack of connectivity, plus enough data already buffered so that the extra data just overflowed and didn't make it into the buffer. Maybe - as you're saying you had "some issues with indexes" - some data was indexed but got lost. We don't know. And, depending on how much data you still have left in your environment, it might or might not be something that's really only findable on-site by examining the environment in question.
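If the _internal logs from that window haven't rolled over yet, a rough queue-fill check along these lines (adjust the time range to the outage window) can show whether the pipelines were backing up at the time:

index=_internal source=*metrics.log* group=queue
| eval pct_full=round(current_size_kb / max_size_kb * 100, 2)
| timechart span=5m perc95(pct_full) by name

Queues sitting near 100% around the outage would point at the receiver side rather than the network.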

As a side note regarding your offer to share the diagram "some place where it is not public" - are you sure you're in a position to freely disclose such information to a third party? Without a prior service agreement and possibly an NDA?


dendel
Observer

Hi PickleRick.

Thanks for your info. We have a lot of data in our network. We lost logs from this device at approx. 05:15 in the morning, local time. At that time there isn't a lot of traffic on our network. We had not experienced any lack of connectivity in the period where we were missing these logs from this device.

If it was the load balancer, then we should be missing logs from more than one device. Our syslog sources send logs through the NetScaler to the syslog servers. The syslog servers then send the syslog data to the Splunk index cluster, which sends it to the heavy forwarders.

Brgds DD


isoutamo
SplunkTrust
I totally agree with @PickleRick that this case should continue with a local company/contractor. There are too many open items, and one must see your real architecture and also the logs to give you an answer.