Hello Splunkers!!
As per the screenshot below, you can see the jobs are running fine, but the events are not being collected into the summary index. Please help me by suggesting some potential reasons and fixes.
The scheduled search pushes its results to a summary index.
And how did you determine that the events are not collected? The typical issue with events that seem not to be collected (when the job status does show returned events that should have been collected) is that something is wrong with the timestamps, so the events are collected and indexed but end up somewhere (or rather somewhen ;-)) other than where you expect them.
Check your | tstats count on the summary index over All Time, before and after you run the collecting search. This will tell you whether your index grows.
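For example, a minimal check (assuming the summary index is called summary_demo - substitute your own name):

| tstats count where index=summary_demo ```summary_demo is a placeholder index name```

Run it once over All Time, let the scheduled search complete, then run it again. If the count doesn't grow, the collect step really is failing; if it grows but your report still shows nothing, the events have most likely landed at a time you're not looking at.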
@PickleRick As per the screenshot below, I can see huge delays in indexing. Is this why the data is not visible on time?
What actions do I need to perform on the summary index?
What delays do you get for your source data?
I can see there is a huge delay, in hours, in the source data which fills the summary index: around 8.67 hours.
Green arrows: showing the index time and the event time
Below are the attributes I am using in props.conf.
DATETIME_CONFIG =
KV_MODE = xml
NO_BINARY_CHECK = true
CHARSET = UTF-8
LINE_BREAKER = <\/eqtext:EquipmentEvent>()
crcSalt = <SOURCE>
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 754
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3QZ
TIME_PREFIX = \<\/State\>\<eqtext\:EventTime\>
SEDCMD-first = s/^.*<eqtext:EquipmentEvent/<eqtext:EquipmentEvent/g
category = Custom
pulldown_type = true
TZ = UTC
=====================================
Sample logs are attached below.
<eqtext:EquipmentEvent xmlns:eqtext="http://Asas.com/FM/EqtEvent/EqtEventExtTypes/V1/1/5" xmlns:sbt="http://Asas.com/FM/Common/Services/ServicesBaseTypes/V1/8/4" xmlns:eqtexo="http://Asas.com/FM/EqtEvent/EqtEventExtOut/V1/1/5"><eqtext:ID><eqtext:Location><eqtext:PhysicalLocation><AreaID>7073</AreaID><ZoneID>33</ZoneID><EquipmentID>81</EquipmentID><ElementID>0</ElementID></eqtext:PhysicalLocation></eqtext:Location><eqtext:Description> Applicator tamper is jammed</eqtext:Description><eqtext:MIS_Address>0.1</eqtext:MIS_Address></eqtext:ID><eqtext:Detail><State>WENT_OUT</State><eqtext:EventTime>2024-08-16T12:14:24.843Z</eqtext:EventTime><eqtext:MsgNr>6232609270406364028</eqtext:MsgNr><Severity>LOW</Severity><eqtext:OperatorID>WALVAU-SCADA-1</eqtext:OperatorID><ErrorType>TECHNICAL</ErrorType></eqtext:Detail></eqtext:EquipmentEvent>
<eqtext:EquipmentEvent xmlns:eqtext="http://Asas.com/FM/EqtEvent/EqtEventExtTypes/V1/1/5" xmlns:sbt="http://Asas.com/FM/Common/Services/ServicesBaseTypes/V1/8/4" xmlns:eqtexo="http://Asas.com/FM/EqtEvent/EqtEventExtOut/V1/1/5"><eqtext:ID><eqtext:Location><eqtext:PhysicalLocation><AreaID>7073</AreaID><ZoneID>33</ZoneID><EquipmentID>81</EquipmentID><ElementID>0</ElementID></eqtext:PhysicalLocation></eqtext:Location><eqtext:Description> Applicator tamper is jammed</eqtext:Description><eqtext:MIS_Address>0.1</eqtext:MIS_Address></eqtext:ID><eqtext:Detail><State>ACK_BY_SYSTEM</State><eqtext:EventTime>2024-08-16T12:14:24.843Z</eqtext:EventTime><eqtext:MsgNr>6232609270406364028</eqtext:MsgNr><Severity>LOW</Severity><eqtext:OperatorID>WALVAU-SCADA-1</eqtext:OperatorID><ErrorType>TECHNICAL</ErrorType></eqtext:Detail></eqtext:EquipmentEvent>
Please help me understand what I can do to fix it.
The Time column shown is the local time for the UTC time in the event, which appears to be 4 hours different. This does not show you the index time of the event, merely how the time field was interpreted from the event at ingestion time.
You need to do the same calculation you did for the summary index, i.e. _indextime - _time, to find the lag between the event time and the index time and see whether this is the "source" of your "delay". Note that this is not really the true source of the delay, but if it is significant, e.g. over 1hr 45 minutes, it could be the reason why you are not getting the events into your summary index.
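Something like this sketch would do it (the index and sourcetype names are placeholders for your own):

index=your_source_index sourcetype=your_sourcetype ```placeholders - use your real names```
| eval lag=_indextime-_time
| stats count min(lag) avg(lag) max(lag)

The lag values are in seconds, so anything over 6300 (1 hour 45 minutes) means the event arrived too late for an hourly population search running 45 minutes after the hour.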
For example, if you have an event with a time of 01:15am, it would have to have been indexed by 02:45am in order for it to appear in the report which is populating the summary index for 01:00am to 02:00am
I ran the query on the source data which fills the summary index, and below are the results.
So this is the reason why you are missing summary data. There could be a number of reasons for this difference.
You should investigate this. If it is not something that can be fixed, you could adjust your summary index population searches to take these delays into account, e.g. by running a "backfill" search that populates your summary index with the "delayed" events. You would need to be careful not to "double-count" events which have already been included in earlier populations of the summary index.
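One pattern that sidesteps double-counting is to select events by index time rather than event time (a sketch - the names are placeholders and the stats clause stands in for whatever your real population search aggregates):

index=your_source_index _index_earliest=-1h@h _index_latest=@h ```select by when events were indexed```
| bin _time span=1h
| stats count by _time, host
| collect index=your_summary_index

Because each event is selected by when it was indexed, a delayed event is picked up exactly once, by whichever run of the search first sees it, no matter how old its timestamp is.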
@ITWhisperer
I have removed many duplicate events. Because of that, the delta_time difference has decreased to 1.9 hours compared to yesterday. Could the duplicate events also be a potential cause?
OK. It's starting to get a bit silly. A community is meant to be help by users for other users: help in learning the platform and what it can do, checking whether your train of thought is correct, and so on. It is _not_ meant as a free support service. And you're trying to use it as just that - to get your problem solved without trying to understand the underlying issue, while providing almost no information about it.
You obviously have _some_ problem with your data ingestion process. What is it? We don't know. It's something that should be examined locally, on your site, by someone who can verify the data as it is ingested into Splunk, who can check the settings across your Splunk infrastructure, and who can talk with the administrators of your sources to verify the settings on their side and what data they produce and how.
This is not something you can do by asking single questions on Answers without any significant effort on your side (true, sometimes Answers can be helpful in diagnostics when the asker does quite a lot of work on their own and only needs a gentle hint now and then). This is something a skilled Splunk engineer would probably diagnose in a relatively short time, compared to ping-ponging scraps of information to Answers and back.
People on Answers are volunteers who use their spare time to help others, but that doesn't mean they are a free support service. If you want effort from them, show some serious effort on your side as well. Make the problem interesting, not frustrating - don't make them guess at things they have no way of knowing because it's your internal information.
@PickleRick Thanks for your help so far. I have received this kind of unpleasant response from you twice. Let me tell you that I do not post queries to waste anyone's time. If you don't want to respond to my queries, then please don't. But this kind of reply from you makes me feel that I really am wasting people's time on the Splunk Answers platform. You are not working for me and I am not working for you. As I see it, this is a platform where I can ask my question, and whoever wants to respond to it can do so.
I'm not saying you're wasting people's time deliberately. It's just that this is one of those cases where someone (in this case you) asks one thing without giving much background information, it then leads to more and more problems and issues the poster is either unaware of or not willing to share, and the poster keeps insisting on a solution based on a very small piece of the information actually needed for such troubleshooting.
I didn't mean to be rude to you, but you're repeatedly asking "how to fix that" without actually digging into what we're suggesting. You do some random things (like "removing duplicates", whatever that means) instead of really investigating the issue, and then ask "is this the potential cause?".
We're trying to help here, but it quickly gets frustrating. I understand that people have different skill levels and knowledge, but you're doing completely different things than are suggested to you and then asking "why is it so?". That's why I'm saying this is something you normally pay people for - they come to you, they do things _for you_, and everybody's happy. I can't speak for others, but I usually try to be helpful and friendly, and if you check other threads where I'm active, I take my time to explain my answers so that people not only know _what_ to do but also _why_ it works.

But in this case... well, if we're telling you to "check your f...ascinating sources", then please do check your sources. You can't fix reality - if the sources send you wrong data, you'll end up with wrong data. No amount of "removing duplicates" will fix that. So don't take it personally, because I don't know you and I don't know who you are. All I know is that this thread, as it stands, leads nowhere. That's why I wrote that it's frustrating and it's all getting silly.
Of course we could point you to the docs and tell you "here's what should be configured; apparently something is not done properly" (most of the time the answer really _is_ in the docs or in your config/data), but we're not doing that. In return, though, we'd (OK, I'd) expect some serious effort on your side. Not random bits and pieces, jumping from one index to another, and dropping screenshots which tell us absolutely nothing. Honestly, I'd find it less frustrating if you simply asked "OK guys, I have no idea what you're talking about, can you explain?".
I know that my way of asking queries is flawed in places, and a master like you did not like it. I value the Splunk Answers platform, and I am familiar with the contributions you have made to users here over the years.
You could have simply told me that you didn't want to respond to my post with only half the details. I have posted at least 100+ queries on Splunk Answers throughout my Splunk career and I have never received a reply like today's. You are such a valued member of SplunkTrust; your reply has shattered my confidence.
By writing this kind of unpleasant reply, you diverted the attention of other users and experts who wanted to reply to me, and as a result the essence of the post is lost. I have seen this behaviour from you in my last two posts.
I am exiting this thread while maintaining the decorum of the Splunk Answers platform.
Thanks for all the help.
OK. If you found a way to feel offended, that was not my intention. I just wanted to point out that what you were doing in this thread was counterproductive, and that it was simply impossible to help you this way. Want to help us help you? Fine - check your sources and verify what has already been suggested in this thread. Want to just take offence? Well, I'm truly sorry to hear that, because we really are trying to create an overall friendly atmosphere here. And again: it was not my intention to make you feel bad personally. The intention was to point out that by doing random things and "splashing" random bits of information around, you will not get a reasonable answer, because it's simply impossible. That's all. I hope you still have fun on Answers.
How is it possible for me to tell? You haven't explained which duplicate events you removed, nor how you removed them. If you can show that the 10-hour delay you are seeing in your calculation is caused by duplicate events (which is possible if you collected events for those time periods more than 10 hours after their timestamps), then removing those duplicate events would affect your delay statistic.
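If you want to check that yourself, a sketch like this would show whether the duplicated events are the late ones (assuming MsgNr uniquely identifies an event - your XML extraction may expose it under a longer field name):

index=your_source_index ```placeholder - use your real index```
| eval lag=_indextime-_time
| eventstats count as copies by MsgNr
| stats count avg(lag) max(lag) by copies

If the rows with copies greater than 1 carry the large lags, then yes, the duplicates were inflating your delay statistic.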
The source itself might simply be misconfigured and using the wrong timezone. If it uses time synchronisation of some kind, this shouldn't happen when the time is reported in UTC, but if the time was set manually in the wrong timezone, it will be reported as a wrong timestamp.
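A quick way to eyeball this (the index name is a placeholder):

index=your_source_index ```placeholder - use your real index```
| head 5
| eval event_time=strftime(_time, "%Y-%m-%d %H:%M:%S %z"), index_time=strftime(_indextime, "%Y-%m-%d %H:%M:%S %z")
| table event_time index_time _raw

If the two columns are consistently a whole number of hours apart, a timezone misconfiguration somewhere in the chain is the prime suspect.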
Here I am deriving the TIME_FORMAT in props.conf from the EventTime field present in the raw data (Toronto's time zone is EST, UTC-5:00).
Are there any changes I need to make here?
The time format actually seems to match your event. But the question is whether the event itself contains the right information. You'd have to check the source system's configuration for that.
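You can verify the format string against a literal value from your sample event (a sketch - I'm assuming eval's strptime handles %3Q the same way the indexing pipeline does):

| makeresults
| eval raw_time="2024-08-16T12:14:24.843Z" ```value copied from your sample log```
| eval parsed=strptime(raw_time, "%Y-%m-%dT%H:%M:%S.%3QZ")
| eval human=strftime(parsed, "%Y-%m-%d %H:%M:%S.%3N %z")

If parsed comes back non-null and human shows the expected instant (rendered in your local timezone), the TIME_FORMAT itself is fine and the problem is in the value the source writes, not in how Splunk reads it.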
There's nothing wrong with the index itself. Leave it alone 🙂
Depending on your data, your search, and your collect command syntax, that can actually be an OK result. It's impossible to say without knowing your use case and those details.
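For reference, if you ever need to tell which scheduled run produced which summary rows (handy when checking for double-counting), collect's marker option can tag them (a sketch with placeholder names, not your actual search):

index=your_source_index earliest=-1h@h latest=@h ```placeholders - use your real names```
| stats count by host
| collect index=your_summary_index marker="summary_run=hourly"

You can then filter on summary_run in the summary index to see exactly what each search contributed.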