About noman377

noman377 · ‎11-17-2021

@PickleRick - unfortunately, I was not able to make it work. The eventtype that is present for every log still shows up in the Events tab, and the timechart tab comes out empty.

noman377 · ‎11-15-2021

@PickleRick - I read through the documentation and tried search first, as suggested, and then timechart count by eventtype, but it did not work. I'm not sure what I'm doing wrong; any help is appreciated. The eventtype that is present in every event is still present in the chart.

eventtype=heartbeat namespace::my-namespace | search eventtype=heartbeat | timechart count by eventtype span=1m ```only want to see eventtype heartbeat```
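A possible explanation, offered as an assumption rather than a diagnosis: eventtype is usually a multivalue field, so search eventtype=heartbeat keeps or drops whole events but does not strip the other eventtype values from the events it keeps. A minimal sketch that copies only the wanted value into a new field and charts by that field instead (the field name et is made up for illustration):

```spl
eventtype=heartbeat namespace::my-namespace
| eval et=mvfilter(match(eventtype, "^heartbeat$"))
| timechart span=1m count by et
```

Here mvfilter keeps only the values of eventtype that match the regular expression, so the always-present eventtype never reaches timechart.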

noman377 · ‎11-11-2021

Hello, I am trying to timechart two event types ONLY: heartbeat and start. However, every event in our Splunk is also mapped to nix-all-logs and a few other event types by the system admin. Attached are screenshots. How can I timechart these 2 event types only?
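One approach that may work here, sketched under the assumption that eventtype is multivalued and that the two types are literally named heartbeat and start: let timechart split by every eventtype, then keep only the two wanted columns. The namespace::my-namespace term is borrowed from a later reply in this thread and may need adjusting.

```spl
namespace::my-namespace (eventtype=heartbeat OR eventtype=start)
| timechart span=1m limit=0 count by eventtype
| fields _time heartbeat start
```

limit=0 is there so that neither series is folded into OTHER when an event carries many eventtypes; the final fields command drops the nix-all-logs column and any others.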

noman377 · ‎10-31-2021

@tread_splunk thank you so much for looking into this. I will try out the query you just provided. After thinking it through, I realized that I should just provide the real data, cleaned up and masked. Here is the real data. https://drive.google.com/file/d/1If2G2JNFm7NljWR7WuIU_VavIBxH9jqE/view?usp=drivesdk

noman377 · ‎10-30-2021

@tread_splunk @somesoni2 I realized that a sample data set would be helpful, so I am attaching one along with an explanation below.

We have 4 pods. Each of these pods receives 3-5 messages every minute. These messages are NOT evenly distributed; they do not arrive at, say, 0s, 15s, 30s, 45s. We have noticed that one or two of these pods go into a zombie state, meaning that for, say, 3 minutes, those pods are not writing this event. The overall objective is to find the query > create a dashboard panel > generate simple alerts when we detect this zombie state.

Now the explanation of the dataset. Below is the top-level query:

index namespace message="incoming events" pod=*

To detect the bad actors by eye, I am adding a time span so I can spot the anomaly. The sample data that I am providing comes from the query below (and not the starter query):

index namespace message="incoming events" pod=* | timechart count by pod span=1m

Since the raw data set consists of JSON objects, I am not attaching it here, but I can of course provide the true raw data set if it helps our investigation.

In this dataset, this is the bad pod:

Bad Pod: pod-a
2021-10-14T21:01:30.000+0000 event count 3
2021-10-14T21:02:30.000+0000 event count 0
............. ...........
2021-10-14T21:05:00.000+0000 event count 0
2021-10-14T21:05:30.000+0000 event count 4
Duration of being in bad state: 3 mins

I am trying to get a query that I can use in a dashboard (and an alert), so that I or any of my teammates can simply run it and detect:
1. Bad Pod
2. Duration in bad state (time > 1m)

I appreciate your help, and thank you. As I said, I can always provide the true raw data (from the starter query) if it helps in this investigation.

CSV File
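One way the two outputs asked for (bad pod, duration in bad state) might be obtained is to work on the raw events from the starter query rather than on the timechart table, and measure the silence between consecutive events of the same pod. This is a sketch, not a tested answer: the field names prev_time, gap_sec, gap_start, gap_end and gap_min are made up for illustration, and the 60-second threshold is an assumption taken from the "time > 1m" requirement.

```spl
index namespace message="incoming events" pod=*
| sort 0 pod _time
| streamstats current=f last(_time) as prev_time by pod
| eval gap_sec=_time-prev_time
| where gap_sec>60
| eval gap_start=strftime(prev_time, "%Y-%m-%d %H:%M:%S"), gap_end=strftime(_time, "%Y-%m-%d %H:%M:%S"), gap_min=round(gap_sec/60, 1)
| table pod gap_start gap_end gap_min
```

Each result row is one silent stretch for one pod. Note that this measures the gap between events, which is not quite the same thing as counting empty one-minute buckets, and that a pod which is still silent at the end of the search window produces no row, because there is no later event to close the gap.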

noman377 · ‎10-29-2021

@somesoni2 - yes

noman377 · ‎10-29-2021

@tread_splunk - yes

noman377 · ‎10-29-2021

@tread_splunk The expected behaviour is that every minute, every pod receives 2 to 4 heartbeat messages. The zombie behaviour is one or more pods not receiving or writing logs for several minutes. Ultimate goal: figure out which pod(s) did not register heartbeats and for how many minutes, then set up an alert based on that.
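For the alerting part of that goal, a minimal sketch could compare each pod's most recent heartbeat with the current time. The base search is the one from the original question in this thread; the 3-minute threshold and the field names last_seen and silent_min are assumptions for illustration.

```spl
index namespace message="incoming events" pod=*
| stats latest(_time) as last_seen by pod
| eval silent_min=round((now()-last_seen)/60, 1)
| where silent_min>=3
```

Saved as an alert that triggers when the number of results is greater than zero, this would list the pods that are currently silent. A pod with no events at all in the search window does not appear in the results, so the window should be comfortably longer than the threshold, or the results should be checked against the known list of pods.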

noman377 · ‎10-29-2021

yes. I have a list of all pods.

noman377 · ‎10-29-2021

this is what I am trying to detect. The time range or minute(s) when we are missing the heart beat for any given pod.

noman377 · ‎10-29-2021

@somesoni2 - sorry, it did not work.

noman377 · ‎10-29-2021

This worked 100%. Thank you so much!!!!

noman377 · ‎10-28-2021

Hi, I want to insert the time range picker value, like $time$, into the query for a dynamic input. I am requesting help with a query where $time$ is injected directly, so that the dynamic input widget does not use its own GUI time range picker.
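A sketch of how this might look in a Simple XML dashboard, assuming the time input's token is named time: such an input exposes the two sub-tokens $time.earliest$ and $time.latest$, which can be placed in the populating search as time modifiers. The index and the host field are placeholders, not taken from the post.

```spl
index=xyz earliest=$time.earliest$ latest=$time.latest$
| stats count by host
| fields host
```

With the time bounds written into the search string itself, the dynamic input's own time range setting no longer decides which events are searched.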

noman377 · ‎10-28-2021

Hello, we receive somewhere between 3 and 5 messages in every pod every minute. We have a situation where some of the pods go zombie and stop writing messages. Here's the query:

index namespace pod="pod-xyz" message="incoming events" | timechart count by pod span=1m

I want help with this query to detect when the count in a one-minute interval goes to zero.
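One possible way to surface the zero minutes, sketched from the query above: timechart already emits a 0 for minutes in which a charted pod has no events, so the table can be turned back into one row per pod per minute and filtered. Only untable and where are added; limit=0 is included as an assumption that there may be more than ten pods once pod="pod-xyz" is widened.

```spl
index namespace pod="pod-xyz" message="incoming events"
| timechart span=1m limit=0 count by pod
| untable _time pod count
| where count=0
```

Each remaining row is a minute in which that pod logged nothing. A pod that has no events anywhere in the search window gets no column from timechart, so it would not show up here.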

noman377 · ‎03-22-2021

I have a simple search: index=xyz logLevel IN (ERROR, INFO) How do I plot the two levels in different colors on a timechart? See the attached sample chart. Ideally, I want to show red for ERROR and green for INFO on the same chart.
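A minimal sketch of the search side, assuming logLevel is already an extracted field as the search in the post implies: splitting the timechart by logLevel gives one series per level, and the series can then be colored by name.

```spl
index=xyz logLevel IN (ERROR, INFO)
| timechart count by logLevel
```

The colors themselves are not set in SPL. In a Simple XML dashboard panel, the charting.fieldColors option maps series names to colors, for example {"ERROR": 0xFF0000, "INFO": 0x00FF00}.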

noman377 · ‎08-07-2020

@oscar84x :: Yes. Within the same time frame (e.g., Last 24 hours, Last 7 days), I'm seeing search results that are not consistent. However, the alerts I receive based on "status" are accurate. To extract the HTTP status, like 200, 500, etc., I used a regular expression to create the "status" field extraction. Since "| stats count by status" does not bring back the 500 statuses, my dashboard is of not much use.

noman377 · ‎08-07-2020

I have a very simple search:

index=logs_glbl sourcetype=kube:container:app-name namespace=prod status=500 | stats count

Result: 1

Results are coming from below sample logs:

::ffff:10.244.3.38 - - [06/Aug/2020:20:14:03 +0000] "GET /api/v1/workspace/getEngagement2?id=123 HTTP/1.1" 500 39 "https://atlas.intenal.noman.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"

I have defined a Field type: status for the above which uses Inline Field Extraction:

^[^"\n]*"(?P<method>\w+)[^"\n]*"\s+(?P<status>\d+)

Now when I perform a new search:

index=logs_glbl sourcetype=kube:container:app-name namespace=prod | stats count by status

I don’t get the status 500 error. My results exclude the 500 status. It is also probably missing other http statuses too.

status	count
200	515
302	152
304	8
401	71
409	7
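The post does not establish why the 500 rows go missing, so the following is only a diagnostic sketch: it applies the same regular expression inline with rex, writing into differently named fields (method_inline and status_inline are made-up names) so that the saved extraction is not involved, and then counts by the inline field.

```spl
index=logs_glbl sourcetype=kube:container:app-name namespace=prod
| rex field=_raw "^[^\"\n]*\"(?P<method_inline>\w+)[^\"\n]*\"\s+(?P<status_inline>\d+)"
| stats count by status_inline
```

If 500 appears here over the same time range but not in stats count by status, that would point toward how the saved extraction is being applied rather than toward the regular expression itself.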

noman377 · ‎08-06-2020

@thambisetty, I am sorry, my post probably was not very clear. Let me rephrase.

Original query:

index=xyz | stats count by eventtype

where All_logs encompasses every log in the search (100% coverage).

Current Result:

eventtype	count
All_logs	14
Error	2
Login	4
Auth	8

Expected Result:

eventtype	count
Error	2
Auth	8

Appreciate all your help.
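One way to get from the current result to the expected one, as a sketch: after stats count by eventtype, each row holds a single eventtype value, so the unwanted rows can be filtered out at that point instead of in the base search.

```spl
index=xyz
| stats count by eventtype
| search eventtype!="All_logs" eventtype!="Login"
```

Applied to the counts listed in the post, this would leave only the Error and Auth rows. Filtering before stats tends not to help, because every event still carries the All_logs eventtype alongside its other values.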

noman377 · ‎08-06-2020

@thambisetty , still seeing All_logs and Login events in the stats count 😞

noman377 · ‎08-05-2020

Hi, I have a stats count on eventtype like this:

index=xyz | stats count by eventtype

This query generates:

All_logs = 14
Error = 2
Login = 4
Auth = 8

All_logs is also an eventtype, one that encompasses all events: Error, Login, and Auth. How can I rewrite this query so that I see the count by eventtype, excluding the All_logs and Login events?

Posts	20
Solutions	0
Karma Given	3
Karma Received	0
Member Since	‎08-05-2020

Online Status	Offline
Date Last Visited	‎02-18-2022 03:19 PM

Timechart with eventtype

Insert Timerange picker value in query

Query when count is zero in Minute interval

Timechart stats

stats count by generating inconsistent result

exclude certain event type from count

