
Microsoft Teams Add-on for Splunk: handling of 404 error

Cbr1sg
Path Finder

I downloaded and installed the Microsoft Teams Add-on for Splunk and it worked for a while, until we started seeing a lot of 404 errors like the one below:

ERROR pid=14248 tid=MainThread file=base_modinput.py:log_error:309 | Error getting callRecord data: 404 Client Error: Not Found for url: https://graph.microsoft.com/v1.0/communications/callRecords/<call ID>?$expand=sessions($expand=segments)

 

I found out that the call ID had been removed from the Teams CDR for some reason, so when Splunk tried to download the CDR it got a 404 error, which is understandable.

However, the Teams Add-on does not remove the call ID from the webhook directory in this scenario. The call ID remains there forever, and Splunk keeps retrying the CDR download and failing. The result is a huge number of call IDs that never get cleaned up and a massive number of error messages in the log.

Furthermore, I found that when too many call ID files exist in the webhook directory (~60K), the Add-on hits a "401 Unauthorized" error when downloading the CDR and stops working soon afterward. After restarting Splunk, the Add-on works again and then stops the moment it hits the 401 error again. I wrote a script to manage the load of the webhook folder, so this is OK for now, but it would be preferable for the Add-on to have a load management feature of its own.
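A minimal sketch of what such a load-management script could look like, not the script actually used here. It assumes the webhook directory holds one plain file per pending call ID and simply deletes the oldest files once a limit is exceeded. The function name `prune_oldest` and the limit are hypothetical. Note that deleting a pending call ID file means that call record will never be downloaded.

```python
from pathlib import Path


def prune_oldest(directory, max_files):
    """Delete the oldest files in `directory` until at most `max_files` remain.

    Returns the names of the removed files, oldest first.
    Assumption: every regular file in the directory is a pending call ID.
    """
    files = sorted(
        (p for p in Path(directory).iterdir() if p.is_file()),
        # Oldest modification time first; file name breaks ties deterministically.
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    excess = max(len(files) - max_files, 0)
    removed = []
    for path in files[:excess]:
        path.unlink()
        removed.append(path.name)
    return removed
```

It could be called from a scheduled job with the add-on's webhook directory and a limit well below the ~60K mark mentioned above; the actual directory path depends on the installation.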

Hopefully the author of this Add-on will add this error handling soon, but meanwhile, if anyone knows how to get around this 404 issue, please share.

Thanks a lot!

0 Karma

tkomatsubara_sp
Splunk Employee

Hello,

 

The reason is that the call ID is no longer available on the Azure side, but the MS Teams add-on still tries to get information with it.

Currently there is no way around it except deleting the local KV store lookup data.

Please try the command below and see if there is any improvement.

 

splunk clean kvstore -app TA_MS_Teams -collection TA_MS_Teams_checkpointer

 

0 Karma

Cbr1sg
Path Finder

I don't think this has anything to do with the KV store.

The problem is clear: the add-on doesn't handle the 404 error properly.

Flow in the normal situation:

check webhook folder for call ID --> download call ID --> delete call ID from webhook folder --> proceed with next ID

Flow in the 404 situation:

check webhook folder for call ID --> download call ID (fails) --> raise error. It stops there; the call ID with the 404 error is not cleaned up from the webhook folder.

So I think the author of the app just needs to improve the error handling to clean up "404" call IDs from the webhook folder, and the problem will be solved.
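The proposed fix can be sketched roughly as follows. This is an illustration of the idea, not the add-on's actual code. It assumes one file per call ID in the webhook folder, named after the ID, and a hypothetical `fetch_record(call_id)` callable that returns an HTTP status code and the record.

```python
import os


def process_call_ids(webhook_dir, fetch_record):
    """Process every pending call ID file in `webhook_dir`.

    fetch_record(call_id) -> (status_code, record) is a hypothetical stand-in
    for the Graph API download. Returns (downloaded, dropped, kept).
    """
    downloaded, dropped, kept = [], [], []
    for call_id in sorted(os.listdir(webhook_dir)):
        path = os.path.join(webhook_dir, call_id)
        status, record = fetch_record(call_id)
        if status == 200:
            # Normal flow: keep the record, then remove the call ID file.
            downloaded.append(record)
            os.remove(path)
        elif status == 404:
            # The record no longer exists upstream, so retrying cannot succeed.
            # Clean up the call ID file instead of leaving it forever.
            dropped.append(call_id)
            os.remove(path)
        else:
            # Any other error (401, 5xx, ...) may be transient: keep for retry.
            kept.append(call_id)
    return downloaded, dropped, kept
```

The important part is the 404 branch: the file is removed without a record being stored, so the folder stops growing and the same error is not logged again on every run.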

0 Karma

jaxjohnny2000
Builder

I totally agree. This app does not handle 400 or 404 errors. The developer @jconger is top notch though, I have met him. It's just that this one app has never really worked correctly.

We have to reset the inputs almost daily. By reset, I mean we create a new subscription input, or we disable and re-enable the call record or user report inputs.

However, I did perform the KV store clean on both of our heavy forwarders (behind a load balancer).

In a load-balanced environment, the webhook, call record, and user report inputs are set up on both HFs, but the subscription is set up on only one.

This worked for me, for now:

disable all inputs 

clean kvstore

splunk clean kvstore -app TA_MS_Teams -collection TA_MS_Teams_checkpointer

enable inputs in this order

webhook, subscription, call record, user report

 

we have data again...for now. 
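The sequence above could be scripted along these lines. This is only a sketch under several assumptions: it builds the list of REST calls but does not send them; it clears the collection through the KV store REST endpoint rather than the `splunk clean kvstore` CLI command used above; the endpoint paths reflect my understanding of Splunk's REST API and should be checked against the documentation for your version; and the input type and stanza names passed in are hypothetical placeholders, since the add-on's real names are not given in this thread.

```python
from urllib.parse import quote

APP = "TA_MS_Teams"
COLLECTION = "TA_MS_Teams_checkpointer"


def build_reset_plan(base_url, inputs):
    """Return the ordered (method, url) calls for a full input reset.

    inputs: list of (input_type, stanza_name) pairs in the order they should
    be enabled, e.g. webhook, subscription, call record, user report.
    The type and stanza names are placeholders supplied by the caller.
    """
    def endpoint(kind, name, action):
        kind_q = quote(kind, safe="")
        name_q = quote(name, safe="")
        return f"{base_url}/servicesNS/nobody/{APP}/data/inputs/{kind_q}/{name_q}/{action}"

    # 1. Disable all inputs.
    plan = [("POST", endpoint(kind, name, "disable")) for kind, name in inputs]
    # 2. Clear the checkpoint collection (assumed REST equivalent of the CLI clean).
    plan.append(
        ("DELETE", f"{base_url}/servicesNS/nobody/{APP}/storage/collections/data/{COLLECTION}")
    )
    # 3. Enable the inputs again, in the given order.
    plan += [("POST", endpoint(kind, name, "enable")) for kind, name in inputs]
    return plan
```

Printing the plan first, before wiring it to curl or an HTTP client with real credentials, makes it easy to confirm the order matches the manual procedure.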

 

 

 

 

 

 

0 Karma

norbertt911
Path Finder

Hello,

I have the same issue, and the remediation you shared is correct. But in my case the app runs flawlessly for about 2 days, then the webhook fails > subscription fails > no more call records.

Maybe somebody has found a way to make this app stable. I played with the intervals, but it didn't help. I will try to disable and enable the webhook and subscription inputs with a cron job from the CLI, but that is so "homemade"...

The app is installed on an HWF and it runs when it runs... I have no idea why it fails randomly...

The bad part is that when it stops collecting call records, they are lost. There is no way to fetch the "historical" logs...

I'd appreciate your advice...

 

 

 

0 Karma

Brendant
Observer

I have the same problem. The webhook works for a couple of days and then fails. Did the cron job to restart the inputs work as a workaround?

 

 

 

 

0 Karma

norbertt911
Path Finder

Hi,

I still have no 100% working workaround. I tried creating an alert on my search head that, when the subscription fails, triggers a curl script to disable and re-enable the inputs. I learned two important things there:

Order

You should disable the webhook, then the subscription input, then the call record input. Then enable the webhook and enable the subscription. This will update the subscription, but sometimes it doesn't work correctly. In that case you should clear the KV store first, at which point the webhook exits! So you should disable the webhook again, enable it, and then enable the call record input.

If you do the steps above manually, they solve the issue every time. But the second thing:

Scripted disable/enable works about 50% of the time. It seems the call record input is not correctly reset by the script.

So currently I have an alert to myself: "Go monkey and reset it manually" 🙂
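One thing that might be worth trying in a scripted reset is to confirm each step took effect before moving on to the next. Whether that explains the 50% failure rate is only a guess. The helper below is a generic sketch: `set_state` and `get_state` are hypothetical callables that would wrap whatever curl or REST calls are used to change and read an input's state.

```python
import time


def apply_and_verify(set_state, get_state, desired, attempts=3, delay=0.0):
    """Call set_state(desired) until get_state() reports the desired value.

    Returns the number of attempts needed. Raises RuntimeError if the state
    is still wrong after `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        set_state(desired)
        if get_state() == desired:
            return attempt
        # Wait before retrying, in case the change is applied asynchronously.
        time.sleep(delay)
    raise RuntimeError(f"state {desired!r} not reached after {attempts} attempts")
```

Each step of the order described above (disable webhook, disable subscription, and so on) would go through this helper, so that a step that silently failed stops the script instead of letting later steps run against the wrong state.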

0 Karma

Brendant
Observer

Thanks for the update.

I am familiar with Windows and PowerShell scripting. The Splunk instance is not managed by me, and the person who manages it has indicated he does not know how to script restarting the inputs and clearing the KV store.

I would like a script that runs every night at midnight to complete the above steps.

Can you provide some details on how to accomplish this in Splunk?

Any help would be greatly appreciated.

 

 

 

 

0 Karma

Brendant
Observer
  1. disable all inputs
  2. clean kvstore
    1. splunk clean kvstore -app TA_MS_Teams -collection TA_MS_Teams_checkpointer
  3. enable inputs in this order
    1. webhook,
    2. subscription,
    3. call record.
0 Karma