API real-time data (websockets, webhooks) lags around prime-time

  • Problem
  • Updated 1 year ago
  • In Progress
Archived and Closed

This conversation is no longer open for comments or replies and is no longer visible to community members.

For 16-18 hours of the day I receive the API events via webhooks and websockets in a timely fashion (0-3 minutes). For the other 6-8 hours, however, the events severely lag, arriving 15-45 minutes late.

These 6-8 lagging hours are always prime-time, so I can only assume it's the stress on Automatic's servers of all the users driving home, overloading the API.

Has anyone else experienced this? I sent a request to Automatic support but haven't received a response.
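For anyone wanting to confirm the same pattern, here is a minimal sketch of how delivery lag can be measured on the receiving end. It assumes the event payload carries an ISO 8601 creation timestamp (the `created_at` field name is hypothetical; check the actual payload your webhook receives):

```python
from datetime import datetime, timezone

def webhook_lag_seconds(event, received_at=None):
    """Return how late a webhook event arrived, in seconds.

    Assumes the payload has an ISO 8601 `created_at` field
    (hypothetical name; adjust to the real payload schema).
    """
    received_at = received_at or datetime.now(timezone.utc)
    created = datetime.fromisoformat(event["created_at"])
    return (received_at - created).total_seconds()

# Example: an event created 20 minutes before it was received
event = {"created_at": "2017-03-01T17:00:00+00:00"}
received = datetime.fromisoformat("2017-03-01T17:20:00+00:00")
print(webhook_lag_seconds(event, received))  # 1200.0
```

Logging this value per event over a full day makes the prime-time lag windows easy to graph.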
Matt Farley

Posted 2 years ago
Adam Altman, Alum

Official Response
Hi All, 

Adam from Product here. I gave some incorrect information to Amy this morning and want to apologize for taking so long to weigh in on the topic. I wanted to chase down a diagnosis before making any claims.

---- DELAYS IN GENERAL ----
A delay of an hour or more during rush hours is unfortunately expected behavior right now, as queues get backed up. This has become more of a problem lately as our traffic has grown much faster than expected -- good problem to have but creates bad outcomes like this. It is most pronounced at rush hours on the east and west coast.

We have a plan to improve this, but no promise on when it will be completed. Current estimate is about 1mo, but that may well change. I'm sorry I don't have a better answer for you. It's not that we think this is okay, just that we are doing all that we can with resource constraints.


---- DELTA FROM IFTTT ----
I must apologize for the response I instructed Amy to give earlier about IFTTT; it was incorrect and my fault. The Automatic Pro channel on IFTTT is in fact mediated by an internal service that we built and control. This service branches from the same event source as our public APIs, but isn't ultimately served by the same workers. That means that when the public API queue gets backed up, it does not affect IFTTT on the Automatic Pro channel.

Maintaining custom infrastructure for IFTTT is a special exception because we have them built into our app. As an aside, it is our most popular connected app by far. While the two are served by different pieces of infrastructure, there is no goal of giving one a higher quality of service than the other. It is our desire and intent to bring the public API portion of the stack in line, and that will be addressed with the work mentioned above.

To avoid confusion, I must clarify that there are two IFTTT channels: Automatic Pro and Automatic. Only the Pro channel gets this separate branch of service. The original Automatic IFTTT channel, for users of our first and second generation devices, consumes the public API.

---- BOTTOM LINE ----
We want to make it better. We're not pleased with the current situation and apologize. We have the work outlined to improve this, but aren't able to get to it just yet.

Cheers,
Adam
Adam Altman, Alum

Official Response
Hi Matt, 

Some more context with a _maybe_ helpful change.

1. This is a known thing and we’re sorry.

2. Context: We had two bottlenecks that needed attention regarding scaling: A = getting data from devices to the server. B = getting events from the server out to developers. Working on B doesn't make sense if A is breaking, so we did A first. And that's what's been taking our capacity.

3. Small yay: we HAVE very recently made the change to have more workers processing events during peak hours, so that should help B a bit. It is a stopgap, not the ultimate solution.

4. Next: the ultimate solution to B is a rewrite of our event delivery pipeline, which is on deck.

5. Sidenote: an example of what it will feel like when we rewrite is the current performance of the in-app location updates. That is on the new system and is very fast.
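The arithmetic behind points 2 and 3 can be sketched with a toy queue model (my own illustration, not Automatic's actual system; all rates are made up): a backlog grows whenever events arrive faster than the workers can drain them, and adding workers during peak hours raises the drain rate.

```python
def backlog_after(minutes, arrival_per_min, workers, per_worker_per_min,
                  start_backlog=0):
    """Toy minute-by-minute queue simulation (illustrative numbers only)."""
    backlog = start_backlog
    for _ in range(minutes):
        backlog += arrival_per_min                      # events arriving
        backlog = max(0, backlog - workers * per_worker_per_min)  # events drained
    return backlog

# Rush hour: 1200 events/min in, 10 workers draining 100 events/min each
# -> 200 extra events queue up every minute
print(backlog_after(60, 1200, 10, 100))   # 12000
# Doubling the workers during peak keeps the queue empty
print(backlog_after(60, 1200, 20, 100))   # 0
```

This is why more peak-hour workers help as a stopgap even before the delivery pipeline itself is rewritten.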

Cheers,
Adam