It takes a minimum of 8* round trips (RTT) to establish a WebRTC connection.
jeremiahlee | a day ago
I had no idea WebRTC was so complicated behind the scenes. Now I know why all the video conference companies really, really want you to download their app instead. This post then sent me down a rabbit hole learning about WebTransport and Media over QUIC.
singpolyma | a day ago
I guarantee you their app also uses WebRTC :)
jeremiahlee | a day ago
Which he stated, but as a "fork" and "a tiny fraction of the protocol". I am curious to know more about what Discord specifically is doing.
panekj | a day ago
You can read about Discord stuff here: https://discord.com/blog/how-discord-handles-two-and-half-million-concurrent-voice-users-using-webrtc
Additional (and more recent) stuff about voice:
mort | 23 hours ago
A great thing about shipping an app which uses WebRTC is that you don't have to support Firefox.
I swear, 90+% of my issues with WebRTC come from Firefox doing weird things. It's so bad that on a personal level, I often use Chrome for Teams and Meet even though I use Firefox (or Waterfox) for everything else. The specific issues change over time, but I've had the camera not work, I've had extremely bad audio echo issues, and I've had all input devices show up under the name "Unknown Device". Never had these issues in Chrome.
And as a developer, I've spent a lot of time debugging Firefox-specific problems -- something I may not have done had I not been a Firefox user myself.
hc | a day ago
i wanted to try using it for some peer to peer stuff (just a hobby project), because it was basically the only way to get a udp-like connection in a browser
but it's absolutely miserable. i gave up and watched brazil to cheer myself up
duck_of_death | a day ago
Damn. You alright bro?
0x2ba22e11 | a day ago
TBH I love the programmer-art illustrations.
[OP] polywolf | 13 hours ago
same!! I think they are a very nice touch, goes to show that, just like with writing, you don't need to be a professional to make something good
john_austin | a day ago
A lot of this needs an asterisk: ".. for the default existing implementations of WebRTC". Jitter buffer sizes and time-stamping etc can all be completely modified, especially if you control both ends of the pipe.
It may however require forking the WebRTC libraries. I've found that webrtc-rs has some.. questionable practices within the implementation and doesn't expose all the levers you would need for a good application on top.
But the core of WebRTC (ie the protocols and general ideas) I find pretty solid and honestly quite flexible. For instance we use it to receive video streams for archival and have configured an exceptionally long buffering time on the server and a custom time-stamping system on the client.
But I agree with the article overall. It's so obviously designed for video conferencing that the more you depart from it the more painful it becomes. I don't think there's a good reason to use it for voice at OpenAI. This is a simple, targeted thing that doesn't need the more advanced feature sets.
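To make the "jitter buffer size is just a knob" point concrete, here is a minimal sketch of a fixed-delay jitter buffer. It is a toy under stated assumptions, not how libWebRTC or any real stack does it: `Packet`, `JitterBuffer`, `play`, and the millisecond timings are all hypothetical names and values made up for illustration, and there is no loss handling or adaptive sizing.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    seq: int                                   # sender-assigned sequence number
    arrival_ms: int = field(compare=False)     # local receive time
    payload: bytes = field(compare=False, default=b"")


class JitterBuffer:
    """Hold packets for target_delay_ms, then release them lowest seq first."""

    def __init__(self, target_delay_ms: int) -> None:
        self.target_delay_ms = target_delay_ms
        self._heap = []  # min-heap ordered by seq

    def push(self, packet: Packet) -> None:
        heapq.heappush(self._heap, packet)

    def pop_ready(self, now_ms: int) -> list:
        """Release packets while the lowest-seq one has been held long enough."""
        ready = []
        while self._heap and now_ms - self._heap[0].arrival_ms >= self.target_delay_ms:
            ready.append(heapq.heappop(self._heap))
        return ready


def play(delay_ms: int) -> list:
    """Feed the same arrivals (seq 2 shows up late) through a buffer of a given size."""
    buf, out = JitterBuffer(delay_ms), []
    for now_ms, seq in [(0, 1), (10, 3), (40, 2)]:
        out += [p.seq for p in buf.pop_ready(now_ms)]
        buf.push(Packet(seq, now_ms))
    out += [p.seq for p in buf.pop_ready(1000)]  # drain whatever is left
    return out
```

With these made-up timings, `play(20)` releases the packets as 1, 3, 2 (low latency, but the late packet misses its slot), while `play(100)` releases 1, 2, 3. That is the trade-off an archival receiver can push all the way toward "long buffer", since nobody is waiting on the other end in real time.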
mikedorf | a day ago
It would be interesting to read a side by side comparison of WebRTC and AES67 (Dante/Ravenna)
I understand at a high level that WebRTC was built for conferencing and sending media across networks while AES67 standardized what pro/commercial devices were doing over LANs, but I'm curious what stops everyone from using the same standard for AoIP
algesten | 15 hours ago
There is an effort to improve on this called WARP, which in turn has two parts: SNAP (speeds up SCTP by exchanging data over SDP) and SPED (DTLS 1.3 handshake over STUN). It's being trialed in libWebRTC/Chrome.
Yes, both of these are deliberately mashing the protocols together to save RTT. And yes, that is potentially making a fine mess of protocols even worse. :)
Disclaimer: I maintain str0m, a WebRTC library in Rust.
spenc | a day ago
FWIW, even a WebRTC-specific website said WebRTC probably isn’t the best idea for Voice AI :)
https://webrtchacks.com/webrtc-vs-moq-by-use-case/#post-4716-_Toc213101666
pyfisch | a day ago
Apple has Siri, voice AI that's way older than OpenAI. On the networking side they use Multipath TCP, which provides reliable connections and uses multiple paths simultaneously, like Wi-Fi and cellular on an iPhone. However, Apple can only use Multipath TCP because they control the device; standard Linux and Windows kernels still don't support the protocol.
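For what it's worth, on a Linux kernel that does ship MPTCP, opting in is a per-socket choice made by the application. A minimal sketch, assuming Linux and Python's standard `socket` module; `open_stream_socket` is a hypothetical helper name, and this says nothing about how Apple's own stack is wired up:

```python
import socket

# socket.IPPROTO_MPTCP exists in Python 3.10+ on Linux; 262 is the Linux value,
# used here as a fallback for older Pythons.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket():
    """Return (sock, is_mptcp), preferring MPTCP and falling back to plain TCP."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
        return sock, True
    except OSError:
        # Kernel without MPTCP, MPTCP disabled via sysctl, or a non-Linux OS.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        return sock, False
```

The returned socket is used like any other TCP socket (`connect`, `send`, `recv`). Whether more than one path actually gets used depends on kernel configuration and on the peer also speaking MPTCP, so the flag only tells you what kind of socket the kernel handed back.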
m_eiman | 18 hours ago
Looks like it's been added to Linux since at least 6.10:
As of Linux v6.10, major features of MPTCP include: