Project: Creating an ephemeral, secure peer-to-peer web chat application

August 29, 2020

#Introduction

As the title implies, this post is about my journey of creating a web application that allows users to chat securely, ephemerally, and peer-to-peer. Secure means that, just like on WhatsApp, Telegram, etc., cryptographic keys are exchanged between participants and the messages sent are encrypted (and decrypted) with these keys. Ephemeral means that the chats don’t persist anywhere (in particular on a server). Peer-to-peer means that once users have established a connection with each other, messages don’t pass through any central server (with specific caveats). This is meant to be more of an essay than a guide, and below are the details of my journey during this project. If you’d like to see the code, I’ve posted it on GitHub.

#Defining the problem scope and success criteria

Before I began I thought about how to approach this. I really wanted something at the end; I didn’t want this to be like my other projects, where I have a cool idea, start to work on it, and never quite get, y’know, done, and so although this is a personal project I decided to concretely define both the problem space and what I wanted to get out of this by the end, similarly to defining something at work or for a client. With that said...

#The problem

To allow users to chat securely, using cryptographic keys that are generated by the user (e.g. in their browser) and never stored centrally. Besides the RTC connection establishment (aka signaling), the entire chat is ephemeral, with no state persisted on any server.

#The success criteria

  1. Users can chat with others in a chat room style interface
  2. Users have the ability to create chat rooms
  3. Users have the ability to join created chat rooms
  4. Users can create room passwords
  5. Users can cryptographically verify that the other person is who they say they are

Hopefully not too lofty... Foreshadowing? With those defined I moved onto the implementation.

#Traversing a private network

Rather than diving into the service, I decided to start with the TURN/STUN servers.

I knew from some previous research that this would be necessary long term, and since this was one of the two major things that I had a lot of unknowns around, I started here. At a high level, TURN and STUN servers allow users to establish an RTC connection even when they are behind a private network (NAT), for example behind a router without a dedicated public IP. A STUN server is a bit lighter than a TURN server, since it only helps a peer discover its public address while a TURN server relays the traffic itself, but STUN can’t be used in all situations, and so in practice both of these servers are required.
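From the browser’s side, both kinds of server end up as entries in the peer connection’s `iceServers` list. The following is a minimal sketch of that configuration, not the project’s actual setup: the host name, the user name, and the secret are made up for illustration, and 3478 is only the conventional default STUN/TURN port.

```typescript
// Minimal shape of the entries a browser accepts in RTCConfiguration.iceServers.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Hypothetical host; replace with wherever your own coturn instance runs.
const TURN_HOST = "turn.example.com";

function buildIceServers(username: string, credential: string): IceServer[] {
  return [
    // STUN: lets a peer discover its public address; no credentials needed.
    { urls: `stun:${TURN_HOST}:3478` },
    // TURN: relays traffic when no direct path between the peers can be found.
    { urls: `turn:${TURN_HOST}:3478`, username, credential },
  ];
}

// Hypothetical credentials, for illustration only.
const iceServers = buildIceServers("demo-user", "demo-secret");
// In a browser this would be passed along as: new RTCPeerConnection({ iceServers })
```

Listing both lets the browser try the cheap STUN route first and fall back to the TURN relay only when it has to.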

I had initially thought about making my own basic TURN/STUN server in Node.js, and while I found a couple libraries that supported the protocol, they were either old or had poor documentation, and considering that WebRTC has changed a lot in the past couple years and I wanted to make sure that there were no issues down the road with this part, I looked beyond and eventually found a good, up to date (and open source, to boot) TURN/STUN server called coturn.

It was thankfully very easy to install, as it’s available on the Ubuntu apt repos:

> sudo apt-get update
> sudo apt-get install coturn

It wasn’t the most recent release — as of this post, the most recent version is 4.5.1.3 and the version on the apt repository is 4.5.0.7 — but it’s only a few versions off and so I felt comfortable continuing with it.

To configure it I mostly followed this guide with a few changes:

Restarting the service, things seemed to be working, so I moved onto signaling.

#“Anyone there,” he yelled into the void

The other aspect that I had a lot of unknowns about was signaling, which is the term for how to get peers to initially connect to each other so that they can establish the RTC connection.

Finding out info about this was a little trickier than I had expected. A lot of online resources about WebRTC go in depth about the necessity of STUN and TURN servers, but don’t touch much on how to actually get the clients to connect, and generally just state that WebRTC doesn’t specify a signaling protocol, leaving it up to the implementation (all the articles love to mention something about carrier pigeons 🐦). For some reason this was ambiguous to me, and I didn’t understand whether this meant I could use the STUN/TURN servers to perform the signaling or if I had to handle it in some other way. This was further complicated by the occasional mention of SIP, with explanations of what it means and what shape the packet/protocol data takes, but not of how it is transmitted. More careful reading may have illuminated the necessity to build or somehow implement my own signaling server, but I genuinely didn’t understand until I stumbled upon this article, which states this:

In most resources, the tendency here is to focus on the STUN and TURN servers that are used for the ICE negotiation and less on the fact that the negotiation itself is facilitated by the signaling server. [emphasis not mine]

This clicked for me, and the next paragraph states that a WebRTC signaling server is necessary and is distinct from the STUN/TURN servers. With this information, I moved ahead and created a basic websocket server that would handle client signaling.

Below is an abridged version of the code — check the repo for the rest, wink wink — but it’s essentially a basic Express server that listens for websocket connections, and stores the connections in a local map based on the room key. To keep things simple (and hopefully secure), the server is really and truly a relay, and simply maps connections to a given room ID, removing the room ID from the map after all connections have left.

// Abridged: imports and the getRoomIdFromRequest, broadcast, and onLeave helpers are in the repo
const app = express();
const server = http.createServer(app);
const wss = new WebSocket.Server({ noServer: true });

// Room ID -> open connections in that room
const rooms = new Map<string, Set<WebSocket>>();

wss.on("connection", function (ws, request) {
  const roomId = getRoomIdFromRequest(request);
  if (!rooms.has(roomId)) {
    rooms.set(roomId, new Set());
  }
  const conns = rooms.get(roomId);
  conns.add(ws);

  ws.on("message", broadcast);
  ws.on("close", onLeave);
  ws.on("terminate", onLeave);
});

// Hand HTTP upgrade requests over to the WebSocket server
server.on("upgrade", function upgrade(
  request: http.IncomingMessage,
  socket: Socket,
  head: Buffer
) {
  wss.handleUpgrade(request, socket, head, function done(ws) {
    wss.emit("connection", ws, request);
  });
});
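The `broadcast` and `onLeave` handlers are left out of the abridged snippet. As a rough illustration of the relay idea described above (and not the code from the repo), the sketch below shows one way the room bookkeeping could look. The `Conn` interface, the function signatures, and the choice to skip the sender when relaying are all assumptions made for this example.

```typescript
// Minimal stand-in for a WebSocket connection so the sketch runs anywhere.
interface Conn {
  send(data: string): void;
}

type Rooms = Map<string, Set<Conn>>;

// Add a connection to a room, creating the room on first join.
function join(rooms: Rooms, roomId: string, conn: Conn): void {
  let conns = rooms.get(roomId);
  if (!conns) {
    conns = new Set();
    rooms.set(roomId, conns);
  }
  conns.add(conn);
}

// Relay a message to everyone in the room except the sender (an assumption).
function broadcast(rooms: Rooms, roomId: string, sender: Conn, data: string): void {
  const conns = rooms.get(roomId);
  if (!conns) return;
  conns.forEach((conn) => {
    if (conn !== sender) conn.send(data);
  });
}

// Remove a connection, dropping the room once the last peer has left.
function leave(rooms: Rooms, roomId: string, conn: Conn): void {
  const conns = rooms.get(roomId);
  if (!conns) return;
  conns.delete(conn);
  if (conns.size === 0) rooms.delete(roomId);
}

// Usage with two fake connections in room "1234".
const demoRooms: Rooms = new Map();
const received: string[] = [];
const alice: Conn = { send: () => {} };
const bob: Conn = { send: (data) => { received.push(data); } };

join(demoRooms, "1234", alice);
join(demoRooms, "1234", bob);
broadcast(demoRooms, "1234", alice, "hello");
leave(demoRooms, "1234", alice);
leave(demoRooms, "1234", bob);
```

Because the server only forwards opaque strings and forgets a room as soon as it is empty, it never needs to understand or store what the peers are negotiating.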

After a quick test to make sure I could connect to the WebSocket server locally:

const ws = new WebSocket('ws://localhost/ws/room/1234');
ws.addEventListener('open', () => console.log('it works!'));

It was now time to move onto the bread and butter of the application: the frontend chat client.

#Hi there, fellow room humans

Okay, the relay is working, but how do I set up the WebRTC connection? I decided to focus on learning this first so that I wouldn’t need to think about it while actually creating and architecting the application, and so I set off on getting two tabs to communicate with each other.

The place I started with, as you may expect, is the MDN web docs. The MDN docs are great; they’re almost always of high quality and detailed enough to get started with, so I went into this thinking that the WebRTC documentation on MDN would help walk me through all of the intricacies and challenges of creating the basic application. This by and large was true (the documentation had all the classes and methods described and did give some examples, many of which are in depth), but as I worked through getting something working I ran into a lot of subtle trouble.

One of the biggest challenges I encountered early on was that the documentation on the classes themselves is pretty dense (as is to be expected from something low-level like WebRTC) and the examples aren’t always great at explaining specific, important details. For example, the example I was using didn’t go into detail that the WebRTC connection must be in the “correct state” before being able to handle answers, and in fact must always be in the correct state for any operation you want to perform. You cannot simply always create an offer and set it on the RTCPeerConnection instance, because then it will be in the have-local-offer state rather than the stable state, and in this state it cannot accept an offer. I must have missed this and didn’t realize it was an important point. I had assumed I could simply create an offer immediately, even if the connection was supposed to create an answer, hoping to simplify some code later on, but this was a wrong assumption.
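To make the “correct state” idea concrete, here is a simplified model of the signaling states as a lookup table. It is a sketch for reasoning about the flow, not browser code: the state names match `RTCPeerConnection.signalingState`, but the `Action` names are invented for this example, and provisional answers and rollback are deliberately left out. In a real connection the state changes when a description is applied with `setLocalDescription` or `setRemoteDescription`.

```typescript
// Subset of RTCSignalingState; provisional answers and rollback are left out.
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";

// Hypothetical action names for applying a local or remote description.
type Action = "setLocalOffer" | "setRemoteOffer" | "setLocalAnswer" | "setRemoteAnswer";

// For each state, the actions that are valid and the state they lead to.
const transitions: Record<SignalingState, Partial<Record<Action, SignalingState>>> = {
  stable: {
    setLocalOffer: "have-local-offer",
    setRemoteOffer: "have-remote-offer",
  },
  "have-local-offer": {
    setLocalOffer: "have-local-offer",
    setRemoteAnswer: "stable",
  },
  "have-remote-offer": {
    setRemoteOffer: "have-remote-offer",
    setLocalAnswer: "stable",
  },
};

// Returns the next state, or undefined when the action is invalid in this state.
function nextState(state: SignalingState, action: Action): SignalingState | undefined {
  return transitions[state][action];
}

// A peer that has already applied its own offer cannot take a remote offer...
const invalid = nextState("have-local-offer", "setRemoteOffer");
// ...whereas a peer that stayed in "stable" can, and then answers.
const afterOffer = nextState("stable", "setRemoteOffer");
const afterAnswer = nextState("have-remote-offer", "setLocalAnswer");
```

Under this model the mistake described above is easy to see: eagerly applying an offer on both sides leaves each peer in have-local-offer, where only an answer gets it back to stable.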

This isn’t to say that it wasn’t a simple fix — it was — but this important detail was either lost in the density or not addressed well, which complicated my initial understanding. With that solved I was able to get clients to connect — hurrah!

Now to send hello world from myself to myself. This was also pretty straightforward, but again the documentation was obtuse about certain things related to this, and I spent more time than I’d like to admit finding out that you only create the data channel on one side, and that on the other side the channel arrives as an event that you listen for. After adding in those changes, I could greet myself via RTC! A few small changes to get it to work with more than one client, and now the last step awaits: making it look good.
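The asymmetry is easier to see in code. The sketch below uses small stand-in interfaces and an in-memory fake so that it runs outside a browser; `PeerLike`, `ChannelLike`, and `fakePeer` are invented for this example. With a real `RTCPeerConnection`, the corresponding pieces are `createDataChannel` on the offering side and the `datachannel` event on the other.

```typescript
// Stand-ins for the parts of RTCDataChannel and RTCPeerConnection used here.
interface ChannelLike {
  label: string;
}
interface PeerLike {
  createDataChannel(label: string): ChannelLike;
  ondatachannel: ((event: { channel: ChannelLike }) => void) | null;
}

// Offering side: the only side that calls createDataChannel.
function openChannel(peer: PeerLike, label: string): ChannelLike {
  return peer.createDataChannel(label);
}

// Answering side: never creates the channel, only waits for it to arrive.
function awaitChannel(peer: PeerLike, onChannel: (channel: ChannelLike) => void): void {
  peer.ondatachannel = (event) => onChannel(event.channel);
}

// In-memory fake: creating a channel fires the remote peer's handler, standing
// in for what the browser does once the connection has been negotiated.
function fakePeer(getRemote: () => PeerLike): PeerLike {
  return {
    ondatachannel: null,
    createDataChannel: (label) => {
      const channel: ChannelLike = { label };
      const remote = getRemote();
      if (remote.ondatachannel) remote.ondatachannel({ channel });
      return channel;
    },
  };
}

const offerer: PeerLike = fakePeer(() => answerer);
const answerer: PeerLike = fakePeer(() => offerer);

const seenLabels: string[] = [];
awaitChannel(answerer, (channel) => {
  seenLabels.push(channel.label);
});
const created = openChannel(offerer, "chat");
```

Both sides end up holding a channel with the same label, but only one of them ever asked for it to be created.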

#A dash of geometry

Now I had a choice to make: what framework should I go with for creating the frontend? Normally I would choose React; I regularly work with it and have a high degree of proficiency and comfort with it, but I wanted to use this as an opportunity to get my feet wet in a different framework, and after some thought I chose Angular. It was remarkably, deceptively simple to get started:

> npm i --save-dev @angular/cli
> npx ng new framework

And after a couple minutes of dependency installation I was ready to go.

Now that the project was set up, I was ready to create the necessary services and components, and move all the “play” code I made before into this application. This was easier thought than done. I go into more detail below, but the learning curve to Angular is a bit steep and it took a bit of time to really get my head around certain aspects, like how to get certain services to act as singletons (n.b. this happens by default). In general I found that the actual programming wasn’t the challenge, or at least not more than you would expect; the challenge came from learning how to architect the app and how to do that in Angular.

This cycle repeated itself a lot: reading, testing, and playing around, getting stumped, then back to reading... I definitely underestimated the initial learning curve. With a lot of trial and error (and copious amounts of reading and research) I had finally taken the original code I made and created services and components for it. Now to create a message input...

Wait, I need to learn how to handle form input in Angular... wait, which approach should I take?... okay, got it, let’s add that in... done, I think.

And, yeah, I was done, and it works! In the original prototype I simply sent and looked for messages in the dev console, but now I’m typing them and they’re appearing on my screen. I felt a sense of accomplishment.

#Girl, where’s your makeup?

The application was now, well, serviceable, but it needed a lot of polish, as well as the addition of some of the missing features from the success criteria. I mean, users still couldn’t create rooms in the UI! They couldn’t see who they’re chatting with! And it just didn’t feel nice to use. Let’s fix that. With a bit more work, not only can users type messages to each other, but they can now create a room, set their name, and see who else is in the room with them.

The last part to do is tests. Should this have been the first part? Oh well. Even though I’m working with Angular and the tool discussed in the documentation and installed by the CLI tool is Karma, having experienced how slow and cumbersome it is, I yanked it out and installed Jest with jest-preset-angular, wrote some tests, and that was that. Okay it wasn’t that simple, but it was simple enough and definitely less complex than getting the application created.

#Learnings and conclusion

Biggest question: did I meet my success criteria? Yes and no. Fundamentally I didn’t get the last two pieces done: adding room passwords, and allowing users to verify the other person’s key. It is a little sad to not have gotten all of the pieces/criteria I wanted done... but I feel okay with my approach and know that adding these features would be relatively straightforward.

Overall, though, this was a very fun project! I learned a ton; I learned how to set up and get started with WebRTC, both on the server and client; I learned about the WebCrypto APIs, and learned a lot about the new Angular (and how it’s the same but different from AngularJS). This is also my first post on my website, and so I learned quite a bit about my “writing voice” and about how to structure an essay about a project like this. It’s helped inspire me to do more, and write more, so stay tuned! Thanks for reading! ◾️

#Closing thoughts: Angular and React

I wasn’t quite sure what to expect when going into working with Angular, as I’ve never worked with anything newer than version 1. At Contentful, we’re actively migrating the core web application from Angular 1 to React, and at Webflow we used Angular 1 for the website dashboard. In a sense, then, I was starting from scratch, and so I gained a lot of new insights, as well as an appreciation for a few aspects of Angular (and a distaste for others). Before I go into detail I do want to preface that I haven’t worked with older versions of the new Angular (2 and up), meaning I haven’t dealt with any kind of deprecation cycle or refactoring of the application just to get to the next version, so I am coming in only with experience from the latest and greatest (as of the time of this post, version 9).

First: the bad parts. Angular is a heavy framework (maybe framework somewhat implies heavy?) and the learning curve for Angular is relatively high in the beginning. You have to understand the CLI tool, how you want to set up the routes, if you should make a service, a class, or something else for your logic, how to handle forms (should you use the old way or new way to take form inputs? [n.b. the new way is better]), what kind of services and components you need to make, how to inject those services into the components, and more... it is a lot to take in. React is so much lighter in that sense; it’s a library, not a framework, and because it handles less and its mental model is simpler, it’s faster to get started with.

In addition, just as the mental model and setup of Angular is heavy, so is the initial project setup. It comes with everything, even if you don’t need it; beyond basic components, it comes with two testing frameworks (Karma, Protractor), an .editorconfig file, a complicated tsconfig.json setup, and a TSLint file. While these are not necessarily bad, they do mean it’s even more to get used to for a beginner, and for someone like me coming from React the tools just feel cumbersome. Why are we using Karma when Jest (or any other test runner based off of JSDom) is, in my opinion, in almost every respect a better testing tool? Is it because of inertia, that they’ve invested heavily in Karma already, that there’s the somewhat (and arguably) outdated notion that you need a full browser to run tests for a web application? I’m also a bit perplexed about why, more than one year post-deprecation, the Angular CLI is still making a tslint.json file when the tool has been deprecated in favor of ESLint. I can understand for existing projects, but new ones? We should not be using deprecated tools, yinz. I very much appreciate the CLI tool, and I don’t mind that it’s essentially required for development of an Angular app, but it feels so heavy and in some ways suffocating as a more experienced developer, and I wish there was a way to have a more bare bones “advanced” style setup.

Lastly, the documentation is, as it was with AngularJS, dense. There’s a nice tutorial that helps get you started, but if you do anything more advanced than what’s there it can be pretty tough to find the right resource, and if you happen to find the right resource, it might not explain things in a great way. This is in stark contrast with the React documentation; the React docs are just so much more straightforward and user-friendly, and when I need to look something up, I more or less know exactly where to go.

Another consideration is how prevalent the framework is. Even though React isn’t really the “new hotness” anymore, it’s by many estimates the framework to use, and although I don’t have data to back it up, I would imagine that a majority of frontend development positions would be in React, followed by Vue, while Angular is behind (but not as far behind as Ember). If I were building an application that would be used by customers, and that I would like to turn into a business, even if I really preferred Angular, why would I choose Angular when the sane, safe, and pragmatic choice is React or Vue?

It isn’t all negative, though. Stating the obvious, Angular is a framework and so by definition it gives you more compared to React, which only really deals with the view layer. While React really does get out of your way and is easier to get started with (especially with create-react-app), once you start making a larger application, or are migrating some application to React, you have to make a lot of architectural decisions that are decided for you by the Angular devs. For example, in React, at some point you’ll have to answer at least a few questions:

Of course, some of these questions do come up in Angular too, in particular around state management, but in general you don’t have to think about them, as the Angular team already did and decided for you. You almost always make a service to handle component-facing, non-presentational logic; you have to use TypeScript (truly a blessing, and sadly TypeScript is a little bit of an afterthought for React); you must use classes for services, components, pretty much anything. This may sound strange, but I get the feeling that if you really get Angular, over time it just kind of gets out of your way and you can focus on getting your application written rather than spending a lot of time on how to architect it.

In closing, I still prefer React, by a large margin. I definitely prefer the lightweight approach, and want something that gets out of my way. That said, I do wish the React team would, well, bless some libraries for important SPA concepts like routing. Maybe one day they’ll take the Vue approach and support beyond the view layer, but I’m not holding my breath.