interesting, unsolved technical problems

One attribute of building for the bleeding edge is that there are a lot of interesting technical problems that no one's solved before. This may sound obvious—and is obvious in fields like frontier model research—but I've increasingly noticed that it exists in less expected, adjacent areas of software as well.

Here's one example we've come across at Kernel: a remote, read/write GUI for humans to access a Computer Using Agent's workspace.

It's become common practice to spin up isolated virtual machines for AI agents to do their work in, rather than letting them run on your actual computer. This is reasonably straightforward, but it introduces a new challenge: a human now needs a way to remotely access the agent's machine to debug or manually intervene, ideally through a visual interface that operates on the same machine the agent has access to.

Technology for remote GUIs has existed since well before LLMs, but it is a relatively niche corner of software. The primary use case pre-LLM was enterprise users with strict security policies who needed remote machine access. Unsurprisingly, most vendors providing these enterprise solutions are neither open source nor very startup-friendly.

We initially built Kernel on Anthropic Computer Use's reference implementation, which uses noVNC for remote GUI access. noVNC is a good fit for prototyping with static websites, but it breaks down on most modern websites, where JavaScript animations and frontend transformations are common. After testing it ourselves, we concluded that noVNC was a non-starter, despite being what most developers use today.

A few months ago, we happened to pose this problem to @xonkernel. He had encountered this world in his past work, and he had a feeling something better was technically possible. So we got to work: first identifying the right framework and protocols to use, then integrating them into an isolated image compatible with Computer Using Agents, Docker, and unikernels.

One of the most exciting parts of the technical unknown is having no idea if experiments will succeed or fail. Here's a message he sent about halfway through our project (at 12am):

I'll leave out most of the other Discord messages, but this was an exceptionally good feeling when we got it working:

Today, I'm excited to share the result: an open source, low-latency remote GUI for browser automations and Computer Using Agents. It uses WebRTC, a framework for real-time streaming between clients. The benefits of WebRTC include:

UDP-based transport for minimal latency
Adaptive streaming that adjusts quality based on network conditions
Hardware-accelerated video encoders that produce higher quality output and use fewer CPU resources than VNC

All in all, it delivers significantly better performance than traditional VNC implementations that rely on CPU-based image compression and TCP transport. Our implementation adapts Neko, an open source container designed for live streaming videos with friends.
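To make the "adaptive streaming" point a little more concrete, here is a toy sketch of the general idea: lower the video bitrate when the network reports packet loss, and raise it again slowly when things look healthy. This is not WebRTC's actual congestion control (which is considerably more involved), and it is not taken from our implementation. The class name, thresholds, and step sizes below are all made-up values for illustration.

```python
from dataclasses import dataclass


@dataclass
class BitrateController:
    """Toy additive-increase / multiplicative-decrease bitrate controller.

    Illustrative only: every default below is an assumed value, and this
    is NOT the algorithm WebRTC itself uses.
    """

    bitrate_kbps: int = 2000      # current target bitrate (assumed start)
    min_kbps: int = 300           # never go below this floor
    max_kbps: int = 8000          # never go above this ceiling
    loss_threshold: float = 0.05  # loss fraction treated as congestion

    def update(self, packet_loss: float) -> int:
        """Adjust the target bitrate from one packet-loss report."""
        if packet_loss > self.loss_threshold:
            # Congestion: back off quickly (multiply by 0.7, integer math).
            self.bitrate_kbps = max(self.min_kbps, self.bitrate_kbps * 7 // 10)
        else:
            # Healthy network: probe upward in small additive steps.
            self.bitrate_kbps = min(self.max_kbps, self.bitrate_kbps + 250)
        return self.bitrate_kbps


def simulate(loss_reports):
    """Feed a sequence of loss fractions through a fresh controller."""
    controller = BitrateController()
    return [controller.update(loss) for loss in loss_reports]


if __name__ == "__main__":
    # Two clean reports, two lossy ones, then recovery.
    print(simulate([0.0, 0.0, 0.2, 0.2, 0.0]))
```

The asymmetry (fast back-off, slow recovery) is the part that matters for a remote GUI: a brief network hiccup costs some image quality for a moment rather than freezing the stream.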

We've integrated it into our base Kernel image, and it's available on GitHub. It runs both as a standalone Docker image and on a Unikraft unikernel, providing a batteries-included, deployable container designed for browser automations and Computer Using Agents. (Alongside the remote GUI interface, the image ships with headful Chromium and supports Chrome DevTools Protocol-based connections like Playwright and Puppeteer.) From our testing, the new remote GUI feels noticeably more responsive, handles JS-heavy sites like e-commerce, and is reliable enough for real-time debugging.
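For anyone who hasn't connected to a browser over Chrome DevTools Protocol before, here is a minimal sketch of the discovery step. Chromium started with remote debugging enabled serves a small JSON document at `/json/version` that includes a `webSocketDebuggerUrl` field. The host and port below (`localhost:9222`) are assumptions for illustration and not necessarily what the Kernel image exposes, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_ws_endpoint(version_payload: str) -> str:
    """Extract the CDP WebSocket URL from a /json/version response body."""
    info = json.loads(version_payload)
    return info["webSocketDebuggerUrl"]


def discover_ws_endpoint(host: str = "localhost", port: int = 9222,
                         timeout: float = 5.0) -> str:
    """Ask a running Chromium for its CDP WebSocket endpoint.

    host and port are assumed values; check how your container
    actually publishes the remote debugging port.
    """
    url = f"http://{host}:{port}/json/version"
    with urlopen(url, timeout=timeout) as resp:
        return parse_ws_endpoint(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a Chromium instance with remote debugging reachable locally.
    print(discover_ws_endpoint())
```

The resulting URL is what you would hand to Playwright's `chromium.connect_over_cdp(...)` or Puppeteer's `puppeteer.connect({ browserWSEndpoint })`, so an automation script and a human watching the remote GUI can share the same browser.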

We open sourced it because nothing quite like this existed, and we wish it had. I hope you find it useful!

Thank you @xonkernel for leading this development and thank you Neko for open sourcing their work.

july 7, 2025