Features

A complete autonomous desktop agent. Real desktop control, multi-provider AI, containerized safety, and a modern web interface -- all open source.

## Core Capabilities

What Makes ByteBot Different

### 01 -- Real Desktop Control

Full Desktop Interaction, Not Just API Calls

ByteBot controls a real Ubuntu desktop running inside Docker. It moves the mouse, clicks buttons, types text, scrolls pages, and interacts with any application on that desktop, exactly as a human user would. It does not rely on application APIs, browser automation scripts, or application-specific integrations.

- Mouse control at precise screen coordinates
- Keyboard input, including modifier keys
- Screen capture and visual analysis
- Scroll and drag operations
- Works with any application on the desktop

Example session (desktop actions):

  $ bytebot action screenshot
  Captured 1920x1080 desktop
  $ bytebot action click_mouse --x 450 --y 320
  Clicked at (450, 320)
  $ bytebot action type_text --text "quarterly report"
  Typed 16 characters
  $ bytebot action key --combo "ctrl+s"
  Key combination sent
  $ bytebot action scroll --direction down --amount 3
  Scrolled down 3 units
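
The `key` action takes a combination string such as `ctrl+s`. The sketch below shows one way to split that string into modifier keys and a final key before sending it to the desktop. `parseKeyCombo` and the accepted modifier names are hypothetical. They are not ByteBot's actual parser.

```typescript
// Hypothetical modifier names; ByteBot's real key handling may differ.
const MODIFIERS: readonly string[] = ["ctrl", "alt", "shift", "super"];

interface KeyCombo {
  modifiers: string[]; // keys held down, in the order given
  key: string; // the final key pressed while the modifiers are held
}

function parseKeyCombo(combo: string): KeyCombo {
  // "Ctrl + S" -> ["ctrl", "s"]; empty segments are dropped
  const parts = combo
    .split("+")
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part.length > 0);

  // The last segment is the key itself; whatever remains are modifiers.
  const key = parts.pop();
  if (key === undefined) {
    throw new Error(`empty key combination: "${combo}"`);
  }
  for (const modifier of parts) {
    if (MODIFIERS.indexOf(modifier) === -1) {
      throw new Error(`unknown modifier "${modifier}" in "${combo}"`);
    }
  }
  return { modifiers: parts, key };
}
```

For example, `parseKeyCombo("ctrl+shift+t")` yields modifiers `["ctrl", "shift"]` and key `"t"`. This simple split cannot express the `+` key itself. A fuller parser would need an escape sequence or a named key for it.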

### 02 -- Multi-LLM Support

Bring Your Own AI Provider

ByteBot supports multiple AI providers through a unified abstraction layer. Switch between Anthropic Claude, OpenAI GPT-4, Google Gemini, or local Ollama models without changing your workflow. Configure your preferred provider in the .env file and select it through the UI.

- Hot-swap between providers
- Unified API through LiteLLM proxy
- Per-task provider selection
- Local models via Ollama (free)

Supported providers:

- Anthropic Claude: best for complex reasoning and multi-step tasks
- OpenAI GPT-4: strong general-purpose vision and control
- Google Gemini: fast processing with large context windows
- Ollama (local): run models locally, no API key needed
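
The sketch below shows how a provider could be chosen from `.env`-style configuration. The environment-variable names, the `Provider` type, and `pickProvider` are assumptions for illustration; ByteBot's own `.env` template is the authority on real names. The environment is passed in as a plain object so the logic stays easy to test.

```typescript
// Assumed provider ids and env-var names, for illustration only.
type Provider = "anthropic" | "openai" | "gemini" | "ollama";

// The key each hosted provider needs. Ollama needs no key, only a reachable
// local server, which this sketch does not check.
const REQUIRED_KEY: Record<Provider, string | null> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
  ollama: null,
};

// Providers whose required key is present and non-blank.
function availableProviders(env: Record<string, string | undefined>): Provider[] {
  return (Object.keys(REQUIRED_KEY) as Provider[]).filter((p) => {
    const keyName = REQUIRED_KEY[p];
    return keyName === null || (env[keyName] ?? "").trim() !== "";
  });
}

// Use the preferred provider if it is configured; otherwise fall back to the
// first available one, in the declaration order above.
function pickProvider(
  env: Record<string, string | undefined>,
  preferred?: Provider,
): Provider {
  const available = availableProviders(env);
  if (preferred !== undefined && available.indexOf(preferred) !== -1) {
    return preferred;
  }
  const first = available[0];
  if (first === undefined) {
    throw new Error("no AI provider configured");
  }
  return first;
}
```

Suppose only `ANTHROPIC_API_KEY` is set. Then `pickProvider(env)` returns `"anthropic"`, and `pickProvider(env, "openai")` also falls back to `"anthropic"` because OpenAI has no key. A stricter design would throw instead of falling back silently.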
### 03 -- Containerized Safety

Sandboxed by Default

ByteBot runs inside a Docker container with a full Ubuntu desktop. The AI agent operates within this sandbox. It cannot see your host machine's files or other applications unless you explicitly configure volume mounts, and its network access is limited to what the container's Docker network settings allow.

- Isolated Docker container
- No host machine access by default
- Network isolation through Docker networking
- Easy reset: destroy and recreate the container
- Volume mounts for controlled file sharing

Security model (outermost layer first):

  Host machine: your computer, kept outside the sandbox
  | Docker boundary (isolated) |
  Container: Ubuntu desktop, where the AI operates (sandboxed)
    bytebot-agent: AI orchestration
    bytebotd: desktop controller
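
ByteBot itself is deployed with Docker Compose. The idea of controlled file sharing can still be sketched with plain `docker run` flags:

- Only the directories you list are mounted.
- Mounts are read-only unless you opt in to writes.
- `--rm` removes the container when it exits, so the next run starts fresh from the image.

`dockerRunArgs`, the `Mount` type, and the image name in the usage note are hypothetical.

```typescript
// Hypothetical helper: builds `docker run` arguments that share only the
// listed host directories with the desktop container.
interface Mount {
  hostPath: string;
  containerPath: string;
  readOnly?: boolean; // defaults to true: the agent can read but not modify
}

function dockerRunArgs(image: string, mounts: Mount[]): string[] {
  // --rm deletes the container on exit, giving the "destroy and recreate" reset
  const args = ["run", "--rm"];
  for (const m of mounts) {
    const mode = m.readOnly === false ? "rw" : "ro";
    args.push("-v", `${m.hostPath}:${m.containerPath}:${mode}`);
  }
  args.push(image); // the image name comes after all options
  return args;
}
```

For example, `dockerRunArgs("example/desktop-image", [{ hostPath: "/home/me/invoices", containerPath: "/data/invoices" }])` returns `["run", "--rm", "-v", "/home/me/invoices:/data/invoices:ro", "example/desktop-image"]`. A real setup would also publish the VNC port and choose network options, which this sketch leaves out.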
### 04 -- VNC Desktop Access

Watch the AI Work in Real-Time

Access the containerized desktop through a browser-based VNC viewer powered by noVNC. Watch every click, keystroke, and screen transition as the AI executes tasks. You can also take manual control at any time.

- Browser-based VNC (no client needed)
- Real-time desktop streaming
- Manual override capability
- 1920x1080 resolution
### 05 -- Task History & Monitoring

Full Audit Trail

Every task is recorded with its complete conversation history, a screenshot at each step, action logs, and an execution timeline. Review past tasks, debug failures, and refine your prompts over time.

- Full conversation history
- Step-by-step screenshots
- Action logs with timestamps
- Task status tracking
- Error reporting and debugging

Example task log:

  Organize PDF invoices by date     2m 14s
  Search web for product pricing    3m 42s
  Fill out expense report form      1m 58s
  Generate Q4 analytics chart       failed
  Download and compile report       running...
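
The durations in a task log like this can be derived from each task's start and end times. The `TaskRecord` fields and the helper names below are assumptions for illustration. They are not ByteBot's Prisma schema.

```typescript
// Hypothetical task-log row; field names are assumptions.
type TaskStatus = "running" | "completed" | "failed";

interface TaskRecord {
  title: string;
  status: TaskStatus;
  startedAt: number; // epoch milliseconds
  finishedAt?: number; // set once the task has stopped
}

// Format elapsed milliseconds in the log's style, e.g. "2m 14s" or "45s".
function formatDuration(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

// Text for the right-hand column of a log row.
function statusColumn(task: TaskRecord): string {
  if (task.status === "running") return "running...";
  if (task.status === "failed") return "failed";
  // completed: show elapsed time when the end time is known
  if (task.finishedAt === undefined) return "completed";
  return formatDuration(task.finishedAt - task.startedAt);
}
```

For example, `formatDuration(134000)` gives `"2m 14s"`, which matches the first row above.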

### 06 -- Modern Architecture

Clean TypeScript Monorepo

Built with modern technologies: a NestJS backend for AI orchestration, a Next.js frontend with real-time WebSocket updates, and PostgreSQL for persistence, all in a clean TypeScript monorepo. Every service runs in a container, and Docker Compose brings up the whole stack with one command.

- TypeScript throughout
- NestJS + Next.js + PostgreSQL
- WebSocket real-time updates
- Docker Compose deployment
- Prisma ORM for database access

Components, top to bottom:

- bytebot-ui (Next.js 15): task management, chat, VNC viewer
- bytebot-agent (NestJS): AI orchestration, WebSocket, Prisma
- bytebotd (NestJS): desktop controller (mouse, keyboard, screen)
- Ubuntu desktop: Docker container, 1920x1080, noVNC access
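
On the frontend, real-time WebSocket updates are typically folded into local state one event at a time. The sketch below shows that pattern. The `TaskEvent` message shapes and `applyEvent` are hypothetical. They do not describe ByteBot's actual WebSocket protocol.

```typescript
// Hypothetical message shapes; ByteBot's real events may differ.
type TaskState = "pending" | "running" | "completed" | "failed";

type TaskEvent =
  | { type: "task_created"; id: string; title: string }
  | { type: "task_status"; id: string; status: TaskState };

interface TaskView {
  id: string;
  title: string;
  status: TaskState;
}

// Pure reducer: returns a new array instead of mutating the old one,
// which fits React-style state updates in a Next.js UI.
function applyEvent(tasks: TaskView[], event: TaskEvent): TaskView[] {
  if (event.type === "task_created") {
    return [...tasks, { id: event.id, title: event.title, status: "pending" }];
  }
  // task_status: update only the matching task; unknown ids are ignored
  return tasks.map((task) =>
    task.id === event.id ? { ...task, status: event.status } : task,
  );
}
```

A list of events can be replayed with `events.reduce(applyEvent, [] as TaskView[])`. Raw socket messages would need to be parsed and validated first, because `JSON.parse` returns an unchecked value.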

## Command Reference

Available Desktop Actions

ACTIONS

  screenshot      Capture current desktop state
  click_mouse     Click at coordinates (x, y)
  type_text       Type a string of characters
  key             Press a key combination
  scroll          Scroll up or down
  open_url        Navigate browser to URL

USAGE

  $ bytebot action screenshot
  $ bytebot action click_mouse --x 450 --y 320
  $ bytebot action type_text --text "hello world"
  $ bytebot action key --combo "ctrl+s"
  $ bytebot action scroll --direction down --amount 3
  $ bytebot action open_url --url "https://example.com"

TASK MODE

  $ bytebot task "Find all PDF invoices and organize them by date"
  $ bytebot task "Open Firefox and search for AI news"
  $ bytebot task "Fill out the expense report form"
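
The actions and flags in this reference map naturally onto a typed model. The sketch below relies only on the flag names shown in the USAGE lines. The `DesktopAction` type and `toArgv` function are hypothetical and are not part of ByteBot.

```typescript
// Hypothetical typed model of the six actions in the reference above.
type DesktopAction =
  | { action: "screenshot" }
  | { action: "click_mouse"; x: number; y: number }
  | { action: "type_text"; text: string }
  | { action: "key"; combo: string }
  | { action: "scroll"; direction: "up" | "down"; amount: number }
  | { action: "open_url"; url: string };

// Build the argument list for `bytebot action ...`, excluding the program name.
// Each value is its own element, so "hello world" needs no shell quoting.
function toArgv(a: DesktopAction): string[] {
  switch (a.action) {
    case "screenshot":
      return ["action", "screenshot"];
    case "click_mouse":
      return ["action", "click_mouse", "--x", String(a.x), "--y", String(a.y)];
    case "type_text":
      return ["action", "type_text", "--text", a.text];
    case "key":
      return ["action", "key", "--combo", a.combo];
    case "scroll":
      return ["action", "scroll", "--direction", a.direction, "--amount", String(a.amount)];
    case "open_url":
      return ["action", "open_url", "--url", a.url];
    default: {
      // Compile-time exhaustiveness check: a new action without a case fails here.
      const unhandled: never = a;
      throw new Error(`unhandled action: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

The resulting array could be passed to Node's `child_process.spawn("bytebot", args)`. That call runs the program without a shell, so text containing spaces or quotes reaches the CLI unchanged.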
