Features -- ByteBot

## Core Capabilities

What Makes ByteBot Different

01 -- Real Desktop Control

Full Desktop Interaction, Not Just API Calls

ByteBot controls a real Ubuntu desktop running inside Docker. It moves the mouse, clicks buttons, types text, scrolls pages, and interacts with any application, just as a human user would. It needs no APIs, no browser automation scripts, and no application-specific integrations.

✓ Mouse control with precise coordinates

✓ Keyboard input with modifier keys

✓ Screen capture and visual analysis

✓ Scroll and drag operations

✓ Works with any desktop application

desktop actions

$ bytebot action screenshot

Captured 1920x1080 desktop

$ bytebot action click_mouse --x 450 --y 320

Clicked at (450, 320)

$ bytebot action type_text --text "quarterly report"

Typed 16 characters

$ bytebot action key --combo "ctrl+s"

Key combination sent

$ bytebot action scroll --direction down --amount 3

Scrolled down 3 units
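
The desktop actions in the demo can be modeled as a small discriminated union. The following is a minimal sketch, not ByteBot's actual types: the `DesktopAction` type and the `describeAction` helper are hypothetical names, and the field names simply mirror the CLI flags shown in the demo.

```typescript
// Hypothetical action shapes modeled on the CLI demo; not ByteBot's real types.
type DesktopAction =
  | { kind: "screenshot" }
  | { kind: "click_mouse"; x: number; y: number }
  | { kind: "type_text"; text: string }
  | { kind: "key"; combo: string }
  | { kind: "scroll"; direction: "up" | "down"; amount: number };

// Build a one-line summary in the style of the demo output.
function describeAction(action: DesktopAction): string {
  switch (action.kind) {
    case "screenshot":
      return "Captured desktop";
    case "click_mouse":
      return `Clicked at (${action.x}, ${action.y})`;
    case "type_text":
      return `Typed ${action.text.length} characters`;
    case "key":
      return `Key combination sent: ${action.combo}`;
    case "scroll":
      return `Scrolled ${action.direction} ${action.amount} units`;
  }
}

const demoActions: DesktopAction[] = [
  { kind: "click_mouse", x: 450, y: 320 },
  { kind: "type_text", text: "quarterly report" },
  { kind: "scroll", direction: "down", amount: 3 },
];

for (const action of demoActions) {
  console.log(describeAction(action));
}
```

Because the union is closed, the compiler flags any action kind that `describeAction` forgets to handle, which is one reason to prefer this shape over free-form strings.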

Anthropic Claude

Best for complex reasoning and multi-step tasks

✓ Supported

OpenAI GPT-4

Strong general-purpose vision and control

✓ Supported

Google Gemini

Fast processing with large context windows

✓ Supported

Ollama (Local)

Run models locally, no API key needed

✓ Supported

02 -- Multi-LLM Support

Bring Your Own AI Provider

ByteBot supports multiple AI providers through a unified abstraction layer. Switch between Anthropic Claude, OpenAI GPT-4, Google Gemini, or local Ollama models without changing your workflow. Configure your preferred provider in the .env file and select it through the UI.

✓ Hot-swap between providers

✓ Unified API through LiteLLM proxy

✓ Per-task provider selection

✓ Local models via Ollama (free)
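
Per-task provider selection with a configured default could look roughly like the sketch below. Everything here is an assumption made for illustration: the `LlmProvider` interface, the `selectProvider` function, and the variable names `DEFAULT_PROVIDER` and `<PROVIDER>_API_KEY` are hypothetical and are not taken from ByteBot's actual .env format.

```typescript
// Hypothetical provider descriptor; ByteBot's real abstraction may differ.
interface LlmProvider {
  readonly name: string;
  readonly requiresApiKey: boolean;
}

const PROVIDERS: Record<string, LlmProvider> = {
  anthropic: { name: "Anthropic Claude", requiresApiKey: true },
  openai: { name: "OpenAI GPT-4", requiresApiKey: true },
  gemini: { name: "Google Gemini", requiresApiKey: true },
  ollama: { name: "Ollama (Local)", requiresApiKey: false },
};

// Resolve the provider for one task: a per-task override wins, then the
// configured default (hypothetical DEFAULT_PROVIDER), then local Ollama.
function selectProvider(
  env: Record<string, string | undefined>,
  taskOverride?: string
): LlmProvider {
  const id = taskOverride ?? env["DEFAULT_PROVIDER"] ?? "ollama";
  const provider = PROVIDERS[id];
  if (provider === undefined) {
    throw new Error(`Unknown provider: ${id}`);
  }
  // Hypothetical key naming convention, e.g. ANTHROPIC_API_KEY.
  const keyName = `${id.toUpperCase()}_API_KEY`;
  if (provider.requiresApiKey && !env[keyName]) {
    throw new Error(`Missing ${keyName}`);
  }
  return provider;
}

const exampleEnv = { DEFAULT_PROVIDER: "anthropic", ANTHROPIC_API_KEY: "example-key" };
console.log(selectProvider(exampleEnv).name);
console.log(selectProvider(exampleEnv, "ollama").name);
```

Passing the environment in as a plain object, rather than reading it globally, keeps the selection logic easy to test with different configurations.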

03 -- Containerized Safety

Sandboxed by Default

ByteBot runs inside a Docker container with a full Ubuntu desktop. The AI agent operates within this sandbox and cannot access your host machine, files, network, or other applications unless you explicitly configure volume mounts or network bridges.

✓ Isolated Docker container

✓ No host machine access by default

✓ Network isolation

✓ Easy reset: destroy and recreate

✓ Volume mounts for controlled file sharing
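
To illustrate what explicitly configured file sharing means, here is a minimal sketch that assembles `docker run` arguments from a list of mounts, defaulting each to read-only. The `Mount` type, the `buildRunArgs` helper, and the image name are hypothetical; only the `--rm` and `-v host:container:mode` flags are standard Docker CLI syntax. ByteBot itself is deployed with Docker Compose, so treat this as an illustration of the idea rather than its actual setup.

```typescript
// Hypothetical description of one host directory shared with the container.
interface Mount {
  hostPath: string;
  containerPath: string;
  readOnly?: boolean; // defaults to read-only when omitted
}

// Assemble `docker run` arguments; nothing from the host is shared
// unless it appears in `mounts`.
function buildRunArgs(image: string, mounts: Mount[]): string[] {
  const args: string[] = ["docker", "run", "--rm"];
  for (const mount of mounts) {
    if (mount.hostPath.charAt(0) !== "/" || mount.containerPath.charAt(0) !== "/") {
      throw new Error("Mount paths must be absolute");
    }
    const mode = mount.readOnly === false ? "rw" : "ro";
    args.push("-v", `${mount.hostPath}:${mount.containerPath}:${mode}`);
  }
  args.push(image);
  return args;
}

// "example/desktop-image" is a placeholder, not a real ByteBot image name.
const runArgs = buildRunArgs("example/desktop-image", [
  { hostPath: "/home/me/invoices", containerPath: "/data/invoices" },
]);
console.log(runArgs.join(" "));
```

Defaulting to read-only means the agent can read shared files but cannot modify them unless a mount opts in with `readOnly: false`.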

Security Model

Host Machine

Your computer -- fully protected

| Docker boundary (isolated) |

Container: Ubuntu Desktop

AI operates here, sandboxed

bytebot-agent (AI orchestration)

bytebotd (desktop controller)

Real-time VNC stream via noVNC

04 -- VNC Desktop Access

Watch the AI Work in Real-Time

Access the containerized desktop through a browser-based VNC viewer powered by noVNC. Watch every click, keystroke, and screen transition as the AI executes tasks. You can also take manual control at any time.

✓ Browser-based VNC (no client needed)

✓ Real-time desktop streaming

✓ Manual override capability

✓ 1920x1080 resolution

05 -- Task History & Monitoring

Full Audit Trail

Every task is recorded with a complete conversation history, screenshots at each step, action logs, and an execution timeline. Review past tasks, debug failures, and improve your prompts over time.

✓ Full conversation history

✓ Step-by-step screenshots

✓ Action logs with timestamps

✓ Task status tracking

✓ Error reporting and debugging
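
A recorded task could be represented along the following lines. This is a sketch under stated assumptions: the `TaskRecord` and `TaskStep` shapes and the `summarize` helper are hypothetical and do not reflect ByteBot's actual database schema. The duration format mirrors the minutes-and-seconds style used in the task log on this page.

```typescript
type TaskStatus = "running" | "completed" | "failed";

// Hypothetical record shapes; not ByteBot's actual schema.
interface TaskStep {
  timestampMs: number;     // when the action was executed
  action: string;          // e.g. "click_mouse"
  screenshotPath?: string; // screenshot captured at this step, if any
}

interface TaskRecord {
  prompt: string;
  status: TaskStatus;
  startedAtMs: number;
  finishedAtMs?: number;
  steps: TaskStep[];
  error?: string;
}

// Format elapsed milliseconds as "<minutes>m <seconds>s".
function formatDuration(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  return `${Math.floor(totalSeconds / 60)}m ${totalSeconds % 60}s`;
}

// One-line summary: duration for completed tasks, status otherwise.
function summarize(task: TaskRecord): string {
  if (task.status === "completed" && task.finishedAtMs !== undefined) {
    return `✓ ${task.prompt} ${formatDuration(task.finishedAtMs - task.startedAtMs)}`;
  }
  if (task.status === "failed") {
    return `✗ ${task.prompt} failed`;
  }
  return `${task.prompt} running...`;
}

const exampleTask: TaskRecord = {
  prompt: "Organize PDF invoices by date",
  status: "completed",
  startedAtMs: 0,
  finishedAtMs: 134000,
  steps: [{ timestampMs: 1500, action: "screenshot" }],
};
console.log(summarize(exampleTask));
```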

Task Log

✓ Organize PDF invoices by date | 2m 14s

✓ Search web for product pricing | 3m 42s

✓ Fill out expense report form | 1m 58s

✗ Generate Q4 analytics chart | failed

Download and compile report | running...

bytebot-ui

Next.js 15 | Task management, chat, VNC viewer

▼

bytebot-agent

NestJS | AI orchestration, WebSocket, Prisma

▼

bytebotd

NestJS | Desktop controller (mouse, keyboard, screen)

▼

Ubuntu Desktop

Docker container | 1920x1080 | noVNC access

06 -- Modern Architecture

Clean TypeScript Monorepo

ByteBot is built as a clean TypeScript monorepo: a NestJS backend for AI orchestration, a Next.js frontend with real-time WebSocket updates, and PostgreSQL for persistence. Everything is containerized with Docker Compose for one-command deployment.

✓ TypeScript throughout

✓ NestJS + Next.js + PostgreSQL

✓ WebSocket real-time updates

✓ Docker Compose deployment

✓ Prisma ORM for database
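
The split between the orchestration layer (bytebot-agent) and the desktop controller (bytebotd) suggests an observe, decide, act loop. The sketch below is one plausible shape for such a loop and is not taken from ByteBot's source: the `Desktop` and `Planner` interfaces and the `runTask` function are hypothetical, and the loop is synchronous for brevity where a real implementation would almost certainly be asynchronous.

```typescript
// Hypothetical stand-in for the desktop controller (bytebotd).
interface Desktop {
  screenshot(): string;
  perform(action: string): void;
}

// Hypothetical stand-in for the LLM provider deciding the next step.
interface Planner {
  // Returns the next action name, or null when the task is finished.
  nextAction(task: string, screenshot: string, history: string[]): string | null;
}

// Observe the screen, ask the planner for one action, perform it, repeat.
// maxSteps bounds the loop so a confused planner cannot run forever.
function runTask(task: string, desktop: Desktop, planner: Planner, maxSteps = 20): string[] {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = desktop.screenshot();
    const action = planner.nextAction(task, shot, history);
    if (action === null) {
      break;
    }
    desktop.perform(action);
    history.push(action);
  }
  return history;
}

// Fakes for demonstration: a scripted planner and a desktop that does nothing.
const scriptedActions = ["open_url", "type_text", "key"];
const fakeDesktop: Desktop = {
  screenshot: () => "frame",
  perform: () => {},
};
const fakePlanner: Planner = {
  nextAction: (_task, _shot, history) => scriptedActions[history.length] ?? null,
};

console.log(runTask("Open Firefox and search for AI news", fakeDesktop, fakePlanner));
```

Injecting the desktop and planner as interfaces keeps the loop independent of any particular provider, which fits the multi-provider design described earlier on the page.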

## Command Reference

Available Desktop Actions

ACTIONS

  screenshot      Capture current desktop state
  click_mouse     Click at coordinates (x, y)
  type_text       Type a string of characters
  key             Press a key combination
  scroll          Scroll up or down
  open_url        Navigate browser to URL

USAGE

  $ bytebot action screenshot
  $ bytebot action click_mouse --x 450 --y 320
  $ bytebot action type_text --text "hello world"
  $ bytebot action key --combo "ctrl+s"
  $ bytebot action scroll --direction down --amount 3
  $ bytebot action open_url --url "https://example.com"

TASK MODE

  $ bytebot task "Find all PDF invoices and organize them by date"
  $ bytebot task "Open Firefox and search for AI news"
  $ bytebot task "Fill out the expense report form"
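
The usage lines above follow a regular `<action> --flag value` pattern, which can be parsed into a structured form before dispatch. This is an illustrative sketch only: `parseAction` and `ParsedAction` are hypothetical names, and the real CLI's parsing and validation rules are not documented here.

```typescript
// Action names taken from the ACTIONS list in this reference.
const KNOWN_ACTIONS = [
  "screenshot",
  "click_mouse",
  "type_text",
  "key",
  "scroll",
  "open_url",
] as const;

type ActionName = (typeof KNOWN_ACTIONS)[number];

interface ParsedAction {
  name: ActionName;
  options: Record<string, string>;
}

// Parse the arguments that follow `bytebot action`,
// e.g. ["click_mouse", "--x", "450", "--y", "320"].
function parseAction(argv: string[]): ParsedAction {
  const name = argv[0];
  if (name === undefined || KNOWN_ACTIONS.indexOf(name as ActionName) === -1) {
    throw new Error(`Unknown action: ${String(name)}`);
  }
  const options: Record<string, string> = {};
  for (let i = 1; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === undefined || flag.slice(0, 2) !== "--" || value === undefined) {
      throw new Error(`Expected "--flag value" at position ${i}`);
    }
    options[flag.slice(2)] = value;
  }
  return { name: name as ActionName, options };
}

console.log(parseAction(["click_mouse", "--x", "450", "--y", "320"]));
console.log(parseAction(["open_url", "--url", "https://example.com"]));
```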
