# Testing Strategy

## Overview

nanobot uses **pytest** with **pytest-asyncio** for testing. The test suite focuses on unit testing of core components with lightweight test doubles (no heavy mocking frameworks). Tests are designed to run fast without external services.

## Test Infrastructure

### Framework

| Tool | Version | Purpose |
|------|---------|---------|
| pytest | ≥ 9.0 | Test runner and assertions |
| pytest-asyncio | ≥ 1.3 | Async test support |
| ruff | ≥ 0.1 | Linting (not testing, but part of CI quality) |

### Configuration (`pyproject.toml`)

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
```

- `asyncio_mode = "auto"`: All async test functions are automatically treated as async tests (no need for `@pytest.mark.asyncio` on each test, though it's still used explicitly in the codebase for clarity).
- `testpaths = ["tests"]`: Test discovery starts from the `tests/` directory.

### Running Tests

```bash
# Run all tests
pytest -s tests/

# Run a specific test file
pytest -s tests/test_tool_validation.py

# Run tests matching a pattern
pytest -k "test_heartbeat" tests/

# Run with verbose output
pytest -v tests/

# Run with coverage (if coverage is installed)
pytest --cov=nanobot tests/
```

## Test Organization

### Test Files

| File | Tests | Component |
|------|-------|-----------|
| `test_tool_validation.py` | Tool parameter validation, JSON Schema checks | `agent/tools/base.py`, `agent/tools/registry.py` |
| `test_heartbeat_service.py` | Heartbeat idempotency, LLM decision handling | `heartbeat/service.py` |
| `test_cron_service.py` | Cron job scheduling, firing, cancellation | `cron/service.py` |
| `test_cron_commands.py` | Cron command parsing and dispatch | `cron/` |
| `test_commands.py` | CLI command parsing | `cli/commands.py` |
| `test_cli_input.py` | CLI input handling | `cli/commands.py` |
| `test_context_prompt_cache.py` | Prompt cache optimization | `agent/context.py` |
| `test_loop_save_turn.py` | Agent loop turn saving | `agent/loop.py` |
| `test_message_tool.py` | Message tool behavior | `agent/tools/message.py` |
| `test_message_tool_suppress.py` | Message suppression logic | `agent/tools/message.py` |
| `test_email_channel.py` | Email channel parsing | `channels/email.py` |
| `test_matrix_channel.py` | Matrix channel integration | `channels/matrix.py` |
| `test_feishu_post_content.py` | Feishu message formatting | `channels/feishu.py` |
| `test_memory_consolidation_types.py` | Memory consolidation edge cases | `agent/memory.py` |
| `test_consolidate_offset.py` | Consolidation offset tracking | `agent/memory.py`, `session/manager.py` |
| `test_task_cancel.py` | Task cancellation logic | `cron/service.py` |

## Testing Patterns

### Test Doubles

The codebase uses simple, custom test doubles rather than heavy mocking:

**DummyProvider** — A minimal LLM provider that returns pre-configured responses:

```python
class DummyProvider:
    def __init__(self, responses: list[LLMResponse]):
        self._responses = list(responses)

    async def chat(self, *args, **kwargs) -> LLMResponse:
        if self._responses:
            return self._responses.pop(0)
        return LLMResponse(content="", tool_calls=[])
```

**SampleTool** — A minimal tool for testing parameter validation:

```python
class SampleTool(Tool):
    @property
    def name(self) -> str:
        return "sample"

    @property
    def parameters(self) -> dict[str, Any]:
        return {
            "type": "object",
            "properties": {
                "query": {"type": "string", "minLength": 2},
                "count": {"type": "integer", "minimum": 1, "maximum": 10},
            },
            "required": ["query", "count"],
        }

    async def execute(self, **kwargs: Any) -> str:
        return "ok"
```

### Async Testing

All async tests use the `@pytest.mark.asyncio` decorator and can use `tmp_path` for isolated filesystem:

```python
@pytest.mark.asyncio
async def test_heartbeat_decide_skip(tmp_path) -> None:
    provider = DummyProvider([LLMResponse(content="no tool call", tool_calls=[])])
    service = HeartbeatService(workspace=tmp_path, provider=provider, model="test")
    action, tasks = await service._decide("heartbeat content")
    assert action == "skip"
```

### Filesystem Isolation

Tests that touch the filesystem use pytest's `tmp_path` fixture:

```python
@pytest.mark.asyncio
async def test_session_persistence(tmp_path) -> None:
    manager = SessionManager(tmp_path)
    session = manager.get_or_create("test:123")
    session.add_message("user", "Hello")
    manager.save(session)

    # Reload from disk
    manager2 = SessionManager(tmp_path)
    session2 = manager2.get_or_create("test:123")
    assert len(session2.messages) == 1
```

### Validation Testing

Tool parameter validation is tested thoroughly with edge cases:

```python
def test_validate_params_missing_required() -> None:
    tool = SampleTool()
    errors = tool.validate_params({"query": "hi"})
    assert "missing required count" in "; ".join(errors)

def test_validate_params_nested_object_and_array() -> None:
    tool = SampleTool()
    errors = tool.validate_params({
        "query": "hi", "count": 2,
        "meta": {"flags": [1, "ok"]},
    })
    assert any("missing required meta.tag" in e for e in errors)
    assert any("meta.flags[0] should be string" in e for e in errors)
```

## What Is Tested

### Core Agent

- **Tool parameter validation**: JSON Schema validation (types, ranges, enums, nested objects, arrays, required fields)
- **Tool registry**: Tool lookup, error messages for missing tools, validation error propagation
- **Agent loop**: Turn saving, session persistence
- **Context building**: Prompt cache optimization

### Memory & Sessions

- **Memory consolidation**: LLM-driven consolidation, offset tracking, edge cases with different argument types
- **Session management**: Message append, history retrieval, orphaned tool result cleanup

### Channels

- **Email**: IMAP message parsing, SMTP formatting
- **Matrix**: Room event handling
- **Feishu**: Post content formatting (rich text)

### Services

- **Heartbeat**: Idempotent start, skip/run decisions, LLM tool call handling
- **Cron**: Job scheduling (at/every/cron), firing, cancellation, store persistence

### CLI

- **Command parsing**: Typer command registration
- **Input handling**: Terminal input, exit commands

## What Is NOT Tested (Known Gaps)

- **End-to-end flows**: No integration tests that run the full gateway with real LLM providers
- **Channel connectivity**: No tests for actual WebSocket/API connections (would require external services)
- **LLM response quality**: No tests for actual LLM output (would require API keys)
- **Concurrent access**: No stress tests for simultaneous messages from multiple channels
- **Provider-specific behavior**: No tests for LiteLLM prefixing, env var injection, or gateway detection

## Adding New Tests

### Naming Convention

```
tests/test_{module_or_feature}.py
```

### Template

```python
"""Tests for {feature}."""

import pytest

from nanobot.module import SomeClass


def test_descriptive_behavior_name() -> None:
    """Test that SomeClass does X when Y."""
    obj = SomeClass()
    result = obj.method()
    assert result == expected


@pytest.mark.asyncio
async def test_async_behavior(tmp_path) -> None:
    """Test async operation with filesystem isolation."""
    obj = SomeClass(workspace=tmp_path)
    result = await obj.async_method()
    assert result is not None
```

### Best Practices

1. **Keep tests fast**: No network calls, no sleeping, no heavy I/O
2. **Use `tmp_path`**: Never write to real filesystem paths
3. **Minimal test doubles**: Simple classes that return pre-configured data
4. **One assertion focus**: Each test should verify one specific behavior
5. **Descriptive names**: `test_{what_is_tested}_{condition}_{expected_result}`

## Related Documentation

- [Conventions](05-conventions.md) — Code style and development practices
- [Repository Map](01-repo-map.md) — File locations
- [Architecture](02-architecture.md) — Component design

---

**Last Updated**: 2026-03-15
**Version**: 1.0