Harness Engineering | Tuna Cinsoy

Q) What are the five layers/components of harness engineering?

These are task specification (instructions), context provision with tools, execution environment, verification feedback and state management.

Q) What are the additional components that need to be in the repository?

General Files:
- AGENTS.md -> Must describe the overall project and its tech stack.
- CONSTRAINTS.md -> Directives that start with “MUST” or “MUST NOT” and clearly define the hard constraints of the project. (These can be embedded in AGENTS.md instead, as long as AGENTS.md stays within 200 lines.)
Specific Files:
- ARCHITECTURE.md -> Describes each microservice (i.e. each small executable component): its responsibilities, its dependencies, and its readiness state (which tests must pass before the microservice is ready for production).
- PROGRESS.md -> Tracks the progress of the microservice currently being worked on (could be a checklist, similar to the tasks phase in spec-kit).
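
A minimal sketch of a check that the files listed above are present. The function name `missing_harness_files` and the assumption that all files sit at the repository root are illustrative and not part of the original notes. CONSTRAINTS.md is left out of the required set because its rules may be embedded in AGENTS.md.

```python
from pathlib import Path

# File names taken from the list above. CONSTRAINTS.md is not required here
# because its rules may live inside AGENTS.md (an assumption of this sketch).
REQUIRED_FILES = ("AGENTS.md", "ARCHITECTURE.md", "PROGRESS.md")


def missing_harness_files(repo_root: str) -> list[str]:
    """Return the required harness files that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty result means the repository has the minimum set of harness files; anything else is the list of files still to be written.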

Q) What are the features of the AGENTS.md file?

AGENTS.md is the entry file. It should be 50-200 lines long, so it should only contain general information about the whole project:

Project Overview: What’s really happening here? What’s the solution? What’s the aim?
Quick Start: How to install, run and test the code
Global Hard Constraints: Non-negotiable rules, no more than 15 “MUST” or “MUST NOT” statements
Links to Specific Topic Documents: A link to each specific doc, followed by its applicability condition
State Persistence: Rules to follow at clock-in (session start) and clock-out (session end), so that state persists across sessions

So, something like the following:


# AGENTS.md

## Project Overview
Python 3.11 FastAPI backend, PostgreSQL 15 database.

## Quick Start
- Install: `make setup`
- Test: `make test`
- Full verification: `make check`

## Hard Constraints
- All APIs must use OAuth 2.0 authentication
- All database queries must use SQLAlchemy 2.0 syntax
- All PRs must pass pytest + mypy --strict + ruff check

## Topic Docs
- API Design Patterns (`docs/api-patterns.md`) — Required reading when adding endpoints
- Database Rules (`docs/database-rules.md`) — Required when modifying database operations
- Testing Standards (`docs/testing-standards.md`) — Reference when writing tests

## At session start (clock in)
1. Read PROGRESS.md for current state
2. Read DECISIONS.md for important decisions
3. Run make check to confirm repo is in consistent state
4. Continue from PROGRESS.md "Next Steps" section

## Before session end (clock out)
1. Update PROGRESS.md
2. Run make check to confirm consistent state
3. Commit all completed work
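
The size and constraint limits for AGENTS.md can be checked mechanically. The following is a rough sketch, not an established tool: `lint_agents_md` is a hypothetical helper, and counting bullet lines that contain the word "must" is only a heuristic for finding hard constraints.

```python
import re

MIN_LINES, MAX_LINES = 50, 200   # size range stated for AGENTS.md
MAX_HARD_CONSTRAINTS = 15        # upper bound on must / must not statements


def lint_agents_md(text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    lines = text.splitlines()
    if not MIN_LINES <= len(lines) <= MAX_LINES:
        problems.append(f"expected {MIN_LINES}-{MAX_LINES} lines, found {len(lines)}")

    # Heuristic: a hard constraint is a bullet line containing the word "must".
    must_lines = [
        line for line in lines
        if line.lstrip().startswith("-") and re.search(r"\bmust\b", line, re.I)
    ]
    if len(must_lines) > MAX_HARD_CONSTRAINTS:
        problems.append(f"{len(must_lines)} hard constraints, limit is {MAX_HARD_CONSTRAINTS}")
    return problems
```

A check like this could run as part of `make check`, so that the entry file cannot quietly grow past its budget.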

Q) What are the features of individual document files?

Each of these files, whether under the docs directory or next to the corresponding module, must contain between 50 and 150 lines, and it must describe only the topic it covers.

If a rule is already expressed in the codebase (a specific line of code that does something for a specific reason), there’s no need to duplicate it here.

Each statement in these document files must carry three additional remarks:

Source: Why is the rule here?
Applicability Condition: When is this rule needed?
Expiry Condition: Under what condition can this rule be removed?
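
A minimal sketch of how the three remarks could be enforced automatically. It assumes a formatting convention that is not mandated by these notes: each rule sits under a `## ` heading, and the remarks are written as lines starting with `Source:`, `Applicability:` and `Expiry:`. The helper name `sections_missing_remarks` is hypothetical.

```python
REQUIRED_REMARKS = ("Source:", "Applicability:", "Expiry:")


def sections_missing_remarks(doc: str) -> dict[str, list[str]]:
    """Map each '## ' section title to the remark labels it lacks."""
    missing: dict[str, list[str]] = {}
    title, body = None, []

    def flush() -> None:
        # Record the section collected so far if any remark is absent.
        if title is None:
            return
        absent = [r for r in REQUIRED_REMARKS
                  if not any(line.startswith(r) for line in body)]
        if absent:
            missing[title] = absent

    for line in doc.splitlines():
        if line.startswith("## "):
            flush()
            title, body = line[3:].strip(), []
        elif title is not None:
            body.append(line.strip())
    flush()  # do not forget the last section
    return missing
```

An empty dictionary means every rule states where it came from, when it applies, and when it can be retired.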

So, if I had a backend API that talks to a PostgreSQL database, I would need something like the following:


# Postgres Microservice Standards

This document provides the rules for managing database operations inside the API microservice.

## Connection Pooling

You must use a connection pool to share active database connections instead of opening a new connection for every single user request.

Source: Opening fresh connections to Postgres is slow and wastes database server memory.
Applicability: Read this when setting up the database connection or configuring the server startup logic.
Expiry: You can remove this rule if we move to a cloud service that automatically manages connection pooling for us.

## Safe Query Inputs

You must pass user inputs as separate parameters to the database. You are not allowed to join text strings together to build a database query.

Source: Joining text directly into queries creates major security flaws where attackers can steal or delete data.
Applicability: Apply this rule whenever you write database queries that include data provided by the user.
Expiry: This rule is permanent and should never be removed.

## Short Transactions

You must finish all slow tasks like calculating data or calling outside APIs before you start a database transaction. The transaction should only contain the final read and write steps.

Source: Keeping a transaction open for too long locks the data, which forces other users to wait and slows down the entire system.
Applicability: Follow this when writing code that updates multiple tables at the same time.
Expiry: This rule can be removed if we stop using direct database transactions and switch to a message queue system for all data updates.

## Fast Data Lists

You must use a cursor, which acts like a bookmark, to load the next page of results. Do not use the offset method to skip previous items.

Source: The offset method forces the database to count and skip every single old row, which becomes incredibly slow as the application gets more users.
Applicability: Check this rule when building any endpoint that returns a long list of items.
Expiry: You can ignore this rule only for specific tables that are guaranteed to never hold more than fifty rows.

## Batching Related Data

When you need to get a list of items and their connected details, you must grab all the connected details at once in a single extra query.

Source: Asking the database for details one by one inside a loop creates a massive traffic jam of tiny requests.
Applicability: Use this rule when writing code that reads from two or more connected tables.
Expiry: This rule can be removed if the application starts using an automatic tool that groups these requests for us.
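
To make two of these rules concrete ("Safe Query Inputs" and "Fast Data Lists"), here is a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server; the `items` table and the `fetch_page` function are made up for illustration, and placeholder syntax differs between database drivers.

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, after_id: int, limit: int) -> list[tuple[int, str]]:
    """Return up to `limit` rows whose id is greater than the cursor `after_id`."""
    # Values are bound through "?" placeholders, never concatenated into the SQL text.
    # "WHERE id > ?" is the cursor: the query continues from the last id seen
    # instead of using OFFSET to skip earlier rows.
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()


# Hypothetical in-memory table with five rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [(f"item-{i}",) for i in range(1, 6)])

first = fetch_page(conn, after_id=0, limit=2)
second = fetch_page(conn, after_id=first[-1][0], limit=2)  # cursor = last id of the previous page
```

The caller only has to remember the last `id` it received, which is what makes the cursor behave like a bookmark.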

Q) What is context anxiety?

When LLMs realize that their context window is running low, they tend to skip verification, tests and other best practices in order to finish the task as soon as possible. Anthropic calls this context anxiety.

Q) What’s the way to handle state persistence across different sessions?

Core approach: Treat the agent like an engineer whose short-term memory gets wiped at every session.

First, we need a PROGRESS.md file that looks like the following:

# Project Progress

## Current State
- Latest commit: abc1234 (feat: add user preferences endpoint)
- Test status: 42/43 passing (test_pagination_edge_case failing)
- Lint: passing

## Completed
- [x] User model and database migration
- [x] Basic CRUD endpoints
- [x] Auth middleware integration

## In Progress
- [ ] Pagination feature (90% - edge case test failing)

## Known Issues
- test_pagination_edge_case returns 500 on empty result sets
- Need to confirm whether deleted users should appear in listings

## Next Steps
1. Fix pagination edge case bug
2. Add "include deleted users" query parameter
3. Update API documentation
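
Because PROGRESS.md follows a fixed layout, a script (or the agent itself) can pull out the next task without reading the whole file. A minimal sketch, assuming the `## Next Steps` heading and numbered items shown above; `next_steps` is a hypothetical helper name.

```python
import re


def next_steps(progress_md: str) -> list[str]:
    """Return the numbered items under the '## Next Steps' heading, in order."""
    steps, in_section = [], False
    for line in progress_md.splitlines():
        if line.startswith("## "):
            # Entering any new section ends the previous one.
            in_section = line.strip() == "## Next Steps"
            continue
        match = re.match(r"\s*\d+\.\s+(.*)", line)
        if in_section and match:
            steps.append(match.group(1).strip())
    return steps
```

The first element of the returned list is the task the next session should pick up.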

Second, we need a DECISIONS.md file that looks like the following:

# Design Decisions

## 2024-01-15: Use Redis for user preferences caching
- Reason: High read frequency (every API call), small data size
- Rejected alternative: PostgreSQL materialized view (high change frequency makes maintenance cost not worthwhile)
- Constraint: Cache TTL of 5 minutes, active invalidation on write

Third, use git commits as checkpoints by committing each atomic unit of work (this rule can be stated in the entry file, AGENTS.md).

Fourth, define the harness initialization flow in the AGENTS.md file:

## At session start (clock in)
1. Read PROGRESS.md for current state
2. Read DECISIONS.md for important decisions
3. Run make check to confirm repo is in consistent state
4. Continue from PROGRESS.md "Next Steps" section

## Before session ends (clock out)
1. Update PROGRESS.md
2. Run make check to confirm consistent state
3. Commit all completed work
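
The clock-in routine can also be scripted. The sketch below is one possible shape, not a prescribed implementation: `clock_in` is a hypothetical function, and the check command is passed in as a parameter (it would be `["make", "check"]` in the setup described here) so the function does not depend on any particular build tool.

```python
import subprocess
from pathlib import Path


def clock_in(repo_root: str, check_cmd: list[str]) -> dict:
    """Load persisted state and run the consistency check for a new session."""
    root = Path(repo_root)

    def read(name: str) -> str:
        # A missing file is treated as empty state rather than an error.
        path = root / name
        return path.read_text(encoding="utf-8") if path.is_file() else ""

    progress = read("PROGRESS.md")    # step 1: current state
    decisions = read("DECISIONS.md")  # step 2: important decisions
    # Step 3: run the check command; exit code 0 means the repo is consistent.
    result = subprocess.run(check_cmd, cwd=root, capture_output=True, text=True)
    return {
        "progress": progress,
        "decisions": decisions,
        "repo_consistent": result.returncode == 0,
    }
```

If `repo_consistent` is false, the session should repair the repository before continuing from the "Next Steps" section.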

Q) What’s the difference between Initialization and Implementation?

The initialization phase must take place in the first session, before any implementation starts. It is all about preparing the environment: checking that the necessary dependencies are installed, and so on. The question to answer is: ‘When I start the actual development, am I going to run into any environmental or infrastructural errors?’ The answer should be no before implementation begins.
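
One simple initialization check is to confirm that the required command-line tools are installed. A minimal sketch using the standard library's `shutil.which`; the function name `missing_tools` and the tool names in the usage note are illustrative assumptions.

```python
import shutil


def missing_tools(required: list[str]) -> list[str]:
    """Return the commands from `required` that are not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]
```

For example, `missing_tools(["make", "git", "psql"])` returns an empty list only when all three commands are available, which is the state the initialization phase is meant to guarantee.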

Q) What are the definitions of overreach and under-finish?

Code written but tests not passing is under-finish.
Doing 5 features with 0 passing end-to-end is overreach.

Overreach dilutes attention, diluted attention causes under-finish, and the half-finished code left behind increases system complexity, which further drives overreach in the next task. A vicious cycle.

Q) How to avoid overreach and under-finish?

To keep work in progress at one (WIP=1), so that the agent’s attention is used effectively, state the following in the AGENTS.md file:

## Work Rules
- Work on one feature at a time
- Only start the next feature after the current one passes end-to-end verification
- Don't "also refactor" feature B while implementing feature A

Each feature request must have its own verification command:

F01: User Registration
  Verification: curl -s -X POST /api/register -d '{"email":"test@example.com","password":"123456"}' | jq -e '.status == 201'
  State: passing
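
A verification command is only useful if its result is read mechanically rather than judged by the agent. The sketch below treats exit code 0 as success; `verification_passes` is a hypothetical helper, and `shell=True` is used only because the commands come from the repository itself, not from untrusted input.

```python
import subprocess


def verification_passes(command: str, timeout_s: float = 60.0) -> bool:
    """Run a feature's verification command; True only if it exits with code 0."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False  # a command that hangs counts as a failed verification
    return result.returncode == 0
```

A feature's state would be set to passing only when this function returns `True` for its verification command.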

Q) What are the feature lists? Why do we need them?

Each entry in a feature list looks like the following:

{
  "id": "F03",
  "behavior": "POST /cart/items with {product_id, quantity} returns 201",
  "verification": "curl -s -X POST http://localhost:3000/api/cart/items -H 'Content-Type: application/json' -d '{\"product_id\":1,\"quantity\":2}' | jq -e '.status == 201'",
  "state": "passing",
  "evidence": "commit abc123, test output log"
}

Each entry has 5 fields:

ID: The unique id of the feature request.
Behavior: What does the agent need to deliver?
Verification: How do we verify that the work is in the ‘done’ state?
State: What’s the status of this task? (not_started, active, blocked, passing)
Evidence: If the status is passing, what’s the proof?

Feature lists allow agents to keep track of feature requests between sessions.
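
Since the feature list is structured data, it can be validated before each session. The following sketch checks the five fields and the four states listed above. Two further checks are assumptions of this sketch rather than rules stated for the format: a passing feature must have non-empty evidence, and at most one feature may be active at a time (the WIP=1 rule). The name `validate_features` is hypothetical.

```python
ALLOWED_STATES = {"not_started", "active", "blocked", "passing"}
REQUIRED_FIELDS = ("id", "behavior", "verification", "state", "evidence")


def validate_features(features: list[dict]) -> list[str]:
    """Return problems found in a feature list; an empty list means it is valid."""
    problems = []
    for feature in features:
        fid = feature.get("id", "<no id>")
        for field in REQUIRED_FIELDS:
            if field not in feature:
                problems.append(f"{fid}: missing field '{field}'")
        state = feature.get("state")
        if state is not None and state not in ALLOWED_STATES:
            problems.append(f"{fid}: unknown state '{state}'")
        # Assumption: "passing" must be backed by non-empty evidence.
        if state == "passing" and not feature.get("evidence"):
            problems.append(f"{fid}: passing without evidence")
    # Assumption: WIP=1 means at most one feature in the "active" state.
    active = [f.get("id") for f in features if f.get("state") == "active"]
    if len(active) > 1:
        problems.append(f"more than one active feature: {active}")
    return problems
```

The entries would typically be loaded with `json.load` from wherever the project keeps its feature list.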

Q) What is a premature completion declaration? How can it be avoided?

A premature completion declaration happens when an agent is asked both to implement a task and to evaluate its own completeness. Agents tend to sugarcoat their own work, which is why additional explicit checks are necessary.

In the AGENTS.md file, we can state:

## Definition of Done
- Feature complete = end-to-end verification passed, not "code is written"
- Required verification levels:
  1. Unit tests pass
  2. Integration tests pass
  3. End-to-end flow verification passes
- Do not proceed to level 2 if level 1 fails
- Do not proceed to level 3 if level 2 fails
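
The gating between verification levels can be expressed as a loop that stops at the first failure. This is a sketch under the assumption that each level is available as a callable returning `True` or `False`; `run_levels` and the level names are illustrative.

```python
from typing import Callable


def run_levels(levels: list[tuple[str, Callable[[], bool]]]) -> tuple[bool, list[str]]:
    """Run checks in order and stop at the first failure.

    Returns (done, names of the levels that were actually run).
    """
    ran = []
    for name, check in levels:
        ran.append(name)
        if not check():
            return False, ran  # later levels are skipped
    return True, ran
```

A feature counts as done only when the first element of the result is `True`, which requires every level, including the end-to-end check, to have run and passed.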