What is behavior-driven development (BDD)?

May 22, 2026 · 14 min read

Behavior-driven development (BDD) is a way of building software where you describe and verify observable behavior in language the business understands—before or alongside implementation details. Specifications are concrete examples everyone can read; those examples can drive automated checks and stay accurate as the product evolves.

Dan North introduced BDD around 2003 as an evolution of test-driven thinking: not more tests for their own sake, but a shared specification that product, development, and QA align on before code drifts. BDD is the conversation and the examples; Gherkin and tools like Cucumber are how many teams capture and automate them.

Discovery → Gherkin → Automated checks → Living documentation

WavePillars treats BDD as a delivery discipline: behavior is explicit, reviewable, and provable, not buried in tickets, oral agreements, or implementation-only unit tests.

Why BDD

BDD pays off when multiple roles must agree on what "done" means and keep that agreement across releases.

  • Shared language — Product, engineering, and QA use the same examples; ambiguity surfaces in discovery, not in production.
  • Living documentation — .feature files stay readable by non-developers; CI proves they still match the running system.
  • Fewer wrong features — Edge cases and business rules appear as examples in workshops, not as UAT surprises.
  • Regression safety — Acceptance scenarios encode criteria; pipelines catch behavior breaks even when internals refactor freely.
  • Fits agile delivery — User stories gain executable acceptance criteria; retrospectives can point to scenarios that failed, not vague "it broke."

Costs and failure modes (worth naming honestly):

  • Ceremony on tiny work — A one-line config change does not need a feature file and a Three Amigos session.
  • Brittle scenarios — Steps tied to button IDs, CSS selectors, or HTTP status codes break on refactors and erode trust.
  • Conversation skipped — Teams that only write .feature files without discovery get verbose tests, not shared understanding.

BDD is a discipline, not a tax. Use it when alignment and behavior clarity matter; stay lightweight when scope is trivial and already covered by focused unit tests.

Three Amigos

The Three Amigos session brings three perspectives together before implementation: typically business/domain (product owner, BA, or domain expert), development, and testing/QA. On small teams one person may wear two hats; the point is three kinds of thinking in the room, not three job titles on a chart.

What happens in discovery

Teams often use Example Mapping or a similar workshop: name a user story or rule, list examples that illustrate it, flag open questions and risks, and only then phrase agreed examples in Gherkin. Disagreements resolve in the session—not in pull-request threads a week later.

Output

  • Agreed examples → scenarios in feature files
  • Explicit questions logged as spikes or follow-ups
  • Shared vocabulary (domain terms) that appears in steps and code

| Role | Typical contribution |
|---|---|
| Business / domain | Outcomes that matter, business rules, priorities, "what would we tell the customer?" |
| Development | Feasibility, system boundaries, data needs, what is expensive or risky to build |
| Testing / QA | Edge cases, negative paths, data combinations, testability and observability |

The Amigos do not replace code review or unit tests. They front-load what the system should do so implementation and test automation aim at the same target.

Gherkin language

Gherkin is the structured natural language that Cucumber-family tools use for features and scenarios. Files usually use the .feature extension. Steps read like plain English (or your team's own language) but follow a fixed grammar so parsers and reporters stay consistent.

Keywords

| Keyword | Purpose |
|---|---|
| Feature | High-level capability and short value statement |
| Background | Steps shared by every scenario in the file (common setup) |
| Scenario | One concrete example of behavior |
| Scenario Outline | Template scenario with placeholders |
| Examples | Table of values for a scenario outline |
| Given | Context / preconditions |
| When | Action or event |
| Then | Expected outcome |
| And / But | Continuation of the previous step type |
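
To make the Scenario Outline and Examples keywords concrete, here is a minimal sketch of the expansion a runner performs: each Examples row produces one concrete scenario by substituting its values into the `<placeholder>` slots. The function name `expand_outline` and the sample steps are illustrative assumptions, not the API of Cucumber or any other tool.

```python
import re


def expand_outline(step_templates, header, rows):
    """Return one list of concrete steps per Examples row (hypothetical helper)."""
    scenarios = []
    for row in rows:
        values = dict(zip(header, row))
        # Replace each <placeholder> with the value from the matching column.
        steps = [
            re.sub(r"<(\w+)>", lambda m: values[m.group(1)], template)
            for template in step_templates
        ]
        scenarios.append(steps)
    return scenarios


# Illustrative outline steps and Examples rows.
templates = [
    'Given the shopper has loyalty tier "<tier>"',
    "Then the order subtotal for eligible items is <expected_subtotal>",
]
expanded = expand_outline(
    templates,
    header=["tier", "expected_subtotal"],
    rows=[["Silver", "95.00"], ["Gold", "90.00"]],
)
for scenario in expanded:
    print(scenario)
```

Each row becomes an independent scenario, so a failure in one row is reported separately from the others.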

Rules of thumb

  1. One scenario, one outcome — If a scenario asserts three unrelated results, split it.
  2. Describe what, not how — "The customer sees order confirmation" beats "The customer clicks #submit-btn."
  3. Declarative over imperative — Stay at the business-visible layer; hide UI mechanics and API plumbing in step definitions (outside the feature file).
  4. Stable domain language — Use terms the business uses (loyalty tier, withdrawal limit), not internal class names.

With BDD vs without

| Aspect | Without BDD | With BDD |
|---|---|---|
| Requirements | Ambiguous tickets, oral agreements | Concrete examples in Gherkin |
| Alignment | Late surprises in QA / UAT | Three Amigos up front |
| Tests | Heavy unit tests on internals; thin acceptance coverage | Acceptance scenarios plus supporting unit tests |
| Documentation | Wiki drifts from code | Features verified in CI |
| Onboarding | "Read the code" | Read features, then code |

Without BDD, teams often over-test implementation details and under-specify behavior customers care about. Refactors break tests that asserted private structure; nobody can say whether the product still meets the original intent.

With BDD, the opposite risk appears if scenarios are the only test layer: complex algorithms and edge-case logic still need focused unit tests. BDD acceptance checks behavior at boundaries; units check logic inside the boundary. Both layers together beat either alone.

What I've seen in practice

I've delivered on teams with and without full BDD—Three Amigos, Gherkin in the repo, and acceptance suites in CI—and the difference is not "BDD always wins." It is where the ambiguity and regression cost live.

Where BDD clearly paid off

  • Many scenarios — Checkout, pricing, eligibility, and permissions where one rule change ripples across dozens of cases; feature files and scenario outlines kept product and engineering aligned.
  • Complex domain — Insurance, loyalty, billing, and workflow products where business language matters as much as code; discovery caught edge cases that ticket summaries missed.

In those contexts, living documentation was not a slide deck—it was examples everyone argued over in a room, then proved in the pipeline.

Where BDD felt excessive

  • API-first requirements — Contract tests against OpenAPI, integration tests per endpoint, and focused unit tests on handlers often gave faster feedback with less ceremony than Gherkin layers on top of HTTP. When the consumer is another service or a mobile client, the "Three Amigos" conversation still helps, but full feature files for every endpoint duplicated the contract and slowed iteration.

Practical split: use BDD-style discovery and clear acceptance criteria for behavior the business owns; use contracts, integration tests, and units for API surface and plumbing. Do not force every POST into a Scenario because the toolchain supports it.

Examples

Bad vs good scenario (login)

Bad — imperative, UI-coupled, mixes setup with assertion:

Scenario: User logs in
  Given I open Chrome and navigate to "https://app.example/login"
  When I type "user@corp.com" into input#email
  And I type "s3cr3t" into input#password
  And I click button.submit
  Then I see div.dashboard

Good — declarative, business language, one outcome:

Scenario: Registered user signs in with valid credentials
  Given a registered user "user@corp.com"
  When the user signs in with valid credentials
  Then the user sees their account dashboard

The good version survives UI refactors; step definitions map business steps to whatever login flow exists today.
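
As a rough illustration of how step definitions keep mechanics out of the feature file, the sketch below wires the three steps of the good scenario to plain Python functions. Everything here is an assumption made for the example: the `step` decorator and `run_scenario` runner are a toy registry, not the API of Cucumber, behave, or pytest-bdd, and `FakeApp` stands in for whatever UI or API driver a real suite would use.

```python
import re

STEP_REGISTRY = []  # (compiled pattern, function) pairs


def step(pattern):
    """Register a function as the definition for steps matching `pattern`."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


class FakeApp:
    """Hypothetical stand-in for the system; a real suite would drive the UI or an API."""
    def __init__(self):
        self.users = {}
        self.current_page = None

    def sign_in(self, email, password):
        if self.users.get(email) == password:
            self.current_page = "dashboard"


@step(r'^a registered user "(?P<email>[^"]+)"$')
def registered_user(ctx, email):
    # Credentials live here, not in the feature file.
    ctx["email"] = email
    ctx["password"] = "valid-password"
    ctx["app"].users[email] = ctx["password"]


@step(r"^the user signs in with valid credentials$")
def sign_in(ctx):
    # If the login flow changes, only this function changes.
    ctx["app"].sign_in(ctx["email"], ctx["password"])


@step(r"^the user sees their account dashboard$")
def sees_dashboard(ctx):
    assert ctx["app"].current_page == "dashboard"


def run_scenario(lines):
    """Run each step line against the registry, sharing one context dict."""
    ctx = {"app": FakeApp()}
    for line in lines:
        # Drop the leading keyword (Given/When/Then/And/But) before matching.
        text = line.strip().split(" ", 1)[1]
        for pattern, func in STEP_REGISTRY:
            match = pattern.match(text)
            if match:
                func(ctx, **match.groupdict())
                break
        else:
            raise LookupError(f"No step definition for: {text}")
    return ctx


result = run_scenario([
    'Given a registered user "user@corp.com"',
    "When the user signs in with valid credentials",
    "Then the user sees their account dashboard",
])
```

The scenario text never mentions selectors or passwords, so replacing `FakeApp.sign_in` with a browser driver or an HTTP call would leave the feature file untouched.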

Full feature file (loyalty discount)

Feature: Apply loyalty discount at checkout
  Shoppers with an active loyalty tier receive the correct discount on eligible items.

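  Background:
    # Sign-in state is set per scenario: a loyalty tier implies a signed-in
    # shopper, and the guest scenario must not inherit an authenticated one.
    Given a store with standard tax rules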

  Scenario: Gold tier receives ten percent off eligible items
    Given the shopper has loyalty tier "Gold"
    And the cart contains an eligible item priced at 100.00
    When the shopper checks out
    Then the order subtotal for eligible items is 90.00
    And the discount line shows "Gold 10%"

  Scenario: Ineligible items are not discounted
    Given the shopper has loyalty tier "Gold"
    And the cart contains a gift-card product
    When the shopper checks out
    Then the gift-card line has no loyalty discount

  Scenario: Guest checkout receives no loyalty discount
    Given the shopper is not signed in
    And the cart contains an eligible item priced at 50.00
    When the shopper checks out as a guest
    Then no loyalty discount is applied

  Scenario Outline: Tier discount rates
    Given the shopper has loyalty tier "<tier>"
    And the cart contains an eligible item priced at 100.00
    When the shopper checks out
    Then the order subtotal for eligible items is <expected_subtotal>

    Examples:
      | tier     | expected_subtotal |
      | Silver   | 95.00             |
      | Gold     | 90.00             |
      | Platinum | 85.00             |

This file documents the rule, covers a happy path, an exclusion, a negative path, and parameterized tiers—readable in a refinement session and executable once step definitions exist.
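
The calculation behind these scenarios is the kind of logic that also deserves a focused unit test. The sketch below is one possible implementation, not the author's: the function name `line_subtotal`, half-up rounding to cents, and treating a missing or unknown tier as no discount are all assumptions. The rates come from the Examples table; the 25.00 gift-card price is made up for the check.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates implied by the Examples table; Decimal avoids float rounding on money.
TIER_RATES = {
    "Silver": Decimal("0.05"),
    "Gold": Decimal("0.10"),
    "Platinum": Decimal("0.15"),
}


def line_subtotal(price, tier=None, eligible=True):
    """Return a line's subtotal after any loyalty discount.

    tier=None models a guest; eligible=False models products such as gift cards.
    Assumption: unknown tiers get no discount and results round half-up to cents.
    """
    price = Decimal(price)
    rate = TIER_RATES.get(tier, Decimal("0")) if eligible else Decimal("0")
    discounted = price * (Decimal("1") - rate)
    return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Unit-level checks mirroring the scenarios in the feature file.
for tier, expected in [("Silver", "95.00"), ("Gold", "90.00"), ("Platinum", "85.00")]:
    assert line_subtotal("100.00", tier) == Decimal(expected)
assert line_subtotal("50.00") == Decimal("50.00")  # guest checkout
assert line_subtotal("25.00", "Gold", eligible=False) == Decimal("25.00")  # gift card
```

Checks like these run in milliseconds and can cover rounding and boundary prices that would be noisy to express as scenarios.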

Anti-patterns to avoid

  • Duplicate scenarios — Same behavior copy-pasted with tiny wording changes; use Scenario Outline or shared Background instead.
  • Scenario explosion — Hundreds of scenarios in one file; split by subdomain or tag (@checkout, @loyalty).
  • Technical Gherkin — "Given the API returns 200" in the feature file; keep HTTP and DB detail in step definitions.
  • Kitchen-sink scenarios — One scenario that logs in, updates profile, and checks out; split into separate outcomes.
  • Orphan features — Scenarios nobody runs in CI; they become lying documentation faster than no documentation.

When BDD helps—and when to skip

Good fits

  • Customer-facing flows (checkout, onboarding, billing, permissions)
  • Regulated or audit-friendly work where behavior must be explainable
  • Cross-functional teams with frequent requirement change
  • Systems where regressions are expensive and acceptance criteria are negotiable

Skip or stay lightweight

  • Throwaway spikes and one-off scripts
  • Pure algorithm libraries with no business-facing language (unit tests and property tests suffice)
  • API-only changes — schema, status codes, and auth already defined in OpenAPI or contract tests; Gherkin on top is usually redundant
  • Changes already fully covered by a tight unit suite and a one-line acceptance check

If writing the feature file takes longer than doing the work, the scope is probably too small for a full Three Amigos pass.

Practices that work

  1. Run discovery before large implementation; update scenarios when product changes, not only when tests fail.
  2. Keep features small and scoped—one capability per file when possible.
  3. Refactor steps like code: extract shared Background, reuse step wording, avoid copy-paste scenarios.
  4. Tag scenarios (@smoke, @wip) so CI can run fast subsets on every commit and full suites nightly.
  5. Run acceptance scenarios in CI on every merge; failing features block release the same way unit tests do.
  6. Pair BDD with unit tests for complex domain logic—scenarios prove boundaries, units prove calculations and branches.
  7. Reconcile after shipping—if production behavior intentionally diverged from a scenario, update the feature file so the next person does not "fix" the wrong thing.
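
The tagging practice in item 4 boils down to a set filter over scenarios. The sketch below shows that selection logic under stated assumptions: the `Scenario` class, the `select` function, and the scenario names are hypothetical. Real runners usually expose tag filtering through their own command-line options, so treat this as an explanation of the idea rather than a replacement for them.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    name: str
    tags: set = field(default_factory=set)


def select(scenarios, include=None, exclude=frozenset()):
    """Keep scenarios that carry `include` (if given) and none of the `exclude` tags."""
    excluded = set(exclude)
    return [
        s for s in scenarios
        if (include is None or include in s.tags) and not (s.tags & excluded)
    ]


suite = [
    Scenario("Gold tier discount", {"@smoke", "@loyalty"}),
    Scenario("Tier discount rates", {"@loyalty"}),
    Scenario("Refund after discount", {"@smoke", "@wip"}),
]

per_commit = select(suite, include="@smoke", exclude={"@wip"})  # fast subset
nightly = select(suite, exclude={"@wip"})  # everything that is not in progress
print([s.name for s in per_commit])
print([s.name for s in nightly])
```

Excluding @wip in both runs keeps unfinished scenarios from blocking a merge while they remain visible in the repository.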

Summary

Behavior-driven development makes observable behavior the shared contract between business, development, and QA. Three Amigos discovery turns conversations into examples; Gherkin captures those examples in readable feature files that automation can verify. Compared to ticket-only delivery, BDD reduces ambiguity and keeps documentation honest—when teams honor the conversation, not only the syntax.

Use acceptance scenarios for what users and regulators care about, especially when scenarios are numerous and the domain is complex. Use contracts and integration tests for API requirements, and unit tests for how the code implements those rules internally. That balance is BDD done well.