
Infrastructure Diagrams: Complete Reference

Tags: infrastructure, reference, diagrams-as-code, traffic-simulation

Infrastructure diagrams model your system topology as a traffic flow graph. You declare components, wire them together, set capacity and behavioral properties, and DGMO computes the rest — RPS distribution, latency percentiles, availability, circuit breaker states, queue overflow risk, and more.

Unlike static architecture diagrams, infra diagrams are live simulations. Change the entry RPS or flip a scenario, and every downstream metric updates instantly.

Quick start

The simplest useful infra diagram: an edge entry point, a CDN with caching, and an API server.

Minimal infrastructure diagram
infra

Edge
  rps 1000
  -> CDN

CDN
  cache-hit 60%
  -> API

API
  max-rps 500
  latency-ms 30

Rendered: Edge (RPS 1.0k, p90 30ms) → CDN (RPS 1.0k, p90 30ms) → API (RPS 400 / 500, p90 30ms)

Three components, three properties. DGMO computes:

  • CDN receives 1,000 RPS, serves 60% from cache, forwards 400 RPS downstream
  • API receives 400 RPS against a 500 RPS capacity — headroom is visible
  • Latency accumulates: CDN → API, so end-to-end latency includes both hops

Table of contents

Core concepts

Component properties

Organization

  • Groups — clusters, pods, and replica sets
  • Tags — team ownership and categorization
  • Scenarios — simulate different load conditions

How calculations work

Reference


Entry point

Every infra diagram needs exactly one edge entry point — the source of all inbound traffic. Name a component Edge or Internet and give it an rps property:

Edge entry point with 50K RPS
infra

Edge
  rps 50000
  -> Gateway

Rendered: Edge (RPS 50.0k) → Gateway (RPS 50.0k)

The rps property is only valid on the edge node. It represents total inbound requests per second entering your system. All downstream RPS values are computed from this single number.

Components

A component is any named node in your architecture — a server, database, cache, queue, or service. Write the component name on its own line, then indent properties below it:

APIServer
  max-rps 500
  latency-ms 30
  uptime 99.95%

Component names must start with a letter or underscore and can contain letters, numbers, and underscores. You don’t declare a component’s “type” — its role is inferred from its properties. A component with cache-hit is a cache. One with buffer is a queue. One with concurrency is serverless.

Connections

Connect components with arrow syntax. A bare -> sends all traffic; a labeled arrow -label-> adds a route annotation:

Simple connections
infra

Edge
  rps 10000
  -> LB

LB
  -/api-> APIServer
  -/web-> WebServer

Rendered: Edge (RPS 10.0k) → LB (RPS 10.0k); /api → APIServer (RPS 5.0k); /web → WebServer (RPS 5.0k)

Connections define the directed acyclic graph (DAG) that traffic flows through. Cycles are not allowed — DGMO validates this and reports an error if it detects a loop.

Connection syntax

-> Target                           # unlabeled connection
-/api-> Target                      # labeled connection
-/api-> Target | split: 60%         # labeled with explicit split
-> [Group Name]                     # connect to a group
-route-> [Group Name] | split: 40%  # labeled connection to group with split

Traffic splits

When a component has multiple outbound connections, traffic is distributed across them. You can declare explicit percentages or let DGMO distribute evenly:

Explicit 70/30 traffic split
infra

Edge
  rps 10000
  -> LB

LB
  -/api-> APIServer | split: 70%
  -/static-> CDN | split: 30%

APIServer
  max-rps 800
  latency-ms 40

CDN
  cache-hit 90%
  latency-ms 5

Rendered: Edge (RPS 10.0k, p90 40ms, availability 25.2%) → LB (RPS 10.0k, p90 40ms, availability 25.2%); /api → APIServer (RPS 7.0k / 800, p90 40ms, availability 11.4%); /static → CDN (RPS 3.0k, p90 5ms)

Split rules:

  • All declared — percentages must sum to 100% (DGMO warns if they don’t)
  • None declared — traffic splits evenly (2 targets = 50/50, 3 targets = 33/33/34)
  • Some declared, some not — undeclared targets share the remainder equally, as in:
LB
  -/api-> API | split: 60%     # 60% of LB output
  -/web-> Web | split: 30%     # 30% of LB output
  -/health-> Health            # gets remaining 10%
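
A minimal TypeScript sketch of how this remainder rule could be resolved (the names and types are illustrative, not DGMO's actual internals):

type Outbound = { target: string; split?: number }; // split in percent

function distribute(rps: number, edges: Outbound[]): Map<string, number> {
  const declared = edges.filter((e) => e.split !== undefined);
  const undeclared = edges.filter((e) => e.split === undefined);
  const declaredSum = declared.reduce((sum, e) => sum + (e.split ?? 0), 0);
  // Undeclared targets share whatever percentage remains, equally.
  const share = undeclared.length > 0 ? (100 - declaredSum) / undeclared.length : 0;
  const out = new Map<string, number>();
  for (const e of edges) out.set(e.target, rps * ((e.split ?? share) / 100));
  return out;
}

// The LB example above at 10,000 RPS: API gets 6,000, Web 3,000, Health the remaining 1,000.
distribute(10_000, [
  { target: "API", split: 60 },
  { target: "Web", split: 30 },
  { target: "Health" },
]);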

Component properties

Properties define what a component does to traffic passing through it. Each property maps to a specific behavior in the traffic simulation.

Cache

Property: cache-hit <percentage>

A cache layer absorbs a fraction of inbound traffic before it reaches downstream components. The cache-hit percentage is the fraction of requests served directly from cache.

CDN absorbing 80% of traffic
infra

Edge
  rps 100000
  -> CDN

CDN
  cache-hit 80%
  -> AppServer

AppServer
  max-rps 5000
  latency-ms 50

Rendered: Edge (RPS 100.0k, p90 50ms, availability 25.0%) → CDN (RPS 100.0k, p90 50ms, availability 25.0%) → AppServer (RPS 20.0k / 5.0k, p90 50ms, availability 25.0%)

How it works: If a component receives 100,000 RPS with cache-hit 80%, only 20,000 RPS flow downstream. The remaining 80,000 are served from cache and never reach backend services.

Computed effect on downstream RPS:

downstream_rps = inbound_rps × (1 - cache_hit / 100)

Firewall

Property: firewall-block <percentage>

A firewall or WAF drops a percentage of inbound traffic (malicious requests, bot traffic, blocked IPs). Blocked traffic is removed from the flow entirely.

WAF + rate limiter + API chain
infra

Edge
  rps 50000
  -> WAF

WAF
  firewall-block 8%
  -> Gateway

Gateway
  ratelimit-rps 10000
  -> API

API
  max-rps 5000
  latency-ms 45

Rendered: Edge (RPS 50.0k, p90 45ms, availability 10.9%) → WAF (RPS 50.0k, p90 45ms, availability 10.9%) → Gateway (RPS 46.0k / 10.0k, p90 45ms, availability 10.9%) → API (RPS 10.0k / 5.0k, p90 45ms, availability 50.0%)

Computed effect on downstream RPS:

downstream_rps = inbound_rps × (1 - firewall_block / 100)

Cache and firewall effects compose multiplicatively. If traffic passes through a cache (cache-hit 80%) then a firewall (firewall-block 5%), only 20% × 95% = 19% of original traffic reaches downstream.

Rate limiting

Property: ratelimit-rps <number>

A rate limiter caps throughput at a fixed RPS threshold. Excess traffic is rejected.

Gateway
  ratelimit-rps 10000
  -> API

Computed effect:

downstream_rps = min(effective_inbound_rps, ratelimit_rps)

Where effective_inbound_rps is the RPS after cache and firewall reductions. If 15,000 RPS arrive after cache/firewall and ratelimit-rps is 10,000, only 10,000 flow downstream and 5,000 are rejected.

Rate limiting also affects availability — rejected traffic reduces the availability score proportionally.
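
The reductions compose in a fixed order (cache, then firewall, then rate limiter; see the propagation rules below). A sketch of that per-component pipeline, with illustrative property names that mirror the DSL rather than DGMO's internals:

interface Behavior { cacheHit?: number; firewallBlock?: number; ratelimitRps?: number }

function throughComponent(inboundRps: number, b: Behavior): number {
  let rps = inboundRps;
  if (b.cacheHit !== undefined) rps *= 1 - b.cacheHit / 100;             // served from cache
  if (b.firewallBlock !== undefined) rps *= 1 - b.firewallBlock / 100;   // dropped by WAF
  if (b.ratelimitRps !== undefined) rps = Math.min(rps, b.ratelimitRps); // excess rejected
  return rps;
}

// The WAF + Gateway chain above: 50,000 becomes 46,000 after an 8% block, capped at 10,000.
const afterWaf = throughComponent(50_000, { firewallBlock: 8 });   // 46000
const afterGateway = throughComponent(afterWaf, { ratelimitRps: 10_000 }); // 10000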

Capacity

Properties: max-rps <number>, instances <number>

These define a component’s throughput capacity. max-rps is the per-instance maximum. instances multiplies it:

3 instances × 400 max-rps = 1,200 total capacity
infra

Edge
  rps 3000
  -> LB

LB
  -> API

API
  instances 3
  max-rps 400
  latency-ms 30

Rendered: Edge (RPS 3.0k, p90 30ms, availability 40.0%) → LB (RPS 3.0k, p90 30ms, availability 40.0%) → API (RPS 3.0k / 1.2k, p90 30ms, availability 40.0%, 3x)

Total capacity formula:

total_capacity = max_rps × instances

When computed RPS exceeds total capacity, the component is overloaded. DGMO flags this visually (red indicators) and in diagnostics. Overload also reduces availability.

If instances is omitted, it defaults to 1. If max-rps is omitted, the component has unlimited capacity.
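
A small sketch of the overload check (illustrative names; the local-availability rule is explained under "Availability computation" below):

function capacityReport(rps: number, maxRps: number, instances = 1) {
  const capacity = maxRps * instances;
  return {
    capacity,
    overloaded: rps > capacity,
    localAvailability: rps > capacity ? capacity / rps : 1, // degrades under overload
  };
}

capacityReport(3_000, 400, 3);
// { capacity: 1200, overloaded: true, localAvailability: 0.4 }, matching the 40% above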

Dynamic scaling

Property: instances <min>-<max> (range syntax)

When you specify a range like instances 1-8, DGMO computes the number of instances needed to handle current load:

Auto-scaling from 1 to 8 instances
infra

Edge
  rps 5000
  -> LB

LB
  -> API

API
  instances 1-8
  max-rps 300
  latency-ms 25

Rendered: Edge (RPS 5.0k, p90 25ms, availability 48.0%) → LB (RPS 5.0k, p90 25ms, availability 48.0%) → API (RPS 5.0k / 2.4k, p90 25ms, availability 48.0%, 8x)

Scaling formula:

needed = ceil(computed_rps / max_rps)
actual  = clamp(needed, min, max)

If the API receives 5,000 RPS with max-rps 300 and instances 1-8:

  • needed = ceil(5000 / 300) = 17
  • actual = clamp(17, 1, 8) = 8
  • Total capacity = 300 × 8 = 2,400 — still overloaded at 5,000 RPS

This lets you model auto-scaling behavior realistically, including cases where scaling maxes out.
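
The formula is easy to check in a few lines (an illustrative sketch, not DGMO's source):

function scaledInstances(rps: number, maxRps: number, min: number, max: number) {
  const needed = Math.ceil(rps / maxRps);
  const actual = Math.min(Math.max(needed, min), max); // clamp(needed, min, max)
  return { needed, actual, capacity: actual * maxRps };
}

scaledInstances(5_000, 300, 1, 8);
// { needed: 17, actual: 8, capacity: 2400 }: scaling maxes out, still overloaded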

Latency

Property: latency-ms <number>

Per-component response time in milliseconds. Latency accumulates along the path from edge to leaf:

CDN
  latency-ms 5
  -> API

API
  latency-ms 40
  -> DB

DB
  latency-ms 8

A request traversing CDN → API → DB has cumulative latency of 5 + 40 + 8 = 53ms.

If omitted, a component contributes 0ms latency (or the default-latency-ms value if set — see diagram options).

Uptime

Property: uptime <percentage>

Component reliability as a percentage. Uptime propagates along paths — the end-to-end uptime of a chain is the product of individual uptimes:

Uptime cascading through the chain
infra
default-uptime 99.9

Edge
  rps 1000
  -> API

API
  max-rps 2000
  latency-ms 30
  uptime 99.95%
  -> DB

DB
  latency-ms 5
  uptime 99.99%

Rendered: Edge (RPS 1.0k, p90 35ms, availability 99.94%) → API (RPS 1.0k / 2.0k, p90 35ms, availability 99.94%) → DB (RPS 1.0k, p90 5ms, eff. uptime 99.94%, availability 99.99%)
end_to_end_uptime = 99.9% × 99.95% × 99.99% ≈ 99.84%

If omitted, a component’s uptime defaults to 100% (or default-uptime if set globally). Uptime feeds into the availability computation as the baseline before load-dependent degradation.

Circuit breakers

Properties: cb-error-threshold <percentage>, cb-latency-threshold-ms <number>

Circuit breakers protect downstream services by tripping when failure conditions are met. DGMO models three states: closed (normal), open (tripped), and half-open (recovering).

Circuit breaker on overloaded API
infra

Edge
  rps 5000
  -> Gateway

Gateway
  -> API

API
  max-rps 300
  instances 2
  latency-ms 40
  cb-error-threshold 50%

Rendered: Edge (RPS 5.0k, p90 40ms, availability 12.0%) → Gateway (RPS 5.0k, p90 40ms, availability 12.0%) → API (RPS 5.0k / 600, p90 40ms, availability 12.0%, CB: OPEN, 2x)

Error-rate trigger:

error_rate = (computed_rps - capacity) / computed_rps × 100
if error_rate ≥ cb_error_threshold → state = 'open'

The error rate is derived from overload — if a component receives more RPS than its capacity, the excess is treated as errors. When the error rate exceeds the threshold, the circuit breaker opens.

Latency trigger:

if cumulative_latency_ms > cb_latency_threshold_ms → state = 'open'

If the total latency accumulated up to this component exceeds the threshold, the breaker trips. This models timeout-based circuit breakers.

You can combine both triggers on the same component — the breaker opens if either condition is met.

Serverless

Properties: concurrency <number>, duration-ms <number>, cold-start-ms <number>

Serverless components (Lambda, Cloud Functions) use a different capacity model. Instead of instances × max-rps, capacity is derived from concurrency and execution duration:

Serverless function with cold starts
infra

Edge
  rps 2000
  -> Gateway

Gateway
  -> ProcessOrder

ProcessOrder
  concurrency 1000
  duration-ms 200
  cold-start-ms 800

Rendered: Edge (RPS 2.0k, p90 200ms) → Gateway (RPS 2.0k, p90 200ms) → ProcessOrder (RPS 2.0k / 5.0k, instances 400 / 1k, p90 200ms)

Capacity formula:

capacity_rps = concurrency / (duration_ms / 1000)

With concurrency 1000 and duration-ms 200:

capacity = 1000 / 0.2 = 5,000 RPS

Cold starts: When cold-start-ms is set, DGMO splits traffic into two paths for percentile computation:

  • 95% warm path — latency = duration-ms
  • 5% cold path — latency = duration-ms + cold-start-ms

This means cold starts primarily affect p99 latency, which matches real-world behavior. A function with duration-ms 200 and cold-start-ms 800 has p50 latency of ~200ms but p99 of ~1,000ms.
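
A sketch of the serverless capacity formula and the warm/cold split, assuming the fixed 95/5 ratio described above (illustrative names):

function serverlessModel(concurrency: number, durationMs: number, coldStartMs = 0) {
  const capacityRps = concurrency / (durationMs / 1000);
  return {
    capacityRps,
    paths: [
      { weight: 0.95, latencyMs: durationMs },               // warm invocations
      { weight: 0.05, latencyMs: durationMs + coldStartMs }, // cold invocations
    ],
  };
}

serverlessModel(1000, 200, 800);
// capacityRps: 5000; p50 falls on the warm path (~200ms), p99 on the cold path (~1000ms)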

Important: concurrency is mutually exclusive with instances and max-rps. A component is either serverless (concurrency-based) or traditional (instance-based). DGMO warns if you mix them.

Queues

Properties: buffer <number>, drain-rate <number>, retention-hours <number>, partitions <number>

Queues decouple producers from consumers. They have fundamentally different behavior from request/response components — they absorb traffic bursts and reset latency boundaries.

Queue decoupling API from workers
infra

Edge
  rps 5000
  -> API

API
  max-rps 6000
  latency-ms 20
  -> OrderQueue

OrderQueue
  buffer 50000
  drain-rate 1000
  retention-hours 72
  -> Worker

Worker
  instances 3
  max-rps 400
  latency-ms 100

Rendered: Edge (RPS 5.0k, p90 4.1s, availability 20.0%) → API (RPS 5.0k / 6.0k, p90 4.1s, availability 20.0%) → OrderQueue (RPS 5.0k, p90 4.1s, availability 20.0%, lag 4k msg/s, overflow ~13s) → Worker (RPS 1.0k / 1.2k, p90 100ms, 3x)

Key properties:

  • buffer: Maximum queue depth (messages). Determines overflow risk.
  • drain-rate: Messages consumed per second. Downstream RPS is capped at this rate.
  • retention-hours: How long messages are retained. Informational, shown in the node card.
  • partitions: Number of partitions. Informational, shown in the node card.

How queues transform traffic:

  1. RPS capping — Downstream components receive at most drain-rate RPS, regardless of how much traffic the queue receives. If 5,000 RPS arrive but drain-rate is 1,000, only 1,000 RPS flow to workers.

  2. Overflow computation — When inbound RPS exceeds drain rate, the queue fills:

    fill_rate     = max(0, inbound_rps - drain_rate)
    time_to_overflow = buffer / fill_rate   (in seconds)

    If buffer 50000 and fill_rate 4000, the queue overflows in 12.5 seconds.

  3. Latency boundary — Queues reset the cumulative latency chain. Downstream components don’t inherit the producer’s latency. Instead, queue wait time becomes the new baseline:

    wait_time_ms = (fill_rate / drain_rate) × 1000

  4. Availability decoupling — The producer side and consumer side have independent availability. A queue absorbs producer-side overload without propagating it downstream.

Important: buffer is mutually exclusive with max-rps. A component is either a queue or a standard service. DGMO warns if you mix them.


Groups

Groups represent clusters, pods, or replica sets — a set of components that scale together. Wrap components in [Group Name] brackets:

API cluster with 3 instances
infra

Edge
  rps 10000
  -> LB

LB
  -/api-> [API Cluster] | split: 70%
  -/static-> StaticServer | split: 30%

[API Cluster]
  instances 3
  APIServer
    max-rps 500
    latency-ms 45
    -> DB

DB
  latency-ms 10
  uptime 99.99%

StaticServer
  cache-hit 95%
  latency-ms 2

Rendered: Edge (RPS 10.0k, p90 55ms, availability 33.6%) → LB (RPS 10.0k, p90 55ms, availability 33.6%) → API Cluster (3x): APIServer (RPS 7.0k / 1.5k, p90 55ms, availability 21.4%) → DB (RPS 7.0k, p90 10ms, availability 99.99%); /static → StaticServer (RPS 3.0k, p90 2ms)

Group syntax

[API Cluster]
  instances 3          # group-level instance count
  APIServer            # component inside the group
    max-rps 500
    latency-ms 45

The group’s instances property acts as a multiplier on child components’ capacity. If APIServer has max-rps 500 and the group has instances 3, total capacity is 500 × 3 = 1,500 RPS.

Connecting to groups

You can connect directly to a group. Traffic is distributed to the group’s children:

LB
  -> [API Cluster]

Group capacity with multiple children

When a group contains multiple components in a chain (e.g., API → DB), the group’s effective capacity is the bottleneck — the minimum capacity among its children:

[Backend Pod]
  instances 3
  API               # max-rps 500 → 500 per instance
    max-rps 500
    -> Cache
  Cache              # max-rps 2000 → 2000 per instance
    max-rps 2000

The pod’s effective capacity is 500 × 3 = 1,500 (bottlenecked on API, not Cache).
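
A sketch of the bottleneck rule (illustrative, assuming each child's max-rps is per instance):

function groupCapacity(childMaxRps: number[], groupInstances: number): number {
  const bottleneck = Math.min(...childMaxRps); // the slowest child limits the chain
  return bottleneck * groupInstances;
}

groupCapacity([500, 2000], 3); // 1500: bottlenecked on API, not Cache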

Drain-rate scaling in groups

For queues inside groups, drain-rate scales with group instances (more consumers = faster draining), but buffer does not scale (fixed capacity per queue).


Tags

Tags add metadata dimensions to components — team ownership, environment, region, or any categorization. Tags appear as colored badges and are filterable in the legend.

Team ownership tags
infra

tag Team alias t
  Backend(blue)
  Platform(teal) default
  Data(violet)

Edge
  rps 10000
  -> CDN

CDN | t: Platform
  cache-hit 70%
  -> LB

LB | t: Platform
  -> API

API | t: Backend
  max-rps 2000
  latency-ms 40
  -> DB

DB | t: Data
  latency-ms 8
  uptime 99.99%

Tag syntax

tag Team alias t             # declare tag group, alias "t"
  Backend(blue)             # tag value with color
  Platform(teal) default    # "default" auto-applies to untagged components
  Data(violet)

Then assign tags inline on component declarations using the alias:

APIServer | t: Backend      # pipe syntax, using alias "t"
CDN | t: Platform

Aliases

The alias keyword provides a shorthand for inline tag assignment. tag Team alias t lets you write | t: Backend instead of | Team: Backend.

Default values

Adding default after a tag value auto-applies it to any component that doesn’t explicitly set that tag group. In the example above, any component without | t: <value> is automatically tagged as Platform.


Scenarios

Scenarios let you define alternative configurations to simulate different load conditions — peak traffic, Black Friday, cache failures, outages. Each scenario overrides specific properties on specific components:

Diagram with peak-traffic and cache-miss scenarios
infra

Edge
  rps 10000
  -> CDN

CDN
  cache-hit 80%
  -> API

API
  instances 2
  max-rps 500
  latency-ms 40

scenario peak-traffic
  Edge
    rps 50000
  API
    instances 6

scenario cache-miss
  CDN
    cache-hit 20%

Rendered: Edge (RPS 10.0k, p90 40ms, availability 50.0%) → CDN (RPS 10.0k, p90 40ms, availability 50.0%) → API (RPS 2.0k / 1.0k, p90 40ms, availability 50.0%, 2x; scenario: 6x)

Scenario syntax

scenario peak-traffic
  Edge
    rps 50000             # override edge RPS
  API
    instances 6           # scale up instances

scenario cache-miss
  CDN
    cache-hit 20%         # simulate cache degradation

Each scenario block lists component names with indented property overrides. When a scenario is active, those properties replace the base values, and all downstream metrics recompute.

In the desktop app, scenarios appear in a dropdown — select one to see how your architecture handles that load profile. In the online editor and CLI, the base configuration renders by default.
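
Conceptually, an active scenario behaves like a shallow per-component merge over the base configuration. An illustrative sketch (not DGMO's implementation):

type Props = Record<string, number | string>;
type Config = Record<string, Props>; // component name → its properties

function applyScenario(base: Config, overrides: Config): Config {
  const merged: Config = {};
  for (const name of Object.keys(base)) {
    merged[name] = { ...base[name], ...(overrides[name] ?? {}) }; // override wins per property
  }
  return merged;
}

const baseConfig: Config = { Edge: { rps: 10_000 }, API: { instances: 2, "max-rps": 500 } };
const peak: Config = { Edge: { rps: 50_000 }, API: { instances: 6 } };
applyScenario(baseConfig, peak);
// Edge.rps → 50000, API.instances → 6, API["max-rps"] stays 500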


How calculations work

DGMO doesn’t just draw boxes and arrows. It runs a full traffic simulation through your architecture graph. Here’s exactly what it computes and how.

RPS propagation

Traffic flows from the edge entry point through the graph via breadth-first traversal. At each node, behavioral properties transform the RPS before it reaches downstream components:

  1. Start: Edge node’s rps value is the total inbound traffic
  2. At each component, apply behaviors in order:
    • Cache: rps = rps × (1 - cache_hit / 100)
    • Firewall: rps = rps × (1 - firewall_block / 100)
    • Rate limiter: rps = min(rps, ratelimit_rps)
    • Queue: rps = min(rps, drain_rate × group_instances)
  3. Split: Distribute the post-behavior RPS across outbound edges by split percentage
  4. Accumulate: If a node receives traffic from multiple sources, RPS values are summed

Example trace:

Edge (rps 100,000)
  → CDN (cache-hit 80%) → forwards 20,000 RPS
    → WAF (firewall-block 5%) → forwards 19,000 RPS
      → LB → splits:
        - /api (60%) → API receives 11,400 RPS
        - /static (40%) → Static receives 7,600 RPS
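
Putting those rules together, here is an illustrative sketch that walks the graph in topological order, summing fan-in before a node is processed; it reproduces the trace above (all names are hypothetical, not DGMO's internals):

interface NodeProps { cacheHit?: number; firewallBlock?: number; ratelimitRps?: number }
interface Link { from: string; to: string; split: number } // split in percent

function propagate(nodes: Record<string, NodeProps>, links: Link[], entry: string, entryRps: number) {
  const indegree: Record<string, number> = {};
  for (const name of Object.keys(nodes)) indegree[name] = 0;
  for (const l of links) indegree[l.to] += 1;
  const inbound: Record<string, number> = { [entry]: entryRps };
  const ready = Object.keys(nodes).filter((n) => indegree[n] === 0);
  while (ready.length > 0) {
    const name = ready.shift()!;
    const props = nodes[name];
    let out = inbound[name] ?? 0;
    if (props.cacheHit !== undefined) out *= 1 - props.cacheHit / 100;
    if (props.firewallBlock !== undefined) out *= 1 - props.firewallBlock / 100;
    if (props.ratelimitRps !== undefined) out = Math.min(out, props.ratelimitRps);
    for (const l of links) {
      if (l.from !== name) continue;
      inbound[l.to] = (inbound[l.to] ?? 0) + out * (l.split / 100); // fan-in sums
      if (--indegree[l.to] === 0) ready.push(l.to); // process once all parents are done
    }
  }
  return inbound;
}

const rps = propagate(
  { Edge: {}, CDN: { cacheHit: 80 }, WAF: { firewallBlock: 5 }, LB: {}, API: {}, Static: {} },
  [
    { from: "Edge", to: "CDN", split: 100 },
    { from: "CDN", to: "WAF", split: 100 },
    { from: "WAF", to: "LB", split: 100 },
    { from: "LB", to: "API", split: 60 },
    { from: "LB", to: "Static", split: 40 },
  ],
  "Edge", 100_000,
);
// rps.API === 11400, rps.Static === 7600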

Latency computation

Latency accumulates along the worst-case path from edge to each component:

  1. Each component adds its latency-ms value (or default-latency-ms, or 0)
  2. If a component has multiple incoming paths, DGMO takes the maximum incoming latency (worst case)
  3. Queue nodes reset the latency chain — downstream latency starts from queue wait time, not from the producer’s cumulative latency
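
The per-node rule is compact (an illustrative sketch):

function nodeLatency(incomingMs: number[], ownMs: number, isQueue: boolean, queueWaitMs = 0): number {
  if (isQueue) return queueWaitMs; // queues reset the chain to their wait time
  const worstIncoming = incomingMs.length > 0 ? Math.max(...incomingMs) : 0;
  return worstIncoming + ownMs; // worst-case incoming path plus own latency
}

nodeLatency([5, 12], 40, false);    // 52: the slower incoming path wins
nodeLatency([5000], 0, true, 4000); // 4000: downstream restarts from queue wait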

Percentile computation: DGMO computes p50, p90, and p99 latency by collecting all leaf-to-edge paths, weighting each by its traffic proportion:

  • For normal components: one path per leaf, with cumulative latency
  • For serverless with cold starts: the path splits into a 95% warm path and a 5% cold path (warm = duration-ms, cold = duration-ms + cold-start-ms)
  • Paths are sorted by latency and weighted by traffic volume
  • p50/p90/p99 are interpolated from the cumulative weight distribution

This means cold starts primarily show up in p99, and high-traffic paths have more weight in overall percentiles — matching real-world latency distributions.
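
A sketch of the weighted-percentile walk, using step interpolation for simplicity (DGMO's exact interpolation may differ):

interface WeightedPath { latencyMs: number; weight: number } // weight = traffic share, sums to 1

function percentile(paths: WeightedPath[], p: number): number {
  const sorted = [...paths].sort((a, b) => a.latencyMs - b.latencyMs);
  let cumulative = 0;
  for (const path of sorted) {
    cumulative += path.weight;
    if (cumulative >= p / 100) return path.latencyMs;
  }
  return sorted[sorted.length - 1].latencyMs;
}

// The 95% warm / 5% cold serverless split from earlier:
const paths = [{ latencyMs: 200, weight: 0.95 }, { latencyMs: 1000, weight: 0.05 }];
percentile(paths, 50); // 200: cold starts are invisible at the median
percentile(paths, 99); // 1000: they dominate the tail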

Availability computation

Availability is computed in three steps:

1. Uptime propagation (path-based): The product of all uptime values along the path from edge to each node. This represents the probability that all components in the chain are operational:

path_uptime = ∏(component_uptime / 100) for each component in the path

If multiple paths converge, DGMO takes the minimum (most conservative).

2. Local availability (load-dependent): Each component’s local availability depends on its current load relative to capacity:

  • Normal (under capacity): local_availability = 1.0
  • Overloaded (over capacity): local_availability = capacity / inbound_rps
    • A component with 500 capacity receiving 1,000 RPS has 50% local availability
  • Rate-limited: local_availability = ratelimit_rps / effective_inbound_rps
  • Queue overflow risk: If the queue fills within 60 seconds, availability degrades proportionally to drain_rate / inbound_rps

3. Compound availability: The product of all local availabilities along the path from edge:

compound_availability = ∏(local_availability) for each node in the path

Queue decoupling: Queues reset the availability chain. The consumer side doesn’t inherit the producer’s overload — it only sees the queue’s own availability.
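
A sketch of both layers combined, assuming the baseline uptime product and the local-availability product simply multiply into the final path score (an assumption; the docs describe the layers separately):

interface Hop { uptime: number; localAvailability: number } // uptime in percent

function pathAvailability(hops: Hop[]): number {
  const baseline = hops.reduce((p, h) => p * (h.uptime / 100), 1); // uptime propagation
  const load = hops.reduce((p, h) => p * h.localAvailability, 1);  // load-dependent layer
  return baseline * load;
}

// Healthy chain from the uptime example: 99.9% × 99.95% × 99.99% ≈ 99.84%
pathAvailability([
  { uptime: 99.9, localAvailability: 1 },
  { uptime: 99.95, localAvailability: 1 },
  { uptime: 99.99, localAvailability: 1 },
]); // ≈ 0.9984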

Circuit breaker logic

Circuit breakers have three states:

  • Closed: error rate below threshold and latency below threshold. Normal operation.
  • Open: error rate ≥ cb-error-threshold, or cumulative latency > cb-latency-threshold-ms. The component is tripped, shown with a dashed border.
  • Half-open: not currently modeled; DGMO uses closed/open only.

Error rate derivation:

capacity = serverless ? (concurrency / duration_s) : (max_rps × instances × group_mul)
error_rate = max(0, (computed_rps - capacity) / computed_rps × 100)

The circuit breaker trips when the overload-derived error rate exceeds the threshold. This means a component at 2× capacity with cb-error-threshold 50% will trip (error rate = 50%).
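
In code, the trigger logic might look like this (illustrative names):

interface BreakerInput {
  computedRps: number;
  capacity: number;            // max-rps × instances, or concurrency-derived
  cumulativeLatencyMs: number;
  errorThreshold?: number;     // cb-error-threshold, percent
  latencyThresholdMs?: number; // cb-latency-threshold-ms
}

function breakerState(i: BreakerInput): "closed" | "open" {
  const errorRate = Math.max(0, (i.computedRps - i.capacity) / i.computedRps) * 100;
  const errorTrip = i.errorThreshold !== undefined && errorRate >= i.errorThreshold;
  const latencyTrip = i.latencyThresholdMs !== undefined && i.cumulativeLatencyMs > i.latencyThresholdMs;
  return errorTrip || latencyTrip ? "open" : "closed"; // either trigger opens the breaker
}

breakerState({ computedRps: 1000, capacity: 500, cumulativeLatencyMs: 40, errorThreshold: 50 });
// "open": 2× capacity yields a 50% error rate, which meets the 50% threshold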

Queue metrics

Queues compute three additional metrics:

  • Fill rate: max(0, inbound_rps - drain_rate). How fast the buffer fills (msg/s).
  • Time to overflow: buffer / fill_rate (if fill_rate > 0). Seconds until the queue is full.
  • Wait time: (fill_rate / drain_rate) × 1000. Milliseconds a message waits in the queue.

If fill_rate is 0 (drain keeps up), time to overflow is infinite and wait time is 0. The queue is healthy.

If time_to_overflow < 60 seconds, DGMO marks the queue as at risk and degrades its availability score.
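
All three metrics plus the 60-second risk flag fit in a few lines (illustrative sketch):

function queueMetrics(inboundRps: number, drainRate: number, buffer: number) {
  const fillRate = Math.max(0, inboundRps - drainRate);          // msg/s into the backlog
  const timeToOverflowS = fillRate > 0 ? buffer / fillRate : Infinity;
  const waitTimeMs = (fillRate / drainRate) * 1000;
  return { fillRate, timeToOverflowS, waitTimeMs, atRisk: timeToOverflowS < 60 };
}

queueMetrics(5_000, 1_000, 50_000);
// { fillRate: 4000, timeToOverflowS: 12.5, waitTimeMs: 4000, atRisk: true }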

Percentile computation

DGMO computes p50, p90, and p99 for both latency and availability using weighted path distributions:

  1. Collect all paths from edge to leaf nodes via depth-first traversal
  2. Each path carries a weight proportional to its traffic volume (from RPS splits)
  3. Sort paths by the metric (latency ascending, availability ascending)
  4. Walk the sorted list, accumulating weights until the target percentile threshold:
    • p50: cumulative weight reaches 50%
    • p90: cumulative weight reaches 90%
    • p99: cumulative weight reaches 99%
  5. Interpolate the metric value at that threshold

This is computed both system-wide (from the edge node) and per-node (from each individual component’s downstream paths).


Diagram options

Set global defaults at the top of your diagram:

infra My System
direction-tb
default-latency-ms 10
default-uptime 99.9
no-animate

  • direction-tb (default: off, left-to-right): layout direction. Omit for left-to-right, add for top-to-bottom.
  • default-latency-ms N (default: 0): latency applied to components without an explicit latency-ms.
  • default-uptime N (default: 100): uptime percentage applied to components without an explicit uptime.
  • animate / no-animate (default: animate): flow animation particles.

Validation

DGMO validates your diagram and reports diagnostics for common issues:

  • Cycle detection (error): circular connections (A → B → A). Infra diagrams must be DAGs.
  • Split sum (warning): split percentages that don’t add up to 100%.
  • Orphan detection (warning): components not reachable from the edge entry point.
  • Overload (warning): components receiving more RPS than their capacity.
  • Rate-limit excess (warning): inbound RPS exceeding the rate limiter threshold.
  • System uptime (warning): overall system uptime below the 99% SLA threshold.
  • Property conflicts (warning): mixing incompatible properties (e.g., concurrency with instances).
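
Cycle detection, for example, is a standard depth-first search with a recursion stack; an illustrative sketch:

function findCycle(edges: Record<string, string[]>): boolean {
  const state: Record<string, "visiting" | "done"> = {};
  const visit = (node: string): boolean => {
    if (state[node] === "visiting") return true; // back edge, so a cycle exists
    if (state[node] === "done") return false;
    state[node] = "visiting";
    for (const next of edges[node] ?? []) if (visit(next)) return true;
    state[node] = "done";
    return false;
  };
  return Object.keys(edges).some(visit);
}

findCycle({ A: ["B"], B: ["A"] }); // true: A → B → A is rejected with an error
findCycle({ A: ["B"], B: [] });    // false: a valid DAG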

Property reference

All component properties at a glance:

  • rps (number; edge only): total inbound requests per second
  • cache-hit (percentage; any): fraction of traffic served from cache, not forwarded
  • firewall-block (percentage; any): fraction of traffic dropped (blocked)
  • ratelimit-rps (number; any): maximum RPS forwarded; excess is rejected
  • max-rps (number; non-queue): per-instance maximum throughput
  • instances (number or range; non-serverless): replica count (e.g., 3 or 1-8)
  • latency-ms (number; any): per-component response time in milliseconds
  • uptime (percentage; any): component reliability (e.g., 99.99%)
  • cb-error-threshold (percentage; any): circuit breaker trips when the error rate exceeds this
  • cb-latency-threshold-ms (number; any): circuit breaker trips when cumulative latency exceeds this
  • concurrency (number; serverless): maximum concurrent executions
  • duration-ms (number; serverless): average execution time per invocation
  • cold-start-ms (number; serverless): additional latency on cold invocations
  • buffer (number; queue): maximum queue depth (messages)
  • drain-rate (number; queue): messages consumed per second
  • retention-hours (number; queue): message retention duration (informational)
  • partitions (number; queue): number of partitions (informational)

Mutual exclusions:

  • concurrency cannot be combined with instances or max-rps (serverless vs. traditional)
  • buffer cannot be combined with max-rps (queue vs. request/response)

Putting it all together

Here’s a complete production-grade example combining caching, firewall, rate limiting, load balancing, pod groups, dynamic scaling, and team tags:

Full e-commerce infrastructure
infra E-Commerce Platform

tag Team alias t
  Backend(blue)
  Platform(teal) default
  Data(violet)

Edge
  rps 100000
  -> CloudFront

CloudFront | t: Platform
  cache-hit 80%
  -> WAF

WAF | t: Platform
  firewall-block 5%
  -> ALB

ALB | t: Platform
  -/api-> [API Pods] | split: 60%
  -/purchase-> [Commerce Pods] | split: 30%
  -/static-> StaticServer | split: 10%

[API Pods]
  instances 3
  APIServer | t: Backend
    max-rps 500
    latency-ms 45
    cb-error-threshold 50%

[Commerce Pods]
  PurchaseMS | t: Backend
    instances 1-8
    max-rps 300
    latency-ms 120

StaticServer | t: Platform
  latency-ms 5

This diagram models:

  • 100K RPS at the edge, reduced to 20K after CDN caching, then 19K after WAF filtering
  • Three traffic paths through the ALB: API (60%), Commerce (30%), Static (10%)
  • API Pods with 3 instances at 500 RPS each = 1,500 total capacity
  • Commerce Pods with dynamic scaling from 1-8 instances
  • Team ownership via the t tag, visualized in the legend

Every computed metric — downstream RPS, latency percentiles, availability, overload detection — updates based on these declarations.

Try it yourself

  1. Online Editor — select “Infrastructure” from the sidebar to start with a template
  2. CLI — render from the terminal: dgmo diagram.dgmo -o infra.png
  3. Desktop app — full editor with live preview, scenario switching, and click-to-source navigation