Infrastructure Diagrams: Complete Reference
Infrastructure diagrams model your system topology as a traffic flow graph. You declare components, wire them together, set capacity and behavioral properties, and DGMO computes the rest — RPS distribution, latency percentiles, availability, circuit breaker states, queue overflow risk, and more.
Unlike static architecture diagrams, infra diagrams are live simulations. Change the entry RPS or flip a scenario, and every downstream metric updates instantly.
Quick start
The simplest useful infra diagram: an edge entry point, a CDN with caching, and an API server.
infra
Edge
rps 1000
-> CDN
CDN
cache-hit 60%
-> API
API
max-rps 500
latency-ms 30

Three components, three properties. DGMO computes:
- CDN receives 1,000 RPS, serves 60% from cache, forwards 400 RPS downstream
- API receives 400 RPS against a 500 RPS capacity — headroom is visible
- Latency accumulates: CDN → API, so end-to-end latency includes both hops
Table of contents
Core concepts
- Entry point (Edge) — where traffic enters the system
- Components — nodes in your architecture
- Connections — wiring traffic flow
- Traffic splits — distributing load across paths
Component properties
- Cache (cache-hit) — CDN and cache layers
- Firewall (firewall-block) — WAF and security filters
- Rate limiting (ratelimit-rps) — throttling inbound traffic
- Capacity (max-rps, instances) — throughput and scaling
- Dynamic scaling (instances min-max) — auto-scaling ranges
- Latency (latency-ms) — per-component response time
- Uptime (uptime) — component reliability
- Circuit breakers (cb-error-threshold, cb-latency-threshold-ms) — failure protection
- Serverless (concurrency, duration-ms, cold-start-ms) — Lambda/function compute
- Queues (buffer, drain-rate, retention-hours, partitions) — message queue modeling
Organization
- Groups — clusters, pods, and replica sets
- Tags — team ownership and categorization
- Scenarios — simulate different load conditions
How calculations work
- RPS propagation — how traffic flows through the graph
- Latency computation — cumulative latency and percentiles
- Availability computation — uptime × local availability
- Circuit breaker logic — when breakers trip
- Queue metrics — fill rate, overflow, and wait time
- Percentile computation — how p50/p90/p99 are derived
Reference
- Diagram options — global defaults
- Validation and diagnostics — what DGMO checks for you
- Property quick reference — all properties in one table
Entry point
Every infra diagram needs exactly one edge entry point — the source of all inbound traffic. Name a component Edge or Internet and give it an rps property:
infra
Edge
rps 50000
-> Gateway

The rps property is only valid on the edge node. It represents total inbound requests per second entering your system. All downstream RPS values are computed from this single number.
Components
A component is any named node in your architecture — a server, database, cache, queue, or service. Write the component name on its own line, then indent properties below it:
APIServer
max-rps 500
latency-ms 30
uptime 99.95%
Component names must start with a letter or underscore and can contain letters, numbers, and underscores. You don’t declare a component’s “type” — its role is inferred from its properties. A component with cache-hit is a cache. One with buffer is a queue. One with concurrency is serverless.
Connections
Connect components with arrow syntax. A bare -> sends all traffic; a labeled arrow -label-> adds a route annotation:
infra
Edge
rps 10000
-> LB
LB
-/api-> APIServer
-/web-> WebServer

Connections define the directed acyclic graph (DAG) that traffic flows through. Cycles are not allowed — DGMO validates this and reports an error if it detects a loop.
Connection syntax
-> Target # unlabeled connection
-/api-> Target # labeled connection
-/api-> Target | split: 60% # labeled with explicit split
-> [Group Name] # connect to a group
-route-> [Group Name] | split: 40% # labeled connection to group with split
Traffic splits
When a component has multiple outbound connections, traffic is distributed across them. You can declare explicit percentages or let DGMO distribute evenly:
infra
Edge
rps 10000
-> LB
LB
-/api-> APIServer | split: 70%
-/static-> CDN | split: 30%
APIServer
max-rps 800
latency-ms 40
CDN
cache-hit 90%
latency-ms 5

Split rules:
- All declared — percentages must sum to 100% (DGMO warns if they don’t)
- None declared — traffic splits evenly (2 targets = 50/50, 3 targets = 33/33/34)
- Some declared, some not — undeclared targets share the remainder equally
LB
-/api-> API | split: 60% # 60% of LB output
-/web-> Web | split: 30% # 30% of LB output
-/health-> Health # gets remaining 10%
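These rules are mechanical enough to sketch in a few lines. The Python snippet below is illustrative only (the function name and data shapes are mine, not DGMO's); it distributes a node's output RPS across targets when splits are partially declared:

```python
# A minimal sketch of the split rules above (illustrative, not DGMO source).
def distribute(rps: float, splits: list) -> list:
    """splits: declared percentages, or None for undeclared targets."""
    declared = [s for s in splits if s is not None]
    remainder = 100.0 - sum(declared)           # left over for undeclared targets
    undeclared = splits.count(None)
    share = remainder / undeclared if undeclared else 0.0
    return [rps * (s if s is not None else share) / 100.0 for s in splits]

# The LB example above: 60% api, 30% web, remainder to health.
print(distribute(10_000, [60, 30, None]))  # [6000.0, 3000.0, 1000.0]
```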
Component properties
Properties define what a component does to traffic passing through it. Each property maps to a specific behavior in the traffic simulation.
Cache
Property: cache-hit <percentage>
A cache layer absorbs a fraction of inbound traffic before it reaches downstream components. The cache-hit percentage is the fraction of requests served directly from cache.
infra
Edge
rps 100000
-> CDN
CDN
cache-hit 80%
-> AppServer
AppServer
max-rps 5000
latency-ms 50

How it works: If a component receives 100,000 RPS with cache-hit 80%, only 20,000 RPS flow downstream. The remaining 80,000 are served from cache and never reach backend services.
Computed effect on downstream RPS:
downstream_rps = inbound_rps × (1 - cache_hit / 100)
Firewall
Property: firewall-block <percentage>
A firewall or WAF drops a percentage of inbound traffic (malicious requests, bot traffic, blocked IPs). Blocked traffic is removed from the flow entirely.
infra
Edge
rps 50000
-> WAF
WAF
firewall-block 8%
-> Gateway
Gateway
ratelimit-rps 10000
-> API
API
max-rps 5000
latency-ms 45

Computed effect on downstream RPS:
downstream_rps = inbound_rps × (1 - firewall_block / 100)
Cache and firewall effects compose multiplicatively. If traffic passes through a cache (cache-hit 80%) then a firewall (firewall-block 5%), only 20% × 95% = 19% of original traffic reaches downstream.
Rate limiting
Property: ratelimit-rps <number>
A rate limiter caps throughput at a fixed RPS threshold. Excess traffic is rejected.
Gateway
ratelimit-rps 10000
-> API
Computed effect:
downstream_rps = min(effective_inbound_rps, ratelimit_rps)
Where effective_inbound_rps is the RPS after cache and firewall reductions. If 15,000 RPS arrive after cache/firewall and ratelimit-rps is 10,000, only 10,000 flow downstream and 5,000 are rejected.
Rate limiting also affects availability — rejected traffic reduces the availability score proportionally.
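Taken together, the cache, firewall, and rate-limit formulas compose in sequence. Here is a hedged sketch of that composition, following the formulas above (parameter names are illustrative):

```python
# Composes the three reduction formulas above in order (illustrative).
def effective_rps(inbound: float, cache_hit: float = 0.0,
                  firewall_block: float = 0.0, ratelimit=None) -> float:
    rps = inbound * (1 - cache_hit / 100)       # cache absorbs hits
    rps = rps * (1 - firewall_block / 100)      # firewall drops blocked traffic
    if ratelimit is not None:
        rps = min(rps, ratelimit)               # rate limiter caps the rest
    return rps

# 80% cache hit, then 5% firewall block: 20% x 95% = 19% reaches downstream.
print(effective_rps(100_000, cache_hit=80, firewall_block=5))  # 19000.0
```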
Capacity
Properties: max-rps <number>, instances <number>
These define a component’s throughput capacity. max-rps is the per-instance maximum. instances multiplies it:
infra
Edge
rps 3000
-> LB
LB
-> API
API
instances 3
max-rps 400
latency-ms 30

Total capacity formula:
total_capacity = max_rps × instances
When computed RPS exceeds total capacity, the component is overloaded. DGMO flags this visually (red indicators) and in diagnostics. Overload also reduces availability.
If instances is omitted, it defaults to 1. If max-rps is omitted, the component has unlimited capacity.
Dynamic scaling
Property: instances <min>-<max> (range syntax)
When you specify a range like instances 1-8, DGMO computes the number of instances needed to handle current load:
infra
Edge
rps 5000
-> LB
LB
-> API
API
instances 1-8
max-rps 300
latency-ms 25

Scaling formula:
needed = ceil(computed_rps / max_rps)
actual = clamp(needed, min, max)
If the API receives 5,000 RPS with max-rps 300 and instances 1-8:
- needed = ceil(5000 / 300) = 17
- actual = clamp(17, 1, 8) = 8
- Total capacity = 300 × 8 = 2,400 — still overloaded at 5,000 RPS
This lets you model auto-scaling behavior realistically, including cases where scaling maxes out.
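A quick sketch of the scaling math, assuming nothing beyond the formula above (the names are illustrative):

```python
import math

# clamp(ceil(load / per_instance), min, max), per the scaling formula above.
def scale(computed_rps: float, max_rps: float, lo: int, hi: int):
    needed = math.ceil(computed_rps / max_rps)
    actual = max(lo, min(needed, hi))       # clamp to the declared range
    capacity = max_rps * actual
    return actual, capacity, computed_rps > capacity  # (instances, cap, overloaded?)

# The example above: 5,000 RPS against max-rps 300, instances 1-8.
print(scale(5000, 300, 1, 8))  # (8, 2400, True): maxed out, still overloaded
```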
Latency
Property: latency-ms <number>
Per-component response time in milliseconds. Latency accumulates along the path from edge to leaf:
CDN
latency-ms 5
-> API
API
latency-ms 40
-> DB
DB
latency-ms 8
A request traversing CDN → API → DB has cumulative latency of 5 + 40 + 8 = 53ms.
If omitted, a component contributes 0ms latency (or the default-latency-ms value if set — see diagram options).
Uptime
Property: uptime <percentage>
Component reliability as a percentage. Uptime propagates along paths — the end-to-end uptime of a chain is the product of individual uptimes:
infra
default-uptime 99.9
Edge
rps 1000
-> API
API
max-rps 2000
latency-ms 30
uptime 99.95%
-> DB
DB
latency-ms 5
uptime 99.99%

end_to_end_uptime = 99.9% × 99.95% × 99.99% ≈ 99.84%
If omitted, a component’s uptime defaults to 100% (or default-uptime if set globally). Uptime feeds into the availability computation as the baseline before load-dependent degradation.
Circuit breakers
Properties: cb-error-threshold <percentage>, cb-latency-threshold-ms <number>
Circuit breakers protect downstream services by tripping when failure conditions are met. DGMO models three states: closed (normal), open (tripped), and half-open (recovering).
infra
Edge
rps 5000
-> Gateway
Gateway
-> API
API
max-rps 300
instances 2
latency-ms 40
cb-error-threshold 50%

Error-rate trigger:
error_rate = (computed_rps - capacity) / computed_rps × 100
if error_rate ≥ cb_error_threshold → state = 'open'
The error rate is derived from overload — if a component receives more RPS than its capacity, the excess is treated as errors. When the error rate exceeds the threshold, the circuit breaker opens.
Latency trigger:
if cumulative_latency_ms > cb_latency_threshold_ms → state = 'open'
If the total latency accumulated up to this component exceeds the threshold, the breaker trips. This models timeout-based circuit breakers.
You can combine both triggers on the same component — the breaker opens if either condition is met.
Serverless
Properties: concurrency <number>, duration-ms <number>, cold-start-ms <number>
Serverless components (Lambda, Cloud Functions) use a different capacity model. Instead of instances × max-rps, capacity is derived from concurrency and execution duration:
infra
Edge
rps 2000
-> Gateway
Gateway
-> ProcessOrder
ProcessOrder
concurrency 1000
duration-ms 200
cold-start-ms 800

Capacity formula:
capacity_rps = concurrency / (duration_ms / 1000)
With concurrency 1000 and duration-ms 200:
capacity = 1000 / 0.2 = 5,000 RPS
Cold starts: When cold-start-ms is set, DGMO splits traffic into two paths for percentile computation:
- 95% warm path — latency = duration-ms
- 5% cold path — latency = duration-ms + cold-start-ms
This means cold starts primarily affect p99 latency, which matches real-world behavior. A function with duration-ms 200 and cold-start-ms 800 has p50 latency of ~200ms but p99 of ~1,000ms.
Important: concurrency is mutually exclusive with instances and max-rps. A component is either serverless (concurrency-based) or traditional (instance-based). DGMO warns if you mix them.
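Both serverless formulas fit in a short sketch. The 95/5 warm/cold split comes from the text above; the function names are mine, not DGMO's:

```python
# Capacity: concurrency divided by execution time in seconds (illustrative).
def serverless_capacity(concurrency: float, duration_ms: float) -> float:
    return concurrency / (duration_ms / 1000)

# Cold-start split used for percentile computation: (weight, latency_ms) pairs.
def latency_paths(duration_ms: float, cold_start_ms: float):
    return [
        (0.95, duration_ms),                  # warm invocations
        (0.05, duration_ms + cold_start_ms),  # cold invocations, dominate p99
    ]

print(serverless_capacity(1000, 200))  # 5000.0 RPS
print(latency_paths(200, 800))         # [(0.95, 200), (0.05, 1000)]
```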
Queues
Properties: buffer <number>, drain-rate <number>, retention-hours <number>, partitions <number>
Queues decouple producers from consumers. They have fundamentally different behavior from request/response components — they absorb traffic bursts and reset latency boundaries.
infra
Edge
rps 5000
-> API
API
max-rps 6000
latency-ms 20
-> OrderQueue
OrderQueue
buffer 50000
drain-rate 1000
retention-hours 72
-> Worker
Worker
instances 3
max-rps 400
latency-ms 100

Key properties:
| Property | What it does |
|---|---|
| buffer | Maximum queue depth (messages). Determines overflow risk. |
| drain-rate | Messages consumed per second. Downstream RPS is capped at this rate. |
| retention-hours | How long messages are retained. Informational, shown in the node card. |
| partitions | Number of partitions. Informational, shown in the node card. |
How queues transform traffic:
- RPS capping — Downstream components receive at most drain-rate RPS, regardless of how much traffic the queue receives. If 5,000 RPS arrive but drain-rate is 1,000, only 1,000 RPS flow to workers.
- Overflow computation — When inbound RPS exceeds drain rate, the queue fills:
  fill_rate = max(0, inbound_rps - drain_rate)
  time_to_overflow = buffer / fill_rate (in seconds)
  If buffer 50000 and fill_rate 4000, the queue overflows in 12.5 seconds.
- Latency boundary — Queues reset the cumulative latency chain. Downstream components don’t inherit the producer’s latency. Instead, queue wait time becomes the new baseline:
  wait_time_ms = (fill_rate / drain_rate) × 1000
- Availability decoupling — The producer side and consumer side have independent availability. A queue absorbs producer-side overload without propagating it downstream.
Important: buffer is mutually exclusive with max-rps. A component is either a queue or a standard service. DGMO warns if you mix them.
Groups
Groups represent clusters, pods, or replica sets — a set of components that scale together. Wrap components in [Group Name] brackets:
infra
Edge
rps 10000
-> LB
LB
-/api-> [API Cluster] | split: 70%
-/static-> StaticServer | split: 30%
[API Cluster]
instances 3
APIServer
max-rps 500
latency-ms 45
-> DB
DB
latency-ms 10
uptime 99.99%
StaticServer
cache-hit 95%
latency-ms 2

Group syntax
[API Cluster]
instances 3 # group-level instance count
APIServer # component inside the group
max-rps 500
latency-ms 45
The group’s instances property acts as a multiplier on child components’ capacity. If APIServer has max-rps 500 and the group has instances 3, total capacity is 500 × 3 = 1,500 RPS.
Connecting to groups
You can connect directly to a group. Traffic is distributed to the group’s children:
LB
-> [API Cluster]
Group capacity with multiple children
When a group contains multiple components in a chain (e.g., API → DB), the group’s effective capacity is the bottleneck — the minimum capacity among its children:
[Backend Pod]
instances 3
API # max-rps 500 → 500 per instance
max-rps 500
-> Cache
Cache # max-rps 2000 → 2000 per instance
max-rps 2000
The pod’s effective capacity is 500 × 3 = 1,500 (bottlenecked on API, not Cache).
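In other words, the group behaves like the minimum over its children, multiplied by the instance count. A one-line sketch (illustrative):

```python
# Group capacity = bottleneck child's per-instance max-rps x group instances.
def group_capacity(child_max_rps: list, instances: int) -> float:
    return min(child_max_rps) * instances

print(group_capacity([500, 2000], 3))  # 1500: bottlenecked on API, not Cache
```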
Drain-rate scaling in groups
For queues inside groups, drain-rate scales with group instances (more consumers = faster draining), but buffer does not scale (fixed capacity per queue).
Tags
Tags add metadata dimensions to components — team ownership, environment, region, or any categorization. Tags appear as colored badges and are filterable in the legend.
infra
tag Team alias t
Backend(blue)
Platform(teal) default
Data(violet)
Edge
rps 10000
-> CDN
CDN | t: Platform
cache-hit 70%
-> LB
LB | t: Platform
-> API
API | t: Backend
max-rps 2000
latency-ms 40
-> DB
DB | t: Data
latency-ms 8
uptime 99.99%

Tag syntax
tag Team alias t # declare tag group, alias "t"
Backend(blue) # tag value with color
Platform(teal) default # "default" auto-applies to untagged components
Data(violet)
Then assign tags inline on component declarations using the alias:
APIServer | t: Backend # pipe syntax, using alias "t"
CDN | t: Platform
Aliases
The alias keyword provides a shorthand for inline tag assignment. tag Team alias t lets you write | t: Backend instead of | Team: Backend.
Default values
Adding default after a tag value auto-applies it to any component that doesn’t explicitly set that tag group. In the example above, any component without | t: <value> is automatically tagged as Platform.
Scenarios
Scenarios let you define alternative configurations to simulate different load conditions — peak traffic, Black Friday, cache failures, outages. Each scenario overrides specific properties on specific components:
infra
Edge
rps 10000
-> CDN
CDN
cache-hit 80%
-> API
API
instances 2
max-rps 500
latency-ms 40
scenario peak-traffic
Edge
rps 50000
API
instances 6
scenario cache-miss
CDN
cache-hit 20%

Scenario syntax
scenario peak-traffic
Edge
rps 50000 # override edge RPS
API
instances 6 # scale up instances
scenario cache-miss
CDN
cache-hit 20% # simulate cache degradation
Each scenario block lists component names with indented property overrides. When a scenario is active, those properties replace the base values, and all downstream metrics recompute.
In the desktop app, scenarios appear in a dropdown — select one to see how your architecture handles that load profile. In the online editor and CLI, the base configuration renders by default.
How calculations work
DGMO doesn’t just draw boxes and arrows. It runs a full traffic simulation through your architecture graph. Here’s exactly what it computes and how.
RPS propagation
Traffic flows from the edge entry point through the graph via breadth-first traversal. At each node, behavioral properties transform the RPS before it reaches downstream components:
- Start: The edge node’s rps value is the total inbound traffic
- At each component, apply behaviors in order:
  - Cache: rps = rps × (1 - cache_hit / 100)
  - Firewall: rps = rps × (1 - firewall_block / 100)
  - Rate limiter: rps = min(rps, ratelimit_rps)
  - Queue: rps = min(rps, drain_rate × group_instances)
- Split: Distribute the post-behavior RPS across outbound edges by split percentage
- Accumulate: If a node receives traffic from multiple sources, RPS values are summed
Example trace:
Edge (rps 100,000)
→ CDN (cache-hit 80%) → forwards 20,000 RPS
→ WAF (firewall-block 5%) → forwards 19,000 RPS
→ LB → splits:
- /api (60%) → API receives 11,400 RPS
- /static (40%) → Static receives 7,600 RPS
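The whole pass can be sketched compactly. The snippet below is illustrative only (DGMO's real traversal also handles queues, serverless capacity, and groups); the text describes a breadth-first pass, and the sketch uses a topological order, which coincides with BFS on layered graphs and guarantees fan-in sums are complete before a node emits downstream:

```python
from collections import deque

def propagate(graph, props, edge, edge_rps):
    """graph: node -> [(target, split_fraction)]; props: node -> behaviors."""
    rps = {edge: edge_rps}
    for node in topo_order(graph, edge):
        out, p = rps.get(node, 0.0), props.get(node, {})
        if "cache_hit" in p:      out *= 1 - p["cache_hit"] / 100
        if "firewall_block" in p: out *= 1 - p["firewall_block"] / 100
        if "ratelimit_rps" in p:  out = min(out, p["ratelimit_rps"])
        for target, frac in graph.get(node, []):
            rps[target] = rps.get(target, 0.0) + out * frac  # fan-in sums
    return rps

def topo_order(graph, start):
    indeg, seen, stack = {}, {start}, [start]
    while stack:                               # in-degrees of reachable nodes
        n = stack.pop()
        for t, _ in graph.get(n, []):
            indeg[t] = indeg.get(t, 0) + 1
            if t not in seen:
                seen.add(t); stack.append(t)
    order, q = [], deque([start])
    while q:                                   # Kahn's algorithm
        n = q.popleft(); order.append(n)
        for t, _ in graph.get(n, []):
            indeg[t] -= 1
            if indeg[t] == 0:
                q.append(t)
    return order

graph = {"Edge": [("CDN", 1.0)], "CDN": [("WAF", 1.0)],
         "WAF": [("LB", 1.0)], "LB": [("API", 0.6), ("Static", 0.4)]}
props = {"CDN": {"cache_hit": 80}, "WAF": {"firewall_block": 5}}
print(propagate(graph, props, "Edge", 100_000))  # API: 11400.0, Static: 7600.0
```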
Latency computation
Latency accumulates along the worst-case path from edge to each component:
- Each component adds its latency-ms value (or default-latency-ms, or 0)
- If a component has multiple incoming paths, DGMO takes the maximum incoming latency (worst case)
- Queue nodes reset the latency chain — downstream latency starts from queue wait time, not from the producer’s cumulative latency
Percentile computation: DGMO computes p50, p90, and p99 latency by collecting all leaf-to-edge paths, weighting each by its traffic proportion:
- For normal components: one path per leaf, with cumulative latency
- For serverless with cold starts: the path splits into a 95% warm path and a 5% cold path (warm = duration-ms, cold = duration-ms + cold-start-ms)
- Paths are sorted by latency and weighted by traffic volume
- p50/p90/p99 are interpolated from the cumulative weight distribution
This means cold starts primarily show up in p99, and high-traffic paths have more weight in overall percentiles — matching real-world latency distributions.
Availability computation
Availability is computed in three layers:
1. Uptime propagation (path-based): The product of all uptime values along the path from edge to each node. This represents the probability that all components in the chain are operational:
path_uptime = ∏(component_uptime / 100) for each component in the path
If multiple paths converge, DGMO takes the minimum (most conservative).
2. Local availability (load-dependent): Each component’s local availability depends on its current load relative to capacity:
- Normal (under capacity): local_availability = 1.0
- Overloaded (over capacity): local_availability = capacity / inbound_rps — a component with 500 capacity receiving 1,000 RPS has 50% local availability
- Rate-limited: local_availability = ratelimit_rps / effective_inbound_rps
- Queue overflow risk: if the queue fills within 60 seconds, availability degrades proportionally to drain_rate / inbound_rps
3. Compound availability: The product of all local availabilities along the path from edge:
compound_availability = ∏(local_availability) for each node in the path
Queue decoupling: Queues reset the availability chain. The consumer side doesn’t inherit the producer’s overload — it only sees the queue’s own availability.
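Here is a sketch of layers 2 and 3; uptime propagation and queue decoupling are omitted for brevity, and all names are illustrative rather than DGMO's own:

```python
# Load-dependent local availability, per the rules above (illustrative).
def local_availability(inbound_rps, capacity=None, ratelimit=None):
    avail = 1.0
    if capacity is not None and inbound_rps > capacity:
        avail = min(avail, capacity / inbound_rps)   # overload degradation
    if ratelimit is not None and inbound_rps > ratelimit:
        avail = min(avail, ratelimit / inbound_rps)  # rejected traffic
    return avail

# Compound availability: product of local availabilities along the path.
def compound(path):
    """path: list of (inbound_rps, capacity) pairs from edge to node."""
    result = 1.0
    for inbound, cap in path:
        result *= local_availability(inbound, cap)
    return result

# A 500-capacity node at 1,000 RPS alone halves the path's availability.
print(compound([(1000, 2000), (1000, 500)]))  # 0.5
```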
Circuit breaker logic
Circuit breakers have three states:
| State | Condition | Effect |
|---|---|---|
| Closed | Error rate below threshold, latency below threshold | Normal operation |
| Open | Error rate ≥ cb-error-threshold, OR cumulative latency > cb-latency-threshold-ms | Component is tripped — shown as dashed border |
| Half-open | (not currently modeled — DGMO uses closed/open) | — |
Error rate derivation:
capacity = serverless ? (concurrency / duration_s) : (max_rps × instances × group_mul)
error_rate = max(0, (computed_rps - capacity) / computed_rps × 100)
The circuit breaker trips when the overload-derived error rate exceeds the threshold. This means a component at 2× capacity with cb-error-threshold 50% will trip (error rate = 50%).
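Both triggers fit in a few lines. A sketch under the stated rules (half-open is not modeled, per the table above; names are illustrative):

```python
# Returns 'open' if either trigger fires, else 'closed' (illustrative).
def breaker_state(computed_rps, capacity, cumulative_latency_ms,
                  error_threshold=None, latency_threshold_ms=None):
    error_rate = max(0.0, (computed_rps - capacity) / computed_rps * 100)
    if error_threshold is not None and error_rate >= error_threshold:
        return "open"                         # overload-derived error trigger
    if (latency_threshold_ms is not None
            and cumulative_latency_ms > latency_threshold_ms):
        return "open"                         # timeout-style latency trigger
    return "closed"

# 2x capacity with a 50% threshold: error rate is exactly 50%, so it trips.
print(breaker_state(1200, 600, 40, error_threshold=50))  # open
```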
Queue metrics
Queues compute three additional metrics:
| Metric | Formula | Meaning |
|---|---|---|
| Fill rate | max(0, inbound_rps - drain_rate) | How fast the buffer fills (msg/s) |
| Time to overflow | buffer / fill_rate (if fill_rate > 0) | Seconds until queue is full |
| Wait time | (fill_rate / drain_rate) × 1000 | Milliseconds a message waits in queue |
If fill_rate is 0 (drain keeps up), time to overflow is infinite and wait time is 0. The queue is healthy.
If time_to_overflow < 60 seconds, DGMO marks the queue as at risk and degrades its availability score.
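The three formulas translate directly into code. A sketch with illustrative names (the 60-second at-risk threshold comes from the text above):

```python
def queue_metrics(inbound_rps, drain_rate, buffer):
    fill_rate = max(0.0, inbound_rps - drain_rate)
    time_to_overflow = buffer / fill_rate if fill_rate > 0 else float("inf")
    wait_time_ms = (fill_rate / drain_rate) * 1000
    at_risk = time_to_overflow < 60           # DGMO's at-risk threshold
    return fill_rate, time_to_overflow, wait_time_ms, at_risk

# The OrderQueue example: 5,000 RPS in, drain-rate 1,000, buffer 50,000.
print(queue_metrics(5000, 1000, 50_000))  # (4000.0, 12.5, 4000.0, True)
```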
Percentile computation
DGMO computes p50, p90, and p99 for both latency and availability using weighted path distributions:
- Collect all paths from edge to leaf nodes via depth-first traversal
- Each path carries a weight proportional to its traffic volume (from RPS splits)
- Sort paths by the metric (latency ascending, availability ascending)
- Walk the sorted list, accumulating weights until the target percentile threshold:
- p50: cumulative weight reaches 50%
- p90: cumulative weight reaches 90%
- p99: cumulative weight reaches 99%
- Interpolate the metric value at that threshold
This is computed both system-wide (from the edge node) and per-node (from each individual component’s downstream paths).
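A sketch of the weighted walk follows. It does a step lookup at the threshold rather than true interpolation, and all names are illustrative:

```python
def percentile(paths, pct):
    """paths: list of (metric_value, traffic_weight); pct in [0, 100]."""
    ordered = sorted(paths)                   # ascending by metric value
    total = sum(w for _, w in ordered)
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight
        if cumulative / total * 100 >= pct:   # threshold reached
            return value
    return ordered[-1][0]

# The cold-start example: 95% of traffic at 200 ms, 5% at 1,000 ms.
paths = [(200, 0.95), (1000, 0.05)]
print(percentile(paths, 50), percentile(paths, 90), percentile(paths, 99))
# 200 200 1000: cold starts surface only at p99
```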
Diagram options
Set global defaults at the top of your diagram:
infra My System
direction-tb
default-latency-ms 10
default-uptime 99.9
no-animate
| Option | Default | Description |
|---|---|---|
| direction-tb | off (LR is default) | Layout direction: omit for left-to-right, add for top-to-bottom |
| default-latency-ms N | 0 | Latency applied to components without an explicit latency-ms |
| default-uptime N | 100 | Uptime percentage applied to components without explicit uptime |
| animate / no-animate | animate | Flow animation particles |
Validation
DGMO validates your diagram and reports diagnostics for common issues:
| Check | Type | What it catches |
|---|---|---|
| Cycle detection | Error | Circular connections (A → B → A). Infra diagrams must be DAGs. |
| Split sum | Warning | Split percentages that don’t add up to 100%. |
| Orphan detection | Warning | Components not reachable from the edge entry point. |
| Overload | Warning | Components receiving more RPS than their capacity. |
| Rate-limit excess | Warning | Inbound RPS exceeding the rate limiter threshold. |
| System uptime | Warning | Overall system uptime below 99% SLA threshold. |
| Property conflicts | Warning | Mixing incompatible properties (e.g., concurrency with instances). |
Property reference
All component properties in one table:
| Property | Type | Valid on | Behavior |
|---|---|---|---|
| rps | number | Edge only | Total inbound requests per second |
| cache-hit | percentage | Any | Fraction of traffic served from cache, not forwarded |
| firewall-block | percentage | Any | Fraction of traffic dropped (blocked) |
| ratelimit-rps | number | Any | Maximum RPS forwarded; excess rejected |
| max-rps | number | Non-queue | Per-instance maximum throughput |
| instances | number or range | Non-serverless | Replica count (e.g., 3 or 1-8) |
| latency-ms | number | Any | Per-component response time in milliseconds |
| uptime | percentage | Any | Component reliability (e.g., 99.99%) |
| cb-error-threshold | percentage | Any | Circuit breaker trips when error rate exceeds this |
| cb-latency-threshold-ms | number | Any | Circuit breaker trips when cumulative latency exceeds this |
| concurrency | number | Serverless | Maximum concurrent executions |
| duration-ms | number | Serverless | Average execution time per invocation |
| cold-start-ms | number | Serverless | Additional latency on cold invocations |
| buffer | number | Queue | Maximum queue depth (messages) |
| drain-rate | number | Queue | Messages consumed per second |
| retention-hours | number | Queue | Message retention duration (informational) |
| partitions | number | Queue | Number of partitions (informational) |
Mutual exclusions:
- concurrency cannot be combined with instances or max-rps (serverless vs. traditional)
- buffer cannot be combined with max-rps (queue vs. request/response)
Putting it all together
Here’s a complete production-grade example combining caching, firewall, rate limiting, load balancing, pod groups, dynamic scaling, and team tags:
infra E-Commerce Platform
tag Team alias t
Backend(blue)
Platform(teal) default
Data(violet)
Edge
rps 100000
-> CloudFront
CloudFront | t: Platform
cache-hit 80%
-> WAF
WAF | t: Platform
firewall-block 5%
-> ALB
ALB | t: Platform
-/api-> [API Pods] | split: 60%
-/purchase-> [Commerce Pods] | split: 30%
-/static-> StaticServer | split: 10%
[API Pods]
instances 3
APIServer | t: Backend
max-rps 500
latency-ms 45
cb-error-threshold 50%
[Commerce Pods]
PurchaseMS | t: Backend
instances 1-8
max-rps 300
latency-ms 120
StaticServer | t: Platform
latency-ms 5

This diagram models:
- 100K RPS at the edge, reduced to 20K after CDN caching, then 19K after WAF filtering
- Three traffic paths through the ALB: API (60%), Commerce (30%), Static (10%)
- API Pods with 3 instances at 500 RPS each = 1,500 total capacity
- Commerce Pods with dynamic scaling from 1-8 instances
- Team ownership via the t tag, visualized in the legend
Every computed metric — downstream RPS, latency percentiles, availability, overload detection — updates based on these declarations.
Try it yourself
- Online Editor — select “Infrastructure” from the sidebar to start with a template
- CLI — render from the terminal: dgmo diagram.dgmo -o infra.png
- Desktop app — full editor with live preview, scenario switching, and click-to-source navigation