
Infrastructure Diagrams: Complete Reference

Tags: infrastructure, reference, diagrams-as-code, traffic-simulation

Infrastructure diagrams model your system topology as a traffic flow graph. You declare components, wire them together, set capacity and behavioral properties, and DGMO computes the rest — RPS distribution, latency percentiles, availability, circuit breaker states, queue overflow risk, and more.

Unlike static architecture diagrams, infra diagrams are live simulations. Change the entry RPS or flip a scenario, and every downstream metric updates instantly.

Quick start

The simplest useful infra diagram: an edge entry point, a CDN with caching, and an API server.

Minimal infrastructure diagram
infra

Edge
  rps 1000
  -> CDN

CDN
  cache-hit 60%
  -> API

API
  max-rps 500
  latency-ms 30

Rendered: Edge (RPS 1.0k, p90 30ms) → CDN (RPS 1.0k, p90 30ms) → API (RPS 400 / 500, p90 30ms)

Three components, three properties. DGMO computes:

  • CDN receives 1,000 RPS, serves 60% from cache, forwards 400 RPS downstream
  • API receives 400 RPS against a 500 RPS capacity — headroom is visible
  • Latency accumulates: CDN → API, so end-to-end latency includes both hops

Table of contents

Core concepts

Component properties

Organization

  • Groups — clusters, pods, and replica sets
  • Tags — team ownership and categorization
  • Scenarios — simulate different load conditions

How calculations work

Reference


Entry point

Every infra diagram needs exactly one edge entry point — the source of all inbound traffic. Name a component Edge or Internet and give it an rps property:

Edge entry point with 50K RPS
infra

Edge
  rps 50000
  -> Gateway

Rendered: Edge (RPS 50.0k) → Gateway (RPS 50.0k)

The rps property is only valid on the edge node. It represents total inbound requests per second entering your system. All downstream RPS values are computed from this single number.

Components

A component is any named node in your architecture — a server, database, cache, queue, or service. Write the component name on its own line, then indent properties below it:

APIServer
  max-rps 500
  latency-ms 30
  uptime 99.95%

Component names must start with a letter or underscore and can contain letters, numbers, and underscores. You don’t declare a component’s “type” — its role is inferred from its properties. A component with cache-hit is a cache. One with buffer is a queue. One with concurrency is serverless.

Connections

Connect components with arrow syntax. A bare -> sends all traffic; a labeled arrow -label-> adds a route annotation:

Simple connections
infra

Edge
  rps 10000
  -> LB

LB
  -/api-> APIServer
  -/web-> WebServer

Rendered: Edge (RPS 10.0k) → LB (RPS 10.0k); /api → APIServer (RPS 5.0k); /web → WebServer (RPS 5.0k)

Connections define the directed acyclic graph (DAG) that traffic flows through. Cycles are not allowed — DGMO validates this and reports an error if it detects a loop.

Connection syntax

-> Target                           # unlabeled connection
-/api-> Target                      # labeled connection
-/api-> Target | split: 60%         # labeled with explicit split
-> [Group Name]                     # connect to a group
-route-> [Group Name] | split: 40%  # labeled connection to group with split

Traffic splits

When a component has multiple outbound connections, traffic is distributed across them. You can declare explicit percentages or let DGMO distribute evenly:

Explicit 70/30 traffic split
infra

Edge
  rps 10000
  -> LB

LB
  -/api-> APIServer | split: 70%
  -/static-> CDN | split: 30%

APIServer
  max-rps 800
  latency-ms 40

CDN
  cache-hit 90%
  latency-ms 5

Rendered: Edge (RPS 10.0k, p90 40ms, availability 25.2%) → LB (RPS 10.0k, p90 40ms, availability 25.2%); /api → APIServer (RPS 7.0k / 800, p90 40ms, availability 11.4%); /static → CDN (RPS 3.0k, p90 5ms)

Split rules:

  • All declared — percentages must sum to 100% (DGMO warns if they don’t)
  • None declared — traffic splits evenly (2 targets = 50/50, 3 targets = 33/33/34)
  • Some declared, some not — undeclared targets share the remainder equally, as in:
LB
  -/api-> API | split: 60%     # 60% of LB output
  -/web-> Web | split: 30%     # 30% of LB output
  -/health-> Health            # gets remaining 10%
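
A minimal TypeScript sketch of how this remainder rule could be resolved (the names and types are illustrative, not DGMO's actual internals):

type Outbound = { target: string; split?: number }; // split in percent

function distribute(rps: number, edges: Outbound[]): Map<string, number> {
  const declared = edges.filter((e) => e.split !== undefined);
  const undeclared = edges.filter((e) => e.split === undefined);
  const declaredSum = declared.reduce((sum, e) => sum + (e.split ?? 0), 0);
  // Undeclared targets share whatever percentage remains, equally.
  const share = undeclared.length > 0 ? (100 - declaredSum) / undeclared.length : 0;
  const out = new Map<string, number>();
  for (const e of edges) out.set(e.target, rps * ((e.split ?? share) / 100));
  return out;
}

// The LB example above at 10,000 RPS: API gets 6,000, Web 3,000, Health the remaining 1,000.
distribute(10_000, [
  { target: "API", split: 60 },
  { target: "Web", split: 30 },
  { target: "Health" },
]);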

Component properties

Properties define what a component does to traffic passing through it. Each property maps to a specific behavior in the traffic simulation.

Cache

Property: cache-hit <percentage>

A cache layer absorbs a fraction of inbound traffic before it reaches downstream components. The cache-hit percentage is the fraction of requests served directly from cache.

CDN absorbing 80% of traffic
infra

Edge
  rps 100000
  -> CDN

CDN
  cache-hit 80%
  -> AppServer

AppServer
  max-rps 5000
  latency-ms 50

Rendered: Edge (RPS 100.0k, p90 50ms, availability 25.0%) → CDN (RPS 100.0k, p90 50ms, availability 25.0%) → AppServer (RPS 20.0k / 5.0k, p90 50ms, availability 25.0%)

How it works: If a component receives 100,000 RPS with cache-hit 80%, only 20,000 RPS flow downstream. The remaining 80,000 are served from cache and never reach backend services.

Computed effect on downstream RPS:

downstream_rps = inbound_rps × (1 - cache_hit / 100)

Firewall

Property: firewall-block <percentage>

A firewall or WAF drops a percentage of inbound traffic (malicious requests, bot traffic, blocked IPs). Blocked traffic is removed from the flow entirely.

WAF + rate limiter + API chain
infra

Edge
  rps 50000
  -> WAF

WAF
  firewall-block 8%
  -> Gateway

Gateway
  ratelimit-rps 10000
  -> API

API
  max-rps 5000
  latency-ms 45

Rendered: Edge (RPS 50.0k, p90 45ms, availability 10.9%) → WAF (RPS 50.0k, p90 45ms, availability 10.9%) → Gateway (RPS 46.0k / 10.0k, p90 45ms, availability 10.9%) → API (RPS 10.0k / 5.0k, p90 45ms, availability 50.0%)

Computed effect on downstream RPS:

downstream_rps = inbound_rps × (1 - firewall_block / 100)

Cache and firewall effects compose multiplicatively. If traffic passes through a cache (cache-hit 80%) then a firewall (firewall-block 5%), only 20% × 95% = 19% of original traffic reaches downstream.

Rate limiting

Property: ratelimit-rps <number>

A rate limiter caps throughput at a fixed RPS threshold. Excess traffic is rejected.

Gateway
  ratelimit-rps 10000
  -> API

Computed effect:

downstream_rps = min(effective_inbound_rps, ratelimit_rps)

Where effective_inbound_rps is the RPS after cache and firewall reductions. If 15,000 RPS arrive after cache/firewall and ratelimit-rps is 10,000, only 10,000 flow downstream and 5,000 are rejected.

Rate limiting also affects availability — rejected traffic reduces the availability score proportionally.
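
The reductions compose in a fixed order (cache, then firewall, then rate limiter; see the propagation rules below). A sketch of that per-component pipeline, with illustrative property names that mirror the DSL rather than DGMO's internals:

interface Behavior { cacheHit?: number; firewallBlock?: number; ratelimitRps?: number }

function throughComponent(inboundRps: number, b: Behavior): number {
  let rps = inboundRps;
  if (b.cacheHit !== undefined) rps *= 1 - b.cacheHit / 100;             // served from cache
  if (b.firewallBlock !== undefined) rps *= 1 - b.firewallBlock / 100;   // dropped by WAF
  if (b.ratelimitRps !== undefined) rps = Math.min(rps, b.ratelimitRps); // excess rejected
  return rps;
}

// The WAF + Gateway chain above: 50,000 becomes 46,000 after an 8% block, capped at 10,000.
const afterWaf = throughComponent(50_000, { firewallBlock: 8 });   // 46000
const afterGateway = throughComponent(afterWaf, { ratelimitRps: 10_000 }); // 10000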

Capacity

Properties: max-rps <number>, instances <number>

These define a component’s throughput capacity. max-rps is the per-instance maximum. instances multiplies it:

3 instances × 400 max-rps = 1,200 total capacity
infra

Edge
  rps 3000
  -> LB

LB
  -> API

API
  instances 3
  max-rps 400
  latency-ms 30

Rendered: Edge (RPS 3.0k, p90 30ms, availability 40.0%) → LB (RPS 3.0k, p90 30ms, availability 40.0%) → API (RPS 3.0k / 1.2k, p90 30ms, availability 40.0%, 3x)

Total capacity formula:

total_capacity = max_rps × instances

When computed RPS exceeds total capacity, the component is overloaded. DGMO flags this visually (red indicators) and in diagnostics. Overload also reduces availability.

If instances is omitted, it defaults to 1. If max-rps is omitted, the component has unlimited capacity.
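
A small sketch of the overload check (illustrative names; the local-availability rule is explained under "Availability computation" below):

function capacityReport(rps: number, maxRps: number, instances = 1) {
  const capacity = maxRps * instances;
  return {
    capacity,
    overloaded: rps > capacity,
    localAvailability: rps > capacity ? capacity / rps : 1, // degrades under overload
  };
}

capacityReport(3_000, 400, 3);
// { capacity: 1200, overloaded: true, localAvailability: 0.4 }, matching the 40% above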

Dynamic scaling

Property: instances <min>-<max> (range syntax)

When you specify a range like instances 1-8, DGMO computes the number of instances needed to handle current load:

Auto-scaling from 1 to 8 instances
infra

Edge
  rps 5000
  -> LB

LB
  -> API

API
  instances 1-8
  max-rps 300
  latency-ms 25

Rendered: Edge (RPS 5.0k, p90 25ms, availability 48.0%) → LB (RPS 5.0k, p90 25ms, availability 48.0%) → API (RPS 5.0k / 2.4k, p90 25ms, availability 48.0%, 8x)

Scaling formula:

needed = ceil(computed_rps / max_rps)
actual  = clamp(needed, min, max)

If the API receives 5,000 RPS with max-rps 300 and instances 1-8:

  • needed = ceil(5000 / 300) = 17
  • actual = clamp(17, 1, 8) = 8
  • Total capacity = 300 × 8 = 2,400 — still overloaded at 5,000 RPS

This lets you model auto-scaling behavior realistically, including cases where scaling maxes out.
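
The formula is easy to check in a few lines (an illustrative sketch, not DGMO's source):

function scaledInstances(rps: number, maxRps: number, min: number, max: number) {
  const needed = Math.ceil(rps / maxRps);
  const actual = Math.min(Math.max(needed, min), max); // clamp(needed, min, max)
  return { needed, actual, capacity: actual * maxRps };
}

scaledInstances(5_000, 300, 1, 8);
// { needed: 17, actual: 8, capacity: 2400 }: scaling maxes out, still overloaded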

Latency

Property: latency-ms <number>

Per-component response time in milliseconds. Latency accumulates along the path from edge to leaf:

CDN
  latency-ms 5
  -> API

API
  latency-ms 40
  -> DB

DB
  latency-ms 8

A request traversing CDN → API → DB has cumulative latency of 5 + 40 + 8 = 53ms.

If omitted, a component contributes 0ms latency (or the default-latency-ms value if set — see diagram options).

Uptime

Property: uptime <percentage>

Component reliability as a percentage. Uptime propagates along paths — the end-to-end uptime of a chain is the product of individual uptimes:

Uptime cascading through the chain
infra
default-uptime 99.9

Edge
  rps 1000
  -> API

API
  max-rps 2000
  latency-ms 30
  uptime 99.95%
  -> DB

DB
  latency-ms 5
  uptime 99.99%

Rendered: Edge (RPS 1.0k, p90 35ms, availability 99.94%) → API (RPS 1.0k / 2.0k, p90 35ms, availability 99.94%) → DB (RPS 1.0k, p90 5ms, eff. uptime 99.94%, availability 99.99%)
end_to_end_uptime = 99.9% × 99.95% × 99.99% ≈ 99.84%

If omitted, a component’s uptime defaults to 100% (or default-uptime if set globally). Uptime feeds into the availability computation as the baseline before load-dependent degradation.

Circuit breakers

Properties: cb-error-threshold <percentage>, cb-latency-threshold-ms <number>

Circuit breakers protect downstream services by tripping when failure conditions are met. DGMO models three states: closed (normal), open (tripped), and half-open (recovering).

Circuit breaker on overloaded API
infra

Edge
  rps 5000
  -> Gateway

Gateway
  -> API

API
  max-rps 300
  instances 2
  latency-ms 40
  cb-error-threshold 50%

Rendered: Edge (RPS 5.0k, p90 40ms, availability 12.0%) → Gateway (RPS 5.0k, p90 40ms, availability 12.0%) → API (RPS 5.0k / 600, p90 40ms, availability 12.0%, CB: OPEN, 2x)

Error-rate trigger:

error_rate = (computed_rps - capacity) / computed_rps × 100
if error_rate ≥ cb_error_threshold → state = 'open'

The error rate is derived from overload — if a component receives more RPS than its capacity, the excess is treated as errors. When the error rate exceeds the threshold, the circuit breaker opens.

Latency trigger:

if cumulative_latency_ms > cb_latency_threshold_ms → state = 'open'

If the total latency accumulated up to this component exceeds the threshold, the breaker trips. This models timeout-based circuit breakers.

You can combine both triggers on the same component — the breaker opens if either condition is met.

Serverless

Properties: concurrency <number>, duration-ms <number>, cold-start-ms <number>

Serverless components (Lambda, Cloud Functions) use a different capacity model. Instead of instances × max-rps, capacity is derived from concurrency and execution duration:

Serverless function with cold starts
infra

Edge
  rps 2000
  -> Gateway

Gateway
  -> ProcessOrder

ProcessOrder
  concurrency 1000
  duration-ms 200
  cold-start-ms 800

Rendered: Edge (RPS 2.0k, p90 200ms) → Gateway (RPS 2.0k, p90 200ms) → ProcessOrder (RPS 2.0k / 5.0k, instances 400 / 1k, p90 200ms)

Capacity formula:

capacity_rps = concurrency / (duration_ms / 1000)

With concurrency 1000 and duration-ms 200:

capacity = 1000 / 0.2 = 5,000 RPS

Cold starts: When cold-start-ms is set, DGMO splits traffic into two paths for percentile computation:

  • 95% warm path — latency = duration-ms
  • 5% cold path — latency = duration-ms + cold-start-ms

This means cold starts primarily affect p99 latency, which matches real-world behavior. A function with duration-ms 200 and cold-start-ms 800 has p50 latency of ~200ms but p99 of ~1,000ms.
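
A sketch of the serverless capacity formula and the warm/cold split, assuming the fixed 95/5 ratio described above (illustrative names):

function serverlessModel(concurrency: number, durationMs: number, coldStartMs = 0) {
  const capacityRps = concurrency / (durationMs / 1000);
  return {
    capacityRps,
    paths: [
      { weight: 0.95, latencyMs: durationMs },               // warm invocations
      { weight: 0.05, latencyMs: durationMs + coldStartMs }, // cold invocations
    ],
  };
}

serverlessModel(1000, 200, 800);
// capacityRps: 5000; p50 falls on the warm path (~200ms), p99 on the cold path (~1000ms)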

Important: concurrency is mutually exclusive with instances and max-rps. A component is either serverless (concurrency-based) or traditional (instance-based). DGMO warns if you mix them.

Queues

Properties: buffer <number>, drain-rate <number>, retention-hours <number>, partitions <number>

Queues decouple producers from consumers. They have fundamentally different behavior from request/response components — they absorb traffic bursts and reset latency boundaries.

Queue decoupling API from workers
infra

Edge
  rps 5000
  -> API

API
  max-rps 6000
  latency-ms 20
  -> OrderQueue

OrderQueue
  buffer 50000
  drain-rate 1000
  retention-hours 72
  -> Worker

Worker
  instances 3
  max-rps 400
  latency-ms 100

Rendered: Edge (RPS 5.0k, p90 4.1s, availability 20.0%) → API (RPS 5.0k / 6.0k, p90 4.1s, availability 20.0%) → OrderQueue (RPS 5.0k, p90 4.1s, availability 20.0%, lag 4k msg/s, overflow ~13s) → Worker (RPS 1.0k / 1.2k, p90 100ms, 3x)

Key properties:

  • buffer: Maximum queue depth (messages). Determines overflow risk.
  • drain-rate: Messages consumed per second. Downstream RPS is capped at this rate.
  • retention-hours: How long messages are retained. Informational, shown in the node card.
  • partitions: Number of partitions. Informational, shown in the node card.

How queues transform traffic:

  1. RPS capping — Downstream components receive at most drain-rate RPS, regardless of how much traffic the queue receives. If 5,000 RPS arrive but drain-rate is 1,000, only 1,000 RPS flow to workers.

  2. Overflow computation — When inbound RPS exceeds drain rate, the queue fills:

    fill_rate     = max(0, inbound_rps - drain_rate)
    time_to_overflow = buffer / fill_rate   (in seconds)

    If buffer 50000 and fill_rate 4000, the queue overflows in 12.5 seconds.

  3. Latency boundary — Queues reset the cumulative latency chain. Downstream components don’t inherit the producer’s latency. Instead, queue wait time becomes the new baseline:

    wait_time_ms = (fill_rate / drain_rate) × 1000

  4. Availability decoupling — The producer side and consumer side have independent availability. A queue absorbs producer-side overload without propagating it downstream.

Important: buffer is mutually exclusive with max-rps. A component is either a queue or a standard service. DGMO warns if you mix them.


Groups

Groups represent clusters, pods, or replica sets — a set of components that scale together. Wrap components in [Group Name] brackets:

API cluster with 3 instances
infra

Edge
  rps 10000
  -> LB

LB
  -/api-> [API Cluster] | split: 70%
  -/static-> StaticServer | split: 30%

[API Cluster]
  instances 3
  APIServer
    max-rps 500
    latency-ms 45
    -> DB

DB
  latency-ms 10
  uptime 99.99%

StaticServer
  cache-hit 95%
  latency-ms 2

Rendered: Edge (RPS 10.0k, p90 55ms, availability 33.6%) → LB (RPS 10.0k, p90 55ms, availability 33.6%) → API Cluster (3x): APIServer (RPS 7.0k / 1.5k, p90 55ms, availability 21.4%) → DB (RPS 7.0k, p90 10ms, availability 99.99%); /static → StaticServer (RPS 3.0k, p90 2ms)

Group syntax

[API Cluster]
  instances 3          # group-level instance count
  APIServer            # component inside the group
    max-rps 500
    latency-ms 45

The group’s instances property acts as a multiplier on child components’ capacity. If APIServer has max-rps 500 and the group has instances 3, total capacity is 500 × 3 = 1,500 RPS.

Connecting to groups

You can connect directly to a group. Traffic is distributed to the group’s children:

LB
  -> [API Cluster]

Group capacity with multiple children

When a group contains multiple components in a chain (e.g., API → DB), the group’s effective capacity is the bottleneck — the minimum capacity among its children:

[Backend Pod]
  instances 3
  API               # max-rps 500 → 500 per instance
    max-rps 500
    -> Cache
  Cache              # max-rps 2000 → 2000 per instance
    max-rps 2000

The pod’s effective capacity is 500 × 3 = 1,500 (bottlenecked on API, not Cache).
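
A sketch of the bottleneck rule (illustrative, assuming each child's max-rps is per instance):

function groupCapacity(childMaxRps: number[], groupInstances: number): number {
  const bottleneck = Math.min(...childMaxRps); // the slowest child limits the chain
  return bottleneck * groupInstances;
}

groupCapacity([500, 2000], 3); // 1500: bottlenecked on API, not Cache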

Drain-rate scaling in groups

For queues inside groups, drain-rate scales with group instances (more consumers = faster draining), but buffer does not scale (fixed capacity per queue).


Tags

Tags add metadata dimensions to components — team ownership, environment, region, or any categorization. Tags appear as colored badges and are filterable in the legend.

Team ownership tags
infra

tag Team alias t
  Backend(blue)
  Platform(teal) default
  Data(violet)

Edge
  rps 10000
  -> CDN

CDN | t: Platform
  cache-hit 70%
  -> LB

LB | t: Platform
  -> API

API | t: Backend
  max-rps 2000
  latency-ms 40
  -> DB

DB | t: Data
  latency-ms 8
  uptime 99.99%

Tag syntax

tag Team alias t             # declare tag group, alias "t"
  Backend(blue)             # tag value with color
  Platform(teal) default    # "default" auto-applies to untagged components
  Data(violet)

Then assign tags inline on component declarations using the alias:

APIServer | t: Backend      # pipe syntax, using alias "t"
CDN | t: Platform

Aliases

The alias keyword provides a shorthand for inline tag assignment. tag Team alias t lets you write | t: Backend instead of | Team: Backend.

Default values

Adding default after a tag value auto-applies it to any component that doesn’t explicitly set that tag group. In the example above, any component without | t: <value> is automatically tagged as Platform.


Scenarios

Scenarios let you define alternative configurations to simulate different load conditions — peak traffic, Black Friday, cache failures, outages. Each scenario overrides specific properties on specific components:

Diagram with peak-traffic and cache-miss scenarios
infra

Edge
  rps 10000
  -> CDN

CDN
  cache-hit 80%
  -> API

API
  instances 2
  max-rps 500
  latency-ms 40

scenario peak-traffic
  Edge
    rps 50000
  API
    instances 6

scenario cache-miss
  CDN
    cache-hit 20%

Rendered: Edge (RPS 10.0k, p90 40ms, availability 50.0%) → CDN (RPS 10.0k, p90 40ms, availability 50.0%) → API (RPS 2.0k / 1.0k, p90 40ms, availability 50.0%, 2x; scenario: 6x)

Scenario syntax

scenario peak-traffic
  Edge
    rps 50000             # override edge RPS
  API
    instances 6           # scale up instances

scenario cache-miss
  CDN
    cache-hit 20%         # simulate cache degradation

Each scenario block lists component names with indented property overrides. When a scenario is active, those properties replace the base values, and all downstream metrics recompute.

In the desktop app, scenarios appear in a dropdown — select one to see how your architecture handles that load profile. In the online editor and CLI, the base configuration renders by default.
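
Conceptually, an active scenario behaves like a shallow per-component merge over the base configuration. An illustrative sketch (not DGMO's implementation):

type Props = Record<string, number | string>;
type Config = Record<string, Props>; // component name → its properties

function applyScenario(base: Config, overrides: Config): Config {
  const merged: Config = {};
  for (const name of Object.keys(base)) {
    merged[name] = { ...base[name], ...(overrides[name] ?? {}) }; // override wins per property
  }
  return merged;
}

const baseConfig: Config = { Edge: { rps: 10_000 }, API: { instances: 2, "max-rps": 500 } };
const peak: Config = { Edge: { rps: 50_000 }, API: { instances: 6 } };
applyScenario(baseConfig, peak);
// Edge.rps → 50000, API.instances → 6, API["max-rps"] stays 500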


How calculations work

DGMO doesn’t just draw boxes and arrows. It runs a full traffic simulation through your architecture graph. Here’s exactly what it computes and how.

RPS propagation

Traffic flows from the edge entry point through the graph via breadth-first traversal. At each node, behavioral properties transform the RPS before it reaches downstream components:

  1. Start: Edge node’s rps value is the total inbound traffic
  2. At each component, apply behaviors in order:
    • Cache: rps = rps × (1 - cache_hit / 100)
    • Firewall: rps = rps × (1 - firewall_block / 100)
    • Rate limiter: rps = min(rps, ratelimit_rps)
    • Queue: rps = min(rps, drain_rate × group_instances)
  3. Split: Distribute the post-behavior RPS across outbound edges by split percentage
  4. Accumulate: If a node receives traffic from multiple sources, RPS values are summed

Example trace:

Edge (rps 100,000)
  → CDN (cache-hit 80%) → forwards 20,000 RPS
    → WAF (firewall-block 5%) → forwards 19,000 RPS
      → LB → splits:
        - /api (60%) → API receives 11,400 RPS
        - /static (40%) → Static receives 7,600 RPS
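
Putting those rules together, here is an illustrative sketch that walks the graph in topological order, summing fan-in before a node is processed; it reproduces the trace above (all names are hypothetical, not DGMO's internals):

interface NodeProps { cacheHit?: number; firewallBlock?: number; ratelimitRps?: number }
interface Link { from: string; to: string; split: number } // split in percent

function propagate(nodes: Record<string, NodeProps>, links: Link[], entry: string, entryRps: number) {
  const indegree: Record<string, number> = {};
  for (const name of Object.keys(nodes)) indegree[name] = 0;
  for (const l of links) indegree[l.to] += 1;
  const inbound: Record<string, number> = { [entry]: entryRps };
  const ready = Object.keys(nodes).filter((n) => indegree[n] === 0);
  while (ready.length > 0) {
    const name = ready.shift()!;
    const props = nodes[name];
    let out = inbound[name] ?? 0;
    if (props.cacheHit !== undefined) out *= 1 - props.cacheHit / 100;
    if (props.firewallBlock !== undefined) out *= 1 - props.firewallBlock / 100;
    if (props.ratelimitRps !== undefined) out = Math.min(out, props.ratelimitRps);
    for (const l of links) {
      if (l.from !== name) continue;
      inbound[l.to] = (inbound[l.to] ?? 0) + out * (l.split / 100); // fan-in sums
      if (--indegree[l.to] === 0) ready.push(l.to); // process once all parents are done
    }
  }
  return inbound;
}

const rps = propagate(
  { Edge: {}, CDN: { cacheHit: 80 }, WAF: { firewallBlock: 5 }, LB: {}, API: {}, Static: {} },
  [
    { from: "Edge", to: "CDN", split: 100 },
    { from: "CDN", to: "WAF", split: 100 },
    { from: "WAF", to: "LB", split: 100 },
    { from: "LB", to: "API", split: 60 },
    { from: "LB", to: "Static", split: 40 },
  ],
  "Edge", 100_000,
);
// rps.API === 11400, rps.Static === 7600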

Latency computation

Latency accumulates along the worst-case path from edge to each component:

  1. Each component adds its latency-ms value (or default-latency-ms, or 0)
  2. If a component has multiple incoming paths, DGMO takes the maximum incoming latency (worst case)
  3. Queue nodes reset the latency chain — downstream latency starts from queue wait time, not from the producer’s cumulative latency
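
The per-node rule is compact (an illustrative sketch):

function nodeLatency(incomingMs: number[], ownMs: number, isQueue: boolean, queueWaitMs = 0): number {
  if (isQueue) return queueWaitMs; // queues reset the chain to their wait time
  const worstIncoming = incomingMs.length > 0 ? Math.max(...incomingMs) : 0;
  return worstIncoming + ownMs; // worst-case incoming path plus own latency
}

nodeLatency([5, 12], 40, false);    // 52: the slower incoming path wins
nodeLatency([5000], 0, true, 4000); // 4000: downstream restarts from queue wait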

Percentile computation: DGMO computes p50, p90, and p99 latency by collecting all leaf-to-edge paths, weighting each by its traffic proportion:

  • For normal components: one path per leaf, with cumulative latency
  • For serverless with cold starts: the path splits into a 95% warm path and a 5% cold path (warm = duration-ms, cold = duration-ms + cold-start-ms)
  • Paths are sorted by latency and weighted by traffic volume
  • p50/p90/p99 are interpolated from the cumulative weight distribution

This means cold starts primarily show up in p99, and high-traffic paths have more weight in overall percentiles — matching real-world latency distributions.
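
A sketch of the weighted-percentile walk, using step interpolation for simplicity (DGMO's exact interpolation may differ):

interface WeightedPath { latencyMs: number; weight: number } // weight = traffic share, sums to 1

function percentile(paths: WeightedPath[], p: number): number {
  const sorted = [...paths].sort((a, b) => a.latencyMs - b.latencyMs);
  let cumulative = 0;
  for (const path of sorted) {
    cumulative += path.weight;
    if (cumulative >= p / 100) return path.latencyMs;
  }
  return sorted[sorted.length - 1].latencyMs;
}

// The 95% warm / 5% cold serverless split from earlier:
const paths = [{ latencyMs: 200, weight: 0.95 }, { latencyMs: 1000, weight: 0.05 }];
percentile(paths, 50); // 200: cold starts are invisible at the median
percentile(paths, 99); // 1000: they dominate the tail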

Availability computation

Availability is computed in three steps:

1. Uptime propagation (path-based): The product of all uptime values along the path from edge to each node. This represents the probability that all components in the chain are operational:

path_uptime = ∏(component_uptime / 100) for each component in the path

If multiple paths converge, DGMO takes the minimum (most conservative).

2. Local availability (load-dependent): Each component’s local availability depends on its current load relative to capacity:

  • Normal (under capacity): local_availability = 1.0
  • Overloaded (over capacity): local_availability = capacity / inbound_rps
    • A component with 500 capacity receiving 1,000 RPS has 50% local availability
  • Rate-limited: local_availability = ratelimit_rps / effective_inbound_rps
  • Queue overflow risk: If the queue fills within 60 seconds, availability degrades proportionally to drain_rate / inbound_rps

3. Compound availability: The product of all local availabilities along the path from edge:

compound_availability = ∏(local_availability) for each node in the path

Queue decoupling: Queues reset the availability chain. The consumer side doesn’t inherit the producer’s overload — it only sees the queue’s own availability.
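
A sketch of both layers combined, assuming the baseline uptime product and the local-availability product simply multiply into the final path score (an assumption; the docs describe the layers separately):

interface Hop { uptime: number; localAvailability: number } // uptime in percent

function pathAvailability(hops: Hop[]): number {
  const baseline = hops.reduce((p, h) => p * (h.uptime / 100), 1); // uptime propagation
  const load = hops.reduce((p, h) => p * h.localAvailability, 1);  // load-dependent layer
  return baseline * load;
}

// Healthy chain from the uptime example: 99.9% × 99.95% × 99.99% ≈ 99.84%
pathAvailability([
  { uptime: 99.9, localAvailability: 1 },
  { uptime: 99.95, localAvailability: 1 },
  { uptime: 99.99, localAvailability: 1 },
]); // ≈ 0.9984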

Circuit breaker logic

Circuit breakers have three states:

  • Closed: error rate below threshold and latency below threshold. Normal operation.
  • Open: error rate ≥ cb-error-threshold, or cumulative latency > cb-latency-threshold-ms. The component is tripped, shown with a dashed border.
  • Half-open: not currently modeled; DGMO uses closed/open only.

Error rate derivation:

capacity = serverless ? (concurrency / duration_s) : (max_rps × instances × group_mul)
error_rate = max(0, (computed_rps - capacity) / computed_rps × 100)

The circuit breaker trips when the overload-derived error rate exceeds the threshold. This means a component at 2× capacity with cb-error-threshold 50% will trip (error rate = 50%).
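
In code, the trigger logic might look like this (illustrative names):

interface BreakerInput {
  computedRps: number;
  capacity: number;            // max-rps × instances, or concurrency-derived
  cumulativeLatencyMs: number;
  errorThreshold?: number;     // cb-error-threshold, percent
  latencyThresholdMs?: number; // cb-latency-threshold-ms
}

function breakerState(i: BreakerInput): "closed" | "open" {
  const errorRate = Math.max(0, (i.computedRps - i.capacity) / i.computedRps) * 100;
  const errorTrip = i.errorThreshold !== undefined && errorRate >= i.errorThreshold;
  const latencyTrip = i.latencyThresholdMs !== undefined && i.cumulativeLatencyMs > i.latencyThresholdMs;
  return errorTrip || latencyTrip ? "open" : "closed"; // either trigger opens the breaker
}

breakerState({ computedRps: 1000, capacity: 500, cumulativeLatencyMs: 40, errorThreshold: 50 });
// "open": 2× capacity yields a 50% error rate, which meets the 50% threshold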

Queue metrics

Queues compute three additional metrics:

  • Fill rate: max(0, inbound_rps - drain_rate). How fast the buffer fills (msg/s).
  • Time to overflow: buffer / fill_rate (if fill_rate > 0). Seconds until the queue is full.
  • Wait time: (fill_rate / drain_rate) × 1000. Milliseconds a message waits in the queue.

If fill_rate is 0 (drain keeps up), time to overflow is infinite and wait time is 0. The queue is healthy.

If time_to_overflow < 60 seconds, DGMO marks the queue as at risk and degrades its availability score.
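
All three metrics plus the 60-second risk flag fit in a few lines (illustrative sketch):

function queueMetrics(inboundRps: number, drainRate: number, buffer: number) {
  const fillRate = Math.max(0, inboundRps - drainRate);          // msg/s into the backlog
  const timeToOverflowS = fillRate > 0 ? buffer / fillRate : Infinity;
  const waitTimeMs = (fillRate / drainRate) * 1000;
  return { fillRate, timeToOverflowS, waitTimeMs, atRisk: timeToOverflowS < 60 };
}

queueMetrics(5_000, 1_000, 50_000);
// { fillRate: 4000, timeToOverflowS: 12.5, waitTimeMs: 4000, atRisk: true }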

Percentile computation

DGMO computes p50, p90, and p99 for both latency and availability using weighted path distributions:

  1. Collect all paths from edge to leaf nodes via depth-first traversal
  2. Each path carries a weight proportional to its traffic volume (from RPS splits)
  3. Sort paths by the metric (latency ascending, availability ascending)
  4. Walk the sorted list, accumulating weights until the target percentile threshold:
    • p50: cumulative weight reaches 50%
    • p90: cumulative weight reaches 90%
    • p99: cumulative weight reaches 99%
  5. Interpolate the metric value at that threshold

This is computed both system-wide (from the edge node) and per-node (from each individual component’s downstream paths).


Diagram options

Set global defaults at the top of your diagram:

infra My System
direction-tb
default-latency-ms 10
default-uptime 99.9
no-animate

  • direction-tb (default: off, left-to-right): layout direction. Omit for left-to-right, add for top-to-bottom.
  • default-latency-ms N (default: 0): latency applied to components without an explicit latency-ms.
  • default-uptime N (default: 100): uptime percentage applied to components without an explicit uptime.
  • animate / no-animate (default: animate): flow animation particles.

Validation

DGMO validates your diagram and reports diagnostics for common issues:

  • Cycle detection (error): circular connections (A → B → A). Infra diagrams must be DAGs.
  • Split sum (warning): split percentages that don’t add up to 100%.
  • Orphan detection (warning): components not reachable from the edge entry point.
  • Overload (warning): components receiving more RPS than their capacity.
  • Rate-limit excess (warning): inbound RPS exceeding the rate limiter threshold.
  • System uptime (warning): overall system uptime below the 99% SLA threshold.
  • Property conflicts (warning): mixing incompatible properties (e.g., concurrency with instances).
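
Cycle detection, for example, is a standard depth-first search with a recursion stack; an illustrative sketch:

function findCycle(edges: Record<string, string[]>): boolean {
  const state: Record<string, "visiting" | "done"> = {};
  const visit = (node: string): boolean => {
    if (state[node] === "visiting") return true; // back edge, so a cycle exists
    if (state[node] === "done") return false;
    state[node] = "visiting";
    for (const next of edges[node] ?? []) if (visit(next)) return true;
    state[node] = "done";
    return false;
  };
  return Object.keys(edges).some(visit);
}

findCycle({ A: ["B"], B: ["A"] }); // true: A → B → A is rejected with an error
findCycle({ A: ["B"], B: [] });    // false: a valid DAG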

Property reference

All component properties at a glance:

  • rps (number; edge only): total inbound requests per second
  • cache-hit (percentage; any): fraction of traffic served from cache, not forwarded
  • firewall-block (percentage; any): fraction of traffic dropped (blocked)
  • ratelimit-rps (number; any): maximum RPS forwarded; excess is rejected
  • max-rps (number; non-queue): per-instance maximum throughput
  • instances (number or range; non-serverless): replica count (e.g., 3 or 1-8)
  • latency-ms (number; any): per-component response time in milliseconds
  • uptime (percentage; any): component reliability (e.g., 99.99%)
  • cb-error-threshold (percentage; any): circuit breaker trips when the error rate exceeds this
  • cb-latency-threshold-ms (number; any): circuit breaker trips when cumulative latency exceeds this
  • concurrency (number; serverless): maximum concurrent executions
  • duration-ms (number; serverless): average execution time per invocation
  • cold-start-ms (number; serverless): additional latency on cold invocations
  • buffer (number; queue): maximum queue depth (messages)
  • drain-rate (number; queue): messages consumed per second
  • retention-hours (number; queue): message retention duration (informational)
  • partitions (number; queue): number of partitions (informational)

Mutual exclusions:

  • concurrency cannot be combined with instances or max-rps (serverless vs. traditional)
  • buffer cannot be combined with max-rps (queue vs. request/response)

Putting it all together

Here’s a complete production-grade example combining caching, firewall, rate limiting, load balancing, pod groups, dynamic scaling, and team tags:

Full e-commerce infrastructure
infra E-Commerce Platform

tag Team alias t
  Backend(blue)
  Platform(teal) default
  Data(violet)

Edge
  rps 100000
  -> CloudFront

CloudFront | t: Platform
  cache-hit 80%
  -> WAF

WAF | t: Platform
  firewall-block 5%
  -> ALB

ALB | t: Platform
  -/api-> [API Pods] | split: 60%
  -/purchase-> [Commerce Pods] | split: 30%
  -/static-> StaticServer | split: 10%

[API Pods]
  instances 3
  APIServer | t: Backend
    max-rps 500
    latency-ms 45
    cb-error-threshold 50%

[Commerce Pods]
  PurchaseMS | t: Backend
    instances 1-8
    max-rps 300
    latency-ms 120

StaticServer | t: Platform
  latency-ms 5

This diagram models:

  • 100K RPS at the edge, reduced to 20K after CDN caching, then 19K after WAF filtering
  • Three traffic paths through the ALB: API (60%), Commerce (30%), Static (10%)
  • API Pods with 3 instances at 500 RPS each = 1,500 total capacity
  • Commerce Pods with dynamic scaling from 1-8 instances
  • Team ownership via the t tag, visualized in the legend

Every computed metric — downstream RPS, latency percentiles, availability, overload detection — updates based on these declarations.

Try it yourself

  1. Online Editor — select “Infrastructure” from the sidebar to start with a template
  2. CLI — render from the terminal: dgmo diagram.dgmo -o infra.png
  3. Desktop app — full editor with live preview, scenario switching, and click-to-source navigation