Observability Dashboard

Introduction

The Observability Dashboard provides a centralized view of sync health across your Cinchy environment. It surfaces health scores, failure patterns, execution trends, and real-time listener metrics so you can identify issues quickly without parsing logs manually.

Open the dashboard from Profile > Observability Dashboard or by visiting /observability. Access is permission-based; see Access and permissions.

Platform compatibility

Unless noted otherwise, all features work on both deployment types (IIS and Kubernetes) and on both SQL Server and PostgreSQL databases. Platform-specific features are tagged as [IIS only], [K8s only], or [SQL Server only] throughout this page.

Access and permissions

Access to the Observability Dashboard is permission-based. You can open it from Profile > Observability Dashboard or by visiting /observability if you have View All Columns access to all nine tables the dashboard depends on:

Observability tables: Observability Sync Groups, Observability Sync Group Members, Observability SLA Definitions, Observability Health Snapshots, Observability Baseline Metrics
Source tables: Data Sync Configurations, Execution Log, Listener Config, Event Listener State

Cinchy Administrators have this access by default, so the dashboard remains available to them. To grant a non-administrator access, give them View All Columns on all nine tables, either directly or through a User Group. Partial or single-column access is not sufficient: the dashboard's queries read across all columns of these tables, so the page denies access unless every table grants full column view.
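
The all-or-nothing rule above can be sketched as a simple set check. This is an illustration only, not Cinchy's implementation: the function names and the shape of the input (a set of table names on which the user holds View All Columns) are assumptions. The table names come from the list above.

```python
# Hypothetical sketch of the access rule; not Cinchy's actual code.
REQUIRED_TABLES = frozenset({
    "Observability Sync Groups",
    "Observability Sync Group Members",
    "Observability SLA Definitions",
    "Observability Health Snapshots",
    "Observability Baseline Metrics",
    "Data Sync Configurations",
    "Execution Log",
    "Listener Config",
    "Event Listener State",
})


def has_dashboard_access(view_all_columns_tables):
    """Return True only if every required table grants View All Columns.

    `view_all_columns_tables` is an assumed input: the names of the tables on
    which the user (directly or through a User Group) has View All Columns.
    """
    return REQUIRED_TABLES <= set(view_all_columns_tables)


def missing_tables(view_all_columns_tables):
    """Return the required tables that still lack View All Columns, sorted by name."""
    return sorted(REQUIRED_TABLES - set(view_all_columns_tables))
```

A helper like `missing_tables` is handy when troubleshooting a denied user, because it names the tables that still need a grant.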

Granting dashboard access does not let a user change its configuration. Creating sync groups, assigning members, and defining SLAs still requires the usual add/edit entitlements on the Observability configuration tables.

Changed in v5.21

In v5.20 the dashboard was restricted to the Cinchy Administrators User Group and served from /admin/observability. As of v5.21 access is permission-based (as described above) and the dashboard is served from /observability.

How it works

Cinchy reads execution and listener metrics already captured in the platform, rolls them into baselines and health snapshots during a scheduled maintenance pass, and renders the result as scores, trends, and alerts. The dashboard does not compute scores live; it reads the most recent snapshot.

Key capabilities

Capability	Description
Health scores	A 0-100 color-coded score for each sync group, computed from recent execution history.
Correlated failure detection	Identifies clusters of syncs that failed around the same time, indicating a shared root cause.
Execution time trending	Per-sync timing with a 30-day rolling baseline for comparison.
Top error summary	Errors ranked by impact across all monitored syncs.
Listener traffic monitoring	Throughput, queue depth, and batch success rates for real-time listeners.
Stale listener detection	Alerts when a listener has not processed messages within its expected interval.
Data freshness and SLA tracking	Per-table staleness compared to configured thresholds.
Queue depth forecasting	Estimated drain time based on current throughput.
Service Broker queue health	Poison message detection, activation status, and transmission backlog. `[IIS only]` `[SQL Server only]`
Kafka consumer lag	Committed vs. end offset and total unprocessed messages per listener. `[K8s only]`
Post-outage recovery status	Shows which syncs have recovered and which remain degraded.

Getting started

The dashboard displays data only after you create sync groups and assign syncs. Follow these steps to set up the dashboard for the first time.

1. Create sync groups

Open the Observability Sync Groups table in Cinchy and create one or more groups. Each group represents a logical collection of related syncs (for example, "Claims Processing" or "Billing"). Use the Parent Group column to create subgroup hierarchies.

2. Assign syncs to groups

In the Observability Sync Group Members table, link each batch sync (via Data Sync Config) or real-time listener (via Listener Config) to a group. Set Priority to one of the following:

Priority	Description
Critical	Failures in Critical syncs are penalized by a dedicated 10% component of the health score and surface in alerts. They also count toward the overall success rate (60% of the score) like any other failure.
Urgent	Important syncs that should be investigated promptly. Failures count toward the overall success rate but do not trigger the Critical-score penalty.
Low Priority	Syncs where occasional failures are acceptable. Failures still count toward the overall success rate but are not otherwise weighted.

Info: Only syncs assigned to a group appear on the dashboard. Unassigned syncs are not visible.

3. Define SLA thresholds (optional)

In the Observability SLA Definitions table, select a Cinchy table you want to monitor for data freshness and set a Maximum Staleness (Minutes) threshold. When the table has not been updated within that window, the dashboard flags an SLA breach.
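
The freshness check described above amounts to comparing a table's age against its threshold. The following is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, and treating a table that is exactly at the threshold as still fresh is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta, timezone


def is_sla_breached(last_updated, max_staleness_minutes, now=None):
    """Return True when the table is older than its Maximum Staleness (Minutes).

    Hypothetical helper: `last_updated` is a timezone-aware datetime of the
    table's most recent update. A table exactly at the threshold is treated
    as fresh (an assumption).
    """
    now = now or datetime.now(timezone.utc)
    return now - last_updated > timedelta(minutes=max_staleness_minutes)
```

For example, with a 60-minute threshold, a table last updated 61 minutes ago would be flagged, while one updated 30 minutes ago would not.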

4. Run initial maintenance

Baseline metrics and health snapshots must be populated before the dashboard shows any data. You have three options:

Maintenance CLI (recommended)
Update health scores button (see Update health scores from the dashboard)
Run the data syncs manually (see Running the data syncs manually)

The maintenance CLI runs the full sequence in the correct order:

Cinchy.Connections.CLI.exe maintain-observability \
  -s cinchy.example.com/Cinchy \
  --https \
  --pat "YOUR_PERSONAL_ACCESS_TOKEN"

This executes:

Refresh 30-day rolling baseline metrics
Calculate health scores for all sync groups
Clean up snapshots older than the retention window
Reset real-time event counters

5. Schedule daily maintenance

Schedule Cinchy.Connections.CLI.exe maintain-observability to run daily so health scores and baselines stay current. The recommended time is 02:00 UTC, when sync activity is typically lowest.

Use any external scheduler that fits your deployment, such as Windows Task Scheduler, a Kubernetes CronJob, or a systemd timer. The CLI runs the full daily sequence (refresh baselines, refresh health snapshots, retention cleanup, counter reset).

Tip: You can refresh health scores at any time without waiting for the daily job. Either re-run maintain-observability, or click Update health scores on the System Overview view of the dashboard. Both run the same four-step sequence and are useful after adding new syncs, changing priorities, or recovering from an outage.

Dashboard views

The dashboard is organized into three top-level views, four alert views, and three detail views.

Top-level views

View	What it shows
System Overview	All sync groups with their current health score, member counts, and priority breakdown. Open alert counts (correlated failures, stale listeners, failed listeners, SLA breaches) appear as clickable tiles.
Real-Time Sync Status	Per-listener status (Caught Up, Recovering, or Stalled) with throughput, estimated backlog, and estimated catch-up time.
Batch Syncs Status	All batch syncs with their last execution state, recent success/failure pattern, and timing relative to baseline.
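
As a rough illustration of how an estimated catch-up time can follow from backlog and throughput, the sketch below divides one by the other. It is not the dashboard's actual formula: the function name, the units, and the choice to return None when nothing is being processed are all assumptions.

```python
def estimate_drain_seconds(queue_depth, throughput_per_second):
    """Estimate how long the backlog takes to drain at the current throughput.

    Hypothetical helper. Returns 0.0 for an empty queue and None when the
    listener is not processing anything, since no finite estimate exists.
    """
    if queue_depth <= 0:
        return 0.0
    if throughput_per_second <= 0:
        return None
    return queue_depth / throughput_per_second
```

A backlog of 600 messages at 10 messages per second gives an estimate of 60 seconds under this simple model, which ignores new messages arriving while the queue drains.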

Alert views

View	What it shows
Stale Listener Detection	Enabled listeners that have not processed messages within the configured stale threshold.
Correlated Failure Clusters	Groups of syncs that failed within a shared time window, indicating a common root cause.
Data Freshness SLA	Tables exceeding their configured Maximum Staleness threshold defined in Observability SLA Definitions.
Failed Listeners	Listeners currently in a failed or disabled state.
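
To make the idea of a correlated failure cluster concrete, here is a minimal sketch that groups failures falling within a shared time window. The algorithm, the 5-minute window, and the two-sync minimum are assumptions chosen for illustration; the dashboard's actual detection logic may differ.

```python
from datetime import datetime, timedelta


def cluster_failures(failures, window=timedelta(minutes=5), min_syncs=2):
    """Group (sync_name, failed_at) tuples into clusters by time window.

    Hypothetical sketch: a cluster collects failures that occur within
    `window` of the cluster's first failure, and is kept only if it involves
    at least `min_syncs` distinct syncs.
    """
    clusters, current = [], []
    for name, failed_at in sorted(failures, key=lambda f: f[1]):
        # Start a new cluster once this failure falls outside the window.
        if current and failed_at - current[0][1] > window:
            clusters.append(current)
            current = []
        current.append((name, failed_at))
    if current:
        clusters.append(current)
    return [c for c in clusters if len({n for n, _ in c}) >= min_syncs]
```

Requiring at least two distinct syncs keeps a single sync that fails repeatedly from being reported as a shared root cause.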

Detail views

Clicking through any of the above opens a focused detail view:

View	What it shows
Group Detail	Members of a sync group with per-member health, baseline comparison, recent failure analysis, configuration changes, and trend charts.
Batch Sync Detail	Per-execution log, baseline comparison, top errors, and failure pattern for a single batch sync.
Listener Detail	Traffic, queue depth forecast, status history, and (where available) Service Broker or Kafka diagnostics for a single listener.

Understanding health scores

Each sync group receives a health score between 0 and 100 that summarizes its overall status.

How the score is calculated

The health score is a weighted average of three components:

Component	Weight	What it measures
Success rate	60%	The percentage of syncs in the group whose most recent execution succeeded.
Timing score	30%	How close current execution times are to the 30-day baseline. A score of 1.0 means running at baseline speed; the score drops toward 0.0 as execution time doubles.
Critical score	10%	Penalizes failures in Critical-priority syncs. If no Critical members exist, this component scores 1.0 (no penalty).
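
The weighting above can be worked through in a short sketch. Only the 60/30/10 weights and the 0-100 range come from this page. The linear shape of the timing score, the definition of the critical score as the fraction of Critical members that did not fail, and all function names are assumptions made for illustration.

```python
def clamp01(value):
    """Clamp a number to the range 0.0 to 1.0."""
    return max(0.0, min(1.0, value))


def timing_score(current_seconds, baseline_seconds):
    """1.0 at baseline speed, falling linearly to 0.0 at twice the baseline.

    The linear falloff is an assumption. With no baseline yet, no penalty
    is applied (also an assumption).
    """
    if baseline_seconds <= 0:
        return 1.0
    return clamp01(2.0 - current_seconds / baseline_seconds)


def critical_score(critical_total, critical_failed):
    """1.0 when no Critical members exist; otherwise the share that did not fail (assumed)."""
    if critical_total == 0:
        return 1.0
    return clamp01(1.0 - critical_failed / critical_total)


def health_score(success_rate, timing, critical):
    """Combine three 0.0-1.0 components into a 0-100 score using the 60/30/10 weights."""
    return round(60 * success_rate + 30 * timing + 10 * critical)
```

For instance, a group with an 80% success rate, syncs running at 1.5 times their baseline (timing score 0.5 under the assumed linear model), and no failed Critical members would score 48 + 15 + 10 = 73.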

Color coding

Score range	Color	Meaning
90-100	Green	Healthy
70-89	Amber	Warning
50-69	Orange	Degraded
0-49	Red	Critical
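
The bands in the table translate directly into a lookup. The function name below is hypothetical; the thresholds, colors, and meanings are taken from the table.

```python
def score_band(score):
    """Map a 0-100 health score to its (color, meaning) pair from the table above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return ("Green", "Healthy")
    if score >= 70:
        return ("Amber", "Warning")
    if score >= 50:
        return ("Orange", "Degraded")
    return ("Red", "Critical")
```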

Platform-specific notes

The notes below cover three platform combinations: Windows / IIS + SQL Server, Kubernetes + Kafka, and PostgreSQL.

Service Broker queue health diagnostics require the database login used by Cinchy to have VIEW SERVER STATE permission. If this permission is missing, the dashboard displays an error message.

Check whether the current login already has the permission:

SELECT HAS_PERMS_BY_NAME(NULL, NULL, 'VIEW SERVER STATE') AS HasViewServerState;

A result of 1 means the permission is granted; 0 means it is not. If it is not granted, run:

GRANT VIEW SERVER STATE TO [YourCinchyLoginUser];

Permissions at the server scope can only be granted when the current database is master, so switch to it before running the GRANT statement (for example, USE master;).

Service Broker diagnostics (poison message detection, activation status, transmission backlog) are not available on PostgreSQL or Kubernetes deployments.

Kafka consumer lag metrics (committed vs. end offset, total unprocessed messages) are written by the Connections worker (KafkaBatchProcessor) via the EventSyncMetricsService after each successful batch, and persisted to the Event Listener State table in the dedicated [Kafka Lag], [Kafka Committed Offset], and [Kafka End Offset] columns. Until these columns are populated, the dashboard derives lag estimates from listener timestamps and counter-based processing rates.
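
Consumer lag is conventionally the gap between the end offset and the committed offset. The sketch below shows that arithmetic only; the function names and the list-of-pairs input are assumptions, and it does not reflect how KafkaBatchProcessor or EventSyncMetricsService are implemented.

```python
def partition_lag(committed_offset, end_offset):
    """Unprocessed messages on one partition: end offset minus committed offset, never negative."""
    return max(end_offset - committed_offset, 0)


def total_lag(offsets):
    """Sum the lag over an iterable of (committed_offset, end_offset) pairs."""
    return sum(partition_lag(committed, end) for committed, end in offsets)
```

A listener with partitions at (90, 100), (50, 50), and (0, 7) would have a total of 17 unprocessed messages.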

Performance recommendations

Run platform maintenance regularly

The dashboard relies heavily on the Execution Log table for health scores, execution time trending, and correlated failure detection. As Execution Log accumulates rows over time, dashboard queries slow down, which affects every view that scans recent execution history.

We strongly recommend running Cinchy's platform maintenance on a regular cadence so old Execution Log rows are cleaned up. See Maintenance for instructions on enabling maintenance on Kubernetes or running the maintenance CLI on IIS.

Maintenance

Health scores and baseline metrics are not updated in real time. They are refreshed on a schedule, but you can also trigger a refresh on demand from the dashboard UI or by re-running the CLI.

Update health scores from the dashboard

On the System Overview view, click Update health scores in the Sync Groups header. A confirmation dialog lists the four steps that will run:

Refresh baseline metrics
Refresh health snapshots
Clean up old snapshots
Reset listener counters

Click Run now to execute the sequence. This is the same set of steps that the daily CLI job performs, so it is the recommended way to get an immediate score update between scheduled runs (for example, after adding new syncs, changing priorities, or recovering from an outage).

The job may take several minutes on large datasets. Keep the page open until it completes. A toast notification confirms success or lists any failed steps. Schedule the CLI for daily maintenance even if you use this button for on-demand refreshes.

Using the CLI (recommended)

The maintain-observability CLI command runs the full maintenance sequence automatically in the correct order:

Cinchy.Connections.CLI.exe maintain-observability \
  -s cinchy.example.com/Cinchy \
  --https \
  --pat "YOUR_PERSONAL_ACCESS_TOKEN" \
  --retention-days 90

Option	Required	Description
`-s, --server`	Yes	Path to Cinchy server without protocol (for example, `cinchy.co/Cinchy`).
`-h, --https`	No	Use HTTPS connections.
`--pat`	Required (one of)	Personal Access Token. Use either `--pat` or both `-u` and `-p`; one authentication method must be provided.
`-u, --userid`	Required (one of)	User ID for Cinchy access. Use with `-p` as an alternative to `--pat`.
`-p, --password`	Required (one of)	Password for the specified user. Required when using `-u` instead of `--pat`.
`-r, --retention-days`	No	Number of days to retain health snapshots. Default: 90.
`-a, --tls`	No	TLS protocol version to use for the connection.
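
If you launch the CLI from a script, the authentication rule in the table (either `--pat`, or both `-u` and `-p`) can be validated before the process starts. The helper below is a hypothetical wrapper, not part of Cinchy. It uses only the flags listed above and, as an assumption, prefers the token when both methods are supplied.

```python
def build_maintain_args(server, pat=None, userid=None, password=None,
                        https=True, retention_days=None):
    """Build the argument list for maintain-observability (hypothetical helper)."""
    has_pat = pat is not None
    has_user = userid is not None and password is not None
    if not has_pat and not has_user:
        raise ValueError("provide either pat, or both userid and password")

    args = ["Cinchy.Connections.CLI.exe", "maintain-observability", "-s", server]
    if https:
        args.append("--https")
    if has_pat:
        # Assumption: prefer the token when both methods are supplied.
        args += ["--pat", pat]
    else:
        args += ["-u", userid, "-p", password]
    if retention_days is not None:
        args += ["--retention-days", str(retention_days)]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)` from whatever scheduler wrapper you use, so a non-zero exit code raises an error instead of passing silently.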

The CLI executes the following steps in order:

Refresh Baseline Metrics: computes 30-day rolling baselines (average execution time, 95th percentile, failure rate) for each batch sync.
Refresh Health Snapshots: calculates health scores for all sync groups using the latest baseline data.
Retention cleanup: deletes health snapshots older than the configured retention window.
Reset event counters: resets cumulative counter columns on Event Listener State for real-time metrics.

Retention

Health snapshots are retained for 90 days by default. When using the CLI, override this with the --retention-days flag. When running data syncs manually, delete old rows from the Observability Health Snapshots table as needed.

Running the data syncs manually

If you prefer to run the maintenance steps individually, you can execute the underlying data syncs directly from the Data Sync Configurations table in Cinchy. The two data sync configurations are provisioned automatically on upgrade.

Execution order

You must run Observability - Refresh Baseline Metrics before Observability - Refresh Health Snapshots. Health snapshot calculations depend on the baseline data. Running them out of order produces inaccurate scores.

Step 1: Refresh Baseline Metrics

Run the data sync named Observability - Refresh Baseline Metrics. This reads the last 30 days of completed batch sync executions from the Execution Log and writes per-sync baselines (average execution time, 95th percentile execution time, total executions, failure rate) to the Observability Baseline Metrics table.
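
To show what these baseline figures mean, the sketch below computes them from a list of executions. It is illustrative only and is not the logic of the data sync itself: the input shape, the nearest-rank percentile method, and the inclusion of failed runs in the timing figures are assumptions.

```python
def compute_baseline(executions):
    """Compute baseline metrics from (duration_seconds, succeeded) tuples.

    Hypothetical sketch: `executions` is assumed to already be limited to the
    last 30 days. Returns None when there is nothing to summarize.
    """
    if not executions:
        return None
    durations = sorted(duration for duration, _ in executions)
    total = len(executions)
    # Nearest-rank 95th percentile, using integer arithmetic for the ceiling.
    rank = (95 * total + 99) // 100
    failures = sum(1 for _, succeeded in executions if not succeeded)
    return {
        "avg_seconds": sum(durations) / total,
        "p95_seconds": durations[rank - 1],
        "total_executions": total,
        "failure_rate": failures / total,
    }
```

With 20 executions lasting 1 to 20 seconds and one failure, this gives an average of 10.5 seconds, a 95th percentile of 19 seconds, and a failure rate of 0.05.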

Step 2: Refresh Health Snapshots

Run the data sync named Observability - Refresh Health Snapshots. This reads from Observability Sync Group Members, Execution Log, Listener Config, and Observability Baseline Metrics to calculate a health score for each sync group. Results are written to the Observability Health Snapshots table.

Tip: After adding new syncs, changing priorities, or recovering from an outage, refresh scores by clicking Update health scores on the dashboard, running the CLI, or running both maintenance data syncs in order.
