Observability Dashboard
Introduction
The Observability Dashboard provides a centralized view of sync health across your Cinchy environment. It surfaces health scores, failure patterns, execution trends, and real-time listener metrics so you can identify issues quickly without parsing logs manually.
Users with Admin access can open the dashboard from Profile > Dashboard or by visiting /admin/observability.
Unless noted otherwise, all features work on both deployment types and on both SQL Server and PostgreSQL databases. Platform-specific features are tagged as [IIS only], [K8s only], or [SQL Server only] throughout this page.
Access and permissions
The Observability Dashboard is restricted to members of the Cinchy Administrators User Group. The dashboard view (reachable from Profile > Dashboard or /admin/observability) and the underlying Observability tables (sync groups, group members, SLA definitions, health snapshots, baseline metrics) are both administrator-only. Non-admin users cannot view the dashboard or edit its configuration.
How it works
Cinchy reads execution and listener metrics already captured in the platform, rolls them into baselines and health snapshots during a scheduled maintenance pass, and renders the result as scores, trends, and alerts. The dashboard does not compute scores live; it reads the most recent snapshot.
Key capabilities
| Capability | Description |
|---|---|
| Health scores | A 0-100 color-coded score for each sync group, computed from recent execution history. |
| Correlated failure detection | Identifies clusters of syncs that failed around the same time, indicating a shared root cause. |
| Execution time trending | Per-sync timing with a 30-day rolling baseline for comparison. |
| Top error summary | Errors ranked by impact across all monitored syncs. |
| Listener traffic monitoring | Throughput, queue depth, and batch success rates for real-time listeners. |
| Stale listener detection | Alerts when a listener has not processed messages within its expected interval. |
| Data freshness and SLA tracking | Per-table staleness compared to configured thresholds. |
| Queue depth forecasting | Estimated drain time based on current throughput. |
| Service Broker queue health | Poison message detection, activation status, and transmission backlog. [IIS only] [SQL Server only] |
| Kafka consumer lag | Committed vs. end offset and total unprocessed messages per listener. [K8s only] |
| Post-outage recovery status | Shows which syncs have recovered and which remain degraded. |
Getting started
The dashboard displays data only after you create sync groups and assign syncs. Follow these steps to set up the dashboard for the first time.
1. Create sync groups
Open the Observability Sync Groups table in Cinchy and create one or more groups. Each group represents a logical collection of related syncs (for example, "Claims Processing" or "Billing"). Use the Parent Group column to create subgroup hierarchies.
2. Assign syncs to groups
In the Observability Sync Group Members table, link each batch sync (via Data Sync Config) or real-time listener (via Listener Config) to a group. Set Priority to one of the following:
| Priority | Description |
|---|---|
| Critical | Failures in Critical syncs are penalized by a dedicated 10% component of the health score and surface in alerts. They also count toward the overall success rate (60% of the score) like any other failure. |
| Urgent | Important syncs that should be investigated promptly. Failures count toward the overall success rate but do not trigger the Critical-score penalty. |
| Low Priority | Syncs where occasional failures are acceptable. Failures still count toward the overall success rate but are not otherwise weighted. |
Only syncs assigned to a group appear on the dashboard. Unassigned syncs are not visible.
3. Define SLA thresholds (optional)
In the Observability SLA Definitions table, select a Cinchy table you want to monitor for data freshness and set a Maximum Staleness (Minutes) threshold. When the table has not been updated within that window, the dashboard flags an SLA breach.
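The freshness check described above can be sketched in a few lines. This is an illustration only, not the dashboard's actual query: the function name `is_sla_breached` is hypothetical, and treating a table that is exactly at the threshold as still fresh is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_sla_breached(last_updated: datetime, max_staleness_minutes: int, now: datetime) -> bool:
    """Return True when the table was last updated longer ago than the allowed window.

    Assumption: a table exactly at the threshold is still considered fresh.
    """
    return now - last_updated > timedelta(minutes=max_staleness_minutes)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 1, 11, 45, tzinfo=timezone.utc)  # updated 15 minutes ago
stale = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)   # updated 120 minutes ago

print(is_sla_breached(fresh, 60, now))  # False
print(is_sla_breached(stale, 60, now))  # True
```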
4. Run initial maintenance
Baseline metrics and health snapshots must be populated before the dashboard shows any data. You have three options.
- Maintenance CLI (recommended)
- Update health scores button
- Run the data syncs manually
Maintenance CLI (recommended): runs the full sequence in the correct order.
Cinchy.Connections.CLI.exe maintain-observability \
-s cinchy.example.com/Cinchy \
--https \
--pat "YOUR_PERSONAL_ACCESS_TOKEN"
This executes:
- Refresh 30-day rolling baseline metrics
- Calculate health scores for all sync groups
- Clean up snapshots older than the retention window
- Reset real-time event counters
Update health scores button: on the System Overview view of the dashboard, click Update health scores in the Sync Groups header. This runs the same four-step sequence as the CLI on demand and is the easiest way to populate data for the first time or refresh it between scheduled runs. Leave the page open while the job runs, because it may take several minutes on large datasets.
Run the data syncs manually: the model upgrade provisions two data sync configurations in the Data Sync Configurations table. Run them in this order from the Cinchy UI:
- Observability - Refresh Baseline Metrics
- Observability - Refresh Health Snapshots
See Running the data syncs manually for details. Note that this option does not perform retention cleanup or counter resets. Schedule the CLI command for daily maintenance even if you ran the syncs manually for the initial population.
5. Schedule daily maintenance
Schedule Cinchy.Connections.CLI.exe maintain-observability to run daily so health scores and baselines stay current. The recommended time is 02:00 UTC, when sync activity is typically lowest.
Use any external scheduler that fits your deployment, such as Windows Task Scheduler, a Kubernetes CronJob, or a systemd timer. The CLI runs the full daily sequence (refresh baselines, refresh health snapshots, retention cleanup, counter reset).
You can refresh health scores at any time without waiting for the daily job. Either re-run maintain-observability, or click Update health scores on the System Overview view of the dashboard. Both run the same four-step sequence and are useful after adding new syncs, changing priorities, or recovering from an outage.
Dashboard views
The dashboard is organized into three top-level views, four alert views, and three detail views.
Top-level views
| View | What it shows |
|---|---|
| System Overview | All sync groups with their current health score, member counts, and priority breakdown. Open alert counts (correlated failures, stale listeners, failed listeners, SLA breaches) appear as clickable tiles. |
| Real-Time Sync Status | Per-listener status (Caught Up, Recovering, or Stalled) with throughput, estimated backlog, and estimated catch-up time. |
| Batch Syncs Status | All batch syncs with their last execution state, recent success/failure pattern, and timing relative to baseline. |
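As a rough illustration of how an estimated catch-up time can be derived from backlog and throughput, the sketch below divides one by the other. The function name and the handling of zero throughput are assumptions; the dashboard's own estimate may account for additional factors such as incoming message rate.

```python
from typing import Optional


def estimated_catch_up_seconds(backlog_messages: int, throughput_per_second: float) -> Optional[float]:
    """Estimate how long a listener needs to drain its backlog at the current rate.

    Returns 0.0 when there is no backlog and None when nothing is being
    processed, since the backlog is then not draining at all.
    """
    if backlog_messages <= 0:
        return 0.0
    if throughput_per_second <= 0:
        return None
    return backlog_messages / throughput_per_second


print(estimated_catch_up_seconds(1200, 20.0))  # 60.0
```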
Alert views
| View | What it shows |
|---|---|
| Stale Listener Detection | Enabled listeners that have not processed messages within the configured stale threshold. |
| Correlated Failure Clusters | Groups of syncs that failed within a shared time window, indicating a common root cause. |
| Data Freshness SLA | Tables exceeding their configured Maximum Staleness threshold defined in Observability SLA Definitions. |
| Failed Listeners | Listeners currently in a failed or disabled state. |
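To make the idea of a shared time window concrete, here is a minimal clustering sketch. It is not the dashboard's detection logic: the five-minute window, the two-sync minimum, and the choice to measure the window from the first failure in each cluster are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Failure = Tuple[str, datetime]  # (sync name, failure time)


def cluster_failures(
    failures: List[Failure],
    window: timedelta = timedelta(minutes=5),  # assumed window size
    min_syncs: int = 2,                        # assumed minimum cluster size
) -> List[List[Failure]]:
    """Group failures whose timestamps fall within `window` of the cluster's first failure."""
    clusters: List[List[Failure]] = []
    current: List[Failure] = []
    for failure in sorted(failures, key=lambda f: f[1]):
        # Start a new cluster once this failure is too far from the cluster's start.
        if current and failure[1] - current[0][1] > window:
            clusters.append(current)
            current = []
        current.append(failure)
    if current:
        clusters.append(current)
    # Keep only clusters that involve enough distinct syncs to suggest a shared cause.
    return [c for c in clusters if len({name for name, _ in c}) >= min_syncs]


failures = [
    ("Sync A", datetime(2024, 1, 1, 10, 0)),
    ("Sync B", datetime(2024, 1, 1, 10, 2)),
    ("Sync C", datetime(2024, 1, 1, 10, 4)),
    ("Sync D", datetime(2024, 1, 1, 11, 0)),
]
clusters = cluster_failures(failures)
```

With this sample data, Sync A, B, and C form one cluster, while Sync D fails on its own an hour later and is not reported.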
Detail views
Clicking through any of the above opens a focused detail view:
| View | What it shows |
|---|---|
| Group Detail | Members of a sync group with per-member health, baseline comparison, recent failure analysis, configuration changes, and trend charts. |
| Batch Sync Detail | Per-execution log, baseline comparison, top errors, and failure pattern for a single batch sync. |
| Listener Detail | Traffic, queue depth forecast, status history, and (where available) Service Broker or Kafka diagnostics for a single listener. |
Understanding health scores
Each sync group receives a health score between 0 and 100 that summarizes its overall status.
How the score is calculated
The health score is a weighted average of three components:
| Component | Weight | What it measures |
|---|---|---|
| Success rate | 60% | The percentage of syncs in the group whose most recent execution succeeded. |
| Timing score | 30% | How close current execution times are to the 30-day baseline. A score of 1.0 means running at baseline speed; the score drops toward 0.0 as execution time approaches twice the baseline. |
| Critical score | 10% | Penalizes failures in Critical-priority syncs. If no Critical members exist, this component scores 1.0 (no penalty). |
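The weighting in the table can be worked through with a short sketch. The 60/30/10 weights come from the table; everything else is an assumption made for illustration, including the function names, the linear decay of the timing score between 1x and 2x the baseline, and the rounding to one decimal place.

```python
def timing_score(current_seconds: float, baseline_seconds: float) -> float:
    """Score 1.0 at or below baseline, falling linearly to 0.0 at twice the baseline.

    Assumption: the decay is linear; the source only says the score drops
    toward 0.0 as execution time doubles.
    """
    if baseline_seconds <= 0:
        return 1.0  # no baseline yet: assume no penalty
    ratio = current_seconds / baseline_seconds
    return max(0.0, min(1.0, 2.0 - ratio))


def health_score(success_rate: float, timing: float, critical: float) -> float:
    """Combine the three components (each in the range 0.0 to 1.0) into a 0-100 score."""
    weighted = 0.60 * success_rate + 0.30 * timing + 0.10 * critical
    return round(weighted * 100, 1)


# 90% of syncs succeeded, runs take 1.5x the baseline, no Critical failures.
print(health_score(0.9, timing_score(150, 100), 1.0))  # 79.0
```

In this example the group lands at 79, even though nine out of ten syncs succeeded, because the slowdown costs 15 of the 30 timing points.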
Color coding
| Score range | Color | Meaning |
|---|---|---|
| 90-100 | Green | Healthy |
| 70-89 | Amber | Warning |
| 50-69 | Orange | Degraded |
| 0-49 | Red | Critical |
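The color bands translate directly into a lookup. The sketch below uses the lower bound of each range as the threshold, which is an assumption about how fractional scores such as 89.5 are handled; the function name is hypothetical.

```python
from typing import Tuple


def health_band(score: float) -> Tuple[str, str]:
    """Map a 0-100 health score to its (color, meaning) pair from the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return ("Green", "Healthy")
    if score >= 70:
        return ("Amber", "Warning")
    if score >= 50:
        return ("Orange", "Degraded")
    return ("Red", "Critical")


print(health_band(79.0))  # ('Amber', 'Warning')
```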
Platform-specific notes
- Windows / IIS + SQL Server
- Kubernetes + Kafka
- PostgreSQL
Windows / IIS + SQL Server: Service Broker queue health diagnostics require the database login used by Cinchy to have VIEW SERVER STATE permission. If this permission is missing, the dashboard displays an error message.
Check whether the current login already has the permission:
SELECT HAS_PERMS_BY_NAME(NULL, NULL, 'VIEW SERVER STATE') AS HasViewServerState;
A result of 1 means the permission is granted; 0 means it is not. If it is not granted, run:
GRANT VIEW SERVER STATE TO [YourCinchyLoginUser];
Permissions at the server scope can only be granted when the current database is master, so switch to it before running the GRANT statement (for example, USE master;).
Service Broker diagnostics (poison message detection, activation status, transmission backlog) are not available on PostgreSQL or Kubernetes deployments.
Kubernetes + Kafka: Kafka consumer lag metrics (committed vs. end offset, total unprocessed messages) are written by the Connections worker (KafkaBatchProcessor) via the EventSyncMetricsService after each successful batch. They are persisted to the Event Listener State table in the dedicated [Kafka Lag], [Kafka Committed Offset], and [Kafka End Offset] columns. Until these columns are populated, the dashboard derives lag estimates from listener timestamps and counter-based processing rates.
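For readers unfamiliar with the terms, consumer lag is the distance between the end offset and the committed offset. The sketch below shows that arithmetic; summing across partitions and the function names are assumptions for illustration and do not describe how the Connections worker stores its values.

```python
from typing import Dict, Tuple


def partition_lag(committed_offset: int, end_offset: int) -> int:
    """Messages written to the partition but not yet committed by the consumer."""
    return max(0, end_offset - committed_offset)


def total_lag(partitions: Dict[int, Tuple[int, int]]) -> int:
    """Sum lag over partitions given {partition: (committed_offset, end_offset)}."""
    return sum(partition_lag(committed, end) for committed, end in partitions.values())


offsets = {0: (100, 150), 1: (200, 300)}
print(total_lag(offsets))  # 150
```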
PostgreSQL: All dashboard queries work on both SQL Server and PostgreSQL, with the exception of Service Broker diagnostics (SQL Server only). CDC cursor data decoding and hourly metrics parsing are handled client-side to avoid database dialect differences.
Performance recommendations
Run platform maintenance regularly
The dashboard relies heavily on the Execution Log table for health scores, execution time trending, and correlated failure detection. As Execution Log accumulates rows over time, dashboard queries slow down, which affects every view that scans recent execution history.
We strongly recommend running Cinchy's platform maintenance on a regular cadence so old Execution Log rows are cleaned up. See Maintenance for instructions on enabling maintenance on Kubernetes or running the maintenance CLI on IIS.
Maintenance
Health scores and baseline metrics are not updated in real time. They are refreshed on a schedule, but you can also trigger a refresh on demand from the dashboard UI or by re-running the CLI.
Update health scores from the dashboard
On the System Overview view, click Update health scores in the Sync Groups header. A confirmation dialog lists the four steps that will run:
- Refresh baseline metrics
- Refresh health snapshots
- Clean up old snapshots
- Reset listener counters
Click Run now to execute the sequence. This is the same set of steps that the daily CLI job performs, so it is the recommended way to get an immediate score update between scheduled runs (for example, after adding new syncs, changing priorities, or recovering from an outage).
The job may take several minutes on large datasets. Keep the page open until it completes. A toast notification confirms success or lists any failed steps. Schedule the CLI for daily maintenance even if you use this button for on-demand refreshes.
Using the CLI (recommended)
The maintain-observability CLI command runs the full maintenance sequence automatically in the correct order:
Cinchy.Connections.CLI.exe maintain-observability \
-s cinchy.example.com/Cinchy \
--https \
--pat "YOUR_PERSONAL_ACCESS_TOKEN" \
--retention-days 90
| Option | Required | Description |
|---|---|---|
| -s, --server | Yes | Path to Cinchy server without protocol (for example, cinchy.co/Cinchy). |
| -h, --https | No | Use HTTPS connections. |
| --pat | Required (one of) | Personal Access Token. Use either --pat or both -u and -p; one authentication method must be provided. |
| -u, --userid | Required (one of) | User ID for Cinchy access. Use with -p as an alternative to --pat. |
| -p, --password | Required (one of) | Password for the specified user. Required when using -u instead of --pat. |
| -r, --retention-days | No | Number of days to retain health snapshots. Default: 90. |
| -a, --tls | No | TLS protocol version to use for the connection. |
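If you invoke the command from a script, it can help to assemble the argument list in one place. The sketch below only builds the list from the options documented in the table; the function name, its defaults, and the idea of wrapping the CLI in Python are illustrative assumptions, not part of the product.

```python
from typing import List


def build_maintenance_command(
    server: str,
    pat: str,
    use_https: bool = True,
    retention_days: int = 90,
    executable: str = "Cinchy.Connections.CLI.exe",
) -> List[str]:
    """Assemble the maintain-observability invocation as an argument list."""
    args = [executable, "maintain-observability", "-s", server]
    if use_https:
        args.append("--https")
    args += ["--pat", pat, "--retention-days", str(retention_days)]
    return args


command = build_maintenance_command("cinchy.example.com/Cinchy", "YOUR_PERSONAL_ACCESS_TOKEN")
# The list can be passed to subprocess.run(command, check=True) by a scheduler script.
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with the token value.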
The CLI executes the following steps in order:
- Refresh Baseline Metrics: computes 30-day rolling baselines (average execution time, 95th percentile, failure rate) for each batch sync.
- Refresh Health Snapshots: calculates health scores for all sync groups using the latest baseline data.
- Retention cleanup: deletes health snapshots older than the configured retention window.
- Reset event counters: resets cumulative counter columns on Event Listener State for real-time metrics.
Retention
Health snapshots are retained for 90 days by default. When using the CLI, override this with the --retention-days flag. When running data syncs manually, delete old rows from the Observability Health Snapshots table as needed.
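When cleaning up manually, the rows to remove are those older than the retention cutoff. The following sketch shows that selection on a list of snapshot timestamps; the function name is hypothetical, and how you read and delete rows in the Observability Health Snapshots table is left out on purpose.

```python
from datetime import datetime, timedelta
from typing import List


def snapshots_to_delete(snapshot_times: List[datetime], now: datetime, retention_days: int = 90) -> List[datetime]:
    """Return the snapshot timestamps that fall outside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [taken for taken in snapshot_times if taken < cutoff]


now = datetime(2024, 4, 1)
snapshots = [datetime(2023, 12, 15), datetime(2024, 3, 1)]
expired = snapshots_to_delete(snapshots, now)  # only the December snapshot is past 90 days
```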
Running the data syncs manually
If you prefer to run the maintenance steps individually, you can execute the underlying data syncs directly from the Data Sync Configurations table in Cinchy. The two data sync configurations are provisioned automatically on upgrade.
You must run Observability - Refresh Baseline Metrics before Observability - Refresh Health Snapshots. Health snapshot calculations depend on the baseline data. Running them out of order produces inaccurate scores.
Step 1: Refresh Baseline Metrics
Run the data sync named Observability - Refresh Baseline Metrics. This reads the last 30 days of completed batch sync executions from the Execution Log and writes per-sync baselines (average execution time, 95th percentile execution time, total executions, failure rate) to the Observability Baseline Metrics table.
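The four baseline values can be illustrated with a small calculation over one sync's execution history. This is a sketch under stated assumptions: the nearest-rank percentile method, the inclusion of failed runs in the timing figures, and all names are choices made here, not a description of the data sync's internals.

```python
import math
from statistics import mean
from typing import Dict, List, Optional, Tuple

Execution = Tuple[float, bool]  # (duration in seconds, succeeded)


def percentile_nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def compute_baseline(executions: List[Execution]) -> Optional[Dict[str, float]]:
    """Summarize one sync's executions from the last 30 days into baseline metrics."""
    if not executions:
        return None  # no history yet, so there is no baseline
    durations = [duration for duration, _ in executions]
    failures = sum(1 for _, succeeded in executions if not succeeded)
    return {
        "average_seconds": mean(durations),
        "p95_seconds": percentile_nearest_rank(durations, 95),
        "total_executions": len(executions),
        "failure_rate": failures / len(executions),
    }


history = [(10.0, True), (20.0, True), (30.0, False), (40.0, True)]
baseline = compute_baseline(history)
```

For this sample history the average is 25 seconds, the 95th percentile is 40 seconds, and one failure in four runs gives a failure rate of 0.25.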
Step 2: Refresh Health Snapshots
Run the data sync named Observability - Refresh Health Snapshots. This reads from Observability Sync Group Members, Execution Log, Listener Config, and Observability Baseline Metrics to calculate a health score for each sync group. Results are written to the Observability Health Snapshots table.
After adding new syncs, changing priorities, or recovering from an outage, refresh scores by clicking Update health scores on the dashboard, running the CLI, or running both maintenance data syncs in order.