
Observability Dashboard

Introduction

The Observability Dashboard provides a centralized view of sync health across your Cinchy environment. It surfaces health scores, failure patterns, execution trends, and real-time listener metrics so you can identify issues quickly without parsing logs manually.

Users with Admin access can open the dashboard by navigating to Profile > Dashboard or by visiting /admin/observability.

Platform compatibility

Unless noted otherwise, all features work on both deployment types and on both SQL Server and PostgreSQL databases. Platform-specific features are tagged as [IIS only], [K8s only], or [SQL Server only] throughout this page.

Access and permissions

The Observability Dashboard is restricted to members of the Cinchy Administrators User Group. Both the dashboard view — reachable from Profile > Dashboard or /admin/observability — and the underlying Observability tables (sync groups, group members, SLA definitions, health snapshots, baseline metrics) are administrator-only. Non-admin users cannot view the dashboard or edit its configuration.

How it works

Cinchy reads execution and listener metrics already captured in the platform, rolls them into baselines and health snapshots during a scheduled maintenance pass, and renders the result as scores, trends, and alerts. The dashboard does not compute scores live — it reads the most recent snapshot.

Key capabilities

| Capability | Description |
| --- | --- |
| Health scores | A 0-100 color-coded score for each sync group, computed from recent execution history. |
| Correlated failure detection | Identifies clusters of syncs that failed around the same time, indicating a shared root cause. |
| Execution time trending | Per-sync timing with a 30-day rolling baseline for comparison. |
| Top error summary | Errors ranked by impact across all monitored syncs. |
| Listener traffic monitoring | Throughput, queue depth, and batch success rates for real-time listeners. |
| Stale listener detection | Alerts when a listener has not processed messages within its expected interval. |
| Data freshness and SLA tracking | Per-table staleness compared to configured thresholds. |
| Queue depth forecasting | Estimated drain time based on current throughput. |
| Service Broker queue health | Poison message detection, activation status, and transmission backlog. [IIS only] [SQL Server only] |
| Kafka consumer lag | Committed vs. end offset and total unprocessed messages per listener. [K8s only] |
| Post-outage recovery status | Shows which syncs have recovered and which remain degraded. |

Getting started

The dashboard displays data only after you create sync groups and assign syncs. Follow these steps to set up the dashboard for the first time.

1. Create sync groups

Open the Observability Sync Groups table in Cinchy and create one or more groups. Each group represents a logical collection of related syncs (for example, "Claims Processing" or "Billing"). Use the Parent Group column to create subgroup hierarchies.

2. Assign syncs to groups

In the Observability Sync Group Members table, link each batch sync (via Data Sync Config) or real-time listener (via Listener Config) to a group. Set Priority to one of the following:

| Priority | Description |
| --- | --- |
| Critical | Failures in Critical syncs are penalized by a dedicated 10% component of the health score and surface in alerts. They also count toward the overall success rate (60% of the score) like any other failure. |
| Urgent | Important syncs that should be investigated promptly. Failures count toward the overall success rate but do not trigger the Critical-score penalty. |
| Low Priority | Syncs where occasional failures are acceptable. Failures still count toward the overall success rate but are not otherwise weighted. |

Info: Only syncs assigned to a group appear on the dashboard. Unassigned syncs are not visible.

3. Define SLA thresholds (optional)

In the Observability SLA Definitions table, select a Cinchy table you want to monitor for data freshness and set a Maximum Staleness (Minutes) threshold. When the table has not been updated within that window, the dashboard flags an SLA breach.
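
As a rough illustration of the freshness check described above, the sketch below compares a table's last update time against its Maximum Staleness (Minutes) threshold. The function name and arguments are hypothetical, and how Cinchy evaluates a breach internally is not documented here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_sla_breached(last_updated: datetime,
                    max_staleness_minutes: int,
                    now: Optional[datetime] = None) -> bool:
    """Return True when the table was last updated outside the allowed window.

    Hypothetical helper: both timestamps are expected to be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_updated > timedelta(minutes=max_staleness_minutes)
```

With a 60-minute threshold, a table last updated 61 minutes ago would be flagged, while one updated 30 minutes ago would not.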

4. Run initial maintenance

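Baseline metrics and health snapshots must be populated before the dashboard shows any data. You have three options: run the maintain-observability CLI command, click Update health scores on the System Overview view, or run the two maintenance data syncs manually. The Maintenance section describes each option in detail.

The CLI runs the full sequence in the correct order: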

Cinchy.Connections.CLI.exe maintain-observability \
-s cinchy.example.com/Cinchy \
--https \
--pat "YOUR_PERSONAL_ACCESS_TOKEN"

This executes:

  1. Refresh 30-day rolling baseline metrics
  2. Calculate health scores for all sync groups
  3. Clean up snapshots older than the retention window
  4. Reset real-time event counters

5. Schedule daily maintenance

Schedule Cinchy.Connections.CLI.exe maintain-observability to run daily so health scores and baselines stay current. The recommended time is 02:00 UTC, when sync activity is typically lowest.

Use any external scheduler that fits your deployment, such as Windows Task Scheduler, a Kubernetes CronJob, or a systemd timer. The CLI runs the full daily sequence (refresh baselines, refresh health snapshots, retention cleanup, counter reset).

Tip: You can refresh health scores at any time without waiting for the daily job. Either re-run maintain-observability, or click Update health scores on the System Overview view of the dashboard. Both run the same four-step sequence and are useful after adding new syncs, changing priorities, or recovering from an outage.

Dashboard views

The dashboard is organized into three top-level views, four alert views, and three detail views that open when you click through from the others.

Top-level views

| View | What it shows |
| --- | --- |
| System Overview | All sync groups with their current health score, member counts, and priority breakdown. Open alert counts (correlated failures, stale listeners, failed listeners, SLA breaches) appear as clickable tiles. |
| Real-Time Sync Status | Per-listener status (Caught Up, Recovering, or Stalled) with throughput, estimated backlog, and estimated catch-up time. |
| Batch Syncs Status | All batch syncs with their last execution state, recent success/failure pattern, and timing relative to baseline. |
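
The estimated catch-up time shown for a listener can be thought of as backlog divided by throughput. The sketch below is a minimal version of that arithmetic, assuming a constant throughput and no new incoming messages; the function name and units are hypothetical and not taken from the product.

```python
from typing import Optional


def estimate_drain_seconds(queue_depth: int,
                           throughput_per_second: float) -> Optional[float]:
    """Estimate how long the current backlog takes to drain.

    Returns 0.0 for an empty queue and None when the listener is not
    processing anything, because no estimate is possible in that case.
    """
    if queue_depth <= 0:
        return 0.0
    if throughput_per_second <= 0:
        return None
    return queue_depth / throughput_per_second
```

Returning None rather than a very large number keeps a stalled listener distinguishable from one that is merely slow.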

Alert views

| View | What it shows |
| --- | --- |
| Stale Listener Detection | Enabled listeners that have not processed messages within the configured stale threshold. |
| Correlated Failure Clusters | Groups of syncs that failed within a shared time window, indicating a common root cause. |
| Data Freshness SLA | Tables exceeding their configured Maximum Staleness threshold defined in Observability SLA Definitions. |
| Failed Listeners | Listeners currently in a failed or disabled state. |
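
To make the idea of a correlated failure cluster concrete, the sketch below groups failures whose timestamps fall close together. It uses simple gap-based chaining, which is an assumption for illustration and not necessarily the algorithm the dashboard uses. The window size and all names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Failure = Tuple[str, datetime]  # (sync name, failure time)


def cluster_failures(failures: List[Failure],
                     window: timedelta) -> List[List[Failure]]:
    """Chain failures separated by at most `window` into clusters.

    Only clusters involving at least two distinct syncs are returned,
    since a single sync failing repeatedly is not a correlated failure.
    """
    clusters: List[List[Failure]] = []
    current: List[Failure] = []
    for failure in sorted(failures, key=lambda f: f[1]):
        # Start a new cluster when the gap to the previous failure is too large.
        if current and failure[1] - current[-1][1] > window:
            clusters.append(current)
            current = []
        current.append(failure)
    if current:
        clusters.append(current)
    return [c for c in clusters if len({name for name, _ in c}) >= 2]
```

With a five-minute window, two syncs failing two minutes apart would form one cluster, while a third sync failing hours later would be left out.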

Detail views

Clicking through any of the above opens a focused detail view:

| View | What it shows |
| --- | --- |
| Group Detail | Members of a sync group with per-member health, baseline comparison, recent failure analysis, configuration changes, and trend charts. |
| Batch Sync Detail | Per-execution log, baseline comparison, top errors, and failure pattern for a single batch sync. |
| Listener Detail | Traffic, queue depth forecast, status history, and (where available) Service Broker or Kafka diagnostics for a single listener. |

Understanding health scores

Each sync group receives a health score between 0 and 100 that summarizes its overall status.

How the score is calculated

The health score is a weighted average of three components:

| Component | Weight | What it measures |
| --- | --- | --- |
| Success rate | 60% | The percentage of syncs in the group whose most recent execution succeeded. |
| Timing score | 30% | How close current execution times are to the 30-day baseline. A score of 1.0 means running at baseline speed; the score drops toward 0.0 as execution time doubles. |
| Critical score | 10% | Penalizes failures in Critical-priority syncs. If no Critical members exist, this component scores 1.0 (no penalty). |
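
The weighting above can be sketched as a small calculation. The 60/30/10 weights come from the table. The linear shape of the timing curve between baseline and twice the baseline is an assumption, as are the function names and the handling of a missing baseline.

```python
def timing_score(current_seconds: float, baseline_seconds: float) -> float:
    """1.0 at or below baseline, falling linearly to 0.0 at twice the baseline.

    The linear fall-off is an assumed curve for illustration.
    """
    if baseline_seconds <= 0:
        return 1.0  # assumption: no usable baseline means no timing penalty
    ratio = current_seconds / baseline_seconds
    return max(0.0, min(1.0, 2.0 - ratio))


def health_score(success_rate: float, timing: float, critical: float) -> float:
    """Combine the three components (each in 0.0-1.0) into a 0-100 score."""
    weighted = 0.60 * success_rate + 0.30 * timing + 0.10 * critical
    return round(100.0 * weighted, 1)
```

Under these assumptions, a group with an 80% success rate, syncs running at 1.5 times their baseline (timing score 0.5), and no Critical failures would score roughly 73.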

Color coding

| Score range | Color | Meaning |
| --- | --- | --- |
| 90-100 | Green | Healthy |
| 70-89 | Amber | Warning |
| 50-69 | Orange | Degraded |
| 0-49 | Red | Critical |
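
The color bands map to a simple threshold lookup. The sketch below treats each band's lower bound as the cut-off, which is an assumption for fractional scores such as 89.5 that the table does not address; the function name is hypothetical.

```python
from typing import Tuple


def score_band(score: float) -> Tuple[str, str]:
    """Return the (color, meaning) pair for a 0-100 health score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return ("Green", "Healthy")
    if score >= 70:
        return ("Amber", "Warning")
    if score >= 50:
        return ("Orange", "Degraded")
    return ("Red", "Critical")
```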

Platform-specific notes

Service Broker queue health diagnostics require the database login used by Cinchy to have VIEW SERVER STATE permission. If this permission is missing, the dashboard displays an error message.

Check whether the current login already has the permission:

SELECT HAS_PERMS_BY_NAME(NULL, NULL, 'VIEW SERVER STATE') AS HasViewServerState;

A result of 1 means the permission is granted; 0 means it is not. If it is not granted, run:

GRANT VIEW SERVER STATE TO [YourCinchyLoginUser];

Permissions at the server scope can only be granted when the current database is master, so switch to it before running the GRANT statement (for example, USE master;).

Service Broker diagnostics (poison message detection, activation status, transmission backlog) are not available on PostgreSQL or Kubernetes deployments.

Performance recommendations

Run platform maintenance regularly

The dashboard relies heavily on the Execution Log table for health scores, execution time trending, and correlated failure detection. As Execution Log accumulates rows over time, dashboard queries slow down, which affects every view that scans recent execution history.

We strongly recommend running Cinchy's platform maintenance on a regular cadence so old Execution Log rows are cleaned up. See Maintenance for instructions on enabling maintenance on Kubernetes or running the maintenance CLI on IIS.

Maintenance

Health scores and baseline metrics are not updated in real time. They are refreshed on a schedule, but you can also trigger a refresh on demand from the dashboard UI or by re-running the CLI.

Update health scores from the dashboard

On the System Overview view, click Update health scores in the Sync Groups header. A confirmation dialog lists the four steps that will run:

  1. Refresh baseline metrics
  2. Refresh health snapshots
  3. Clean up old snapshots
  4. Reset listener counters

Click Run now to execute the sequence. This is the same set of steps that the daily CLI job performs, so it is the recommended way to get an immediate score update between scheduled runs (for example, after adding new syncs, changing priorities, or recovering from an outage).

The job may take several minutes on large datasets. Keep the page open until it completes. A toast notification confirms success or lists any failed steps. Schedule the CLI for daily maintenance even if you use this button for on-demand refreshes.

The maintain-observability CLI command runs the full maintenance sequence automatically in the correct order:

Cinchy.Connections.CLI.exe maintain-observability \
-s cinchy.example.com/Cinchy \
--https \
--pat "YOUR_PERSONAL_ACCESS_TOKEN" \
--retention-days 90

| Option | Required | Description |
| --- | --- | --- |
| -s, --server | Yes | Path to Cinchy server without protocol (for example, cinchy.co/Cinchy). |
| -h, --https | No | Use HTTPS connections. |
| --pat | Required (one of) | Personal Access Token. Use either --pat or both -u and -p; one authentication method must be provided. |
| -u, --userid | Required (one of) | User ID for Cinchy access. Use with -p as an alternative to --pat. |
| -p, --password | Required (one of) | Password for the specified user. Required when using -u instead of --pat. |
| -r, --retention-days | No | Number of days to retain health snapshots. Default: 90. |
| -a, --tls | No | TLS protocol version to use for the connection. |
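
If you drive the CLI from a script, it can help to assemble the arguments in one place and enforce the "one authentication method" rule before launching it. The sketch below is a hypothetical Python wrapper that uses only the options listed above; the function name and defaults are assumptions, and the executable path depends on your installation.

```python
from typing import List, Optional


def build_maintain_command(server: str,
                           pat: Optional[str] = None,
                           userid: Optional[str] = None,
                           password: Optional[str] = None,
                           https: bool = True,
                           retention_days: Optional[int] = None,
                           executable: str = "Cinchy.Connections.CLI.exe") -> List[str]:
    """Build the argument list for a maintain-observability run."""
    if pat:
        auth = ["--pat", pat]
    elif userid and password:
        auth = ["-u", userid, "-p", password]
    else:
        raise ValueError("Provide either pat, or both userid and password")

    command = [executable, "maintain-observability", "-s", server]
    if https:
        command.append("--https")
    command += auth
    if retention_days is not None:
        command += ["--retention-days", str(retention_days)]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting problems with tokens and passwords.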

The CLI executes the following steps in order:

  1. Refresh Baseline Metrics: computes 30-day rolling baselines (average execution time, 95th percentile, failure rate) for each batch sync.
  2. Refresh Health Snapshots: calculates health scores for all sync groups using the latest baseline data.
  3. Retention cleanup: deletes health snapshots older than the configured retention window.
  4. Reset event counters: resets cumulative counter columns on Event Listener State for real-time metrics.

Retention

Health snapshots are retained for 90 days by default. When using the CLI, override this with the --retention-days flag. When running data syncs manually, delete old rows from the Observability Health Snapshots table as needed.
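
The retention rule amounts to a cutoff date: snapshots older than the window are removed and the rest are kept. The sketch below shows that split on plain timestamps. It is illustrative only, with hypothetical names, and does not touch the Observability Health Snapshots table.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_by_retention(snapshot_times: List[datetime],
                       now: datetime,
                       retention_days: int = 90) -> Tuple[List[datetime], List[datetime]]:
    """Return (kept, expired) snapshot timestamps for the given window."""
    cutoff = now - timedelta(days=retention_days)
    kept = [t for t in snapshot_times if t >= cutoff]
    expired = [t for t in snapshot_times if t < cutoff]
    return kept, expired
```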

Running the data syncs manually

If you prefer to run the maintenance steps individually, you can execute the underlying data syncs directly from the Data Sync Configurations table in Cinchy. The two data sync configurations are provisioned automatically on upgrade.

Execution order

You must run Observability - Refresh Baseline Metrics before Observability - Refresh Health Snapshots. Health snapshot calculations depend on the baseline data. Running them out of order produces inaccurate scores.

Step 1: Refresh Baseline Metrics

Run the data sync named Observability - Refresh Baseline Metrics. This reads the last 30 days of completed batch sync executions from the Execution Log and writes per-sync baselines (average execution time, 95th percentile execution time, total executions, failure rate) to the Observability Baseline Metrics table.
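
The baseline columns named above can be illustrated with a short calculation over one sync's executions. This is a sketch under stated assumptions: the nearest-rank method for the 95th percentile and the inclusion of failed runs in the timing figures are choices made here, not documented behavior, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Baseline:
    average_seconds: float
    p95_seconds: float
    total_executions: int
    failure_rate: float


def compute_baseline(executions: List[Tuple[float, bool]]) -> Baseline:
    """Summarize (duration_seconds, succeeded) pairs from the last 30 days."""
    if not executions:
        raise ValueError("at least one execution is required")
    durations = sorted(duration for duration, _ in executions)
    total = len(durations)
    # Nearest-rank percentile: ceil(0.95 * total), computed with integers.
    rank = max(1, (95 * total + 99) // 100)
    failures = sum(1 for _, succeeded in executions if not succeeded)
    return Baseline(
        average_seconds=sum(durations) / total,
        p95_seconds=durations[rank - 1],
        total_executions=total,
        failure_rate=failures / total,
    )
```

Filtering the input to the last 30 days is left to the caller, which keeps the function independent of how execution history is stored.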

Step 2: Refresh Health Snapshots

Run the data sync named Observability - Refresh Health Snapshots. This reads from Observability Sync Group Members, Execution Log, Listener Config, and Observability Baseline Metrics to calculate a health score for each sync group. Results are written to the Observability Health Snapshots table.

Tip: After adding new syncs, changing priorities, or recovering from an outage, refresh scores by clicking Update health scores on the dashboard, running the CLI, or running both maintenance data syncs in order.