MongoDB collection
Overview
MongoDB is a NoSQL document database known for horizontal scaling and load balancing, which give application developers a high degree of flexibility and scalability.
Considerations
Please review the following considerations before you set up your MongoDB Collection data sync source:
- Only SCRAM authentication is currently supported (MongoDB 4.0+).
- Syncs are column based, so you must flatten the MongoDB source document before syncing by using a projection (see the Projection (JSON Object) parameter on the Source tab).
- Column names used in the source must match fields on the root document. The exception is "$", which retrieves the full document.
- The default MongoDB batch size is 101.
- The default bulk operation size is 5000.
- Because doubles are converted to decimals during the sync, minor precision loss may occur.
- The following data types aren't supported:
  - Binary Data
  - Regular Expression
  - DBPointer
  - JavaScript
  - JavaScript code with scope
  - Symbol
  - Min Key
  - Max Key
- The following data types are supported with conversions:
  - ObjectId is supported, but converted to string
  - Object is supported, but converted to JSON
  - Array is supported, but converted to JSON
  - Timestamp is supported, but converted to a 64-bit integer
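To make the column rules above concrete, here is a minimal JavaScript sketch (not Cinchy's implementation) of how one document could map to a row. Columns are read from root-level fields only, "$" returns the whole document, and ObjectId, Object, and Array values are converted as listed. The toRow and convertValue helpers are hypothetical. ObjectId values are modeled as { $oid: "..." } objects (MongoDB Extended JSON style) so the example runs without a driver, and serializing "$" as JSON is an assumption.

```javascript
// Hypothetical sketch of the documented column rules; not Cinchy's actual code.
// Assumption: ObjectId values are represented as { $oid: "<hex>" }, as in
// MongoDB Extended JSON, because no MongoDB driver is loaded here.

function isObjectId(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    Object.keys(value).length === 1 &&
    typeof value.$oid === "string"
  );
}

// Apply the documented conversions to one root-level value.
function convertValue(value) {
  if (value === undefined || value === null) return null; // missing field -> null
  if (isObjectId(value)) return value.$oid; // ObjectId -> string
  if (value instanceof Date) return value; // Date stays a date
  if (typeof value === "object") return JSON.stringify(value); // Object/Array -> JSON text
  return value; // strings, numbers, booleans pass through
}

// Build one row. Column names are looked up on the root document only;
// "$" is the exception and returns the full document (serialized as JSON here).
function toRow(doc, columnNames) {
  const row = {};
  for (const name of columnNames) {
    row[name] = name === "$" ? JSON.stringify(doc) : convertValue(doc[name]);
  }
  return row;
}

const article = {
  _id: { $oid: "63d8137bd755fcdeed234403" },
  Name: "Shirt",
  Price: 9.95,
  Details: { Color: "White", Size: "Small" },
  Stock: 61,
};

// "Details.Color" is not a root-level field, so it resolves to null.
// This is why nested values must be flattened with a projection first.
console.log(toRow(article, ["_id", "Name", "Details", "Details.Color"]));
```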
The MongoDB Collection source supports batch syncs. (To enable real-time syncs with MongoDB, use the MongoDB Collection (Cinchy Event Triggered) or Mongo Event source instead.)
Info tab
The Info tab parameters are described in the table below (Image 1).
Parameter | Description | Example |
---|---|---|
Title | Mandatory. Input a name for your data sync | MongoDB collection to Cinchy |
Description | Optional. Add a description for your sync. This field has a 500 character limit. | |
Variables | Optional. See the Variables documentation for more information about this field. | |
Permissions | Mandatory. Data syncs use role-based access: you can grant specific groups read, write, and execute permissions, or all of them through admin access. You must specify at least one Admin Group. | |
Source tab
The Source tab (Image 2) has two sections, each with its own mandatory and optional parameters:
- Source Details
- Schema
The Source Details parameters define your data sync source and how it functions.
MongoDB Data Sync Configuration Parameters
Parameter | Description | Example |
---|---|---|
Source | Mandatory. Select your source from the drop-down menu. | MongoDB Collection |
Connection String | Mandatory. Encrypted connection string. Exclude /[database] from the URL. If you authenticate against a database other than admin, add the authSource parameter. See MongoDB's Connection String guide for details. | Default: mongodb+srv://<username>:<password>@<mongo host URI>; different authentication database: mongodb+srv://<username>:<password>@<mongo host URI>?authSource=<authentication_db> |
Database | Mandatory. Name of the MongoDB database containing the collection specified in the "Collection" parameter. | Blog |
Collection | Mandatory. Name of your MongoDB collection. | Article |
Type | Mandatory. Data retrieval method. Choose db.collection.find() for basic queries without data transformation, or db.collection.aggregate() for complex queries that require data transformation. db.collection.find() is generally faster, so use db.collection.aggregate() only when you need specific aggregation operators. | |
Query (JSON Object) | Optional. Appears if db.collection.find() is selected. Define a query for data retrieval. | Example Query |
Projection (JSON Object) | Optional. Appears if db.collection.find() is selected. Since syncs are column-based, flatten the MongoDB source document using a projection. | Example Projection |
Pipeline (JSON Array of Objects) | Optional. Appears if db.collection.aggregate() is selected. Define one or more stages that process documents in an aggregation pipeline. | |
Use SSL | Optional. Select to use x.509 certificate authentication. Requires the SSL Key PEM, SSL Certificate PEM, and SSL CA PEM. | |
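The Connection String rules above can be sketched with a small hypothetical helper, buildConnectionString, which is not part of Cinchy or the MongoDB driver. It mirrors the example format in the table: no /[database] path, and an authSource option only when authenticating against a database other than admin. It also percent-encodes the username and password, which MongoDB requires when they contain characters such as $ : / ? # [ ] @. The host names are placeholders.

```javascript
// Hypothetical helper (not part of Cinchy or the MongoDB driver) that assembles
// a value for the Connection String parameter, following the table's examples.
// - No "/<database>" path is added; the database goes in the Database parameter.
// - authSource is appended only for a non-admin authentication database.
// - Username and password are percent-encoded (encodeURIComponent escapes
//   $ : / ? # [ ] @, which MongoDB requires to be encoded).
function buildConnectionString({ username, password, host, authSource }) {
  const credentials =
    encodeURIComponent(username) + ":" + encodeURIComponent(password);
  let uri = `mongodb+srv://${credentials}@${host}`;
  if (authSource && authSource !== "admin") {
    uri += `?authSource=${encodeURIComponent(authSource)}`;
  }
  return uri;
}

console.log(
  buildConnectionString({
    username: "syncUser",
    password: "p@ss:word",
    host: "cluster0.example.net", // placeholder host
    authSource: "blog",
  })
);
// mongodb+srv://syncUser:p%40ss%3Aword@cluster0.example.net?authSource=blog
```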
The Schema section is where you define which source columns to sync. You can repeat these parameters for multiple columns.
Parameter | Description | Example |
---|---|---|
Name | Mandatory. The name of your column as it appears in the source. | Name |
Alias | Optional. You may choose to use an alias on your column so that it has a different name in the data sync. | |
Data Type | Mandatory. The data type of the column values. | Text |
Description | Optional. You may choose to add a description to your column. | |
Select Show Advanced for more options for the Schema section.
Parameter | Description | Example |
---|---|---|
Mandatory | | |
Validate Data | | |
Trim Whitespace | Optional if data type = text. For Text data types, you can choose whether to trim whitespace. | |
Max Length | Optional if data type = text. A numerical value representing the maximum length of data that can be synced in your column. If a value exceeds it, the row is rejected; you can find this error in the Execution Log. | |
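Trim Whitespace and Max Length can be pictured with this minimal sketch. The applyTextRules function is hypothetical, and it assumes trimming happens before the length check, which the table above does not state. Length is measured in JavaScript string units (UTF-16 code units).

```javascript
// Hypothetical sketch of the Trim Whitespace and Max Length options for a Text
// column. Assumption: trimming is applied before the length check.
function applyTextRules(value, { trimWhitespace = false, maxLength } = {}) {
  const text = trimWhitespace ? value.trim() : value;
  if (maxLength !== undefined && text.length > maxLength) {
    // In Cinchy the row is rejected and the error appears in the Execution Log.
    return { accepted: false, reason: `Length ${text.length} exceeds Max Length ${maxLength}` };
  }
  return { accepted: true, value: text };
}

console.log(applyTextRules("  Shirt  ", { trimWhitespace: true, maxLength: 5 }));
// { accepted: true, value: 'Shirt' }
console.log(applyTextRules("  Shirt  ", { maxLength: 5 }));
// { accepted: false, reason: 'Length 9 exceeds Max Length 5' }
```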
You can also add a Transformation > String Replacement by entering the following:
Parameter | Description | Example |
---|---|---|
Pattern | Mandatory if using a Transformation. The pattern for your string replacement. | |
Replacement | What you want to replace your pattern with. | |
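A String Replacement can be sketched as a global regex replace. This assumes Pattern is treated as a regular expression and that every match is replaced; neither is stated above, and Cinchy's own regex engine may differ from JavaScript's RegExp in edge cases. The stringReplacement function and the phone-number example are hypothetical.

```javascript
// Hypothetical sketch of Transformation > String Replacement.
// Assumptions: Pattern is a regular expression and all matches are replaced.
function stringReplacement(value, pattern, replacement) {
  return value.replace(new RegExp(pattern, "g"), replacement);
}

// Example: strip every non-digit character from a phone number.
console.log(stringReplacement("(416) 555-0199", "[^0-9]", "")); // 4165550199
```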
Query example
```
// query: where "Price" is less than 10
blog> db.Articles.find({ "Price": { "$lt": 10 } })
[
  {
    _id: ObjectId("63d8137bd755fcdeed234403"),
    Name: 'Shirt',
    Price: 9.95,
    Details: { Color: 'White', Size: 'Small' },
    Stock: 61
  }
]
```
Projection example
```
// Flatten the document: "$Details.Color" and "$Details.Size" are field paths,
// so the nested values become root-level fields. A plain string such as
// "Details.Color" would be returned as a literal value instead.
// Expressions in find() projections require MongoDB 4.4 or later.
blog> db.Articles.find({ "Price": { "$lt": 10 } }, { Name: 1, Price: 1, Color: "$Details.Color", Size: "$Details.Size", Stock: 1 })
[
  {
    _id: ObjectId("63d8137bd755fcdeed234403"),
    Name: 'Shirt',
    Price: 9.95,
    Stock: 61,
    Color: 'White',
    Size: 'Small'
  }
]
```
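Because db.collection.aggregate() is the other option for the Type parameter, here is a sketch of the same filter and flattening written as a pipeline for the Pipeline (JSON Array of Objects) parameter: a $match stage followed by a $project stage mirrors find(query, projection). The toPipeline helper is hypothetical; the printed JSON array is what you would paste into the field. The "$" prefix in "$Details.Color" tells MongoDB to read the nested field rather than return the literal string.

```javascript
// Sketch: express find(query, projection) as an aggregation pipeline.
// A $match stage filters documents; a $project stage reshapes them.
const query = { Price: { $lt: 10 } };
const projection = {
  Name: 1,
  Price: 1,
  Color: "$Details.Color", // field path: reads the nested value
  Size: "$Details.Size",
  Stock: 1,
};

// Hypothetical helper: build the two-stage pipeline.
function toPipeline(filter, fields) {
  return [{ $match: filter }, { $project: fields }];
}

// Print the JSON array to paste into the Pipeline parameter.
console.log(JSON.stringify(toPipeline(query, projection), null, 2));
```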
Next steps
- Configure your Destination.
- Define your Sync Actions.
- Add your Post Sync Scripts, if required.
- To run a batch sync, select Jobs > Start Job.
Appendix A
Data types
The MongoDB Collection data source obtains BSON documents from MongoDB. BSON, short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports embedding documents and arrays within other documents and arrays. BSON also has extensions that represent data types that aren't part of the JSON spec. For example, BSON distinguishes between Int32 and Int64.
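As a general illustration of why the Int32/Int64 distinction matters, independent of how Cinchy converts values: JSON has a single number type, and JavaScript (like many JSON parsers) stores it as a 64-bit double, which represents integers exactly only up to 2^53 - 1. An explicit 64-bit integer type avoids that rounding. The example uses only built-in JavaScript.

```javascript
// JSON numbers parsed as doubles lose precision above 2^53 - 1;
// a true 64-bit integer (BigInt here, Int64 in BSON) keeps the exact value.
const big = "9007199254740993"; // 2^53 + 1, within the Int64 range

const asJsonNumber = JSON.parse(big); // parsed as a double and rounded
const asInt64 = BigInt(big); // exact 64-bit integer

console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991
console.log(asJsonNumber); // 9007199254740992
console.log(asInt64); // 9007199254740993n
```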
The following table shows how MongoDB data types are translated in Cinchy.
MongoDB | Cinchy | Notes |
---|---|---|
Double | Number | Supported |
String | Text | Supported |
Object | Text (JSON) | Supported |
Array | Text (JSON) | Supported |
Binary Data | Binary | Unsupported |
ObjectId | Text | Supported |
Boolean | Boolean | Supported |
Date | Date | Supported |
Null | - | Supported |
RegEx | - | Unsupported |
JavaScript | - | Unsupported |
Timestamp | Number | Supported |
32-bit Integer | Number | Supported |
64-bit Integer | Number | Supported |
Decimal128 | Number | Supported |
Min Key | - | Unsupported |
Max Key | - | Unsupported |
- | Geography | Unsupported |
- | Geometry | Unsupported |
Retry configuration
A retry configuration automatically retries failed HTTP requests based on a defined set of conditions. This helps recover from transient errors such as network disruptions or temporary service outages.
Note: the maximum number of retries is capped at 10.
To set up a retry specification:
1. Select "Add Retry Configuration" from the Source tab.
2. Select your Delay Strategy.
   - Linear Backoff: Each retry is delayed by approximately n seconds, where n is the current retry attempt.
   - Exponential Backoff: Each retry is delayed by 2^n seconds, where n is the current retry attempt. For example, with Max Attempts = 3, the first retry is delayed 2^1 = 2 seconds, the second 2^2 = 4 seconds, and the third 2^3 = 8 seconds.
3. Input your Max Attempts. The maximum number of retries allowed is 10.
4. Define your Retry Conditions. You must define the conditions under which a retry should be attempted. For a retry to trigger, at least one Retry Condition must evaluate to true.
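The two delay strategies from the steps above can be sketched as follows. The function names are hypothetical; only the formulas (about n seconds for linear, 2^n seconds for exponential) and the cap of 10 retries come from the steps.

```javascript
// Sketch of the documented delay strategies; names are hypothetical.
const MAX_RETRIES = 10; // documented cap on retries

// Delay in seconds before retry attempt n (1-based).
function retryDelaySeconds(strategy, attempt) {
  switch (strategy) {
    case "linear":
      return attempt; // ~n seconds
    case "exponential":
      return 2 ** attempt; // 2^n seconds
    default:
      throw new Error(`Unknown delay strategy: ${strategy}`);
  }
}

// Delays for attempts 1..maxAttempts, never exceeding the cap.
function retrySchedule(strategy, maxAttempts) {
  const attempts = Math.min(maxAttempts, MAX_RETRIES);
  return Array.from({ length: attempts }, (_, i) => retryDelaySeconds(strategy, i + 1));
}

console.log(retrySchedule("exponential", 3)); // [ 2, 4, 8 ]
console.log(retrySchedule("linear", 3)); // [ 1, 2, 3 ]
```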
Retry conditions are only evaluated if the response code isn't 2xx Success.
Each Retry Condition contains one or more "Attribute Match" sections. Each Attribute Match defines a regex to evaluate against one part of the HTTP response. You can inspect the following three parts of the response:
- Response Code
- Header
- Body
If there are multiple "Attribute Match" blocks within a Retry Condition, all have to match for the retry condition to evaluate to true.
Enter the Regex value as a regular expression. The regex engine is .NET, so test your expressions with a .NET-compatible regex tester. For example, the Regex value 5[0-9][0-9] matches any HTTP 5xx Server Error code.
For Headers, the Regex is applied against a string in the format {Header Name}={Header Value}, for example Content-Type=application/json.
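The retry rules above combine as AND within a Retry Condition and OR across conditions, and are evaluated only for non-2xx responses. This minimal sketch shows that logic. The response shape, attribute keys, and function names are hypothetical, JavaScript RegExp stands in for the .NET engine (the simple patterns used here behave the same in both), and treating a header match as "any one header string matches" is an assumption.

```javascript
// Sketch of retry-condition evaluation; names and data shapes are hypothetical.
// Rules from the text:
// - Conditions are evaluated only when the status code is not 2xx.
// - Within one condition, every Attribute Match must match (AND).
// - A retry happens if at least one condition evaluates to true (OR).
// - Headers are matched as "{Header Name}={Header Value}" strings.

function attributeText(response, attribute) {
  switch (attribute) {
    case "ResponseCode":
      return String(response.status);
    case "Header":
      return Object.entries(response.headers).map(([name, value]) => `${name}=${value}`);
    case "Body":
      return response.body;
    default:
      throw new Error(`Unknown attribute: ${attribute}`);
  }
}

function attributeMatches(response, { attribute, regex }) {
  const pattern = new RegExp(regex);
  const text = attributeText(response, attribute);
  // Assumption: a header match succeeds if any single header string matches.
  return Array.isArray(text) ? text.some((h) => pattern.test(h)) : pattern.test(text);
}

function shouldRetry(response, retryConditions) {
  if (response.status >= 200 && response.status < 300) return false; // 2xx: not evaluated
  return retryConditions.some((condition) =>
    condition.attributeMatches.every((match) => attributeMatches(response, match))
  );
}

const conditions = [
  { attributeMatches: [{ attribute: "ResponseCode", regex: "5[0-9][0-9]" }] },
  {
    attributeMatches: [
      { attribute: "ResponseCode", regex: "429" },
      { attribute: "Header", regex: "Content-Type=application/json" },
    ],
  },
];

console.log(shouldRetry({ status: 503, headers: {}, body: "" }, conditions)); // true
console.log(shouldRetry({ status: 404, headers: {}, body: "" }, conditions)); // false
```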