Version: 2.12.X

Using Cogynt's S3 Connector for Data Management

The Cogynt S3 connector offers seamless and customizable data extraction from an AWS S3 bucket into Kafka within your Cogynt platform.

The process to configure an S3 connector follows three steps:

  1. Configuring the connector.
  2. Defining CSV format.
  3. Mapping CSV fields to a schema.

S3 Connector Requirements

This guide assumes your organization has an S3 bucket storing CSV files containing data to process in Cogynt Authoring.

This guide is for data modelers, DevOps teams, or analysts.

Configuring S3 Connectors

A new S3 connector is configurable from the Data Management Tool.

To configure an S3 connector:

  1. From the Data Management Tool homepage, click Connectors. The screen changes to the connector setup screen.
  2. Click + S3 Connector.
  3. Configure the required and optional advanced settings for the S3 bucket:
    1. Enter a Name for the data connector.
    2. Type an optional Description to explain the connector's use.
    3. Beneath the Amazon S3 Bucket heading, select the Region that hosts your data from the dropdown menu. (The default selection is us-west-1.)
    4. In the Bucket Path field, enter the S3 bucket path.
    5. In the Access Key ID and Secret Access Key fields, enter the optional access keys to gain access to specific files within the S3 bucket. For more information on Access and Secret Access Keys, refer to the Amazon Web Services documentation.
    6. In the S3 Buffer Size field, enter the buffer size in bytes.
    7. In the Poll Frequency field, enter the interval, in seconds, at which to poll the bucket for new data.
  4. After verifying the preceding details, proceed to Configuring CSV File Formats.

Note

S3 buffer size and poll frequency determine how much data Cogynt reads on each attempt and how often it checks the bucket for new data: a larger buffer size processes more data per poll, and a shorter poll interval checks the bucket more often. The default values are sufficient for most data sets.
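The interaction between buffer size and polling can be illustrated with a minimal sketch. This is not Cogynt's implementation; the `drain_bucket` function and the byte counts are hypothetical, chosen only to show that a larger buffer drains the same object in fewer polls.

```python
import io

def drain_bucket(stream: io.BytesIO, buffer_size: int) -> int:
    """Illustrative only: read a source in buffer_size-byte chunks,
    as a connector might on each poll, and return how many polls
    it takes to drain the source."""
    polls = 0
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:  # source exhausted
            break
        polls += 1
    return polls

# A 10 KiB object drained with a 4 KiB buffer takes 3 polls.
print(drain_bucket(io.BytesIO(b"x" * 10_240), 4_096))  # 3
```

Doubling the buffer size in this sketch halves the number of polls needed, which is why the note above says higher values process more data per interval.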

Configuring CSV File Formats

The S3 connector requires a defined CSV separator and date-time file format to accurately read from the CSV files in the S3 bucket.

To configure the CSV file format:

  1. On the connector setup screen, scroll to the CSV File Format subhead.
  2. In the Field Separator dropdown, select the separator used in the CSV from these options:
    • Comma separated (the default selection).
    • Tab delimited.
  3. In the Date Time Format dropdown, select the date-time format used in the CSV. Selecting + Custom Date Time sets the format from custom parameters: enter the date-time format string used in the CSV. Refer to this table of date-time format strings for more details on expected standards.
  4. Under the Target Topic section in the Select Topic dropdown menu, select an existing topic or define a new topic.
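The idea behind a date-time format string can be sketched in Python. Note that Python's `strptime` directives stand in here for whatever pattern syntax Cogynt's referenced format-string table specifies; the exact tokens Cogynt expects may differ, so treat this purely as an illustration of matching a format string to the values in your CSV.

```python
from datetime import datetime

# Illustrative format string: matches values such as "2024-06-01 13:45:00".
fmt = "%Y-%m-%d %H:%M:%S"
value = "2024-06-01 13:45:00"

# Parsing succeeds only when the format string matches the CSV value exactly.
parsed = datetime.strptime(value, fmt)
print(parsed.isoformat())  # 2024-06-01T13:45:00
```

If the format string does not match the values in the CSV, parsing fails, which is why the format must be configured before the connector can read the files accurately.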

Tip

Custom field separators can be defined in the Field Separator field. For example, enter | into the field to use a pipe character.
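As a sketch of what the pipe-separator tip above means in practice, the following shows pipe-delimited data parsed with Python's standard `csv` module (the sample data is made up for illustration):

```python
import csv
import io

# A pipe-delimited file, as configured by entering | in the Field Separator field.
raw = "first_name|last_name\nAda|Lovelace\n"

# delimiter="|" tells the parser to split columns on the pipe character.
rows = list(csv.reader(io.StringIO(raw), delimiter="|"))
print(rows)  # [['first_name', 'last_name'], ['Ada', 'Lovelace']]
```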

Extracting Data Schemas

Data schemas can be extracted from an event type, a user data schema, or an existing Kafka topic, and mapped to specific columns within the CSV files.

There are three sources for data schema extraction:

Extracting Schemas by Event Type

All event types are associated with a schema. It is possible to extract the schema of an event type to be used as a template schema for the target Kafka topic.

To extract a schema by event type:

  1. In the Field Mappings section, from the Event Type field column, click Extract Schema. The Extract Schema modal opens.
  2. At the top of the Extract Schema modal, select Event Type (the default selection).
  3. From the Project dropdown, select an existing project.
  4. From the Event Type dropdown, select an existing event type associated with the project.
  5. Click Extract to extract the data schema, or Cancel to cancel the extraction.

Extracting Schemas from Existing User Data Schemas

The Data Management Tool can extract schemas directly from existing projects.

To extract the schema from an existing user data schema:

  1. From the Field Mappings column, click Extract Schema. The Extract Schema modal opens.
  2. At the top of the Extract Schema modal, select User Data Schema.
  3. From the Project dropdown, select an existing project.
  4. From the User Data Schema dropdown, select an existing data schema.
  5. Click Extract to extract the data schema, or Cancel to cancel the extraction.

Extracting Schemas from Topics

The Data Management Tool can extract a schema from an existing Kafka topic by sampling data from the topic.

To extract a data schema from an existing topic:

  1. From the Field Mappings column, click Extract Schema. The Extract Schema modal opens.
  2. At the top of the Extract Schema modal, select Kafka Topic.
  3. From the Topic dropdown menu, select an existing topic.
  4. Click Extract to extract the data schema, or Cancel to cancel the extraction.

Defining CSV Field Mappings

Cogynt maps each row of the CSV file and publishes it as a single JSON record in the target Kafka topic. Only the columns listed in the Field Mappings section are mapped into events.

Review the CSV file before setup to ensure the correct column header names.

Example

An example field mapping might map a CSV column called First Name to the event type field first_name in the target Kafka topic.
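The First Name example above can be sketched as follows. The `field_mappings` dictionary, the second column, and the sample data are hypothetical, but the shape of the transformation matches the description: each CSV row becomes one JSON record, and only mapped columns appear in it.

```python
import csv
import io
import json

# Hypothetical mapping from CSV header names to schema field names,
# mirroring the "First Name" -> first_name example above.
field_mappings = {"First Name": "first_name", "Last Name": "last_name"}

raw = "First Name,Last Name\nAda,Lovelace\n"

# Each CSV row becomes one JSON record containing only the mapped columns.
records = [
    {field: row[header] for header, field in field_mappings.items()}
    for row in csv.DictReader(io.StringIO(raw))
]
print(json.dumps(records[0]))  # {"first_name": "Ada", "last_name": "Lovelace"}
```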

S3 connector setup is complete after defining field mappings for any existing CSV headers and, if needed, extracting a data schema from an existing topic.

To define CSV field mappings with headers:

  1. At the top of Field Mappings, check the box for CSV Has Headers if your CSV has headers.
  2. Click + Add Mapping Row. Repeat this step as often as needed for each column.
    1. In the Header Name field, type the header's name. The name is case- and space-sensitive, and its spelling must exactly match the header.
    2. In the Field column dropdown, select the associated schema field to map to a column or create a new custom field.
  3. After adding all data schemas, do one of the following:
    • At the top of the page, click Create.
    • Click the Create dropdown menu and select Create and Run to save all configurations and run the connector.
      • After clicking Create and Run, a modal containing the details of the configured S3 connector appears.
      • To replace an existing topic, check the box for Replace existing topic? when running the S3 connector.

To define CSV field mappings without headers:

  1. At the top of Field Mappings, uncheck the box for CSV Has Headers.
  2. Click + Add Mapping Row. Repeat this step as often as needed for each column.
    1. In the CSV Column Number field, type an integer representing the column number. Column numbering is zero-based, so the first column is 0.
    2. In the Field column dropdown, select the associated schema field to map to a column or create a new custom field.
  3. After adding all data schemas, do one of the following:
    • At the top of the page, click Create.
    • Click the Create dropdown menu and select Create and Run to save all configurations and run the connector.
      • After clicking Create and Run, a modal containing the details of the configured S3 connector appears.
      • To replace an existing topic, check the box for Replace existing topic? when running the S3 connector.
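For headerless files, the mapping above keys on zero-based column numbers instead of header names. This sketch assumes a hypothetical `column_mappings` dictionary and made-up sample data:

```python
import csv
import io
import json

# Hypothetical zero-indexed column mappings: column 0 -> first_name,
# column 1 -> last_name (the first column is column 0).
column_mappings = {0: "first_name", 1: "last_name"}

raw = "Ada,Lovelace\nGrace,Hopper\n"  # no header row

# Each row becomes one JSON record, with fields looked up by column index.
records = [
    {field: row[i] for i, field in column_mappings.items()}
    for row in csv.reader(io.StringIO(raw))
]
print(json.dumps(records[0]))  # {"first_name": "Ada", "last_name": "Lovelace"}
```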

Starting S3 Connectors

S3 connectors may be started at any time from the Data Management Tool.

To start an S3 connector:

  1. To the right of any stopped S3 connector, click the three-dot menu.
  2. Click Run.
    • A modal containing the details of the configured S3 connector appears.
    • To replace an existing topic, check the box for Replace existing topic? when running the S3 connector.
  3. Click Run again to start the S3 connector.

Stopping S3 Connectors

S3 connectors may be stopped at any time from the Data Management Tool.

To stop an S3 connector:

  1. From the Data Management Tool homepage, click Connectors. The screen changes to the connector setup screen.
  2. To the right of a running S3 connector, click the three-dot menu.
  3. Click Stop to stop the connector, or Stop and Edit to stop the connector and make changes afterward.

Pausing S3 Connectors

S3 connectors may be paused at any time from the Data Management Tool.

To pause an S3 connector:

  1. From the Data Management Tool homepage, click Connectors. The screen changes to the connector setup screen.
  2. To the right of a running S3 connector, click the three-dot menu.
  3. Select Pause to pause the connector.

Resuming S3 Connectors

Paused S3 connectors may be resumed at any time from the Data Management Tool. A resumed connector continues from where it paused.

To resume a paused S3 connector:

  1. To the right of any paused S3 connector, click the three-dot menu.
  2. Click Resume.