Version: 2.15.X

Configuring S3 Connectors

S3 connectors are configurable from the Data Management Tool. Both JSON and CSV connectors are supported.

tip

Review your JSON or CSV file before setup to confirm the exact key names or column headers/numbers you will need to map.
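For instance, in the snippets below (field names purely illustrative), event_id and status are the JSON key names, and the first line of the CSV is its header row:

    {"event_id": "e-100", "status": "open"}

    event_id,status
    e-100,open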

To configure an S3 connector:

  1. From the Data Management Tool homepage, click Connectors. The connector setup screen opens.
  2. Click + S3 Connector.
  3. Under Connector Info, enter the required connector information. For more information, see Specifying Connector Information. When all required fields have been filled, click Next.
  4. Under Field Mapping, define the relevant field mappings. For more information, see Defining Field Mappings.
  5. When finished, click one of the following:
    • To create the connector, click Create.
    • To save all configurations and run the connector, click the dropdown menu and select Create and Run.
    • To abandon the creation, click Cancel.

Specifying Connector Information

The Connector Info section contains a number of fields for defining and customizing S3 connectors.

To fill in the required connector information:

  1. In the Name field, enter a name for the data connector.
  2. In the Description field, enter an optional description for the data connector (such as its use case or purpose).
  3. In the Amazon S3 Bucket section, specify the details for the connector's S3 bucket.
  4. In the Data Type section, specify the details for the data that the connector will process. See Defining JSON Data for JSON S3 connectors, or Defining CSV Data for CSV S3 connectors.
  5. In the Import Info section, from the Select Topic dropdown menu, select a Kafka topic to serve as the destination location where data is imported. You may select an existing topic, or create a new one by clicking + Use New Topic.
  6. Configure any Advanced Settings as needed.

Specifying S3 Bucket Details

Cogynt must identify the S3 bucket's path, along with a few other details, before the connection can be made.

To specify these S3 bucket details:

  1. In the Bucket Path field, enter the path to the connector's S3 bucket, including the directory where data is stored. If Cogynt cannot find the bucket based on the entered path, a warning message is displayed.
  2. From the Region dropdown menu, select the region that hosts your data.
  3. In the Access Key ID and Secret Access Key fields, enter the optional access keys needed to access specific files within the S3 bucket. If the credentials are invalid for the specified bucket, a Failure to connect warning is shown. For more information, refer to the Amazon Web Services documentation. A quick way to verify the path and credentials outside Cogynt is sketched after this list.
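If a Failure to connect warning appears and the cause is unclear, testing the same path, region, and keys outside Cogynt can narrow it down. A minimal sketch using Python and the boto3 client (the bucket name, prefix, region, and keys below are all hypothetical):

    import boto3
    from botocore.exceptions import ClientError

    # Hypothetical values; substitute the same details entered in Cogynt.
    bucket = "my-data-bucket"
    prefix = "incoming/json/"

    client = boto3.client(
        "s3",
        region_name="us-west-2",
        aws_access_key_id="AKIA...",      # optional, as in Cogynt
        aws_secret_access_key="...",      # optional, as in Cogynt
    )

    try:
        # Listing a single object checks both the path and the credentials.
        response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        contents = response.get("Contents", [])
        print("Connected. Sample object:", contents[0]["Key"] if contents else "(none)")
    except ClientError as err:
        print("Failed to connect:", err.response["Error"]["Code"])

If the listing succeeds here but Cogynt still reports a failure, the problem is more likely the Bucket Path formatting than the credentials themselves.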

Configuring Advanced Settings

Bucket setup includes advanced settings that define the buffer size, how often the bucket is polled for new data, and the number of processing tasks.

To configure these advanced settings:

  1. In the S3 Buffer Size field, enter an adjusted buffer size (measured in bytes).
  2. In the Poll Frequency field, enter the interval (measured in seconds) at which the bucket is polled for new data.
  3. In the Processing Tasks field, enter the number of tasks that should process data. It is recommended that this number match the number of partitions in the selected Kafka topic. (A quick way to check the partition count is sketched after the note below.)
  4. If necessary, click Reset to Defaults to restore the Advanced Settings fields to their default values.
note

S3 Buffer Size and Poll Frequency together determine how much data Cogynt attempts to process and how often: a larger buffer size lets each attempt handle more data, and more frequent polling means more frequent attempts. The default values are sufficient for most data sets.
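To check the partition count of the selected Kafka topic, any Kafka client works. A minimal sketch using the kafka-python package (the broker address and topic name are hypothetical):

    from kafka import KafkaConsumer

    # Hypothetical broker address and topic; substitute your cluster's details.
    consumer = KafkaConsumer(bootstrap_servers="kafka.example.com:9092")
    partitions = consumer.partitions_for_topic("s3-import-topic")

    # partitions_for_topic returns a set of partition IDs, e.g. {0, 1, 2};
    # its size is the value recommended for the Processing Tasks field.
    print("Partition count:", len(partitions) if partitions else 0)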

Defining JSON Data

The Data Type section of Connector Info has several JSON-specific options to define when creating a JSON-type connector.

To define the necessary JSON data:

  1. Under Data Type, click the JSON option button to specify that JSON data is intended.
  2. Choose a JSON Format from the dropdown menu. The available options are Newline and Array. (Examples of both formats follow the note below.)
  3. In the Date Time Format field, specify the date/time format the data uses. Cogynt will attempt to map data according to the specified format. The default format is yyyy-MM-dd'T'HH:mm:ss.SSSX.
note

For more information about expected date/time format standards, refer to Microsoft's table of date-time format strings.
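To illustrate, here are the same two records in Newline format (one JSON object per line) and in Array format (a single JSON array). The field names are purely illustrative, and the event_time values match the default date/time format:

    {"event_id": "e-100", "event_time": "2024-03-15T09:30:00.000Z"}
    {"event_id": "e-101", "event_time": "2024-03-15T09:31:12.500Z"}

    [
      {"event_id": "e-100", "event_time": "2024-03-15T09:30:00.000Z"},
      {"event_id": "e-101", "event_time": "2024-03-15T09:31:12.500Z"}
    ]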

Defining CSV Data

The Data Type section of Connector Info has several CSV-specific options to define when creating a CSV-type connector.

To define the necessary CSV data:

  1. Under Data Type, click the CSV option button to specify that CSV data is intended.
  2. Choose a CSV Delimiter from the dropdown menu. The available options are Comma (,) and Tab. A custom field separator can also be defined; for example, enter | into the field to use a pipe character. (Examples follow the note below.)
  3. In the Date Time Format field, specify the date/time format the data uses. Cogynt will attempt to map data according to the specified format. The default format is yyyy-MM-dd'T'HH:mm:ss.SSSX.
note

For more information about expected date/time format standards, refer to Microsoft's table of date-time format strings.
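To illustrate, here is the same record in a comma-delimited file with a header row and in a pipe-delimited file. The header names are purely illustrative:

    event_id,event_time,status
    e-100,2024-03-15T09:30:00.000Z,open

    event_id|event_time|status
    e-100|2024-03-15T09:30:00.000Z|open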

Defining Field Mappings

The Field Mapping section contains a number of options for ensuring the correct mapping between your data files and Kafka topics.

For JSON S3 connectors, see Defining JSON Field Mappings. For CSV S3 connectors, see Defining CSV Field Mappings.

Defining JSON Field Mappings

The Field Mapping section has several JSON-specific options to define when creating a JSON-type connector.

To define the necessary JSON field mappings:

  1. For each row:
    1. In the JSON Key Name field, enter the name of the JSON key to map. It must be spelled exactly as it appears in the JSON file. (See the example after these steps.)
    2. From the Field dropdown menu, either:
      1. Select the associated schema field to map.
      2. Select + Add Field to add a new field. For details, see Adding New Data Fields below.
  2. Click Add Mapping Row to insert additional rows as needed, or click the dropdown icon and select Prepopulate from Topic Schema to fill the rows from an extracted schema, if one exists.
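As an illustration (all names hypothetical), given the JSON record below, you would enter event_id and event_time as JSON Key Names and map each to its corresponding schema field:

    {"event_id": "e-100", "event_time": "2024-03-15T09:30:00.000Z"}

    JSON Key Name    Field
    event_id         id
    event_time       timestamp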

Adding New Data Fields

When mapping JSON fields, it is sometimes necessary to add a field that does not yet exist in the schema.

Once + Add Field has been clicked, use these steps to add a new data field:

  1. From the Data dropdown menu, select the type of data the field contains.
  2. In the Path field, enter a name indicating where the field will be stored.

Defining CSV Field Mappings

The Field Mapping section has several CSV-specific options to define when creating a CSV-type connector.

To define the necessary CSV field mappings:

  1. Select the CSV File Has Headers checkbox if your CSV file has headers; otherwise, clear it.
  2. For each row:
    1. If your CSV has headers, enter the name of the CSV header to map in the CSV Header Name field. It must be spelled exactly as it appears in the CSV file. If your CSV does not have headers, type the number of the column to map in the CSV Column Number field. Column numbering starts at 0, so the first column is 0. (See the example after these steps.)
    2. From the Field dropdown menu, either:
      1. Select the associated schema field to map.
      2. Select "+ Add Field" to add a new field:
        1. From the Data dropdown menu, select the type of data the field contains.
        2. In the Path field, enter a name indicating where the field will be stored.
  3. Click Add Mapping Row to insert additional rows as needed, or click the dropdown icon and select Prepopulate from Topic Schema to fill the rows from an extracted schema, if one exists.
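As an illustration of column numbering for a headerless file (values hypothetical), in the line below, e-100 is column 0, the timestamp is column 1, and open is column 2:

    e-100,2024-03-15T09:30:00.000Z,open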

Extracting Data Schemas

Cogynt can extract an existing data schema from several different model artifacts and map it to your JSON or CSV.

To extract a data schema:

  1. Under the Field Mapping section, click Extract Schema.
  2. Under Referenced Schema, click the option button corresponding to the source of the schema. The available options are Event Type, User Data Schema, and Topic.
  3. From the Project dropdown menu, select the project containing the schema to extract.
  4. From the other dropdown menu (whose name matches the option button selected in Step 2), select the item containing the schema to extract.
  5. Click Extract to extract the schema, or Cancel to abandon the extraction.