Data Authoring
Telling the system how to recognize and deal with data is an essential task in the authoring process. In Cogynt Authoring, the metadata describing incoming data comes in two forms, both of which are top-level model artifacts:
- User Data Schema – Contains a list of fields consisting of names, identifiers, and data types.
- Event Type – Uses fields defined by user data schemas, and also has its own attributes for dealing with certain streaming characteristics of the incoming data.
The separation between the two entities allows multiple event types to link to one user data schema. Changes to the schema are automatically reflected in all event types linked to it. For more information, see Model Artifacts.
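The link between the two artifacts behaves like a shared reference rather than a copy. The following Python sketch illustrates only this idea. The class and attribute names (UserDataSchema, EventType, schema, field_names) are hypothetical and are not part of Cogynt's API or internal model.

```python
from dataclasses import dataclass, field


@dataclass
class UserDataSchema:
    # Hypothetical stand-in for a user data schema: a name plus an ordered list of field names.
    name: str
    fields: list[str] = field(default_factory=list)


@dataclass
class EventType:
    # The event type stores a reference to the schema object, not a copy of its fields.
    name: str
    schema: UserDataSchema

    def field_names(self) -> list[str]:
        return list(self.schema.fields)


shared = UserDataSchema("person", ["first_name", "last_name"])
inbound = EventType("person_inbound", shared)
updates = EventType("person_updates", shared)

# A single edit to the schema is visible through both linked event types.
shared.fields.append("birth_date")
print(inbound.field_names())  # ['first_name', 'last_name', 'birth_date']
print(updates.field_names())  # ['first_name', 'last_name', 'birth_date']
```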
Managing User Data Schemas
The User Data Schema Management window provides tools for managing and working with user data schemas. This section describes them in detail.
To open the User Data Schema Management window, click the Schema Discovery button on the application toolbar.
If a project is currently active, the window displays a table showing all the user data schemas belonging to the project. If there is no active project, selecting one from the Project dropdown menu opens it and displays its user data schemas.
Performing Schema Discovery
In addition to Apache Flink clusters, the Cogynt HCEP platform uses Apache Kafka servers as its high-throughput publish-subscribe messaging system. Cogynt Authoring's Schema Discovery utility expedites the creation of user data schemas by providing pre-populated data schemas with relevant values. Schema discovery returns the names and fields for all topics available in a configured Kafka cluster, and automatically detects the appropriate data type for each field.
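The exact detection rules that Schema Discovery applies are not described here. As a rough illustration of the idea, the following Python sketch infers a data type for each field of a single sample message. The function names and mapping rules are assumptions: the sketch recognizes only JSON booleans, integers, floats, strings, and arrays, defaults anything else to String, and flattens nested objects into pipe-delimited paths following the convention described in Working with Nested Fields. Actual discovery may sample more data and detect additional types such as Date/time, IP, or URL.

```python
import json
from typing import Any, Optional


def scalar_type(value: Any) -> Optional[str]:
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    return None  # null, objects, and arrays are not scalars


def infer_type(value: Any) -> str:
    if isinstance(value, list):
        kinds = {scalar_type(v) for v in value}
        if len(kinds) == 1 and None not in kinds:
            return f"{kinds.pop()} array"
        # Empty, mixed, or object arrays fall back to the generic Array type.
        return "Array"
    # Anything without a detectable scalar type (for example, null) defaults to String.
    return scalar_type(value) or "String"


def discover_fields(record: dict, prefix: str = "") -> list[tuple[str, str]]:
    """Return (path, data type) pairs, flattening nested objects with '|'."""
    discovered = []
    for key, value in record.items():
        path = f"{prefix}|{key}" if prefix else key
        if isinstance(value, dict):
            discovered.extend(discover_fields(value, path))
        else:
            discovered.append((path, infer_type(value)))
    return discovered


sample = json.loads(
    '{"id": 7, "score": 0.93, "active": true, "tags": ["a", "b"],'
    ' "city_proper": {"definition": "Metropolis prefecture"}}'
)
for path, data_type in discover_fields(sample):
    print(path, data_type)
```

For this sample, the sketch reports tags as a String array and flattens the nested object into the path city_proper|definition with the String type.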
To perform schema discovery:
- On the application toolbar, click the Schema Discovery button.
- In the Schema Discovery window:
  - From the Project dropdown menu, select the project for which to perform schema discovery. The default selection is the current project.
  - From the Kafka Broker dropdown menu, select the connect string for the Kafka broker storing your data.
  - If selecting "Custom" from the Kafka Broker dropdown menu:
    - In the Host field, specify the host of the Kafka cluster storing your data.
    - In the Port field, specify the port of the Kafka cluster storing your data.
- Click Discover. Once Authoring completes data processing, it returns a list of Discovered Topics.
- In the Discovered Topics list, click a relevant topic. The Data Schema window opens, pre-populated with the topic's data schema.
- In the Data Schema window:
  - Verify the information in the Name and Path fields for each schema item. If necessary, enter new values in each field. Check the Sync path to name checkbox if you want both fields to contain the same entries as you type.
  - Click Sample Data to display or hide a sample of what each schema item's data looks like.
  - Click Add Field (+) to add a new field. For more information on adding fields, see Adding Fields to User Data Schemas.
  - Click the Delete (trash can) button on the right of each unnecessary schema item to remove it.
  - Click and drag the dotted drag handle on the right of the field to reposition it in the list of fields.
  - When finished, click Create User Data Schema to save the schema, or click Clear to discard the schema. (Note: The Create User Data Schema button is enabled only after selecting a project.)
- Repeat the previous two steps (selecting a topic and completing its Data Schema window) as needed for each relevant topic in the Discovered Topics list.
Creating User Data Schemas
New user data schemas can be manually added to the project as needed.
Although Cogynt makes it possible to create new user data schemas manually, schema discovery is the primary method for creating new user data schemas.
For more information about schema discovery, see Performing Schema Discovery.
To create a new user data schema:
- In the User Data Schema Management window, click Add (+).
- In the Create User Data Schema dialog:
  - In the Name field, enter a name for the user data schema. The name can't exceed 255 characters in length.
  - In the Description field, enter a description for the user data schema.
  - In the Fields section, add new fields to the user data schema as necessary. For more information, see Adding Fields to User Data Schemas.
  - Click Create to create the new schema, or click Cancel to discard it.
Exporting User Data Schemas
User data schemas can be exported as downloadable files that can be loaded into other projects.
To export a user data schema:
- In the User Data Schema Management window, locate the user data schema to export.
- Click the corresponding Export button.
- Refer to the procedures in Exporting and Importing Authoring Files.
Duplicating User Data Schemas
User data schemas can be copied when necessary. For example, rather than starting from a blank schema, it may save time to create a modified version of an existing schema.
To duplicate a user data schema:
- In the User Data Schema Management window, locate the user data schema to duplicate.
- Click the corresponding Duplicate button.
- In the prompt, type a new name for the duplicate user data schema.
Deleting User Data Schemas
Unnecessary or erroneous user data schemas can be deleted as needed.
To delete a user data schema:
- In the User Data Schema Management window, locate the user data schema to delete.
- Click the corresponding Delete (trash can) button.
Remove all event type linkages from the target user data schema before attempting to delete it.
Deletion can occur only if there are no event types linked to the schema. This prevents event types from having invalid schema linkages.
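The rule amounts to a guard that refuses to delete a schema while any event type still references it. This Python sketch is hypothetical: the delete_schema function, the SchemaInUseError exception, and the dictionary that maps schema names to linked event types are illustrative assumptions, not Cogynt's API.

```python
class SchemaInUseError(Exception):
    """Raised when a schema still has event types linked to it."""


def delete_schema(links: dict[str, set[str]], schema_name: str) -> None:
    # links maps each schema name to the names of the event types linked to it.
    linked = links[schema_name]
    if linked:
        raise SchemaInUseError(
            f"{schema_name!r} is still linked to: {', '.join(sorted(linked))}"
        )
    del links[schema_name]


links = {"person": {"person_inbound"}, "legacy": set()}
delete_schema(links, "legacy")  # no linked event types, so deletion succeeds

try:
    delete_schema(links, "person")
except SchemaInUseError as err:
    print(err)  # 'person' is still linked to: person_inbound
```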
Editing User Data Schemas
Existing user data schemas can be modified as needed in order to keep them accurate and up-to-date. This section describes the various modifications that can be performed.
To edit a user data schema:
- In the User Data Schema Management window, locate the user data schema to edit.
- Click the corresponding Edit (✎) button. The user data schema editor opens.
Adding Fields to User Data Schemas
User data schemas have an ordered list of fields, each given a name and a data type, that defines the basic structure of incoming data. New fields can be added to user data schemas as needed.
To add a field to a user data schema:
- In the user data schema editor, click Add new field (+). (Note: All user data schemas require at least one field.)
- For each new field:
  - Click the Data Type (Aa) button to select the field’s data type. For the supported data types, see the list following these instructions.
  - In the Name text box, enter a name for the field.
  - In the Path field, specify the name of the field as it exists in the Kafka topic.
  - Select the Personally Identifiable Information option appropriate for the field. Click True if the field may contain personally identifiable information. Otherwise, click False.
  - Click the Create field (+) button to create the field, or click outside the dialog to discard it.
Cogynt Authoring supports the following data types for user data schema fields:
- Array
- Boolean
- Date/time
- Float
- Geo coordinate
- Geo polygon
- Integer
- IP
- String
- URL
- Unique ID
Selecting the Array data type lets you specify the particular kind of array:
- Generic array (for dealing with special cases, such as arrays of objects)
- Boolean array
- Date/time array
- Float array
- Geo coordinate array
- Geo polygon array
- Integer array
- IP array
- String array
- URL array
- Unique ID array
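The following Python sketch models a schema field with the properties described in this section (name, path, data type, and the PII flag) and the data types listed above. It is a hypothetical illustration: the class names, the element_type attribute used to express the kind of array, and the validate_schema helper are assumptions. The checks mirror only rules stated in this guide: a schema needs at least one field, and its name can't exceed 255 characters.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    ARRAY = "Array"
    BOOLEAN = "Boolean"
    DATE_TIME = "Date/time"
    FLOAT = "Float"
    GEO_COORDINATE = "Geo coordinate"
    GEO_POLYGON = "Geo polygon"
    INTEGER = "Integer"
    IP = "IP"
    STRING = "String"
    URL = "URL"
    UNIQUE_ID = "Unique ID"


@dataclass(frozen=True)
class SchemaField:
    name: str                  # display name of the field
    path: str                  # name of the field as it exists in the Kafka topic
    data_type: DataType
    # Kind of array for Array fields; None means a generic array (e.g., arrays of objects).
    element_type: Optional[DataType] = None
    pii: bool = False          # True if the field may contain personally identifiable information

    def __post_init__(self) -> None:
        if self.element_type is not None and self.data_type is not DataType.ARRAY:
            raise ValueError("element_type applies only to Array fields")
        if self.element_type is DataType.ARRAY:
            raise ValueError("arrays of arrays are not among the listed array kinds")


MAX_SCHEMA_NAME_LENGTH = 255


def validate_schema(name: str, fields: list[SchemaField]) -> None:
    if len(name) > MAX_SCHEMA_NAME_LENGTH:
        raise ValueError(f"schema name exceeds {MAX_SCHEMA_NAME_LENGTH} characters")
    if not fields:
        raise ValueError("a user data schema requires at least one field")


tags = SchemaField("Tags", "tags", DataType.ARRAY, element_type=DataType.STRING)
email = SchemaField("Email", "email", DataType.STRING, pii=True)
validate_schema("customer_profile", [tags, email])  # passes both checks
```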
Working with Nested Fields
On occasion, you might want to create a user data schema for JSON data with nested fields. In such cases, Cogynt recommends creating a separate field for each individual subfield.
For example, consider a field like the following:
```json
{
  "city_proper": {
    "definition": "Metropolis prefecture",
    ...
  }
}
```
Here, the city_proper field has a subfield called definition, whose value is the string "Metropolis prefecture".
To add a field that records this information:
- Follow the instructions in Adding Fields to User Data Schemas.
- For the Name of the field, enter any value you like (for example, City Definition).
- For the Path, enter city_proper|definition.
- For the Type of the field, specify "String."
Cogynt uses a pipe (|) as the path delimiter in such cases, not a dot (.). This supports situations where field names in JSON files contain a dot.
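To see why the pipe delimiter matters, consider a record that has both a nested field and a top-level key containing a dot. With a dot delimiter, the path geo.region would be ambiguous; with a pipe, it can only mean the top-level key. The resolve_path function below is a hypothetical Python sketch of how a pipe-delimited path could be walked, not Cogynt's implementation.

```python
from typing import Any


def resolve_path(record: dict, path: str) -> Any:
    """Walk a nested JSON object using a pipe-delimited path."""
    value: Any = record
    for key in path.split("|"):
        value = value[key]  # raises KeyError if a segment is missing
    return value


record = {
    "city_proper": {"definition": "Metropolis prefecture"},
    "geo.region": "Kanto",  # a top-level key that itself contains a dot
}
print(resolve_path(record, "city_proper|definition"))  # Metropolis prefecture
print(resolve_path(record, "geo.region"))  # Kanto
```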
Reordering User Data Schema Fields
When editing an existing user data schema, the editor loads its full information. HCEP processes the fields in the order that they appear in the list. This order can be rearranged as needed.
To reorder user data schema fields:
- In the user data schema editor, under the Fields section, locate the field to move up or down the list.
- On the right of the field, to the right of the Delete (trash can) button, click the dotted drag handle and hold down the mouse button.
- Drag the field to the desired place in the list.
- Release the mouse button.
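Because HCEP processes fields in list order, a drag-and-drop reorder amounts to removing a field from one position and inserting it at another. The move_field helper below is a hypothetical Python sketch of that operation; the field names are illustrative.

```python
def move_field(fields: list[str], from_index: int, to_index: int) -> list[str]:
    """Return a new list with one field moved, like a drag-and-drop reorder."""
    reordered = list(fields)
    moved = reordered.pop(from_index)
    reordered.insert(to_index, moved)
    return reordered


fields = ["id", "name", "timestamp", "score"]
print(move_field(fields, 2, 0))  # ['timestamp', 'id', 'name', 'score']
```

Returning a new list leaves the original order untouched, which mirrors the editor's ability to revert unsaved changes.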
Editing User Data Schema Fields
The individual fields of user data schemas are modifiable.
To edit a user data schema field:
- In the user data schema editor, locate the field to edit.
- Verify that the Data Type is the appropriate kind.
- In the Name text box, enter a new name for the field as needed.
- In the Path text box, specify the name of the field as it exists in the Kafka topic.
- Check the Sync path to name checkbox to automatically match the field's Path value to the entered Name value. (This option is selected by default.)
- Click Save to save the changes, Revert to undo the changes and remain in the editor, or Cancel to discard the changes and exit the editor.
Deleting User Data Schema Fields
Unnecessary or erroneous fields in user data schemas can be removed.
To delete a user data schema field:
- In the user data schema editor, locate the field to delete.
- To the right of the field, click the corresponding Delete (trash can) button.
There are no confirmation dialogs for deletion. The field is removed as soon as you click Delete.
Linking and Unlinking Tags
Linking assigns tags to a user data schema for grouping or classification.
To link a tag to a user data schema:
- In the user data schema editor, in the Tags dropdown menu, click the Expand button.
- Click a tag from the list to link it to the user data schema.
- Click Create New Tag to add a new tag to the list, if necessary. For more information, see Adding New Tags.
To unlink a tag from a user data schema:
- In the user data schema editor, in the Tags dropdown menu, click the Expand button.
- Click the Unlink button beside the tag to remove.
Managing Event Types
The system and event pattern views do not deal with user data schemas directly. Instead, they require event types.
The Event Type Management window provides tools for managing and working with event types as needed. This section describes them in detail.
To open the Event Type Management window, click the Event Types button on the application toolbar.
Creating New Event Types
New event types can be created as needed using the Create Event Type dialog.
To access the dialog, do one of the following:
- In the Event Type Management window, click Add new event type (+).
- Select the Create a new event type option when creating a connecting event, then click OK. For more information, see Establishing Event Pattern Data Flow.
To create a new event type in the Create Event Type dialog:
- In the Name field, specify a name for the event type.
- In the Description field, enter a description for the event type if desired.
- Under Link Analysis Type, define the node type that should represent the event type in Cogynt Analyst Workstation's link analysis visualization:
  - Select None to exclude the event type from link analysis.
  - Select Linkage to set a flag informing Cogynt Workstation that the event type concerns the relationship between entities (for example, if the event type is supposed to flag a connection or a match).
  - Select Entity to set a flag informing Cogynt Workstation that the event type represents a concrete thing (such as a person or an address).
- In the Base Probability field, specify the default risk value to use when the event does not occur.
- From the Stream Type dropdown menu, select the appropriate type of data stream:
  - Select CRUD to store the data stream’s state in Flink. Flink then keeps track of create, update, and delete actions.
  - Select Insert Only to disable saving of the data stream’s state, which prevents the recording of create, update, and delete actions. (This assumes that every incoming event is independent. It is typically used for optimization purposes.)
  - Select Lookup to set a JDBC database table as the data stream. (With this option selected, the Source Type dropdown menu does not allow selecting "Kafka".)
- Under Source, from the Type dropdown menu, select the appropriate type of source:
  - Select Kafka to specify a data stream as the source:
    - In the Data Source Name field, enter the name of a data source configured in a deployment target. The dropdown menu lists all the valid data sources in the project: any data source of the matching connection type (for example, Kafka or JDBC) configured in any deployment target is valid. For more information about deployment targets, see Deploying Models.
    - In the Topic field, enter the name of the topic as it appears in Kafka.
    - In the Topic Partitions field, enter the number of partitions for the topic. (Note: This partition field is strictly a Kafka topic configuration, and should not be confused with the partitions in event patterns.)
    - In the Topic Retention Time field, specify how long an event remains available in the topic before it expires. Select the appropriate unit of time from the Time Unit dropdown menu.
    - In the Topic Retention Size field, specify the maximum amount of data to retain. Select the appropriate unit of measurement from the Size Unit dropdown menu. (Once the topic reaches the specified size, it drops the oldest messages first, on a first-in, first-out basis.)
  - Select JDBC to specify a JDBC database table as the source (Note: Selecting JDBC switches the Stream Type to "Lookup"):
    - In the Data Source Name field, enter a data source name from a deployment target that points to a JDBC connection.
    - In the Database Table Name field, enter the name of the database table to use.
  - Select Internal to avoid publishing data to Kafka, and instead propagate the data to downstream patterns directly in Flink.
- Under Filter, select the appropriate lexicon filter from the Type dropdown menu, and fill out its accompanying fields:
  - In the Filter Field field, specify the field on which to apply the filter.
  - In the Lexicon Node field, specify the lexicon node to use as the filter.
  - Set the Match toggle to the appropriate option:
    - If Match Any is selected, any words found in the lexicon are considered matches, and will pass through the filter.
    - If Match None is selected, any words found in the lexicon are not considered matches, and will not pass through the filter.
- Under Fields, select a user data schema to apply from the User Data Schemas dropdown menu. Click Add New Schema to create a new user data schema if desired, or click the Edit (pencil) button to edit the selected user data schema.
- Under Manual Actions, configure the names of any actions that should be available to Cogynt Workstation analysts when inspecting events published by the event type.
- Click Create to create the event type, or click Cancel to discard it.
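The Match toggle can be read as deciding whether an event passes when its filter field contains lexicon terms. The Python sketch below assumes one interpretation: with Match Any, an event passes if the filter field contains at least one lexicon term; with Match None, it passes only if the field contains none. It also assumes whitespace tokenization, case-insensitive comparison, and a lexicon node represented as a flat set of single-word terms. The names passes_filter and MatchMode are hypothetical, and Cogynt's actual matching rules may differ.

```python
from enum import Enum


class MatchMode(Enum):
    MATCH_ANY = "any"
    MATCH_NONE = "none"


def passes_filter(event: dict, filter_field: str, lexicon: set[str], mode: MatchMode) -> bool:
    # Assumption: the field value is split on whitespace and compared case-insensitively.
    words = {word.lower() for word in str(event.get(filter_field, "")).split()}
    found = bool(words & {term.lower() for term in lexicon})
    # Match Any keeps events containing a lexicon term; Match None keeps events without one.
    return found if mode is MatchMode.MATCH_ANY else not found


lexicon = {"wire", "transfer"}
events = [
    {"id": 1, "memo": "Wire payment received"},
    {"id": 2, "memo": "Monthly rent"},
]
kept_any = [e["id"] for e in events if passes_filter(e, "memo", lexicon, MatchMode.MATCH_ANY)]
kept_none = [e["id"] for e in events if passes_filter(e, "memo", lexicon, MatchMode.MATCH_NONE)]
print(kept_any)  # [1]
print(kept_none)  # [2]
```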
Link analysis in Workstation requires, at minimum, two separate event types in Authoring: one with the Entity link analysis type and one with the Linkage link analysis type.
For more information about link analysis in Workstation, refer to the Workstation User Guide.
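Expressed as a check, this requirement means the model must contain at least one event type whose Link Analysis Type is Entity and at least one whose type is Linkage. The following Python sketch is a hypothetical illustration; supports_link_analysis and LinkAnalysisType are not part of any Cogynt API.

```python
from collections import Counter
from enum import Enum


class LinkAnalysisType(Enum):
    NONE = "None"
    LINKAGE = "Linkage"
    ENTITY = "Entity"


def supports_link_analysis(event_types: dict[str, LinkAnalysisType]) -> bool:
    """True if at least one Entity and one Linkage event type exist."""
    counts = Counter(event_types.values())
    return counts[LinkAnalysisType.ENTITY] >= 1 and counts[LinkAnalysisType.LINKAGE] >= 1


model = {
    "person": LinkAnalysisType.ENTITY,
    "address": LinkAnalysisType.ENTITY,
    "lives_at": LinkAnalysisType.LINKAGE,
}
print(supports_link_analysis(model))  # True
print(supports_link_analysis({"person": LinkAnalysisType.ENTITY}))  # False
```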
Editing Event Types
The process for editing event types is largely the same as creating them (see Creating New Event Types). The main difference is that tag linkage is available when editing event types. For more information about tags, see Managing Tags.
The Tags dropdown menu does not appear during event type creation, because tags can only be assigned to existing artifacts. The event type does not exist until the creation process has finished.
To edit an event type:
- In the Event Type Management window, click the Edit (✎) icon beside the event type to edit.
- Edit the event type’s information as needed. For descriptions of each field, see Creating New Event Types.
- In the Tags menu:
  - Click the dropdown button to add a new tag or link an existing tag to the event type. For more information, see Managing Tags.
  - Click the Edit (✎) button beside a linked tag to change it.
  - Click the Unlink button beside a linked tag to unlink it from the event type.
- Click Save to save the changes, Revert to undo the changes and remain in the editor, or Cancel to discard the changes and exit the editor.
Exporting Event Types
Event types can be exported as downloadable files that can be loaded into other projects.
To export an event type:
- In the Event Type Management window, locate the event type to export.
- Click the corresponding Export button.
- Save the file to the desired location on your machine.
Duplicating Event Types
Event types can be copied when necessary.
To duplicate an event type:
- In the Event Type Management window, locate the event type to duplicate.
- Click the corresponding Duplicate button.
- In the prompt, type a new name for the duplicate event type.
Deleting Event Types
Unnecessary or erroneous event types can be deleted as needed.
To delete an event type:
- In the Event Type Management window, locate the event type to delete.
- Click the corresponding Delete (trash can) button.