Creating Regular Expression Entity Types

Learn how to create a custom entity type using regular expressions and keywords.

Written by Andrea Harvey

Updated at May 8th, 2025

Table of Contents

  • Overview
  • Entity Type Description
  • Category
  • Regex
      • Description
      • Regex Pattern
      • Confidence
  • Validations
  • Keyword Proximity
      • Keywords
      • Maximum Distance From Match
  • Negative Keyword Proximity
      • Negative Keywords
      • Maximum Distance From Match

Overview

A regular expression is a pattern used to identify text. It gives you fine-grained control over what content DryvIQ detects. The pattern must be constructed according to regular expression standards. Multiple online resources explain how to build a regular expression pattern.

When creating an entity type using a regular expression, you can add one or more patterns to ensure the entity type matches exactly what you want to find. You specify both the pattern and the confidence level for each pattern. You can improve match accuracy by adding keywords and validations to the entity type.

Entity Type Description

You can add a description for the entity type you are creating. The description is separate from the name used to search for the entity type in the application; it tells users what the entity type is intended to detect with the patterns you define. The description is limited to 256 characters.

Click Edit to add a description to the entity type.

Type a description in the text field that appears, then click Done.

Category

The category identifies the type of data being detected. The Category list includes the default categories and any custom categories you have created. Preinstalled entity types are assigned a corresponding category. All custom entity types default to “General,” but you can edit the category if a specific category needs to be used for an entity type. (See Managing Entity Type Categories for information about creating and managing custom categories.)

Regex

This section is where you will build your regular expression pattern and assign a confidence level.

Description

The Description is a user-defined name that identifies the pattern. Although this field is optional, DryvIQ recommends adding a description because it makes the pattern easier for other users to understand when they review the entity type.

Regex Pattern

The Regex pattern is the regular expression pattern you want to use for the entity type. Again, the pattern must be constructed according to regular expression standards.

The timeout for matching regex patterns in an entity type is five minutes per pattern. If a match takes longer than the timeout, DryvIQ will log an error in the Activity Log indicating the regex engine timed out.
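
Before adding a pattern to an entity type, it can help to try it against sample text locally. The sketch below uses Python's re module purely for illustration. The article does not state which regular expression flavor DryvIQ uses, so syntax support may differ, and the serial-number format shown is a hypothetical example.

```python
import re

# Hypothetical serial-number format used only for illustration:
# two uppercase letters, a hyphen, then exactly six digits (e.g. "SN-123456").
SERIAL_PATTERN = re.compile(r"\b[A-Z]{2}-\d{6}\b")


def find_matches(text: str) -> list[str]:
    """Return every substring of text that matches the pattern."""
    return SERIAL_PATTERN.findall(text)


sample = "Unit serial number SN-123456 shipped; ref AB-12 is not a serial."
print(find_matches(sample))  # prints ['SN-123456']
```

Checking a pattern against both matching and non-matching samples makes it easier to spot patterns that are too broad before they are used in a content scan.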

Confidence

The confidence level provides a simple mechanism for controlling the number of false positives you will tolerate. The Confidence list displays the available levels. Each confidence level corresponds to a threshold used throughout the rest of the entity type model.

The confidence level mapping is as follows:

  • None = 0
  • Very weak = 0.05
  • Weak = 0.3
  • Medium = 0.5
  • Strong = 0.7
  • Very strong = 0.85
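
The mapping above can be written as a simple lookup table. This is a sketch for reference only: the labels and values come from the list above, while the label_for_score helper is hypothetical and is not described as part of DryvIQ.

```python
# Confidence labels and their thresholds, as listed in the article.
CONFIDENCE_LEVELS = {
    "None": 0.0,
    "Very weak": 0.05,
    "Weak": 0.3,
    "Medium": 0.5,
    "Strong": 0.7,
    "Very strong": 0.85,
}


def label_for_score(score: float) -> str:
    """Hypothetical helper: return the highest label whose threshold is <= score."""
    best = "None"
    # The dictionary is ordered from lowest to highest threshold,
    # so the last label that qualifies is the highest one.
    for label, threshold in CONFIDENCE_LEVELS.items():
        if score >= threshold:
            best = label
    return best


print(label_for_score(0.6))  # prints Medium
```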

Click Add regex pattern to add additional patterns. You can add as many patterns to the entity type as you like to help strengthen the match.

Validations

Validations improve match success. DryvIQ has preinstalled validation rules to validate social security numbers, checksums, driver’s license numbers, etc. You can choose validation rules from the list, and DryvIQ will run all matches against the selected validation rules and filter out content that fails validation. For example, if you use a pattern to detect credit card numbers, you should also apply the Luhn Check validation to ensure the matches are valid credit card numbers. This extra validation limits the number of false positive matches you must sort through. Adding validations increases the match success by the percentage identified.
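
To illustrate what a checksum validation such as the Luhn Check does, here is a minimal sketch of the standard Luhn algorithm. It is not DryvIQ's implementation, which the article does not show; the function name is an assumption chosen for this example.

```python
def luhn_is_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the standard Luhn checksum."""
    # Keep only decimal digits so separators such as spaces or hyphens are skipped.
    digits = [int(ch) for ch in candidate if ch.isdecimal()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111 1111 1111 1111"))  # prints True (a common test number)
print(luhn_is_valid("4111 1111 1111 1112"))  # prints False
```

A regular expression alone would accept both strings above because they have the same shape; the checksum is what filters out the second one.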

Keyword Proximity

You can further improve match accuracy by providing a list of keywords that may appear in close proximity to the entity you want to identify. By default, a term is considered in close proximity if it appears within five words before the match. These keywords boost the confidence level of a given match. For example, the confidence level for a serial number pattern may be only 0.5 (Medium); however, if you add keywords, the confidence level increases by 35% (as noted by the green percentage displayed on the right of this section).

Keywords

Keywords can be manually added to the Keywords field or imported using a CSV file. When manually adding keywords, you can enter the terms as comma-separated values or add each keyword on a new line. (See Managing Keywords for more information on adding, uploading, editing, and removing keywords.)

Keywords cannot contain the following characters: ~ ` ! @ # $ % ^ & * ( ) = { } [ ] | \ : ; " ' < > ? . /
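
If you prepare keyword lists outside the application, a quick pre-check can catch entries that would be rejected. The sketch below is an assumption-laden illustration: the forbidden characters come from the list above, but the function names and the exact parsing behavior (splitting on commas and new lines, trimming spaces) are hypothetical and may not match how DryvIQ parses the field.

```python
import re

# Characters the article lists as not allowed in keywords.
FORBIDDEN_CHARS = set("~`!@#$%^&*()={}[]|\\:;\"'<>?./")


def parse_keywords(raw: str) -> list[str]:
    """Split raw input on commas or new lines, trim spaces, and drop empty entries."""
    return [part.strip() for part in re.split(r"[,\n]", raw) if part.strip()]


def invalid_keywords(keywords: list[str]) -> list[str]:
    """Return the keywords that contain at least one forbidden character."""
    return [kw for kw in keywords if FORBIDDEN_CHARS & set(kw)]


keywords = parse_keywords("serial, model number\ns/n, part#")
print(keywords)                    # prints ['serial', 'model number', 's/n', 'part#']
print(invalid_keywords(keywords))  # prints ['s/n', 'part#']
```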

[Figure: Adding keywords as comma-separated values]

[Figure: Adding keywords on a new line]

Maximum Distance From Match

These fields allow you to set a custom keyword proximity. By default, a term within five words before the regular expression pattern will trigger a match confidence adjustment. You can edit the field to specify the distance you prefer to use. You can also turn on proximity to search for keywords after the regular expression pattern and specify the value you want to use. Clearing the checkbox for the words before or words after proximity field disables the proximity search in that direction. You should not disable both fields since doing so turns keyword matching off.
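
The proximity settings can be pictured as a window of words around the match. The following sketch is a simplified model, not DryvIQ's implementation: it assumes whitespace-separated words, single-word keywords, and case-insensitive comparison, and all names are hypothetical. A value of None stands in for a cleared checkbox.

```python
from typing import Optional


def keyword_in_proximity(
    words: list[str],
    match_index: int,
    keywords: set[str],
    words_before: Optional[int] = 5,    # default from the article: five words before
    words_after: Optional[int] = None,  # None models a cleared "words after" checkbox
) -> bool:
    """Return True if any keyword appears inside the configured window."""
    window: list[str] = []
    if words_before is not None:
        start = max(0, match_index - words_before)
        window.extend(words[start:match_index])
    if words_after is not None:
        window.extend(words[match_index + 1 : match_index + 1 + words_after])

    lowered = {kw.lower() for kw in keywords}
    # Strip common trailing punctuation before comparing (an assumption).
    return any(word.lower().strip(".,;:") in lowered for word in window)


words = "The serial number for this unit is SN-123456 today".split()
match_index = words.index("SN-123456")

print(keyword_in_proximity(words, match_index, {"serial"}))                  # prints False
print(keyword_in_proximity(words, match_index, {"serial"}, words_before=6))  # prints True
```

In this sample, "serial" sits six words before the match, so it falls outside the default five-word window and is only found once the distance is raised. With both directions set to None the window is empty and the function always returns False, which mirrors the warning that disabling both fields turns keyword matching off.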

Negative Keyword Proximity

The Negative keyword list is an explicit list of words or phrases that should prevent a match if detected within proximity of the regular expression pattern. This helps reduce false positives. For an upload against an entity type, the match confidence will be 0% if a negative keyword is found. For a content scan, the presence of a negative keyword, even if other validation and keywords are present, will prevent the item from being matched.
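
The effect of a negative keyword on match confidence can be sketched as follows. This is an illustrative model only: the function name is hypothetical, context_words is assumed to hold the words already found within the configured proximity of the match, and comparison is assumed to be case-insensitive.

```python
def score_match(
    base_confidence: float,
    context_words: list[str],
    negative_keywords: set[str],
) -> float:
    """Return 0.0 if any negative keyword is in the context, else the base confidence."""
    lowered = {kw.lower() for kw in negative_keywords}
    if any(word.lower() in lowered for word in context_words):
        return 0.0  # a negative keyword overrides everything else
    return base_confidence


print(score_match(0.5, ["sample", "serial", "number"], {"sample"}))  # prints 0.0
print(score_match(0.5, ["serial", "number"], {"sample"}))            # prints 0.5
```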

Negative Keywords

Negative keywords can be manually added to the Negative keywords field or imported using a CSV file. When manually adding keywords, you can enter the terms as comma-separated values or add each keyword on a new line.

Maximum Distance From Match

These fields allow you to set a custom negative keyword proximity. By default, a negative keyword within five words before the regular expression pattern will trigger a match confidence adjustment. You can edit the field to specify the distance you prefer to use. You can also turn on proximity to search for negative keywords after the regular expression pattern and specify the value you want to use. Clearing the checkbox for the words before or words after proximity field disables the proximity search in that direction. You should not disable both fields since doing so turns negative keyword matching off.

Copyright 2025 – DryvIQ.
