Include/exclude segments based on a pattern

Goal

Include or exclude segments from a segmentation using a regular expression

Prerequisites

Some text has been imported in Orange Textable (see Cookbook: Text input) and, in all likelihood, it has been segmented into smaller units (see Cookbook: Segment text in smaller units).

Ingredients

Widget Select
Icon select_icon
Quantity 1

Procedure

Include or exclude units based on a pattern with an instance of Select

Figure 1: Using the Select widget to include/exclude segments from a segmentation based on a regular expression

  1. Create an instance of Select on the canvas.
  2. Drag and drop from the output connection (right-hand side) of the widget instance that emits the segmentation to be filtered (e.g. an instance of Segment) to the Select instance’s input connection (left-hand side).
  3. Open the Select instance’s interface by double-clicking on its icon on the canvas.
  4. In the Select section, set Mode to either Include or Exclude.
  5. In the Regex field, insert the pattern that will select the units to be included or excluded, such as the single letter e in our example.
  6. Click the Send button (or make sure the Send automatically checkbox is selected).
  7. A segmentation containing the selected segments is then available on the Select instance’s output connections; to display or export it, see Cookbook: Text output.
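
For readers who want to see the logic of this procedure outside the graphical interface, here is a minimal Python sketch of pattern-based inclusion and exclusion. It is not Orange Textable's own code: the function name select_segments, its parameters, and the sample word list are hypothetical, segments are represented as plain strings, and the sketch assumes that a segment counts as matching when the pattern is found anywhere in its content.

```python
import re


def select_segments(segments, pattern, mode="include"):
    """Return the segments kept by an include/exclude filter.

    Hypothetical helper for illustration only; it assumes a segment
    matches when the pattern occurs anywhere in it (re.search).
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    regex = re.compile(pattern)
    keep_matching = mode == "include"
    # Keep a segment when its match status agrees with the chosen mode.
    return [s for s in segments if bool(regex.search(s)) == keep_matching]


# Sample data standing in for a segmentation into words.
words = ["a", "simple", "example", "of", "text"]

included = select_segments(words, r"e", mode="include")
excluded = select_segments(words, r"e", mode="exclude")

print(included)  # ['simple', 'example', 'text']
print(excluded)  # ['a', 'of']
```

With the single-letter pattern e, Include keeps every word that contains an e, while Exclude keeps the remaining words.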

Comment

  • In the Regex field you can use the full syntax of Python’s regular expressions (cf. Python documentation).
  • The Select widget emits on a second output connection (not selected by default) a segmentation containing the segments that were not selected.
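
The two comments above can be illustrated with a short Python sketch that uses standard regular expression syntax (anchors, alternation, a case-insensitivity flag) and returns both the selected segments and the ones that were not selected. The function partition_segments and the sample data are hypothetical, and the sketch again assumes that the pattern may match anywhere in a segment; it only mimics the idea of the two output connections and is not the widget's implementation.

```python
import re


def partition_segments(segments, pattern, flags=0):
    """Split segments into (selected, discarded) lists.

    Hypothetical helper: 'selected' mirrors the default output and
    'discarded' mirrors the second output described above.
    """
    regex = re.compile(pattern, flags)
    selected, discarded = [], []
    for segment in segments:
        # Route each segment to exactly one of the two lists.
        if regex.search(segment):
            selected.append(segment)
        else:
            discarded.append(segment)
    return selected, discarded


words = ["Every", "segment", "is", "tested", "once"]

# Anchor plus flag: words starting with "e", ignoring case.
starts_with_e, others = partition_segments(words, r"^e", re.IGNORECASE)

# Alternation plus end anchor: words ending in "ed" or "nt".
with_suffix, without_suffix = partition_segments(words, r"(ed|nt)$")

print(starts_with_e, others)
print(with_suffix, without_suffix)
```

Every input segment ends up in exactly one of the two lists, so the two outputs together always account for the whole input segmentation.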