1.6. Partitioning segmentations

There are many situations where we might want to selectively include or exclude segments from a segmentation. For instance, a user might want to exclude from a word segmentation all words that are shorter than 4 letters. The Select widget is tailored for such tasks.

The widget’s interface (see figure 1 below) offers a choice between two modes: Include and Exclude. Depending on this parameter, incoming segments that satisfy a given condition will be either included in or excluded from the output segmentation. By default (i.e. when the Advanced settings box is unchecked), the condition is specified by means of a regex, which will be applied to each incoming segment successively. (For now, the option Annotation key: (none) can be ignored.)


Figure 1: Excluding short words with widget Select.

In the example of figure 1, the widget is configured to exclude all incoming segments containing no more than 3 letters. Note that without the beginning-of-segment and end-of-segment anchors (^ and $), every word containing at least one sequence of 1 to 3 letters (i.e. every word) would be excluded.
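The effect of the anchors can be checked outside the widget with a minimal Python sketch. It assumes that the regex behaves like a pattern from Python's re module and that the expression used in figure 1 is something like ^\w{1,3}$; the word list and the exclude helper are hypothetical and only serve the illustration.

```python
import re

# Hypothetical word segmentation, for illustration only.
words = ["a", "simple", "example", "of", "text"]

# Assumed patterns: the exact expression shown in figure 1 may differ.
anchored = re.compile(r"^\w{1,3}$")   # the whole segment is 1 to 3 characters
unanchored = re.compile(r"\w{1,3}")   # the segment merely contains such a run

def exclude(segments, pattern):
    """Mimic Exclude mode: keep only segments that do not match the pattern."""
    return [s for s in segments if not pattern.search(s)]

kept_anchored = exclude(words, anchored)
kept_unanchored = exclude(words, unanchored)

print(kept_anchored)    # ['simple', 'example', 'text']
print(kept_unanchored)  # []
```

With the anchors, only the short words are dropped; without them, every word matches and the output is empty.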

Select also automatically emits a second segmentation containing all the segments that have been discarded from the main output segmentation (in the case of figure 1 above, all words shorter than 4 letters). This feature is useful when both the selected and the discarded segments are to be further processed on distinct branches. By default, when Select is connected to another widget, the main segmentation is emitted. To send the segmentation of discarded segments instead, right-click on the outgoing connection and select Reset Signals (see figure 2 below).


Figure 2: Right-clicking on a connection and requesting to Reset Signals.

This opens the dialog shown in figure 3 below, where the user can drag and drop from the gray box next to Discarded data to the box next to Segmentation, thus replacing the existing green connection. Clicking OK validates the modification and sends the discarded data through the connection.


Figure 3: This dialog allows the user to select a non-default connection between two widgets.
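The two outputs of Select, the main segmentation and the discarded data, can be pictured as a single partitioning step. The following Python sketch is an approximation under stated assumptions, not the widget's actual implementation: the partition function, its mode argument and the sample words are hypothetical names introduced for the illustration.

```python
import re

def partition(segments, pattern, mode="exclude"):
    """Split segments into (main output, discarded data).

    Hypothetical helper: in "include" mode the matching segments form the
    main output; in "exclude" mode they end up in the discarded data.
    """
    regex = re.compile(pattern)
    matched = [s for s in segments if regex.search(s)]
    unmatched = [s for s in segments if not regex.search(s)]
    if mode == "include":
        return matched, unmatched
    if mode == "exclude":
        return unmatched, matched
    raise ValueError("mode must be 'include' or 'exclude'")

# Hypothetical input; the pattern is assumed to match the setup of figure 1.
words = ["a", "simple", "example", "of", "text"]
main, discarded = partition(words, r"^\w{1,3}$", mode="exclude")

print(main)       # ['simple', 'example', 'text']
print(discarded)  # ['a', 'of']
```

Every incoming segment lands in exactly one of the two lists, which is why both branches can be processed separately without losing or duplicating data.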