1.6. Partitioning segmentations
There are many situations where we might want to selectively include or exclude segments from a segmentation. For instance, a user might want to exclude from a word segmentation all words that are less than 4 letters long. The Select widget is tailored for such tasks.
The widget’s interface (see figure 1 below) offers a choice between two modes: Include and Exclude. Depending on this parameter, incoming segments that satisfy a given condition will be either included in or excluded from the output segmentation. By default (i.e. when the Advanced settings box is unchecked), the condition is specified by means of a regex, which will be applied to each incoming segment successively. (For now, the option Annotation key: (none) can be ignored.)

Figure 1: Excluding short words with widget Select.
In the example of figure 1, the widget is configured to exclude all incoming segments containing no more than 3 letters. Note that without the beginning of segment and end of segment anchors (^ and $), all words containing at least one sequence of 1 to 3 letters (i.e. all the words) would be excluded.
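The effect of the anchors can be checked outside the widget with a minimal Python sketch. The pattern `^\w{1,3}$` and the word list are assumptions made for illustration (the exact regex shown in figure 1 is not reproduced here), and the sketch assumes the regex is searched anywhere within each segment, as the paragraph above implies.

```python
import re

# Hypothetical word segmentation, for illustration only.
words = ["a", "to", "the", "word", "segment"]

# Assumed patterns: "1 to 3 word characters", with and without anchors.
# Note that \w also matches digits and underscores, not only letters.
anchored = re.compile(r"^\w{1,3}$")
unanchored = re.compile(r"\w{1,3}")

# With anchors, only words of 1 to 3 characters match as a whole.
short_only = [w for w in words if anchored.search(w)]

# Without anchors, any word containing such a sequence matches,
# which means every non-empty word.
everything = [w for w in words if unanchored.search(w)]

print(short_only)   # ['a', 'to', 'the']
print(everything)   # ['a', 'to', 'the', 'word', 'segment']
```

In Exclude mode, the unanchored pattern would therefore leave the main output segmentation empty.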
Note that Select automatically emits a second segmentation containing all the segments that have been discarded from the main output segmentation (in the case of figure 1 above, that would be all words less than 4 letters long). This feature is useful when both the selected and the discarded segments are to be further processed on distinct branches. By default, when Select is connected to another widget, the main segmentation is emitted. To send the segmentation of discarded segments instead, right-click on the outgoing connection and select Reset Signals (see figure 2 below).
This opens the dialog shown in figure 3 below, where the user can drag and drop from the gray box next to Discarded data to the box next to Segmentation, thus replacing the existing green connection. Clicking OK validates the modification and sends the discarded data through the connection.
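The relationship between the main output and the discarded output can be sketched in plain Python. This is only an illustration of the partitioning logic described in this section, not the widget's actual implementation: the function name `partition`, its `mode` parameter, the sample words, and the pattern are all hypothetical.

```python
import re

def partition(segments, pattern, mode="include"):
    """Split segments into (selected, discarded) lists.

    Hypothetical helper: in "include" mode, matching segments are
    selected; in "exclude" mode, matching segments are discarded.
    """
    regex = re.compile(pattern)
    selected, discarded = [], []
    for segment in segments:
        matches = regex.search(segment) is not None
        keep = matches if mode == "include" else not matches
        (selected if keep else discarded).append(segment)
    return selected, discarded

# Hypothetical word segmentation, for illustration only.
words = ["the", "quick", "fox", "jumps", "over", "a", "lazy", "dog"]

# Exclude words of 1 to 3 characters (assumed pattern).
selected, discarded = partition(words, r"^\w{1,3}$", mode="exclude")

print(selected)    # ['quick', 'jumps', 'over', 'lazy']
print(discarded)   # ['the', 'fox', 'a', 'dog']
```

Every incoming segment ends up in exactly one of the two lists, which is why the two outputs can be processed on distinct branches without losing or duplicating any segment.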