Exclude segments based on a stoplist

Goal

Filter out segments based on a stoplist.

Prerequisites

Some text has been imported into Orange Textable (see Cookbook: Text input) and segmented into words (see Cookbook: Segment text in smaller units).

Ingredients

Widget      Text Field        Segment         Intersect
Icon        textfield_icon    segment_icon    intersect_icon
Quantity    1                 1               1

Procedure

Figure 1: Exclude segments based on a stoplist with instances of Text Field, Segment and Intersect

  1. Create an instance of Text Field on the canvas and paste into it the stoplist you want to use.
  2. Follow the instructions in Cookbook: Segment text in smaller units to segment the stoplist into words; the rest of this recipe assumes that the resulting segmentation is labelled stop words.
  3. Create an instance of Intersect on the canvas.
  4. Drag and drop from the output connection (right-hand side) of the widget instance that emits the segmentation to be filtered (here the top instance of Segment) to the Intersect instance’s input connection (left-hand side).
  5. Likewise, connect the Segment instance that emits the stop words segmentation to the Intersect instance.
  6. Open the Intersect instance’s interface by double-clicking on its icon on the canvas.
  7. In the Intersect section, choose Mode: Exclude.
  8. In the Source segmentation field, choose the label of the word segmentation to be filtered (here: words); in the Filter segmentation field, choose the label of the segmentation containing the stop words (here: stop words).
  9. Click the Send button (or make sure the Send automatically checkbox is selected).
  10. The filtered segmentation is then available on the Intersect instance’s output connections; to display or export it, see Cookbook: Text output.
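
For readers who find code easier to follow than a widget workflow, the logic of the procedure above can be sketched in plain Python. This is only an illustration and does not use Orange Textable's own API: the function names segment_into_words and exclude are hypothetical, the \w+ pattern is just one possible way of splitting text into words, and the sketch assumes that segments are compared by exact, case-sensitive content.

```python
import re


def segment_into_words(text):
    """Split a text into word segments (rough stand-in for the Segment widget)."""
    return re.findall(r"\w+", text)


def exclude(source, filter_segments):
    """Keep only the source segments whose content is absent from the filter.

    Assumption: segments are compared by exact, case-sensitive content.
    """
    stop_set = set(filter_segments)
    return [segment for segment in source if segment not in stop_set]


# Hypothetical sample data: the text to be filtered and the pasted stoplist.
words = segment_into_words("the cat sat on the mat")
stop_words = segment_into_words("the on a of")

filtered = exclude(words, stop_words)
print(filtered)  # ['cat', 'sat', 'mat']
```

Because the comparison in this sketch is case-sensitive, "The" would not be removed by a stoplist entry "the"; lowercasing both inputs before filtering would be one way to handle that.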

Comment

  • Stopword lists for various languages can be found here.