Build a concordance
Goal
Build a concordance to examine the context of occurrence of a given string.
Prerequisites
Some text has been imported into Orange Textable (see Cookbook: Text input) and possibly further processed (see Cookbook: Segmentation manipulation).
Procedure
- Create an instance of Segment and an instance of Context on the canvas.
- Drag and drop from the output connection (right-hand side) of the widget instance that emits the segmentation in which occurrences of the query string will be searched (e.g. Text Field) to the input connection (left-hand side) of the Segment widget instance.
- Also connect both the Text Field instance and the Segment instance to the Context instance (thus forming a triangle).
- Open the Segment instance’s interface by double-clicking on its icon on the canvas and type the string whose context of occurrence will be examined in the Regex field (here: hobbit); assign it a recognizable Output segmentation label, such as key_segments for instance.
- Click the Send button (or make sure the Send automatically checkbox is selected).
- Open the Context instance’s interface by double-clicking on its icon on the canvas.
- In the Units section, select the segmentation that contains the occurrences of the query string (here: key_segments) using the Segmentation drop-down menu.
- In the Contexts section, choose Mode: Containing segmentation and select the segmentation that contains the original text (here: text_string, as emitted by the Text Field instance) using the Segmentation drop-down menu.
- Tick the Max. length checkbox and set the maximum number of characters that should be displayed on either side of each occurrence of the query string.
- Click the Compute button (or make sure the Compute automatically checkbox is selected).
- A table showing the results is then available at the output connection of the Context instance; to display or export it, see Cookbook: Table output.
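To make clearer what the resulting table contains, the following is a minimal Python sketch of the same idea outside of Orange Textable. It is not the Context widget's implementation: the function name `concordance`, the `max_length` parameter (standing in for the Max. length setting) and the sample sentence are all assumptions made for illustration.

```python
import re


def concordance(text, pattern, max_length=20):
    """Return (left context, match, right context) for each match of pattern.

    Hypothetical helper: max_length caps the number of characters kept
    on either side of each occurrence.
    """
    rows = []
    for match in re.finditer(pattern, text):
        left = text[max(0, match.start() - max_length):match.start()]
        right = text[match.end():match.end() + max_length]
        rows.append((left, match.group(), right))
    return rows


# Illustrative sample text (an assumption, not taken from the recipe).
sample = (
    "In a hole in the ground there lived a hobbit. "
    "The hobbit was fond of visitors."
)

# Print one line per occurrence, with the key string in the middle column.
for left, key, right in concordance(sample, r"hobbit", max_length=15):
    print(f"{left:>15} | {key} | {right}")
```

Each printed row corresponds to one occurrence of the query string, with at most `max_length` characters of context on either side, which is roughly what the steps above produce in tabular form.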
Comment
- In the Regex field of the Segment widget you can use the full syntax of Python’s regular expressions (cf. Python documentation); for instance, if you wish to restrict your search to entire words, you might frame the query string with word boundary anchors \b (in our example: \bhobbit\b).
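The effect of the word boundary anchors can be checked directly with Python's `re` module. The snippet below is only an illustration; the sample sentence is an assumption and not part of the recipe.

```python
import re

# Illustrative sample text (an assumption).
text = "A hobbit met two hobbits near Hobbiton."

# Without anchors, the pattern also matches inside "hobbits".
plain = re.findall(r"hobbit", text)

# With \b on both sides, only the standalone word "hobbit" matches.
whole = re.findall(r"\bhobbit\b", text)

print(len(plain), len(whole))
```

Here the unanchored pattern finds two matches and the anchored one finds a single match. Neither matches "Hobbiton", because matching in `re` is case-sensitive unless a flag such as `re.IGNORECASE` is given.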