Count unit frequency
Goal
Count the frequency of each segment type that appears in a segmentation.
Prerequisites
Some text has been imported into Orange Textable (see Cookbook: Text input) and segmented into smaller units (see Cookbook: Segment text in smaller units).
Procedure
- Create an instance of Count on the canvas.
- Drag and drop from the output connection (right-hand side) of the widget instance that emits the segments to be counted (e.g. Segment) to the input connection (left-hand side) of the Count instance.
- Open the Count instance’s interface by double-clicking on its icon on the canvas.
- In the Units section, use the Segmentation drop-down menu to select the segmentation containing the units to be counted (here: letters).
- Click the Compute button (or make sure the Compute automatically checkbox is selected).
- A table showing the results is then available at the output connection of the Count instance; to display or export it, see Cookbook: Table output.
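To make explicit what the Count widget computes in this basic configuration, the following minimal sketch counts how often each segment type occurs in a list of segments. It is only an illustration in plain Python, not Textable's implementation or API: the function name `count_units` and the input string are hypothetical, and segments are assumed to be represented simply by their string content.

```python
from collections import Counter

def count_units(segments):
    """Return a mapping from each segment type to its frequency (hypothetical helper)."""
    # Counter tallies identical strings, i.e. occurrences of each segment type.
    return Counter(segments)

# Illustrative input: a text segmented into letters (hypothetical example,
# not the one used in the Textable screenshots).
segments = list("banana")
counts = count_units(segments)
print(dict(counts))          # {'b': 1, 'a': 3, 'n': 2}
print(sum(counts.values()))  # 6, the total number of segments
```

The sum of all frequencies equals the total number of segments, which is the figure Textable reports in the Info section.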
Comment
- The total number of segments in your segmentation appears in the Info section (here: 14).
- It is also possible to define units as segment pairs (bigrams), triples (trigrams), and so on, by increasing the Sequence length parameter in the Units section.
- If Sequence length is set to a value greater than 1, the string in the Intra-sequence delimiter field is inserted between the elements of each n-gram in the column headers, which makes them easier to read. The default delimiter is # but you can replace it with any string of your choice.
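The effect of the Sequence length and Intra-sequence delimiter parameters can be pictured with the sketch below, which counts n-grams of consecutive segments and joins their elements with a delimiter, as in the column headers described above. This is an assumption-laden illustration, not Textable's code: the helper `count_ngrams` is hypothetical, and it assumes n-grams are built from contiguous segments over a single flat list, ignoring any finer behaviour the widget may have.

```python
from collections import Counter

def count_ngrams(segments, n=2, delimiter="#"):
    """Count contiguous n-grams of segments (hypothetical helper).

    Each n-gram is labelled by joining its elements with `delimiter`,
    mirroring how the delimiter appears in the column headers.
    """
    ngrams = (
        delimiter.join(segments[i:i + n])
        for i in range(len(segments) - n + 1)
    )
    return Counter(ngrams)

segments = list("banana")
print(dict(count_ngrams(segments)))  # {'b#a': 1, 'a#n': 2, 'n#a': 2}
```

With `n=1` each label is just the segment itself, so the result matches plain unit counts; a longer or more distinctive delimiter mainly helps when segments are themselves multi-character strings such as words.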