Convert XML tags into Orange Textable annotations
Goal
Convert XML markup into Orange Textable data structures such as segments and their annotations.
Prerequisites
Some text containing XML markup has been imported into Orange Textable (see Cookbook: Text input) and possibly further processed (see Cookbook: Segmentation manipulation).
Ingredients
Widget: Extract XML
Quantity: 1
Procedure

Figure 1: Convert XML tags into Orange Textable annotations with an instance of Extract XML
- Create an instance of Extract XML on the canvas.
- Drag and drop from the output connection (right-hand side) of the widget instance that emits the data containing XML markup (e.g. Text Field) to the input connection (left-hand side) of the Extract XML instance.
- Open the Extract XML instance’s interface by double-clicking on its icon on the canvas.
- In the XML Extraction section, insert the desired XML element (here w).
- Click the Send button (or make sure the Send automatically checkbox is selected).
- A segmentation containing a segment for each occurrence of the specified tag is then available on the Extract XML instance’s output connections; to display or export it, see Cookbook: Text output.
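To make the expected result more concrete, the following is a minimal sketch in plain Python of what the procedure above produces conceptually: one segment per occurrence of the chosen element, with the element's attributes carried over as annotations. It does not use Orange Textable's own code; the sample text, the attribute name pos, and the helper extract_elements are assumptions made up for illustration, and the sketch relies on the input being well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; element and attribute names are illustrative only.
SAMPLE = '<s><w pos="DET">the</w> <w pos="NOUN">cat</w></s>'


def extract_elements(xml_text, tag):
    """Return one (content, annotations) pair per occurrence of `tag`."""
    root = ET.fromstring(xml_text)  # assumes well-formed XML
    segments = []
    for element in root.iter(tag):
        # Keep only the text inside the element; the tags themselves are dropped.
        content = "".join(element.itertext())
        # Each attribute becomes a key/value annotation on the segment.
        annotations = dict(element.attrib)
        segments.append((content, annotations))
    return segments


segments = extract_elements(SAMPLE, "w")
for content, annotations in segments:
    print(content, annotations)
```

Here each w element yields a segment whose content is the word itself and whose annotations hold the pos value, which mirrors the kind of output the widget makes available for display or export.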
Comment
- The XML tags that have been retrieved are actually discarded from the resulting segmentation: only their content is included in the output.
- The attributes of the XML tags are automatically converted to annotations associated with the created segments.
- Note that it is only possible to extract instances of a single XML element type at a time (here w).
- However, it is possible to chain several Extract XML instances in order to successively extract instances of different XML elements. For example, a first instance to extract div type elements, a second to extract w type elements, and so on. In this case, it is important to make sure that the Remove markup option is not selected; otherwise the inner tags would be stripped before the next instance could extract them.
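The chaining described in the comments above can be sketched in plain Python as two successive extraction passes. This is only an illustration under stated assumptions, not the widget's implementation: the functions extract and chain, the remove_markup parameter, and the sample text are hypothetical, and the simple regular expression ignores nested elements of the same name, self-closing tags, and single-quoted attributes.

```python
import re

# Hypothetical sample input; element and attribute names are illustrative only.
SAMPLE = '<div type="poem"><w pos="DET">the</w> <w pos="NOUN">cat</w></div>'


def extract(text, tag, remove_markup=False):
    """Return one (content, annotations) pair per occurrence of `tag`."""
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}\b([^>]*)>(.*?)</{name}>", re.DOTALL)
    results = []
    for match in pattern.finditer(text):
        # Attributes of the opening tag become annotations.
        annotations = dict(re.findall(r'(\w+)="([^"]*)"', match.group(1)))
        content = match.group(2)
        if remove_markup:
            # Strip any tags that remain inside the extracted content.
            content = re.sub(r"<[^>]+>", "", content)
        results.append((content, annotations))
    return results


def chain(text, remove_markup_in_first_pass):
    """First pass extracts div elements, second pass extracts w elements."""
    words = []
    for div_content, _ in extract(text, "div", remove_markup_in_first_pass):
        words.extend(extract(div_content, "w"))
    return words


kept = chain(SAMPLE, remove_markup_in_first_pass=False)
stripped = chain(SAMPLE, remove_markup_in_first_pass=True)
print(kept)
print(stripped)
```

With markup kept in the first pass, the second pass still finds both w elements; with markup removed, the first pass emits plain text and the second pass has nothing left to extract.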