.. meta::
   :description: Orange Textable documentation, convert XML tags into Orange
                 Textable annotations
   :keywords: Orange, Textable, documentation, cookbook, convert, XML, tags,
              markup, annotations

Convert XML tags into Orange Textable annotations
=================================================

Goal
----

Convert XML markup into Orange Textable data structures such as segments and
their annotations.

Prerequisites
-------------

Some text containing XML markup has been imported into Orange Textable (see
:ref:`Cookbook: Text input `) and possibly further processed (see
:ref:`Cookbook: Segmentation manipulation `).

Ingredients
-----------

==============  ===================
 **Widget**      :ref:`Extract XML`
 **Icon**        |extract_xml_icon|
 **Quantity**    1
==============  ===================

.. |extract_xml_icon| image:: figures/ExtractXML_36.png

Procedure
---------

.. _convert_XML_tags_into_Orange_Textable_annotations_fig1:

.. figure:: figures/convert_xml_tags_widget_interfaces.png
    :align: center
    :alt: Convert XML tags into Orange Textable annotations with an instance of Extract XML
    :scale: 80%

    Figure 1: Convert XML tags into Orange Textable annotations with an
    instance of :ref:`Extract XML`

1. Create an instance of :ref:`Extract XML` on the canvas.
2. Drag and drop from the output connection (right-hand side) of the widget
   instance that emits the data containing XML markup (e.g.
   :ref:`Text Field`) to the input connection (left-hand side) of the
   :ref:`Extract XML` instance.
3. Open the :ref:`Extract XML` instance's interface by double-clicking on its
   icon on the canvas.
4. In the **XML Extraction** section, enter the desired **XML element**
   (here ``w``).
5. Click the **Send** button (or make sure the **Send automatically**
   checkbox is selected).
6. A segmentation containing one segment for each occurrence of the specified
   element is then available on the output connections of the
   :ref:`Extract XML` instance; to display or export it, see
   :ref:`Cookbook: Text output `.

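
The result of this procedure can be approximated outside Orange Textable with a
few lines of plain Python. The sketch below is only an illustration of the
idea, not the widget's actual implementation: it relies on the standard
library's ``xml.etree.ElementTree``, and both the ``extract_elements`` helper
and the sample markup (including the ``pos`` attribute) are hypothetical. Each
occurrence of the requested element becomes one item made of its text content
and a dictionary built from its attributes.

```python
import xml.etree.ElementTree as ET


def extract_elements(xml_text, element):
    """Return (content, annotations) pairs for each occurrence of `element`.

    Hypothetical helper: it mimics the idea of the recipe (one segment per
    element, attributes turned into annotations), not the widget itself.
    """
    # Assumption: the input is well-formed XML with a single root element.
    root = ET.fromstring(xml_text)
    segments = []
    for node in root.iter(element):
        # Keep only the textual content; the tags themselves are dropped.
        content = "".join(node.itertext())
        # Attributes of the element play the role of annotations.
        annotations = dict(node.attrib)
        segments.append((content, annotations))
    return segments


# Invented sample data, using the ``w`` element of the recipe.
sample = '<s><w pos="DET">the</w> <w pos="NOUN">cat</w></s>'
segments = extract_elements(sample, "w")
for content, annotations in segments:
    print(content, annotations)
```

Calling ``extract_elements(sample, "s")`` instead would return a single item
covering the whole sentence, which mirrors the fact that one element type is
extracted at a time.
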

Comment
-------

* The XML tags that have been retrieved are discarded from the resulting
  segmentation: only their content is included in the output.
* The attributes of the XML tags are automatically converted to annotations
  associated with the created segments.
* Only instances of a single XML element type can be extracted at a time
  (here ``w``).
* However, several :ref:`Extract XML` instances can be chained in order to
  successively extract instances of different XML elements: for example, a
  first instance to extract ``div`` elements, a second to extract ``w``
  elements, and so on. In this case, make sure that the **Remove markup**
  option is *not* selected, so that the remaining tags are still available to
  the instances further down the chain.

See also
--------

* :doc:`Getting started: Converting XML markup to annotations `
* :ref:`Reference: Extract XML widget `
* :ref:`Cookbook: Text input `
* :ref:`Cookbook: Segmentation manipulation `
* :ref:`Cookbook: Text output `