Tag: xml

  • Document types, formats: file size, part 0

    Looking at relative file sizes for the same content.

    The following is my understanding.

    One issue that has long interested me is file size. I opened a new plain-text document, typed Hello world!, and saved it. Next, I did the same with a Word (.docx) document, then also saved the .docx as a .pdf from Word. Finally, I compressed the .docx and the .pdf into separate zip files.

    Here are the relevant file sizes:

    • .txt file: 12 bytes
    • .docx file: 13 kilobytes
    • .pdf file: 71.9 kilobytes
    • compressed .docx file: 10.4 kilobytes
    • compressed .pdf file: 60.6 kilobytes
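    As a quick check on the figures above, here is a minimal Python sketch. It assumes the .txt file was saved as UTF-8 (or ASCII) with no trailing newline, and it takes the kilobyte figures from the list exactly as reported; the helper name saving is my own.

```python
# Size of the text as stored in a UTF-8 (or ASCII) file with no trailing newline.
text = "Hello world!"
txt_bytes = len(text.encode("utf-8"))


def saving(original_kb, zipped_kb):
    """Fraction of the original size removed by zipping."""
    return 1 - zipped_kb / original_kb


print(txt_bytes)
print(f"docx: {saving(13, 10.4):.1%}")
print(f"pdf:  {saving(71.9, 60.6):.1%}")
```

    The 12 bytes are one byte per character, and zipping trims roughly 20% off the .docx and roughly 16% off the .pdf.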

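    Today I looked into the .docx format specifically, to see why it might be so much bigger than the plain-text version.

    It turns out that a .docx file is a ZIP archive containing numerous .xml files that describe the document. Those files, one might suspect, account for most of the roughly 13 kilobytes of overhead. Since the contents are already compressed, it also makes sense that zipping the .docx again saves so little (13 down to 10.4 kilobytes).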
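    To see those .xml files for yourself, Python's standard zipfile module can list what is inside a .docx. The sketch below is only an illustration: list_parts is a name I made up, and the archive it reads is a small stand-in built in memory with placeholder contents, not real Word output.

```python
import io
import zipfile


def list_parts(docx):
    """Return (name, size, compressed size) for each file inside a .docx.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        return [(i.filename, i.file_size, i.compress_size)
                for i in archive.infolist()]


# Stand-in archive so the sketch runs without Word; the contents are placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document>Hello world!</document>")

for name, size, packed in list_parts(buffer):
    print(name, size, packed)
```

    Pointing list_parts at a real file, for example list_parts("hello.docx") with a hypothetical file name, should show where the kilobytes go.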

    Interesting, eh?

    Source:

    agiledocumentation.co.uk

    -JS

  • Formats: XML

    What is XML? It’s a markup language for structuring documents and data.

    The following is how I understand it.

    XML is used to format documents that need to impart specific information. Somewhat like HTML, XML places the information between tags that classify it.

    An XML document will typically start with an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>. The quotes must be plain straight quotes. In that declaration only, question marks sit just inside the angle brackets to mark it off.

    Next, the XML document needs what’s called a root element. All other elements will appear within the root element. The root element must be unique at that level; there can’t be two.

    Finally, each element, including the root, must have an opening and a closing tag (an empty element may instead use the self-closing form, such as <mood_tag/>). Elements can be nested. An opening tag and closing tag look like this:

    <mood_tag>happy</mood_tag>

    So, a very simple XML document might look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <the_root>
      <an_element>Element1</an_element>
      <another_element>Element2</another_element>
    </the_root>
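
    To confirm that a parser reads the document above the way I described, here is a small sketch using Python's standard xml.etree.ElementTree module. The variable names are my own, and the document is written with straight quotes in the declaration, which XML requires.

```python
import xml.etree.ElementTree as ET

# The simple document from above, as bytes so the declared encoding applies.
document = b"""<?xml version="1.0" encoding="UTF-8"?>
<the_root>
  <an_element>Element1</an_element>
  <another_element>Element2</another_element>
</the_root>"""

root = ET.fromstring(document)

# Iterating over an element yields its direct children, in document order.
children = [(child.tag, child.text) for child in root]

print(root.tag)
print(children)
```

    The root comes back as the_root, with the two nested elements and their text as its children.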

    XML code can be checked for following these rules (loosely, validated; strictly speaking, checked for being well-formed) at the site JSON formatter: XML Validator.
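
    The same kind of check can be done locally. This is a minimal sketch, assuming that "following the rules" means the syntax rules above (one root, matched and properly nested tags); is_well_formed is my own helper name, and it does not validate against a schema or DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as an XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<the_root><mood_tag>happy</mood_tag></the_root>"))
print(is_well_formed("<root_one></root_one><root_two></root_two>"))
print(is_well_formed("<the_root><mood_tag>happy</the_root></mood_tag>"))
```

    The first document passes; the second fails because it has two root elements, and the third fails because its tags are closed in the wrong order.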

    Source:

    ibm.com: XML Syntax Rules

    -JS