One way to cope with the growing interest and the new requirements is to separate the semantics (the meaning of the content of an HTML web page = generic markup) from its visual appearance (formatting = generic coding). Separating semantics from visual appearance was one of the aims of CSS (Cascading Style Sheets), XHTML, XML and XSL.
SGML (Standard Generalized Markup Language, ISO 8879) was already available. It serves as a metalanguage, that is, a language used to define other markup languages. A practical example: SGML was used to define HTML. SGML proved too heavy, so it was stripped down to the essentials and XML (Extensible Markup Language) was born. XML can be understood as a subset of SGML; in fact, every XML document can be considered an SGML document.
Document Type Definition (DTD) files contain the definition of a language. The definition of HTML is given in a DTD file that follows SGML. The element declarations in the DTD define the tags that are allowed in HTML.
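To illustrate how element declarations in a DTD define the allowed tags, here is a minimal sketch in Python. It assumes a small made-up XML document with an internal DTD (the names note, to and body are hypothetical and not taken from the HTML DTD) and uses the expat parser from the standard library to list the declared elements. Note that expat only reads the declarations; it does not validate the document against them.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD subset (illustration only,
# not part of the real HTML DTD).
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Reader</to><body>Hello</body></note>
"""


def declared_elements(text):
    """Return the element names declared in the document's internal DTD."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls this handler once for each <!ELEMENT ...> declaration.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(text, True)
    return names


print(declared_elements(DOCUMENT))
```

Each name in the resulting list corresponds to one tag that documents of this type may use.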
A parser can be used to verify that an HTML file conforms to the definitions in the DTD file. Instead of working out how to install and call a parser, type the URL of the page into http://validator.w3.org/. Don't be surprised at how poorly the web pages you use every day fare. For CSS, see http://jigsaw.w3.org/css-validator/.
Browsers such as Firefox can also validate the (X)HTML being viewed: add the plugins Firebug and Web Developer (see http://chrispederick.com/work/web-developer/).
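For a quick local check, the following sketch may help. It is only an approximation of what the validators do: the parser in the Python standard library does not validate against a DTD, so the code merely tests whether XHTML-style markup is well-formed XML, which is a weaker condition than being valid. The function name is_well_formed and the two sample strings are assumptions made for this example.

```python
import xml.parsers.expat


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        # The message contains the error kind plus line and column.
        return False, str(err)
    return True, None


# Hypothetical sample input: one well-formed page, one with an unclosed <p>.
good = "<html><head><title>t</title></head><body><p>ok</p></body></html>"
bad = "<html><body><p>unclosed</body></html>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A page that fails this test will also fail XHTML validation, but a page that passes may still violate the DTD, so the online validators remain the authoritative check.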
The Document Object Model (DOM) represents the structure of a document as a tree. Once the structure is known, individual elements can be accessed.
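As a small sketch of this idea, the Python code below parses a made-up XHTML-style page with xml.dom.minidom from the standard library, walks the resulting tree, and then reads individual elements by tag name. The page content and the helper name outline are assumptions for illustration only.

```python
from xml.dom import minidom

# Hypothetical page used only for this example.
PAGE = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Heading</h1><p>First</p><p>Second</p></body></html>"
)


def outline(node, depth=0):
    """Return one indented line per element, mirroring the DOM tree."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines


doc = minidom.parseString(PAGE)

# Show the tree structure of the document.
print("\n".join(outline(doc.documentElement)))

# Knowing the structure, individual elements can be addressed directly.
title = doc.getElementsByTagName("title")[0].firstChild.data
paragraphs = [p.firstChild.data for p in doc.getElementsByTagName("p")]
print(title)
print(paragraphs)
```

The indentation of the printed outline reflects the parent and child relationships in the tree, and the last two lines show how a single field such as the title is reached once that structure is known.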