How did XML come to exist? Some notes on the history of XML

Some days ago, I privately replied to a question Brad McCormick posted on comp.text.sgml: "Is SGML a dead language?" Brad asked me if I would be willing to have what I wrote publicly distributed, since he felt it had value as a contribution to the history of computing. I realized that my initial response was neither clear nor well argued, and that Brad's question was worth something better. May I request once again your kind attention, despite the additional inconvenience that this message is considerably longer than my previous one?

Thank you in advance. Here we go.

August, 1998
Laurent Sabarthez


Many people in the SGML community have had their attention caught recently by a number of apparently unrelated signals from organizations, journals, individuals, and software vendors. A paper in <TAG> about the end of SGML; a new peer newsgroup next to comp.text.sgml; SGML conferences renamed; SGML products re-packaged. That's not very much. That's enough to trigger lots of rumors, and this is neither new nor important. That's enough, however, to raise a number of legitimate questions, because the above-mentioned faint signals are just the visible part of a dramatic change in the SGML world.

This change has a name: it's XML [eXtensible Markup Language]. For the time being, I don't care whether XML is a plain SGML subset or an entirely new thing in SGML disguise. XML's birth is of the utmost importance because it introduces completely new business perspectives and cultural paradigms into both the SGML world and the Web world. Regardless of its technical essence, XML is an economic and cultural *change*. It is not an evolution. Hence the questions, hopes, fears, and rumors around it.


Information systems have their own evolution history, made of a succession of small, discrete steps. From time to time, however, an evolutionary step is understood, often in retrospect, as having had a much larger stride and impact than others, and above all, as being of a very different nature. Such evolutionary steps are actually breaks in an otherwise nearly linear evolution. They are *changes*, "revolutions" if you prefer. Such breaks change the way people think about information systems; they change the way information systems are applied to human endeavours; and of course, they change the economic deal in the information systems industry.

True changes in information systems have not been many. In the recent past, I can see three of them: cheap and powerful personal computers in the eighties; new modes of user operation (GUIs) in the nineties; a worldwide network of computing systems today. Please observe that I am speaking of economic and cultural changes, not of technical ones. Economic and cultural changes may become real years after a technical revolution made them possible: it took 25 years to turn the Internet into a mass medium.

Observe, too, that it would be simplistic to chop the history of information systems into successive, disjoint "eras". The birth of a new economic and cultural paradigm does not imply that the old ones disappear. Rather, they co-exist, at least for a period of time, which can moreover be very long. Old information systems become socially deprecated as "old" precisely because new paradigms exist to evaluate them, both economically and culturally; conversely, "new" information systems are praised because there is a social consensus that they materialize the new paradigms. If you can live with over-simplifications, then "new" means "good" to "progressive" people, while it means "bad" to "conservative" ones.

Co-existence is not peace. Economic pressures with opposite goals are at work. On the vendors' side, the competition is between "progressive" and "conservative" products and services; between different "progressive" vendors; and between "conservative" vendors as well, who compete over what to be conservative about, and how! On the customers' side, from line employees to top management, different and often opposite interests conflict inside a single company or organization, regarding investment policies, users' training, operating and maintenance costs, ergonomics, and many more factors, up to and including personal motivations, tastes, prejudices, feelings, and emotions.


Back to SGML. Since my first professional exposure to SGML, around 1990, I've been fascinated by the way the SGML community has reacted to paradigmatic changes in information systems: personal computers, user-friendly interfaces, and now the Web. Symmetrically, I've also been interested in the way other people in the information systems milieu reacted to the SGML phenomenon. Both "reactive systems" originated in the conditions in which SGML was put to work ten years ago.

At that time, SGML lived on mainframes and, more rarely, on mini-computers. Expensive and organization-centric machines, with very basic GUIs or no GUI at all. Yet at the same time personal computers had begun their massive spread into workplaces. Cheap and... well, personal machines, with nice and easy-to-use GUIs. In the office next door, people could compare both environments, and make up their minds.

A conceptual gap was born. On one side, long-term investments, long and intricate design phases, concern for reusability, mistrust of proprietary environments, expensive and slow training curves, data consolidation, presentation issues viewed as a separate ancillary matter, focus on structure and contents, etc. On the other side, fast short-term investment turn-over, quick-and-dirty design phases, little concern for reusability, love for proprietary systems, fast and cheap training curves, data dispersal, presentation as the central issue, structure and contents haphazardly derived from presentation.

The spread of personal computers, GUIs, and extremely sophisticated low-end publishing software introduced conflicting values into the companies and organizations that were using SGML, or that planned to use it. SGML became, in many situations, extremely difficult to sell. Whizz-bang software paradigms had so polluted the perceptions, judgements, and prejudices of people, managers included, that the lethal argument against SGML amounted to this: it was not *immediately rewarding*, and especially not *visually rewarding*. Of course, a manager could not state something so silly so crudely. Rather, he would argue about "ergonomics", "learning curves", and so forth.

I don't mean that technical issues are unimportant. On the contrary, technical improvements can make new solutions possible and desirable, they can make existing ones cheaper and easier, they can boost quality and shorten delays. However, human beings always evaluate technical matters through a complex filtering process, in which rational thinking (e.g. carefully weighing economic implications) interacts with irrational backgrounds and impulses (e.g. blindly sticking to a software brand). Collective evaluation processes -- and decisions -- may be even more erratic.

I am not arguing that personal computers and GUIs are inherently bad and useless. The point here is that their introduction into mass markets has been something radically new. Computer users had never before been exposed to such marketing pressures. Marketing hype turned everybody into a computer expert. Rationality levels have dropped dramatically regarding the evaluation of information systems: just listen to any of your colleagues (or yourself!) talking about her/his new laptop...

Another widespread argument against SGML has been that it is "complex and difficult". Yes, SGML is complex and difficult. MacOS is complex and difficult. The MS Windows API is complex and difficult. POSIX + sockets + TCP/IP + PPP + POP3 + MIME is complex and difficult. Anything powerful and generic is likely to be complex and difficult. Again, deciding to stay away from SGML and to use, say, InterLeaf + Lisp instead is a matter of human choice. Such a choice can be defensible and make good sense. What is irrational is to invoke SGML's difficulty and complexity on this occasion, thereby implying that InterLeaf + Lisp is simple and easy, *given equivalent pre- and post-conditions*. [Unclear: Do you mean here that people assume that both alternatives will have the same initial conditions and result in the same outcomes, so that the question is merely which will do it more easily, etc., since all other conditions are assumed to be equal?]

Somehow, many SGML users feel frustrated and guilty: They missed the First and Second Information Systems Revolutions. The SGML community is suffering a kind of "original sin" syndrome. I don't mean that SGML users are actual sinners in any respect! I mean that there is a *social feeling* (this implies a fundamental degree of irrationality) within the SGML community, and among the broader computing milieu regarding the SGML community, a social feeling of *lateness* about SGML, SGML providers, and SGML users. I know very well that SGML users now have their own whizz-bang software. But again, please note that I'm talking about something irrational: feelings. Nevertheless feelings are real. They have actual consequences, including economic ones.


Now the Third Revolution is on the way. The Web. Electronic mass business. The SGML community cannot afford to miss this one. Until now, successful SGML software vendors had a sales figure in the thousands (of units sold) worldwide (correct me if I'm wrong). SGML-on-the-Web means a jump to tens, maybe hundreds, of thousands. Nobody wants to miss that.

Unfortunately, before putting SGML on the Web (actually, before putting anything on the Web), SGML vendors had to overcome an absolute prerequisite: a "nihil obstat" from Microsoft. You don't want to invest time and money in a new Web product, if you know that Microsoft is cooking up its own (remember Netscape?). You don't want to commit yourself and your customers to a new protocol, language, format, standard, or whatever, if at Microsoft there is a big no-no (remember Java?). That's a huge problem. That's the *main* problem with SGML-on-the-Web. As you probably noticed, I do not think this is primarily a technical problem.

This leads us to SGML-at-Microsoft, a seldom-discussed topic. Microsoft consistently ignored SGML. There have been many reasons for that. One of them, the most unseemly, but one with measurable consequences, is that SGML was said to come from IBM. It was almost a death sentence. This particular misconception regarding SGML is part of a more general trend toward ignorance of foreign matters at Microsoft.

Whenever a company builds a strategy, this strategy ends up embodied, in one way or another, in the minds of the people in the company. That's not necessarily a bad thing. But when the strategy mainly consists in fencing a fortified island, trying to attract and enslave customers inside the fence, and fighting very hard to keep competitors away, the net result is *isolation*. Even highly talented and open-minded people at Microsoft (there are several) had a second-hand, rumor-based knowledge of SGML. When the company eventually set up a small SGML working group in 1995, SGML specialists at Microsoft remained, so to speak, "in partibus".

Another reason for Microsoft's attitude toward SGML is that SGML is an ISO standard, and Microsoft does not love standards. True standards mean a high degree of user freedom. You can count the standards Microsoft adheres to: fewer than ten. Microsoft reluctantly adopts a standard when doing otherwise has become really impossible, or would prove commercially detrimental. For example, Microsoft included TCP/IP and socket support in MS Windows years after public domain and third party software had appeared for these platforms. Microsoft did not suddenly fall in love with TCP/IP. It happened that so many Microsoft customers needed connections to Unix sites, which they implemented with -- "horresco referens" -- non-Microsoft software, that significant market losses for Windows NT became very likely.

The last reason is that, if Microsoft had had to consider doing SGML at all, it would have been *against* the MS Office suite. Strategic nonsense. SGML users typically don't throw away their tools every summer like bathing suits, to buy new ones just because they are available. The MS Office suite was (and still is) aimed at low- to middle-end desktop publishing, because it is a true mass market with fast product renewal rates.

SGML paradigms do not apply there. One may well find the situation unfortunate, but the fact is: desktop publishing software users rarely bother about document reuse, content structuring, platform and vendor independence, etc. They care about low costs, ease of use, rapid learning curve, WYSIWYG presentation, "features". They are happy with built-in templates they can use inconsistently. They are happy if they can swap their files between MS Windows and MacOS. They are happy if they can cut and paste across files (that's apparently what "document reuse" means to them). They are happy if their printed pages look exactly like their displays. Full stop.

NOTE - Everybody has a personal repertoire of Microsoft horror stories. Here's my favorite: Three years ago, a small group of Microsoftees had in-house SGML training. One of them decided to practice a bit. He tried for days to shoehorn elements from the DocBook DTD into MS Word templates. Eventually he achieved something not bad at all, if one considers the initial challenges. The SGML evangelist then came in, looked at the job done, and said: "Great, indeed! Let's try another DTD now." The hacker's face expressed the utmost stupor and consternation: "Hey! Do you mean... do you mean there are *many* SGML DTDs?" he asked.


Besides Microsoft, SGML-on-the-Web had another nightmare: HTML. We all learned the First Web Axiom: "HTML is an SGML application". So why bother with SGML at all, since HTML is *already* SGML?

The often-described HTML limitations are quite paradoxical, since HTML is claimed to be an SGML application. Yet SGML allows you to create new document types, or to enrich existing ones, ad infinitum. It allows you notations for foreign objects, whatever they are, including active objects. It allows platform-independent entity management with exactly the granularity level you wish. It allows arbitrary character sets. It allows sophisticated HyTime links. It allows independent DSSSL rendering for any media. It allows strict conformance checking. And so on.

How come an SGML application (that's what HTML is supposed to be, isn't it?) cannot do anything like that? How come you *must* hack ugly and inherently non-standard features into HTML in order to get what you want? How come "enriched" versions of HTML become mutually incompatible, and are used mainly in a suicidal war between software vendors, since SGML is a vendor-independent standard?

The first HTML implementors did not feel it necessary to dig very deep into ISO 8879:1986, and considering in retrospect their limited goals, they were right. They paid lip service to SGML: as we all know, "HTML is an SGML application". At the time, it was just wishful thinking. Forgive them. But years later, the legend goes on, carefully maintained by the W3C.

Ladies and gentlemen, I'm proud to bring to public evaluation the Revised First Web Axiom: "HTML is an *empty* SGML application", because the most widely used HTML *systems* are *not* SGML systems. HTML systems do not support entities at all. HTML systems do not support markup declarations at all. Etc.

Around 1996, the Web had tens of millions of users. HTML and HTML systems were demonstrating their weaknesses every day. Meanwhile, vendors of browsers, servers, messaging tools, and database systems released new plug-ins, new add-ons, new bells-and-whistles, almost overnight, increasing the Web chaos. It was high time to get rid of quick-and-dirty HTML implementations, to try to tame the anarchy, to redesign HTML from the ground up, to agree on really SGML-conforming HTML systems -- or maybe to design something entirely new, and unrelated to SGML, why not?

Instead, the W3C wasted its time in exhausting vendor wars. Because of unwillingness, true ignorance, short-sighted visions, and stubbornness, the daunting SGML-on-the-Web issue was consistently postponed while HTML versions and revisions were piling up.


Eventually, the W3C could no longer delay giving a clear answer to the question: Should we do SGML-on-the-Web, or get rid of it? That's what the ERB [Editorial Review Board] was set up for.

Quite quickly, it appeared that doing SGML was impossible, because Microsoft (and others) disagreed. Getting rid of it, on the other hand, seemed reckless: the ISO label is highly prized by W3C members, and the SGML market is of significant weight. Therefore, the only solution was to stay somewhere in between. The ERB chose to keep an SGML look-and-feel the SGML community could be fooled with, and to remove from SGML the things Microsoft disliked. That's XML. XML comes with a new First Web Axiom: "XML is an SGML subset". That's really an axiom, this time: take it or leave it, but don't try to prove it. Unfortunately, many people spent lots of time on comp.text.sgml arguing that XML was not an SGML subset, or just the opposite. Too bad. Nobody tries to "prove" Peano's axioms any more.

Incidentally, ISO 8879:1986 deals with conformance, variants, optional features...; it does not deal with "subsets". It is common practice in standard texts to make explicit provisions for variants, options, subsets, and the like. So standard implementors can choose to limit themselves to a variant, an option, a subset, or whatever limitation the standard permits, and still legitimately claim they are conforming to the standard. However, as long as a given standard text does not explicitly allow variants, options, or subsets, something that nevertheless claims to be a variant, an option, or a subset of the said standard no longer conforms to it. It's something *else*. Hence, the much-debated question, "Is XML an SGML subset?", has no sensible answer inside the conceptual framework provided by ISO 8879:1986.

OK. XML is something else. Of course, it is *related* to SGML (much like cans of Budweiser are related to beer, I would say). So why is it so important to refer to XML as "an SGML subset"? Why is it so important to refer to Budweiser cans as "beer", given that everybody knows that they are something *else*? Well, you probably already know the an$wer.

I think that some of the ERB members sincerely (and naively) expected that Microsoft would embrace SGML. At least SGML-on-the-Web. They were ready for bloody compromises. Maybe they shared the widespread belief that something Microsoft does not embrace is doomed to die (my personal view is rather that they kill everything they hug!). They wanted SGML to live. They didn't want SGML to share the fate many predict for Unix, the NC, and non-Intel chips, to name but a few. They didn't notice the trap leading straight to the elephants' graveyard.

I've said that before fighting against SGML, Microsoft had been ignoring it. Microsoft's scorn in the pre-ERB era contributed a lot to SGML users' frustration, as I've described it above. Now Microsoft loves XML. We can then understand why so many SGMLers have been eager to bury SGML, to promote XML, to maintain public confusion about both, and to do and say anything Microsoft wants. At last, <CHIMES>Microsoft</CHIMES> moves and speaks. Mind you! The SGML community had been expecting that for years! The end of ghetto life! Their pathetic relief has been literally *readable* in the enthusiastic prose from the former ERB, and in many postings to comp.text.sgml.

So, the ERB threw away a significant part of ISO 8879:1986. Roughly speaking, they simplified and froze the concrete syntax, and they allowed DTD-less document instance sets. (I don't want to deal with syntactic issues here, not because they are unimportant, but because I have to keep this document inside reasonable size limits.)

The central issue with XML, insofar as a comparison with SGML is meaningful, is the alleged "degree of freedom" introduced by the ability to build entire XML systems without any DTD at all. Document type declarations are the heart of SGML. They were put into the standard because they are invaluable tools for consistent design, reusability, exchange, and data structuring, the very basic and strong concepts SGML has been built upon.

Of course, doing without any DTD is an XML option. XML does allow DTDs. But you can bet your parser (or purser) that 90% of the XML applications that will spread over the Web (and elsewhere) will be DTD-less applications, if there is the faintest opportunity to do so. And opportunities to dispense with DTDs are plentiful. DTD design is time-consuming. It is expensive. It is skill-demanding. It forces you to build long-term plans about your data. It is in many situations a death sentence against your previous structuring practices, or lack of them. All of these requirements run contrary to the "Web culture".
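
To make "DTD-less" concrete, here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser, which checks well-formedness only and does not validate against a DTD. The three `invoice` documents and the helper names `is_well_formed` and `child_tags` are invented for illustration; they come from no real application.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents, invented for illustration; none declares a DTD.
TIDY = "<invoice><customer>ACME</customer><total>42</total></invoice>"
SLOPPY = "<invoice><total>42</total><total>43</total><blink/></invoice>"
BROKEN = "<invoice><total>42</invoice>"  # mismatched tags


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def child_tags(text):
    """List the element names directly under the root, in document order."""
    return [child.tag for child in ET.fromstring(text)]


if __name__ == "__main__":
    for name, doc in [("TIDY", TIDY), ("SLOPPY", SLOPPY), ("BROKEN", BROKEN)]:
        print(name, "well-formed:", is_well_formed(doc))
    print("children of SLOPPY:", child_tags(SLOPPY))
```

In this sketch the parser accepts both `TIDY` and `SLOPPY` and rejects only `BROKEN`. Without a DTD, nothing states that an invoice needs exactly one customer and one total, or that `blink` has no business there; that is the kind of rule a DTD content model would express.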

The Web culture is about "freedom". To most Web geeks, DTDs are just dull and boring stuff, intolerable obstacles to their creative freedom. Joe Webmaster wants bouncing logos, blinking commercials, 3D buttons, flashy fonts, background music like in Star Wars. Joe Webmaster had extensive training with MS InterDev, JavaScript, ActiveX. He had no training with SGML and does not plan to have any: old-timer stuff. Joe Webmaster's freedom is about what amazing font to use, how many frames to pack into a 21-inch screen, and so forth. Joe Webmaster's freedom is immense. Do you want an indication of how immense it is? Just have a look at the shelves of your favorite computer store. Awesome, isn't it? (Then try to find the SGML books -- if there are any....)

G.W.F. Hegel had his definition for freedom: "a well-understood necessity". Electronic mass business and professional intranets will need Joe Webmaster's skills and talents, sure. They will need much more: inventories, customer profile databases, payment tracking systems, transaction design and support, robust electronic forms, etc. I'm confident that smart software vendors will flood us with plenty of fully-featured XML tools for everything, from business cards to IRS forms. And I'm pretty sure that they will not support any kind of DTD, because a DTD is something *you* choose, not the software vendor.

Removing DTDs from SGML is a lethal strike. SGML environments *cannot survive* DTD removal, any more than an animal can survive removal of its skeleton. I'm afraid that many SGML addicts recently turned into XML enthusiasts do not yet realize that. SGML users will have to build a fence against DTD-less XML systems, if they want to keep their SGML systems and applications alive. DTD users will become a minority, an endangered species in the XML world (as the <TAG> article Brad cited says: like Latin).


ISO did a lot of work on network protocol standards. They were based on the well-known "ISO OSI Model". Well, I'm not so sure that the OSI model is actually well-known. At least, it is very often cited. You could hardly find any text about a network protocol that does not have a respectful reference to the OSI model, usually inside a short paragraph in the front matter, with a small picture of the seven little boxes quietly piled up. Some pages later, you eventually discover that the protocol under consideration actually has little, if anything, to do with ISO standards and recommendations. Yet the OSI model still remains a useful paragon for network protocol designers.

The truth is that the ISO OSI model is now dead. This certainly has technical justifications. But the main reason is that network hardware and software vendors no longer bother with standards, be they from the ISO or from other organizations. Vendors want to deliver new modems, new routers, new switches, new software stacks, at will. They call for standardization *afterwards*. Organizations that were traditionally responsible for network standardization no longer *issue* network standards. They merely *register* the ones kindly provided by the industry.

This situation has far-reaching consequences. Network standards (I would definitely prefer to call them "registrations") have become commercial weapons. Their life-cycles are short. They overlap and contradict one another. They no longer provide their users (end-users and systems implementors) with stability and inter-operability. Making investment decisions based on such registrations has become unwise. New hardware and software must deal with competing and unstable registrations, and they are born crippled. Operating and maintenance costs rise to unimaginable levels. Network registrations today mean network chaos.

The registration wars do not limit their battlegrounds to networks. They pervade application APIs, file formats, character encodings, fonts, graphic widgets, object models, database systems, programming languages, up to and including hardware interfaces and chipsets. They eventually reached the area of document management architectures and systems. It was unavoidable. There is no more room for real, collaborative, long-term standardization processes in the information systems industry. Forget about that.

I think ISO 8879:1986 is now in the situation of the ISO OSI Model a few years ago. New document management systems will pay lip service to it, just to get rid of it. We will see more and more vendors registering dozens of mutually incompatible ?ML specializations, each one religiously stating in its preamble: "This is an SGML subset". We will see more and more two-hatted salespersons for the same product, claimed to be SGML- or ?ML-compliant, depending on the sales target. We will see more and more quick-and-dirty "SGML Lite" implementations, "subsets", and "stripped-down applications", deviating more and more from the standard, until vendors feel free not to refer to SGML at all. We will see more and more naive users fooled by SGML look-alikes, who will contribute to devastating SGML's reputation. Not ten years from now. Right now. The primary agency for the confusion of tongues (The Tower of Babel...) is now ourselves.

Meanwhile, some others will continue to do their best to build reliable, stable, open systems. To share and communicate, rather than to fight and dissimulate. To compete for usefulness and freedom, not for colonization and flashiness. To look for consent and harmony, not for coercion and chaos. ". . . and History continues . . ."

-= END =-

Laurent Sabarthez

Copyright © 1998 Laurent Sabarthez, & Brad McCormick, Ed.D.
bradmcc@cloud9.net
02 March 2006 (2006-03-02 ISO 8601)