XHTML.K
_____________________________________

Differences between xhtml.k and xml.k

(1) Usage of the Latin-1 (ISO-8859-1) character set, based on the
    following parsed documents:

    Latin-1 characters   http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
    Special characters   http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
    Symbols              http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

(2) &#dddd; &#xHHHH; &#xhhhh; numerical entities are translated to
    &entity; references. Internally, all Latin-1 entity references are
    translated to 8-bit "\311"-style octals and back again to &entity;
    references.

    Exceptions
    - Characters in the 16-bit range, e.g. λ λ, are not translated and
      stay as they are.
    - "unknown &entities;" will be translated to "unknown &entities;"

(3) The octal representation of xml.k was discarded because the W3C
    specs only allow &#ddd; or &#xhh; numerical references. Besides
    that, reportedly Netscape and MS Word cannot cope with hex
    references. HTML editors like DreamWeaver also generate those
    opcode-like numerical references. Therefore all numerical
    references are parsed by DX but never emitted by XD. Only
    *readable* &entity; references are to be found in the resulting
    XHTML text.

(4) attribute=value pairs containing space characters in the value
    string are now parsed correctly. Value strings that themselves
    contain attr=value pairs are also recognized. Valid white space
    after the last attr=value pair is handled correctly, i.e. xhtml.k
    no longer creates dummy attribute tuples.

    "" __________='____________' _______='_________________________________'

(5) Mixed-content as well as elements-only models are supported. If
    there is plain text as well as container tags on the same axis,
    the plain text part is represented by an empty-symbol tuple
    equivalent to <>text, like in

    <p>left<strong>center</strong>right</p>

    (`p ((`;"left") (`strong;"center") (`;"right")))

(6) Although W3C XHTML 1.0 is not clear about the distinction between
    an empty string in and a NULL string in, this implementation
    reduces empty container tags to empty tags, according to the
    requirements of older browsers, for example
instead of

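The tuple layout from (5) and the empty-tag reduction from (6) can be sketched outside of K. The following is only an illustration in Python, not the xhtml.k code: the helper names `to_tuple` and `emit` are hypothetical, attributes are left out, and text escaping is not handled. It assumes the (tag, content) pairing shown in the K example above, with the empty string standing in for the empty symbol.

```python
import xml.etree.ElementTree as ET

def to_tuple(elem):
    """Convert an element to a (tag, content) pair (hypothetical helper).

    Content is a plain string for text-only elements and a list of pairs
    for elements with children. Attributes are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        # No child elements: content is just the text (possibly empty).
        return (elem.tag, elem.text or "")
    content = []
    if elem.text:
        # Plain text before the first child gets an empty-symbol pair.
        content.append(("", elem.text))
    for child in children:
        content.append(to_tuple(child))
        if child.tail:
            # Plain text following a child on the same axis.
            content.append(("", child.tail))
    return (elem.tag, content)

def emit(node):
    """Serialize a (tag, content) pair back to markup (hypothetical helper)."""
    tag, content = node
    if tag == "":
        return content  # plain text part; no escaping in this sketch
    if content == "" or content == []:
        # Empty container tags are reduced to empty tags, with a space
        # between the tag name and the closing bracket.
        return "<%s />" % tag
    if isinstance(content, str):
        body = content
    else:
        body = "".join(emit(c) for c in content)
    return "<%s>%s</%s>" % (tag, body, tag)

tree = to_tuple(ET.fromstring("<p>left<strong>center</strong>right</p>"))
print(tree)
print(emit(to_tuple(ET.fromstring("<br></br>"))))
```

Parsing with `xml.etree.ElementTree` is a stand-in chosen for brevity; the `text` and `tail` attributes it exposes happen to map directly onto the empty-symbol pairs.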
(7) All empty tags emitted get an additional space between the tag name
    and the right terminating angle bracket, like in

(8) will be suppressed as required

(Not defined in the XHTML specs but nevertheless useful:)
ASP/JSP/KSP inline code <%code%>, <%=expr%>, <%@directive%>, <%!stmt%>
will be mapped to (`_;"[=@!]*code") tuples. This could also be
rewritten for , but do not confuse it with XML PIs. The feature can
easily be disabled by removing the cloaking pp[] function.

___________________
Not implemented yet

escaping for <% code %>, , etc.

This is not a validating parser; however, the KML layer (see below)
already knows which tags are valid with each XHTML DTD.

___________________
Why I wrote xhtml.k

Last year I wrote KSP as a technology testbed in order to try out some
web technologies without the penalty of large piles of conventional
code or, even worse, no code at all. I never published it because it
finally became what it was, ironically: just another feature-complete
but boring template engine. KSP basically consists of a standalone web
application server written entirely in K with features like HTTP/1.1
support, request / response / server / session "objects",
application/page-level security, URL rewriting, error trapping,
multiple-cookie handling, an embedded browser with chunked
transfer-encoding, and MIME types as usual. The only part still
missing is an accelerator cache.

Now, by the end of last year I discovered LAML (Kurt Nørmark's Lisp
Abstracted Markup Language) and decided to write a piece of software
based on its principles in K ("KML"), this being much more in the
spirit of functional languages, and also much more fun to use. After a
while I redesigned KML to support validating XHTML code. Whereas
previous versions generated XHTML pages directly (by text
manipulation), the next version should use an XHTML parser, which
makes page generation much more elegant and faster.
xhtml.k as well as kml.k (coming soon) use the same internal data
structures as proposed by Arthur Whitney's original xml.k parser.

__________________
Thanks to Christian Langreiter from Austria (of the KXR alias XML-RPC
for K fame) who participated in mail debates and tested all this stuff
across European borders.

__________________
Eberhard Lutz
Frankfurt, Germany
2002-02-06