
Conversion Issues and Tools

Jon Knight's paper presented a useful case study of the need for file conversion.

Electronic Publishing over the World Wide Web

Jon Knight

Publishers typically hold documents in an electronic format at some point prior to the production of paper-based journals and books. Often this is in the form of author-supplied electronic documents. These can be in any number of formats, usually dictated by what the publisher is prepared to handle. Some publishers accept files from some of the "heavyweight" professional word processing systems such as Microsoft Word or WordPerfect. Others, typically in the more scientific and technical fields, prefer the TeX and/or LaTeX typesetting languages. PostScript files are also sometimes taken, although these are usually used only for review purposes, with the data being re-entered when a document is accepted for publication. Only a few publishers accept SGML directly from authors, probably because of the relative lack of widely deployed SGML authoring tools.

If the publisher wishes to make his documents available via the World Wide Web, it is to his advantage to use one of the widely deployed document formats so that the widest possible user base will be able to access his information with their existing tools. The three most common document formats on the Web are plain ASCII text, HTML marked-up documents and PostScript files. Each has advantages and disadvantages.

In Project ELVYN, a research project funded by the British Library Research and Development Department, the Institute of Physics Publishing agreed to allow electronic versions of an existing paper journal to be delivered to a number of sites in the UK and Europe. Each site was free to choose between TeX, SGML and PostScript for the document format delivered by the publisher and how this was delivered to its patrons. At Loughborough University of Technology we opted for the SGML format and devised a conversion process to generate HTML documents which could be viewed using normal WWW browsers such as NCSA Mosaic or MacWeb.

This project has demonstrated some of the shortcomings of the current HTML markup language and the underlying HTTP transfer mechanism. Specifically, as the journal was very technical, the lack of mathematics and table generating constructs in HTML led to the use of a large quantity of inlined bitmaps generated from TeX codes embedded in the publisher's SGML. This has resulted in unacceptably long downloading times for each page, even if each journal article is split into a number of smaller hyperlinked documents based on the logical sections in the paper. It has shown the need for multiple objects to be retrieved with one HTTP transaction using the MIME multipart response, as much of the overhead is connection setup (especially as we have to have identity information logged for usage profiling in the project).
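As a rough illustration of the multipart idea, the Python sketch below packs one HTML page and its inlined bitmaps into a single MIME multipart body using the standard library's email package. The function names, the file name and the use of a Content-Location header to label each part are assumptions made for this example. It does not describe how the project's server worked or how any browser handled such responses.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def bundle_page(html, bitmaps):
    """Pack an HTML page and its bitmaps into one multipart/mixed body.

    `bitmaps` maps a (hypothetical) file name to the raw bitmap bytes.
    """
    bundle = MIMEMultipart("mixed")
    bundle.attach(MIMEText(html, "html", "us-ascii"))
    for name, data in bitmaps.items():
        # The subtype is given explicitly, so no image type guessing is needed.
        part = MIMEImage(data, "x-xbitmap")
        # Assumption: label each part so a client could match it to an <IMG SRC>.
        part.add_header("Content-Location", name)
        bundle.attach(part)
    return bundle.as_bytes()


def unpack_page(raw):
    """Return (content type, label) for each part of a bundled page."""
    message = message_from_bytes(raw)
    return [(part.get_content_type(), part.get("Content-Location"))
            for part in message.get_payload()]
```

With one connection carrying the whole bundle, the per-object connection setup cost mentioned above would be paid once per page rather than once per bitmap.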

It would also be desirable for a standard vector drawing package to be embedded in popular WWW browsers in much the same way as GIF and X bitmap rendering engines are. Vector graphics can deal with a whole class of figures (such as graphs), can be scaled and printed accurately and may not take up as much bandwidth to transmit as equivalent bitmaps. However, HTML does seem to be a good compromise between the simplicity of plain ASCII and the full-scale presentation abilities of PostScript. The use of SGML as the publisher's own format also made conversion relatively straightforward, as much of the markup was structural in nature and mapped easily onto the available elements in HTML's DTD.
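To give a feel for why structural markup converts easily, the following Python sketch renames elements through a lookup table. It is not the CoST script used in the project, and every element name on the left of the table is hypothetical, since the publisher's DTD is not reproduced here. It also handles only attribute-free tags.

```python
import re

# Hypothetical publisher element names mapped to HTML elements.
ELEMENT_MAP = {
    "TITLE": "H1",
    "SECTION-HEAD": "H2",
    "PARA": "P",
    "EMPH": "EM",
}

# Matches simple start and end tags without attributes, e.g. <para> or </para>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)>")


def sgml_to_html(text):
    """Rename mapped elements; drop unmapped tags but keep their text."""
    def swap(match):
        slash, name = match.groups()
        target = ELEMENT_MAP.get(name.upper())
        return f"<{slash}{target}>" if target else ""
    return TAG.sub(swap, text)
```

A real conversion would walk the parsed document tree rather than rewrite tags with a regular expression, but the one-to-one table is the part that structural markup makes possible.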

In his presentation Jon also made the following points:

The project used the Copenhagen SGML Tool (CoST).

"Classic" HTML has no markup for maths and tables (this is coming in HTML+). However, neither did the publisher's own DTD! The maths and tables in the publisher's SGML source files appear as embedded TeX codes. The CoST processing script strips the TeX codes for maths and tables out into separate files, processes them with TeX and then converts the resulting DVI files into X bitmaps for inlining in the HTML documents.

The figures are supplied as TIFF files. The filenames are included in the publisher's source SGML documents and are used to generate hyperlinks to the external figures. These may soon also appear as thumbnails inlined in the documents.
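The first step of the maths and tables handling described above, pulling the embedded TeX out into separate files and leaving an inlined image reference behind, might look roughly like the Python sketch below. The `<TEX>` wrapper element and the `eq1.tex` / `eq1.xbm` naming scheme are assumptions for illustration, as the publisher's actual markup is not shown. The later steps (running TeX and converting the DVI output to X bitmaps) are left out.

```python
import re

# Hypothetical wrapper element for embedded TeX codes.
TEX_BLOCK = re.compile(r"<TEX>(.*?)</TEX>", re.DOTALL)


def split_out_tex(sgml, stem="eq"):
    """Replace each embedded TeX fragment with an <IMG> reference.

    Returns the rewritten text and a dict mapping each would-be .tex
    file name to the TeX source that belongs in it.
    """
    fragments = {}

    def replace(match):
        name = f"{stem}{len(fragments) + 1}"
        fragments[f"{name}.tex"] = match.group(1).strip()
        # The bitmap is assumed to be produced later from this fragment.
        return f'<IMG SRC="{name}.xbm">'

    html = TEX_BLOCK.sub(replace, sgml)
    return html, fragments
```

Each entry in the returned dictionary would then be written to disk and typeset separately, so that the bitmap named in the `<IMG>` reference exists by the time the page is served.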

Points Raised in this Project:


Conversion Tools

Chris Osland

In his presentation Chris discussed a number of conversion tools. Many of these are public domain and are best found using network tools such as Archie. The choices when doing conversion are:
  1. Use convertors built in to applications
  2. Use separate tools

RALCGM

This is held at:
UMXFE.CC.RL.AC.UK:/PUB/GRAPHICS/RALCGM
It allows conversion from CGM in any of the three encodings (binary, character and clear text) to CGM in a different encoding or to PostScript, EPS, HPGL and X.

MPEG

Examples can be found in:

JPEG

Conversion

We need to consider:

Tools

The available tools include:

URT is available via:
SRC.DOC.IC.AC.UK:/COMPUTING/OPERATING-SYSTEMS/LINUX/SUNSITE.UNC-MIRROR/APPS/GRAPHICS/URT-3.1B-BIN.TAR.Z

and

IACRS1.UNIBE.CH:/PUB/URT-3.1A.TAR.Z
