PDF/A Seminar in Washington

Posted March 10, 2010 by Gary McGath
Categories: News

A seminar on PDF/A will be held in Washington, DC, on March 26. The registration fee is $125. PDF/A is a restricted subset of PDF designed to keep documents viable over the long term for preservation purposes.

The press release contains a bizarre statement:

“At this time, the use of PDF/A is not mandatory in the United States,” said Betsy Fanning, Director, Standards and Member Services, AIIM, “however, that is changing.” “We are learning of draft legislation that is being debated that will make the use of PDF/A mandatory for preserving electronic documents.”

Congress has neither the right nor the technical competence to order us to use particular file formats. Hopefully this was an out-of-context quote about the government’s own use of PDF/A, though even there legislation requiring a specific subset of a specific format would be very strange.

So what is HTML 5 exactly?

Posted March 7, 2010 by Gary McGath
Categories: News

Paul Cotton, co-chair from Microsoft on the W3C HTML Working Group, has some interesting comments on exactly what people mean by “HTML 5.” This may help explain some odd statements about “HTML video” which I’ve commented on in recent posts. The interview includes other remarks on the status of HTML 5. On the meaning of the term, he says:

First, I believe that most people use the term “HTML 5” to refer to the HTML 5 specification currently being worked on by the HTML WG. The HTML 5 specification defines the syntax and the semantics of the elements and attributes in the HTML markup language and several of the APIs that are used to process HTML documents. Recently the HTML WG has started to break the HTML 5 specification into more modular and separate Working Drafts e.g. HTML+RDFa, HTML Microdata, and HTML Canvas 2D Context. The HTML WG is also publishing two additional documents to aid users of HTML 5: the HTML 5 differences from HTML4 specification and HTML: The Markup Language which is aimed at developers that produce HTML 5 output.

Each of these additional Working Drafts are still part of “HTML 5” and are all on track to become separate but related W3C Recommendations or Working Group Notes. I believe that the content of these WDs taken together will define the part of “HTML 5” being worked on by the HTML WG.

But I believe that some use the term “HTML 5” to refer also to the important related API specifications being worked on by the WebApps WG. The WebApps WG is chartered to create client-side APIs that can be used with the HTML markup language – in fact some of its specifications started as part of the HTML 5 specification but were migrated over to be separate modular specifications managed by the WebApps WG. In addition there are some very interesting APIs under development by the Device APIs and Policy Working Group which are related to HTML 5 since they can be used with the HTML language and in user agents.

Others use the term “HTML 5” to also include the ECMAScript-262 Language which defines the programming language that we use today to build dynamic web applications.

Flash “vs.” HTML: the shadowboxing continues

Posted February 24, 2010 by Gary McGath
Categories: commentary

The shadowboxing between Flash and HTML 5 is getting pretty serious. A lot of people are using “HTML 5 video” as a shorthand for “non-Flash video technologies which HTML 5 facilitates,” and Adobe is clearly worried.

An article by Justin Nichols regards HTML 5 and Flash as competitors. Although it isn’t written by an Adobe employee, it is showing a solid five-star rating on feeds.adobe.com, so it probably expresses a view that’s popular at Adobe. It refers to Flash as a “platform,” and that may be the key point; there’s an unstated suggestion that it can’t just live inside standardized HTML elements. But if it can’t, we’re in for still more rounds of browser incompatibility. Just as “the end of history” when the Soviet empire collapsed was a delusion, the “end of the browser wars” is most likely another.

A New York Times article on the lack of Flash on the iPad is entertaining for its correction at the bottom. The body of the article says:

But concerns over the lack of Flash in the iPad and iPhone may be short-lived. Many online video sites have been experimenting with a new Web language that can support video, called HTML5. Unlike Flash, which is a downloaded piece of software that can interact with a computer’s operating system, HTML5 works directly in a Web browser. And although this new video format does not work in all browsers, it will allow iPhone and iPad users to enjoy more Web-based video content.

Then in a correction it notes that that was wrong:

An article on Monday about the absence of the multimedia software Flash in Apple’s new iPad tablet computer referred incorrectly to the Web language HTML5. While HTML5 can support video, it is not itself a video format. The article also misstated the ownership of HTML5 patents. HTML5, like other versions of Hypertext Markup Language, is open source; it is not owned by a group of companies, including Apple.

Can I hope they learned of their error by reading this blog? Probably not. Even the correction isn’t completely right; HTML 5 is a specification, not a program, so it’s meaningless to call it “open source.” Some implementations of it are open source, and others aren’t.

Standardization of the means of embedding video is a good thing. If that has Adobe worried it will face competition, that’s a good thing too.

iPRES 2010 call for papers

Posted February 5, 2010 by Gary McGath
Categories: News

iPRES 2010 (September 19-24, Vienna) has issued a call for papers. Submissions are due by May 5, and final versions by July 11.

Flash “vs.” HTML? Not so.

Posted February 3, 2010 by Gary McGath
Categories: commentary

CNET has a rather confused article titled “HTML vs. Flash: Can a turf war be avoided?” This is like asking whether a turf war can be avoided between mixing bowls and batter.

The article says: “Bruce Lawson, Web standards evangelist for browser maker Opera Software, believes HTML and the other technologies inevitably will replace Flash and already collectively are ‘very close’ to reproducing today’s Flash abilities.” Further on: “Perhaps the most visible HTML5 aspect is built-in support for audio and video.”

This is complete nonsense. HTML 5 does not include “built-in support” for video. All that it does is provide a standardized means for browsers to support it. The video and audio tags provide a standardized means of expressing video and audio content, but don’t define any means of interpreting the content. That’s left up to the browser, just as it is with HTML 4 with its lack of standardized media tags. The browser can support MPEG-4, Flash, Ogg, all of them, none of them, or something else entirely.
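
To make the division of labor concrete, here is a toy model in Python. It is only a sketch, not how any real browser is implemented: the markup, the file names clip.ogv and clip.mp4, and the sets of "supported" types are all made up for illustration. The page lists its candidate sources in a video element, and each imaginary browser picks the first one whose type it happens to support.

```python
from html.parser import HTMLParser

# Hypothetical page markup: one <video> element offering two encodings.
MARKUP = """
<video controls>
  <source src="clip.ogv" type="video/ogg">
  <source src="clip.mp4" type="video/mp4">
  Your browser does not support the video element.
</video>
"""


class SourceCollector(HTMLParser):
    """Collects (src, type) pairs from <source> elements inside <video>."""

    def __init__(self):
        super().__init__()
        self.in_video = False
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self.in_video = True
        elif tag == "source" and self.in_video:
            attr = dict(attrs)
            self.sources.append((attr.get("src"), attr.get("type")))

    def handle_endtag(self, tag):
        if tag == "video":
            self.in_video = False


def pick_source(markup, supported_types):
    """Return the first source whose declared type is supported, else None.

    supported_types stands in for whatever formats a given browser
    can decode; the markup itself says nothing about that.
    """
    collector = SourceCollector()
    collector.feed(markup)
    for src, mime in collector.sources:
        if mime in supported_types:
            return src
    return None


# Three imaginary browsers read the same markup and reach different results.
print(pick_source(MARKUP, {"video/ogg"}))  # prints clip.ogv
print(pick_source(MARKUP, {"video/mp4"}))  # prints clip.mp4
print(pick_source(MARKUP, set()))          # prints None
```

The markup is identical in all three cases. What gets played, if anything, depends entirely on the set passed in, which is the part HTML 5 leaves to the browser.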

Perhaps author Stephen Shankland is thinking of a different issue. There are some Web pages whose content is made up entirely of Flash. If you bring them up on a browser where Flash support is lacking or disabled, you generally get a blank page, not even a clue about what’s wrong. This could be considered Flash vs. HTML competition, but it’s an area where Flash has no excuse for being there and deserves to be beaten. The appropriate use of Flash, to present animation and video, is actually better supported by HTML 5 than by earlier versions, and the idea that the technologies compete is meaningless.

FITS user guide

Posted January 25, 2010 by Gary McGath
Categories: News

There’s now a user guide online for Harvard University Libraries’ File Information Tool Set (FITS). FITS extracts technical metadata using several different tools, including JHOVE, Exiftool, NLNZ Metadata Extractor, DROID, FFIdent, and File Utility.

Does anyone reading this know if FFIdent is still alive somewhere on the Web? A web search for it turns up nothing useful, and the number 1 hit is the FITS site itself.

JHOVE 1.5 — oops!

Posted December 23, 2009 by Gary McGath
Categories: News

Argh! I always forget something in a JHOVE build, and carefully checking all the nitpicking things just means I forget the important ones.

The JHOVE 1.5 which I uploaded to SourceForge a few days ago had all the right sources, release notes, checksums, etc. … but it didn’t have up-to-date JAR files, which kind of defeats the whole point!!

This is now fixed. If you’ve already downloaded it, please download it again. Check your download against the corresponding MD5 file to be sure.
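
If you don’t have an MD5 utility handy, a few lines of Python will do the check. This is a minimal sketch, not anything shipped with JHOVE. It assumes the MD5 file starts with the hex digest, optionally followed by a file name, which is a common layout but not guaranteed. The function names are my own.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(download_path, md5_path):
    """Compare a download against the first token of its MD5 file.

    Assumption: the MD5 file begins with the hex digest, optionally
    followed by whitespace and a file name.
    """
    with open(md5_path, "r") as f:
        expected = f.read().split()[0].lower()
    return md5_of_file(download_path) == expected
```

Call matches_md5_file with the path of the archive you downloaded and the path of its MD5 file; a result of False means you should download again.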

A happy holiday-of-your-choice to all!

JHOVE 1.5

Posted December 18, 2009 by Gary McGath
Categories: News

JHOVE 1.5 is now out, and so far no one’s complained of anything missing. If you notice any problems, please comment.

Thanks to Thomas Ledoux, JHOVE now has an option to output TextMD metadata. There are minor bug fixes for PDF and UTF-8. Full details are in the release notes.

PASIG in Boston

Posted December 9, 2009 by Gary McGath
Categories: Personal

I’ll be at the Sun PASIG (Preservation and Archives SIG) at Northeastern University tomorrow.

XSD 1.1 reaches last call status

Posted December 8, 2009 by Gary McGath
Categories: News

W3C XML Schema Definition Language 1.1 has reached the status of Last Call Working Draft. The Last Call period ends at the end of December.