Libtiff and search engines

Posted November 24, 2009 by Gary McGath
Categories: commentary

The current version of libtiff, a widely used C library for processing TIFF images, is found on remotesensing.org. The domain libtiff.org used to belong to the people who maintain libtiff but doesn’t any more. The holder of the domain claims to be “Lib Tiff” in Ottawa. It’s not a fraud or malware site, but it has an outdated version of libtiff. I don’t know what the domain holder’s game is, and I’m not sure anyone does. I can’t even see how it’s making money; maybe it has popups which my browser is suppressing?

Anyway, I got curious about how various search engines would do when I searched for “libtiff.” Here’s the rundown:

  • Google puts libtiff.org in first and second place and remotesensing.org in third, and it has numerous subsidiary links in its listing of libtiff.org.
  • Ask.com does the same, minus the subsidiary links. The fourth-place item is the Wikipedia entry, which correctly lists remotesensing.org (Google puts it fifth).
  • Yahoo puts remotesensing.org in first place, and the bogus site doesn’t show up at all on the first page of results.
  • Clusty.com puts remotesensing.org in first and the cheap imitation in second.
  • Dogpile puts the mutt first and the purebred down in eleventh place.
  • Alltheweb.com puts remotesensing.org first and doesn’t show the imitator in the first page of results.

It’s not clear exactly what this proves, except that the big names don’t always do the best.

ECA 2010

Posted November 17, 2009 by Gary McGath
Categories: News

By way of Digitization 101: ECA 2010, the 8th European Conference on Digital Archiving, will be held in Geneva, Switzerland, on April 28-30, 2010. The announcement is in German; here’s a quick translation.

From April 28 through 30, 2010, the European Conference on Digital Archiving will take place in Geneva. This stands in the tradition of European archiving conferences of the last decade. With the accent on the digital, and archiving as a function rather than the archive as an institution, the conference will set new priorities. The future will be digital; we will maintain the analog tradition; the archive of the future must have a safe refuge for the analog and digital trails of the past. That is our responsibility.

We are sure that you can expect an attractive and rich conference program.

I know German, but not natively, so I offer my apologies for any clumsiness and mixed metaphors.

So is anyone reading this?

Posted October 30, 2009 by Gary McGath
Categories: administrative

Since I moved over from Blogspot, I’ve received 11 comments, or rather WordPress has marked 11 comments as spam and deleted them, usually before I could look at them. Other than a couple of pieces of email, I’ve been getting no feedback. The readership graph shows blips when I post something new, but I don’t know whether that’s actual people or not.

So I’d really like to hear from anyone who’s still reading this. Did I lose everyone with the host switch? Is occasional news on file formats and JHOVE just too boring to read? Are legitimate comments disappearing down the maw of the antispambot? Or is everything I say so self-evidently true and complete that nothing more needs to be said? I rather doubt the last.

Microsoft to open up Outlook format

Posted October 29, 2009 by Gary McGath
Categories: News

A report on CNET says that Microsoft will be publicly documenting the formats of .pst files used by Outlook. Microsoft’s Paul Lorimer is quoted as saying the format specification will be available “under our Open Specification Promise, which will allow anyone to implement the .pst file format on any platform and in any tool, without concerns about patents, and without the need to contact Microsoft in any way.” No timetable is given.

What ever happened to .SIT?

Posted October 28, 2009 by Gary McGath
Categories: commentary

With the increasing use of ZIP compression on the Macintosh, the Stuffit or .SIT format has fallen into relative obscurity. But not only is it still around, its publishers claim it’s “the ultimate in compression.” Five to ten years ago, lots of computer products were promoted as “the ultimate.” But when the next revision is the new “ultimate,” and so is the one after that, the claim starts to look ridiculous, and most advertisers have dropped it.

Stuffit’s compression is, according to most studies, about as good as competing technologies. It has no claim to being “the ultimate.” Its ad in the MacConnection catalogue says that “Stuffit Deluxe(R) 2009 can compress files up to 98% of their original size.” This is a nicely ambiguous claim; does it mean that the compressed file is 98% smaller than the original, or that it’s 98% of its original size? The latter isn’t hard to achieve at all, and hardly worth bragging about. But it’s extremely rare that Stuffit, or any other compression method, can reduce a file to 2% of its original size. Perhaps a file of all 1’s would get 98% reduction, but that’s seldom useful.
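To see how far apart the two readings of that ad are, here’s a quick sketch using Python’s standard zlib module. It’s only a stand-in: zlib implements DEFLATE, the algorithm most ZIP files use, not Stuffit’s own compression. The function name and sample sizes are my own choices for illustration. It compares a file of nothing but 1’s with a file of random bytes.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Return the zlib-compressed size as a fraction of the original size."""
    # Level 9 asks zlib for its strongest (slowest) compression.
    return len(zlib.compress(data, 9)) / len(data)


# The pathological best case: 100,000 copies of the character '1'.
repetitive = b"1" * 100_000

# Random bytes have no redundancy for any compressor to exploit.
random_bytes = os.urandom(100_000)

print(f"all 1's: {compressed_fraction(repetitive):.2%} of original size")
print(f"random:  {compressed_fraction(random_bytes):.2%} of original size")
```

The exact figures vary from run to run and from compressor to compressor. Still, the all-1’s file shrinks to well under 2% of its size, while the random file typically comes out slightly *larger* than the original because of the format’s own overhead. Real-world files fall somewhere in between, and rarely anywhere near the 98% reduction end.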

Stuffit once had the advantage of recognizing the two-fork file format of the Macintosh Classic OS. But now that virtually everyone has gone to OS X, which no longer depends on dual file forks, it’s just one more compression format.

Unicode 5.2.0

Posted October 15, 2009 by Gary McGath
Categories: News

Unicode 5.2.0 is now out. It adds 6,648 new characters but still doesn’t officially include Klingon.

JHOVE2 at iPres

Posted October 8, 2009 by Gary McGath
Categories: News

Unfortunately, I wasn’t in California for the post-iPres workshop on JHOVE2, but there is some information online. The JHOVE2 project presentations page includes a short and a long version of the slides. An early version of the code has been made available for testing, and progress continues.

P2 registry

Posted October 7, 2009 by Gary McGath
Categories: News

I’ve just come across yet another file format registry: the P2 Registry at the University of Southampton in the UK. It’s identified as a beta and was pretty slow when I tried it, but it has some interesting features, including risk assessments of formats. PRONOM and other data sources are used. There is a short PDF article on the aims of P2, which tells us that “the key feature of the registry is the ability to import arbitrary ontologies that can be used both to infer new facts from existing information as well as to align (in the case where two concepts are similar or the same in nature) information already in the registry.”

Its web user interface is minimal at the moment, but it’s worth keeping an eye on this.

The most annoying HTML tag

Posted September 29, 2009 by Gary McGath
Categories: commentary

This past weekend, at a singing gathering, someone was trying to remember the words to “Flow Gently, Sweet Afton.” Trying to help, I did a search on my MacBook and found lots of matches. When I clicked on the first likely-looking one, it started playing the song, to my great embarrassment. This was in spite of the fact that I use NoScript to disable JavaScript, Java, and Flash on unfamiliar sites. (Here’s the offending page; it seems harmless in other respects, but I’ve added rel="nofollow" to the link anyway, so as not to give it any aid with search engines.)

The page uses a tag called embed, which is non-standard in HTML 4 and earlier but widely supported. The tag loads a plugin, and with the parameter autostart=true, a media plugin will start playing immediately. Depending on what plugins are installed in your browser, the content could be a sound file, a video, or anything else. The only way to prevent this with NoScript is to disable plugins across the board.
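For the curious, here’s a minimal sketch of how a tool could spot this kind of page before it makes noise. It uses Python’s standard html.parser module to list every embed tag whose autostart attribute is "true". This is not how NoScript or any browser works; the class name and the sample markup (including the file name afton.mid) are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class AutostartEmbedFinder(HTMLParser):
    """Collect the src of every <embed> whose autostart attribute is "true"."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us lowercased tag and attribute names.
        if tag != "embed":
            return
        attributes = dict(attrs)
        # An attribute written without a value arrives as None.
        value = (attributes.get("autostart") or "").strip().lower()
        if value == "true":
            self.hits.append(attributes.get("src"))

    # A self-closing <embed ... /> goes through handle_startendtag, whose
    # default implementation calls handle_starttag, so it is caught as well.


# Hypothetical page fragment resembling the one described above.
page = """
<p>Flow gently, sweet Afton</p>
<embed src="afton.mid" autostart="true" hidden="true">
<embed src="quiet.mid" autostart="false">
"""

finder = AutostartEmbedFinder()
finder.feed(page)
print(finder.hits)  # ['afton.mid']
```

This only checks the one parameter discussed here. Plugins were free to interpret their parameters however they liked, so a real filter would need to know each plugin’s conventions.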

In HTML 5, the embed tag gains official status but there’s a standard way to disable the functionality:

When the sandboxed plugins browsing context flag is set on the browsing context for which the embed element’s document is the active document, then the user agent must render the embed element in a manner that conveys that the plugin was disabled. The user agent may offer the user the option to override the sandbox and instantiate the plugin anyway; if the user invokes such an option, the user agent must act as if the sandboxed plugins browsing context flag was not set for the purposes of this element.

A sandbox can be set for a frame, window, or tab. For a frame, it can be specified in the HTML, letting a page incorporate not fully trusted HTML from another site. The window or tab sandbox settings are evidently intended to be controlled by user preferences.
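The rule quoted from the draft boils down to a two-input decision, sketched below. Nothing here is browser code. BrowsingContext and render_embed are hypothetical names, and the flag simply stands for “the sandboxed plugins browsing context flag is set,” whether that comes from an iframe’s sandbox attribute or, perhaps, from a user preference.

```python
from dataclasses import dataclass


@dataclass
class BrowsingContext:
    # Hypothetical model: True when the sandboxed plugins browsing context
    # flag is set for this frame, window, or tab.
    sandboxed_plugins: bool


def render_embed(context: BrowsingContext, user_overrides: bool = False) -> str:
    """Decide how an embed element is rendered, per the quoted draft text."""
    if context.sandboxed_plugins and not user_overrides:
        # The draft says the user agent "must render the embed element in a
        # manner that conveys that the plugin was disabled."
        return "placeholder: plugin disabled"
    # No sandbox, or the user chose to override it for this element.
    return "instantiate plugin"


print(render_embed(BrowsingContext(sandboxed_plugins=True)))
# placeholder: plugin disabled
print(render_embed(BrowsingContext(sandboxed_plugins=True), user_overrides=True))
# instantiate plugin
print(render_embed(BrowsingContext(sandboxed_plugins=False)))
# instantiate plugin
```

The override is per element, which matches the draft’s wording that the user agent acts as if the flag were not set “for the purposes of this element.” Other embeds on the same page stay disabled.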

There’s no longer an autostart parameter. I think this means that the behavior is whatever the plugin creator wants; it could start up immediately or could provide a user interface with start, stop, and pause controls.

If future browsers let users control the plugin sandbox through preferences, that will mean one less way that web page authors can get around the user’s desire not to be annoyed.

Planets digital preservation conference

Posted September 28, 2009 by Gary McGath
Categories: News

The Planets project will host a three-day training event on digital preservation in Bern, Switzerland, on November 17-19, 2009. According to the announcement: “Day 1 will consider the case for preserving digital objects, the technical issues involved, and the Planets framework, tools and services. On days 2 and 3 delegates will gain hands-on experience of working with Planets and a scenario (sample collection) to develop a preservation plan and preserve digital objects.”

Day 1 is recommended for “Heads of IT, Curation and Preservation, CEOs and preservation/curation/IT staff.” Days 2-3 are recommended for “digital preservation staff (e.g. librarians, archivists, digital librarians and archivists, repository managers, software developers, policy managers etc.).”

Attendance is limited.