Tag Archives: JHOVE

JHOVE Tips for Developers

I got a request for my ebook, JHOVE Tips for Developers. It’s no longer for sale on Smashwords, because I haven’t updated it since 2012, but if anyone wants it, you can download JHOVE Tips for Developers from this site.


JHOVE 1.22 is now available from OPF.

JHOVE 1.22 Release Candidate 2

JHOVE 1.22 Release Candidate 2 is available today (April 2).

One issue that was noted but isn’t fixed in this release is the handling of command line parameters. I don’t think that code has changed significantly since I worked on it. It’s so old that it was already there when I took over the project in 2005, so don’t blame me. :) Hopefully version 1.23 will have revamped command line handling using a modern code library.

JHOVE online hack week

Open Preservation Foundation has scheduled an online hack week for JHOVE. The focus for this one will be on development. Another hack week is planned for September, focusing on documentation. JHOVE just keeps going and going, and this is a chance for volunteer Java developers to reduce its issue list.

Introductory JHOVE workshop, January 25, 2019

JHOVE is still alive and active! The Open Preservation Foundation is holding a workshop on “Getting Started with JHOVE” on January 25, 2019 in The Hague, Netherlands. The announcement says, “This workshop is aimed at beginners, or anyone who is new to JHOVE.”

OPF members get priority for registration.

How to approach the file format validation problem

For years I wrote most of the code for JHOVE. With each format, I wrote tests for whether a file is “well-formed” and “valid.” With most formats, I never knew exactly what these terms meant. They come from XML, where they have clear meanings. A well-formed XML file has correct syntax. Angle brackets and quote marks match. Closing tags match opening tags. A valid file is well-formed and follows its schema. A file can be well-formed but not valid, but it can’t be valid without being well-formed.
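
The XML distinction can be sketched in a few lines of Python. This is only an illustration, not how JHOVE does it. Python's standard library checks well-formedness but has no schema validator, so the "schema" here is a made-up rule (the root must be `record` and must contain a `title` child). The function names and the rule are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML at all (syntax only)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid(text):
    """Return True if the text is well-formed AND passes the toy rule.

    Hypothetical stand-in for a real schema: the root element is
    <record> and it has a <title> child.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not well-formed, so it cannot be valid
    return root.tag == "record" and root.find("title") is not None


if __name__ == "__main__":
    samples = [
        "<record><title>x</title></record>",  # well-formed and valid
        "<record><name>x</name></record>",    # well-formed, not valid
        "<record><title>x</record>",          # closing tag mismatch
    ]
    for sample in samples:
        print(sample, is_well_formed(sample), is_valid(sample))
```

Because `is_valid` fails as soon as parsing fails, the sketch cannot report a file as valid without its also being well-formed.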

With most other formats, there’s no definition of these terms. JHOVE applies them anyway. (I wrote the code, but I didn’t design JHOVE’s architecture. Not my fault.) I approached them by treating “well-formed” as meaning syntactically correct, and “valid” as meaning semantically correct. Drawing the line wasn’t always easy. If a required date field is missing, is the file not well-formed or just not valid? What if the date is supposed to be in ISO 8601 format but isn’t? How much does it matter?
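
To make the line-drawing problem concrete, here is a minimal Python sketch for an invented format: a text file of `key: value` lines with a required `date` field. Nothing here comes from JHOVE; the format, the function names, and the result labels are all assumptions. The sketch takes one possible position: a broken line is a syntax problem, while a missing or non-ISO date is a semantic one.

```python
from datetime import date


def parse_fields(text):
    """Syntax check: every non-blank line must look like 'key: value'.

    Returns a dict of fields, or None if any line breaks the syntax.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            return None
        fields[key.strip()] = value.strip()
    return fields


def classify(text):
    """Classify a file in the invented format (labels are hypothetical)."""
    fields = parse_fields(text)
    if fields is None:
        return "not well-formed"
    raw = fields.get("date")
    if raw is None:
        # Design choice: a missing required field is a semantic error.
        return "well-formed, not valid"
    try:
        date.fromisoformat(raw)  # semantic check on the field's value
    except ValueError:
        return "well-formed, not valid"
    return "well-formed and valid"


if __name__ == "__main__":
    print(classify("title: Report\ndate: 2019-01-25"))
    print(classify("title: Report"))
    print(classify("title: Report\ndate: January 25, 2019"))
    print(classify("title Report"))
```

Moving the missing-field check into `parse_fields` would turn the same file into "not well-formed." The code can't settle which answer is right; it only makes the choice visible.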

What are “positives” in format validation?

Articles about JHOVE, such as “Good GIF Hunting,” grab my attention for obvious reasons. This article talks about false positive and false negative results, and it got me thinking: What constitutes a “positive” result in file format validation? There are two ways to look at it:

  1. The default assumption is that the file is of a certain format, perhaps based on its extension, MIME type, or other metadata. The software sets out to see if it violates the format’s requirements. In that case, a positive result is that the file doesn’t conform to the requirements.
  2. The default assumption is that the file is just a collection of bytes. The software matches it against one or more sets of criteria. A positive result is that the file matches one of them.
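
The two views can be put side by side in a short Python sketch. It assumes that a check of the leading signature bytes stands in for full validation, which is a large simplification. The GIF and PNG signatures are the real ones, but the function names and the small signature table are made up for the example.

```python
# Leading "magic" bytes for two formats.
SIGNATURES = {
    "gif": (b"GIF87a", b"GIF89a"),
    "png": (b"\x89PNG\r\n\x1a\n",),
}


def violates_claimed_format(data, claimed):
    """View 1: assume the claimed format and look for a violation.

    A positive result (True) means the file does NOT conform.
    """
    return not data.startswith(SIGNATURES[claimed])


def identify(data):
    """View 2: assume nothing and try each set of criteria.

    A positive result is the name of a matching format; None means
    no match.
    """
    for name, magics in SIGNATURES.items():
        if data.startswith(magics):
            return name
    return None


if __name__ == "__main__":
    data = b"GIF89a" + b"\x00" * 10
    print(violates_claimed_format(data, "gif"))
    print(violates_claimed_format(data, "png"))
    print(identify(data))
```

Under the first view, a false positive is a good file reported as broken. Under the second, it is a file reported as matching a format it doesn't really belong to. The same phrase points in opposite directions depending on the default assumption.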


JHOVE webinar

An Open Preservation Foundation webinar, “Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE,” will be held on November 21, 10 AM GMT (that’s 11 AM in Central Europe and a ludicrous 5 AM or earlier in the US).

JHOVE online hack day

My venture into the Techno-Liberty blog didn’t work so well. In fact, I’m getting more views on this blog, in spite of not having posted in months, than I got on my best days on the other blog. So … I’m back.

JHOVE is still doing well too, thanks to excellent work by Carl Wilson and others at the Open Preservation Foundation. There will be an online hack day for JHOVE on April 27. The aim is to find ways to improve JHOVE by improving error reporting, collecting example files, and documenting the preservation impact of JHOVE validation issues. (I think that last one means “Why does McGath’s PDF module suck?” :)

The time listed is 8 AM-8 PM. I asked what time zone that is, and was told it means any and all, from New Zealand the long way around to Hawaii.

Last time I said I’d drop in and didn’t really manage to. This time I won’t make promises, but I’ll try to be around in some form. If nothing else, people can ask me questions about JHOVE in the comments.

JHOVE Online Hack Day

I’ve just learned that the Open Preservation Foundation is hosting a JHOVE Online Hack Day on October 11. I’m flattered people are still interested in the work I started doing over a decade ago, though getting some paying work would be far more satisfying.