Mavenized JHOVE

I’m not a Maven maven, but more of a Maven klutz. Nonetheless, I’ve managed to push a Mavenized version of JHOVE to Github that compiles for me. I haven’t tried to do anything beyond compiling. If anyone would like to help clean it up, please do.

This kills the continuity of file histories which Andy worked so hard to preserve, since Maven has its own ideas of where files should be. The histories are available under the deleted files in their old locations, if you look at the original commit.

Posted in News. Tags: , . Comments Off

JHOVE on Github

The JHOVE repository on Github is now live. The SourceForge site is still there and holds the documentation. The Github site is a work in progress.

Posted in Uncategorized. Comments Off

JHOVE, continued

There’s been enough encouragement in email and Twitter to my proposal to move JHOVE to Github that I’ll be going ahead with it. Andy Jackson has told me he has some almost-finished work to migrate the CVS history along with the project, so I’m waiting on that for the present. Watch this space for more news.

Posted in News. Tags: , . Comments Off

The state of JHOVE

As you may have noticed, I’ve been neglectful of JHOVE since last September, when 1.11 came out. Issues are continuing to arise, and people are still using it, and I’m not getting anything done about them.

The problem is that my current job has rather long hours, and when I come home from it, looking at more Java code isn’t at the top of my list of things to do. I’m very glad people are still using JHOVE, close to a decade after I started work on it as a contractor to the Harvard Library, but I’m not getting anything actually done.

It would help if there were more contributions from others, and its being on the moribund SourceForge isn’t helping. I think I could undertake the energy to move it to Github, where more contributors might be interested. There’s already a Mavenized version by Andy Jackson there, which doesn’t include the Java source code but provides some important scaffolding and pom.xml files. It probably makes sense to start by forking this. This migration should also make the horrible JHOVE build procedure easier.

If this is something you’d like to see, let me know. I’d like some reassurance that this will actually help before I start.

Posted in Query. Tags: , . Comments Off

FITS website

Last spring, I attended a Hackathon at the University of Leeds, which resulted in my getting a SPRUCE Grant for a month’s work enhancing FITS, a tool which at the time was technically open source but which the Harvard Library treated a bit possessively. After I finished, it seemed for a while that nothing was happening with my work, but it was just a matter of being patient enough. Collaboration between Harvard and the Open Planets Foundation has resulted in a more genuinely open FITS, which now has its own website. There’s also a GitHub repository with five contributors, none of which are me since my work was on an earlier repository that was incorporated into this one.

It really makes me happy to see my work reach this kind of fruition, even if I’m so busy on other things now that I don’t have time to participate.

Posted in News. Tags: , , , , . Comments Off

Ninjas, samurai, and artists

Lately I haven’t been posting as much on this blog. My professional responsibilities have shifted, and much as I still love the issues of file formats, I don’t have as much time to give attention to them. There are still general programming issues that are worth blogging about, though, and I’ll occasionally address these issues here, hopefully along with occasional file format posts.

cover thumbnail, Secrets of the JavaScript NinjaThis weekend I borrowed a book from the company library called Secrets of the JavaScript Ninja. It’s a better book than I expected from the title.

In an inspired error, the cover shows not a ninja but a samurai, with colorful armor, a banner, and a long sword. A ninja is a hit-and-run assassin; he shows up out of nowhere, attacks, and vanishes. A samurai is a dedicated soldier, the Japanese equivalent of a knight, and he follows a code of honor. The software development world has too many ninjas. The samurai is a better, if not ideal, model.

From the title I was expecting a cookbook, one of the many books that provide formulas to follow but no deep understanding. Instead, Secrets of the JavaScript Ninja is about understanding the language. Such books are even rarer for JavaScript than for most programming languages. I’d always tended to think of JavaScript as a half-baked derivative of Java, one where features such as classes, packages, and inheritance were left out to cut it down to a scripting language. This book, though, shows that it’s really quite a different language, a very powerful one in its own right. I still think the language has serious problems, the biggest being lack of standardization, but reading through the book, I’m learning how to think in JavaScript’s terms and to make use of features which other languages don’t have.

It’s not the language I want to talk about here, though; it’s the approach to any language or software technology. Too many programmers don’t have any deep understanding of their craft; they just have a bag of tools that they expect to solve problems for them. The worst can’t do much more than run a Web search for the code they need or beg on Stack Overflow for a solution to their problem. Once I actually had to fix some PHP that consultants from a big-name company had pasted from a website and didn’t know how to adapt to the problem at hand — and I don’t even know PHP! Those are your ninjas.

Even the samurai isn’t a great metaphor. They were a part of Japan’s entrenched feudal culture, and their opposition to capitalism promoted the warlike mindset that culminated in Japan’s role in World War II. Our metaphors should be based on creativity, not war and violence. We should think of ourselves as architects, sculptors, artists. We learn a craft and master the tools that go into it. There’s a real psychological affinity; I know a lot more software developers who are skilled musicians than are skilled fighters.

When I see a book called Secrets of the JavaScript Artist, then I’ll be pleased.

Update: I just noticed that the book itself says: “Ninjas were chosen for their martial arts skills rather than for their social standing or education. Dressed in black and with their faces covered, they were sent on missions alone or in small groups to attack the enemy with subterfuge and stealth, using any tactics to assure success; their only code was one of secrecy.” Just what you want in a programmer, right?

Posted in commentary. Tags: , , . Comments Off

In MS Word, the bullet bites back

There’s nothing new about Microsoft’s ignoring standards and ruining compatibility, but knowing the details is useful. One case I just learned about, from Mark Mandel, is the way it does bullet lists. This applies to the old Word DOC format on Mac OS X.

A 2008 OpenOffice Forum discussion explains the problem. If you create a bullet list in Word and import it into OpenOffice, the bullets are turned into something odd-looking. The file doesn’t use Unicode bullets, but instead uses the Microsoft Symbol font, which has its own nonstandard encoding. This applies only to bullets generated by list styles, not to ones you type in. On Windows, OpenOffice will display the files correctly, since it has access to the needed fonts and mapping.

Apparently the issue can also be manifested when creating a DOC file with OpenOffice and importing into Word, though I’m not clear on how that happens.

The problem is that Word 97/2000/2002 isn’t fully Unicode-compatible, mapping Unicode characters to the 8-bit encodings that its fonts need. This has presumably been fixed in the more recent versions that use DOCX (Office Open XML), but DOC is still widely used as an interchange format, so it’s an important issue. It’s also an illustration of the risks of using undocumented interchange formats.

The FITS Blitz

Back in May, after an enjoyable trip to the University of Leeds, I worked for a month on improving the Harvard Library’s FITS tool for combining the results of several file format identification and validation tools. The results were well received and the Harvard Library incorporated some of my work in the main line of FITS. Still, there were a lot of loose ends left and more work to be done.

Things are picking up again with a “FITS Blitz” that’s starting this week. Paul Wheatley writes that “in partnership with Harvard and the Open Planets Foundation (with support from Creative Pragmatics), SPRUCE is supporting a two week project to get the technical infrastructure in place to make FITS genuinely maintainable by the community. ‘FITS Blitz’ will merge the existing code branches and establish a comprehensive testing setup so that further code developments only find their way in when there is confidence that other bits of functionality haven’t been damaged by the changes.”

I’ve moved on to other things, so I won’t be able to participate, but I wish them every success.

Posted in News. Tags: , , , . Comments Off

Charles Stross on Microsoft Word

Not many people are brilliant writers and also have the technical knowledge to comment on file formats intelligently. When it does happen, it’s worth reading. So I recommend to you Why Microsoft Word Must Die by Charles Stross.

I’ve been on a digital preservation panel with Stross, and he can talk as expertly on the subject as I can. When it comes to Word, he knows a lot more about the format than I do, and he can demolish it more eloquently than I could even if I had the same level of knowledge.

Posted in Links. Tags: , . Comments Off

Tools come and go, effort must be ongoing

In a comment on a JHOVE bug, I said offhandedly that it’s approaching the end of its life. This caused a certain amount of concern in Twitter discussions. Andy said that software tools are one of the best ways to “preserve specific, reproducible knowledge about processes.” I don’t think dropping support of a rather dated tool is a big concern, though, as long as the code doesn’t vanish.

A software application is good for a certain number of years before it needs to be either left as legacy code or completely rewritten. Throwing out code and starting over takes a lot of effort, but it can result in much better code. I started on JHOVE in 2003 as a contractor to the Harvard University Libraries. After a few years it became clear that some of the design decisions weren’t ideal. Its all-or-nothing approach and its tendency to give up after the first error have long been obvious problems. The PDF module is a kludge built on a crock, and that’s without even talking about its profiles. The TIFF module, on the other hand, has a fair amount of elegance.

JHOVE2 was supposed to be the successor to JHOVE. Its creators learned from JHOVE and produced a better design. What they didn’t have was enough time and money to cover all the formats that JHOVE covered. I’ve continued to work on JHOVE because I know it inside and out. Someone else could pick up the work, but it might make more sense for a newcomer to the code to join the JHOVE2 effort instead. However, Maurice noted on Twitter that there hasn’t been much activity lately on JHOVE2 issues.

Both JHOVE and JHOVE2 were funded under grants. When the grant money ended, progress slowed down. The one-time grant model is the wrong way to fund preservation software. It’s an ongoing effort; new formats arise and old ones change, and there are always bugs to fix. What I’d like to see happen is for major libraries in the US to create an ongoing consortium for preservation work, similar to the Planets project in Europe. Or better yet, a consortium bringing together libraries all over the world. It wouldn’t take a lot from any individual institution. Its job would be to maintain information, preservation tools, test suites, and so on, on an ongoing basis. Instead of rushing to create a tool and then leaving it to freelancers like (formerly) me to maintain, it would support maintenance of tools for as long as it made sense and creation of new ones when it’s appropriate.

My voice isn’t enough to call anything like this into existence, but I can hope.

Posted in commentary. Tags: , , . Comments Off

Get every new post delivered to your Inbox.

Join 41 other followers