Mad File Format Science

Downgrading this blog

Posted on May 5, 2023 | 2 comments

As of July 2, 2023, this blog will no longer be available as madfileformatscience.garymcgath.com. It will continue to exist as fileformats.wordpress.com. I’m making very little use of the blog lately, and while I want the old posts to stay around, it’s not worth continuing to spend money to maintain it. Nothing should disappear, but links may break. I may still make the occasional post.

2 Comments

Posted in administrative

Email migration

Posted on April 22, 2023 | Comments Off

Recently I migrated my email to a new host and discovered a hazard I hadn’t known about before. I didn’t lose anything, but I came closer to it than I would have liked. Since it’s a personal digital preservation issue, it merits a rare post on this blog.

There are two widely used open protocols for email clients: POP3 and IMAP. The latter is far more widely used today, because it lets you synchronize your mailboxes across multiple devices. The downside is that your mail lives on the server and may or may not be saved on your client. If you change your mailbox to a new server, all your existing mail could disappear. As long as you haven’t terminated the old service, you should be able to get it back, but it could be a pain.

Continue reading →

Comments Off on Email migration

Posted in commentary

Tagged email, preservation

The curse of HTML mail

Posted on February 16, 2023 | Comments Off

It’s been most of a year since I last posted here, but I wanted to rant about HTML mail, and this is the right blog for it. People complain about the intrusiveness of Web tracking, but email tracking is even worse. I’ve noticed this especially after subscribing to a couple of Substack newsletters. They’re sent as HTML, and whenever possible, I click the link to the equivalent Web page, which is less intrusive. Every link in a Substack newsletter is a tracking link, with the odd exception of the link to the Substack page.

The links in a Substack newsletter don’t go to the target page but to a Substack redirection URL. Their purpose is to let Substack know about everything you click on. There are no terms or privacy policy in the email telling you what Substack uses the information for.

Continue reading →

Comments Off on The curse of HTML mail

Posted in commentary

Tagged email, HTML

The Secret Service text message situation

Posted on July 30, 2022 | Comments Off

The disappearance of the Secret Service’s text messages from January 6, 2021 is a data preservation issue, so I’m briefly reviving this blog from its long sleep to analyze it the best I can.

What we know

“Text messages” sent between Secret Service phones on January 6, 2021, during the unrest in Washington, DC, became unavailable within the bureau. News reporting has gotten so bad that it’s hard to find out just what this means; this CNN article contains more detail than most of the reports I’ve found.

Continue reading →

Comments Off on The Secret Service text message situation

Posted in Uncategorized

Tagged preservation

The Argoknot project: JSON song data

Posted on May 30, 2022 | Comments Off

I’ve got a new project which I ought to blog about somewhere, and it’s related to file formats, so it’s going here.

There have been projects to archive information about filk songs. They’ve tended toward wikis such as the Filk Discography Wiki, which contains information about filk recordings. Many filk albums have gone out of publication and might otherwise be forgotten, and the wiki keeps them in the cultural memory. Wikis are fine, and they’re easy to participate in with little technical knowledge. They’re also fragile; if the hosting for a wiki goes away, it might find a new home, but it might disappear if no one takes prompt action.

Structured information has advantages. It’s easy for anyone with a little file storage to keep a copy and give it to others. People can create their own repositories, perhaps of songs which they have published. It’s easy to search them and extract information, e.g., all the songs by an author. This isn’t to say that we should abandon wikis, but having structured information as well strengthens the effort. With a little work, it can be fed to wikis.

This is why I’ve created the Argoknot project. It’s a Python-based project to process song data in JSON format. As of this post, it can do one thing: convert CSV files to JSON. I’m planning to add the ability to convert XML files that use the MODS schema. There is a pile of such files in the MASSFILC Filk Book Index.

One of the project’s aims is to create a JSON nomenclature for the filk community. That will let other projects work with the same JSON files to create websites, import into wikis, or do lots of other things.

What I’m doing here is just a start, and it won’t get far without the participation of others. I encourage others in the filk community to join the effort, whether working directly on Argoknot, offering suggestions on how to organize the data, or creating other coding projects.

UPDATE: I’ve enabled discussions on the project and posted an initial message inviting comments and suggestions. So please comment there rather than here if it’s OK with you.

Comments Off on The Argoknot project: JSON song data

Posted in Uncategorized

Tagged JSON, metadata, software

How broadcast FM can wreck your receiving system

Posted on February 13, 2022 | Comments Off

Today I came upon some news weird enough to justify a post on this long-dormant blog. Ars Technica reports that it “began on January 30 and afflicted Mazdas from model years 2014 to 2017 when the cars were tuned to the local NPR station, KUOW 94.9. At some point during the day’s broadcast, a signal from KUOW caused the Mazdas’ infotainment systems to crash—the screens died and the radios were stuck on 94.9 FM.”

That shouldn’t be possible, right? A broadcast FM signal is just frequency-modulated audio. It might deafen you or damage the speakers, but it shouldn’t make the receiver stop working! Well, actually, it isn’t just audio. Broadcasters can optionally use the Radio Broadcast Data System (RBDS), which supports encoded digital data. It uses a 57 kHz subcarrier, well above the limits of human hearing. The data is encoded at 1187.5 bits per second, a strange-sounding number that yields 48 cycles of the subcarrier for every bit. Error correction codes bring the effective data rate down to 730 bits per second.

Continue reading →

Comments Off on How broadcast FM can wreck your receiving system

Posted in commentary, News

Tagged radio, software

JHOVE Tips for Developers

Posted on August 19, 2021 | Comments Off

I got a request for my ebook, JHOVE Tips for Developers. It’s no longer for sale on Smashwords, since I haven’t updated it since 2012, but if anyone wants it, you can download JHOVE Tips for Developers from this site.

Comments Off on JHOVE Tips for Developers

Posted in News

Tagged JHOVE

Looking at some ballot scanning issues

Posted on May 28, 2021 | Comments Off

The town of Windham, New Hampshire, became the site of a controversy when the results of a ballot recount by hand didn’t match the original results. This has some interesting implications for ballot scanning and errors in the process, so I think it’s fair game for this blog. I’ll have to get into the politics to give it context, though.

The four winning candidates for the state legislature were found to have gotten about 300 additional votes each, while the one who requested the recount got fewer, so the results weren’t affected. Still, it was appropriate to ask why the scanners’ total was so far off. The town accordingly had an audit conducted.

Continue reading →

Comments Off on Looking at some ballot scanning issues

Posted in commentary, News

Tagged scanners, scanning, voting

Remembering the DAT war

Posted on April 17, 2021 | Comments Off

Waking up briefly to mention an interesting article…

In 1986, the RIAA was outraged that Sony’s Digital Audio Tape (DAT) would let ordinary consumers record high-quality sound. The format was expensive and never caught on in the mass market, but it led to other digital audio formats. In retrospect, we’re lucky to have reached a state where we can record sound without mandatory DRM. (If you don’t believe me, recall that strong encryption was once outlawed.) The article mentions that “Computer manufacturers successfully lobbied to exempt CD-ROM drives from copyright protection technology.” Our technology would be much less advanced today if we had to jump through copy-protection hoops every time we used a computer.

Comments Off on Remembering the DAT war

Posted in commentary

Tagged audio, DAT, DRM

Price drop on Files that Last

Posted on March 2, 2021 | Comments Off

Files that Last: Digital Preservation for Everygeek has been out for quite a few years. While the relevant technology has changed in many ways, especially with the explosive growth in cloud storage, I think most of its advice is still useful. I’ve permanently dropped its price to $1.99, no discount code required. The price drop is immediate on Smashwords. It may take longer to percolate to other sources supplied by Smashwords.

Comments Off on Price drop on Files that Last

Posted in Promotion

Tagged books, ebooks, Files that Last, preservation

Mad File Format Science

Downgrading this blog

Email migration

The curse of HTML mail

The Secret Service text message situation

The Argoknot project: JSON song data

How broadcast FM can wreck your receiving system

JHOVE Tips for Developers

Looking at some ballot scanning issues

Remembering the DAT war

Price drop on Files that Last

Subscribe by email

Recent Posts