Blog Posts

Blog posts about technology for scholarly publishing.

Unicode in DOIs

December 22, 2024

The previous post looked at how long DOIs are. One of the questions was:

Do the UTF-8 encodings of Unicode characters make any difference to the statistics on DOI length?
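To make the question concrete, here is a minimal sketch in Python of the two ways a DOI's length could be counted: by Unicode code points and by UTF-8 bytes. The function name `doi_lengths` and both DOIs are made up for illustration; they are not taken from the Crossref or DataCite files, and this is not necessarily how the post itself measures length.

```python
from typing import Tuple


def doi_lengths(doi: str) -> Tuple[int, int]:
    """Return (Unicode code points, UTF-8 bytes) for a DOI string."""
    return len(doi), len(doi.encode("utf-8"))


# Hypothetical DOIs, invented for this example.
ascii_doi = "10.5555/example-123"
# \u00e9 is 'é', a single code point that takes two bytes in UTF-8.
unicode_doi = "10.5555/r\u00e9sum\u00e9"

for doi in (ascii_doi, unicode_doi):
    chars, utf8_bytes = doi_lengths(doi)
    print(f"{doi}: {chars} characters, {utf8_bytes} UTF-8 bytes")
```

For an all-ASCII DOI the two counts are identical. They only diverge once a DOI contains characters outside ASCII, which is why the choice of measure could shift the length statistics.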

How long is a DOI?

December 22, 2024

In 2024 DataCite released their first public data file. It’s easy to get a copy. Crossref have made a data dump available for the past few years.

Having both files available opens up some interesting possibilities in comparing and combining the two data sources.

The most obvious place to look is the DOIs themselves…

… and the simplest question you could ask is “How long is a DOI?”

SPPIES: A Slack group for Scholarly Programmers

November 4, 2024

Are you interested in the technical side of scholarly publishing technology and infrastructure? If so, you’re welcome to join the SPPIES Slack group.

SPPIES is short for Scholarly Publishing Programmers and Infrastructure Enthusiasts, in homage to programmers’ love of acronyms. One of the ‘P’s is silent.

Falsehoods Programmers believe about DOIs

October 30, 2024

DOIs, or Digital Object Identifiers, are everywhere, for a given value of ’everywhere’. They are the identifiers used to identify and link research outputs, and a lot more besides.

Humans are good at spotting patterns, and with something as ubiquitous as DOIs, there are plenty of patterns to spot. However, with hundreds of millions of DOIs and decades of history, it pays not to make generalisations.

These falsehoods all cropped up in my 10 years at Crossref, either observed in the scholarly community using DOIs or encountered when writing software to find and handle DOIs.

Appropriately Technical

October 1, 2024

The word ‘technical’ hasn’t always had a great reputation. Think of phrases like ‘technically correct’, ‘technical details’, or ‘acquitted on a technicality’. It’s easier to think of negative uses than positive ones. I think there are a couple of reasons for that.

Five principles for community altmetrics data

May 29, 2018

I presented these five principles at the altmetrics18 workshop. You can read the paper submitted to the workshop here. This post is a few years old, but all the ideas still stand up. At the time I was building Crossref Event Data, and discussing what it would take to build a data model that would support community-generated bibliometrics. A lot has changed since, but I think the principles are still relevant today.