Disruptive Library Technology Jester

Engaging with Open Source Technologies

These are the presentation notes for the Engaging with Open Source Technologies presentation during the Open Source Publishing Technologies: Current Status and Emerging Possibilities webinar on Wednesday, August 14, 2019.
Webinar Description
This session will focus on discussions of open source publishing platforms and systems. What is the value proposition? What functionalities are commonplace? Where are the pitfalls in adoption and use by publishers or by libraries? What potential is there for scholarly societies who are similarly responsible for publication support and dissemination? Given the rising interest in open access and open educational resources, this session will offer professionals a sense of what is available, a sense of practical concerns and a general sense of their future direction.
Talk Abstract
An open source project that focuses only on the code is missing out on some of the biggest opportunities that the open source philosophy offers. To be sure, developing software with an open source philosophy brings a diversity of knowledge and shares the development burden over a wide group. But a community that embraces that philosophy in the conception, design, specification, and development of a project can build exceptionally useful software and a fulfilling experience for all involved. This portion of the program explores some of the structures and processes found in successful open source communities using examples from projects inside and outside of the field.
Slides
PDF of slides
Resources
Arp, Laurie Gemmill, and Megan Forbes. “It Takes a Village: Open Source Software Sustainability,” LYRASIS, February 2018. https://doi.org/10.7916/D89G70BS
Fitzgerald, Brian. (2006). “The Transformation of Open Source Software.” MIS Quarterly, 30(3), 587. https://doi.org/10.2307/25148740
Maxwell, John W., et al. “Mind the Gap: A Landscape Analysis of Open Source Publishing Tools and Platforms,” July 2019. https://mindthegap.pubpub.org/
Photo/Illustration Acknowledgments
Slide 1: “Codex Claustroneoburgensis 980” from College of Saint Benedict & Saint John’s University via DPLA
Slide 10: “Agile Project Management by Planbox” via Wikimedia Commons
Slide 15: “kiyomi gets chin scratches in PHX airport pet relief area” by Taro the Shiba Inu via Flickr
Slide 16: “Sunset” from the National Archives and Records Administration via DPLA
Key Quotations from Resources
Brian Fitzgerald in 2006 wrote of a significant shift in how open-source software projects were being considered and operated. Fitzgerald noted that the rise of successful open-source software (which he called “OSS 1.0”) was characterized by self-organized, Internet-based projects that gathered loose communities around sheer willingness to participate. Fitzgerald identified a newer mode, which he called “OSS 2.0,” characterized by “purposeful design” and institution-sponsored “vertical domains,” and much more likely to include paid deve…

Reflections on “Responsibilities of Citizenship for Immigrants and our Daughter”

Eighteen years ago, on Friday, September 7th, 2001, I was honored to be asked to participate in a naturalization ceremony for 46 new citizens of the United States in a courtroom of Judge Alvin Thompson in Hartford, Connecticut.
I published those remarks on a website that has long since gone dormant.
In light of the politics of the day, I was thinking back to that ceremony and what it meant to me to participate.
I regret the corny reference to Star Trek, but I regret nothing else I said on that day.
I titled the remarks “Responsibilities of Citizenship for Immigrants and our Daughter”.
Good afternoon. I’m honored to be here as you take your final step to become a citizen of the United States of America. My wife Celeste, who will soon give birth to another new American citizen, is here to celebrate this joyous occasion with you. And if you’ll pardon the musings of a proud soon-to-be father, I would like to share some thoughts about citizenship inspired by this ceremony and the impending arrival of our first child.
Our daughter will be a citizen by birth, but you have made a choice to become an American.
This choice may or may not have been easy for you, but I have the utmost respect for you for making that choice.
I don’t know what compelled you to submit yourself to the naturalization process — perhaps economic, political, social, or religious reasons. I have to think that you did it to better your life and the lives of your family. But you should know that the process does not stop here.
Along with the rights of citizenship come the responsibilities expected of you. Perhaps you are more aware of these responsibilities than I given your choice to become a citizen, but please allow me to enumerate some of them. Exercise your right to be heard on matters of concern to you. Vote in every election that you can. When asked to do so, eagerly perform your duty as a member of a jury. Watch what is happening around you, and form your own opinions. Practice your religion and respect the right of others to do the same. These are the values we will try to instill in our daughter; I hope you take them to heart, instill them in your family members, and inspire your fellow citizens to do the same.
But as you take this final, formal step of citizenship, be aware that becoming an American does not mean you have to leave your native culture behind.
A part of American culture is the 1960s show Star Trek, which promoted the concept of IDIC: Infinite Diversity in Infinite Combinations.
In that futuristic world, diverse cultures and ideas are respected with the realization that society is stronger because of them.
While we cannot claim to have reached that ideal world, one can say that the American Dream is best realized when our diversity is celebrated and shared by the members of this country.
My daughter will be the celebration of that diversity: the product of Irish, …

Publishers going-it-alone (for now?) with GetFTR

In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short.
I read about this first in Roger Schonfeld’s “Publishers Announce a Major New Service to Plug Leakage” piece in The Scholarly Kitchen via Jeff Pooley’s Twitter thread and blog post.
Details about how this works are thin, so I’m leaning heavily on Roger’s description.
I’m not as negative about this as Jeff, and I’m probably a little more opinionated than Roger.
This is an interesting move by publishers, and—as the title of this post suggests—I am critical of the publishers’ “go-it-alone” approach.
First, some disclosure might be in order.
My background has me thinking of this in the context of how it impacts libraries and library consortia.
For the past four years, I’ve been co-chair of the NISO Information Discovery and Interchange topic committee (and its predecessor, the “Discovery to Delivery” topic committee), so this is squarely in what I’ve been thinking about in the broader library-publisher professional space.
I also traced the early development of RA21 and more recently am volunteering on the SeamlessAccess Entity Category and Attribute Bundles Working Group; that’ll become more important a little further down this post.
I was nodding along with Roger’s narrative until I stopped short here:
The five major publishing houses that are the driving forces behind GetFTR are not pursuing this initiative through one of the major industry collaborative bodies. All five are leading members of the STM Association, NISO, ORCID, Crossref, and CHORUS, to name several major industry groups. But rather than working through one of these existing groups, the houses plan instead to launch a new legal entity.
While [Vice President of Product Strategy & Partnerships for Wiley Todd] Toler and [Senior Director, Technology Strategy & Partnerships for the American Chemical Society Ralph] Youngen were too politic to go deeply into the details of why this might be, it is clear that the leadership of the large houses have felt a major sense of mismatch between their business priorities on the one hand and the capabilities of these existing industry bodies. At recent industry events, publishing house CEOs have voiced extensive concerns about the lack of cooperation-driven innovation in the sector. For example, Judy Verses from Wiley spoke to this issue in spring 2018, and several executives did so at Frankfurt this fall. In both cases, long standing members of the scholarly publishing sector questioned if these executives perhaps did not realize the extensive collaborations driven through Crossref and ORCID, among others. It is now clear to me that the issue is not a lack of knowledge but rather a concern at the executive level about the perceived inability of existing collaborative vehicles to enable the new strategic directions that publishers feel they must pursue.
This is the publishers going-it…

What is known about GetFTR at the end of 2019

In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short.
There was a heck of a response on social media, and the response was—on the whole—not positive from my librarian-dominated corner of Twitter.
For my early take on GetFTR, see my December 3rd blog post “Publishers going-it-alone (for now?) with GetFTR.”
As that post title suggests, I took the five founding GetFTR publishers to task on their take-it-or-leave-it approach.
I think that is still a problem.
To get you caught up, here is a list of other commentary:
Roger Schonfeld’s December 3rd “Publishers Announce a Major New Service to Plug Leakage” piece in The Scholarly Kitchen
Tweet from Herbert Van de Sompel, the lead author of the OpenURL spec, on solving the appropriate copy problem
December 5th post “Get To Fulltext Ourselves, Not GetFTR.” on the Open Access Button blog
Twitter thread on December 7th between @cshillum and @lisalibrarian on the positioning of GetFTR in relation to link resolvers and an unanswered question about how GetFTR aligns with library interests
Twitter thread started by @TAC_NISO on December 9th looking for more information, with a link to an STM Association presentation added by @aarontay
A tree of tweets starting from @mrgunn’s “[I don’t trust publishers to decide] is the crux of the whole thing.” In particular, threads of that tweet include Jason Griffey of NISO saying he knew nothing about GetFTR and Bernhard Mittermaier’s point about hidden motivations behind GetFTR
Twitter thread started by @aarontay on December 7th saying “GetFTR is bad for researchers/readers and librarians. It only benefits publishers, change my mind.”
Lisa Janicke Hinchliffe’s December 10th “Why are Librarians Concerned about GetFTR?” in The Scholarly Kitchen; take note of the follow-up discussion in the comments
Twitter thread between @alison_mudditt and @lisalibrarian, with some @TAC_NISO as well, clarifying that PLOS is not on the Advisory Board
Ian Mulvany’s December 11th “thoughts on GetFTR” on ScholCommsProd
GetFTR’s December 11th “Updating the community” post on their website
The Spanish Federation of Associations of Archivists, Librarians, Archaeologists, Museologists and Documentalists (ANABAD)’s December 12th “GetFTR: new publishers service to speed up access to research articles” (original in Spanish, Google Translate to English)
December 20th news entry from eContent Pro with the title “What GetFTR Means for Journal Article Access,” which I’ll only quarrel with this sentence: “Thus, GetFTR is a service where Academic articles are found and provided to you at absolutely no cost.” No—if you are in academia the cost is borne by your library even if you don’t see it. But this seems like a third-party service that isn’t directly related to publishers or libraries, so perhaps they can be forgiven for not getting that nuance.
Wiley’s Chemistry Views news post o…

Managing Remote Conference Presenters with Zoom

Bringing remote presenters into a face-to-face conference is challenging and fraught with peril.
In this post, I describe a scheme using Zoom that had in-person attendees forgetting that the presenter was remote!
The Code4Lib conference was this week, and with the COVID-19 pandemic breaking out, many individuals and institutions made decisions to not travel to Pittsburgh for the meeting.
We had an unprecedented nine presentations that were brought into the conference via Zoom.
I was chairing the livestream committee for the conference (as I have done for several years—skipping last year), so it made the most sense for me to arrange a scheme for remote presenters.
With the help of the on-site A/V contractor, we were able to pull this off with minimal requirements for the remote presenter.
List of Requirements
2 Zoom Pro accounts
1 PC/Mac with video output, as if you were connecting an external monitor (the “Receiving Zoom” computer)
1 PC/Mac (the “Coordinator Zoom” computer)
1 USB audio interface
Hardwired network connection for the Receiving Zoom computer (recommended)
The Pro-level Zoom accounts were required because we needed to run a group call for longer than 40 minutes (to include setup time).
And two were needed: one for the Coordinator Zoom machine and one for the dedicated Receiving Zoom machine.
It would have been possible to consolidate the two Zoom Pro accounts and the two PC/Mac machines into one, but we had back-to-back presenters at Code4Lib, and I wanted to be able to help one remote presenter get ready while another was presenting.
In addition to this equipment, the A/V contractor was indispensable in making the connection work.
We fed the remote presenter’s video and audio from the Receiving Zoom computer to the contractor’s A/V switch through HDMI, and the contractor put the video on the ballroom projectors and audio through the ballroom speakers.
The contractor gave us a selective audio feed of the program audio minus the remote presenter’s audio (so they wouldn’t hear themselves come back through the Zoom meeting).
This becomes a little clearer in the diagram below.
Physical Connections and Setup
This diagram shows the physical connections between machines.
The Audio Mixer and Video Switch were provided and run by the A/V contractor.
The Receiving Zoom machine was connected to the A/V contractor’s Video Switch via an HDMI cable coming off the computer’s external monitor connection.
In the Receiving Zoom computer’s control panel, we set the external monitor to mirror what was on the main monitor.
The audio and video from the computer (i.e., the Zoom call) went out the HDMI cable to the A/V contractor’s Video Switch.
The A/V contractor took the audio from the Receiving Zoom computer through the Video Switch and added it to the Audio Mixer as an input channel.
From there, the audio was sent out to the ballroom speakers the same way audio from the pod…

Tethering a Ubiquiti Network to a Mobile Hotspot

I saw it happen.
The cable-chewing device
The contractor in the neighbor’s back yard with the Ditch Witch trencher burying a cable.
I was working outside at the patio table and just about to go into a Zoom meeting.
Then the internet dropped out.
Suddenly, and with a wrenching feeling in my gut, I remembered where the feed line was buried between the house and the cable company’s pedestal in the right-of-way between the properties.
Yup, he had just cut it.
To be fair, the utility locator service did not mark my cable’s location, and he was working for a different cable provider than the one we use.
(There are three providers in our neighborhood.)
It did mean, though, that our broadband internet would be out until my provider could come and run another line.
It took an hour of moping about the situation to figure out a solution, then another couple of hours to put it in place: an iPhone tethered to a Raspberry Pi that acted as a network bridge to my home network’s UniFi Security Gateway 3P.
Network diagram with tethered iPhone
A few years ago I was tired of dealing with spotty consumer internet routers and upgraded the house to UniFi gear from Ubiquiti.
Rob Pickering, a college comrade, had written about his experience with the gear, and I was impressed.
It wasn’t a cheap upgrade, but it was well worth it.
(Especially now with four people in the household working and schooling from home during the COVID-19 outbreak.)
The UniFi Security Gateway has three network ports, and I was using two: one for the uplink to my cable internet provider (WAN) and one for the local area network (LAN) in the house.
The third port can be configured as another WAN uplink or as another LAN port.
And you can tell the Security Gateway to use the second WAN as a failover for the first WAN (or as load balancing the first WAN).
So that is straightforward enough, but how do I get the Personal Hotspot on the iPhone to the second WAN port?
That is where the Raspberry Pi comes in.
The Raspberry Pi is a small computer with USB, ethernet, HDMI, and audio ports.
The version I had lying around is a Raspberry Pi 2—an older model, but plenty powerful enough to be the network bridge between the iPhone and the home network.
The toughest part was bootstrapping the operating system packages onto the Pi with only the iPhone Personal Hotspot as the network.
That is what I’m documenting here for future reference.
Bootstrapping the Raspberry Pi
The Raspberry Pi runs its own operating system called Raspbian (a Debian/Linux derivative) as well as more mainstream operating systems.
I chose to use the Ubuntu Server for Raspberry Pi instead of Raspbian because I’m more familiar with Ubuntu.
I tethered my MacBook Pro to the iPhone to download the Ubuntu 18.04.4 LTS image and followed the instructions for copying that disk image to the Pi’s microSD card.
That allows me to boot the Pi with Ubuntu and a basic set of operatin…
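Once Ubuntu is running on the Pi, the bridge between the tethered iPhone and the wired port can be described in a netplan file. This is only a sketch of one way it could look—the interface names (eth0 for the wired port, usb0 for the USB tether) and the file name are my assumptions, and the actual setup may have used routing/NAT rather than a pure bridge:

```yaml
# /etc/netplan/99-tether.yaml (hypothetical file name)
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0: {}   # wired port toward the Security Gateway's second WAN
    usb0: {}   # iPhone Personal Hotspot, tethered over USB
  bridges:
    br0:
      interfaces: [eth0, usb0]
      dhcp4: true   # take an address from the iPhone's DHCP server
```

After saving the file, `sudo netplan apply` brings the bridge up.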

With Gratitude for the NISO Ann Marie Cunningham Service Award

During the inaugural NISO Plus meeting at the end of February, I was surprised and proud to receive the Ann Marie Cunningham Service award.
Todd Carpenter, NISO’s executive director, let me know by tweet as I was not able to attend the conference.
Pictured in that tweet is my co-recipient, Christine Stohn, who serves NISO with me as the co-chair of the Information Delivery and Interchange Topic Committee.
This got me thinking about what NISO has meant to me.
As I think back on it, my activity in NISO spans at least four employers and many hours of standard working group meetings, committee meetings, presentations, and ballot reviews.
NISO Ann Marie Cunningham Service Award
I did not know Ms Cunningham, the award’s namesake.
My first job started when she was the NFAIS executive director in the early 1990s, and I hadn’t been active in the profession yet.
I read her brief biography on the NISO website:
The Ann Marie Cunningham Service award was established in 1994 to honor NFAIS members who routinely went above and beyond the normal call of duty to serve the organization. It is named after Ann Marie Cunningham who, while working with abstracting and information services such as Biological Abstracts and the Institute for Scientific Information (both now part of NISO-member Clarivate Analytics), worked tirelessly as a dedicated NFAIS volunteer. She ultimately served as the NFAIS Executive Director from 1991 to 1994 when she died unexpectedly. NISO is pleased to continue to present this award to honor a NISO volunteer who has shown the same sort of commitment to serving our organization.
As I searched the internet for her name, I came across the proceedings of the 1993 NFAIS meeting, in which Ms Cunningham wrote the introduction with Wendy Wicks.
These first sentences from some of the paragraphs of that introduction are as true today as they were then:
In an era of rapidly expanding network access, time and distance no longer separate people from information.
Much has been said about the global promise of the Internet and the emerging concept of linking information highways, to some people, “free” ways.
What many in the networking community, however, seem to take for granted is the availability of vital information flowing on these high-speed links.
I wonder what Ms Cunningham of 1993 would think of the information landscape today?
Hypertext linking has certainly taken off, if not taken over, the networked information landscape.
How that interconnectedness has improved with the adaptation of print-oriented standards and the creation of new standards that match the native capabilities of the network.
In just one corner of that space, we have the adoption of PDF as a faithful print replica and HTML as a common tool for displaying information.
In another corner, MARC has morphed into a communication format that far exceeds its original purpose of encoding catalog cards…

As a Cog in the Election System: Reflections on My Role as a Precinct Election Official

I may nod off several times in composing this post the day after election day.
Hopefully, in reading it, you won’t.
It is a story about one corner of democracy.
It is a journal entry about how it felt to be a citizen doing what I could do to make other citizens’ voices be heard.
It needed to be written down before the memories and emotions are erased by time and naps.
Yesterday I was a precinct election officer (PEO—a poll worker) for Franklin County—home of Columbus, Ohio.
It was my third election as a PEO.
The first was last November, and the second was the election aborted by the onset of the coronavirus in March.
(Not sure that second one counts.)
It was my first as a Voting Location Manager (VLM), so I felt the stakes were high to get it right.
Would there be protests at the polling location?
Would I have to deal with people wearing candidate T-shirts and hats or not wearing masks?
Would there be a crush of election observers, whether official (scrutinizing our every move) or unofficial (that I would have to remove)?
It turns out the answer to all three questions was “no”—and it was a fantastic day of civic engagement by PEOs and voters.
There were well-engineered processes and policies, happy and patient enthusiasm, and good fortune along the way.
This story is going to turn out okay, but it could have been much worse.
Because of the complexity of the election day voting process, last year Franklin County started allowing PEOs to do some early setup on Monday evenings.
The early setup started at 6 o’clock.
I was so anxious to get it right that the day before I took the printout of the polling room dimensions from my VLM packet, scanned it into OmniGraffle on my computer, and designed a to-scale diagram of what I thought the best layout would be.
The real thing only vaguely looked like this, but it got us started.
What I imagined our polling place would look like
We could set up tables, unpack equipment, hang signs, and other tasks that don’t involve turning on machines or breaking open packets of ballots.
One of the early setup tasks was updating the voters’ roster on the electronic poll pads.
As happened around the country, there was a lot of early voting activity in Franklin County, so the update file must have been massive.
The electronic poll pads couldn’t handle the update; they hung at step 8-of-9 for over an hour.
I called the Board of Elections and got ahold of someone in the equipment warehouse.
We tried some of the simple troubleshooting steps, and he gave me his cell phone number to call back if it wasn’t resolved.
By 7:30, everything was done except for the poll pad updates, and the other PEOs were wandering around.
I think it was 8 o’clock when I said everyone could go home while the two Voting Location Deputies and I tried to get the poll pads working.
I called the equipment warehouse and we hung out on the phone for hours…re…

User Behavior Access Controls at a Library Proxy Server are Okay

Earlier this month, my Twitter timeline lit up with mentions of a half-day webinar called Cybersecurity Landscape – Protecting the Scholarly Infrastructure.
What had riled up the people I follow on Twitter was the first presentation: “Security Collaboration for Library Resource Access” by Cory Roach, the chief information security officer at the University of Utah.
Many of the tweets and articles linked in tweets were about a proposal for a new round of privacy-invading technology coming from content providers as a condition of libraries subscribing to publisher content.
One of the voices that I trust was urging caution:
I highly recommend you listen to the talk, which was given by a university CIO, and judge if this is a correct representation. FWIW, I attended the event and it is not what I took away.
— Lisa Janicke Hinchliffe (@lisalibrarian)
November 14, 2020
As near as I can tell, much of the debate traces back to this article:
Scientific publishers propose installing spyware in university libraries to protect copyrights – Coda Story
https://t.co/rtCokIukBf
— Open Access Tracking Project (@oatp)
November 14, 2020
The article describes Cory’s presentation this way:
One speaker proposed a novel tactic publishers could take to protect their intellectual property rights against data theft: introducing spyware into the proxy servers academic libraries use to allow access to their online services, such as publishers’ databases.
The “spyware” moniker is quite scary.
It is what made me want to seek out the recording from the webinar and hear the context around that proposal.
My understanding (after watching the presentation) is that the proposal is not nearly as concerning.
Although there is one problematic area—the correlation of patron identity with requested URLs—overall, what is described is a sound and common practice for securing web applications.
To the extent that it is necessary to determine a user’s identity before allowing access to licensed content (an unfortunate necessity because of the state of scholarly publishing), this is an acceptable proposal.
(Through the university communications office, Cory published a statement about the reaction to his talk.)
In case you didn’t know, a web proxy server ensures the patron is part of the community of licensed users, and the publisher trusts requests that come through the web proxy server.
The point of Cory’s presentation is that the username/password checking at the web proxy server is a weak form of access control that is subject to four problems:
phishing (sending email that tricks a user into giving up their username/password)
social engineering (non-email ways of tricking a user into giving up their username/password)
credential reuse (systems that are vulnerable because the user used the same password in more than one place)
hacktivism (users that intentionally give out their username/password so other…

Should All Conference Talks be Pre-recorded?

The Code4Lib conference was last week. That meeting used all pre-recorded talks, and we saw the benefits of pre-recording for attendees, presenters, and conference organizers.
Should all talks be pre-recorded, even when we are back face-to-face?
Note!
After I posted a link to this article on Twitter, there was a great response of thoughtful comments.
I’ve included new bullet points below and summarized the responses in another blog post.
As an entirely virtual conference, I think we can call Code4Lib 2021 a success.
Success ≠ Perfect, of course, and last week the conference coordinating team got together on a Zoom call for a debriefing session.
We had a lengthy discussion about what we learned and what we wanted to take forward to the 2022 conference, which we’re anticipating will be something with a face-to-face component.
That last sentence was tough to compose: «…will be face-to-face»? «…will be both face-to-face and virtual»? (Or another fully virtual event?)
Truth be told, I don’t think we know yet.
I think we know with some certainty that the COVID pandemic will become much more manageable by this time next year—at least in North America and Europe.
(Code4Lib draws from primarily North American library technologists with a few guests from other parts of the world.)
I’m hearing from higher education institutions, though, that travel is going to be severely curtailed…if not for health risk reasons, then because budgets have been slashed.
So one has to wonder what a conference will look like next year.
I’ve been to two online conferences this year: NISOplus21 and Code4Lib.
Both meetings recorded talks in advance and started playback of the recordings at a fixed point in time.
This was beneficial for a couple of reasons.
For organizers and presenters, pre-recording allowed technical glitches to be worked through without the pressure of a live event happening.
Technology is not nearly perfect enough or ubiquitously spread to count on it working in real-time.
NISOplus21 also used the recordings to get transcribed text for the videos.
(Code4Lib used live transcriptions on the synchronous playback.)
Attendees and presenters benefited from pre-recording because the presenters could be in the text chat channel to answer questions and provide insights.
Having the presenter free during the playback offers new possibilities for making talks more engaging: responding in real-time to polls, getting forehand knowledge of topics for subsequent real-time question/answer sessions, and so forth.
The synchronous playback time meant that there was a point when (almost) everyone was together watching the same talk—just as in face-to-face sessions.
During the Code4Lib conference coordinating debrief call, I asked the question: «If we saw so many benefits to pre-recording talks, do we want to pre-record them all next year?»
In addition to the reasons above, pre-recorded talks benefit those who are no…

More Thoughts on Pre-recording Conference Talks

Over the weekend, I posted an article here about pre-recording conference talks and sent a tweet about the idea on Monday.
I hoped to generate discussion about recording talks to fill in gaps—positive and negative—about the concept, and I was not disappointed.
I’m particularly thankful to Lisa Janicke Hinchliffe and Andromeda Yelton along with Jason Griffey, Junior Tidal, and Edward Lim Junhao for generously sharing their thoughts.
Daniel S and Kate Deibel also commented on the Code4Lib Slack team.
I added to the previous article’s bullet points and am expanding on some of the issues here.
I’m inviting everyone mentioned to let me know if I’m mischaracterizing their thoughts, and I will correct this post if I hear from them.
(I haven’t found a good comments system to hook into this static site blog.)
Pre-recorded Talks Limit Presentation Format
Lisa Janicke Hinchliffe made this point early in the feedback:
@DataG For me downside is it forces every session into being a lecture. For two decades CfPs have emphasized how will this season be engaging/not just a talking head? I was required to turn workshops into talks this year. Even tho tech can do more. Not at all best pedagogy for learning
— Lisa Janicke Hinchliffe (@lisalibrarian)
April 5, 2021
Jason described the “flipped classroom” model that he had in mind as the NISOplus2021 program was being developed.
The flipped classroom model is one where students do the work of reading material and watching lectures, then come to the interactive time with the instructors ready with questions and comments about the material.
Rather than the instructor lecturing during class time, the class time becomes a discussion about the material.
For NISOplus, «the recording is the material the speaker and attendees are discussing» during the live Zoom meetings.
In the previous post, I described how it is beneficial for the speaker to respond in the text chat while the recording replays.
Lisa went on to say:
@DataG Q+A is useful but isn’t an interactive session. To me, interactive = participants are co-creating the session, not watching then commenting on it.
— Lisa Janicke Hinchliffe (@lisalibrarian)
April 5, 2021
She
described an example
: the SSP preconference she ran at CHS. I’m paraphrasing her tweets in this paragraph.
The preconference had a short keynote and an «Oprah-style» panel discussion (not pre-prepared talks).
This was done live; nothing was recorded.
After the panel, people worked in small groups using Zoom and a set of Google Slides to guide the group work.
The small groups reported their discussions back to all participants.
Andromeda
points out
(paraphrasing twitter-speak): «Presenters will need much more— and more specialized—skills to pull it off, and it takes a lot more work.»
And Lisa
adds
: «Just so there is no confusion … I don’t think being online makes it harder to do interactive. It’s the pre-recording. Interactive means participants co…

Thoughts on Growing Up


’Tis the season for graduations, and this year my nephew is graduating from high school.
My sister-in-law created a memory book—»a surprise Book of Advice as he moves to the next phase of his life.»
What an interesting opportunity to reflect!
This is what I came up with:
Sometime between when I became an adult and now, the word «adulting» was coined. My generation just called it «growing up.» The local top-40 radio station uses «hashtag-adulting» to mean all of those necessary life details that now become your own responsibility. («Hashtag» is something new, too, for what that’s worth.)
Growing up is more than life necessities, though. This is an exciting phase of life that you’ve built up to—many more doors of possibilities are opening and now you get to pick which ones to go through. Pick carefully. Each door you go through starts to close off others. Pick many. Use this life stage to try many things to find what is fun and what is meaningful (and aim for both fun and meaningful). You are on a solid foundation, and I’m eager to see what you discover «adulting» means to you.

Digital Repository Software: How Far Have We Come? How Far Do We Have to Go?


Bryan Brown’s tweet
led me to
Ruth Kitchin Tillman’s
Repository Ouroboros
post
about the treadmill of software development/deployment.
And wow do I have thoughts and feelings.
Ouroboros: an ancient symbol depicting a serpent or dragon eating its own tail. Or—in this context—constantly chasing what you can never have. Source:
Wikipedia
Let’s start with feelings.
I feel pain and misery in reading Ruth’s post.
As Bryan said in a subsequent tweet, I’ve been on both sides: a system maintainer watching much-needed features put off to major software updates (or rewrites) and the person participating in decisions to put off feature development in favor of major updates and rewrites.
It is a bit like a serpent chasing its tail (a reference to «Ouroboros» in Ruth’s post title)—as someone who just wants a workable, running system, it seems like a never-ending quest to get what my users need.
I think it will get better.
I offer as evidence the fact that almost all of us can assume network connectivity.
That certainly wasn’t always the case: routers used to break, file servers would crash under stress, network drivers would go out of date at inopportune times.
Now we take network connectivity for granted—almost (
almost!
) as if it were a utility as common as water and electricity.
We no longer have to chase our tail to assume those things.
When we make those assumptions, we push that technology down the stack and layer on new things.
Only after electricity is reliable can we layer on network connectivity.
With reliable network connectivity, we layer on—say—digital repositories.
Each layer goes through its own refinement process…getting better and better as it relies on the layers below it.
Are digital repositories as reliable as printed books?
No way!
Without electricity and network connectivity, we can’t have digital repositories but we can still use books.
Will there come a time when digital repositories are as reliable as electricity and network connectivity?
That sounds like a
Star Trek
world, but if history is our guide, I think the profession will get there.
(I’m not necessarily saying
I’ll
get there with it—such reliability is probably outside my professional lifetime.)
So, yeah, I feel pain and misery in Ruth’s post about the achingly out-of-reach nature of repository software that can be pushed down the stack…that can be assumed to exist with all of the capabilities that our users need.
That brings me around to one of Bryan’s tweets:
If the idea of a digital preservation platform is that it is purpose-built to preserve assets for a long period of time, then isn’t it an obvious design flaw to build it with an EOL in mind? If the system is no longer supported, then can it really be trusted for preservation?
— Bryan J. Brown (@bryjbrown)
June 22, 2021
Can digital repositories really be trusted in-and-of-themselves?
No.
(Not yet?)
That isn’t to say that steps aren’t bein…

DLTJ Now Uses Webmention and Bridgy to Aggregate Social Media Commentary


When I converted this blog from WordPress to a static site generated with
Jekyll
in 2018, I lost the ability for readers to make comments.
At the time, I thought that one day I would set up an installation of
Discourse
for comments like
Boing Boing did in 2013
.
But I never found the time to do that.
Alternatively, I could do what NPR has done—
abandon comments on its site in favor of encouraging people to use Twitter and Facebook
—but that means blog readers don’t see where the conversation is happening.
This article talks about
IndieWeb
—a blog-to-blog communication method—and the pieces needed to make it work on both a static website and for social-media-to-blog commentary.
The IndieWeb is a combination of
HTML markup
and an
HTTP protocol
for capturing discussions between blogs.
To participate in the IndieWeb ecosystem, a blog needs to support the "
h-card
" and "
h-entry
" microformats.
These microformats are ways to add HTML markup to a site to be read and recognized by machines.
If you follow the
instructions at IndieWebify.me
, the «Level 2» steps will check your site’s webpages for the appropriate markup.
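To make the microformat idea concrete, here is a small, self-contained Python sketch. The sample markup and the `MicroformatFinder` class are illustrative inventions, but `h-entry`, `p-name`, `dt-published`, `e-content`, and `u-url` are the real microformats2 class names that machines look for:

```python
from html.parser import HTMLParser

# Hypothetical snippet of blog-post markup using microformats2 classes:
# h-entry marks the post, p-name its title, dt-published its date,
# e-content its body, and u-url its permalink.
SAMPLE = """
<article class="h-entry">
  <h1 class="p-name">Example post title</h1>
  <time class="dt-published" datetime="2021-04-05">April 5, 2021</time>
  <div class="e-content"><p>Post body goes here.</p></div>
  <a class="u-url" href="https://example.org/post">permalink</a>
</article>
"""

class MicroformatFinder(HTMLParser):
    """Collect every microformats2-style class (h-*, p-*, e-*, dt-*, u-*)."""
    PREFIXES = ("h-", "p-", "e-", "dt-", "u-")

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                for cls in value.split():
                    if cls.startswith(self.PREFIXES):
                        self.found.add(cls)

finder = MicroformatFinder()
finder.feed(SAMPLE)
print(sorted(finder.found))
# → ['dt-published', 'e-content', 'h-entry', 'p-name', 'u-url']
```

This is essentially what a validator like IndieWebify.me is doing when it checks your pages: the content is ordinary HTML, and the semantics ride along in the class attributes.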
The Jekyll theme I use here,
minimal-mistakes
, didn’t include the microformat markup, so I
made a pull request
to add it.
With the markup in place, dltj.org uses the
Webmention protocol
to notify others when I link to their content and receive notifications from others.
If you’re setting this up for yourself, hopefully someone has already
gone through the effort
of adding the necessary Webmention communication bits to your blog software.
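Under the hood, the Webmention protocol itself is small: discover the target page's endpoint (advertised in an HTTP Link header or a `<link rel="webmention">` tag), then POST form-encoded `source` and `target` URLs to it. A stdlib-only Python sketch with placeholder URLs (the helper functions are my own illustration, not from any real implementation; a real one would also check the Link header and use a proper HTML parser):

```python
import re
import urllib.parse
import urllib.request

def discover_endpoint(html, base_url):
    """Naive endpoint discovery: find <link|a ... rel="webmention" href="...">.
    Regex-based for illustration only."""
    match = re.search(
        r'<(?:link|a)[^>]+rel="webmention"[^>]+href="([^"]*)"', html)
    if match is None:
        return None
    return urllib.parse.urljoin(base_url, match.group(1))

def build_webmention_request(endpoint, source, target):
    """Build the POST request the Webmention spec defines: form-encoded
    'source' (my page) and 'target' (the page I linked to)."""
    body = urllib.parse.urlencode({"source": source, "target": target})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# Example with placeholder URLs (nothing is actually sent here):
html = '<link rel="webmention" href="/webmention-endpoint">'
endpoint = discover_endpoint(html, "https://example.com/a-post")
req = build_webmention_request(
    endpoint,
    source="https://dltj.org/article/my-reply/",
    target="https://example.com/a-post",
)
print(endpoint)  # https://example.com/webmention-endpoint
```

Sending the notification would then just be `urllib.request.urlopen(req)`; the receiving end verifies that the source page really does link to the target before accepting the mention.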
Since
DLTJ
is a static website, I’m using the
Webmention.IO service
to send and receive Webmention information on behalf of dltj.org and a Jekyll plugin called
jekyll-webmention_io
to integrate Webmention data into my blog’s content.
The plugin gets that data from webmention.io, caches it locally, and builds into each article the list of webmentions and
pingbacks
(another kind of blog-to-blog communication protocol) received.
Webmention.IO and jekyll-webmention_io will capture some commentary.
To get comments from Twitter, Mastodon, Facebook, and elsewhere, I added the Bridgy service to the mix.
From their
About page
: «Bridgy periodically checks social networks for responses to your posts and links to your web site and sends them back to your site as webmentions.»
So all of that commentary gets fed back into the blog post as well.
I’ve just started using this Webmention/Bridgy setup, so I may have some pieces misconfigured.
I’ll be watching over the next several blog posts to make sure everything is working.
If you notice something that isn’t working, please reach out to me via one of the mechanisms listed in the sidebar of this site.

On the Code4Lib Journal’s Two Proposed Metrics article


Code4Lib Journal (C4LJ) editor here. Becky Yoose’s Twitter thread has stirred up a great deal of attention to an article published yesterday. This post has my own thoughts on the issue…
published on Twitter
to match Becky’s medium and here on my blog for posterity.
So yeah that Code4Lib Journal editorial and article privacy debacle.
I have a story to tell you all.
Grab a beverage of your choice and get comfortable. It’s going to be a long story.
🧵
— Becky Yoose (@yo_bj)
September 23, 2021
This first part is going to come across as defensive. «The Code4Lib Journal exists to foster community and share information among those interested in the intersection of libraries, technology, and the future.» (
mission
) Its editorial committee are volunteers. (I’m not paid by my employer to be on the editorial committee; the time I’m using during the middle of the work day to write these thoughts will have to be made up later. I don’t think any of the committee members have it in their job description to be on the committee.)
First: assume best intentions. The editorial committee (EC) selected an article for publication called
«On Two Proposed Metrics of Electronic Resource Use»
; it presented a unique approach to a hard problem: characterizing the value of subscribed resources. The EC is aware that measuring value in this way does involve recording and processing patron identifying information, and the EC discussed the privacy implications in the article. As @yo_bj pointed out in her thread, the EC sought out her expertise because of previous comments. The EC reflected on Becky’s feedback on the article—it is good feedback and I hope she does repurpose it for publication in a more public and tangible form—and discussed it with the article author. We also discussed our process of shepherding articles to publication. (If you haven’t published with the C4LJ before, it is helpful to know that the editors take a more collaborative approach to working with article authors. It is not blind peer review, nor is it co-authorship; it is somewhere in between. Good for first-article authors.)
Best intentions: The EC had an insightful potentially useful article…with ideas worthy of publication and debate. We know we asked for @yo_bj’s thoughts late in the editorial process. While the points she raised have merit, the concerns are not high enough to block publication. C4LJ does not have a point-counterpoint mode of publication. It may have been useful to invent one for this article, but we didn’t do that. It may have been useful for the EC to invite Becky to firm up her analysis and publish it along side the article; we didn’t do that. The EC did have a self-imposed deadline and nine other articles awaiting publication. We could have held publication of this article, but we elected not to do that. There may be ideas that others have—let’s hear ‘em. Instead, the coordinating editor wrote an editorial an…

What EDUCAUSE’s 2022 Top 10 IT Issues Mean for Libraries


Last month, EDUCAUSE published its
Top 10 IT Issues for 2022
with the subtitle «The Higher Education We Deserve».
To reach the top 10, EDUCAUSE members were asked to prioritize 17 issues identified by the EDUCAUSE IT Issues Panel members.
The members of the Issue Panel then broke up into groups to write essays on the 10 topics.
This report starts with a 1,500-word summary of the common themes in the pieces, followed by the essays themselves.
This publication style means there is significant overlap in the essays to wade through, but there are also some valuable thoughts and observations.
Here are my highlights.
In a number of places below, I will refer to sections of the EDUCAUSE article using Hypothes.is annotation links.
If you’d like to see more or carry on a conversation, see the
Hypothes.is-enabled version
of the page.
Side note before we start:
Psst. EDUCAUSE. Over here.
First, kudos for publishing this as an HTML page and not some excessively designed PDF file.
But why in the world did you publish what must be a 15,000 word HTML article with
no
table-of-contents anchors?
It sure would be nice to refer to specific essays and sub-parts within each essay.
The Big Picture
At the top of the article, the EDUCAUSE editors put a rosy hue on the opportunities for higher education coming out of the pandemic that can be enabled by educational technology.
The EDUCAUSE 2022 Top 10 IT Issues take an optimistic view of how technology can help make the higher education we deserve—through a shared transformational vision and strategy for the institution, a recognition of the need to place students’ success at the center, and a sustainable business model that has redefined «the campus.»
At least they are admitting upfront that it is an optimistic view.
If I were to write it, I’d say something like:
The EDUCAUSE 2022 Top 10 IT Issues describe a watershed moment in higher education at a time when there isn’t much water behind the dam. Faculty and staff are tired (several essays acknowledge this), and students are anxious. Calls for digital transformation mean that old ways of doing things must be replicated in two new ways: in-person/online hybrid and entirely online. And the transformation must be done at or below current budget levels. By the way: if we screw this up, our institution might die on the vine.
I’m not naturally a pessimistic person, but all this talk of Digital Transformation—that phrase is used so often in the article that the writers shorten it to a new buzzword: «Dx»—has me somewhat concerned.
There are some profound implications here, and I’m unsure where the capacity to carry out the vision described in these 10 issues will come from.
The 10 Issues
Cyber Everywhere! Are We Prepared?: Developing processes and controls, institutional infrastructure, and institutional workforce skills to protect and secure data and supply-chain integrity
Evolve or Become Extinct: Acceleratin…

Refactoring DLTJ, Winter 2021 Part 1: Picking up Obsidian


As 2021 comes to a close, I’ve been thinking about this blog and my own «personal knowledge management» tools.
It is time for some upgrades to both.
The next few posts will be about the changes I’m making over this winter break.
Right now I think the updating will look something like this:
Ramp up automation for adding reading sources to Obsidian (this post)
Refactor the process of building this static website on AWS
Recreate the ability for readers to get updates by email
Turn the old DLTJ «Thursday Threads» concept into a newsletter
I’ll go back and link the bullet points above when (if?) I create the corresponding blog posts.
I’ve been using
Obsidian
for about six months as a place to note and link ideas on stuff I’m reading and watching.
In case you haven’t run across it yet, Obsidian is a personal wiki of sorts.
It is software that sits atop a folder of Markdown files to provide indexing as well as inter-page linking and graph views of the folder’s contents.
Most people use it to build up their own personal knowledge management (PKM) database.
You can make notes for the sources you are reading, then build knowledge by linking sources together using keywords and adding commentary at the intersection of related ideas.
Before Obsidian, I was using the
Pinboard service
to store bookmarks of interesting sources and using the paid subscription search engine and my own memory to find stuff.
I’ve found that this setup works okay for retrieval—I can usually find things that I know I’ve read about before—but doesn’t do so well for making new connections or creating new knowledge.
The
Thursday Threads
series on this blog years ago was, in part, a way to find those connections and explore them a little bit in writing.
I’m expecting Obsidian to help improve this area.
The start of the knowledge curation process is creating pages in Obsidian for the important/useful things I’m reading—each of these is a «source».
I like the idea of having a bookmark service as the start of the queue of sources feeding into the PKM; It is a universal tool that is available from a wide variety of entry points.
In my desktop browser, I use the
Pinboard Bookmarklet
to add new sources.
On iOS, I use the
Pins app
on the share sheet to add things.
The Pins app works not only in Safari but also in other places like the New York Times and Twitter apps.
To get sources from Pinboard into my Obsidian PKM database, I wrote a
Python script
that uses the Pinboard API to copy bookmarks into an intermediate SQLite3 database, and then every morning creates a page in the Obsidian database for each new source.
Please note that this Python script is quite the mess; it started simple but has had functionality grafted into it a dozen times now, and it is in need of a serious rewrite.
For better or for worse, it is out there for others to inspect and get ideas from.
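As a simplified illustration of what that script does (the note template and helper names here are my invention; `href`, `description`, `extended`, `time`, and `tags` are the field names the Pinboard API actually returns for a bookmark):

```python
import json
import pathlib
import tempfile

def bookmark_to_note(bookmark: dict) -> str:
    """Render one Pinboard bookmark as an Obsidian-style Markdown source note.
    Tags become [[wiki-links]] so Obsidian can graph them."""
    tags = " ".join(f"[[{t}]]" for t in bookmark["tags"].split() if t)
    return (
        f"# {bookmark['description']}\n\n"
        f"Source: {bookmark['href']}\n"
        f"Saved: {bookmark['time']}\n"
        f"Tags: {tags}\n\n"
        f"{bookmark['extended']}\n"
    )

def write_notes(bookmarks: list, vault_dir: pathlib.Path) -> int:
    """Write one .md file per bookmark into the vault folder; return the count."""
    vault_dir.mkdir(parents=True, exist_ok=True)
    for bm in bookmarks:
        # Crude filename sanitizing; the real script needs more care here.
        name = "".join(c for c in bm["description"] if c.isalnum() or c in " -_")
        (vault_dir / f"{name.strip()}.md").write_text(bookmark_to_note(bm))
    return len(bookmarks)

# Shape of a bookmark as returned by the Pinboard API (values are made up):
sample = json.loads("""[{
  "href": "https://example.org/an-article",
  "description": "An interesting article",
  "extended": "Why I saved this.",
  "time": "2021-12-28T12:00:00Z",
  "tags": "pkm obsidian"
}]""")
count = write_notes(sample, pathlib.Path(tempfile.mkdtemp()))
```

Because Obsidian is just Markdown files in a folder, nothing more than file writes is needed; the intermediate SQLite database in my real script only tracks which bookmarks have already been turned into notes.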
For the sources I add to my PKM, I’m also concern…

Refactoring DLTJ, Winter 2021 Part 2: Adopt AWS Amplify


Look at that!
Progress is being made down the list of to-dos for this blog in order to start the new year on a fresh footing.
As you might recall from the last blog post, I set out to do some upgrades across the calendar year boundary:
Ramp up automation for adding reading sources to Obsidian
Refactor the process of building this static website on AWS (this post)
Recreate the ability for readers to get updates by email
Turn the old DLTJ “Thursday Threads” concept into a newsletter
DLTJ is a «static site» blog, meaning that the page you are reading right now is a straight-up HTML file.
This page is converted from the simple
Markdown format
to HTML by the
Jekyll
program.
The DLTJ blog used to be based on WordPress, which meant a server was always running to dynamically generate each webpage out of a database.
(If you go back in the DLTJ archives you’ll see notes on top of pages that were part of the automatic conversion from WordPress to Markdown.)
Keeping that WordPress server constantly running was quite costly for a small blog.
(Yes, it is possible to pay someone a small amount to host your WordPress blog for you, but I’m a do-it-yourself kind of person.)
So
at the end of 2017 I migrated the site
to
Markdown stored in a GitHub repository
with the Jekyll conversion and content delivery through Amazon Web Services (AWS).
Serving up static web pages from AWS S3/CloudFront is really simple.
Processing the Markdown on GitHub into HTML via Jekyll on AWS is more complicated, and that process was something that I wanted to happen automatically every time I published a change to GitHub.
I ended up hand-crafting about 650 lines of an AWS CloudFormation configuration file plus a few dozen lines of Python in some AWS Lambda functions.
It worked, but it was fragile and very hard to maintain.
That was in 2017 and technology marches on; now AWS has a service that does all of the automation for you.
Called
Amplify
, it bundles together a bunch of other AWS tools to help developers to create «full-stack web and mobile apps.»
The Amplify tools are really quite overkill for a static website, but
building a static website
is one of the hands-on «Getting Started» examples that AWS offers.
For a static website, Amplify handles:
creating an S3 bucket and CloudFront distribution to store and serve up the content
provisioning a webhook API that notifies AWS to start the content building process and adds that webhook to the GitHub repository
setting up the CodeBuild process for Jekyll to generate the static web pages
creating the HTTPS security certificate and adding the appropriate DNS entries to the domain
All of the stuff I was doing in that 650-line CloudFormation file.
(Plus Amplify has a lot more interesting features built into the service.)
AWS Amplify Console
One Problem: Getting the Correct Version of Ruby
Now for the two-hour detour.
At least one of the Jekyll Gems I’m using to build th…

Refactoring DLTJ, Winter 2021 Part 2.5: Fixing the Webmentions Cache


Okay, a half-step backward to fix something I broke yesterday.
As I
described earlier this year
, this static website blog uses the
Webmention protocol
to notify others when I link to their content and receive notifications from others.
Behind the scenes, I’m using the Jekyll plugin called
jekyll-webmention_io
to integrate Webmention data into my blog’s content.
Each time the contents of this site is built, that plug-in contacts the
Webmention.IO service
to receive its Webmention data.
(Webmention.IO holds onto it between Jekyll builds since there is no always-on «dltj.org» server to receive notifications from others.)
The plug-in caches that information to ease the burden on the Webmention.IO service.
The previous CloudFormation-based process was using AWS CodeBuild natively, and the Webmention cache was stored in
CodeBuild’s caching function
.
CodeBuild automatically downloads the previous cache into the working directory for each build iteration and then automatically uploads the cache as the build is completed.
Handy, right?
Well, AWS Amplify simplifies some of the setup of working with the underlying CodeBuild tool.
One of the configuration options that is no longer available is the ability to specify which S3 bucket to use as the CodeBuild cache; so I couldn’t point it at the previous cache files and all of the previous Webmention entries no longer appeared on the blog pages.
Fortunately, I hadn’t decommissioned the CloudFormation stuff, so I still had access to the old cache; I was able to extract the four webmention files (but see below for a discussion about that).
Since Amplify doesn’t allow me to have direct access to the CodeBuild cache, I decided it was high time to use a dedicated cache location for these webmention files.
To do that took three steps:
1. Create the S3 bucket (with no public access)
2. Add read/write policy for that bucket to the AWS role assigned to the Amplify app
3. Add lines to the
amplify.yml
file to copy files from the S3 bucket into and out of the working directory
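For step 1, the AWS CLI equivalent would look something like this (the region is an assumption; verify the flags against the AWS documentation):

```shell
# Step 1: create the cache bucket and switch off all public access.
aws s3 mb s3://org.dltj.webmentions-cache --region us-east-1

aws s3api put-public-access-block \
  --bucket org.dltj.webmentions-cache \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```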
For step 2, the IAM policy for the Amplify role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::org.dltj.webmentions-cache"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    }
  ]
}
For the
amplify.yml
file:

version: 1
frontend:
  phases:
    preBuild:
      commands:
        - aws s3 cp s3://org.dltj.webmentions-cache webmentions-cache --recursive
        - rvm use $VERSION_RUBY_2_6
        - bundle install --path vendor/bundle
    build:
      commands:
        - rvm use $VERSION_RUBY_2_6
        - bundle exec jekyll build --trace
    postBuild:
      commands:
        - aws s3 cp webmentions-cache …

Refactoring DLTJ, Winter 2021 Part 3: «Serverless» Newsletter System


So it has been quiet here for a couple of days.
Rest assured: the quietness comes from heads-down work, not from giving up.
Here are the refactor-DLTJ activities so far:
Ramp up automation for adding reading sources to Obsidian
Refactor the process of building this static website on AWS
Fix the webmentions cache, an unanticipated diversion
Recreate the ability for readers to get updates by email (this post)
Turn the old DLTJ “Thursday Threads” concept into a newsletter
Since New Years Day, I’ve been working on a way to send the contents of blog posts by email…commonly known nowadays as a newsletter.
Years ago, I was using the Feedburner service to do that.
Then Feedburner was bought by Google, and things were mostly okay for a while.
Which is to say that most everything was working, and the things that weren’t—
like HTTPS for custom RSS domain names
—had workarounds.
But last summer
Feedburner-Google discontinued the distribution of blog posts by email
, which meant I needed to buy or build my own email distribution system.
There are certainly «buy» options.
For instance, one might use
Medium
for writing and distribution.
But I’ve seen too many services come and go to rely on a business to be a good steward of my content.
The
Substack service
has the same problem.
For a while I considered the
follow.it service
as an alternative to Feedburner that included a newsletter-like add-on, but its «white label» service inserts the «follow.it» domain name in critical places where I would lose control over my list of subscribers.
(After all, I’m only able to do this cleanly because I kept control over my RSS feed by using «feeds.dltj.org» as a hostname.)
So I’m running it myself.
I briefly considered
listmonk
, but I don’t know the Go programming language, which makes troubleshooting and enhancing it more of a challenge.
Not readily spotting other alternatives, I created my own system using AWS tools, the
Serverless.com framework
, and the Python programming language.
Thanks to a
great outline by Marco Lancini
and
ideas from Victoria Drake
.
The
newsletter infrastructure software is on GitHub
.
It deserves a decent README file and some documentation to help others use it if they are so inclined.
There are also a number of hard-coded areas that would need to be made more general.
(See, for instance,
these couple of lines
that are used to pull out the body of the blog post for inclusion into the newsletter email.)
But Why
I’ve been asked,
why do you go through all of this work instead of just hosting your blog on WordPress.com
?
That is a reasonable question and it deserves a thoughtful response.
I like control of my content.
My writings have always been stored on devices that I have a moderate amount of control over—first WordPress on a personal server in a co-location space, then WordPress on an Amazon Web Services (AWS) server, then as static files cr…

Refactoring DLTJ, Winter 2021 Part 4: Thursday Threads Newsletter Launches


Success!
Four parts plus a half (or a "re-do" of part 2):
Ramp up automation for adding reading sources to Obsidian
Refactor the process of building this static website on AWS
Fix the webmentions cache, an unanticipated diversion
Recreate the ability for readers to get updates by email
Turn the old DLTJ “Thursday Threads” concept into a newsletter (this post)
Earlier today, the newsletter launched with
issue 79
.
It wasn’t without hiccups, but I don’t think any of the problems leaked out to the subscribers.
I started with a list of 286 email addresses that were subscribed to the 2015 edition.
This morning I sent an email to all of them on the blind-carbon-copy line from my regular email.
That way I could see which addresses bounced back as undeliverable (94 addresses) before loading the list into the newsletter database.
(Undeliverable email counts as a strike against you when using Amazon’s Simple Email Service, so I didn’t want to start with a bad reputation with them.)
One of the issues I ran into was with the multiprocessing code that I found on the web.
It didn’t work as claimed, and when I tried to adjust it, the loop to process email stalled, so I ripped out that code.
In the end, with about 200 email addresses, it took just a minute or two of single-threaded, sequential sending to get them all out.
Perhaps I won’t need that multi-threaded capability until
Thursday Threads
gets much bigger.
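The sequential loop that remains is nothing fancy. A sketch with a stand-in send function (the real system sends through Amazon SES, and bounce handling is elided here):

```python
import time

def send_newsletter(addresses, send_one, delay=0.1):
    """Send to each subscriber in turn, pausing briefly between sends to
    stay well under the SES send-rate limit. Returns (sent, failed) lists."""
    sent, failed = [], []
    for address in addresses:
        try:
            send_one(address)
            sent.append(address)
        except Exception:
            failed.append(address)
        time.sleep(delay)
    return sent, failed

# Usage with a stand-in send function instead of a real SES call:
def fake_send(address):
    if address.endswith("@invalid.example"):
        raise RuntimeError("undeliverable")

sent, failed = send_newsletter(
    ["a@example.com", "b@invalid.example", "c@example.com"],
    fake_send,
    delay=0,
)
print(len(sent), len(failed))  # 2 1
```

For a couple hundred addresses, this plain loop finishes in a minute or two, which is why the multi-threaded version wasn't worth debugging.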
How the Newsletter is Put Together
Like everything on this static site blog, an issue starts as a Markdown file.
Markdown is a light-weight markup language that translates very easily into HTML, and makes it easy for a writer to create valid HTML.
It is also possible to mix HTML inside a Markdown file and have the right thing happen.
The Jekyll processor (the program that turns a folder of Markdown files into a folder of HTML files) has a mechanism for including macros in the markup, and each «thread» in the issue is a macro file.
If you look at the
Markdown source for issue 79
, you’ll see each heading (marked with
##
) has a
{% include thursday-threads-quote.html %}
macro definition.
{% include thursday-threads-quote.html
    blockquote="The EDUCAUSE 2022 Top 10 IT Issues take an optimistic view of how technology can help make the higher education we deserve—through a shared transformational vision and strategy for the institution, a recognition of the need to place students’ success at the center, and a sustainable business model that has redefined ‘the campus.’"
    url="https://er.educause.edu/articles/2021/11/top-10-it-issues-2022-the-higher-education-we-deserve"
    versiondate="2021-11-12"
    versionurl="https://web.archive.org/20211127031010/https://er.educause.edu/articles/2021/11/top-10-it-issues-2022-the-higher-education-we-deserve"
    anchor="Top 10 IT Issues, 2022: The Higher Education We Deserve"
    post=", EDUCAUSE"
%}
Each of th…

Issue 79: Educational Technology Futures, Social Media Legislation, Apollo 11 Launch at 50


Welcome to the re-inaugural issue of
DLTJ Thursday Threads.
Counting backward, there were
78 previous issues
(all but the most recent still need to be converted from the old WordPress style of formatting) with—all told—several hundred references and commentary.
Here at the start of 2022, I’m making a resolution to restart
Thursday Threads
with links and thoughts about library technology, general technology trends, and internet culture.
What EDUCAUSE’s 2022 Top 10 IT Issues Mean for Libraries
Legislation in the Works for Social Media Regulation
Relive the 50th Anniversary of the Apollo 11 Launch…Projected onto the Washington Monument!
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
What EDUCAUSE’s 2022 Top 10 IT Issues Mean for Libraries
The EDUCAUSE 2022 Top 10 IT Issues take an optimistic view of how technology can help make the higher education we deserve—through a shared transformational vision and strategy for the institution, a recognition of the need to place students’ success at the center, and a sustainable business model that has redefined ‘the campus.’

Top 10 IT Issues, 2022: The Higher Education We Deserve
, EDUCAUSE
Let’s start with this report from EDUCAUSE from a panel of its members that reviewed survey results on what they see as the big educational technology issues for the year.
I cover this report in more depth in a
separate
DLTJ
article
, but I think it is useful to provide some of the headline commentaries here.
First, these IT leaders anticipate an acceleration of the role of technology in teaching and learning.
The pandemic has spawned a new recognition of how big the cohort of «non-traditional» students is—part-time learners, remote learners, asynchronous learners, etc.
Instructional technologists will certainly be called upon to support new tools and new roles; the academic librarian’s instructional experience and traditional «high-touch» approach to supporting users can be an asset for institutions that choose to tap that capability.
There is recognition that we are all tired and stretched as well as the reality that one-time emergency money is drying up.
Still, there is room for growth for academic libraries seeking to re-form their mission for a new era.
Legislation in the Works for Social Media Regulation
Washington is awash in proposals for reforming social media, but in a narrowly divided Congress, it’s little surprise that none have passed. Many Democrats believe that social media’s core problem is that dangerous far-right speech is being amplified. Many Republicans believe that the core problem is that the p…

Router Behind a Uverse/Pace 5268ac Gateway Loses its Mind Every 10 Minutes


Late last year, I had my AT&T Uverse residential gateway replaced.
For reasons that truly baffle me, AT&T has decided that
they are going to run unsupported equipment on their residential customer network
.
When the replacement was swapped in, my family noticed that video conference calls—Zoom and Facetime and Slack—would occasionally drop out for about 10 seconds before continuing.
After much frustration, I started timing the outages and found that they were happening at roughly 10-minute intervals (plus or minus just a few seconds).
Some internet searching led to a forum post (
page 1
,
page 2
) on AT&T’s customer site.
As it turns out, there is a conflict with the DHCP address assignment messages when the residential gateway is in DMZplus mode.
Forum user "weshunt" had the right solution:
I’m not a network configuration expert, but it bothered me that the Pace [residential gateway] and the USG both wanted to use 192.168.1.x for DHCP allocations. I noticed that even after putting the USG into the DMZPlus, I could connect a wireless device and it would get an address in the Pace’s default 192.168.1.x range, which conflicted with the IP range the USG was trying to manage. And of course the Pace answered to 192.168.1.254, which was also in the default allocation range of the USG.
So I changed the DHCP settings on the Pace to answer to a different subnet (192.168.100.1 with a DHCP allocation range inside 192.168.100.x as well). Like magic, the USG immediately picked up the DHCP assignment from the Pace and got the public IP exactly like I wanted. Now the networks don’t seem to want to fight each other. I can still access the Pace from the wired network via the new gateway IP (192.168.100.1), and also connect to the Pace wirelessly using the old SSID if I need to, though I’m shutting that down to alleviate unnecessary wireless congestion.
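The conflict described in that quote is easy to confirm with Python's `ipaddress` module. (The /24 masks below are my assumption, based on the ranges mentioned.)

```python
import ipaddress

usg_lan = ipaddress.ip_network("192.168.1.0/24")       # range the USG manages
pace_before = ipaddress.ip_network("192.168.1.0/24")   # Pace gateway's default range
pace_after = ipaddress.ip_network("192.168.100.0/24")  # Pace range after the fix

print(usg_lan.overlaps(pace_before))  # True: the two devices fight over the same addresses
print(usg_lan.overlaps(pace_after))   # False: moving the Pace ends the conflict
```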
Step by step, this is what you need to do.
Change the LAN DHCP Range
With a web browser, go to your residential gateway advanced device configuration page.
The link for this will be printed on the bottom of the gateway and is probably http://192.168.1.254.
You will also need the "Device Access Code" that is printed just below that web address.
I’m using a hardwired Ethernet connection between my desktop and the residential gateway, but this will probably work over wireless, too.
Click on
Settings
… then
LAN
… … then
DHCP
.
In the "DHCP Configuration" → "DHCP Network Range" section, select "Configure manually" and enter these values:
Router Address: 192.168.100.1
Subnet Mask: 255.255.255.0
First DHCP Address: 192.168.100.100
Last DHCP Address: 192.168.100.200
DHCP Lease Time: 24
At the bottom, click "Save". You’ll need your Device Access Code at this point to save your changes.
Pace Residential Gateway DHCP Configuration Page
The IP address ranges on the LAN side of the residential gateway have now changed, so t…

Issue 80: Cryptocurrency’s Wasteful Energy Consumption and an Ode to Interlibrary Loan


Welcome to issue 80 of Thursday Threads.
I’m so happy many of you chose to stick around and greetings to all of the new subscribers.
To those that received my email last Thursday giving you a heads-up that a new issue would be coming to your inbox but then didn’t receive it: check your spam folder.
Over the course of the week, I’ve learned a great deal more about the spam-prevention mechanisms that are keeping our inboxes as clean as they are.
I highly recommend the
interactive ‘Learn and Test DMARC’
site sponsored by URIPorts.
It was useful to see several standards come together to ensure email senders are who they say they are.
(If you find this issue in your spam folder, please reply so I can track down more of the causes.)
Two threads this week:
Cryptocurrency’s Energy Consumption
Ode to Interlibrary Loan
On a professional note, my employer is looking for a
FOLIO Services Analyst
to join our growing effort bringing the FOLIO open source platform to libraries around the world.
If getting in on the ground floor of a revolution in library technology sounds appealing, check out the job description at the link above.
Cryptocurrency’s Energy Consumption
Kosovo’s government on Tuesday introduced a ban on cryptocurrency mining in an attempt to curb electricity consumption as the country faces the worst energy crisis in a decade due to production outages.

Kosovo bans cryptocurrency mining to save electricity
Reuters, 5-Jan-2022
An army of cryptocurrency miners heading to the state for its cheap power and laissez-faire regulation is forecast to send demand soaring by as much as 5,000 megawatts over the next two years. The crypto migration to Texas has been building for months, but the sheer volume of power those miners will need — two times more than the capital city of almost 1 million people consumed in all of 2020 — is only now becoming clear.

Texas Plans to Become the U.S. Bitcoin Capital. Can Its Grid, Ercot, Handle It?
Bloomberg, 19-Nov-2021
Tape Pile
, by SidewaysSarah, CC-By
One thread that I already anticipate will be covered on many Thursdays is the growing cryptocurrency problem.
In this edition: how cryptocurrencies are a waste of resources.
A brief introduction, in case you haven’t encountered this technology yet, goes like this: cryptocurrencies are tokens of value that are exchanged on a "blockchain".
A blockchain, in turn, is like a strip of calculator tape…once something is printed on it, it doesn’t come off and it is there for everyone to see.
Cryptocurrencies need "miners" to do th…
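To make that mining concrete: a miner repeatedly hashes the block's contents with a changing nonce until the hash clears a difficulty threshold, and the electricity spent on all those failed guesses is the whole point. A toy proof-of-work sketch (illustrative only; no real coin uses parameters this small):

```python
import hashlib

def mine(block_data: str, difficulty: int = 4) -> int:
    """Brute-force a nonce whose SHA-256 hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1

nonce = mine("example block")
digest = hashlib.sha256(f"example block{nonce}".encode()).hexdigest()
```

Raising the difficulty by one hex digit multiplies the expected number of hash attempts by sixteen, which is why real networks burn so much power.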

Issue 81: Controlled Digital Interlibrary Lending, Gamers Revolt Against NFTs, and Cats


Alan the cat
Wednesday night with a cat on the lap, composing the next day’s
Thursday Threads
.
How could life get any better?
Hey…I’m not above using cat pictures to satisfy readers.
In fact, I’m going to do it one more time before this newsletter is finished.
(Oh, and if you are not seeing the pictures in your email, go ahead and click on the "load remote images" button—these are shared from my own site and there are no trackers in use.)
Thanks for the feedback on
Thursday Threads
—it has been very helpful.
With this issue, I think you’ll notice the email has a better visual look.
The
website of back issues
has some improvements as well, and I’m starting to get into the swing of converting old posts to the new format.
The threads this week:
Controlled Digital Lending Gets a Funding Boost
Gamers Pushing Back Against Non-Fungible Tokens
Cat Dish
Controlled Digital Lending Gets a Funding Boost
The Davis Educational Foundation has awarded the Boston Library Consortium a two-year $215,000 grant to accelerate the implementation of controlled digital lending as a mechanism for interlibrary loan. The grant supports plans described in BLC’s ‘Consortial CDL: Implementing Controlled Digital Lending as a Mechanism for Interlibrary Loan’ report published in September 2021.

Davis Educational Foundation award accelerates Boston Library Consortium’s controlled digital lending implementation
, Boston Library Consortium, 13-Jan-2022
The National Information Standards Organization (NISO) today announced that it has received a grant of $125,000 from The Andrew W. Mellon Foundation to support the development of a consensus framework for implementing controlled digital lending (CDL) of book content by libraries, which has been approved by NISO members as a new initiative.

NISO Awarded Mellon Funding for Controlled Digital Lending Project
NISO Press Release, 20-Sep-2021
From my perspective, controlled digital lending for interlibrary loan (or "CDILL") is gaining steam.
(I’m trying to make "CDILL" stick as a way of differentiating this type of controlled digital lending from the kind where a library uses CDL techniques to offer its own materials to its own patrons.)
These two funding announcements show support for the development of systems and practices for libraries to advance the cooperation beyond the point of shipping physical books back and forth.
(Although shipping physical books back and forth is still a noble effort by libraries, as last week’s
ode to interlibrary loan
demonstrated.)
The key to mak…

A Better Structlog Processor for Python for CloudWatch Logs Using AWS Lambda


I was introduced to structured logs at work, and this ol’ hacker thinks that is a darn good idea.
For a new program I’m writing, I wanted to put that into use.
The program uses AWS Lambdas, and the log entries for the Lambdas end up in CloudWatch Logs.
Unfortunately, in its default configuration, the output is less than useful:
Default configuration structured logs
AWS has configured the default Python logger in the Lambdas to automatically put the timestamp and the HTTP API request ID from the context in the display when the log line is collapsed.
When you expand the log line, you can see the additional detail in structured JSON.
That timestamp is duplicated in the column to the left, and the UUID is really not useful in this context.
What I’d rather see is the
event
that caused the line to be logged and any corresponding
error message
.
Enhanced configuration structured logs
It took some trial and error to make this happen.
This post describes that process in case I or anyone else needs this in the future.
The Usefulness of Structured Logs
I believe the widespread use of format strings in logging is based on two presumptions:
The first level consumer of a log message is a human.
The programmer knows what information is needed to debug an issue.
I believe these presumptions are
no longer correct
in server side software.

Paul Querna
This quote is from a 2011 blog post.
It’s only now that I’m getting involved with troubleshooting distributed systems running on AWS that I appreciate the value of Paul’s insight.
The ability to
search
the contents of log files combined with the ability to correlate log messages from disparate programs is a real game-changer.
(This coming from a programmer who still feels most comfortable trolling through
/var/log
with liberal
grep
and
awk
commands.)
I’ve seen the light.
And so with this new effort, I’m using the Python
Structlog package
to simplify the building of the structured logs.
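The core idea can be shown with nothing but the standard library (structlog layers processors, context binding, and nicer ergonomics on top of this): emit one JSON object per log line so downstream tools can search and correlate on fields instead of grepping format strings. The `log_event` helper and field names below are my own illustration, not structlog's API.

```python
import io
import json
import logging

buf = io.StringIO()  # stands in for stdout / the CloudWatch stream
logging.basicConfig(format="%(message)s", stream=buf, level=logging.INFO, force=True)

def log_event(event: str, **fields) -> None:
    """Emit one searchable JSON object per log line."""
    logging.getLogger("app").info(json.dumps({"event": event, **fields}))

log_event("order_failed", order_id=42, error="timeout")
record = json.loads(buf.getvalue())  # → {'event': 'order_failed', 'order_id': 42, 'error': 'timeout'}
```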
The problem is that AWS is too smart for its own good.
When you use the AWS-supplied Python installation, it:
Sets the log level to WARN, and
Sets the format string to include the timestamp and UUID of the Lambda call in front of anything you want to log.
Both of those are really annoying.
The way to get around the first is somewhat cumbersome, as
this answer on Stack Overflow describes
.
The nicest solution—if you are using Python 3.8 or higher—is to use the
force=True
on the
logging.basicConfig
call:
logging.basicConfig(
    format="%(message)s",
    stream=sys.stdout,
    level=logging.DEBUG,
    force=True,
)
The second line of this code snippet is the start of the solution to address the second problem described above—it clears out the AWS-supplied formatting string.
In its place, we will put our own formatted string.
Tricking CloudWatch to Display Useful Content
I couldn’t find this documented anywhere, but th…


Issue 82: Personal Digital Library, Video Preservation, Selling Prayers, and Library Ebook Legislation

The People Have Spoken
On a whim, last Thursday I put out a poll with the announcement of
last week’s issue
.
Out of the three threads,
controlled digital lending
,
gamers and NFTs
, and
cats
, the winner was cats.
The sample size was small—five votes—so I’m not ready to throw out the digital quill pen yet.
But if readers want cats, readers shall have cats.
I have plenty of cat pictures.
And keep the feedback coming.
The threads this week:
Attorney General of India’s Online Collection of Rare Books
"Inside WWE’s massive video vault"
Prayers For Sale
Ebooks Wanted For Sale (for reasonable terms)
Attorney General of India’s Online Collection of Rare Books
Attorney General K.K. Venugopal has granted public access to a wide collection of rare books in his library, through a website. It lists over 570 books, some of which date back to the 17th century. The ‘antiquarian’ or rare book collection has been digitally scanned and made available for the public. The publications cover a wide range of subjects, from religion, mythology and the Vedas, to Indian art and sculpture, historical battles, the British Empire in India and tales of travels across the world.
The website, however, clarifies that these books are not copyrighted in India, either because the copyright has expired or because the books are not covered under the Indian copyright laws. It adds that while readers located anywhere in India can download them, those located outside India should check their country’s laws before downloading content from the website. The website also makes it clear that the books uploaded are for ‘personal or research use only, and not for commercial use or exploitation.’

Attorney General KK Venugopal converts his rare book collection into public online library
The Print (India), 25-Mar-2020
I learned about this article and corresponding website during the
Controlled Digital Lending Implementers (CDLI)
monthly forum.
Aishwarya Chaturvedi, LL.M. candidate from the Cornell Law School, spoke at the forum about copyright law in India relative to efforts to start a controlled digital lending practice, and she included mention of
Mr. Venugopal’s library website
.
It is a WordPress site with the books embedded in a PDF reader, and some of the books are relatively recent—1980s and one from 1994.
Ms Chaturvedi has a preprint in SSRN:
Digital Libraries, Copyright and the COVID-19 Pandemic: A Comparative Study of India and The United States
.
It is a cross-cultural exploration of the legal mechanics of d…

Issue 83: Author’s CDL Thoughts, WWE’s Monopsony, Child’s Library Book


Greetings from the wintery mix that is central Ohio.
The local school district called off school yesterday afternoon in preparation for what came today.
Also yesterday: Ohio’s own "Buckeye Chuck"
predicted an early spring
.
Let’s be grateful for snow days (and teenagers who shovel snow) and for predictions of early spring.
In the meantime, the threads this week:
Author Speaks Up for Controlled Digital Lending
The Wrestling Monopsony
Self-publishing the Local Way
Author Speaks Up for Controlled Digital Lending
The controversial tweet.
The Big Five publishing houses’ share of the approximately $25 billion book publishing market is estimated at 80%. And it’s Big Publishing that is indeed throwing its weight around by suing the Internet Archive (the “org making reproductions” referenced here, which is actually a California state library and leading institution for digital preservation, not some random “org”).
Again, [Controlled Digital Lending] provides the legal framework for any library to make
one copy
of
one paper book
that it owns and loan it to
one patron
at a time.

What Kind of Writer Accuses Libraries of Stealing?
Maria Bustillos on Popula, 22-Jan-2022
Maria Bustillos wrote approvingly of controlled digital lending (CDL) in a quoted tweet of the Internet Archive. In response, she received a flurry of negative responses that seem to misunderstand a fundamental tenet of CDL: the own-to-loan ratio. If a library owns a copy of a book and takes the steps to physically sequester it, the library can loan a digital copy to patrons. I’ve read a lot on the library’s perspective of CDL, and it was useful to hear how an
author’s
perspective aligns with the goals of the library.
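The own-to-loan ratio is simple enough to express as code. This toy model is my own illustration, not anyone's implementation, but it captures the rule the flurry of responses missed:

```python
class CDLTitle:
    """Toy model of controlled digital lending's own-to-loan ratio."""

    def __init__(self, owned_copies: int):
        self.owned = owned_copies  # physical copies the library owns and sequesters
        self.loaned = 0            # digital loans currently outstanding

    def checkout(self) -> bool:
        if self.loaned < self.owned:
            self.loaned += 1
            return True
        return False  # every owned copy is already on loan

    def checkin(self) -> None:
        self.loaned = max(0, self.loaned - 1)

book = CDLTitle(owned_copies=1)
first = book.checkout()   # True: one copy owned, so one loan is allowed
second = book.checkout()  # False: a second simultaneous loan is refused
```

At no point can digital circulation exceed the number of physical copies the library has taken out of circulation.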
The Wrestling Monopsony
In the 70s, there were 32 wrestling promoters in the North American market, all competing for audiences and performers, all bidding to sew up TV rights with different broadcasters. Wrestlers like Andre the Giant were able to improve their working conditions by playing off rival leagues against one another.
In a single lifetime, the market has collapsed, with 85% market-share going to WWE and McMahon, the billionaire major Trump donor whose loyalty was rewarded when his wife Linda, a WWE executive, was given a plum job as head of Trump’s Small Business Administration.

Grappling with Big Wrestling: Vince McMahon has a monopoly on violence
, Cory Doctorow’s Pluralistic, 31-Jan-2022
Pulling through a thread from last week about
World Wrestling Entertainment’s video archive
…I included a quote from WWE’s Director of M…

Starting a Python-oriented Serverless-dot-com Project


In the past few months, I’ve created about a half-dozen projects using «serverless» infrastructure on Amazon Web Services (AWS).
(And I’m about to start another one.)
Over the course of these projects, I’ve refined my development environment into something that I think is useful to share, so read on for how to make Python, Node, and Serverless.com work together and work independently from your other projects.
About "Serverless"
"Serverless" is both a term for a kind of computing environment and the name of a framework that helps manage such environments.
As a computing environment, "serverless" abstracts away the need to manage the servers and underlying operating systems from the task of writing and running code.
If you assume that a fully-patched server at the required capacity is ready and waiting to run your code, then a serverless environment allows the developer to focus on just running the code.
Someone else will deal with the other parts.
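In practice that means writing just a function. A minimal AWS Lambda handler in Python looks something like this (the event shape is my assumption; a real handler receives whatever its trigger sends):

```python
def handler(event: dict, context) -> dict:
    """Entry point the platform invokes; no server of ours runs in between calls."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"hello, {name}"}

# Locally you can exercise it like any other function:
response = handler({"name": "DLTJ"}, None)  # → {'statusCode': 200, 'body': 'hello, DLTJ'}
```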
AWS’
Lambda
is probably the best known, but other major cloud computing environments (
Microsoft Azure
,
Google Cloud Services
,
Cloudflare Workers
) and datacenter tools (
Apache OpenWhisk
,
Kubeless
) have the same thing.
"
serverless.com
" is also the name of a specific framework that helps developers manage serverless environments.
It takes care of the tasks of bundling up code, setting up the appropriate triggers (web APIs, message queues, etc.), managing versions, and similar tasks.
To make matters even more confusing, "Serverless.com" is also a service for managing workloads in serverless environments…so hopefully you can see that talking about "serverless" quickly gets one to "what ‘serverless’ are you talking about?"
As far as understanding serverless-the-framework, I recommend skipping the homepage and going right to the
framework documentation
.
Building Up the Environment
There is one globally-installed prerequisite that I use:
pipenv
.
Pipenv creates isolated Python environments…the python executable and installed modules for the project are separated from those of the underlying operating system.
There are many isolated Python environment tools—
pipenv
,
virtualenv
,
poetry
—but I’ve used pipenv for a long time and it has the advantage of working with Eugene Kalinin’s
nodeenv
project: a Node isolation tool that integrates with pipenv.
In other words, in one directory I’m getting both Python isolation and Node isolation.
The numbered steps below are the sequence of commands to set this up. If you want to see what an empty shell looks like—along with some strong opinions about how I like to set up Serverless for myself—check out this GitHub repository:
dltj/serverless-template
.
mkdir serverless_project && cd serverless_project
— create an empty directory and change into it
PIPENV_VENV_IN_PROJECT=1 pipenv install
— create an isolated installation of Python in this environment
[note 1]
pipenv install --dev nodee…

Issue 84: Chips Go Bad, Learn From Our Cyber Mistakes, Automation at the USPS


The invoice is in.
This reengineered blog and the reinvigorated
Thursday Threads
newsletter cost just US$2.51 last month.
All of that cost is in the blog construction and delivery.
The cost of delivering the newsletter alone falls well below
AWS’ always-free tiers of service
.
Not bad!
And as always, no internet trackers or surveillance capitalism is involved.
The threads this week:
When Bugs Come from the Chips, not the Code
Learning From Our Cyber Mistakes
Automation at the United States Postal Service
When Bugs Come from the Chips, not the Code
Imagine for a moment that the millions of computer chips inside the servers that power the largest data centers in the world had rare, almost undetectable flaws. And the only way to find the flaws was to throw those chips at giant computing problems that would have been unthinkable just a decade ago.
As the tiny switches in computer chips have shrunk to the width of a few atoms, the reliability of chips has become another worry for the people who run the biggest networks in the world. Companies like Amazon, Facebook, Twitter and many other sites have experienced surprising outages over the last year.

Tiny Chips, Big Headaches: As the largest computer networks continue to grow, some engineers fear that their smallest components could prove to be an Achilles’ heel
, New York Times, 7-Feb-2022
We have all experienced unexplained computer errors.
The software programmers among us cringe and think about what they possibly did wrong.
Did I use the wrong variable in that loop, did I miss a
hyphen
?
What if the programmer did everything correctly and the computer just "glitched"?
Modern computers have many layers of redundancy built into them—error-correcting memory, multi-drive storage volumes, checksums on blocks of data, and so forth.
This article from the
New York Times
points to a new cause…the physics of electrons moving over very small spaces.
As the hardware architects press for smaller, faster, more electrically efficient chips, they will more often face this challenge and need to account for it in their designs.
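Those redundancy layers all reduce to the same move: store a checksum alongside the data and compare later. A quick sketch of how a single flipped bit gets caught, using CRC-32 as a stand-in for the fancier error-correcting codes real hardware uses:

```python
import zlib

data = b"important payload"
stored_crc = zlib.crc32(data)  # computed when the data was written

# Later, a marginal transistor (or a cosmic ray) flips one bit
corrupted = bytes([data[0] ^ 0x01]) + data[1:]

detected = zlib.crc32(corrupted) != stored_crc  # True: the mismatch reveals the flip
```

The hard cases the article describes are the flips that happen in the compute path itself, before any checksum is taken, which is why they are so difficult to catch.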
Learning From Our Cyber Mistakes
The new Cyber Safety Review Board is tasked with examining significant cybersecurity events that affect government, business and critical infrastructure. It will publish reports on security findings and recommendations, officials said…
The board, officials have said, is modeled loosely on the National Transportation Safety Board, which investigates and issues public reports on airplane crashes, train derailments…


You’re getting "Invalid request provided: AWS::CloudFront::PublicKey" because CloudFront Public Keys are immutable

This is the web page I wish I had found when I spent the afternoon sorting through why AWS CloudFormation kept telling me:
Resource handler returned message: "Invalid request provided: AWS::CloudFront::PublicKey"
Like me, you might be working on a Serverless.com stack and are trying to restrict access to items in an S3 bucket through CloudFront.
You might even be putting the public key text block into a YAML multiline string in an external configuration file and pulling that into your
serverless.yml
file.
And you are pulling your hair out because when you run updates on your stack, you get this error.
So in frustration, you blow away the stack and recreate it.
It works fine at first, but soon you are back at that same error above.
Do you want to know why?
An
AWS::CloudFront::PublicKey
resource is immutable, you idiot.
(Me idiot, actually. Hopefully you are fortunate in finding this page early in your quest to solve the problem.)
The clue came from
this issue report in the CloudFormation coverage roadmap page
:
As mentioned in the API documentation:
UpdatePublicKey
UpdatePublicKey action lets you update just the
Comment
field. The values
EncodedKey
and
Name
are immutable, and cannot be updated once created. To update the Key or the Name, a new PublicKey must be created using CreatePublicKey and use it.
The resources section of my
serverless.yml
file looks like this:
WebsiteDistributionPublicKey:
  Type: AWS::CloudFront::PublicKey
  Properties:
    PublicKeyConfig:
      Name: ${self:custom.stack_name}
      CallerReference: ${self:custom.config.PUBLIC_KEY_CALLER_REFERENCE}
      EncodedKey: ${self:custom.config.PUBLIC_KEY_ENCODED}
I’m using
Rich Buggy’s ‘Keeping secrets out of Git’ technique
to store secrets outside of the
serverless.yml
file, so I have a custom section that looks like this:
custom:
  default_stage: dev
  stage: ${opt:stage, self:custom.default_stage}
  stack_name: ${self:service}-${self:custom.stage}
  config: ${file(config.yml):${self:custom.stage}}
… which reads in this file:
default: &default

dev:
  << : *default
  PUBLIC_KEY_CALLER_REFERENCE: SomeRandomString
  PUBLIC_KEY_ENCODED: |
    -----BEGIN PUBLIC KEY-----
    MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAwU37058NQTUqEHBor95x
    VZ1iezIzZB7MWoYHt4KCRDVw5G3h/pzDKLu2NKo+rVOBztgQ+cefdqBNWa2Mf4Tl
    YQxOP9m978C2f4H9tc8c2px9Lxdkh27Vd8xZx/JHPvnqTUYP/p6WNa+jLVm6TV7a
    mL5QqrURd9OpOoyrfKmzhkJwrBxhT8WlchKmnd3S+dotAFdOgb8aABtdIEoCvKYq
    +MeAeBrsE1UhennDU/yWfNl2deGUCUnhkWPHDmLgObr/iYGZamdnp6InjUX2PLsC
    leQuc1M13904QKX+0wfUNin6IK9Pn+UmLupQSg0ou533Nxkw69KLZRAvoOHJlZJW
    BwIDAQAB
    -----END PUBLIC KEY-----

… and populates the variables you saw in the fragment at the top.
(If you’ve read this far and are interested in how I set up serverless.com projects, check out the blog post I wrote earlier this week on the topic.)
The practical upshot is if any…
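Since the resource cannot be updated in place, one workaround (my own sketch, not this post's code) is to give the resource a fresh logical ID, and a fresh Name and CallerReference, whenever the key material changes, so CloudFormation creates a new public key rather than attempting a forbidden update:

```yaml
# Hypothetical: bump the logical ID (and Name/CallerReference) when
# EncodedKey changes, so CloudFormation creates a replacement resource
# instead of issuing an UpdatePublicKey call that can never succeed.
WebsiteDistributionPublicKeyV2:
  Type: AWS::CloudFront::PublicKey
  Properties:
    PublicKeyConfig:
      Name: ${self:custom.stack_name}-v2
      CallerReference: ${self:custom.config.PUBLIC_KEY_CALLER_REFERENCE}-v2
      EncodedKey: ${self:custom.config.PUBLIC_KEY_ENCODED}
```

Any CloudFront resources referencing the old key would then be pointed at the new logical ID before the old one is removed.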

Issue 85: Privacy-busting Journal Article Fingerprints, Fraud in NFTs, Improve Your Life


The middle of February already.
Time is flying; I hope you are having fun.
The threads this week:
Privacy-busting Fingerprints in Journal Articles
Fraud in NFTs
Improve Your Life
Privacy-busting Fingerprints in Journal Articles
One of the world’s largest publishers of academic papers said it adds a unique fingerprint to every PDF users download in an attempt to prevent ransomware, not to prevent piracy.
Elsevier defended the practice after an independent researcher discovered the existence of the unique fingerprints and shared their findings on Twitter last week.
“The identifier in the PDF helps to prevent cybersecurity risks to our systems and to those of our customers—there is no metadata, PII [Personal Identifying Information] or personal data captured by these,” an Elsevier spokesperson said in an email to Motherboard. “Fingerprinting in PDFs allows us to identify potential sources of threats so we can inform our customers for them to act upon. This approach is commonly used across the academic publishing industry.”
When asked what risks he was referring to, the spokesperson sent a list of links to news articles about ransomware.

Academic Journal Claims it Fingerprints PDFs for ‘Ransomware,’ Not Surveillance
, Motherboard from Vice, 31-Jan-2022
Pretty incredible…adding unique identifiers to the metadata of each PDF downloaded from Elsevier (the "fingerprint") somehow protects against ransomware.
Extraordinary claims require extraordinary proof, and it is not forthcoming from Elsevier.
I’ve seen no follow-ups from Elsevier on this Motherboard article, nor from the researcher that
discovered the fingerprinting
.
Look, if you’re employing a technique to go after researchers sharing PDFs of articles, own up to it.
I can see why you don’t want to, Elsevier…shared articles might cut into that $40-per-article charge you put on non-subscribers.
Either way…owning it or lying about it looks bad.
I can think of no plausible scenario where fingerprints in PDF files detect, prevent, or help prosecute ransomware.
Fraud in NFTs
[Cameron] Hejazi highlighted three main problems: people selling unauthorised copies of other NFTs [Non-Fungible Tokens], people making NFTs of content which does not belong to them, and people selling sets of NFTs which resemble a security.
He said these issues were "rampant", with users "minting and minting and minting counterfeit digital assets".
"It kept happening. We would ban offending accounts but it was like we’re playing a game of whack-a-mole… Every time we …


Issue 86: Tracking Media Provenance, Digital Classroom Surveillance, Don’t Pixelate to Redact, Android In-App Advertising

I’ve deleted what I originally had here as newsletter-opening-banter. These are serious times. I think the world has radically changed overnight, and roughly 7.9 billion of us are not in positions to do anything about it. To those that are in positions to do something about it and to those that are caught up in the effects of one man’s decision to impose
his
will on others: may you be safe, may you succeed, and may you find peace. For those coming to this after early 2022, yesterday Russia invaded the sovereign country of Ukraine.
The threads this week:
Specification for Media Content Provenance
Encroaching on Digital Privacy in the Classroom
Pixelation for Redaction → bad
Google Changes Up In-App Advertising
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ’s Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Specification for Media Content Provenance
Today, the Coalition for Content Provenance and Authenticity (C2PA), an organization established to provide publishers, creators, and consumers with opt-in, flexible ways to understand the authenticity and provenance across various media types, released version 1.0 of its technical specification for digital provenance. This specification is the first of its kind and empowers content creators and editors worldwide to create tamper-evident media, by enabling them to selectively disclose information about who created or changed digital content and how it was altered. The C2PA’s work is the result of industry-wide collaborations focused on digital media transparency that will accelerate progress toward global adoption of content provenance.

C2PA Releases Specification of World’s First Industry Standard for Content Provenance
, Coalition for Content Provenance and Authenticity, 26-Jan-2022
Elements of the C2PA specification.
[Source]
This is a fascinating development.
Although the target audience for this technology is news organizations and citizen journalists to provide a way to establish the creator and editors of media, one could easily envision using this standard to mark images, video, and audio from digital archives.
As a way of combatting problems like manipulated media and "deep fakes", the specification would allow news organizations to cryptographically "sign" the media in a way that a display tool—via a media tool on your device or a browser plugin—would be able to decode and display to the viewer.
If the cryptographic signature doesn’t match the one published by the news organization, you would know that the media has been changed.
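The verify-against-the-published-signature loop described above can be sketched with a toy example. To be clear, this is not the C2PA mechanism itself (the specification uses X.509 certificates and embedded manifests); it is a minimal illustration of tamper evidence, with an HMAC secret standing in for a publisher's signing key:

```python
import hashlib
import hmac

# Hypothetical stand-in for a publisher's private signing key.
secret = b"publisher-signing-key"

def sign(media: bytes) -> str:
    # "Sign" the media bytes; any change to the bytes changes the signature.
    return hmac.new(secret, media, hashlib.sha256).hexdigest()

original = b"...image bytes..."
published_signature = sign(original)

# An unmodified copy verifies; an edited copy does not.
tampered = original + b" (edited)"
print(hmac.compare_digest(sign(original), published_signature))  # True
print(hmac.compare_digest(sign(tampered), published_signature))  # False
```

A display tool doing this check can tell you *that* the media changed, though not what was changed—which is exactly the tamper-evident property the specification aims for.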
Or, from the perspective of an…

Five Years and Ten Months


I reached a new milestone this month.
A minor one in the grand scheme of things, but one worthy of a few remarks nonetheless.
This month marks my longest tenure with an employer at five years and 10 months.
I’ve now worked at Index Data longer than I had at OhioLINK a decade ago, and this has hardly felt like almost six years.
Seven employers in 31 years.
All of it in library technology—one of the last places I thought I would land with an undergraduate degree in Systems Analysis.
Seven doesn’t seem like a lot, but at almost all of them I thought "I could see myself completing my career here."
(This definitely feels true for my current employer.)
But life intervened and a change was made…always for the better.
I’m so grateful for all of the people that mentored me and the couple that kicked my butt along the way (in retrospect, at least).
Hopefully I’ve managed to give back in equal measure to those coming into the field.
Career history (as of today)
Open Source Community Advocate
Index Data · Jun 2016 – Present · 5 yrs 10 mos
Dev/Ops Lead and Project Manager
The Cherry Hill Company · Aug 2015 – Dec 2016 · 1 yr 5 mos
Assistant Director, Technology Services Development
Lyrasis · Sep 2010 – Jun 2015 · 4 yrs 10 mos
Assistant Director, Multimedia Services; Assistant Director, New Service Development
OhioLINK · Jan 2005 – Sep 2010 · 5 yrs 9 mos
Computer Services Librarian (Law School); Area Head for Library Information Technology Services; Assistant to the Director for Technology Initiatives
University of Connecticut · Feb 2000 – Dec 2004 · 4 yrs 11 mos
Library Systems Manager
Case Western Reserve University · Jul 1995 – Feb 2000 · 4 yrs 8 mos
Library Systems Manager
Miami University · Jun 1991 – Jun 1995 · 4 yrs 1 mo

Issue 87: Ukraine War, Artificial Intelligence Art


We are one week into Russia’s war against Ukraine.
From here in America, it is hard to understand the reality of a country whose citizens seemed to be going about normal lives just a short time ago.
I find it also hard to know what to say to people whose misery comes about on the whims of a dictator guided by…what?
A misguided notion of history?
A deep-seated desire to return to former glory?
A vain attempt to show how big his manhood is?
Who can tell?
Beyond asking my elected officials to do something and tweeting expressions of support, I’m feeling powerless to change what is happening.
I hope and pray for a return to sanity, for grace and mercy for those in conflict, and for a world that strives to find a greater, common good.
The threads this week:
One Library-related Corner of the Ukraine War
Archiving the Ukrainian Web
Artificial Intelligence Can’t Hold Copyright
One Library-related Corner of the Ukraine War
Nicholas Poole tweet
Dear colleagues,
The sneaky, cruel and bloody aggression of the Russian Federation has prevented us from implementing our plans and holding March 1-4 XII International Scientific Conference "Modern Library-Information Continuous Education: what, how, for whom?".
65 participants registered at the conference, re-calculated the registration fee of 35 members of the VGO Ukrainian Library Association total amount of 10 500 UAH.
The Organizing Committee of the Conference has decided to hold the Conference after our confident victory, and the contributions collected to support the Armed Forces of Ukraine.
We promise to provide complete and quality service to all participants in the peaceful time.
Glory to Ukraine!
For questions, please contact the Executive Office of the Association by email.
—Facebook-supplied translation of the
announcement of the postponing of a library conference
by the Ukrainian Library Association, 28-Feb-2022
Nicholas Poole, CEO of CILIP in the UK, has a poetic take on this announcement from the Ukrainian Library Association.
Facebook’s automated translation from Ukrainian to English (quoted above) sounds a little dry; I’m left wondering how this reads in the original Ukrainian.
Archiving the Ukrainian Web
[Ian Milligan, associate professor of history at the University of Waterloo,] points out that in 50 years, historians will not only be curious about how people got their information and how it shaped their worldviews but also what kind of information archivists saved about this conflict.

Ukrainian Websites Are Going Dark. Archivists Are Trying To Save Them
, Motherboard on Vi…

Issue 88: Battling Censorship, Considering the Right to be Forgotten


For this week’s newsletter introduction, I searched the Flickr service for photographs of libraries in Ukraine.
I thought that putting a picture here at the top of a grand reading room with dark wood shelves and neat rows of books would help us remember that a significant part of our world has been turned upside down.
What I didn’t expect to find was an album titled ‘November 2021: Strategic Session on Digital Education Hubs development’.
Attendees of the strategic session on Digital Education Hubs development.
Source
, CC By-ND
Four months ago, these professionals were gathered together in a room to hear presentations, sort multi-color post-it notes on flip charts, and work together for "the transformation of libraries into Digital Education Hubs".
That is a scene that is very familiar to me, and quite possibly to many of my readers as well.
Now their country is being bombed, its citizens are fleeing, and I doubt anyone is thinking about the transformation of libraries.
Let’s not forget them.
The threads this week:
Minecraft as an Anti-censorship Tool
Right-to-be-Forgotten Tangled with Press Freedoms
Minecraft as an Anti-censorship Tool
When schools ban books, the strategy often backfires on would-be censors, resulting in greater interest around illicit literature. Similarly, when governments censor the media, groups like Reporters Without Borders spearhead efforts to make such censored material extra visible. Their Uncensored Library project brings together architecture and journalism in an unlikely virtual reality space: the interactive gaming world of Minecraft.

Uncensored Library: Banned Journalism Housed in Virtual Minecraft Architecture
, 99% Invisible, 3-Mar-2022
With help from my teenage son, I got into the
Uncensored Library
on Minecraft.
(A hint for those trying to access it in early 2022: the
instructions
say you need a specific version of Minecraft—that version is now 1.16.5 instead of what is listed in the PDF.)
The "Frequently Asked Questions" book in this world starts with this answer: "Minecraft is available even in countries with cyber censorship. So we build this library to provide a platform for censored journalists, connect people around the world and bring back the truth."
The content of the library is curated—you don’t have the option of modifying the elements in the Minecraft world.
The books in the library are short…the ones that I saw were each several hundred words long.
Right-to-be-Forgotten Tangled with Press Freedoms
The "right to be forgotten," which exists in European Union …

Sanctioning Governments on the Internet


What a strange article title to type:
Sanctioning Governments on the Internet.
What does that even mean?
Who would decide?
Who would implement the decision?
To say nothing of the consequences of trying to impose an Internet Sanction on a government or a country.
The internet as we know it is a quirky beast.
It is called "inter-net" because it is formed as the interconnection of independent networks plus a healthy dose of human capital (and independent streams of monetary capital), reliance on openly-published and open-ended standards, interpersonal trust, and—quite frankly—quite a bit of luck.
You might think of "the internet" as one big thing, but in reality it is many smaller things hooked together by common agreement.
The internet connection at my house comes from an Internet Service Provider (ISP).
My ISP connects to one or more (likely many more than one) other ISPs and transit providers.
Through those interconnections, a message I’m typing here will be sent to a computer across town, across the country, and across the world.
It works like this because many decades ago, a bunch of people got together to agree on the methods and rules computers would use to communicate with each other.
A guiding philosophy was to make those methods and rules simple and easy to implement.
Another guiding principle was to build up layers of complexity, each relying on the functionality of the layers below it.
At the bottom-most layer, the network equipment moves messages along a path from a sending computer to a receiving computer. That equipment doesn’t understand or care what was in the messages…it just knows how to get the message one hop closer to its destination.
On top of that is a set of rules (a "protocol") for ensuring all messages get from the sender to the receiver and describing how to retransmit if something is missing.
On top of that is a protocol for translating human-readable names into computer-understandable addresses.
On top of that, a protocol for requesting and receiving a file.
Then a specification for how to arrange text on a page.
Lastly, a web browser that understands that specification and knows how to ask the layer below it to retrieve an HTML file from a faraway server.
The network layer at the bottom doesn’t know the difference between an HTML file and a snippet of voice on a Zoom call, and the browser at the top doesn’t know how the file got to it.
It is the common agreement on the protocols and specifications across decades of work that put this page in front of your eyes.
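That stack of layers can be made concrete in a few lines of Python. This is only a sketch: the hostname is an illustration, and `build_http_request` and `fetch` are invented helper names, not part of any library:

```python
import socket

def build_http_request(host: str, path: str = "/") -> bytes:
    # Top of the stack: the text of an HTTP request. The layers below
    # neither know nor care that these bytes happen to be HTTP.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode()

def fetch(host: str, path: str = "/") -> bytes:
    # Name layer: translate the human-readable name into an address (DNS).
    *_, addr = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)[0]
    # Transport layer: a reliable connection; TCP handles ordering and
    # retransmission so this code never has to.
    with socket.create_connection(addr[:2], timeout=10) as conn:
        conn.sendall(build_http_request(host, path))
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)

# Built but not sent here; calling fetch("example.com") would retrieve the page.
print(build_http_request("example.com").decode().splitlines()[0])
```

Every line of `fetch` leans on a lower layer it doesn't need to understand—the same division of labor the paragraphs above describe.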
So about those key components of the "inter-net":
human capital
: coming to agreement takes time, and humans need to bring their priorities, their experiences, their knowledge, and their biases to the table to work to a common agreement.
monetary capital
: every network that is a part of the "inter-net" is paying for its piece to connect its users to its neighboring networks; there isn’t one singular …

Issue 89: Ukraine’s Libraries, Russia’s Internet, and the Big Deal


The first story below is one from National Public Radio on the efforts that Ukraine’s libraries are undertaking.
Let’s not forget the terror they are facing, the people stepping up to meet their community’s needs, and those who have lost their lives in the Russian war.
The threads this week:
Ukraine Libraries Doing What Libraries Do
Can the Internet Sanction a Country? Should It?
Thursday Threads 2011: The Demise of the Big Deal?
Ukraine Libraries Doing What Libraries Do
"Refugee reception points, hostels and logistics points are organized here," [Oksana Brui, president of the Ukrainian Library Association] said. "Camouflage nets for the military are also woven here. Home care courses are held here. Books are collected here to be transferred to libraries in neighboring countries that receive Ukrainian refugees."

Ukraine’s libraries are offering bomb shelters and camouflage classes
, NPR, 9-Mar-2022
I’m not surprised.
I presume the libraries mentioned in the NPR article are "public libraries," but they could be libraries of any type.
It brings to mind the stories about the
library in Ferguson
, Missouri, during the riots following the shooting of Michael Brown by local police.
The NPR story also mentions Nicholas Poole’s "we will reschedule just as soon as we have vanquished our invaders" tweet that was in Thursday Threads two weeks ago.
Can the Internet Sanction a Country? Should It?
The invasion of Ukraine poses a new challenge for multistakeholder Internet infrastructure governance. In this statement, we discuss possible sanctions and their ramifications, lay out principles that we believe should guide Internet sanctions, and propose a multistakeholder governance mechanism to facilitate decision-making and implementation.

Multistakeholder Imposition of Internet Sanctions
[PDF], Packet Clearing House, 10-Mar-2022
Last week,
Ukraine’s Ministry of Digital Transformation called on internet bodies to sanction Russia
over its government’s war on Ukraine.
This would include revoking Russia’s top-level country domains (e.g. ".ru"), canceling SSL certificates associated with Russian sites, disabling the root DNS servers, and withdrawing the right for Russian internet service providers to use the IP addresses that have been assigned to the country.
The
Multistakeholder Imposition of Internet Sanctions
document describes why this would be a bad idea and lays out a plan for what can be done.
For more depth, see the
article I wrote last week
on the document.
Thursday Threads 2011: The Demise of the Big Deal?
Looking backward, the
Th…

Issue 90: When Machine Learning Goes Wrong


The People of Ukraine are not forgotten.
The Tufts University newspaper published an article this week about
a multinational effort
to preserve the digital and digitized cultural heritage of the country.
On the other side of the war,
Russian citizens are downloading Wikipedia
out of fear of more drastic network filtering or collapse of Russia’s connections to the global internet.
Eleven years ago this week, the judge overseeing the Google Book Search case (
Authors Guild v. Google
) ruled that the proposed settlement was not "fair, adequate, and reasonable."
As you might recall, the proposal was for a grand vision of a book author rights clearinghouse—not unlike what is in place for the music industry.
I had a Thursday Threads entry that covered the initial reactions from the litigants, legal observers, and the library community.
In writing this week’s article, I learned that machine learning is a subset of the artificial intelligence field.
While the terms are often used interchangeably, machine learning is one part of artificial intelligence.
As the
Columbia University Engineering Department describes it
, "put in context, artificial intelligence refers to the general ability of computers to emulate human thought and perform tasks in real-world environments, while machine learning refers to the technologies and algorithms that enable systems to identify patterns, make decisions, and improve themselves through experience and data."
With that definition in mind, the thread this week is on challenges with machine learning:
Flip the Switch on Your Drug Synthesizing Tool and Chemical Weapons Come Out
With Machine Learning, Garbage In/Garbage Out
Five Realities Why Applying Machine Learning to Medical Records is Hard
Flip the Switch on Your Drug Synthesizing Tool and Chemical Weapons Come Out
This generative model normally penalizes predicted toxicity and rewards predicted target activity. We simply proposed to invert this logic by using the same approach to design molecules de novo, but now guiding the model to reward both toxicity and bioactivity instead.
In less than 6 hours after starting on our in-house server, our model generated 40,000 molecules that scored within our desired threshold. In the process, the AI designed not only VX, but also many other known chemical warfare agents that we identified through visual confirmation with structures in public chemistry databases. Many new molecules were also designed that looked equally plausible.
—Urbina, F., Lentzos, F., Invernizzi, C.
et al.
Dual use of artificial-intelligence-powere…

Trip Report: NISO Plus Forum 2022


Earlier this week, NISO held its one-day
NISO Plus Forum for 2022
.
This was an in-person meeting that is intended to feed into the online conference in February 2023.
Around 100 people from NISO’s membership groups—libraries, content providers, and service providers—attended to talk about
metadata
.
The meeting was structured in World Café style and was moderated by Jonathan Clark.
The broad topic of "metadata" was broken down into three parts:
Identifiers: what identifiers are missing or underutilized?
Exchange: what is the most significant barrier to seamless exchange?
Structure: what is impossible due to a lack of appropriate structures?
There were small table discussions for each part of no more than six people, with 15 minutes at a table before everyone got up and moved to a new table.
After three rounds of 15 minutes, a scribe that stayed at the same table the whole time reported the major themes to the larger group.
What makes this style interesting is that everyone’s experience is different.
We agreed to use the Chatham House Rule; what is reported here is my interpretation of my table’s discussion and my take on the broader outcomes.
Edited on 5-Oct-2022 to add:
NISO published a summary of the in-person meeting in the October issue of NISO I/O —
Are You Ready? Metadata — The Musical!
.
Identifiers
The most fascinating idea I discovered here was how much the metadata ecosystem relies on «Publication Date».
Not only do several parts use publication date as an anchor, but different understandings of the meaning of «publication date» cause many problems downstream.
There is the online publication date, the physical publication date, and sometimes simply an unlabeled publication date.
Some publishers have a practice of changing an online publication date to the physical issue date when the issue comes out.
(Changing a field that others use as part of metadata to distinguish one item from another is never a good thing.)
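To make the downstream problem concrete, here is a small invented example: two copies of the same article record, before and after a publisher swaps the online-first date for the physical issue date. Any matching key that leans on publication date (the records and `match_key` are hypothetical, purely for illustration) now sees two different items:

```python
def match_key(record: dict) -> tuple:
    # A naive deduplication key of the kind downstream systems build when
    # there is no shared identifier: normalized title plus publication date.
    return (record["title"].lower(), record["pub_date"])

# The same article before and after the publisher swaps the
# online-first date for the physical issue date.
online_first = {"title": "An Example Article", "pub_date": "2021-11-03"}
issue_dated = {"title": "An Example Article", "pub_date": "2022-01-15"}

# What was one item now looks like two different items downstream.
print(match_key(online_first) == match_key(issue_dated))  # False
```

Every system that cached the first key now silently disagrees with every system that picked up the second—which is why changing an identifying field is never a good thing.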
«Place of Publication» also has a lot of variability and inconsistency, even within a publisher.
Institution identifiers were also a topic, particularly with the lack of hierarchy in the
Research Organization Registry
(ROR).
Someone reported that ROR is working to address the problem, but right now there is not a good way to relate a department to its encompassing agency or organization.
I showed my professional age a bit by mentioning
SICI
—the Serial Item and Contribution Identifier.
This is a compound identifier developed in the 1990s. Given a citation, you could construct a SICI that was a kind of key to the article. For instance,
Lynch, Clifford A. «The Integrity of Digital Information; Mechanics and Definitional Issues.» JASIS 45:10 (Dec. 1994) p. 737-44
…could be condensed into…
0002-8231(199412)45:10<737:TIODIM>2.3.TX;2-M
This standard didn’t last past the early 2000s, although a few people at my table mentioned that they saw examples of this identifier in their backfile as the …
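As a rough sketch of how the pieces of a citation map into a SICI (segment layout per the ANSI/NISO Z39.56 example above; the trailing control segment is copied verbatim from that example, and the final check character, which the standard computes from the rest of the string, is left as a "?" placeholder):

```python
def sici_sketch(issn, year, month, volume, issue, page, title_code):
    # Item segment: ISSN plus chronology (YYYYMM) plus enumeration (vol:issue).
    # Contribution segment: starting page plus a title code built from the
    # initials of the article title. The control segment "2.3.TX;2" and the
    # final check character are simplified here: "?" stands in for the
    # check character the standard would compute.
    return f"{issn}({year}{month:02d}){volume}:{issue}<{page}:{title_code}>2.3.TX;2-?"

# The Lynch article cited above:
print(sici_sketch("0002-8231", 1994, 12, 45, 10, 737, "TIODIM"))
```

The appeal was that anyone holding a citation could derive the same key independently—no central registry required—which is also, arguably, why it was so brittle.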

Automatically Generating Podcast Transcripts


I’m finding it valuable to create annotations on resources to index into my personal knowledge management system.
(The
Obsidian journaling
post from late last year goes into some depth about my process.)
I use the
Hypothesis
service to do this—Hypothesis annotations are imported into Markdown files for Obsidian using the custom script and method I describe in that blog post.
This works well for web pages and PDF files…Hypothesis can attach annotations to those resource types.
Videos are relatively straightforward, too, using Dan Whaley’s
DocDrop
service; it reads the closed captioning and puts that on an HTML page that enables Hypothesis to do its work.
What I’m missing, though, are annotations on podcast episodes.
Podcast creators that take the time to make transcripts available are somewhat unusual.
Podcasts from NPR and NPR member stations are pretty good about this, but everyone else is slacking off.
My task management system has about a dozen podcast episodes where I’d like to annotate transcripts (and one podcast that seemingly
stopped
making transcripts just before the episode I wanted to annotate!).
So I wrote a little script that creates a good-enough transcript HTML page.
You can see a
sample of what this looks like
(from the
Search and Ye Might Find
episode of 99% Invisible).
Note!
Of course,
99% Invisible
has now gone back and added transcripts to all of their episodes, including
the one used in this example
. Thanks? … No really, thank you 99PI!
AWS Transcribe
to the rescue
Amazon Web Services has a Transcribe service that takes audio, runs it through its machine learning algorithms, and outputs a WebVTT file.
Podcasts are typically well-produced audio, so AWS Transcribe has a clean audio track to work with.
In my testing, AWS Transcribe does well with most sentences; it misses unusual proper names and its sentence detection mechanism is good-but-not-great.
It is certainly good enough to get the main ideas across to provide an anchor for annotations.
A WebVTT file (of a podcast advertisement) looks like this:
WEBVTT

1
00:00:00.190 --> 00:00:04.120
my quest to buy a more eco friendly deodorant quickly started to

2
00:00:04.120 --> 00:00:08.960
stink because sustainability and effectiveness don’t always go hand in hand.

3
00:00:09.010 --> 00:00:11.600
But then I discovered finch Finch is a

4
00:00:11.600 --> 00:00:14.830
free chrome extension that scores everyday products on
After a WEBVTT marker, there are groups of caption statements separated by newlines.
Each statement is numbered, followed by a time interval, followed by the caption itself.
(WebVTT can be much more complicated than this…to include CSS-like text styling and other features; read the specs if you want more detail.)
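The actual script is linked below; as an independent, minimal sketch of the parsing step, a WebVTT file of this shape can be split into cues like so (`parse_webvtt` is an invented helper, not the script’s own code, and it handles only the simple cue format shown above):

```python
import re

def parse_webvtt(text: str) -> list:
    # Split a simple WebVTT file into cues: an optional cue number,
    # a "start --> end" timing line, then one or more caption lines.
    cues = []
    body = text.split("\n\n", 1)[1] if text.startswith("WEBVTT") else text
    for block in re.split(r"\n{2,}", body.strip()):
        lines = block.splitlines()
        if lines and "-->" not in lines[0]:
            lines = lines[1:]  # skip the cue number
        if not lines or "-->" not in lines[0]:
            continue  # not a cue block
        start, end = (t.strip() for t in lines[0].split("-->"))
        cues.append({"start": start, "end": end,
                     "text": " ".join(lines[1:]).strip()})
    return cues

sample = """WEBVTT

1
00:00:00.190 --> 00:00:04.120
my quest to buy a more eco friendly deodorant quickly started to
"""
print(parse_webvtt(sample))
```

From a list of cues like this, emitting an HTML page with one annotatable span per cue is straightforward.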
What the script does
The code for this is up
on GitHub
now.
The links to the code below point to the version of software at the time this blog po…

OCLC v. Clarivate: What was MetaDoor? What is an OCLC Record?


On November 7, 2022, OCLC and Clarivate announced a settlement in their lawsuit about using WorldCat records in the embryonic MetaDoor service.
This ended the latest chapter in the saga of reuse of library metadata with little new clarity.
The settlement terms were not disclosed, but we can learn a little from the proceedings.
First, let’s review the press releases from the parties.
Then we’ll look at the transcripts of court proceedings to see if we can get closer to answers about some questions this lawsuit raises.
Clarivate’s Statement
Clarivate’s statement about the settlement is quite vague:
Clarivate continues to deny OCLCs allegations of wrong-doing and maintains that the issue lay between OCLC and its customers, who sought to co-create an efficient community platform for sharing of bibliographic records. Clarivate will not develop a record exchange system of MARC records that include records which OCLC has claimed are subject to its policy and contractual limitations. Clarivate will bear its own fees and costs.
Gordon Samson, Chief Product Officer at Clarivate insisted, «Clarivate will continue to support the goals of open research and data exchange – because we believe it is the best way to make the process of research and learning faster, more robust and more transparent. Regardless of business model, when scholarly information is easily accessible and shareable, the dots are easier to join, the connections are explicit, and collaborations are more natural and meaningful. The process of scientific discovery is faster, and it is easier to ensure research integrity and reproducibility. We know that navigating the transition to open research is important to our customers, and we remain committed to helping them make that transition as seamlessly as possible.»
— "Clarivate and OCLC Settle Lawsuit", Clarivate press release issued November 7, 2022
It isn’t clear from this statement whether MetaDoor is done or not.
(We’ll answer the «What is/was MetaDoor?» question below.)
The statement, which matches the language in the OCLC statement, only says that a service that includes OCLC records will not be built.
(We’ll also try to answer the «What is an OCLC record?» question below.)
OCLC’s Statement
OCLC’s statement is only a little less vague:
OCLC is pleased to announce today that it successfully defended WorldCat to protect the collaborative service developed and maintained with and for libraries worldwide.
An agreement has been reached in a lawsuit filed by OCLC in June 2022 against Clarivate and its subsidiaries in the United States District Court, Southern District of Ohio.
Though the settlement document itself is confidential, two significant elements include:
Clarivate, Ex Libris, and ProQuest have ceased the development and marketing of the MetaDoor MARC record exchange system
developed using records that are subject to the WorldCat Rights and Responsibilities Pol…

With Mastodon on the Rise, Who Archives the Digital Public Square?


DALL·E prompt: photorealistic waves of twitter logos and mastodon logos crashing onto a sandy beach
Much has been made about the differences between Twitter and Mastodon: the challenge of finding a home for your account (and the corresponding differences between your “local” timeline and your “global” timeline), the intentional antiviral design choices (no quote-tweets and a narrow search system), and the more-empowering block and mute features.
A recent article in
MIT Technology Review
about
the potential loss to history if Twitter goes away
had me thinking of another difference: a Mastodon-filled world changes expectations for archiving this kind of primary source material.
Think Bigger Than Mastodon
Let’s set some common ground.
"Mastodon" is being used here as a shortcut for the growing federation of servers that follow the ActivityPub protocol—the "fediverse".
Most people caught up in the migration away from Twitter are looking for a "Twitter-equivalent", and the option that has caught the popular imagination is Mastodon.
As we view the fediverse digital public square, we could just as well be talking about Mastodon forks like
Hometown
.
We should also include in the genre-specific ActivityPub software like
Pixelfed
(for photographers,
me there
),
Bookwyrm
(for book groups and reader commentary,
me there
),
Funkwhale
(for music), and
write.as
(for long-form articles).
Although Mastodon is getting the most traction right now, the question of archiving the digital public square is bigger than just Mastodon…just keep that in mind as you read below.
Twitter Archiving Challenges
As the
MIT Technology Review
article points out, there are challenges to archiving Twitter.
For eight years, the US Library of Congress took it upon itself to maintain a public record of all tweets, but it stopped in 2018, instead selecting only a small number of accounts’ posts to capture. “It never, ever worked,” says William Kilbride, executive director of the Digital Preservation Coalition. The data the library was expected to store was too vast, the volume coming out of the firehose too great. “Let me put that in context: it’s the Library of Congress. They had some of the best expertise on this topic. If the Library of Congress can’t do it, that tells you something quite important,” he says.
The challenges include that of scale:
[In January 2013] We now have an archive of approximately 170 billion tweets and growing. The volume of tweets the Library receives each day has grown from 140 million beginning in February 2011 to nearly half a billion tweets each day as of October 2012.

Update on the Twitter Archive at the Library of Congress
, Library of Congress blog, January 2013.
And also of scope—the Library does not receive the multimedia parts of tweets.
As the
whitepaper attached to the Update on the Twitter Archive at the Library of Congress
says:
The Library only receive…

Issue 91: Bibliographic Records and Mastodon Migration


Well, this newsletter was off the air longer than I anticipated.
A lot has happened since
issue 90 in late March
: cryptocurrency value falling, Twitter spiraling (
maybe a death-spiral
…can’t be too sure), and (in the U.S.) a whopper of a mid-term election season.
All is well here in the Jester’s home…I needed some time to build up some more tooling around the blog and newsletter — then summer came, and then fall, and before you knew it, eight months had passed before this issue came out.
Speaking of Twitter…I have mostly left it behind. The “DataG” account is still there, but I have turned off the automated posting and have stopped visiting the site.
I’ve made the migration to Mastodon on the Code4Lib instance; you can find me at
@dltj@code4lib.social
.
If you, too, have made the move, I hope you will follow me there and give me a chance to follow you back.
Threads from 12 years ago are still weaving their way through us today.
In the
11th issue of Thursday Threads
from 2010, I posted, among other things, about the
new free e-journal hosting from University of Pittsburgh on OJS
(and it looks like it is
still available as a service
!), the desire for
open bibliographic data
(and that is still a thing…see below), and the
master’s degree in business administration earned through a Facebook app
(which, 12 years later, I would guess is no longer a thing).
I hope you and those close to you are doing well as we enter the last month of 2022.
Don’t be a stranger—drop me a line if you find this interesting or come across something you think I would want to know about.
OCLC versus Clarivate: In the Battle for Bibliographic Records, the Winner is ???
Moving On to Mastodon
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
OCLC versus Clarivate: In the Battle for Bibliographic Records, the Winner is ???
Clarivate continues to deny OCLC’s allegations of wrong-doing and maintains that the issue lay between OCLC and its customers, who sought to co-create an efficient community platform for sharing of bibliographic records. Clarivate will not develop a record exchange system of MARC records that include records which OCLC has claimed are subject to its policy and contractual limitations. Clarivate will bear its own fees and costs.

Clarivate and OCLC Settle Lawsuit
, Clarivate press release, 7-Nov-2022
Though the settlement document itself is confidential, two significant elements include:
Clarivate, Ex Libris, and ProQuest have ceased the development and marketing of the MetaDoor MARC record exchange system
developed using records that are subject to the WorldCat Rights and Responsi…

Mastodon Instance Operators Report on the Impact of the #TwitterMigration


A number of Mastodon operators have started to report the impact of the #TwitterMigration on their instances.
I started gathering these because I was curious about what it takes to run a public or semi-public Mastodon instance.
These reports are full of those kinds of details, but they also describe evolutions of policy and operations that are just as interesting.
If you see other reports (or have posted a report of your own Mastodon instance), please tag me at
@dltj@code4lib.social
and I’ll add it to this list.
Note!
There is now a
branch and pull request
on GitHub where you can suggest changes to this list and/or subscribe to notifications for updates.
Updates to the page
are also available via
RSS-Bridge
. This didn’t work as I expected it to when the commits got to GitHub. (The pull request was automatically closed.) Will need to rethink this.
sfba.social (San Francisco Bay Area)
This report covers the San Francisco Bay regional Mastodon instance sfba.social, including general statistics, financial details including income (donations) and expenses (hosting costs), moderation efforts, and changes made and considered. It has been wonderful to welcome all of our new friends and neighbors. We have expanded our server capacity and refined our moderation process, including a new version of the code of conduct and updated the server rules to match. This has helped to improve expectations and free our users to be nice and have fun!

Transparency Report (November 2022)
, SFBA Community Hub, 2-Dec-2022
Includes sections for:
Statistics for activity growth
Financials/fundraising
Governance changes (new moderators, code of conduct revisions)
Future plans
mindly.social
Since April of this year I’ve been running my own Mastodon server and 3 days ago we hit 100 users which was a huge milestone for my tiny little server… and then all of a sudden something happened, the other Mastodon servers started to get full and new users were looking for homes. Less than 72 hours after being excited for hitting 100 users we hit 10,000 users.

Running a Mastodon server – Part 1?
, KuJoe’s blog, 29-Nov-2022
Includes sections for:
Statistics for activity growth
Process for managing growth (technical)
chaos.social
The past month has changed the Fediverse, and, by extension, our instance. We’ve continued as normal (apart from limiting sign-ups) to give ourselves time to figure out which changes were only temporary, what seems to be changed for good, and how to react. A month seems ample time, and here we are with a set of changes in how chaos.social will work in the future.

Rule changes, closed sign-ups, and more
, chaos.social blog, 29-Nov-2022
Includes sections for:
Statistics for activity growth
Process for managing growth (new user moderation, instance rules)
I was going to write an article for a while now, but there was too much work to do with the latest influx. Together wi…

Issue 92: Privacy Stories From 2014 Still Echo Today


Back again.
Thanks for the comments on the return of the newsletter.
I’ve heard that Microsoft Outlook isn’t playing nice with my email theme.
(It also isn’t playing fair…someone forwarded the newsletter back to me, and when I replied that person said the view of the newsletter in the reply looked fine in that same Microsoft Outlook.)
Until I get that fixed, remember that you can read the newsletter online — just follow one of the bullet point links below to get to it.
This week we’re going to pull through some privacy threads to the current day.
Eight years ago this week, I published a whole
DLTJ Thursday Threads
issue on privacy.
This was the lead paragraph:
Are you paranoid yet? Are you worried that
the secret you shared anonymously might come right back to you
? Or wondering why
advertisements seem to follow you around from web page to web page
? Or just creeped out by
internet-enabled services tracking your every move
? Or angry that
mobile carriers made it very easy for anyone to track every page you visited from your smartphone
? Or maybe you will
simply give up any personal information for a delicious cookie
? (Are you paranoid now?)
The first was about how posts on apps like YikYak, Secret, Whisper, and Snapchat weren’t really anonymous.
The second was about the kinds of data that apps collect and aggregate about us.
The third was an opinion piece about how Uber was tracking your every move as part of its experiments, and also contained a nugget about how Facebook was updating its terms of service to say explicitly that the app will now track your location.
The fourth was how AT&T and Verizon got caught invisibly rewriting web pages passing through their network to include their own tracking tokens.
And the fifth was a person-on-the-street test to see how much personal information passers-by would give up for a cookie (a tasty treat, not the browser cookie kind).
So with all that attention on privacy in 2014, you’d figure we’d have it all solved by now, right?
Let’s see what some of the latest stories are.
Algorithmic Cruelty
Ditching CAPTCHAs
and
Improving Privacy
When Privacy is a National Security Concern
A Privacy-in-the-Cloud Good News Story
Facebook’s Luck Running Out in the European Union
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
Algorithmic Cruelty
When I became pregnant, my partner and I, like many expectant individuals, opted not to tell our friends until after the first trimester. But I had an additional goal: for my friends to learn of my pregnancy before advertisers did. I’m a health-privacy scholar, so I know that pregnant individuals are of…

LIBnft: a Project in Search of a Purpose


At first, I thought this was a parody.
LibNFT is an R&D initiative exploring the impact of blockchain and the digital asset economy on library archives.

LIBnft homepage
, 12-Dec-2022
However, it seems like a serious proposal that was presented today at a
CNI project briefing
.
I did not attend the project briefing; the only details publicly available are from the
whitepaper
. (Note: link to the whitepaper can’t be robustified—Dropbox is hostile to web archiving—but I have saved a copy of the version I reviewed…
version 0.04 dated 4-Dec-2022
.)
From the details in the whitepaper, it is safe to say this project should be shelved until the need and purpose are better understood.
Why?
First, blockchain is the wrong technology; gallery-library-archive-museum (GLAM) institutions do not need a technology where participants are adversarial or trying to steal each other’s data.
Second, there is no utility in non-fungible tokens for GLAM governance or assets; it would be better (and certainly cheaper) to hold a meeting or write a typical contract.
Note!
The recording of the LibNFT project briefing is now up on YouTube, and I’ve
posted a follow-up
with additional thoughts.
Why Use Blockchain
As the LIBnft whitepaper points out, “in its simplest form, a blockchain is a communally maintained distributed ledger, or database, that reliably and immutably stores digital information” (summarizing a
New York Times glossary
).
The “database” term is crucial—blockchain is a technique for storing and retrieving information, much like one would do with a run-of-the-mill database.
This database has some interesting characteristics: data can’t be erased once it is written and there are copies of the database spread over the network.
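That “can’t be erased once it is written” property can be illustrated with a toy hash chain, the core mechanism underneath a blockchain: each record carries the hash of the record before it, so changing any earlier record invalidates everything after it. This is only a sketch of the idea, not a real blockchain (there is no network, no consensus, no voting):

```python
import hashlib
import json

def _digest(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous link's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(chain: list, record: dict) -> None:
    """Add a record; its hash binds it to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"record": record, "hash": _digest(record, prev_hash)})

def verify(chain: list) -> bool:
    """Recompute every link; any edited record breaks the chain."""
    prev_hash = "0" * 64
    for link in chain:
        if link["hash"] != _digest(link["record"], prev_hash):
            return False
        prev_hash = link["hash"]
    return True

chain = []
append(chain, {"title": "Record 1"})
append(chain, {"title": "Record 2"})
print(verify(chain))                     # True: the untampered chain checks out
chain[0]["record"]["title"] = "Edited!"  # attempt to "erase" history in place
print(verify(chain))                     # False: the edit is detectable
```

Tampering is only *detectable* here, not *impossible*; making it practically impossible is what the copies spread across the network are for.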
Rather than “distributed”, though, a blockchain database is “decentralized”.
A USENIX article makes an important distinction between “decentralized” (which blockchain is) and “distributed” (emphasis added):
A
distributed system
is composed of multiple, identified, and nameable entities. DNS is an example of such a distributed system, as there is a hierarchy of responsibilities and business relationships to create a specialized database with a corresponding cryptographic PKI. Similarly the web is a distributed system, where computation is not only spread amongst various servers but the duty of computation is shared between the browser and the server within a single web page.
A
decentralized system
, on the other hand, dispenses with the notion of identified entities.
Instead everyone can participate and the participants are assumed to be mutually antagonistic, or at least maximizing their profit.
Since decentralized systems depend on some form of voting, the potential for an attacker stuffing the ballot box is always at the forefront. After all, an attacker could just create a bunch of sock-puppets, called “sibyls”, and get all the votes they want.
In a distributed system sibyls are…

Issue 93: Chat-bots Powered by Artificial Intelligence


This week we jump into the world of chat-bots driven by new artificial intelligence language models.
The pace of announcements about general-purpose tools driven by large training sets of texts or images has quickened, and the barrier to experimenting with these tools has dropped.
There are now fully-functional websites where there once were only programmer-focused APIs.
We wonder what the effects will be on our students, our business workflows, and on society.
We also wonder about the underlying biases in the training data.
OpenAI Introduces ChatGPT
A High School Teacher Laments a Tool for Easy Essays
A Real-world Example
Can’t Paper Over Biased Training Data
The View from a Human Trainer
As an aside, in the first article below I mention that the use of these tools, while free for now, will be monetized at some point.
This is another unfortunate example of taking from the common good and commercializing it.
The training data used by the company came from crawling web pages, from Wikipedia, and from books (
source
).
Yet soon, it seems, all of the benefit from that information will be held by a corporate body.
The same thing has been said about the image-based AI tools that have slurped up sets of photos from sites like Flickr, Wikipedia, and even stock photo businesses.
We don’t talk enough about this private capture of the common good and the uncompensated taking of others’ work.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
OpenAI Introduces ChatGPT
We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.

ChatGPT: Optimizing Language Models for Dialogue
, OpenAI blog, 30-Nov-2022
This link is the announcement from the company that created ChatGPT, OpenAI.
The innovation with this model is the introduction of Reinforcement Learning from Human Feedback (RLHF).
With RLHF, “human AI trainers provided conversations in which they played both sides—the user and an AI assistant” — and the ChatGPT language model incorporated the refinements learned from those human interactions.
The blog post gives examples of how this human training affected the output.
In the language model without RLHF training, when asked how to bully someone the AI would return a list of ideas.
With the RLHF training, the response starts with “It is never okay to bully someone” and says t…

Backing Away from Twitter in Measured Steps


My relationship with Twitter crossed a new line yesterday. As I posted on Mastodon (
one
,
two
):
Have just deleted the Twitter app on mobile. Felt the need to ramp down stress this week, and the current owner’s meltdown is unnecessary drama. There are still a few people there that I like to read, but I’ll be doing that far less often now.
I have regret in deleting the app. I found value there, and felt that the trade-off of attention and advertising versus the benefits of personal connections and trading ideas was a net positive.
As has happened several times to me in my 53 years, I’m astonished at how fast a valued and valuable community can be destroyed.
The toxic mix of arrogance and ignorance and power is a sad combination.
The past eight weeks on Twitter have been emotionally tiring, and I wondered why.
On reflection, mourning seems like the most appropriate label for the emotion I’m feeling.
I had invested time and effort into cultivating a network of friends and acquaintances.
Now it is being destroyed; that network was a guest in someone else’s kingdom.
It feels like a reciprocal behavior back-and-forth: Musk makes a snap dictatorial decision, I step away a little farther.
The first move came on October 27 when
I stopped engaging with others on Twitter and logged on less often
—that corresponded with the announcement that Musk had closed the deal to buy Twitter.
At the same time, I picked up my activity on Mastodon (
@dltj@code4lib.social
)…following more people and engaging more in that community.
(My Mastodon account on code4lib.social had been idle since 2018 except for automated postings from my knowledge management tools.)
The second shift came on November 22 when
Twitter started rejecting links in posts
that came out of my knowledge management tools.
I have a series of scripts that I use to save references to web pages that I find notable, and the scripts also post those references to Twitter and Mastodon.
For unknown reasons, a Twitter post with a link to The Markup (or the Hypothesis proxy) started failing, so I turned off the automated posting to Twitter and wrote a sign-off message.
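(For the curious: the Mastodon side of that automated posting needs little more than one authenticated HTTP call to an instance’s `/api/v1/statuses` endpoint. Here is a minimal sketch using only the Python standard library — the instance URL and access token are placeholders, and this is not my actual script:)

```python
import json
import urllib.request

API_BASE = "https://mastodon.example"  # placeholder: your instance's base URL
ACCESS_TOKEN = "YOUR-ACCESS-TOKEN"     # placeholder: a token from your instance's app settings

def build_status_request(base_url: str, token: str, text: str) -> urllib.request.Request:
    """Build the POST /api/v1/statuses request that creates a new post."""
    body = json.dumps({"status": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/statuses",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def post_status(text: str) -> dict:
    """Send the status and return the created post as parsed JSON."""
    req = build_status_request(API_BASE, ACCESS_TOKEN, text)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a real instance and token):
# post_status("Saved a bookmark: https://example.com/")
```

Because Mastodon’s API is open and stable, a bookmark-posting script like this keeps working regardless of who owns the server — the contrast with Twitter’s link-rejection behavior is the point.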
Now comes the third reciprocal reaction: Musk suspends and un-suspends journalists, then starts rejecting posts with links that are “free promotion of certain social media platforms…” (quote from
deleted TwitterSupport tweets
).
And I delete the app from my mobile device.
Deleting the app is my commitment to visit the site less often.
I regret it has come to that.
Once a community is destroyed, it can’t be brought back.
Not to the same cohesion it had before…it will be different, and there will be a longing nostalgia for what once was.
(Maybe that can be good? Probably not, given Twitter’s current trajectory.)
I’m already missing the Twitter notifications that I had set up: the local office of the National Weather Service, the messages from the town and the regional highway p…

Issue 112: Odds and Ends in Social Media Research


Social media saturates nearly every facet of our lives, and understanding its effects on society has never been more critical.
This week’s
DLTJ Thursday Threads
delves into recent studies and discussions of why misinformation is spread on platforms and ways to counteract it.
As platforms continue to shape the way we communicate and process information, they also spark moral outrage and other intense emotions that can lead to the further spread of false content.
Researchers are exploring how these dynamics unfold, as well as the roles of opportunists who exploit these platforms for personal or political gain.
As we navigate these challenges, there are things that individuals can do and things that we could expect platforms to do to reduce the impact of misinformation.
While individuals can adopt practices to avoid contributing to misinformation, there is also a call for platforms to refine their moderation strategies, such as combining fact-checking with community-driven initiatives.
Amidst these discussions, the potential impact of social media on adolescent wellbeing remains a concern, with experts debating its true role in rising mental health issues among young adults.
Did you really read that article?
Moral Outrage
fuels the spread of misinformation online.
Maybe that outrageous article wasn’t pushed to you because of moral outrage. It could be opportunists
exploiting online conspiracy theories
for influence and profit.
We can clean up social media from the ground up:
strategies
to avoid becoming a ‘misinformation superspreader’ on social media.
For a more top-down approach, we could insist that platforms combine fact-checking and community notes for
better social media content moderation
.
On the other hand, research showed that the
community notes system fails to curb misinformation
on social media.
Exploring the
complex impact
of social media on teen mental health.
This Week I Learned
: most plastic in the ocean isn’t from littering, and recycling will not save us.
This week’s cat
Also on DLTJ this past week:
Another Saturday, another #TeslaTakedown
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
Moral Outrage Fuels Spread of Misinformation Online
“The vast majority of misinformation studies assume people want to be accurate, but certain things distract them,” says William J. Brady, a researcher at Northwestern University. “Maybe it’s the social media environment. Maybe they’re not understanding the news, or the sources are confusing them. But what we found is that when content evokes outrage, people are consistently sharing it without even clicking into the article.” Bra…

In OCLC v Anna’s Archive, New/Novel Issues Sent to State Court


The U.S. District Court for the Southern District of Ohio released an
opinion
in the case of
OCLC v. Anna’s Archive
.
As you may recall, the case stems from an accusation that Anna’s Archive—
a search engine for ‘shadow libraries’
—scraped the content of OCLC’s WorldCat.
Anna’s Archive itself is an anonymous effort, and OCLC named one person in the lawsuit—Maria Matienzo—with a weak and dubious connection to Anna’s Archive.
Here are bits of the court’s order from its introduction (the start of the order) and conclusion (at the end):
This case is about data scraping. Plaintiff Online Computer Library Center, Inc. (“OCLC”) is a non-profit organization that helps libraries organize and share resources. In collaboration with its member libraries, OCLC created and maintains WorldCat—the most comprehensive database of library collections worldwide. OCLC alleges that a “pirate library” named Anna’s Archive along with Maria Matienzo, and other unknown individuals (collectively, “Defendants”) scraped WorldCat’s data. OCLC claims that, in doing so, Defendants violated Ohio law. Specifically, OCLC invokes causes of action arising under the Ohio common law of tort, contract, and property, as well as a provision of the Ohio criminal code.
But whether Ohio law prohibits the data scraping alleged here poses “novel and unsettled” issues. No Ohio court has ever applied its law as OCLC would have this Court do (as far as the Court is aware). Nor have courts uniformly applied analogous laws of other jurisdictions that way. So, to resolve this case, the Court would need to answer “novel and unsettled” questions about Ohio law.
When that is true—when a federal court faces “novel and unsettled” state-law issues—the federal court may certify those issues to the state’s high court. Unwilling to sleepwalk into a drastic expansion of Ohio law, this Court thus resolves to certify the issues presented here.
[…]
The Court is sympathetic to OCLC’s situation: a band of copyright scofflaws cloned WorldCat’s hard-earned data, gave it away for free, and then ignored OCLC when it sued them in this Court. But mindful that bad facts sometimes make bad law, the Court requests that an Ohio court intervene before this Court makes any new state tort, contract, property, or criminal law.
The Court resolves to CERTIFY the novel Ohio-law issues identified above to the Supreme Court of Ohio. Plaintiff’s counsel and Matienzo’s counsel are ORDERED to propose an order containing all the information Ohio Supreme Court Practice Rule 9.02 requires by April 11, 2025. The parties may file their proposed orders separately, or, if they so choose, they may file one joint proposed order. The Court will finalize a certification order afterward.
OCLC’s motion for default judgment is DENIED without prejudice.
See Lammert v. Auto-Owners (Mut.) Ins.,
286 F. Supp. 3d 919, 928-29 (M.D. Tenn. 2017) (adopting this same dispositio…

My protest signage improved at this week’s #TeslaTakedown


My protest sign for the #TeslaTakedown today.
I’m a long way from a career change to graphic design or protest communications, but this week was a definite improvement.
About a half dozen people asked for pictures of my sign.
That’s a good signal, so I’m including instructions below on printing and making one yourself.
It was another windy, gloomy day at the
Easton Tesla store
, but the number of people increased from
last Saturday
.
One organizer said between 450 and 500 people, which seemed about right to me.
It was just a little more than we had last week.
The weather forecast for next week is about 15 degrees warmer, so it will be interesting to see if the families with young children come out again like on
March 8th
.
The entertainment definitely improved.
Someone set up an amplified acoustic guitar and a microphone, and people took turns singing.
We marched two laps around the block, and there were many more honks and cheers from the cars driving by.
Promptly at 5:30, the organizers walked around and asked people to leave to be respectful of the Columbus police dialogue team that had been called out for what was advertised as a one-hour protest.
That seemed reasonable.
This week’s protest sign
My protest sign at the #TeslaTakedown.
Back to basics, I thought.
People are driving by quickly, so too much text won’t be read.
So this was the idea.
Set the context: “Our GOVERNMENT was FINE.”
Deliver the punchline: “Now it is MUSKed UP!”
Clear call-to-action: “FIRE ELON!” (in a flaming font, no less)
And that seemed to work.
Refer to the
March 8th blog post
for instructions on creating the sign.
If you want one too, I’ve uploaded the
6-page PDF of page tiles
to make the sign.
When you print them, line them up with three on top and three on the bottom.
Then, trim the bottom and right edges of each page.
For the two right-most pages, there will be a lot of extra, unused space to cut off, and there are crop marks you can use to trim at just the right spot.
There are a few millimeters of overlap between pages, so your trimming doesn’t have to be exact.
Then line up the pages and tape them to a poster board (or, as in my case, a recycled campaign yard sign.)
This is set up to make a 26″ by 16″ sign — the exact dimensions of a typical campaign yard sign!
Ping me on
Mastodon
or
Bluesky
if you use it, and include a picture if you’d like!
So now that I’ve shown improvement week-by-week, I need to figure out how to step up my game for next Saturday…

Issue 113: More on Copyright and Foundational AI Models


Two years ago this month, I wrote a
DLTJ Thursday Threads
article on the
copyright implications of foundational AI models
.
A lot has happened in those 24 months.
This issue mostly focuses on lawsuits, plus an announcement of a service offering image generation from licensed content.
These articles highlight the growing tension between content creators and technology companies as AI technologies increasingly rely on large datasets that include licensed and, in some instances, pirated content.
From late 2023, the
New York Times sues OpenAI and Microsoft
for alleged copyright infringement in AI training (with late-breaking update).
U.S. judge partially favors OpenAI while
permitting unfair competition claim
in authors’ copyright lawsuit in this ruling from early 2024.
Last month Thomson Reuters
wins landmark U.S. AI copyright case
, potentially establishing a legal precedent.
Microsoft
guarantees legal protection for Copilot users
from copyright lawsuits.
Meta’s training of its AI with pirated LibGen books
sparks legal and ethical debate
.
Nvidia denies copyright infringement
in the use of shadow libraries for AI training.
Getty Images launched an
AI image generator
using its licensed library in 2023.
This Week I Learned
: “But where is everybody?!?” — the origins of Fermi’s Paradox
This week’s cat
Also on DLTJ this past week:
In OCLC v Anna’s Archive, New/Novel Issues Sent to State Court
: The case of OCLC against Anna’s Archive, accused of “data scraping” from OCLC’s WorldCat, takes a turn as the U.S. District Court for the Southern District of Ohio decides to certify several “novel and unsettled” legal questions to the Supreme Court of Ohio.
My protest signage improved at this week’s #TeslaTakedown
: My improved sign said “Our GOVERNMENT was fine. Now it is MUSKed UP! FIRE ELON!” Read the post for instructions on printing your own copy of this protest sign.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
New York Times sues OpenAI and Microsoft for alleged copyright infringement in AI training
The New York Times sued OpenAI and Microsoft on Wednesday over the tech companies’ use of its copyrighted articles to train their artificial intelligence technology, joining a growing wave of opposition to the tech industry’s using creative work without paying for it or getting permission. OpenAI and Microsoft used “millions” of Times articles to help build their tech, which is now extremely lucrative and directly competes with the Times’s own services, the newspaper’s lawyers wrote in a complaint filed in federal court in Manhattan.

New York Times sues OpenAI, Microsoft for using ar…

Holy cow—did the people show up for today’s #TeslaTakedown!


I don’t know how many there were at the protest today in front of the
Easton Tesla store
, but for the first time we covered all four corners of the intersection.
I think there were at least 600 people…maybe more.
Some observations:
The weather was good—windy, but warm—and the families with young children did come out again. But there were just MORE people there overall.
This week I recognized more cars making a circuit around the block. More people honking with thumbs up, turning around, then coming back again. I don’t remember seeing that on past Saturdays.
There were more Tesla sedans driving by than I remember seeing in the past. Quite possibly, they were just making the circuit around the block, too.
I’m starting to recognize familiar faces at each protest.
There was no live music this time, but that was okay because there was definitely more noise from the sidewalks and more energy in the air.
The Proud Boys made noise about coming in counter-protest, but I didn’t see them.
One of the event marshals said they were there early, but the police effectively separated them. As the panorama picture shows, though, we had all four corners covered and we were raucous.
This week’s protest sign
This week’s #TeslaTakedown protest sign.
I went off-script this week with a sign about the political nonsense we have at the moment.
It is a play on the phrase “The call is coming from inside the house!” — a
famous movie trope
where the police tell the person in a home that they have traced the antagonist’s call to that home.
In this case, the danger to democracy is coming from
inside the White House!
Or, at least, that is what I was aiming for.
This sign probably only gets one week’s use; let’s hope by next week, one or more people on this sign are fired because of the release of what sure looks like classified information on a Signal group chat.
If you want to use this 26″ by 16″ sign for yourself, I’ve made it
available for download
.
Ping me on
Mastodon
or
Bluesky
if you use it, and include a picture if you’d like!
My protest sign at the #TeslaTakedown.

Issue 114: Digital Privacy


This week’s
DLTJ Thursday Threads
looks at digital privacy concerns from the commercial perspective.
I think next week’s article will be a summary of recent happenings with government surveillance activities.
Late last month, Amazon launched Alexa+, and with it a flurry of privacy concerns. Why? Because
Amazon now mandates cloud uploads
to process Echo voice commands.
Using the technologies already in buildings, employers can monitor employee activities,
raising privacy concerns
.
Last year the FTC released a report that, while surprising no one, exposed the
extensive data collection
by social media platforms.
Speaking of collecting personal data, all of it ends up in databases of various sorts, and
Fiverr freelancers use tools made for law enforcement and insurance companies
to sell access to anyone.
This Week I Learned
: We started capitalizing the pronoun “I” to distinguish it from similarly typeset letters.
This week’s cat
Also on DLTJ since the last newsletter was published:
Holy cow—did the people show up for today’s #TeslaTakedown!
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
With Alexa+ launch, Amazon mandates cloud uploads for Echo voice recordings
Amazon has disabled two key privacy features in its Alexa smart speakers, in a push to introduce artificial intelligence-powered “agentic capabilities” and turn a profit from the popular devices. Starting today (March 28), Alexa devices will send all audio recordings to the cloud for processing, and choosing not to save these recordings will disable personalisation features.

Everything you say to an Alexa speaker will be sent to Amazon
, The Conversation, 28-Mar-2025
Starting a few weeks ago, Amazon required Echo users to send all voice recordings to its cloud, eliminating a privacy feature that allowed for local processing.
This change coincides with the rollout of Alexa+, a subscription service that enhances the voice assistant’s capabilities, including recognizing individual users through a feature called Voice ID.
Users who previously opted out of sending recordings will find their devices’ Voice ID functionality disabled.
Amazon justifies this move by stating that the processing power of its cloud is necessary for the new generative AI features.
Privacy concerns, anyone?
Especially given Amazon’s history of mismanaging voice recordings and allowing employees to listen to them for training purposes.
The company has previously faced penalties for storing children’s recordings indefinitely and has been involved in legal cases regarding the use of Alexa recordings in criminal trials.
Surprise, surprise: this shift would appear to prioritize t…

My Public Archive of Protest Signs

My Public Archive of Protest Signs

Formerly just
#TeslaTakedown protest
signs, this post is now more general — protest signs against the growing authoritarianism that Donald Trump is trying to normalize.
For all except the two at the bottom, I’ve included a link where you can download a PDF to print your own.
Please use these if you’d like; if you want to give me something in exchange, just tag me on
Mastodon
or
Bluesky
so I know how far these have spread.
Also, Marc Lee from
Free Protest Signs
reached out on Bluesky to let me know about his website of signs.
If you don’t like something below, maybe one of his will suit your mood!
Respect My Authoritah!
Respect My Authoritah!
protest sign, first used on 14-Jun-2025
Of the two signs that I’m bringing to the
#NoKingsInAmerica protest
, this is the snarky one.
Trump’s attitude and actions in the ICE raid protests in Los Angeles and elsewhere reminded me of Eric Cartman from South Park screaming, “
Respect My Authoritah!
”
Download
and print your own 26″ by 16″ version of this sign.
Three Branches
Three Branches
protest sign, first used on 14-Jun-2025
I’ve had this one in my protest design document for a while, and now seems like a very good time to make it real.
The U.S. Constitution lays out three branches of government, and right now we are seeing inaction from the legislature, abuse from the executive, and a judiciary under siege.
We need the legislature to step up, support the judiciary, and tell the executive to knock it off.
Download
and print your own 26″ by 16″ version of this sign.
Dictators Hold Parades
Dictators Hold Parades
protest sign, first used on 14-Jun-2025
My daughter was listening to
a story from
The Daily
from the New York Times and shouted out, “that’s my sign!”
The guest had been talking about how Trump’s parade in Washington, DC, is un-American and something that dictatorships do.
A little photo searching and editing later…this sign was born.
Download
and print your own 26″ by 16″ version of this sign.
All of the ABOVE!
All of the ABOVE!
protest sign, first used on 26-Apr-2025
The meanness, the illegality, the stupidity…it is all more than I thought possible, and it is certainly not what we deserve from our government.
And it is not just one of these attributes, but all of them coming from all of this administration’s elected, confirmed, and senior leaders.
Download
and print your own 26″ by 16″ version of this sign.
Elected Assholes
Elected Assholes
protest sign, not used by the author
This crap is well past getting out of hand, and I wanted a sign that reflected that.
The government—in my name as one of its citizens—is deporting people without due process?
It is bullying foreign leaders in the Oval Office?
It is recklessly dismantling medical research, food safety programs, and environmental controls?
This doesn’t represent my values, nor—I’d wager—the values of most of the country.
The focus group (my family members) weren’t a fan of the unnecessary c…

Issue 115: Public and Private Camera Networks

Issue 115: Public and Private Camera Networks

After
last week’s issue on digital privacy
, I thought I’d focus this week on government-sponsored or -enabled surveillance.
As I dug through my store of saved articles, though, I realized I had quite a number of articles about a particular kind of surveillance: camera networks.
These are often municipal-sponsored systems of license plate readers, but there are also networks of private systems—and, of course, attempts to combine the output of all of these networks.
So that is the focus of this week’s
Thursday Threads
issue:
An investigation by a newspaper editor highlights privacy concerns and legal challenges in rural
Virginia’s use of license plate reading cameras
. (2025)
Debate over the privacy concerns and legal challenges of license plate readers is nothing new, as
this 2012 article shows
.
What happens when you put equipment not meant for the internet onto the internet? A security flaw in Motorola’s automated license-plate-recognition systems
exposes real-time vehicle data and video feeds online
. (2025)
A license plate reader in every tow truck? Privacy concerns of a
private surveillance network of 9 billion license plate scans
enable widespread vehicle tracking. (2019)
Similar to “the call is coming from inside the house,” the surveillance is coming from inside your community. Privacy concerns emerge as
HOAs nationwide install Flock Safety’s license plate readers
, facilitating police surveillance. (2023)
How about we network all of these cameras together?
AI-powered surveillance system
spurs privacy concerns as adoption grows in U.S. (2023)
If we’ve got to have this tech, we might as well have some fun with it.
Artist’s Traffic Cam Photobooth
sparks controversy and cease-and-desist over creative use of NYC traffic cameras. (2024)
This Week I Learned
: The word “scapegoat” was coined in a 1530 translation of the Bible.
This week’s cat
Also on DLTJ since the last newsletter was published:
My Public Archive of #TeslaTakedown Protest Signs
. Print one off and take it to
your
next protest.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
Privacy Concerns and Legal Challenges in Rural Virginia’s Use of License Plate Reading Cameras
The research for State of Surveillance showed that you can’t drive anywhere without going through a town, city or county that’s using public surveillance of some kind, mostly license plate reading cameras. I wondered how often I might be captured on camera just driving around to meet my reporters. Would the data over time display patterns that would make my behavior predictable to anyone looking at it? So I took a daylong drive across Cardinal Country and asked 15 law e…

Issue 116: Government Surveillance

Issue 116: Government Surveillance

After
DLTJ Thursday Threads
issues on
digital privacy
and
surveillance camera systems
, I’m focusing this week on the more general topic of government-sponsored or -enabled surveillance.
In an era defined by ubiquitous data collection and ever-advancing technologies, the line between public safety and individual privacy is growing alarmingly thin.
From President Trump’s executive order to dismantle inter-agency “data silos” and Elon Musk’s DOGE initiative weaving federal databases together, to Oracle co-founder Larry Ellison’s vision of AI-powered cameras and drones monitoring citizens, the U.S. surveillance apparatus is expanding at breakneck speed.
Meanwhile, programs like the Pentagon’s “Locomotive”—which turns innocuous dating-app location pings into real-time tracking tools—and the data broker–driven sharing of driving and personal records with law enforcement underscore how private and public interests have converged to create a modern panopticon.
So that is the focus of this week’s
Thursday Threads
issue:
Trump’s executive order dismantling government data silos and Musk-led DOGE initiative fuel
fears of a U.S. surveillance state
.
More details about how
DOGE is building an Immigrant Surveillance Database
with Social Security and IRS Data.
In cases where the government doesn’t already have the data,
spy agencies want to centralize commercial data purchases
in a new one-stop portal.
1984 is here and some people want it:
Oracle’s Larry Ellison
proposes Orwellian AI camera-and-drone surveillance network, stoking privacy fears.
LexisNexis parent Relx
lobbies against data broker restrictions
amid FISA Section 702 reauthorization clash.
Dating app location data powers
Pentagon’s “Locomotive” program
to track phones worldwide.
Apple sues U.K. government
over a secret order for backdoor access to encrypted data on phones, and it removes the Advanced Data Protection from U.K. market rather than giving in.
This Week I Learned
: “Leeroy Jenkins!!!!” was staged.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these types of stories,
follow me on
Mastodon
where I post the bookmarks I save. Comments and tips, as always, are welcome.
Trump’s Executive Order and Musk-Led DOGE Initiative Fuel Fears of a U.S. Surveillance State
In March, President Trump issued an executive order aiming to eliminate the data silos that keep everything separate. Historically, much of the data collected by the government had been heavily compartmentalized and secured; even for those legally authorized to see sensitive data, requesting access for use by another government agency is typically a painful process that requires justifying what you need, why you need it, and proving that it is used for those purposes only. Not so under Trump. This is a per…

Issue 117: Local Government Surveillance

Issue 117: Local Government Surveillance

After previous
DLTJ Thursday Threads
issues on
digital privacy
,
surveillance camera systems
, and
federal government systems
, I’m focusing this week on what is happening at the local level—mainly in policing.
This closes the loop on surveillance by coming back around to local activity — although it takes an unexpected jump back to the national level with a story published last month.
Law enforcement surveillance has dramatically evolved, influenced by cutting-edge technology and controversial practices.
This thread of stories highlights the complexities and ethical challenges arising from deploying these advancements.
From sophisticated smartphone tracking tools like Fog Reveal to the sprawling data collection activities of Flock’s AI-powered license plate readers, these stories underscore the growing tensions between public safety objectives and personal privacy rights.
So that is the focus of this week’s
Thursday Threads
issue:
In 2022, we learned of a
local police surveillance tool called Fog Reveal
that pinpointed mobile phones and de-anonymized users.
Two years later,
they were still at it
—this time asking police to augment Fog Reveal’s data to include information about doctor visits. (2024)
NYPD has
multi-million dollar contracts with controversial surveillance firms
that scrape social media and post fake users to get surveillance engagement. (2023)
Advances in surveillance technology mean we’ve seen the unchecked growth of
Real-Time Crime Centers
across America. (2023)
Police and other public officials have special protections from data brokers, and
West Virginia officers sue Whitepages
over unlawful info disclosure. (2024)
Here’s the recent national twist on local law enforcement surveillance:
ICE’s covert use of Flock’s AI camera network
for immigration enforcement. (2025)
This Week I Learned
: Ammonium chloride may be the sixth basic taste.
Before we start…it is important to call out what is happening in the United States.
The Trump administration is using modern-day authoritarian tactics to frighten citizens into accepting a new normal.
I am more angry at what my national leaders have done than I am frightened, and I hope you will express your outrage, too, at a
No Kings in America protest this weekend
.
These are drafts of the two signs I’ll be waving:
In light of Elon Musk stepping back from a public role in the administration, I’ll retitle my
#TeslaTakedown protest sign blog post
(although, in keeping with cool-URLs-don’t-change practice, it is at the same web link) and will be adding these two signs when they are finalized.
You are welcome to visit that post to download printable versions of these signs or any other ones that I’ve made.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to
DLTJ’s Thursday Threads
, visit the
sign-up page
.
If you would like a more raw and immediate version of these t…