Weapons of Math Destruction Part 1

Bryan Alexander is facilitating an online book club reading of Cathy O’Neil’s Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.  I am about two weeks behind (typical), so I will focus on just a couple of his questions for Part 1.

A. “What would it take for an education algorithm to meet all of O’Neil’s criteria for not doing damage?”

The big problem with almost all educational measurement is the use of clumsy-at-best proxies (O’Neil’s term) for the learning we actually want to measure.  Since the fundamental output metric is a test, with all the possibilities for manipulation that suggests, when we then try to measure which input changes improve that output we are at least two levels of abstraction removed.  Until we can measure educational outcomes some way other than by means of a crude, manipulable proxy, I’m not sure we can fix this.

B. “What are the best ways to address the problem of “false positives”, of exceptionally bad results, of anomalies?”

I think the best way to address this problem is to place some limits (preferably not themselves determined by an algorithm) on the kinds of decisions we allow algorithms to make without human input.  The potential harm of a bad book recommendation from Amazon is much lower than that of, say, a bad teacher evaluation.  That probably means thoughtful review of every adverse algorithmic recommendation by at least one live human being.  Of course, this undermines the efficiency and scale that algorithms are designed to create.  An important step is to acknowledge that algorithms are not neutral even if they manage not to be arbitrary.  They encode the assumptions and biases of their creators, and acknowledging those assumptions and biases is a key part of the design process.

The DC schools example draws attention to the importance of checking for flawed input data.  After all, the algorithm is only as accurate as the data you feed it.

Notes: O’Neil’s three criteria for a Weapon of Math Destruction are “opacity, scale, and damage.” She uses the initialism WMD.  I wish she had come up with something else, because of the namespace confusion with chemical, biological and nuclear weapons.

Opacity makes me think of Frank Pasquale’s The Black Box Society, which I haven’t read yet.  The synopses of Pasquale’s book make me wonder how his and O’Neil’s work intersect.




Tooting Alone

This month, the early adopters are all on Mastodon. Mastodon is actually a server implementation of OStatus (which used to be StatusNet, which was originally on identi.ca). The TL;DR of OStatus is “like Twitter, but federated.” As of this morning there are almost 900 active instances. Since the software is open, different instance administrators can set their own policies, and users can find an instance whose culture agrees with them.

Mastodon also has an option to operate a single-user instance, and this is where things get less clear. Mastodon is designed to show three different timelines: the user’s personal timeline, a public timeline for the local server, and a federated timeline. On a single-user server the local timeline will show only the “toots” (yes, that’s what they call a status post) of the instance’s one user, and the federated timeline will look very similar to the single user’s personal timeline. In managing your own presence on the network, you simultaneously isolate yourself from it. It’s possible, however, that this won’t end up mattering very much. I can’t remember the last time I looked at the Twitter public timeline. If OStatus ends up working the same way, it won’t matter how many people are on your instance, because you will interact with the network through the people you follow, even if they are on many different instances. While the local timeline won’t show much, the federated timeline, which is sort of a second-degree network (see https://cybre.space/users/nightpool/updates/13933 ), looks as if it may end up, on a single-user server, as a very personalized feed.

This ties in to the IndieWeb movement with its idea of POSSE (Publish on your Own Site, Syndicate Elsewhere), and there are already connectors to publish from tools like Withknown to Mastodon. Withknown is great for publishing, but not very good for aggregating. There is always RSS, and in fact Mastodon autogenerates Atom feeds per user (site.tld/users/username.atom). This leaves you using one application to read and another one to reply. I really wish something like Mark Pesce’s Plexus (https://github.com/mpesce/Plexus) were still active. How hard would it be to build a personal dashboard that would bring together RSS reading / OStatus / blogs / etc.?
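As a sketch of what the aggregation half of such a dashboard might look like, here is a minimal example of parsing and merging per-user Atom feeds using only Python’s standard library. The feed XML below is a hypothetical, trimmed-down sample for illustration, not real Mastodon output:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# A trimmed-down, hypothetical per-user Atom feed of the kind Mastodon
# serves at site.tld/users/username.atom.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>alice's toots</title>
  <entry>
    <title>First toot</title>
    <updated>2017-04-05T09:00:00Z</updated>
  </entry>
  <entry>
    <title>Second toot</title>
    <updated>2017-04-06T10:30:00Z</updated>
  </entry>
</feed>"""

def parse_entries(atom_xml):
    """Return (updated, title) pairs from an Atom feed document."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title")
        updated = entry.findtext(ATOM_NS + "updated")
        entries.append((updated, title))
    return entries

def merge_feeds(*feeds):
    """A toy aggregator: merge entries from several feeds, newest first.

    ISO 8601 timestamps sort correctly as plain strings, so no date
    parsing is needed for this sketch.
    """
    merged = []
    for feed in feeds:
        merged.extend(parse_entries(feed))
    return sorted(merged, reverse=True)

if __name__ == "__main__":
    for updated, title in merge_feeds(SAMPLE_FEED):
        print(updated, title)
```

A real dashboard would fetch each followed user’s feed over HTTP and merge them the same way; the reply side is the part RSS alone can’t solve.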

Social Media and Tool Creep

Last week, Mike Caulfield lamented that social media is poorly suited to enhancing human potential. If you think about it, this shouldn’t be too surprising, since it wasn’t designed for that.  Facebook was, after all, first and foremost a social tool, a virtual version of the paper facebooks new college students resorted to in ages past to figure out who that cute guy or girl in their English class was.

For the task for which they were originally designed, fostering social connections between people, Facebook, Twitter and other social platforms work well, but then something happened.  As social platforms moved to the center of our online lives, we wanted them to be the hubs not just of our social interactions but of our information gathering.  This dovetailed nicely with the platform creators’ quest to grab, quantify and monetize more and more of our attention, but, as Mike points out, it was not necessarily good for us.

D’Arcy Norman quoted an old post that touched on the same issue.  In 2008 he wrote about what he recently dubbed the “real-time toll”:

Every time I read an update by someone that I care about, I think about that person – if only for a second – and my sense of connection is strengthened.

But, I fear that the strengthened social connections are not worth the cost borne in superficial thinking.

This led me to a little experiment. I looked at my Facebook activity feed for the almost-completed month. I’ve interacted with only about 75 entities, and two-thirds of those are people in the county I live in.  This comes with the usual caveat that it includes outbound plus inbound tags but not inbound likes and reactions.

Maybe the key to managing D’Arcy’s real-time toll is to follow only people you care about enough that whatever superficial thinking it causes is worth it.

I’m going to presuppose that social networking sites are not very good tools to expand human potential.  The ratio of signal to the noise of social interaction is just too low.  What would such a tool look like?  Is a good list of RSS feeds adequate, or is something like fedwiki, wikity, or a choral explanations platform necessary?  If you end up with something that isn’t extremely decentralized, how do you generate beneficial network effects while keeping the signal to noise ratio high enough to generate value?

Verifying Academic Credentials with Blockchain

This morning, a college classmate posted a link to this Campus Technology article on blockchain-based transcripts.  It turns out the University of Nicosia is already doing this. Campus Technology used the d-word, disrupt, to describe the potential of this approach.  On the plus side:

  • This would allow verification of credentials without contact with the issuing institution.  That would seem to save lots of time and trouble in registrars’ offices everywhere.
  • The permanence of the ledger would mean that it wouldn’t matter if a credential-issuing entity ceased to exist.
  • Not having to maintain paper trails might make traditional institutions more willing to offer microcredentials.

Pitfalls include:

  • Security – I’m a blockchain novice, but my understanding is that the ledger is quite secure because so many users are verifying it.  That said, even more important than actual security is the perception thereof.  It may take a long time before credential audiences (education institutions, employers, etc.) trust blockchain credentials.
  • Privacy – Blockchain records are permanent and public.  How do you ensure that only authorized viewers can see the details of a credential (courses and grades)?  What if you don’t want to publicize your attendance at a particular institution?
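The verification step can be sketched without a real blockchain: the issuer publishes only a hash of the credential document to the ledger, and a verifier recomputes the hash from the document the graduate presents. This is a simplified illustration, with a Python set standing in for the chain; it is not a description of how the University of Nicosia actually implements this:

```python
import hashlib

# A stand-in for the public ledger: in reality each entry would be a
# blockchain transaction, here it is just a set of published hashes.
ledger = set()

def fingerprint(credential_text):
    """Hash the credential document; only this digest goes on the ledger."""
    return hashlib.sha256(credential_text.encode("utf-8")).hexdigest()

def issue(credential_text):
    """Issuer publishes the hash (not the credential itself) to the ledger."""
    ledger.add(fingerprint(credential_text))

def verify(credential_text):
    """Anyone holding the document can verify it without contacting the issuer."""
    return fingerprint(credential_text) in ledger

diploma = "Jane Doe, B.Sc. Computer Science, Class of 2015"
issue(diploma)
print(verify(diploma))                             # prints True
print(verify(diploma.replace("B.Sc.", "Ph.D.")))   # prints False
```

Note that publishing only the hash also speaks to the privacy pitfall above: the ledger reveals nothing about the credential unless the holder chooses to share the underlying document.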


Coding and Literacies

This morning my feed is full of discussion of the coding-for-all movement.  Anya Kamenetz asks how long “I’m not a coder” will be a socially acceptable thing to say.  Some, including The Atlantic’s Melinda Anderson, are more skeptical.  I’m not sure this is the right question, any more than “Should everyone learn to repair cars?” would be.  A more important issue is fundamental understanding of how systems work.

Let’s go back to cars.  I don’t know enough to repair my own car, but I do understand, on a basic level, how cars work.  Refined petroleum is ignited by a spark from a battery in a closed cylinder, the resulting explosion moves a piston while creating exhaust gases, and the moving pistons turn a drive shaft.  This shared understanding of how cars work (and of the fact that even providing a modest 12 volts requires a fairly large battery) means it is widely understood that creating an inexpensive zero-emission vehicle is a hard problem.

Contrast this with the collective understanding of digital encryption, given FBI v Apple and the preceding discussions of encryption backdoors.  I have yet to find an expert on how encryption works who believes that a mechanism which would allow law enforcement to bypass encryption without allowing hostile governments and criminal actors to do the same is technically possible.  See this summary for one example.  However, those with less technical knowledge don’t seem to share this belief.

Perhaps the key is not being able to code per se, but having enough fundamental knowledge of how computers work to share an understanding of what, for a computer, is possible or impossible, easy or difficult. The broader question when designing education is, “Which systems are important enough that we need a shared understanding of their fundamental principles in order for society to function well?”

Blogs are blogs and wikis are wikis and never the twain….

Mike Caulfield, whose latest project, Wikity, brings to WordPress some features of federated wiki, asks whether an architecture that would allow data to flow seamlessly between blogs and wikis is a desirable thing.  In a comment, Kartik Agaram suggests that tagging makes blogs behave in a more wiki-like way.

To unpack this, I found it helpful to think all the way back to physical libraries. The whole notion of card catalogs and call numbers is a system designed to make physical objects findable. No matter how many cards referred to an item, the call number (a primary key, as it were) pointed to one spot on a shelf.  There has been a tendency to think of tagging as fundamentally different because the artifacts are digital, but as Mike points out, the web is still location based, even if the locations are virtual.  Tagging merely allows, to extend the card catalog analogy, a theoretically infinite number of “subject” cards for any given item, and any number of items under any given subject.
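The card-catalog analogy can be made concrete: one canonical location per item (the call number, or primary key) plus a tag index mapping any number of subjects to any number of items. A minimal sketch, with hypothetical post addresses:

```python
from collections import defaultdict

# One canonical "shelf location" per post, as with call numbers.
# Keys and titles here are made up for illustration.
posts = {
    "/2016/03/blogs-and-wikis": "Blogs are blogs and wikis are wikis...",
    "/2016/02/web-annotation": "Some Thoughts on Web Annotation",
}

# The tag index is the stack of "subject" cards: many tags per post,
# many posts per tag, but every card points back to one canonical key.
tag_index = defaultdict(set)

def tag(post_key, *tags):
    """File a 'subject card' for each tag, pointing at the canonical key."""
    for t in tags:
        tag_index[t].add(post_key)

tag("/2016/03/blogs-and-wikis", "wiki", "blogging")
tag("/2016/02/web-annotation", "annotation", "blogging")

# Looking up a subject returns pointers, not copies: each post still
# lives at exactly one address.
print(sorted(tag_index["blogging"]))
```

The point of the sketch is that tagging adds cards, never shelves: lookup multiplies, but the location stays singular.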

Given that the blog is clearly one person’s writing and thought, it makes more sense for it to have a single canonical address.  Wiki is more reference like and seems to lend itself better to Mike’s notion of connected copies, since the question of authorship is less important.

Now on to Mike’s actual question.  How valuable is it to be able to seamlessly move data across this divide?  I think the answer depends on how important you think the attribution chain is.  If it’s not important at all, just cut and paste.  If it is important, is it equally important in both contexts?

For the blog, some sort of attribution clarifies what is the author’s own thought versus what came from somewhere else.  However, when that somewhere else is a wiki, you are dealing with a source that is designed not to be static.  All of the web is like that, in fact, which is why we have “accessed on” fields in web citations and everyone should love the Internet Archive Wayback Machine.  The very malleability of a wiki page may lessen its value as a source. Would a wiki-to-blog bridge, like a fedwiki fork, pull the entire history of a wiki document up to the point of citation?  It’s with connected copies that this sort of link makes more sense. Even if the copy you originally cited has disappeared, you might find another.

Going the other direction, one expects a blog post, with its time and date stamp, to be a fixed oeuvre, so it makes more sense as a source or reference for a wiki document. Its usually static nature also makes the process easier.

Having thought “aloud” through the use cases, I’m not in desperate need of a bridge. If reference by content grows in importance, it might make more sense.

Musical Theater and Transculturation

Well, I finally did it. I broke down and started listening to the Hamilton cast album. Hamilton is, for those who don’t know, the (now Grammy-winning) hottest thing on Broadway, a biographical musical about the founding father, duel victim and face of the ten-dollar bill, which stars show creator and MacArthur Fellow Lin-Manuel Miranda.

From the very opening line, it’s clear that this is not a period piece. Hip-hop and rap influences are immediately apparent, and I found that just a bit off-putting at first hearing.  Then I thought about why I found it off-putting.

Douglas Hofstadter uses the word transculturation to refer to the process of replacing cultural references when translating a text to a new language.  That’s sort of what’s happening here.  I’m sure that the founding fathers didn’t rap.  If you watched the performance of the show’s opening number on the recent Grammy telecast, there was an interesting juxtaposition.  Although the musical style is contemporary, the costumes are period, so you see the eighteenth century and hear something much more modern.

Of course Hamilton is far from the first show to do such a thing.  In West Side Story, Bernstein completely transculturated Romeo and Juliet in Verona into Tony and Maria in New York City.  Shows like Candide and A Little Night Music juxtapose modern music with older settings.  There’s (so far, I’m only a few songs in) one moment in Hamilton that evokes actual eighteenth-century musical style: Samuel Seabury’s[1] half of “Farmer Refuted”.  I wonder if Miranda is contrasting Loyalists and Patriots by having the former sing music that sounds eighteenth century and European, while the latter sound hip, modern, and American.

This sort of approach has some benefits. A noticeable spike in Google search volume for the term “Alexander Hamilton” followed the Grammy performance. On the other hand, it doesn’t exactly encourage one to consider an event in its own historical and cultural context. What do we gain and lose when we retell an historical story through a modern cultural lens?


[1] Students of religious history will recognize the name. Seabury was later the first Anglican bishop in the United States.

What should be the age of digital majority?

Last week Jessica Reingold of the University of Mary Washington suggested that educational institutions committed to personal student domains should make students’ development of such spaces a gradual process.  I asked how such a process might apply to younger netizens, since I have a couple at home and have been thinking about this. Jessica made an argument that surprised me.  I’d summarize it thusly, although by all means go read Jessica’s post and make up your own mind as to how shoddy a job of summarizing I have done:

  1. Adolescents say, write, and express many things they later regret (sometimes not all that much later in the grand scheme of things).
  2. Between the Internet Archive, Google and like entities, everything published to the public web is both public and permanent.
  3. Therefore, putting stuff on the public web before you are, say, 20, is a bad idea.

It struck me how Jessica’s suggestion puts “first blog post” in the same age range as “first vote” and “first (legal) alcoholic beverage” as the sort of event that goes along with no longer being a minor.  Tim Berners-Lee referred to the notion of an “age of digital majority” in a 1996 speech.  To the extent there is an age of internet majority, existing law sets it much lower.  Current US law allows a 13-year-old to sign up for a website without written parental permission. My kids are about that age, which is why I’m thinking about this issue now. Even setting aside the many children younger than 13 who have all sorts of social media accounts, this raises some questions.

Jessica suggests that young people should wait to have a permanent web presence until after society has judged them to have enough judgement to drive a car, vote in elections, or join the armed forces.  Why set the bar so high?  She writes:

Adolescents are still developing and discovering who they are (I’m still discovering who I am and i’ve finished my undergraduate degree!) and I’d imagine it’d be difficult to develop a digital identity without some sort of foothold on what you’d want that to be and if you’d want it to mimic your in-person identity or not.

Even though I’m middle aged by most measures, I know that my identity is still developing.  How does one decide one’s identity is stable enough to document?

The point at which the notion of personal digital identity is introduced can also affect the digital habits a person develops.  Jessica asserts that for new college students, “Sticking them with the task of trying to create a digital identity that’s not in the form of preset social media norms is like asking them to have multiple existential crises.”  Perhaps the difficulty she mentions happens because, by the time we discuss the notion of digital identity with a young person, they are well-trained feeders of the Facebook, Twitter and Google data mines.  Is there an opportunity to start the process earlier, so that a young person learns the habits of digital identity building, even if they are highly scaffolded, instead of those of preset social media norms?

Let me continue with a caveat that is big enough it probably should have been at the top of the post rather than buried 500 words down. The questions I’m asking and Jessica is, to her credit, trying to answer are not questions I had to deal with as an adolescent.  I was over Jessica’s suggested age threshold at the dawn of the public web, and was over 30 before I made my first blog post.

Even so, I wonder if Jessica’s cautious approach may unintentionally limit the extent to which a young person internalizes digital identity building. Alan Kay is reported to have said, “Technology is everything invented after you were born.” Does waiting until someone is 19 or 20 make it more likely that digital identity tools will feel like technology per Kay’s definition?

Unlike Jessica, I’m not sure how to go about all this.  The model she proposes has, I think, a good sense of progression from simpler to more complex tasks.  I do wonder how well tying the process to formal entities like majors and courses will hold up in the long run.  As I’ve worked on my digital footprint, I’ve consciously kept it disconnected from institutional affiliations.  As one moves through life, these come and go, and I worry that content tied closely to a long-past affiliation will be effectively orphaned.

Looking back at this, it reads much more like a rebuttal than I intended, but I’ll leave it out here anyway, hoping it will expand the conversation.


Some Thoughts on Web Annotation

Yesterday, Mike Caulfield pondered how one might make blog comments more connective by replacing them with annotated links. I think he was being purposefully provocative, since he titled it a “…..Proposal for Killing Comments….”  Nevertheless, I think there’s a lot to be said for the idea that publishing platforms should encourage less dialogue and more broad conversation.  This goes back to Mike’s thought about tools that help people “geek out” virtually.

His choice of the term “annotated links” was important because it made me think of another annotation project that I played with yesterday, hypothes.is, which counts Jon Udell of elmcity fame among its team members.  Collaborative annotation of the web has long been a feature just on the verge of changing everything, even before it was Diigo’s killer feature.

At first glance Hypothesis has much to recommend it.  There’s already a GitHub repository, and the software is designed to run in a Docker container, so running one’s own instance on a VPS should be straightforward, an important thing for anyone who lived through the life, death, and undeath of Delicious.  Hypothesis is also working on Browsertrix, software that archives a web page when it’s annotated.  After all, your brilliant annotations aren’t much use if the page disappears. Federation is further down the roadmap, but a self-hosted annotation server that would archive annotated documents and communicate with other servers doing the same thing looks to be not too far off.

This isn’t quite what Mike is talking about, but I wonder, based on the maxim “Don’t reinvent the wheel unless you have to,” how the two might fit together. Hypothesis, especially if you imagine a single-user instance, seems very close to a recreation of Vannevar Bush’s memex, with electronic storage replacing microfilm.  It also shares characteristics with fedwiki, a project Mike has championed for a couple of years now, with the added benefit of maintaining copies of the source materials. I can imagine a research workflow where high-level idea processing, outlining and drafting, happened in fedwiki, with links to Hypothesis-annotated, Browsertrix-archived web pages (digital notecards) for documentation. Build federation into both ends and you could allow individuals and groups to create and document research publicly all the way through the process.


Education and the Future of the Internet

I just read Jennifer Granick’s Black Hat keynote.  I highly recommend it.  I want to focus on one thing that Granick unpacked.  In discussing the history of technology law, she mentioned numerous instances, past and present, where policymakers propose and make law and policy even when their understanding of the technology and its history is poor.

A particularly salient example is the renewed call for law enforcement encryption backdoors.  I have yet to find any technical experts who even think such a thing will work, never mind be a good idea.  Unfortunately, many of the policymakers who will decide this issue are not known for their technical acumen.

Looking at the big picture, the question is something like this: “In a democratic society, how much do policymakers and the public at large need to know about technology in order for society to make informed decisions about the policies and laws that govern it?” Unless the answer is “almost nothing,” we aren’t doing what we need to do.

Consider computer survey courses for undergraduates.  They are almost always tightly focused on technology as tool.  How do you send an email or make a spreadsheet?  As I think about it, this makes quite a bit of sense.  One must understand at least what a piece of technology can do before one can have a very thoughtful discussion about what it ought to do and not do.  Unfortunately, that second discussion is rare.

For example, a discussion of remote device kill and wipe functionality points out their value if keeping sensitive data out of the hands of bad actors is more important than recovering a lost or stolen device. It does not, however, address the question of whether the device manufacturer, the wireless carrier the device uses, or the government should be capable (by design) and/or allowed (by law and policy) to trigger a kill or wipe without the device owner’s consent and/or knowledge.

A discussion of the history of the Internet leads Granick to the almost inevitable comparison of “hacker” culture, with its emphasis on tinkering and openness, and the expectations of safety and turnkey operation that more recent Internet arrivals expect.  As this year’s Beloit Mindset List points out, traditional college freshmen don’t remember a time before online shopping at Amazon. Accordingly, Internet culture for a large subset of the population is about viral videos and Facebook memes, not free software and innovation at the edge.

This conversation has been going on, off and on, since at least Anil Dash’s The Web We Lost. (text) (video) Is the gentrification of the Internet inevitable — perhaps even desirable?  If not, how do we teach about historical online culture in a way that encourages thoughtful discussions about which aspects of that culture are worth preserving and how we might preserve them?