The Black-Boxed Ideology of AWE
Antonio Hamilton and Finola McMahon
Methods
Description of course
This project began as a collection of midterm papers for a graduate seminar at the University of Illinois Urbana-Champaign titled English 584: Writing and Rhetoric in an Era of Algorithmic Culture. The course examined the role of algorithms in contemporary culture, with a focus on writing and rhetoric, and asked students to critique procedures and instructions as they relate to writing processes and production. Students were asked to critique an algorithm of their choice and write two papers drawing on course texts and outside research. One of those papers was a midterm in which the three enrolled graduate students were asked to “unbox” three algorithms that were black-boxed in some way. As a result of the themes that emerged during in-class discussions and the students' pedagogy-related interests, all three students decided to focus on AWE algorithms. Following each student's individual process, described below, the three students, along with their professor, put the three midterm papers in conversation, drew conclusions across the data collected, and assembled the information to create this project. Because two of the students selected VirtualWritingTutor as one of their objects of study, this project focuses on eight AWE programs rather than nine.
The data for this project were collected in the fall of 2021. These programs have likely changed and developed in the time since, especially with the launch of ChatGPT in the fall of 2022. However, the concerns discussed here, including the inconsistency of features and the black-boxing of these algorithms, remain relevant.
Narrative of data collection
First participant's narrative, Antonio Hamilton:
Once I selected Grammarly, VirtualWritingTutor (VWT), and WriteToLearn (WTL) as my three AWE algorithms, I developed a set of basic criteria by which to assess each automated evaluation program. Because each of these AWE programs is intended for a different audience, I identified the elements listed below to determine what functions each AWE provided and how it performed those functions:
- The dimensions of writing each AWE assessed (e.g., plagiarism, grammar, word choice)
- The type of feedback provided, if any
- The data used by the software's algorithm
- Other information about the software's algorithm, if provided
I then created charts that cataloged and categorized the features and information discovered on each AWE's website. After investigating each AWE's website for the four elements above, I developed further questions to send to the developers of the software in order to gather more information about any criteria that were not apparent on the website or that remained obfuscated. Other questions were developed individually, based on what was inductively discovered on each AWE's website. In the case of VWT, I conducted an interview with the creator, following up on the questions asked in my initial email. I also developed questions and categorized features by personally using the programs when that option was available. WriteToLearn, however, can only be used if purchased by a school's administrator; therefore, I could only draw inferences from the information provided on its website.
Second participant's narrative, Finola McMahon:
My process began with selecting Criterion, Outwrite, and ProWritingAid (PWA) as the three AWE programs I would analyze, in the hope that they would cover three different approaches to AWE. Criterion is strictly academic and accessible only through schools and universities. ProWritingAid is more publicly available, although it still primarily targets professional or academic contexts. Outwrite is not framed as targeting a specific audience. I hoped that these different approaches would create a picture of AWE software overall. With these audiences in mind, I identified the following elements:
- The features and functions of each AWE, including descriptions
- The assessment claims of each AWE
- Whether each AWE provided feedback and, if so, the form of that feedback
- Explanations of the algorithm
To do so, I read through each program's website and made note of the companies' explanations of their intended audience and use, as well as what their algorithm does and how it accomplishes its goals. During this process, I paid particular attention to any static abstractions (Connors, 1997), or stylistic terms that coalesce and become static (pp. 265-266), that each company used or referenced. I also explored articles and documents published about the three programs. This mostly provided information about Criterion, as it has been researched previously and its parent company, Educational Testing Service (ETS), has published many explanatory documents about the program. Throughout this process, I tested the free versions of the programs whenever possible and made note of the user experience to further address the elements above, particularly the third (the feedback provided). Finally, once my co-researchers and I began to compile our data, I identified specific features or program details they had collected that I had not. Using that information, I returned to the companies' websites and program interfaces in search of the corresponding information for my programs.
Third participant's narrative, Alexis Castillo:
To begin my research on AWE algorithms, I chose three AWE programs: Virtual Writing Tutor (VWT), Paper Rater, and MI Write. VWT and Paper Rater are the more student-friendly of the three, whereas MI Write is marketed to administrators and teachers. Setting out to consider these programs from the perspective of a student and a high school teacher, I based my analysis on the following topics:
- Accessibility of each program
- Scoring features offered by each AWE software
- Actionability and source of written feedback (when provided)
- Risks of using AWE programs in the classroom
- Alignment of teacher and AWE software learning objectives
After selecting what I would assess, I attempted to use each program; because MI Write was paywalled, my engagement with VWT and Paper Rater was more extensive. I was able to immediately insert a writing sample into VWT and Paper Rater for evaluation. Once the sample had been evaluated by each program and, in the case of Paper Rater, scored, I considered the features the programs promised to evaluate alongside the feedback the programs actually generated. With the goal of understanding how feedback was produced and the developers' intended uses for these programs, I reached out via email to the contacts listed on the websites. After making contact with the VWT developer and an MI Write representative, I gained insight into the degree to which each evaluation tool could be personalized to match a teacher's learning goals, as well as the objectives the companies had in mind when creating the programs.1
1 Alexis allowed us to use her data for this chapter but decided not to proceed as a co-author.