Drafting a Policy for Critical Use of AI

Daniel Frank and Jennifer K. Johnson

A Path to Policy

Dan:
My research focus and pedagogical grounding were never too far out of orbit from the conversations that the ChatGPT classroom question provoked. Both personally and professionally, I’ve always been interested in the ‘new thing.’ I was always a computer nerd, and I grew up side-by-side with the internet: when I was a kid, my dad dialed me in to the nascent internet platforms CompuServe and Prodigy, where I read and shared stories on early-stage internet bulletin boards. I learned to imagine, share, code, communicate, and collaboratively play on online MOOs and MUDs—these were essentially text-based, virtual, online worlds. In my undergraduate career, Myspace and AIM were all the rage. I really got to see the internet as we know it take shape right as I was starting off as a writing tutor and learning the basics of composition and teaching. I started building the connections: I saw the internet as a vibrant space for sharing, thinking, collaborating, learning, and writing.

That thinking was fleshed out as I worked on my master’s thesis and read the work of Mimi Ito (2010), Troy Hicks (2010), Cindy Selfe (2007), and Kathleen Yancey (2009). Together they painted a picture of the novel ways kids were working, learning, and collaborating on the internet. The “net generation” was sharing, creating, and collaborating by making mods for their favorite videogames, or remixing YouTube videos, or participating in fan-fiction storytelling communities, and in doing so they were learning, guided by values completely alien to the assumptions of traditional kill-and-drill pedagogies. Learning on the internet was bottom-up, communicative, collaborative, passion-driven, and multimodal, and in case after case, students who were ignored by the traditional educational apparatus found their voices, passions, and skills through the endless discourses made possible on the net. My guiding research question became: How do students work, learn, and create on the internet, what values inform that growth, and how can we tap into those values in our own classrooms? My eventual dissertation (2018) combined these pedagogical questions with Seymour Papert’s theory of Constructionism (1991), which argues that learning best happens in unscripted little ways as students engage in ‘passionate affinity spaces,’ environments of constructive play and experimentation on projects that capture their interests and passions. I detailed a pedagogy I called “Microworld Writing,” which was inspired by the time I spent writing online and with others in the MUDs and MOOs I grew up in. So, all that is to say, I was already keyed in to the intersections of learning, technology, and computer-mediated writing. The boom in AI writing was right in my wheelhouse.

When ChatGPT was released in November 2022, it ignited the discourse amongst teachers all over the world. What is this technology? What can it do? How will we teach our students if they can just have this “AI” do all of their work for them? Was this the death of the essay? Of homework? Of education altogether? This was a moment that needed a response.

I had been keeping an eye on GPT-powered writing in the months leading up to the release of ChatGPT. From 2020 to 2022, large language model (LLM) writing technology was generally limited to producing blog posts and copy for marketing material, and the field was dominated by a few startups such as Jasper.ai, Copy.ai, and Writesonic. People were also playing directly with GPT models in OpenAI’s playground, and there were communities devoted to sharing prompts and settings to make the technology work. OpenAI’s GPT-3 model, released in 2020, marked a great advancement in the technology. The only thing that I saw holding it back was the interface; one had to have a good deal of computer literacy to really coax anything valuable out of these models. I noted at the time that if a powerful interface came along that made thinking about and working with this technology much more transparent, it could really take off. I warned a colleague that AI writing was coming and that, as writing teachers, we were going to have a lot to deal with.

I was right on both counts. It turned out that all GPT-3 really needed to take off culturally was an accessible interface. That interface turned out to be conversational, paired with a moderate advancement of the model from GPT-3 to GPT-3.5. The release of ChatGPT was quite simple: it allowed people to interact with the GPT-3.5 model through conversation. Indeed, once it became clear what could be produced by simply talking to the bot, the news of this technology spread like wildfire. Having carefully followed the discourse from day one, I knew I was in a position to help my colleagues get a sense of this technology, what it meant, and how to approach it.

Takes at that time ranged from amazed to terrified. My own experiences matched some of these early reactions. I remember the feeling of awe as I asked ChatGPT for a full syllabus for a class I wanted to run the next quarter. As paragraphs of writing flowed in, complete with schedules and even a reading list, a chill ran down my spine. After more careful reading, however, I could recognize the simple, over-structured prose that GPT writing tends to produce, and I definitely started to see the cracks and the seams when I tried to look up the seemingly miraculous reading list it had offered. The authors were real and the right names for the field, but the books offered in this list didn’t actually exist. At this point I had a long conversation with a colleague of mine, Dr. Nathan Riggs, who had been working with and thinking about AI within the humanities for years. “The issue here is semantics vs. syntax,” he told me. “This is doing the latter, but it can’t do the former. You should think of it as little more than souped-up Mad Libs.”

A reading from Ian Bogost (2022) at that time also helped me calibrate my approach. In his article “ChatGPT Is Dumber Than You Think,” Bogost put forward the idea that this technology might produce the image of intelligence, but that it was just stitching together language patterns without a deeper understanding of what it was writing. Bogost had found an array of ways that the writing produced by this technology left much to be desired—while fluent, he argued, the writing was formulaic, prone to errors, and perhaps worst of all, boring. The key, then, would be in asking our students for more than what the AI could currently give.

This advice would be challenged as the technology grew in complexity, accuracy, and flexibility over the following months. But Bogost ended his piece on a closing note that really helped shape my approach to and conceptualization of the tool: he suggested viewing ChatGPT as an aesthetic instrument for manipulating, playing with, and thinking about language, not for answering questions. It can probe the “textual infinity” of digitized information:

GPT and other large language models are aesthetic instruments rather than epistemological ones. Imagine a weird, unholy synthesizer whose buttons sample textual information, style, and semantics. Such a thing is compelling not because it offers answers in the form of text, but because it makes it possible to play text—all the text, almost—like an instrument. (Bogost, 2022)

ChatGPT, he argued, wasn’t an “intelligence”—it was a language playspace, turning the stuff of our discourse into algorithms that can be swapped, experimented with, combined, and remixed. This idea formed the core of how I started to talk about this technology in my classroom.

I sent an email to my colleagues over our listserv. I wanted to give them a primer to help them understand what this tool was and what it wasn’t. In the ‘What can it do’ section, I pointed to excited tweets that showed the technology producing what seemed, at the time, to be marvels. This was important: this really was something powerful, and new, and exciting, and it couldn’t be simply dismissed. But even more importantly, I needed to share the ways in which this technology was limited. It was there that we as teachers would need to carve out our educational approach. The primer initiated a passionate and continuing discussion across the department.

Jennifer:

Unlike Dan, I am not an early adopter of new technology; in fact, friends and relatives have on more than one occasion jokingly accused me of being a bit of a Luddite. When first the web and later social media came on the scene, I was initially deeply suspicious of these tools and how they might impact human activity and connection. While I did eventually dabble with Myspace and was pleased when it enabled me to reconnect with old schoolmates, I was hardly a steady user of the platform. It wasn’t until the rise of Facebook that I really became invested in the possibilities of social media and began to be entangled in it, albeit still as a relatively casual user.

Yet in early December of 2022 I happened upon Stephen Marche’s “The College Essay Is Dead,” published in The Atlantic, and my curiosity about generative AI was piqued. I wondered if Marche could really be right and if my career teaching university writing, which I had spent so long preparing for and which I deeply love, was indeed facing an existential threat. Was it really possible that these tools could eclipse writing instruction?

When Dan first started sharing his thoughts with our department about how LLMs such as ChatGPT could (would?) upend both our teaching and our work, I became captivated. It was apparent to me that both Marche and Dan were right about these tools’ potential, and I could clearly see that I would need to modify my teaching as quickly as possible to account for them. But Dan’s emails to our faculty and Ian Bogost’s “ChatGPT Is Dumber Than You Think” (2022) assured me that, given the way in which the models function—by continuously selecting the next most likely word—they were limited in their capabilities.

I quickly concluded that the models were constructing texts without giving much attention to audience, purpose, or context, which we in writing studies regularly teach our students are necessary considerations in the development of rhetorically effective prose. Despite some of the worries expressed in the popular press and by colleagues that these tools would enable students to bypass their work entirely, I noted to a colleague that it did not seem likely to me that they could produce texts that would pass muster in our UC classrooms. At the same time, I could see the tools’ potential for inviting students to grapple with rhetorical concerns. My sense of these tools’ pedagogical potential became the basis of my thinking about how I might incorporate generative AI in my teaching.

Still, my experience was the opposite of Dan’s in that I did not anticipate the sudden entrance of LLMs into everyday life AT ALL. Prior to Dan’s emails to our department and the sudden onslaught of articles in the popular press about the perils and promises of this new technology, I had no clue that LLMs were even a thing or that they had been in development. But once they became publicly available in November 2022, I quickly began reading everything that crossed my radar about them and set myself to considering how or to what extent I could use them in my classes. While I was admittedly alarmed by the suggestions that they would run my colleagues and me out of a job and that students would no longer need to learn to write, I mostly focused on the possibilities for learning and engagement that it seemed they would offer.

Also unlike Dan, I did not immediately test the tools for myself. In fact, for all the familiarity my reading had given me with ChatGPT, it was several months before I actually tried it out. Once I did, I became even further intrigued, both by its abilities and by its limitations. It became apparent to me that, beyond its inherent inability to create new ideas or textual constructions, it would produce little of value unless users carefully constructed their prompts. My initial instinct, therefore, was to try to show my students the ways in which the tools could not produce effective texts—at least not without a lot of time and effort put into both prompt engineering and revising whatever the tools would spit out.

While at this point I was not yet ready to invite my students to embrace ChatGPT in their work for my classes, I was starting to name the elephant in the (class)room, especially after reading Owen Kichizo Terry’s “I’m a Student. You Have No Idea How Much We’re Using ChatGPT,” published in the Chronicle of Higher Education in May 2023. I showed my classes samples of AI-generated text—including a relatively clunky course syllabus that I had used ChatGPT to create—and invited them to consider to what extent these texts were achieving their rhetorical goals. In every class I teach, I introduce students to the notion of the rhetorical triangle and the relationship between purpose, audience, and writer, and in spring of 2023 I realized that conversations about LLMs in fact lent themselves to these discussions. Along these same lines, and in anticipation of my students’ and my own further engagement with these tools down the road, I also began suggesting that my students test them out for themselves.

When I broached the topic, some of my students readily admitted to using LLMs for their classes, but most indicated that they either had not used them at all or that they had only played around with them for fun. Interestingly, a majority of my first-generation college students told me that they had not even heard of ChatGPT or any of the other large language models, which seemed ironic, given the warning that Liang et al. (2023) and others have given us about this demographic being unfairly accused of using generative AI when they submit effective texts in their coursework.

All of this is to say that when Dan began reaching out to our department to share with us what these tools could and could not (yet) do, I was open and ready for his input, and I found myself wanting to learn and know more. Because I tend to avoid embracing cutting-edge technology, this position was both new to me and somewhat surprising. But I became intrigued by the ways in which this particular technology could spark rich conversations about textual construction and effectiveness, and in turn I became eager to explore it further, both within and outside of the classroom.

Dan:
The primer I created and shared was well received, and my colleagues appreciated having ideas to help orient them to the contours of this technology. I knew even then that the core assumption of my approach—that we could find pedagogical exigency in the weak spots of GPT output—wasn’t future-proof. I had already seen ChatGPT advance in capability and complexity as it moved to the GPT-3.5 model, and I was under no illusions that the advancement would for some reason stop there. Indeed, it didn’t; after a few months, OpenAI released its GPT-4 model, which boasted advancements across the board: more complex and fluent writing, better context awareness, fewer hallucinations. In addition, competitors and sibling models joined the race. Microsoft’s Bing Chat was revealed to use a version of the GPT-4 model while also incorporating internet searches into its input and output. Google announced its Bard model (later rebranded as Gemini). Anthropic’s Claude also joined the fray. Each bot brought an array of specialties and served to advance the realm of what was possible.

Even so, as of this writing, I still don’t find an unmediated approach—just asking, for instance, for a paragraph or a whole essay without establishing a thorough rhetorical context—to produce anything that’s worth reading, even with steady increases in the technology’s capabilities. The pedagogical exigency still exists. An interesting potential pedagogical side effect of the fact that basic, uninteresting, formulaic writing can perhaps now be automated is that we no longer have to settle for it in our classes. We can aim at more. Indeed, over the quarter following ChatGPT’s appearance, I found myself able to—needing to—explore higher-level concerns in my classrooms. I spent more time talking about style, pacing, and tone. I spent more time discussing with my students what it means to write, to develop a voice, to hone ideas, to contribute to the conversations and the discourses around them. I was in a position to be pickier and more complex in my feedback as well. For instance, when a student turned in a paper that I suspected relied heavily on ChatGPT, I pointed out what felt to me like a ‘clunkiness’ in the adjectives and a somewhat overwrought tone. The student appreciated my frank and specific feedback, and his final draft was much better for it: the tone was much more developed. It felt complex, specific, unique, authentic.

I shared that anecdote with a colleague at the Computers and Writing conference last June, and in reply he mused, “Well, we still don’t know to what extent he still used ChatGPT in the final product, but it’s clear that something valuable happened there.”

Something valuable: we’re at a point now where we might have to go back to the drawing board on some of our fundamental pedagogical assumptions. We have to look back at the very question of what it means to learn. Where do we draw the lines between expecting our students to hold knowledge by themselves and expecting them to know how to find, produce, evaluate, articulate, and/or work with knowledge, utilizing the range of tools available to them? What do students need to be able to know, and do, in what contexts? What is the range of skills required to be a critical, communicative, effective, and creative individual in the 21st century? Wu (2023) argues that the rise of generative AI like ChatGPT necessitates re-examining fundamental assumptions about teaching and learning. As Wu states, “it is important to acknowledge the ongoing transformations raised by ChatGPT, which is rapidly revolutionizing the process of learning and teaching. With its quiet yet profound impact, Generative AI is subtly influencing the trajectory of education’s future” (p. 6). Mogavi et al. (2023) concur that educators must reconsider learning goals and objectives in light of AI tools like ChatGPT, stating, “As students increasingly rely on AI tools for support, educators must ensure that learning outcomes harmonize with this evolving landscape” (p. 46). Chan and Hu (2023) agree that the rise of generative AI necessitates rethinking policies, curricula, and teaching approaches, arguing that “higher education institutions should consider rethinking their policy, curricula and teaching approaches to better prepare students for a future where GenAI technologies are prevalent” (p. 14).

These questions aren’t optional. It’s not enough to simply push the technology aside because, first, that approach fails to consider the flexibility of the tool and the range of its outputs. While interaction with this tool could be minimal and a student *could* ask, for example, for an entire essay, it needs to be understood that that is just one (ineffective) way of interacting with the technology. If we are guided by Bogost’s closing idea about the tool as an algorithmic language playspace, we can realize that the tool can be used at nearly any level of rhetorical and authorial consideration: through inputs and outputs, students can interrogate and develop writing across genres, tones, and voices, at the level of the paragraph, the sentence, or even word by word. Such an approach is only possible, however, if both teachers and students come to understand that the tools are not actually “artificial intelligences,” no matter how impressive initial experimentation with them might be. There is no deeper awareness beneath the output of the tool, no critical consideration or rhetorical intent: that has to come from the student. That can still be—still needs to be—exercised and developed by the student.

Second, detection doesn’t work. Such an approach fails to understand that “AI Writing” is not a single identifiable entity. While default, unmediated language output started to reveal certain recognizable characteristics, such as over-structured prose and a predilection for summarizing sentences and paragraphs, it needs to be understood that these tools can work with language as clay, molding it across a range of genres, tones, and purposes. Any framework designed to catch the common patterns of AI Writing would also—or instead—catch the overuse of standard language structures and conventions. Thus there will inevitably be false positives as student writing is flagged as “cheating” or “plagiarism,” and research has revealed that English language learners, who tend to produce this highly structured prose, are the most vulnerable to this approach (Liang et al., 2023). Sadasivan et al. (2023) add that “as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases” (p. 11). For advanced enough language models, “even the best detector can only perform marginally better than a random classifier” (p. 20).

Third, dismissal isn’t effective either. The release of ChatGPT marked the beginning of widespread adoption of LLM tools. Ethan Mollick argued pointedly in March 2023 that covert use of the tool was already much more common than people might have thought. Mollick suggested that every case of obvious “AI Writing” evidenced only lazy or unedited interaction with the tool, and that much more carefully constructed AI-assisted language was being produced all around, across all fields, and appearing in every class. This technology was here, it was being used, and simply ignoring it wouldn’t make that go away. In light of that, my argument became clear: we need to address this technology, to let students know what the tool can do and what it can’t do, to show that there is a range of ways to interact with the tool that fall across a spectrum of both ethics and effectiveness, and to insist that students be mindful and critical when engaging with it. None of that can happen if the tool is lumped in with other forms of “cheating” and “plagiarism” and swept into the shadows.

If it is unavoidable that GPT technology becomes part of the writing process, it must then be part of the teaching process: we have to engage with it. Fortunately, discussing, working with, learning about, and critically assessing the technology can in itself be excellent pedagogy, and GPT technology can bring much to the classroom, both in terms of critical and rhetorical discussion of how it works and in terms of evaluation of what it produces (see Chan & Hu, 2023; Domenech, 2023; Meyer et al., 2023; Mogavi et al., 2023; Mollick & Mollick, 2022; Rahman & Watanobe, 2023; Rudolph et al., 2023; and Sharples, 2023).

With these points in mind, in April 2023 I delivered a presentation to my colleagues to put forward and discuss these areas of consideration. The main points of that presentation are scripted here.

Jennifer:

The presentation that Dan cites above resonated deeply with me, and I responded not only by rethinking my pedagogical approach but also by recognizing that our program needed an AI policy statement in order to provide clarity and promote best practices for students and faculty alike. One of my responsibilities in our department is training our new TAs to teach our first-year writing course, and I was keenly aware that as new teachers of writing, the TAs in particular were going to need some guidance on how to deal with this new technology in their classes. I approached Dan and asked him if he would be open to working together on developing such a statement, and I suggested we convene an ad hoc committee to draft it. His instinct was that, with our Department Chair’s blessing, we could move more quickly and nimbly by drafting it ourselves and then seeking feedback on the draft from the faculty in our department.

Dan began drafting, and together we shaped a first draft of a potential “AI Writing Policy,” which sought to give both teachers and students a range of important ideas to keep in mind when thinking through their approaches to the technology, either in their classes or in their work. We wanted to provide guidance, steer teachers clear of knee-jerk reactions to the technology, help teachers think through the possible advantages and risks, and give them language to share with their students. Mindful of the range of faculty responses, we wanted to arm our colleagues with the most essential concepts and then create space for them to carve their own paths in terms of their classroom policies.

Once we were satisfied with our rough draft, we called a meeting so that we as a program could discuss and consider it. Responses to the draft from faculty were in general highly engaged, constructive, and progressive. The hour went by quickly as we grappled with important questions of how to talk about this technology in the classroom and how to encourage students to think about it critically, as a way to help them develop their own voices and thinking rather than as a crutch or a shortcut. In terms of the policy itself, it was suggested that this range of ideas for both students and teachers might be better codified as a series of key points that would form the pillars of the approach, each of which could then be unpacked through smaller pieces of advice for teachers and students. We readily agreed and subsequently revised the document to incorporate this change and other key points that arose from the conversation.