Half-Baked Thoughts on ChatGPT and the College Essay

2 June 2023, 1119 EDT

The Chronicle of Higher Education recently ran a piece by Owen Kichizo Terry, an undergraduate at Columbia University, on how college students are successfully using ChatGPT to produce their essays.

The more effective, and increasingly popular, strategy is to have the AI walk you through the writing process step by step. You tell the algorithm what your topic is and ask for a central claim, then have it give you an outline to argue this claim. Depending on the topic, you might even be able to have it write each paragraph the outline calls for, one by one, then rewrite them yourself to make them flow better.

As an example, I told ChatGPT, “I have to write a 6-page close reading of the Iliad. Give me some options for very specific thesis statements.” (Just about every first-year student at my university has to write a paper resembling this one.) Here is one of its suggestions: “The gods in the Iliad are not just capricious beings who interfere in human affairs for their own amusement but also mirror the moral dilemmas and conflicts that the mortals face.” It also listed nine other ideas, any one of which I would have felt comfortable arguing. Already, a major chunk of the thinking had been done for me. As any former student knows, one of the main challenges of writing an essay is just thinking through the subject matter and coming up with a strong, debatable claim. With one snap of the fingers and almost zero brain activity, I suddenly had one.

My job was now reduced to defending this claim. But ChatGPT can help here too! I asked it to outline the paper for me, and it did so in detail, providing a five-paragraph structure and instructions on how to write each one. For instance, for “Body Paragraph 1: The Gods as Moral Arbiters,” the program wrote: “Introduce the concept of the gods as moral arbiters in the Iliad. Provide examples of how the gods act as judges of human behavior, punishing or rewarding individuals based on their actions. Analyze how the gods’ judgments reflect the moral codes and values of ancient Greek society. Use specific passages from the text to support your analysis.” All that was left now was for me to follow these instructions, and perhaps modify the structure a bit where I deemed the computer’s reasoning flawed or lackluster.

The kid, who just completed their first year at Williams, confirms that this approach is already widespread at their campus.

I spent a few hours yesterday replicating the process for two classes in my rotation: one on the politics of science fiction, the other on global power politics. Here are my takeaways about the current “state of play.”

First, professors who teach courses centered on “classic” literary and political texts need to adapt yesterday. We don’t expect students to make original arguments about Jane Austen or Plato; we expect them to wrestle with “enduring” issues (it’s not even clear to me what an “original” argument about Plato would look like). ChatGPT has—as does any other internet-based LLM—access to a massive database of critical commentary on such venerable texts. These conditions make the method very effective.

Second, this is also true for films, television, popular novels, and genre fiction. I ran this experiment on a few of the books that cycle on and off my “science-fiction” syllabus (including The Fifth Head of Cerberus, The Dispossessed, The Forever War, and Dawn) and the outcomes were pretty similar to what you’d expect from “literary” classics or political philosophy.

Third, ChatGPT does significantly less well with prompts that require putting texts into dialogue with one another. Or at least those that aren’t fixtures of 101 classes.

For example, I asked ChatGPT to help me create an essay that reads The Forever War through Carl Schmitt’s The Concept of the Political. The results were… problematic. I could’ve used them to write a great essay on how actors in The Forever War construct the Taurans as a threat in order to advance their own political interests. Which sounds great. Except that’s not actually Schmitt’s argument about the friend/enemy distinction.

ChatGPT did relatively better on “compare and contrast” essays. I used the same procedure to try to create an essay that compares The Dispossessed to The Player of Games. This is not a common juxtaposition in science-fiction scholarship or science-fiction online writing, but it’s extremely easy to put the two works in conversation with one another. ChatGPT generated topics and outlines that picked up on that conversation, but in a very superficial way. It gave me what I consider “high-school starter essays,” with themes like ‘both works show how an individual can make a difference’ or ‘both works use fictional settings to criticize aspects of the real world.’

Now, maybe my standards are too high, but this is the level of analysis that leaves me asking “and?” Indeed, the same is true of the example used in the Chronicle essay: it’s very Cliff’s Notes. That said, it’s entirely possible to get “deeper” analysis via ChatGPT. You can drill down on one of the sections it offers in a sample outline; you can ask it more specific prompts. That kind of thing.

At some point, though, this starts to become a lot of work. It also requires you to actually know something about the material. 

Which leads me to my fourth reaction: I welcome some of what ChatGPT does. It consistently provides solid “five-paragraph essay” outlines. I lose track of how many times during any given semester I tell students that “I need to know what your argument is by the time I finish your introduction” and “the topic of an essay is not its argument.” ChatGPT not only does that, but it also reminds students to do that. 

In some respects, ChatGPT is just doing what I do when students meet with me about their essays: helping them take very crude ideas and mold them into arguments, suggesting relevant texts to rope in, and so forth. As things currently stand, I think I do a much better job on the conceptual level, but I suspect that a “conversation” with ChatGPT might be more effective at pushing them on matters of basic organization.

Fifth, ChatGPT still has a long way to go when it comes to the social sciences, or at least International Relations. For essays handling generic 101 prompts, it did okay. I imagine students are already easily using it to get As on short essays about, say, the difference between “balance of power” and “balance of threat” or on the relative stability of unipolar, bipolar, and multipolar systems.

Perhaps they’re doing so with a bit less effort than it would take to Google the same subjects and reformulate what they find in their own words? Maybe that means they’re learning less? I’m not so sure.

The “superficiality” problem became much more intense when I asked it to provide essays on recent developments in the theory and analysis of power politics. When I asked it for suggestions for references, at least half of them were either total hallucinations or pastiches of real ones. Only about a quarter were actually appropriate, and many of these were old. Asking for more recent citations was a bust. Sometimes it simply changed the years.

I began teaching in the late 1990s and started as a full-time faculty member at Georgetown in 2002. In the intervening years, it has become more and more difficult to know what to do about “outside sources” for analytical essays.

I want my students to find and use outside articles, which now means through Google Scholar, JSTOR, and other databases. But I don’t want them to bypass class readings for (what they seem to think are) “easier” sources, especially as many of them are now much more comfortable looking at a webpage than reading a PDF. I would also be very happy if I never saw another citation to “journals” with names like ProQuest and JSTOR.

I find that those students who do (implicitly or explicitly) bypass the readings often hand in essays with oddball interpretations of the relevant theories, material, or empirics. This makes it difficult to tell if I’m looking at the result of a foolish decision (‘hey, this website talks about this exact issue, I’ll build my essay around that’) or an effort to recycle someone else’s paper. 

The upshot is that I don’t think it’s obvious that LLMs are going to generate worse educational outcomes than we’re already seeing.

Which leads me to the sixth issue: where do we go from here? Needless to say, “it’s complicated.”

The overwhelming sentiment among my colleagues is that we’re seeing an implosion of student writing skills, and that this is a bad thing. But it’s hard to know how much that matters in a world in which LLM-based applications take over a lot of everyday writing. 

I strongly suspect that poor writing skills are still a big problem. It seems likely that analytic thinking is connected to clear analytic writing—and that the relationship between the two is often both bidirectional and iterative. But if we can harness LLMs to help students understand how to clearly express ideas, then maybe that’s a net good.

Much of the chatter that I hear leans toward abandoning, or at least deemphasizing, the use of take-home essays. For the vast majority of students, that means doing their analytic writing in a bluebook under time pressure. It’s possible that makes strong writing skills even more important, as it deprives students of the ability to get feedback on drafts and help with revisions. I’m not sure it helps to teach those skills, and it will bear even less resemblance to any writing that they do after college or graduate school than a take-home paper does.

(If that’s the direction we head in, then I suppose more school districts will need to reintroduce, or at least increase their emphasis on, instruction in longhand writing. It also has significant implications for how schools handle student accommodations; it could lead students to more aggressively pursue them in the hope of evading rules on the use of ChatGPT, which could in turn reintroduce some of the Orwellian techniques used to police exams during the height of the pandemic.)

For now, one of the biggest challenges to producing essays via ChatGPT remains the “citation problem.” But given various workarounds, professors who want to prevent the illicit use of ChatGPT probably already cannot pin their hopes on finding screwy references. They’ll need to base more of their grading not just on whether a student demonstrates the ability to make a decent argument about the prompt, but on whether they demonstrate a “deeper” understanding of the logic and content of the references that they use. Professors will probably also need to mandate, or at least issue strict directions about, what sources students can use.

(To be clear, that increases the amount of effort required to grade a paper. I’m acutely aware of this problem, as I already take forever to mark up assignments. I tend to provide a lot of feedback and… let’s just say that it’s not unheard of for me to send a paper back to a student many months after the end of the class.)

We also need to ask ourselves what, exactly, is the net reduction in student learning if they read both a (correct) ChatGPT explanation of an argument and the quotations that ChatGPT extracts to support it. None of this strikes me as substantively all that different from skimming an article, which we routinely tell students to do. At some level, isn’t this just another route to learning the material?

AI enthusiasts claim that it won’t be long before LLM hallucinations, especially those involving references, become a thing of the past. If that’s true, then we are also going to have to reckon with the extent to which the use of general-purpose LLMs creates feedback loops that favor some sources, theories, and studies over others. We are already struggling with how algorithms, including those generated through machine learning, shape our information environment on social-media platforms and in search engines. Google Scholar’s algorithm is already affecting the citations that show up in academic papers, although here at least academics mediate the process.

Regardless, how am I going to approach ChatGPT in the classroom? I am not exactly sure. I’ve rotated back into teaching one of our introductory lecture courses, which is bluebook-centered to begin with. The other class, though, is a writing-heavy seminar. 

In both my classes, I do intend to at least talk about the promises and pitfalls of ChatGPT, complete with some demonstrations of how it can go wrong. In my seminar, I’m leaning toward integrating it into the process and requiring that students hand in the transcripts from their sessions.

What do you think?