“AI passed the US medical licensing exam.” “ChatGPT passes law school exams despite ‘moderate’ performance.” “Can ChatGPT get a Wharton MBA?”
Headlines like these have recently highlighted (and often exaggerated) the achievements of ChatGPT, an artificial intelligence tool capable of writing sophisticated text responses to human prompts. These achievements follow a long tradition of comparing the abilities of AI to those of human experts, such as Deep Blue’s chess victory over Garry Kasparov in 1997, IBM Watson’s “Jeopardy!” victory over Ken Jennings and Brad Rutter in 2011, and AlphaGo’s victory in the game Go over Lee Sedol in 2016.
The implied subtext of these recent headlines is more alarming: AI is coming for your job. It’s as smart as your doctor, your lawyer and that consultant you hire. This heralds an impending, widespread disruption in our lives.
But sensationalism aside, does comparing AI to human performance tell us anything practically useful? How can we effectively use an AI that passes the US medical licensing exam? Can it reliably and safely collect medical histories during patient intake? What about giving a second opinion on a diagnosis? These types of questions cannot be answered by showing that an AI performs as well as a person on the medical licensing exam.
The problem is that most people have little AI literacy – an understanding of when and how to use AI tools effectively. What we need is a straightforward, general-purpose framework for evaluating the strengths and weaknesses of AI tools that everyone can use. Only then can the public make informed decisions about incorporating these tools into our daily lives.
To meet this need, my research team turned to an old idea from education: Bloom’s Taxonomy. First published in 1956 and revised in 2001, Bloom’s Taxonomy is a hierarchy that describes levels of thinking, where higher levels represent more complex thinking. Its six levels are: 1) Remember: recall basic facts, 2) Understand: explain concepts, 3) Apply: use information in new situations, 4) Analyze: make connections between ideas, 5) Evaluate: critique or justify a decision or opinion, and 6) Create: produce original work.
These six levels are intuitive, even for non-experts, but specific enough to make meaningful assessments. In addition, Bloom’s Taxonomy is not tied to a particular technology – it can be applied to many. We can use it to explore the strengths and limitations of ChatGPT or of other AI tools that manipulate images, produce audio, or pilot drones.
My research team began to assess ChatGPT through the lens of Bloom’s Taxonomy by asking it to respond to variations on a prompt, each targeting a different level of cognition.
For example, we asked the AI: “Suppose the demand for COVID vaccines this winter is estimated at 1 million doses plus or minus 300,000 doses. How much do we need to stock to meet 95% of the demand?” – an Apply task. We then changed the question, asking it to “Discuss the pros and cons of ordering 1.8 million vaccines” – an Evaluate task. Then we compared the quality of the two responses and repeated this exercise for all six levels of the taxonomy.
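To make the Apply task concrete, here is a minimal sketch of one way the stocking question might be worked out. It is not the formula ChatGPT used or the method from our study; it rests on assumptions the prompt does not state: that demand is normally distributed, that “plus or minus 300,000” means one standard deviation, and that “meet 95% of the demand” means covering demand with 95% probability. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumptions (not stated in the prompt): demand is normally distributed,
# and "plus or minus 300,000" is read as one standard deviation.
MEAN_DEMAND = 1_000_000
STD_DEMAND = 300_000


def stock_for_service_level(mean, std, service_level):
    """Stock level that covers demand with probability `service_level`."""
    return NormalDist(mu=mean, sigma=std).inv_cdf(service_level)


def shortage_probability(mean, std, stock):
    """Probability that demand exceeds the given stock level."""
    return 1.0 - NormalDist(mu=mean, sigma=std).cdf(stock)


# Apply task: how much to stock to cover demand 95% of the time.
stock_95 = stock_for_service_level(MEAN_DEMAND, STD_DEMAND, 0.95)
print(f"Stock for a 95% service level: {stock_95:,.0f} doses")

# One quantitative input to the Evaluate task: the chance of running short
# if 1.8 million doses are ordered.
p_short = shortage_probability(MEAN_DEMAND, STD_DEMAND, 1_800_000)
print(f"Chance of a shortage with 1.8 million doses: {p_short:.2%}")
```

Under these assumptions, the answer to the first question comes out to roughly 1.49 million doses, and an order of 1.8 million leaves well under a 1% chance of running short. The second number is only a starting point for the Evaluate task, which also requires weighing costs and risks that no single formula captures.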
Preliminary results are instructive. ChatGPT is generally good at Remember, Understand and Apply tasks but struggles with more complex Analyze and Evaluate tasks. With the first prompt, ChatGPT responded well by applying and explaining a formula to suggest a reasonable amount of vaccine (despite making a small arithmetic error in the process).
With the second prompt, however, ChatGPT was unconvincing about the consequences of having too much or too little vaccine. It did not make a quantitative assessment of these risks, did not account for the logistical challenges of cold storage for such large quantities and did not warn of the possibility that a vaccine-resistant variant might emerge.
We see similar behavior across different prompts at these levels of the taxonomy. In this way, Bloom’s Taxonomy allows us to make more nuanced assessments of AI technology than a raw human vs. AI comparison does.
As for our doctors, lawyers, and consultants, Bloom’s Taxonomy also provides a more nuanced view of how AI will change, not replace, these professions. Although AI may succeed in Remember and Understand tasks, few people consult their doctor to inventory all possible symptoms of a disease, ask their lawyer to recite case law verbatim or hire a consultant to explain Porter’s Five Forces theory.
Instead, we turn to experts for higher-level cognitive tasks. We value our physician’s clinical judgment in evaluating the benefits and risks of a treatment plan, our attorney’s ability to synthesize precedent and advocate for us, and a consultant’s ability to identify an out-of-the-box solution that no one else has thought of. These skills are Analyze, Evaluate and Create tasks, levels of cognition where AI technology currently falls short.
Using Bloom’s Taxonomy, we can see that effective human-AI collaboration largely means delegating lower-level cognitive tasks so that we can focus our energy on more complex cognitive tasks. So, instead of wondering whether an AI can compete with a human expert, we should be asking how well the capabilities of an AI can be used to help improve human critical thinking, judgment and creativity.
Of course, Bloom’s Taxonomy has its own limitations. Many complex tasks involve multiple levels of the taxonomy, frustrating attempts at categorization. And Bloom’s Taxonomy does not directly address issues of bias or racism, a major concern in large-scale AI applications. But while imperfect, Bloom’s Taxonomy remains useful. It’s simple enough to understand, general enough to be applied to a wide range of AI tools, and structured enough to ensure we ask a consistent, comprehensive set of questions.
Just as the rise of social media and fake news requires us to develop better media literacy, tools like ChatGPT demand that we improve our AI literacy. Bloom’s Taxonomy offers a way to think about what AI can do – and what it can’t – as this type of technology becomes embedded in many areas of our lives.
Vishal Gupta is an associate professor of data sciences and operations at the USC Marshall School of Business and holds a courtesy appointment in the department of industrial and systems engineering.