Podcast Course 4

The Higher Education Educator’s AI Sandpit: Testing, Tinkering, and Thinking Aloud in University Classrooms

Welcome to this podcast series – The Higher Education Educator’s AI Sandpit – where we create space to talk about the trials, tribulations and triumphs of Generative AI in teaching, learning and assessment.

Looking back over the conversations I have had with the guests in this series, one theme rings out: none of us are so-called expert “AI pedagogues.” And yet, as the series unfolds, something else becomes equally clear. Whilst we may not be experts in AI pedagogy, we do hold a great deal of expertise that equips us with the skills and knowledge we need to inquire and learn together.

Each of us, for example, has practical pedagogical knowledge built through trial, error, reflection, and often, pedagogical inquiry. We each have institutional knowledge – of policy and localised practice, as well as of our students – which helps us to understand the supports and parameters of learning, teaching and assessment within our departments and within our institution. We also have extensive disciplinary knowledge, and with that, an extensive understanding of the tacit norms that shape assessment and supervision, as well as the practical constraints of time, curriculum, accreditation, and professional standards. And finally, as inquiry is our professional habitus as academics, each of us has the skills needed to pose questions, test assumptions, generate evidence, and refine practice. As the episodes reveal, each guest draws on these strengths in their exploratory endeavours regarding the pedagogical impacts and implications of Generative AI.

With this in mind, we present in these episodes instances of experimentation with, and critical evaluation of, AI, rather than case studies to be held up as exemplars of “good practice.” In this spirit, we ask you, the listener, to take an observer’s stance and to treat the conversations within each episode as situated narratives to be observed, interpreted, and interrogated. Our hope is that, by hearing where things have worked and where they have not, you will turn back to your own setting and apply the same observational lens as a tool for interpreting and interrogating your own approach to AI.

In short, none of us are experts, but rather than this being a weakness, we see it as an opportunity. By adopting a beginner’s mindset, we give ourselves permission to learn through doing, to proceed through trial and error, to make mistakes, and, in doing so, to be creative. This series is an invitation to think with us, to test ideas within your own context, and to shape practices that are not only informed by evidence, but guided by educational purpose.

To listen, click the podcast episode titles below:

Teaching Masters Students to use AI for Coding in Economics with Luc Bridet

What happens when a postgraduate economics module moves from pen-and-paper proofs to Python and invites students to use AI as a coding partner? In this episode, economist Luc Bridet explains how he redesigned a master’s-level optional module to reflect an increasingly common workflow in parts of the discipline – AI drafts; humans verify, test, and refine. We discuss the pedagogy and the practical implications for sustainability, transferability, and workload. If you’re deciding when to allow AI and how to reshape assessment as its use grows, this conversation offers a candid account of the adaptations and their trade-offs. Join us to hear more!

Listen to the Episode

Reflecting on your own Practice

AI as first drafter, student as quality assurer:

If you were so minded, in one of your assignments, where might you permit students to use AI to produce an initial draft/outline/solution, while requiring them to verify, test, and refine it?

Ethical Adoption of AI:

If you introduced a task that allowed students to use AI for an initial draft/outline/solution, would it be important to design it so students can opt out of using AI? Why or why not?

Fairness across different starting routes:

If students may begin with or without AI assistance, what common criteria (clarity of reasoning, quality of adaptation, robustness of tests) will let you judge both routes equitably?

The Artificial Intelligence Assessment Scale with Kirsty Duff

How can we guide students to use generative AI responsibly without banning it or letting it “write the assignment”? In this episode, Kirsty Duff (Director of Foundation Studies and Academic Misconduct Officer, University of St Andrews) introduces the Artificial Intelligence Assessment Scale (AIAS) developed by Mike Perkins and colleagues – a five-level framework from no AI to full AI. Kirsty shows how she embeds the AIAS in handbooks and induction activities so that staff and students know what’s acceptable, why, and how to evidence working with AI. You’ll hear concrete, classroom-ready ideas as well as discussion of workload, policy fit, disciplinary differences, and how to move beyond a “deficit” view of AI while keeping integrity central. If you want practical guidance you can adopt tomorrow, this one’s for you. Join us to hear more!

Listen to the Episode

Resources Discussed in the Episode

Perkins, M., Roe, J., & Furze, L. (2025). Reimagining the Artificial Intelligence Assessment Scale (AIAS): A refined framework for educational assessment. Journal of University Teaching and Learning Practice, 22(7). https://leonfurze.com/wp-content/uploads/2025/09/JUTLPFinalPerkins_JUTLP_2025.pdf

Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The Artificial Intelligence Assessment Scale (AIAS): A framework for ethical integration of Generative AI in Educational Assessment. Journal of University Teaching and Learning Practice, 21(6), 49–66. https://search.informit.org/doi/10.3316/informit.T2024092900003300954126858

Perkins, M., Roe, J., & Furze, L. (2025). How (not) to use the AI Assessment Scale. Journal of Applied Learning and Teaching, 8(2). https://doi.org/10.37074/jalt.2025.8.2.15

Reflecting on your own Practice

Policy clarity:

How will you signal, in handbooks and induction, exactly what AI use is acceptable in each assessment – e.g., mapping tasks to a level on the AI Assessment Scale (from “no AI” to “full AI”)?

Process over product:

Where could you design activities that foreground how students work (brainstorming, structuring, editing) rather than the final output, to reduce misconduct and build judgement?

Open-book nuance:

If students may consult notes in open-book exams, how will you address the risk that those notes are AI-generated and disconnected from taught material (e.g., require citation to lecture/seminar sources)?

Equity and choice:

Given ethical, environmental, and access concerns, will you allow an opt-out pathway for students who prefer not to use AI, without disadvantage? How would you phrase that option?

Discipline fit:

What adaptations would your discipline need (e.g., from essay-focused activities to code, lab, or design tasks) to keep the same principles but change the artefacts?

Colleagues in Dialogue – Re-Designing Assessment in the Age of GenAI with Jenny Taylorson and Blair Matthews

What should assessment do when AI can already draft a passable answer, and when some students won’t use AI on principle while others will? In this episode, two colleagues think aloud about redesigning a master’s-level research-methods assessment for Teaching English to Speakers of Other Languages, Digital Education, and International Education students. We weigh trust and equity (opt-out pathways, transparency), purpose (what this assignment should be for in the age of Generative AI), and practical constraints (large cohorts, time zones, workload). Rather than offering a finished fix, we discuss possibilities: shifting from production to critique tasks, tightening context-specificity to reduce “AI-ability,” changing criteria, and communicating clear expectations (e.g., traffic-light/scale approaches to permitted use). If you want an honest conversation about changing assessment in the age of AI – trade-offs, dead ends, and workable next steps – this episode will help you frame your own. Join us to hear more!

Listen to the Episode through Panopto

Resources Discussed in the Episode

De Vita, K., & Brown, G. (n.d.). AI risk measure scale (ARMS): Guidance and resources [PDF]. University of Greenwich. https://www.gre.ac.uk/__data/assets/pdf_file/0022/323590/ai-risk-measure-scale-guidance-and-resources-website-version.pdf

The Open University Learning Design Team. (2024, December). Responsible by design (RBD) [PDF]. The Open University. https://www.open.ac.uk/blogs/learning-design/wp-content/uploads/2024/12/RBD-Version-for-blog.pdf

Perkins, M., Roe, J., & Furze, L. (2025). How (not) to use the AI Assessment Scale. Journal of Applied Learning and Teaching, 8(2). https://doi.org/10.37074/jalt.2025.8.2.15

Reflecting on your own Practice

AI risks (beyond a basic search):

Where might AI’s capacity to collate information across sources pose a risk to learning? How would you make those risks explicit to students?

Trust:

In the age of generative AI, how can we preserve trust between students and lecturers, and among students themselves?

Purpose of assessment:

What, precisely, is your assessment for now? What kinds of judgement, integrity, and disciplinary thinking should it elicit in an AI-saturated context?

Workable formats of assessment at scale:

Given cohort size, time zones, and workload, what is your lightest-touch mechanism to evidence understanding (e.g., short recorded rationale, annotated plan) when vivas or in-person exams are not viable?

Raising the bar (criteria):

If AI can already achieve a pass on a current assessment brief, which elements of your criteria could you strengthen to reward human judgement? Or should we even be considering tinkering with criteria to solve this problem?

Anthropological Encounters with AI with Paloma Gay Blasco

How do anthropologists actually work with AI? In this episode, Paloma Gay Blasco – Director of Teaching, Social Anthropology, St Andrews – shares a hands-on, student-partnered exploration of generative AI: posing concrete anthropological scenarios to ChatGPT, comparing refusals and outputs, and documenting where bias and contradiction appear. We talk about drawing on the tools of our disciplines to explore and interrogate AI, and discuss how adopting a beginner’s mindset can be useful to this task. We also talk about co-design with students, and the workload/sustainability trade-offs of fast-moving tools. Join us to hear more!

Listen to the Episode through Panopto

Follow up: Resources Discussed in the Podcast Episode

Biesta, G. J. (2010). Why ‘what works’ still won’t work: From evidence-based education to value-based education. Studies in Philosophy and Education, 29(5), 491–503. https://doi.org/10.1007/s11217-010-9191-x

Reflecting on your own Practice

What does ethical use of AI in learning, teaching and assessment look like to me in my context?

What might inquiry into AI look like in my discipline or context? Would I consider partnering with students, as Paloma does?

Student Perspectives on AI in Higher Education – with Teodor Zidaru, Amelia Nassau, Camila Gomez, and Valerie O’Neill

How can students contribute to reshaping teaching, learning, and assessment in the post-AI era? And what perspectives on AI in higher education emerge when classroom discussions spill over into the podcast recording studio? In this episode, Dr Teodor Zidaru and anthropology students Camila Gomez, Valerie O’Neill and Amelia Nassau develop conversations they first began as part of a third-year anthropology Honours module titled ‘Sorcery and Conspiracy: The Anthropology of Alternate Realities’. Together, they explore themes such as the psychodynamics of student AI use; student perspectives on AI use among staff; whether current AI policies in higher education are fit for purpose; the merits of involving students in AI policy design; and the structural contradictions and implications associated with integrating genAI technologies in higher education.

Listen to the episode through Panopto

Catalysing Change – Rethinking Chemistry Assessment with John Mitchell

What happens when AI reshapes a discipline before pedagogy catches up, and how should assessment respond? In this episode, John Mitchell (School of Chemistry; Academic Misconduct Officer) and I consider AI’s purported “cognitive” capabilities, then turn to assessment. John shares findings from pedagogical inquiry benchmarking AI answers to exam questions against undergraduate responses, and explains why he now asks students to critique AI-generated answers to exam questions, rather than getting them to write their own from scratch. Expect candid lessons on what worked for him and what didn’t, plus practical ideas for building adaptability into assessment in changing times. If you’re weighing assessment redesign in the age of AI and want honest trade-offs rather than hype, this conversation is for you. Join us to hear more!

Listen to the episode through Panopto

Follow up: Resources Discussed in the Podcast Episode

University of Kent. (n.d.). Digitally enhanced education webinars [YouTube channel]. YouTube. https://www.youtube.com/@digitallyenhancededucation554

Krathwohl, D. R. (2002). A Revision of Bloom’s Taxonomy: An Overview. Theory Into Practice, 41(4), 212–218. https://doi.org/10.1207/s15430421tip4104_2

Reflecting on your own Practice

AI’s impact on your discipline:

In your field, has AI already changed research or professional practice before teaching caught up, and if so, which parts of your curriculum or assessment need to move first in response (if any)?

Benchmarking reality check:

Could you run a small, ethical benchmarking exercise (AI answers vs. typical student answers, or AI answers vs. the criteria) on one existing task to reveal strengths and weaknesses? If you did this, how might you use the findings to brief students on pitfalls and good practice?

Marking that rewards insight:

John found that an overly prescriptive marking scheme made it difficult to differentiate between stronger and weaker responses to his assessment questions. How might you design rubrics that recognise depth, nuance, and warranted judgement rather than tallying obvious points? How might you pilot any changes you plan to make?

Thinking levels as a lens:

Would applying a cognitive framework (e.g., lower- vs higher-order demands) help you specify the kind of thinking you want students – not AI – to do on a given task? How might these cognitive skills align with external frameworks that guide our course outcomes, such as the Scottish Credit and Qualifications Framework?