This document provides a detailed summary of the collaborative breakout sessions conducted during the ScholarRx InsideRx webinar on "AI Prompt Engineering in Health Professions Education." Each group was tasked with designing a high-impact generative AI prompt tailored to a real-world educational scenario. These prompts and the iterative strategies used in their development are intended as reference models for educators who are just beginning to explore the potential of AI in their teaching practice.
Key Takeaways
- Prompt clarity drives quality: The most effective outputs came from prompts that clearly defined the role, task, audience, and format. Simple tweaks, like specifying the learner level, had a big impact on AI performance.
- AI can support educational equity: Prompts that included diverse patient cases or emphasized empathetic communication showcased how AI can be directed to center inclusion and professional values in education.
- Cross-model comparison is powerful: Using multiple AI tools (e.g., Claude, ChatGPT, Gemini) revealed differences in output quality, structure, and reasoning, reinforcing the value of testing prompts across platforms (see the sketch after this list).
- Iterative refinement improves results: The best outputs came from treating the AI as a partner in an evolving dialogue rather than a one-and-done tool. Several groups improved their prompts in real time and saw immediate gains in relevance and structure.
- Prompt engineering is a teachable skill: With just a few guiding frameworks (like RTF, 3Cs, or TRACI), even participants new to AI were able to generate meaningful educational materials.
- Responsible AI use requires guardrails: Discussions around hallucinations, citation accuracy, data privacy, and copyright reinforced the importance of thoughtful, ethical AI integration, especially in clinical education.
- Start simple, then refine: Iteration yielded better alignment and clarity.
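For educators who want to run the same prompt through several tools side by side, the comparison can also be scripted. The following is a minimal sketch, assuming the official openai, anthropic, and google-generativeai Python packages, environment-variable API keys, and illustrative model names; substitute whichever models your institution supports.

```python
import os

from openai import OpenAI
import anthropic
import google.generativeai as genai

prompt = "..."  # paste any of the final prompts from the groups below

# OpenAI (ChatGPT): illustrative model name
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
gpt = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

# Anthropic (Claude): illustrative model name
claude_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
claude = claude_client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1500,
    messages=[{"role": "user", "content": prompt}],
)

# Google (Gemini): illustrative model name
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini = genai.GenerativeModel("gemini-1.5-pro").generate_content(prompt)

# Print the three outputs together for side-by-side review.
print("--- ChatGPT ---\n", gpt.choices[0].message.content)
print("--- Claude ---\n", claude.content[0].text)
print("--- Gemini ---\n", gemini.text)
```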
Group 1: Faculty Promotion Guide
Prompt Development Notes:
The group took a straightforward approach, using a basic role-task structure without extensive iteration. They opted for a general prompt to test the usefulness of free-tier models in generating practical administrative content. No major refinements were made during the session, making this a helpful example for beginners starting with default prompt behavior.
Final Prompt:
"You are a medical school administrator. Please create a faculty promotion guide. Include criteria, documentation requirements, and timeline recommendations."
Key Takeaways:
- Even a basic prompt can yield helpful output for logistical academic tasks.
- This is a good starting example for new users looking to test generative AI with lower-stakes content creation.
- Could be improved with RTF structure to specify the role (e.g., Dean of Faculty Affairs), audience (early-career faculty), and format (table or checklist).
Group 2: Heart Failure Quiz (AI-Generated Prompt)
Prompt Development Notes:
The group used a meta-strategy by first asking each AI model to generate a strong prompt for the task, rather than jumping straight into content creation. They compared results across three tools—ChatGPT-3.5, GPT-4 Mini, and GPT-4 Omni—and selected the prompt from GPT-4 Omni due to its attention to alignment with learning objectives and USMLE cognitive levels. This approach highlighted how AI can support educators in crafting their own structured prompts.
Final Prompt (generated by GPT-4 Omni):
"You are a board-certified cardiologist and experienced NBME/USMLE item-writer. Your task is to create a 5-item multiple-choice quiz (single-best-answer format) for a second-year medical-school module on Heart Failure Pathophysiology. For each question: list 4–6 high-yield concepts, match to a learning objective and USMLE cognitive level, draft a clinical vignette (≤120 words), write a focused lead-in, and provide five answer choices (A–E) with rationales. Flag the best answer with an asterisk (*). Review for negative stems, cueing, or implausible distractors."
Key Takeaways:
- Demonstrates the effectiveness of AI-assisted prompt design.
- Comparing model outputs helps identify best-fit tools for a specific task.
- GPT-4 Omni provided stronger educational framing and output quality.
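The group's meta-strategy, asking the model to write the prompt before using it, can also be reproduced outside a chat window. Below is a minimal sketch assuming the openai Python package and an illustrative model name; the first call requests a prompt, and the second call runs whatever prompt the model produced (ideally after human review and editing).

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: ask the model to write a strong prompt for the task (the "meta" step).
meta_request = (
    "Write a detailed prompt I could give an AI assistant to generate a 5-item, "
    "single-best-answer quiz on heart failure pathophysiology for second-year "
    "medical students, aligned to learning objectives and USMLE cognitive levels."
)
meta = client.chat.completions.create(
    model="gpt-4o",  # illustrative; the group compared several models at this step
    messages=[{"role": "user", "content": meta_request}],
)
generated_prompt = meta.choices[0].message.content

# Step 2: review and edit the generated prompt, then use it to create the quiz.
quiz = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": generated_prompt}],
)
print(quiz.choices[0].message.content)
```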
Group 3: Pulmonary Course Syllabus
Prompt Development Notes:
This group clearly framed their task using RTF principles: assigning the AI the role of a course director, defining the task (syllabus creation), and specifying the format (3.5-week module). They iterated slightly by including the request for a suggested outline, which allowed them to review and refine the output structure. The prompt worked well on the first pass, showing how effective prompt scaffolding can yield high-quality results without complexity.
Final Prompt:
"You are a course director for first-year medical students. Please develop a syllabus for a pulmonary course including content on the three Ps – pharmacology, physiology and pathology. Narrow focus to coverage for three and a half weeks of material. Include course objectives, event objectives, scoring rubric and timeline. Please provide a suggested outline first."
Key Takeaways:
- Useful for creating curricular blueprints.
- ChatGPT-4o produced structured and modular content suitable for adaptation.
- Great entry point for educators designing new courses or modules.
Group 4: OSCE Rubric (Miscarriage Counseling)
Prompt Development Notes:
The group began with a clear and realistic scenario and received a well-structured rubric from Gemini. They then refined the prompt mid-session by adjusting the learner level from resident to third-year medical student and specifying a clinic context. Gemini adapted the rubric accordingly and explained its modifications without being asked to. This showcased Gemini's strength in context-aware adjustment and transparency in reasoning.
Final Prompts:
Initial Prompt: "I am an internal medicine faculty for an OSCE designed for internal medicine residents. The activity is counseling a simulated patient who suffered a miscarriage. Please design a grading rubric that I can use to assess the resident."
Refined Prompt: "How would you modify this for a clinic encounter with a third-year medical student, where they have 15 minutes to see a patient in clinic?"
Key Takeaways:
- Gemini was strong at context-aware adaptation.
- The model justified its revisions, a valuable feature for learning and quality control.
- Showed how changing learner level can shape assessment design.
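The same two-step refinement can be reproduced programmatically as a multi-turn chat, where the second message is interpreted against the first response. Below is a minimal sketch assuming the google-generativeai Python package and an illustrative Gemini model name; it uses the group's two prompts verbatim.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use your own key management
model = genai.GenerativeModel("gemini-1.5-pro")  # illustrative model name

chat = model.start_chat()  # a chat session keeps prior turns as context

# Turn 1: the group's initial request for a resident-level OSCE rubric.
first = chat.send_message(
    "I am an internal medicine faculty for an OSCE designed for internal medicine "
    "residents. The activity is counseling a simulated patient who suffered a "
    "miscarriage. Please design a grading rubric that I can use to assess the resident."
)

# Turn 2: the mid-session refinement, which the model applies to its earlier answer.
second = chat.send_message(
    "How would you modify this for a clinic encounter with a third-year medical "
    "student, where they have 15 minutes to see a patient in clinic?"
)
print(second.text)
```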
Group 5: Rubric with Critical Elements (Miscarriage Counseling)
Prompt Development Notes:
This group took a highly structured approach from the outset, building an RTF-style prompt that emphasized essential communication behaviors. They added an ethical dimension through the inclusion of a 'Critical Elements' checklist and used performance thresholds to model rubric-based assessment. The AI responded with a robust output aligned to the group’s objectives without requiring extensive back-and-forth.
Final Prompt:
"You are an experienced medical educator. Create a grading rubric on counseling a standardized patient who suffered a miscarriage. Add a mandatory 'Critical Elements' checklist (pass/fail) covering: permission-seeking before discussing causes, autonomy validation, and safety assessment documentation. The target population is 3rd-year medical students. Provide a scoring model with: Mastery (90–100%), Competent (75–89%), Remediation Needed (<75%)."
Key Takeaways:
- Excellent example of detailed formative assessment design.
- Prompt structure contributed to clear and actionable output.
- Reinforced the role of educators in guiding AI to highlight psychosocial competencies.
Group 6: Heart Failure Quiz (Chain-of-Thought Prompting)
Prompt Development Notes:
This group used a sophisticated instructional design prompt modeled after board exam question-writing guidelines. They included reasoning steps like identifying cognitive levels and distractor review, embodying the chain-of-thought prompting strategy. Their prompt mirrored expert faculty workflows and demonstrated how AI can simulate structured human reasoning.
Final Prompt:
(Identical to Group 2's final prompt above, which this group also used.)
Key Takeaways:
- Modeled high-level item writing pedagogy.
- Chain-of-thought strategy improved alignment and content quality.
- Effective example for educators mentoring students or junior faculty in assessment.
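Because the item-writing steps are enumerated explicitly, the prompt can be assembled from a list of reasoning steps. The sketch below is illustrative only and simply reconstructs a chain-of-thought style prompt in Python, with step wording mirroring the final prompt above; no AI call is made.

```python
# Enumerated item-writing steps the model is asked to work through for each question.
steps = [
    "list 4-6 high-yield concepts",
    "match the question to a learning objective and USMLE cognitive level",
    "draft a clinical vignette of 120 words or fewer",
    "write a focused lead-in",
    "provide five answer choices (A-E) with rationales and flag the best answer with *",
    "review for negative stems, cueing, or implausible distractors",
]

prompt = (
    "You are a board-certified cardiologist and experienced NBME/USMLE item-writer. "
    "Create a 5-item, single-best-answer quiz on heart failure pathophysiology for "
    "second-year medical students. Work through these steps for each question:\n"
    + "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
)
print(prompt)
```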
Group 7: TRACI Framework Quiz
Prompt Development Notes:
The group employed the TRACI framework (Task, Role, Audience, Create, Intent) to provide a scaffolded structure for their prompt. They deliberately included a vignette featuring a historically underrepresented patient population to enhance representation and inclusivity. This group’s work illustrated how prompt frameworks can align content creation with values-based education.
Final Prompt:
"Task: Create a 5-item MCQ quiz for a heart failure pathophysiology module
Role: Professor of Medical Education
Audience: First-year medical students
Create: Include five-choice questions with 90 seconds per question, rationales for correct and incorrect answers
Intent: Help differentiate types of heart failure
(Also asked to include a clinical vignette featuring a 24-year-old Black female patient with hypertension and relevant lab values.)"
Key Takeaways:
- Reinforced use of scaffolded prompt frameworks.
- Incorporated inclusive clinical context.
- Strong starting model for DEI-informed prompt development.
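The TRACI structure also lends itself to a reusable template. The helper below is a hypothetical illustration showing how the five components could be assembled into a prompt in Python; the field values mirror the group's final prompt.

```python
# Hypothetical helper that assembles a TRACI-structured prompt from its five components.
def traci_prompt(task: str, role: str, audience: str, create: str, intent: str) -> str:
    return (
        f"Task: {task}\n"
        f"Role: {role}\n"
        f"Audience: {audience}\n"
        f"Create: {create}\n"
        f"Intent: {intent}"
    )

print(traci_prompt(
    task="Create a 5-item MCQ quiz for a heart failure pathophysiology module",
    role="Professor of Medical Education",
    audience="First-year medical students",
    create="Five-choice questions, 90 seconds per question, with rationales for "
           "correct and incorrect answers",
    intent="Help differentiate types of heart failure",
))
```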
Group 8: USMLE MCQ Generator
Prompt Development Notes:
The group kept the prompt deliberately simple and task-focused, aiming to test Claude 3.7 Sonnet’s default behavior without additional layers of instruction. The AI produced usable questions with minimal refinement needed, suggesting that simpler prompts may be sufficient for straightforward content-generation use cases.
Final Prompt:
"You are a medical school faculty trying to write exams for second year medical students preparing for the USMLE. Please create 5 MCQ questions. Provide the questions first, then show us the answers."