SUMM

Introduction and Background 00:00

Sander Schulhoff introduces himself as CEO of Learn Prompting and HackAPrompt, with a background in AI research, NLP, and deep reinforcement learning.
Early involvement in prompt engineering, writing the first internet guide, and expanding into prompt injection and AI security.
Organized the first prompt injection/AI red teaming competition, leading to the creation of a 600,000-prompt dataset now widely used for benchmarking.
Goals for the session: explain why prompt engineering remains relevant, discuss security deployments, and highlight the challenges of securing generative AI.

Story and Context: Path to Prompt Engineering 04:09

Gained initial experience through AI deception research in the Diplomacy board game, which later tied into relevance for modern AI systems.
Contributed to the MinRL (Minecraft RL) project—connecting reinforcement learning research to the emerging trend of AI “agents."
Created Learn Prompting as a college project, scaling it into a major resource cited by OpenAI, Google, BCG, US Gov, and others.

Fundamentals of Prompt Engineering 09:16

Definition: A "prompt" is simply a message sent to a generative AI; prompt engineering is the process of improving that prompt for better results.
Prompt engineering can increase AI task accuracy significantly, but poorly crafted prompts can drop accuracy to zero.
Prompting, as a concept, goes back years under various names (e.g., control codes), but "prompt engineering" only became a widely used term around 2021.
Two main types of prompt engineering users: non-technical (iterative, conversational mode with chatbots) and technical (static, system-level prompts for tasks).

Systematic Literature Review and Techniques 17:26

Schulhoff led a large-scale literature review (the “prompt report”) cataloguing 200 prompting techniques, including about 58 text-based ones.
Defined key prompt parts (e.g., role, examples) and clarified which components are most effective across real-world usages.
Role prompting (assigning AI a "role") was thought to improve accuracy (e.g., "math professor" for solving math), but evidence showed it's largely ineffective for accuracy-based tasks and more urban myth than fact. For open-ended tasks like writing, it can still help.

Advanced Prompting Techniques 29:26

Thought Inducement: Chain of thought prompting, where AI is instructed to show step-by-step reasoning, is vital for accuracy and inspired reasoning-model development. The AI’s "explanations" do not always reflect its internal process but improve outcomes.
Decomposition-Based Prompting: Techniques like least-to-most prompting split complex problems into solvable subproblems.
Ensembling: Using multiple prompts or models to reach a consensus answer; less used today.
In-Context Learning and Few-Shot Prompting: Providing task examples during prompt creation remains a cornerstone technique. The number, order, balance, and quality of examples can majorly impact performance, but optimal parameters are highly variable and often trial-and-error.
Prompt performance can fluctuate based on model fine-tuning and even prompt mining—choosing prompt styles that match model training data yields better results.

Practical Challenges and Open Questions in Prompting 43:50

Exemplar ordering (order of examples) can shift accuracy dramatically; the field lacks consensus on optimal organization.
Balancing labels (class distribution) and checking quality is as important as in classical ML but subject to unusual quirks in LLMs.
Similarity between prompt examples and target instances may help, but results can also conflict between studies.
Prompt length and format matter; too lengthy prompts may degrade results, but evidence isn’t definitive.

Human vs. Automated Prompt Engineering 60:02

Schulhoff and his team compared dozens of prompt engineering techniques on tasks like detecting indicators of suicidal intent in Reddit comments.
Manual prompt engineering plateaued in performance, with automated prompt engineering tools (e.g., DSPY) outperforming or enhancing results when combined with human input.

Issues with Benchmarks and Reasoning Models 69:59

Benchmarks are often convoluted by unclear methodology and prompt strategies, undermining direct model comparison.
For latest reasoning models, explicit chain-of-thought prompting is usually unnecessary—and may even hinder performance—though most general prompt advice still applies.

Towards Automated Prompt Technique Selection 73:06

Meta-prompting (using LLMs to optimize prompts) exists as product features, but without clear reward functions, their effectiveness is limited.
No robust cross-model prompt transfer methodology exists; prompts that work on one model may or may not succeed elsewhere.
Red teaming experience shows prompt attacks have some transferability (e.g., 40% from GPT-3 to GPT-4).

Introduction to AI Red Teaming and Security 82:01

AI red teaming = “getting AIs to do or say bad things;” jailbreaking is a subset, using intentional manipulative prompts.
Many creative attack strategies exist (e.g., role-based, multilingual, encoding tricks), such as the "grandmother" or "Stan" jailbreaks.
Prompt injection involves bypassing developer instructions; historically shown to easily defeat simple system prompts.

Real-World Red Teaming Harms and Incidents 90:28

Discussed real incidents: chatbots tricked into making hazardous statements or performing unauthorized actions (e.g., car dealership bot, crypto payout bots, math-solving apps leaking secrets).
These incidents usually stem from classical security oversights, although prompt injection remains an unsolved threat.

Classical Cybersecurity vs. AI Security 94:13

Classical cybersecurity is binary (threats can be fully patched); AI security is probabilistic and never fully closed, due to the nature of LLMs (non-determinism and prompt flexibility).
Prompt injection vulnerability is inherent and intractable—no guarantee of full defense, only statistical mitigation.

Philosophies and Observations from AI Red Teaming 98:35

Jailbreaks are easily and quickly found in new models despite security claims.
Automated red teaming and improved datasets are essential for raising the bar in AI security, but perfect defense is unachievable.
Defensive strategies like improved system prompts or filter models are largely ineffective; obfuscation and encoding can bypass most current protections.

Challenges with Agents and Agentic Security 107:20

True agent security is unsolved: agents acting in the real world (physical or digital) remain vulnerable to prompt-based exploits (“adversarial robustness”).
Humans can manipulate or coerce agents into harmful or unintended behaviors, endangering deployment at scale.
Companies are deploying insecure agents, risking financial loss and customer harm.

HackAPrompt Competition and Live Red Teaming 116:44

HackAPrompt offers ongoing AI red teaming challenges, with realistic tasks such as extracting harmful instructions or bypassing policy restrictions.
Dataset and challenges from the competition are now broadly used by major labs for model testing and improvement.
Participants are encouraged to experiment with advanced prompts and techniques to expose weaknesses.

Final Q&A and Closing 113:15 / 116:44 / 120:45

Prompt filters (input/output) can often be circumvented with encoding/translation tricks.
There are psychological and subtle manipulation threats (e.g., priming users via LLM output), shown to have real effects and raise ethical issues.
Attendees are invited to try the competition and reach out for further questions or collaboration.

Prompt Engineering and AI Red Teaming — Sander Schulhoff, HackAPrompt/LearnPrompting

Introduction and Background 00:00

Story and Context: Path to Prompt Engineering 04:09

Fundamentals of Prompt Engineering 09:16

Systematic Literature Review and Techniques 17:26

Advanced Prompting Techniques 29:26

Practical Challenges and Open Questions in Prompting 43:50

Human vs. Automated Prompt Engineering 60:02

Issues with Benchmarks and Reasoning Models 69:59

Towards Automated Prompt Technique Selection 73:06

Introduction to AI Red Teaming and Security 82:01

Real-World Red Teaming Harms and Incidents 90:28

Classical Cybersecurity vs. AI Security 94:13

Philosophies and Observations from AI Red Teaming 98:35

Challenges with Agents and Agentic Security 107:20

HackAPrompt Competition and Live Red Teaming 116:44

Final Q&A and Closing 113:15 / 116:44 / 120:45