Do LLMs really “show their work” when they perform chain-of-thought reasoning? “Measuring Faithfulness in Chain-of-Thought Reasoning” is a new paper from Anthropic that studies this question empirically with a series of tests.

Timestamps:
00:00 - Measuring Faithfulness in Chain-of-Thought Reasoning
00:53 - What is Chain-of-Thought reasoning?
03:15 - Do the Chain-of-Thought Steps Really Reflect the Model’s Reasoning?
07:03 - Possible Faithfulness Failures
08:44 - Encoded Reasoning/Steganography
12:01 - Experiment Details
15:44 - Does Truncating the Chain of Thought Change the Predicted Answer?
16:53 - Does Editing the Chain of Thought Change the Predicted Answer?
17:14 - Do Uninformative Chain of Thought Tokens Also Improve Performance?
18:28 - Does Rewording the Chain of Thought Change the Predicted Answer?
20:20 - Does Model Size Affect Chain of Thought Faithfulness?
22:04 - Limitations
24:38 - Externalized Reasoning Oversight

Topics: #ai #anthropic #CoT #reasoning

Link to the paper:

For related content:
- Twitter:
- Research lab:
- personal webpage:
- YouTube: @SamuelAlbanie1
- TikTok: @samuelalbanie
- Instagram:
- LinkedIn:
- Threads: @samuelalbanie
- Discord server for filtir:

(Optional) if you’d like to support the channel:

Credits:
Image credit (Chelsea photo) –#/media/File: