- A story about text summarization
- What alignment is, and what the problem is
- How RLHF works
- Data setup, and why we'd like to follow instructions
- Reward modeling and PPO
- Why RLHF works (and when it doesn't)
- ChatGPT improvements
- What's next and what to expect?

Data Fest 2023:
Track "Instruct Models":
Our social media:
Telegram:
VKontakte: