Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals, allowing data and ML practitioners to collaborate and understand each other better.

Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.

You will:
• Explore machine learning, including distributed computing concepts and terminology
• Manage the ML lifecycle with MLflow
• Ingest data and perform basic preprocessing with Spark
• Explore feature engineering, and use Spark to extract features
• Train a model with MLlib and build a pipeline to reproduce it
• Build a data system to combine the power of Spark with deep learning
• Get a step-by-step example of working with distributed TensorFlow
• Use PyTorch to scale machine learning, and explore its internal architecture

* Book description: © O’Reilly

The interview is based on the book “Scaling Machine Learning with Spark”. Find out more in the upcoming GOTO Book Club episode.
Release date: Thursday, May 18, 2023

Check out more here:

Adi Polak - VP of Developer Experience at Treeverse & Contributing to lakeFS OSS
@polakadi
Holden Karau - Co-Author of “Kubeflow for Machine Learning” & many more books & Open Source Engineer at Netflix
@HoldenKarau

RESOURCES
Adi
@adipolak
👋-adi-polak-68548365

Holden
@holden

Meet the authors who empower developers to continue innovating in the GOTO Book Club:

#GOTObookclub #GOTOcon #GOTOconferences #LegendsOfSoftware #GOTOams #GOTOchgo #GOTOber #GOTOcph #GOTOnights #SoftwareEngineering #Spark #ApacheSpark #ML #MachineLearning #MLlib #TensorFlow #PyTorch #DataScience #AI #ComputerScience #AdiPolak #HoldenKarau #Programming

RECOMMENDED BOOKS
Adi Polak • Machine Learning with Apache Spark
Holden Karau, Trevor Grant, Boris Lublinsky, Richard Liu & Ilan Filonenko • Kubeflow for Machine Learning
Holden Karau • Distributed Computing 4 Kids
Holden Karau • Scaling Python with Dask
Holden Karau & Boris Lublinsky • Scaling Python with Ray
Holden Karau & Rachel Warren • High Performance Spark
Holden Karau, Konwinski, Wendell & Zaharia • Learning Spark
Holden Karau & Krishna Sankar • Fast Data Processing with Spark 2nd Edition
Holden Karau • Fast Data Processing with Spark 1st Edition

Looking for a unique learning experience?
Attend the next GOTO conference near you! Get your ticket at

SUBSCRIBE TO OUR CHANNEL - new videos posted almost daily.