This course teaches learners how to scale machine learning workflows to big data using Fugue. Learners will come away understanding how to move from Pandas to Spark or Dask as data grows, how to use Fugue to port existing Python code with minimal changes, and how to write framework-agnostic code that runs in different execution environments. Topics covered include Spark transformations, implementing code with Fugue, Spark's lazy evaluation, partitioning, and decoupling logic from execution. The course is demo-driven, pairing worked examples with explanations. It is intended for data scientists, machine learning engineers, and anyone interested in scaling data compute from a single machine to a Spark cluster.