Entity Resolution at Scale • Huon Wilson • YOW! 2019

This presentation was recorded at YOW! 2019.

Huon Wilson - Software Engineer at CSIRO's Data61

ABSTRACT
Real-world data is rarely clean: there are often corrupted and duplicate records, and even corrupted records that are duplicates! One step in data cleaning is entity resolution: connecting all of the duplicate records into the single underlying entity that they represent. This talk describes how we approach entity resolution, and looks at some of the challenges, solutions and lessons learnt when doing entity resolution on top of Apache Spark and scaling it to process billions of records. [...]

RECOMMENDED BOOKS
Adi Polak • Machine Learning with Apache Spark
Holden Karau & Rachel Warren • High Performance Spark
Holden Karau, Konwinski, Wendell & Zaharia • Learning Spark
