The first Foundations of Data Science virtual talk of this year will take place on Thursday, Jan 13th at 10:00 AM Pacific Time (13:00 Eastern Time, 19:00 Central European Time, 18:00 UTC). Piotr Indyk from MIT will speak about “Learning-Based Sampling and Streaming”.
Abstract: Classical algorithms typically provide “one size fits all” performance, and do not leverage properties or patterns in their inputs. A recent line of work aims to address this issue by developing algorithms that use machine learning predictions to improve their performance. In this talk I will present two examples of this type, in the context of streaming and sampling algorithms. In particular, I will show how to use machine learning predictions to improve the performance of (a) low-memory streaming algorithms for frequency estimation (ICLR’19), and (b) sampling algorithms for estimating the support size of a distribution (ICLR’21). Both algorithms use an ML-based predictor that, given a data item, estimates the number of times the item occurs in the input data set.
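To make example (a) more concrete, the code below is a minimal sketch of one way a count predictor can plug into a frequency-estimation sketch. It is not the algorithm from the talk or the ICLR'19 paper. Items the predictor expects to be frequent get their own exact counters, and all other items share a standard Count-Min sketch. This way, heavy items no longer inflate the estimates of light items that collide with them. The names `predict_count`, `threshold`, `LearnedCountMin`, and the hashing scheme are illustrative assumptions.

```python
import random
from collections import Counter


class CountMin:
    """Plain Count-Min sketch: depth rows of width counters; never underestimates."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One random salt per row stands in for independent hash functions (assumption).
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def add(self, item, count=1):
        for row in range(len(self.table)):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions only add to a counter, so the minimum over rows is an upper bound.
        return min(self.table[row][self._index(row, item)] for row in range(len(self.table)))


class LearnedCountMin:
    """Hypothetical learning-augmented variant: predicted-heavy items bypass the sketch."""

    def __init__(self, width, depth, predict_count, threshold):
        self.predict_count = predict_count  # hypothetical ML predictor: item -> expected count
        self.threshold = threshold          # predicted counts at or above this are "heavy"
        self.exact = Counter()              # dedicated exact counters for predicted-heavy items
        self.sketch = CountMin(width, depth)

    def _is_heavy(self, item):
        return self.predict_count(item) >= self.threshold

    def add(self, item):
        if self._is_heavy(item):
            self.exact[item] += 1
        else:
            self.sketch.add(item)

    def estimate(self, item):
        if self._is_heavy(item):
            return self.exact[item]
        return self.sketch.estimate(item)
```

For example, with a stream of 100 copies of `"a"`, 50 of `"b"`, and 200 distinct light items, a predictor that flags `"a"` and `"b"` yields exact counts for those two items. The light items share the sketch only with each other. If the predictor is wrong, the estimates remain valid upper bounds for items routed to the sketch, but the memory spent on exact counters is wasted.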
The talk will cover material from papers co-authored with T. Eden, C.-Y. Hsu, D. Katabi, S. Narayanan, R. Rubinfeld, S. Silwal, T. Wagner and A. Vakilian.
The series is supported by the NSF HDR TRIPODS Grant 1934846.