Data By the Bay is the first Data Grid conference: a matrix of six vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering and showing what really works to run businesses at scale.
The Internet Archive has many petabytes of archived webpages, books, videos, and images. Recently we've been working hard to make our data and metadata more accessible to outside users. I'll show off some of the ways to download data from the Archive, and then I'll show some example projects built on that data.
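As a taste of what downloading from the Archive can look like, here is a minimal sketch in Python that asks the Wayback Machine availability endpoint for the archived snapshot closest to a given date. It is not taken from the talk. The helper names (`wayback_query_url`, `closest_snapshot`, `fetch_closest_snapshot`) are made up for illustration, and the response shape is assumed to follow the publicly documented `archived_snapshots` / `closest` layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def wayback_query_url(page_url, timestamp=None):
    """Build the availability query URL (hypothetical helper).

    timestamp is an optional string of digits such as "20060101"
    (YYYYMMDD); without it the service picks the snapshot itself.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABLE + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the snapshot URL from a decoded JSON response, or None.

    Assumes the response looks like
    {"archived_snapshots": {"closest": {"available": true, "url": ...}}}.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def fetch_closest_snapshot(page_url, timestamp=None):
    """Query the live service (needs network access)."""
    with urlopen(wayback_query_url(page_url, timestamp)) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `fetch_closest_snapshot("example.com", "20060101")` should return a `web.archive.org` URL if a snapshot exists, and `None` otherwise. The URL building and the parsing are kept separate from the network call so they can be checked offline.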