Democratizing Data: How Apple and UW’s Data Filtering Networks Redefine Large-Scale Training Sets

In a new paper, Data Filtering Networks, a research team from Apple and the University of Washington introduces the concept of data filtering networks (DFNs). These neural networks, designed specifically for filtering data, can efficiently generate large, high-quality pre-training datasets.
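To make the idea concrete, here is a minimal sketch of how a filtering network might be applied to a raw data pool. It assumes the network can be treated as a function that assigns each sample a quality score, and that the highest-scoring fraction of the pool is kept. The names `filter_pool`, `score_fn`, `keep_fraction`, and `toy_score` are hypothetical and chosen for illustration; the toy scorer is only a placeholder for a trained network, and none of this is taken from the paper's implementation.

```python
from typing import Callable, Dict, List, Sequence


def filter_pool(
    pool: Sequence[Dict[str, str]],
    score_fn: Callable[[Dict[str, str]], float],
    keep_fraction: float,
) -> List[Dict[str, str]]:
    """Keep the highest-scoring fraction of the pool, preserving pool order.

    `score_fn` stands in for a trained filtering network (an assumption).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not pool:
        return []
    # Number of samples to keep; always keep at least one.
    n_keep = max(1, int(len(pool) * keep_fraction))
    # Rank sample indices by score, highest first.
    ranked = sorted(range(len(pool)), key=lambda i: score_fn(pool[i]), reverse=True)
    # Restore the original ordering among the kept samples.
    kept = sorted(ranked[:n_keep])
    return [pool[i] for i in kept]


def toy_score(sample: Dict[str, str]) -> float:
    """Placeholder scorer: longer captions score higher. Not a real DFN."""
    return float(len(sample["caption"].split()))


raw_pool = [
    {"caption": "a"},
    {"caption": "a dog on grass"},
    {"caption": "img 001"},
    {"caption": "a red car parked outside"},
]
filtered = filter_pool(raw_pool, toy_score, keep_fraction=0.5)
```

In this sketch the filtering step is independent of the scorer, so swapping `toy_score` for a learned model would not change `filter_pool` itself.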