Explore >> Select a destination


You are here

richb.rice.edu
| | blog.adnansiddiqi.me
16.3 parsecs away

| | What Is Synthetic Data? Synthetic data is machine-generated data based on real-world data. It requires building a machine learning (ML) model to capture the patterns in the original, real data before generating new synthetic data based on these patterns. The generated data accurately represents the original data's statistical distributions, patterns, and properties. Synthetic data is
| | ssc.io
16.3 parsecs away

| | Data integration and cleaning have long been a key focus of the data management community. Recent research indicates the potential of large language models (LLMs) for such tasks. However, scaling and automating data wrangling with LLMs for real-world use cases poses additional challenges. Manual prompt engineering, for example, is expensive and hard to operationalise, while full fine-tuning of LLMs incurs high compute and storage costs. Following up on previous work, we evaluate parameter-efficient fine-tuning (PEFT) methods for efficiently automating data wrangling with LLMs. We conduct a study of four popular PEFT methods on differently sized LLMs for ten benchmark tasks, where we find that PEFT methods achieve performance on par with full fine-tuning, and that we can leverage small LLMs with negligible performance loss. However, even though such PEFT methods are parameter-efficient, they still incur high compute costs at training time and require labeled training data. We explore a zero-shot setting to further reduce deployment costs, and propose our vision for ZeroMatch, a novel approach to zero-shot entity matching. It is based on maintaining a large number of pretrained LLM variants from different domains and intelligently selecting an appropriate variant at inference time.
| | blog.fastforwardlabs.com
17.3 parsecs away

| | Advancements in machine learning have evolved to such an extent that machines can not only understand the input data but have also learned to create it. Generative models are one of the most promising approaches towards this goal. To train such a model we first collect a large amount of data (be it images, text, etc.) and then train a model to generate data like it. Generative Adversarial Networks (GANs) are one such class of generative models that, given a training dataset, learn to generate new data with the same statistics as the training set.
| | bike-lab.org
40.7 parsecs away

| After doing some crunching this week on data about rapidly-gentrifying Valencia Street in San Francisco, and finding that residential density is actually dropping in the neighborhood despite new housing construction, I wondered whether the same phenomenon could be found elsewhere in the country. I didn't have to wait long for another case study, as Lynda Lopez and a number of other peeps I follow from Chicago posted about Mayor Lori Lightfoot's ill-considered statement about "vibrancy" in Pilsen, a gentr...