Explore >> Select a destination


You are here

algobeans.com
| | ssc.io
15.8 parsecs away

Travel
| | When designing data science (DS) pipelines, end-users can get overwhelmed by the large and growing set of available data preprocessing and modeling techniques. Intelligent discovery assistants (IDAs) and automated machine learning (AutoML) solutions aim to facilitate end-users by (semi-)automating the process. However, they are expensive to compute and yield limited applicability for a wide range of real-world use cases and application domains. This is due to (a) their need to execute thousands of pipelines to get the optimal one, (b) their limited support of DS tasks, e.g., supervised classification or regression only, and a small, static set of available data preprocessing and ML algorithms; and (c) their restriction to quantifiable evaluation processes and metrics, e.g., tenfold cross-validation using the ROC AUC score for classification. To overcome these limitations, we propose a human-in-the-loop approach for the assisted design of data science pipelines using previously executed pipelines. Based on a user query, i.e., data and a DS task, our framework outputs a ranked list of pipeline candidates from which the user can choose to execute or modify in real time. To recommend pipelines, it first identifies relevant datasets and pipelines utilizing efficient similarity search. It then ranks the candidate pipelines using multi-objective sorting and takes user interactions into account to improve suggestions over time. In our experimental evaluation, the proposed framework significantly outperforms the state-of-the-art IDA tool and achieves similar predictive performance with state-of-the-art long-running AutoML solutions while being real-time, generic to any evaluation processes and DS tasks, and extensible to new operators.
| | yasoob.me
15.2 parsecs away

Travel
| | For those of you who wish to begin learning Python for Data Science, here is a list of various resources that will get you up and running. Included are things like online tutorials and short interactive course, MOOCs, newsletters, books, useful tools and more. We decided to put this together so that you can begin learning Data Science with Python right of the bat, without having to spend hours surfing the web in search of resources.
| | www.markhw.com
13.9 parsecs away

Travel
| |
| | dustintran.com
103.6 parsecs away

Travel
| One aspect I always enjoy about machine learning is that questions often go back to the basics. The field essentially goes into an existential crisis every dozen years-rethinking our tools and asking foundational questions such as "why neural networks" or "why generative models".1 This was a theme in my conversations during NIPS 2016 last week, where a frequent topic was on the advantages of a Bayesian perspective to machine learning. Not surprisingly, this appeared as a big discussion point during the p...