About this Course
4.6
Ratings: 1,638
Reviews: 281

100% online

Start instantly and learn at your own schedule.

Flexible deadlines

Set deadlines to fit your own schedule.

Approx. 48 hours to complete

Suggested pace: 6 weeks of study, 5-8 hours/week

English

Subtitles: English, Arabic

Skills you will gain

Data Clustering Algorithms, K-Means Clustering, Machine Learning, K-D Tree

Syllabus: What you will learn

Week 1
1 hour to complete

Welcome

Clustering and retrieval are some of the most high-impact machine learning tools out there. Retrieval is used in almost every application and device we interact with, for example in providing a set of products related to the one a shopper is currently considering, or a list of people you might want to connect with on a social media platform. Clustering can be used to aid retrieval, but it is a more broadly useful tool for automatically discovering structure in data, like uncovering groups of similar patients.

This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.
4 videos (total 25 min), 4 readings
4 videos
Course overview 3 min
Module-by-module topics covered 8 min
Assumed background 6 min
4 readings
Important Update regarding the Machine Learning Specialization 10 min
Slides presented in this module 10 min
Software tools you'll need for this course 10 min
A big week ahead! 10 min
Week 2
4 hours to complete

Nearest Neighbor Search

We start the course by considering a retrieval task: fetching a document similar to one someone is currently reading. We cast this problem as one of nearest neighbor search, a concept we have seen in the Foundations and Regression courses. Here, however, you will take a deep dive into two critical components of these algorithms: the data representation and the metric for measuring similarity between pairs of datapoints. You will examine the computational burden of the naive nearest neighbor search algorithm and instead implement scalable alternatives: KD-trees for handling large datasets and locality sensitive hashing (LSH) for providing approximate nearest neighbors, even in high-dimensional spaces. You will explore all of these ideas on a Wikipedia dataset, comparing and contrasting the impact of the various choices you can make on the nearest neighbor results produced.
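To make these ideas concrete, here is a minimal numpy sketch (not the course's assignment code) of brute-force k-NN under the Euclidean and cosine metrics, plus the random-hyperplane flavor of LSH that the module builds up from "random lines". The data matrix X, the query, and all function names are hypothetical stand-ins for the Wikipedia tf-idf vectors used in the course.

```python
# Minimal sketch of this module's ideas; all data and names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 50))   # 1000 "documents", 50 features (stand-in data)
query = X[0]

def euclidean_knn(X, q, k):
    """Brute-force k-NN: O(n * d) distance computations per query."""
    dists = np.linalg.norm(X - q, axis=1)
    return np.argsort(dists)[:k]

def cosine_knn(X, q, k):
    """Brute-force k-NN under cosine similarity: normalize, then inner product."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    qn = q / np.linalg.norm(q)
    return np.argsort(-(Xn @ qn))[:k]   # largest similarity first

def lsh_bin_indices(V, n_planes=16, seed=1):
    """Hash each row to a bin via random hyperplanes ("random lines" in 2-D).
    The fixed seed regenerates the same planes for data and queries."""
    planes = np.random.default_rng(seed).normal(size=(V.shape[1], n_planes))
    bits = (V @ planes) >= 0                              # side of each plane
    return bits.astype(np.int64) @ (1 << np.arange(n_planes))  # pack into an int

bins = lsh_bin_indices(X)
q_bin = lsh_bin_indices(query[None, :])[0]
candidates = np.flatnonzero(bins == q_bin)  # search only the query's bin;
# the module refines this by also probing neighboring bins (flipped bits).
```

The brute-force functions scan every datapoint per query, which is exactly the cost the module's KD-tree and LSH alternatives are designed to avoid; the LSH sketch trades exactness for speed by restricting the search to one (or a few) bins.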
22 videos (total 137 min), 4 readings, 5 quizzes
22 videos
1-NN algorithm 2 min
k-NN algorithm 6 min
Document representation 5 min
Distance metrics: Euclidean and scaled Euclidean 6 min
Writing (scaled) Euclidean distance using (weighted) inner products 4 min
Distance metrics: Cosine similarity 9 min
To normalize or not and other distance considerations 6 min
Complexity of brute force search 1 min
KD-tree representation 9 min
NN search with KD-trees 7 min
Complexity of NN search with KD-trees 5 min
Visualizing scaling behavior of KD-trees 4 min
Approximate k-NN search using KD-trees 7 min
Limitations of KD-trees 3 min
LSH as an alternative to KD-trees 4 min
Using random lines to partition points 5 min
Defining more bins 3 min
Searching neighboring bins 8 min
LSH in higher dimensions 4 min
(OPTIONAL) Improving efficiency through multiple tables 22 min
A brief recap 2 min
4 readings
Slides presented in this module 10 min
Choosing features and metrics for nearest neighbor search 10 min
(OPTIONAL) A worked-out example for KD-trees 10 min
Implementing Locality Sensitive Hashing from scratch 10 min
5 practice exercises
Representations and metrics 12 min
Choosing features and metrics for nearest neighbor search 10 min
KD-trees 10 min
Locality Sensitive Hashing 10 min
Implementing Locality Sensitive Hashing from scratch 10 min
Week 3
2 hours to complete

Clustering with k-means

In clustering, our goal is to group the datapoints in our dataset into disjoint sets. Motivated by our document analysis case study, you will use clustering to discover thematic groups of articles by "topic". These topics are not provided in this unsupervised learning task; rather, the idea is to output cluster labels that can post facto be associated with known topics like "Science", "World News", etc. Even without such post-facto labels, you will examine how the clustering output can provide insights into the relationships between datapoints in the dataset. The first clustering algorithm you will implement is k-means, the most widely used clustering algorithm out there. To scale up k-means, you will learn about the general MapReduce framework for parallelizing and distributing computations, and then how the iterates of k-means can utilize this framework. You will show that k-means can provide an interpretable grouping of Wikipedia articles when appropriately tuned.
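As a rough sketch of the algorithm described here (plain numpy standing in for any distributed MapReduce runtime, and not the assignment's code), the two steps below are annotated with the MapReduce roles the module discusses; the data X and all names are hypothetical.

```python
# Minimal k-means sketch; data and names are hypothetical.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init;
    # the module's "smart initialization" replaces this with k-means++.
    for _ in range(n_iter):
        # Assignment step ("map" phase): each point emits its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step ("reduce" phase): average the points keyed by center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged: centers stable
            break
        centers = new_centers
    return centers, labels
```

Because each point's assignment depends only on the current centers, the assignment step parallelizes trivially across data shards, which is what makes the MapReduce formulation natural.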
13 videos (total 79 min), 2 readings, 3 quizzes
13 videos
An unsupervised task 6 min
Hope for unsupervised learning, and some challenge cases 4 min
The k-means algorithm 7 min
k-means as coordinate descent 6 min
Smart initialization via k-means++ 4 min
Assessing the quality and choosing the number of clusters 9 min
Motivating MapReduce 8 min
The general MapReduce abstraction 5 min
MapReduce execution overview and combiners 6 min
MapReduce for k-means 7 min
Other applications of clustering 7 min
A brief recap 1 min
2 readings
Slides presented in this module 10 min
Clustering text data with k-means 10 min
3 practice exercises
k-means 18 min
Clustering text data with K-means 16 min
MapReduce for k-means 10 min
Week 4
3 hours to complete

Mixture Models

In k-means, observations are each hard-assigned to a single cluster, and these assignments are based just on the cluster centers rather than also incorporating shape information. In our second module on clustering, you will perform probabilistic model-based clustering, which (1) provides a more descriptive notion of a "cluster" and (2) accounts for uncertainty in assignments of datapoints to clusters via "soft assignments". You will explore and implement a broadly useful algorithm called expectation maximization (EM) for inferring these soft assignments, as well as the model parameters. To gain intuition, you will first consider a visually appealing image clustering task. You will then cluster Wikipedia articles, handling the high dimensionality of the tf-idf document representation.
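Here is a minimal EM sketch for a mixture of Gaussians, assuming low-dimensional dense data for clarity (the module's document-clustering version copes with high-dimensional tf-idf vectors, e.g. via diagonal covariances); all names are illustrative, not the assignment's API.

```python
# Minimal EM sketch for a Gaussian mixture; illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    weights = np.full(k, 1.0 / k)                     # mixture weights
    means = X[rng.choice(n, size=k, replace=False)]   # init means from data
    covs = np.array([np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E-step: soft assignments ("responsibilities") via Bayes' rule.
        resp = np.column_stack([
            weights[j] * multivariate_normal.pdf(X, means[j], covs[j])
            for j in range(k)
        ])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        Nk = resp.sum(axis=0)                         # effective cluster sizes
        weights = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / Nk[j]
            covs[j] += 1e-6 * np.eye(d)               # guard against collapse
    return weights, means, covs, resp
```

If the responsibilities are forced to hard 0/1 values, the updates reduce to a k-means-like procedure, which is the relationship to k-means drawn at the end of the module.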
15 videos (total 91 min), 4 readings, 3 quizzes
15 videos
Aggregating over unknown classes in an image dataset 6 min
Univariate Gaussian distributions 2 min
Bivariate and multivariate Gaussians 7 min
Mixture of Gaussians 6 min
Interpreting the mixture of Gaussian terms 5 min
Scaling mixtures of Gaussians for document clustering 5 min
Computing soft assignments from known cluster parameters 7 min
(OPTIONAL) Responsibilities as Bayes' rule 5 min
Estimating cluster parameters from known cluster assignments 6 min
Estimating cluster parameters from soft assignments 8 min
EM iterates in equations and pictures 6 min
Convergence, initialization, and overfitting of EM 9 min
Relationship to k-means 3 min
A brief recap 1 min
4 readings
Slides presented in this module 10 min
(OPTIONAL) A worked-out example for EM 10 min
Implementing EM for Gaussian mixtures 10 min
Clustering text data with Gaussian mixtures 10 min
3 practice exercises
EM for Gaussian mixtures 18 min
Implementing EM for Gaussian mixtures 12 min
Clustering text data with Gaussian mixtures 8 min
4.6
Reviews: 281

35%

started a new career after completing these courses

36%

got a tangible career benefit from this course

Top reviews

by BK, Aug 25, 2016

Excellent material! It would be nice, however, to mention some reading material, books or articles, for those interested in the details and the theories behind the concepts presented in the course.

by JM, Jan 17, 2017

Excellent course, well thought out lectures and problem sets. The programming assignments offer an appropriate amount of guidance that allows the students to work through the material on their own.

Instructors

Emily Fox

Amazon Professor of Machine Learning
Statistics
Carlos Guestrin

Amazon Professor of Machine Learning
Computer Science and Engineering

About the University of Washington

Founded in 1861, the University of Washington is one of the oldest state-supported institutions of higher education on the West Coast and is one of the preeminent research universities in the world.

About the Machine Learning Specialization

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.
Machine Learning

Frequently Asked Questions

  • Once you enroll for a Certificate, you will get access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page; from there, you can print it or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.