
Fast global convergence of gradient methods for high-dimensional statistical recovery


https://www.loyhome.com/%E5%B0%8F%E7%AA%A5%E9%AB%98%E7%BB%B4%E6%95%B0%E6%8D%AE%E9%99%8D%E7%BB%B4-2/

 

A Peek at "Dimensionality Reduction for High-Dimensional Data"

 

"Large-scale machine learning" has two meanings: either the dimensionality of the data is very high, or, more commonly, the number of data points is very large.
High dimensionality brings two problems: longer running times and, more seriously, overfitting. Under suitable conditions both can be addressed; see this paper: http://arxiv.org/PS_cache/arxiv/pdf/1104/1104.4824v1.pdf
For large numbers of data points, academia has mainly proposed methods that are more efficient in both time and space, such as hashing and streaming algorithms (a small illustration of the hashing idea follows below). Because of the problem's practical importance, industry also pays close attention, caring not only about algorithmic efficiency but also about scalability. Columbia (CU) has a course on exactly this topic: http://www.cs.columbia.edu/~coms699812/
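
As an aside, a minimal Python sketch of the feature-hashing idea (my illustration, not taken from the paper or the course above; the dimension and the MD5-based hash are arbitrary choices):

```python
import hashlib

def hashed_features(tokens, dim=2**18):
    """Feature hashing: map a variable-length token list into a
    fixed-size vector, so no full vocabulary is ever materialized."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        sign = 1.0 if (h >> 1) % 2 == 0 else -1.0  # random sign reduces collision bias
        vec[h % dim] += sign
    return vec

v = hashed_features("the quick brown fox jumps over the lazy dog".split())
```

Collisions act roughly like a random projection, which is why the method behaves more like dimensionality reduction than lossy truncation.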

 

COMS 6998-12: Dealing with Massive Data

 


 

Administrivia

Course Description

    The size of modern datasets is staggering. With Yahoo! Mail moving over 3 billion messages per day, Twitter recording more than 100 million tweets per day, Facebook users spending over 20 billion minutes every day, and Google executing over a billion searches every day, how does one make sense of all of the data that is generated?
    This course will provide an introduction to algorithm design for such large datasets. We will cover streaming algorithms, which never store the whole input in memory, and parallel algorithms, which partition the computation across multiple machines. In particular, we will look at how to utilize the MapReduce framework for large-scale data analysis.

The main goal of this course is to introduce algorithmic design techniques for dealing with large data sets. This will be primarily a theoretical analysis course, with a focus on practical algorithms and applications.
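
To make the streaming model concrete before the schedule below, here is a minimal reservoir-sampling sketch in Python (my own illustration, not part of the course materials): it maintains a uniform random sample of k items from a stream of unknown length without ever storing the whole input.

```python
import random

def reservoir_sample(stream, k):
    """One-pass uniform sampling with O(k) memory (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)  # item i replaces a slot with prob k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(10**6), 5))
```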

Prerequisites

Algorithms, Discrete Math. No prior knowledge of streaming or parallel algorithms is necessary.

Homework

Homework 1. Posted February 8. Due February 28 at end of class (8pm).
Homework 1.5. Posted March 2. No due date.
Homework 2. Posted March 24. Due April 14 at 11:59pm NY time.
Project. Posted April 18. Due May 2 at end of class (8pm).

Approximate Schedule

January 24: Introduction. Notes

  For more see:

January 31: Distinct Value Estimation. Notes

   For more see:
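
The linked notes are not reproduced here, but one standard approach is easy to sketch: the k-minimum-values estimator (my illustration; the hash function and the choice k=256 are assumptions, not necessarily what the lecture used).

```python
import hashlib
import heapq

def distinct_estimate(stream, k=256):
    """KMV sketch: hash items into [0, 1); if the k-th smallest hash
    value seen is h_k, the distinct count is roughly (k - 1) / h_k.
    One pass, O(k) memory."""
    heap = []    # max-heap (negated values) holding the k smallest hashes
    kept = set()
    for item in stream:
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16) / 2**160
        if h in kept:
            continue  # duplicate of a value we are already keeping
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            kept.discard(-heapq.heappushpop(heap, -h))
            kept.add(h)
    if len(heap) < k:
        return len(heap)            # fewer than k distinct items: exact
    return int((k - 1) / -heap[0])  # -heap[0] is the k-th smallest hash

print(distinct_estimate(i % 5000 for i in range(10**5)))
```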

February 7: Finding Frequent Elements. Notes

 

  • Correction. For the second algorithm described in class, to estimate the count of element j, use Count[h(x_j)]*g(x_j). In class, we used Z*g(x_j). While both of them give unbiased results (Why?), the former has a much lower variance.
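
    The sketch behind this correction can be written down in a few lines; below is a minimal one-row version in Python (my reconstruction; the table width and the built-in hash-based h and g are assumptions). The estimate of element j's count is Count[h(x_j)]*g(x_j), exactly as above; in practice one keeps several independent rows and returns the median of their estimates.

```python
import random

class CountSketchRow:
    """One row of a Count-Sketch: h hashes items to buckets, g hashes
    items to a random sign. Collisions carry random signs, which is
    what makes Count[h(j)] * g(j) an unbiased estimate of count(j)."""
    def __init__(self, width=1024, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.counts = [0] * width
        self.salt_h = rng.getrandbits(64)
        self.salt_g = rng.getrandbits(64)

    def _h(self, x):
        return hash((self.salt_h, x)) % self.width

    def _g(self, x):
        return 1 if hash((self.salt_g, x)) % 2 == 0 else -1

    def add(self, x, count=1):
        self.counts[self._h(x)] += self._g(x) * count

    def estimate(self, x):
        return self.counts[self._h(x)] * self._g(x)

row = CountSketchRow()
for x in ["a"] * 100 + ["b"] * 10:
    row.add(x)
print(row.estimate("a"))  # close to 100, up to colliding items
```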
    For more see:

February 14: Clustering on streams

    For more see:
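
No notes survive in this copy, but as a flavor of clustering under streaming constraints, here is sequential k-means in Python (my illustration; the lecture may well have covered different techniques, e.g. coreset constructions):

```python
def online_kmeans(stream, k):
    """Sequential k-means: keep k centers and per-center counts; each
    arriving point pulls its nearest center toward it with step 1/count.
    One pass, O(k) memory; a heuristic, with no approximation guarantee."""
    centers, counts = [], []
    for x in stream:
        if len(centers) < k:
            centers.append(list(x))
            counts.append(1)
            continue
        i = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
        counts[i] += 1
        step = 1.0 / counts[i]
        centers[i] = [c + step * (a - c) for a, c in zip(x, centers[i])]
    return centers

print(online_kmeans([(0, 0), (10, 10), (0, 1), (9, 10), (1, 0)], 2))
```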

February 21: Graph Algorithms on streams

    For more see:
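
As one example of what is possible in the semi-streaming model (my illustration, not the lecture notes): connectivity needs only one pass and O(n) memory, since edges can be folded into a union-find structure and discarded.

```python
def count_components(edge_stream, n):
    """One-pass connectivity over an edge stream with O(n) memory;
    the edges themselves are never stored."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # union the two components
    return len({find(u) for u in range(n)})

print(count_components(iter([(0, 1), (2, 3), (1, 2)]), n=5))  # -> 2
```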

February 28: PRAMs and Matchings

    For more see:

March 7: Intro to MapReduce

    For more see:
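
A minimal in-memory imitation of the MapReduce programming model, using word count as the canonical example (my illustration; in a real job the map, shuffle, and reduce phases run distributed across machines):

```python
from collections import defaultdict

def map_phase(doc):
    """Mapper: emit a (word, 1) pair for every word in a document."""
    return [(word, 1) for word in doc.split()]

def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)

def mapreduce_wordcount(docs):
    groups = defaultdict(list)  # the shuffle: group mapper output by key
    for doc in docs:
        for key, value in map_phase(doc):
            groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())

print(mapreduce_wordcount(["a rose is a rose", "is a rose"]))
```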

March 14: No Class -- Spring Break

March 21: Social Network Analysis

    For more see:

March 28: PageRank and Data Privacy

    Slides from G. Cormode's talk: pdf
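
The slides are not reproduced here, but the core PageRank computation is a short power iteration (my sketch; the damping factor 0.85 and the dangling-node handling are conventional choices, not taken from the talk):

```python
def pagerank(out_links, d=0.85, iters=50):
    """Power iteration: each node splits d * rank over its out-links,
    plus a uniform (1 - d) teleport term; dangling nodes spread their
    mass uniformly so the ranks keep summing to 1."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in nodes}
        for u in nodes:
            if out_links[u]:
                share = d * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    nxt[v] += share
            else:  # dangling node
                for v in nodes:
                    nxt[v] += d * rank[u] / n
        rank = nxt
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": []}))
```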

April 4: Recommendation Systems

    Slides from J. Hofman's talk: pdf

April 11: Max Matchings in MapReduce

    For more see:
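
For reference, the sequential greedy routine that filtering-style MapReduce matching algorithms use as a building block (my illustration; the distributed wrapper, which repeatedly samples edges down to a single machine, is omitted):

```python
def greedy_matching(edges):
    """Greedy maximal matching: keep an edge iff both endpoints are
    still unmatched; the result is within a factor 2 of maximum."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

print(greedy_matching([(1, 2), (2, 3), (3, 4)]))  # -> [(1, 2), (3, 4)]
```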

April 18: Graph Spanners

    For more see:

April 25: Large Scale Machine Learning

May 2:

 

Grading Policy

There will be two problem sets (30% each), one final project (30%), and participation (10%).

Assignment Policy

The problem sets will require you to do proofs. You are encouraged to discuss the course material and the homework problems with each other in small groups (2-3 people), as long as you list all discussion partners on your problem set. Discussion of homework problems may include brainstorming and verbally walking through possible solutions, but should not include one person telling the others how to solve the problem. In addition, each person must write up their solutions entirely on their own; you may not look at another student's written solutions. List your collaborators on your solutions. Moreover, all materials you consult must be appropriately acknowledged.

Please consult me if you have any questions about this policy. When in doubt, play it safe. If I suspect that you have turned in a homework assignment which you don't understand, you may be asked to orally defend your solutions. If you turn in a homework assignment in violation of the above policies, the highest grade you will receive on that assignment is 0, and you may receive a negative grade.

Students are expected to adhere to the Academic Honesty policy of the Computer Science Department; this policy can be found in full here.

 

 
   

 

 


Source: https://www.cnblogs.com/cx2016/p/13746310.html
