This is a follow-up to the list of Machine Learning and Data Science resources I put up a little while ago. This post contains links to resources on clustering and Reinforcement Learning that I didn’t get to in the first post. Like the first one, it’s a bit haphazard and is not meant to be definitive. Have fun, and feel free to post comments with your favorites.

**Cinderella or the Ugly Stepchild – Reinforcement Learning**

Almost every Machine Learning text or tutorial begins with something like ‘Machine Learning can be broken out into three primary tasks: Unsupervised Learning, Supervised Learning, and Reinforcement Learning.’ After a quick overview of RL, they then blow off any further discussion of it.

Well, let’s correct that a bit.

What is an RL problem? An RL problem is any problem where you are trying to figure out the best course of action to take in a particular environment; basically, it’s a problem of optimal control. A few examples: learning a robot controller, teaching a computer program to play a game like backgammon, or discovering the optimal ads to place in front of someone’s face when they are browsing the web. All of these can be thought of as RL problems.

Unlike supervised learning, where you have the correct answers to train your learner, in RL problems you only get a reward signal back from the environment – some measure of goodness (or badness) associated with an action or policy you have taken. So with RL problems, you need to interact with an environment in order to learn. Basically, it’s how we as people learn to navigate our world. We do something, and then see if it was good or not. If it was good, we keep doing it; if it was bad, we try something else.
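That learn-by-interaction loop is simple enough to sketch. Here’s a toy, hypothetical two-action environment (the payoff probabilities are entirely made up) with an agent that keeps a running average reward per action and mostly repeats whatever has been working:

```python
import random

random.seed(0)

# Hypothetical environment: action 1 pays off more often than action 0.
# The payoff probabilities below are made up for illustration.
def environment(action):
    payoff = [0.3, 0.7][action]
    return 1 if random.random() < payoff else 0

values = [0.0, 0.0]   # running average reward per action
counts = [0, 0]
epsilon = 0.1         # how often we "try something else"

for _ in range(5000):
    if random.random() < epsilon:
        action = random.randrange(2)          # explore: try something else
    else:
        action = values.index(max(values))    # exploit: keep doing what's good
    reward = environment(action)
    counts[action] += 1
    # incremental running-average update of the value estimate
    values[action] += (reward - values[action]) / counts[action]

print(values)  # estimates should approach the true payoffs [0.3, 0.7]
```

Notice there are no labeled “correct” actions anywhere – only rewards – which is the whole difference from supervised learning.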

Some related ideas from control theory are dynamic programming and optimal control.

First, take a peek at Markov Decision Processes (MDPs) on Wikipedia, since MDPs are used as the framework for sequential decision problems.
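To make the MDP framework concrete, here is a minimal value-iteration sketch on a made-up two-state MDP (states, actions, transition probabilities, and rewards are all invented for illustration). The idea: repeatedly back up each state’s value as the best expected one-step reward plus discounted value of where you land:

```python
# Tiny, made-up MDP. P[s][a] is a list of (probability, next_state, reward).
gamma = 0.9
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "go": [(1.0, 0, 0.0)]},
}

V = {s: 0.0 for s in P}
for _ in range(200):
    # Bellman optimality backup: best action's expected return from each state
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}

print(V)
```

State 1 can collect reward 1 forever, so its value converges to 1/(1−0.9) = 10; state 0’s value is its discounted chance of getting there.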

**TD, UMASS, and the Pixies**

While not cutting edge, you can’t go wrong grounding yourself in Rich Sutton’s and Andy Barto’s intro text on RL. Here is a link to the free online version.

Rich came up with Temporal Difference (TD) learning while he was working on his PhD at UMASS. Speaking of UMASS, check this out! Yeah! FWIW, I took Don Levine’s Avant Garde Film class back in the day, which allegedly inspired the Pixies to write Debaser.
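The TD idea itself fits in a few lines: nudge each state’s value estimate toward the reward you just saw plus the (discounted) estimate of the state you landed in. Here’s a sketch of TD(0) on the classic five-state random walk from Sutton and Barto’s book (start in the middle, step left or right at random, reward 1 only for exiting on the right; the step size and episode count are my choices):

```python
import random

random.seed(1)

# TD(0) on the 5-state random walk. True values for states 0..4
# are 1/6, 2/6, 3/6, 4/6, 5/6.
alpha, gamma = 0.02, 1.0
V = [0.5] * 5  # value estimates for the 5 non-terminal states

for _ in range(20000):
    s = 2  # every episode starts in the middle
    while True:
        s2 = s + random.choice([-1, 1])
        if s2 < 0:            # exited left: reward 0, terminal value 0
            V[s] += alpha * (0 - V[s]); break
        if s2 > 4:            # exited right: reward 1, terminal value 0
            V[s] += alpha * (1 - V[s]); break
        # TD(0) update: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))
        V[s] += alpha * (0 + gamma * V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V])  # should approach [0.17, 0.33, 0.5, 0.67, 0.83]
```

The interesting part is that updates happen mid-episode, before you know how the walk ends – the estimate of the next state stands in for the final outcome.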

**I’m looking at you, Stanford**

Occasionally you bump into someone who wants to impress you with their data chops (c’mon, we all do it occasionally). Anyway, maybe they are involved in investing, or they have some startup or something in the valley. If they start dropping the following terms – ‘Stanford’, ‘Coursera’, ‘Andy Ng’ – along with the rest of the standard Big Data buzzwords, start to get worried.

However, if they are talking about Stanford and Prof Ng in the context of Reinforcement Learning, that is a whole different story. Why? Helicopters, of course! How cool is that?!

Sure, Ng has jumped off into deep learning, but his group at Stanford has done a ton of work on applied RL. Take a look at his CS229 lectures that cover RL.

**Context is King**

One of the many benefits of living in New York is that there is a great community of folks involved in Machine Learning. One of them is John Langford. John has both helped build some large-scale multi-armed bandit systems at Yahoo! (and I assume now at Microsoft) and helped push work on state-of-the-art algorithms. Take a look at his blog, Hunch.net. Also, take a peek at John’s presentation with Alina Beygelzimer on Contextual Bandits (I make a cameo as the annoying question guy).
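The “contextual” part of a contextual bandit is just that the best arm depends on who you’re looking at. Here’s a toy epsilon-greedy sketch – contexts, arms, and payoff probabilities all made up – where the best ad differs by user context, so a non-contextual bandit would leave money on the table:

```python
import random

random.seed(0)

# Toy contextual bandit: which of two ads (arms) to show for each of
# two made-up user contexts. Note the best arm differs by context.
payoff = {"sports_fan": [0.1, 0.5], "news_junkie": [0.6, 0.2]}
arms = 2
epsilon = 0.1
values = {c: [0.0] * arms for c in payoff}  # per-context value estimates
counts = {c: [0] * arms for c in payoff}

for _ in range(20000):
    context = random.choice(list(payoff))   # a visitor arrives
    if random.random() < epsilon:
        arm = random.randrange(arms)                        # explore
    else:
        arm = values[context].index(max(values[context]))   # exploit
    reward = 1 if random.random() < payoff[context][arm] else 0
    counts[context][arm] += 1
    values[context][arm] += (reward - values[context][arm]) / counts[context][arm]

best = {c: values[c].index(max(values[c])) for c in payoff}
print(best)  # should learn arm 1 for sports_fan, arm 0 for news_junkie
```

Real systems (like the ones John has worked on) use much smarter exploration and generalize across contexts with features, but the shape of the problem is the same.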

Also, if you want to play around with writing your own stuff, go get John Myles White’s book.

If you like the RL/bandit stuff but just want to apply it to your online app, please feel free to play around with Conductrics – you can get a free account!

**Group and Mix – Clustering Stuff**

What is clustering anyway and what is a good clustering algorithm? Take a look at Shai Ben-David’s talk on the theory of clustering to find out. Surprisingly, there really isn’t a lot of theory around the general problem, but Shai is trying to fix that.

**The Magic of Expectation Maximization**

If you just want to know how to ‘do’ clustering, you probably should have some idea of the EM algorithm. This video from Joaquin Quiñonero Candela covers the EM algorithm for clustering (K-Means, Gaussian Mixture Models). I couldn’t pass it up since he starts with the Supervised, Unsupervised, and Reinforcement Learning setup, and then, as expected, blows off RL 🙂 Ha!
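K-Means is the “hard assignment” special case of EM, which makes it a nice way to see the two-step dance: an E-like step assigns each point to its nearest center, and an M-like step moves each center to the mean of its points. A sketch on toy 1-D data (the data and starting centers are made up):

```python
# K-Means as "hard" EM on toy 1-D data.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
centers = [0.0, 4.0, 10.0]  # initial guesses

for _ in range(10):
    # E-step: assign each point to its closest center
    clusters = [[] for _ in centers]
    for x in data:
        i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[i].append(x)
    # M-step: recompute each center as the mean of its assigned points
    centers = [sum(c) / len(c) if c else m
               for c, m in zip(clusters, centers)]

print([round(c, 2) for c in centers])  # -> [1.0, 5.0, 9.0]
```

A full Gaussian Mixture Model replaces the hard assignments with soft “responsibilities” (posterior probabilities) and also updates variances and mixing weights, but the alternating structure is identical.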

**Information and a Brief Diversion: Kullback and Leibler**

How do we measure if things are different? Well, one way is via the KL divergence. The KL divergence comes up all of the time, which is not that surprising since it is a measure of information gain. Also, it’s great to drop in conversations when you want to try to impress folks – hey, sometimes you just want to fit in with the data science gang. Anyway, this little guy is super important and will help form your mental bridge between Stats, Machine Learning, and Information Theory. Just remember that it is a measure, not a metric (it isn’t symmetric), and you will be fine.
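It’s worth seeing the asymmetry for yourself, since that’s exactly why KL is not a metric. A quick sketch over two made-up discrete distributions:

```python
from math import log2

def kl(p, q):
    """KL divergence D(p || q) in bits: the expected extra bits needed
    to code samples from p using a code optimized for q."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # a fair coin
q = [0.9, 0.1]  # a heavily biased coin

print(kl(p, q))  # D(p||q)
print(kl(q, p))  # D(q||p) -- different! not symmetric, so not a metric
print(kl(p, p))  # 0.0 -- zero exactly when the distributions match
```

That “extra bits to code p with q’s code” reading is the bridge to information theory the text mentions.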

For a bit more on information theory, take a look at Mr. Information Theory himself, David MacKay (and check out those shorts he is rocking!).

Also, after you get into information theory a bit (pun unintended), maybe revisit the idea of BIGDATA in your mind while thinking about entropy and Kolmogorov complexity. Does it make sense? Eh, I am not sure.

**The Russians thought of it first – Convexity and Boyd**

You think your fancy machine learning or data science method is so brand spanking new that VCs will shower money on you to get to market? Let Stephen Boyd disabuse you of your pretensions. While not exactly a course on Data Science, you will be surprised how much he covers in this class. Boyd walks you through lots of your favorite ML algorithms from a mathematical programming perspective (logistic regression, SVMs, etc.). This different approach can be really helpful for highlighting some idea or concept that may have slipped by you. For example, we mucked around with the Schur complement in classes I have taken; I never understood that it is the conditional variance of a multivariate normal until reviewing his class. I think you will find little connections like that happening all of the time with his class. He also does a nice job of walking you through different loss functions (hinge, quadratic, Huber). As a bonus, he is hilarious – I often would chuckle to myself while watching his lectures on my subway rides. No way anyone ever caught on that I was watching Convex Optimization.
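Those three losses are easy to line up side by side. A sketch (using Boyd’s form of the Huber penalty, which is quadratic near zero and linear in the tails, making it robust to outliers):

```python
# Three common loss functions, side by side.
def quadratic(r):
    """Squared-error loss: punishes large residuals hard."""
    return r * r

def huber(r, delta=1.0):
    """Huber penalty (Boyd's form): r^2 for |r| <= delta,
    then grows linearly -- robust to outliers."""
    return r * r if abs(r) <= delta else delta * (2 * abs(r) - delta)

def hinge(margin):
    """SVM hinge loss: zero once the classification margin reaches 1."""
    return max(0.0, 1.0 - margin)

# For a large residual, quadratic blows up while Huber stays linear:
for r in [0.5, 1.0, 3.0]:
    print(r, quadratic(r), huber(r))
```

The punchline is visible at r = 3: the quadratic loss is 9 while Huber is only 5, which is exactly why Huber regression shrugs off outliers that wreck least squares.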

Also, you learn that the Russians did it all first.

*** Update ***

I reached out to Stephen Boyd and he suggested this link. It has his book online as well as links to the courses. Also, in his words: ‘a particularly fun source of stuff, that not many people know about, is the additional exercises for the course, available on the book web site. this is dynamically updated (at least every quarter) and contains tons of exercises, many of them practical, on convex optimization and applications.’ To be honest, these exercises are a bit (well, maybe more than a bit) beyond my capacity, but have at it.

***

Please feel free to comment and sign up for Conductrics – start rolling out your own targeted experiments today!