Video Tutorial: Getting Started with Conductrics Web Actions

Conductrics is all about enhancing your website or other app by showing the best content variations for each visitor. There are basically two ways to use the service:

  • As a developer, using our API and wrappers
  • Without writing code, using Conductrics Web Actions.

This video tutorial focuses on the second option.

You’ll see how Web Actions makes it easy to:

  • Create the content variations you want to try out on your pages, such as showing some portions of your page to some visitors and hiding them for other visitors. You can also do things like change headline text, swap images, insert new content, redirect some visitors to alternate landing pages, and so on.
  • Set up reward triggers for your conversion events, so Conductrics can learn and report on which variations work best for which visitors. 
  • Target the variations to certain visitors based on geography, time, or your own custom data by setting up Content Targeting Rules. 

If you want to try Web Actions out for yourself, get access by signing up at conductrics.com and check out how it works on your own pages. Thanks for watching!

 

Posted in Uncategorized


AB Testing: When Tests Collide

Normally, when we talk about AB Tests (standard or Bandit style), we tend to focus on things like the different test options, the reporting, the significance levels, etc.  However, once we start implementing tests, especially at scale, it becomes clear that we need a way to manage how we assign users to each test.  There are two main situations where we need to manage users:

1) Targeted Tests – we have a single test that is appropriate only for some users.

2) Colliding Tests – we have multiple separate tests running that could potentially affect each other’s outcomes.

The Targeted Test

The most obvious reason for managing a test audience is that some tests may not be appropriate for all users. For example, you might want to include only US visitors in an Offer test that is based on a US-only product. That is pretty simple to do with Conductrics. You just set up a rule that filters visitors out of the specified test (or tests) if they do not have the US-Visit feature. The setup in the UI looks like this:

[Screenshot: Offer Test, Exclude Non US Visitors]

What this is telling Conductrics is that for the Offer test, if the user is not from the US, then do not put them into the test, and serve them the default option instead. Keep in mind that only US visitors will be eligible for the test, so they will be the only users who show up in the reporting. If you really just want to report the test results for different types of users, run the test normally and include the targeting features you want to report over.
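The logic of such an exclusion rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the Conductrics API: the function name, the `choose_variation` callback, and the returned dictionary are all hypothetical, and only the US-Visit feature name comes from the example above.

```python
def offer_test_option(visitor_features, choose_variation, default_option="default"):
    """Return the option to serve and whether the visitor counts toward the test.

    visitor_features: set of feature names known for this visitor (hypothetical).
    choose_variation: callable that picks a test variation (hypothetical).
    """
    if "US-Visit" not in visitor_features:
        # Filtered out: serve the default and keep the visitor out of reporting.
        return {"option": default_option, "in_test": False}
    return {"option": choose_variation(), "in_test": True}


print(offer_test_option({"US-Visit"}, lambda: "offer-B"))
print(offer_test_option({"Mobile"}, lambda: "offer-B"))
```

The point of the sketch is that a filtered visitor still gets a page (the default option); they simply never enter the test or its reports.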

Colliding Tests

Unless you are just doing a few one-off tests, you probably have situations where you have multiple tests running at the same time. Depending on the specific situation, you will need to make some decisions about how and when users can be exposed to a particular test.

We can use the same basic approach we used for US visitors to establish some flow control over how users are assigned to multiple concurrent tests.

For example, perhaps the UX team wants to run a layout test, which will affect the look and feel of every page on the site. The hypothesis is that the new layout will make the user experience more compelling and lead to increased sales conversion on the site.

At the same time, the merchandising team wants to run an offer test on just a particular product section of the site. The merchandising team thinks that their new product offer will really incentivize users to buy and will increase sales in this particular product category.  

Strategy One: Separate Tests

The most common, and easiest, strategy is to just assume that the different tests don’t really impact one another, and run each test without bothering to account for the other. In practice, this is often fine, especially if there is limited overlap.

Strategy Two: The Multivariate Test

We could combine both tests into one multivariate test with two decisions: Layout and Offer. This could work but, once you start to think about it, it may not be the best way to go. For one, the test really only makes sense as a multivariate test if the user comes to the product section. Otherwise, they are never exposed to the product offer component of the test. It also assumes that both the UX and Merchandising teams plan to run their tests for the same amount of time. What do we do if the UX team was only going to run the Layout test for a week, but the merchandising team was planning to run the Offer test for two weeks?

Strategy Three: Mutually Exclusive Tests

Rather than trying to force what are really two conceptually different tests into one multivariate test, we can instead run two mutually exclusive tests. There are several ways to set this up in Conductrics. As an example, here is one simple way to make sure users are assigned to just one of our tests.

Since the Layout test touches every page on the site, let’s deal with that first. A simple approach to keep some visitors free for the Offer test is to randomly assign a certain percentage of users to the Layout test first. We can do that by setting up the following filter:

[Screenshot: LayoutTest_50pct rule]

This rule will randomly assign 50% of the site’s visitors to the Layout test. The other 50% will not be assigned to the test (the percentage can be customized). We now just need to set up a filter for the Offer test that excludes visitors who have been placed into the Layout test.

[Screenshot: Offertest_excludeLayout rule]

This rule just says: exclude visitors who are in the Layout test from the Offer test. That’s it! Now you will be able to read your results without having to worry about whether another test is influencing them.
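For readers who think in code, the two rules above boil down to something like the following Python sketch. It is not how Conductrics implements assignment; the hashing scheme, the salt string, and the function names are assumptions made purely to show the mutually exclusive logic.

```python
import hashlib


def bucket(visitor_id: str, salt: str) -> float:
    """Map a visitor to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def assign_tests(visitor_id: str, layout_share: float = 0.5) -> dict:
    """Place each visitor in exactly one of the two tests."""
    # Rule 1: a fixed share of visitors goes into the Layout test.
    in_layout = bucket(visitor_id, "layout-test") < layout_share
    # Rule 2: the Offer test excludes anyone already in the Layout test.
    return {"layout_test": in_layout, "offer_test": not in_layout}


print(assign_tests("visitor-123"))
```

Hashing the visitor id (rather than rolling a fresh random number on each request) is one way to keep a returning visitor in the same test on every visit.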

What is neat is that by combining these assignment rules you can customize your testing experience almost any way you can think of.

Strategy Four: Multiple Decision Point Agents 

These eligibility rules make a lot of sense when we are running what are essentially separate tests. However, if we have a set of related tests that are all working toward the same set of goals, we can instead use a multiple decision point agent. With multi-point agents, Conductrics will keep track of both user conversions and when the user goes from one test to another. Multi-point agents are where you can take advantage of Conductrics’ integrated conversion attribution algorithms to solve these more complex joint optimization problems. We will cover multi-point agents separately and in detail in an upcoming post.

Thanks for reading and we look forward to hearing from you in the comments.

 

Posted in Analytics, Reporting, Testing and Data Science


Architecture Idea: Passive Models

Passive Models

This is an idea for scaling out certain data when transitioning to a highly clustered architecture.

TL;DR Don’t just read, mostly subscribe.

Ideally suited for data that is read often, but rarely written; the higher the read:write ratio, the more you gain from this technique.

This tends to happen to some types of data when growing up into a cluster. Even if you have data with a 2:1 ratio on a single server (a very small margin in this context, meaning it is read twice for every time it is written), when you scale it up you often don’t get a 4:2 ratio. Instead you get 4:1, because one of the two writes ends up being redundant (that is, if you can publish knowledge of the change fast enough that other edges don’t make the same change).

With many workloads, such as configuration data, you are quickly scaling at cN:1 with very large c [number of requests served between configuration changes], meaning that real-world e-commerce systems are doing billions of wasted reads of data that hasn’t changed. Nearly all modern data stores can do reads like this incredibly fast, but they still cost something, produce no value, and compete for resources with requests that really do need to read information that has changed. For configuration data on a large-scale site, c can easily be in the millions.

So, this is an attempt to rein in this cN:1 scaling and constrain it to N:1: one read per node per write, so a 32-server cluster would be 32:1 in the worst case, instead of millions to one.
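To make the arithmetic concrete, here is a tiny Python sketch. The function name is made up, and it assumes c counts requests per node between writes, which is how the cN:1 notation reads; the 32-node cluster and c of one million are the figures used above.

```python
def reads_per_write(nodes: int, requests_between_writes: int, passive: bool) -> int:
    """Store reads caused per configuration write across the whole cluster."""
    if passive:
        # Each node re-reads once, after its cache has been invalidated.
        return nodes
    # Every request on every node reads the Store, changed or not.
    return nodes * requests_between_writes


print(reads_per_write(32, 1_000_000, passive=False))  # 32000000
print(reads_per_write(32, 1_000_000, passive=True))   # 32
```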

Pairing a Store with a Hub

defn: Hub – any library or service that provides a publish/subscribe API.

defn: Store – any lib/service that provides a CRUD API.

Clients use the Store’s CRUD as any ORM would, and aggressively cache the responses in memory. When a Client makes a change to data on the Store, it simultaneously publishes an alert through the Hub to all other Clients. Clients use these messages to invalidate their internal caches. The next time that resource is requested, its newly updated version is fetched from the Store.

Since the messages broadcast through the Hub do not cause immediate reads, bursts of writes can coalesce without causing a corresponding spike in reads. Instead, the read load experienced after a change is always the same, and depends on the data’s usage pattern and how you spread traffic around your cluster.

To stick with the example of configuration data, let’s suppose the usage pattern is to read the configuration on every request, with a cluster of web servers load balanced by a round-robin rule. When an administrative application changes and commits the configuration, it also invalidates the cached configuration on each web server through the Hub. As the round-robin proceeds around the cluster, the next request to reach each server will fetch an updated configuration directly from the Store. Load balancing rules that re-use servers, such as lowest-load, can have even higher cache efficiency.

From the perspective of the code using the Client, the writes made by others just seem to take a little bit longer to fully commit, and in exchange we never ask the database for anything until we know it has new information.
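Here is a minimal, single-process Python sketch of the Store/Hub pairing described above. The `Hub` and `Store` classes are in-memory stand-ins for whatever pub/sub and CRUD services you actually use, and every class and method name here is an assumption made for illustration.

```python
from collections import defaultdict


class Hub:
    """Stand-in for any publish/subscribe service."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, key):
        for handler in list(self._subscribers[topic]):
            handler(key)


class Store:
    """Stand-in for any CRUD service; counts reads so the savings are visible."""

    def __init__(self):
        self._data = {}
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self._data.get(key)

    def write(self, key, value):
        self._data[key] = value


class Client:
    """Caches reads aggressively and drops entries when the Hub says they changed."""

    def __init__(self, store, hub):
        self._store = store
        self._hub = hub
        self._cache = {}
        hub.subscribe("invalidate", self._invalidate)

    def _invalidate(self, key):
        self._cache.pop(key, None)  # no read happens here, only invalidation

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._store.read(key)
        return self._cache[key]

    def set(self, key, value):
        self._store.write(key, value)
        self._hub.publish("invalidate", key)


store, hub = Store(), Hub()
web_a, web_b = Client(store, hub), Client(store, hub)
web_a.set("config", {"version": 1})
for _ in range(1000):
    web_a.get("config")
    web_b.get("config")
print(store.reads)  # 2: one read per client, however many requests were served
```

Note that the invalidation handler never touches the Store, which is what lets a burst of writes coalesce into a single read per client.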

Further Work

The Store layer requires aggressive caching, which requires that you constrain the CRUD to things where you can hash and cache effectively. Map/reduce and the like are not allowed; the technique really is best for an ORM-like scenario, where you have discrete documents and use summary documents more than complicated queries.



Improving the Promises API

 The Promises API seems to be everywhere these days, and it really is great at solving one of JavaScript’s weaknesses: complex dependencies between asynchronous code.

For those new to promises, their most basic form is a queue of functions waiting for some asynchronous operation to complete. When the operation is complete, its result is fed to all waiting functions.

TL;DR The core API of a Promise object should be:

.wait(cb)       # cb gets (err, result) later
.finish(result) # cb will get (undefined, result)
.fail(err)      # cb will get (err, undefined)

The Promises API’s true value comes from lifting some control out of the compiler’s hands and into the hands of our runtime code, using such a simple structure. Rather than the syntax of the source code being the only description of the relationship between pieces of code (e.g. a callback pyramid), we now have a simple API for storing and manipulating these relationships.

In the widely used Promises/A, the API method .then() establishes such a relationship, but fails in a number of ways for me.

The word ‘then’ is given a second meaning, already being used in “if this then that”. If not literally in your language (CoffeeScript), then in your internal dialogue when you are reading and writing conditional expressions of all kinds, such as this sentence.

Also, ‘then’ is a very abstract word, becoming any one of three different parts of speech depending on how you use it.  Good API methods should be simple verbs unless there is a really good reason.

I find that people who are new to Promises take a long time to see their value, and this overloading of an already abstract word, as its core method, is part of the problem.

So let’s imagine a better API, for fun, made of simple verbs that tell you exactly what is happening.

Q: What is the core service that the Promise API should provide? 

A: To cause some code to wait for other code to either finish or fail.

I suggest that wait is the most accurate verb for the action here, and communicates immediately why I would want to use promises… because I need some code to wait for the promise to finish.

Using ‘then’ values the lyricism of the resulting code over its actual clarity, making it just a bit too clever.
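To show how little machinery the three-verb core needs, here is a minimal sketch, written in Python rather than JavaScript purely for illustration. The `wait`, `finish`, and `fail` names come from the proposal above; everything else (the class layout, using `None` where JavaScript would pass `undefined`, and calling late waiters immediately) is an assumption of the sketch.

```python
class Promise:
    """A queue of callbacks waiting for one operation to finish or fail."""

    def __init__(self):
        self._waiting = []
        self._settled = False
        self._err = None
        self._result = None

    def wait(self, cb):
        """cb gets (err, result) once the promise is settled."""
        if self._settled:
            cb(self._err, self._result)
        else:
            self._waiting.append(cb)
        return self

    def _settle(self, err, result):
        if self._settled:
            return  # only the first finish() or fail() counts
        self._settled = True
        self._err, self._result = err, result
        waiting, self._waiting = self._waiting, []
        for cb in waiting:
            cb(err, result)

    def finish(self, result=None):
        """Waiting callbacks get (None, result)."""
        self._settle(None, result)
        return self

    def fail(self, err):
        """Waiting callbacks get (err, None)."""
        self._settle(err, None)
        return self


p = Promise()
p.wait(lambda err, result: print("done:", err, result))
p.finish(42)
```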

Extensions to the API:

Many libraries add extensions for basic language statements, like assignment, delete, etc., but so far in my opinion this is just adding a function call and not really gaining anything, since these operations are never asynchronous. In practical usage of promises to solve everyday tasks, I would suggest some more pragmatic extensions based on common but difficult promises to make.

“I promise to [asynchronously] touch all the files” is an example of a promise that is currently hard to make: when each touch is asynchronous, you don’t know which file is the last, or when they are all complete. What you need are incremental promises.

promise.progress(current, [maximum]) # emits 'progress' events
promise.finish(delta)                # calls .progress(current+delta)

“I promise to recurse over all directories” is extra hard because you don’t even know the size of the goal at the start, and must update that knowledge recursively.

# only finish() this promise after promise b has finished
promise.include(promise_b)

This enables you to create promises that are both recursive and incremental, which lets you create a tree of promises to represent any workflow, without leaking knowledge to (or requiring it of) the waiting code.
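One possible reading of these extensions, again sketched in Python for illustration only. The semantics are assumptions: `include()` is modeled as adding one unit of work to the parent that completes when the child does, `on_progress` is a made-up name for subscribing to 'progress' events, and failure handling is left out to keep the sketch short.

```python
class IncrementalPromise:
    """Completes once `current` reaches `maximum`; both may change over time."""

    def __init__(self, maximum=1):
        self.current = 0
        self.maximum = maximum
        self._done = False
        self._waiting = []
        self._listeners = []

    def wait(self, cb):
        if self._done:
            cb(None, self.current)
        else:
            self._waiting.append(cb)
        return self

    def on_progress(self, listener):
        self._listeners.append(listener)
        return self

    def progress(self, current, maximum=None):
        if self._done:
            return
        self.current = current
        if maximum is not None:
            self.maximum = maximum
        for listener in list(self._listeners):
            listener(self.current, self.maximum)
        if self.current >= self.maximum:
            self._done = True
            waiting, self._waiting = self._waiting, []
            for cb in waiting:
                cb(None, self.current)

    def finish(self, delta=1):
        self.progress(self.current + delta)

    def include(self, other):
        # The goal grows by one unit, which is met when `other` completes.
        self.maximum += 1
        other.wait(lambda err, result: self.finish(1))
        return self


root = IncrementalPromise(maximum=1)   # one unit of the root's own work
child = IncrementalPromise(maximum=2)  # e.g. two files in a subdirectory
root.include(child)
root.wait(lambda err, result: print("all done after", result, "units"))
root.finish()
child.finish()
child.finish()
```

The waiting code only ever sees the root promise; it does not need to know how many children were included along the way.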

I think the current Promises API has sliced the problem-space exactly right, but I think there are some pragmatic design choices one could make to get a better API at the end of the day.

 



The World’s Top 7 Data Scientists before there was Data Science

I am often a bit late to the party and only recently saw Tim O’Reilly’s “The World’s 7 Most Powerful Data Scientists”. As data science has become a big deal, there have been several top data science lists floating around.

So for fun, I thought I would put together my own list of the top data scientists before there was data science.  The people listed here helped unearth key principles on how to extract information from data.  While obviously important, I didn’t want to include folks whose contribution was mostly on the development of some particular approach, method, or technology.  

To a large degree, the people on this list helped lay the foundation for a lot of what currently goes on as data science. By studying what these guys* worked on, I think you can deepen the foundation upon which your data science skills rest. As a disclaimer, there are obviously way more than seven who made major contributions, but I wanted to riff on Tim’s piece, so seven it is.

So without further ado, on to the list:

1. Claude Shannon
I can’t imagine anyone arguing with putting C. Shannon on the list. Claude is often referred to as the father of information theory – which from my vantage point is Data Science, considering that information theory underpins almost all ML algorithms. Claude Shannon came up with his groundbreaking work while at Bell Labs (as an aside, this is also where Vapnik and Guyon worked when they came out with their ’92 paper on using the kernel trick for SVMs, although interestingly, they didn’t use the term support vector machine).
For a quick overview of Claude Shannon take a look here
And for his 1948 paper A Mathematical Theory of Communication go here

2. John Tukey
Tukey is a hero to all of the data explorers in the field, the folks who are looking for the relationships and stories that might be found in the data. He literally wrote the book on Exploratory Data Analysis. I guess you can see his work as the jumping-off point for the Big Data gang. Oh yeah, he also came up (together with James Cooley) with a little something called the Fast Fourier Transform (FFT).

3. Andrey Kolmogorov
A real Andrey the Giant, maybe not on the order of an Euler, but this guy had breadth for sure. He gets on the list for coming up with Algorithmic Complexity theory. What’s that? It’s just the use of Shannon’s information theory to describe the complexity of algorithms in computer science. For a CS layman’s read (me), I recommend Gregory Chaitin’s book, Meta Math. For what it’s worth, I’d argue that a life well lived is one that maximizes its Kolmogorov complexity.

4. Andrey Markov
Our second Andrey on the list, I had to give Markov the nod since we make heavy use of him here at Conductrics. Sequences of events (language, clicks, purchases, etc.) can be modeled as stochastic processes. Markov described a class of stochastic process that is super useful for simply but effectively modeling things like language, or attribution. There are many companies and experts out there going on about attribution analysis, or braying about their simplistic AB testing tools, but if they aren’t at least thinking Markov, they probably don’t really know how to solve these problems. The reality is, if you want to solve decision problems algorithmically, by optimizing over sequences of events, then you are likely going to invoke the Markov property (conditional independence) via Markov Chains or Markov Decision Processes (MDPs). See our post on Data Science for more on this.

5. Thomas Bayes
I think it is fair to say that Data Science tends to favor, or is at least open to, Bayesian methods.  While modern Bayesian statistics is much richer than a mere application of Bayes’ theorem, we can attribute at least some of its development back to Bayes.  To get a hang of Bayes’ theorem, I suggest playing around with the chain rule of probability to derive it yourself.
For having a major branch of statistics named after him and for being a fellow alum of the University of Edinburgh, Bayes is on the list. By the way, if you want to learn more about assumptions and interpretations of Bayesian methods check out our Data Science post for Michael Jordan’s lectures.
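If you want to check your work on that derivation, here is a sketch for two generic events A and B (the symbols are illustrative and P(B) is assumed to be non-zero). The chain rule lets you factor the joint probability in two ways; setting the two factorizations equal and dividing by P(B) gives Bayes’ theorem:

```latex
P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0.
```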

6. Solomon Kullback and Richard Leibler
Maybe not as big as some of the other folks on the list, so they have to share a place, but come on, the Kullback-Leibler Divergence (KL-D)?! That has got to be worth a place here. Mentioned in our post on Data Science resources, the KL-D is basically a measure of information gain (or loss). This turns out to be an important measure in almost every single machine learning algorithm you are bound to wind up using. Seriously, take a peek at the derivation of your favorite algorithms and you are likely to see the KL-D in there.
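For the curious, here is a small Python sketch of the KL-D for two discrete distributions over the same outcomes, measured in bits. The function name and the choice of base-2 logarithm are just choices made for this illustration.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in bits: the information lost when Q is used to approximate P."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # outcomes P never produces contribute nothing
        if q_i == 0:
            return math.inf  # Q rules out something P says can happen
        total += p_i * math.log2(p_i / q_i)
    return total


print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # 1.0
```

Note that the measure is not symmetric: swapping the two arguments generally gives a different number.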

7. Edward Tufte
I used to work at an advertising agency back in the ’90s, and while normally the ‘creatives’ would ignore us data folks (this was back before data was cool), one could often get a conversation going with some of the more forward-thinking ones by name-checking Tufte. I even went to one of Tufte’s workshops during that time, where he was promoting his second book, Envisioning Information. There was a guest magician that did a little magic show as part of the presentation. A minor irritation is the guru/follower vibe you can get from some people when they talk about him. Anyway, don’t let that put you off, since Tufte spends quality ink to inform you how to optimize the information contained in your ink.

As I mentioned at the beginning, this list is incomplete. I think a strong argument can be made for Alan Turing, Ada Lovelace, and Ronald Fisher. I debated putting Gauss in here, but for some reason, he seems just too big to be labeled a data scientist. Please suggest your favorite data scientist before there was data science in the comments below.

*yeah, it’s all men – please call out the women that I have missed.

Posted in Testing and Data Science, Uncategorized | 10 Comments


Installing the Conductrics Web Actions Plugin for WordPress

About the Plugin

The Conductrics Web Actions Plugin for WordPress includes Conductrics Web Actions scripts in your pages, which makes it easy to test changes to your pages, track their success, and do dynamic targeting.

Installation

To get started:

  1. Initial Installation. In the WordPress Admin, go to Plugins > Add New, then search for “Conductrics Web Actions” and click Install Now to install the plugin.
  2. Activate Plugin. The “Conductrics Web Actions” plugin should now be listed under Plugins > Installed Plugins in your WordPress admin. Click the Activate link to enable it.
  3. Provide Conductrics Account Keys. Click the “Settings” link for the Conductrics Web Actions plugin in the list of Installed Plugins. Alternatively, you can also get to the settings page via Settings > Conductrics Actions in the WordPress admin. Click on the Conductrics Account tab, then copy and paste your API keys from the Account > Keys and Password page from the Conductrics Console. Make sure to save your changes when done. If you don’t have a Conductrics account yet, just go to conductrics.com to get a free account to play around with.
  4. Enable Web Actions. Still in the settings page for the plugin, click the Global Web Actions tab, check the Enable Web Actions checkbox, and save your changes.

The plugin is very simple. Its purpose is to make it easy to use the Web Actions feature provided by the Conductrics service. Rather than having to paste code snippets into your pages and posts, you just use the simple UI provided by the plugin, right from the WordPress admin.

Setting up an A/B Test in a Page or Post:

Now that you’ve got the plugin installed, here’s how to conduct a simple test:

  1. Go to the “Edit” page for the page or post as you would normally.
  2. You should see a Conductrics Web Actions area in the right sidebar. It might be toward the bottom of the page. If you want, you can grab the area by its title bar and drag it up under the Publish area, but that’s up to you.
  3. From the “Add” dropdown, choose “add new agent”. (You may be prompted to log into your Conductrics Account at this point, which you should only have to do once.)
  4. Click the Create Agent button to create your new Conductrics Agent (“agent” is just our term for an A/B testing project).
  5. Now you can set up what you want your test to actually change (perhaps showing or hiding an aspect of your page or theme).

You can learn more about what you can do with Web Actions in our documentation. You’ll notice that you completed the first step (“Creating an Agent”) already during the steps shown above.

Frequently Asked Questions

  • What kinds of tests can I perform with Web Actions? You can learn more about what you can do with Web Actions at http://console.conductrics.com/docs/demo/web-actions
  • How do I get a Conductrics Account? If you don’t have a Conductrics account yet, just go to http://www.conductrics.com to get a free account to play around with.
  • Who can I contact if I need help? Go to http://conductrics.com/contact/ with your question; we try to answer questions right away. We are usually available via the online chat window at the bottom of that page.

 

[Screenshot: Global Plugin Settings]

[Screenshot: Creating a new agent]

[Screenshot: Setting up a test]

[Screenshot: Convenient reporting]

 



Big Data or Big Distraction

Contrary to what you have heard, the unfolding technological transformation we are witnessing isn’t really about data, not directly at any rate. It’s not that data isn’t important, but the focus on data is obscuring the real nature of change, which is the transition from a world driven by essentially static and reactive systems to one driven by hyper-localized, adaptive control systems.


These controllers are already in our cars, homes, and offices, and will be in our clothing, our parks; literally woven into the fabric of our physical environment. The future will not be defined by how much data is collected, but by the complexity and responsiveness of our localized environments.

Data sounds nicer than control

Unfortunately, control or control systems aren’t commonly used terms/ideas, even in many of the applied data fields (Marketing, that’s you I am talking about), but they really should be. So what is control and why is it important? Control is a process of making decisions, and accepting feedback, in order to achieve some objective. In other words, it is something that senses and acts, it isn’t inert like data.

Thermostat

Let’s use a simple example of a common controller: your basic thermostat. Your thermostat’s objective is to maintain a certain temperature in a room, or your house. It does this, in the simplest case, by checking the temperature of the room (this is data collection) and then, based on its reading, choosing to Heat, Cool, or do Nothing.

The rules that govern how the controller behaves are called the control logic. In simple cases, like our thermostat, the control logic can be easily written out by a human. However, more advanced applications, like self-driving cars, are so complex that we will often need to learn much of the control logic from data, rather than have it directly programmed by people.
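To show just how small hand-written control logic can be, here is a Python sketch of the thermostat. The tolerance band around the target, the one-degree step size, and the function names are assumptions added for the illustration; only the three actions come from the description above.

```python
def thermostat_action(reading: float, target: float, tolerance: float = 0.5) -> str:
    """Control logic: map one temperature reading to one of three actions."""
    if reading < target - tolerance:
        return "Heat"
    if reading > target + tolerance:
        return "Cool"
    return "Nothing"


def simulate(start: float, target: float, steps: int, step_size: float = 1.0):
    """Sense, act, and feel the effect of the action, over and over."""
    temperature = start
    history = []
    for _ in range(steps):
        action = thermostat_action(temperature, target)  # sense and decide
        if action == "Heat":
            temperature += step_size  # the action changes the environment
        elif action == "Cool":
            temperature -= step_size
        history.append((action, temperature))
    return history


for action, temperature in simulate(start=18.0, target=21.0, steps=5):
    print(action, temperature)
```

The loop is the important part: the data (the reading) only matters because it feeds a decision, and the decision changes what the next reading will be.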

Why write it when the machine can learn it?

This is where data plays one of its major roles, in helping to learn the control logic. By employing machine learning (see our data science posts here and here), we can learn the basic logic required for a particular controller. We can then hone and optimize the efficacy of the controller by embedding additional systems for updating the controller’s logic after it has been deployed – these adaptive systems use the current data from the system’s environment in order to continuously update and improve upon the control logic.

Big Data is afraid of its shadow prices

Folks who are excited about Big Data should start to think less about data per se, and more about how data will drive how we go about 1) creating more powerful controller logic and 2) improving precision by giving control systems access to more precise and higher-dimensional data.

Framing data in terms of the control problem naturally leads to real data questions, like: what if I didn’t have this bit of data? How much less effective would the system be? In other words, you can start to think about the marginal value of each new bit of data, so that you can move toward having an optimal volume and precision of data with respect to your goals and objectives.

Pearls of Wisdom or ‘Correlation isn’t Causation’

While true, “Correlation isn’t Causation” is often proudly exclaimed without any real follow-up about what it really means. By taking a control perspective, we can begin to get a little clarity on how to differentiate data that provides correlations from data that provides causal relationships.

Data that is passively gathered will tend to give you correlations. The data that you gather from your controller’s actions, however, will give you causal relationships, at least with respect to the actions that the controller takes. In fact, you can think of AB Testing as employing a type of dumb controller, one that takes random actions. If you want to learn a bit more about the topic from an actual expert, take a look at Judea Pearl’s work (PDF).

Data is Lazy, and leads to lazy thinking.

Here is the thing: data is passive. That makes it easy to collect and talk about. Integrating it into a working system or process is the hard part. Control, by definition, is active, and that makes it hard, because you now have to think about how the entire system is going to respond to each control action. That is probably one of the main reasons there is so much attention on data: you get to dodge the hard, but ultimately most valuable, questions.

The House that Big Data Built

Hoarding

A couple years ago I was at a presentation given by a VC firm that invests in Big Data companies. After the presentation, I asked the senior partner how they balance the cost of collecting and storing data with its potential benefits. His take was that you should collect and store everything because you don’t know what might be valuable and when you might need it. I thought that was a bad answer. That is basically a hope strategy: “I hope something is in there that is worthwhile”. Hope is a bad strategy. It also suggests undirected behavior; just collect everything indiscriminately. The indiscriminate collecting of things because something might be valuable later is called hoarding, and in any other context it’s considered pathological.

Posted in Uncategorized | 2 Comments


Intelligent Agents: AB Testing, User Targeting, and Predictive Analytics

Whether you are in marketing, web analytics, data science, or even building a Lean Startup, you probably are on board with the importance of analytical decision-making.  Go to any related conference, blog, meet up and you will hear at least one of the following terms: Optimization, AB & Multivariate Testing, Behavioral Targeting, Attribution, Predictive Analytics, LTV … the list just keeps growing.  There are so many terms, techniques, and next big things that it is no surprise that things start to get a little confusing.

If you have taken a look at the Conductrics API, or our UI (if you haven’t, please sign up for a free account), you may have noticed that we use the term agent to describe our learning projects.

Why use an Agent?  Because amazingly, Optimization, AB & Multivariate Testing, Behavioral Targeting, Attribution, Predictive Analytics, LTV … can all be recast as components of a simple, yet powerful framework borrowed from the field of Artificial Intelligence, the intelligent agent.

Of course we can’t take credit for intelligent agents.  The IA approach is used as the guiding principle in Russell and Norvig’s excellent AI text Artificial Intelligence: A Modern Approach – it’s an awesome book, and I recommend anyone who wants to learn more to go get a copy or check out their online AI course.

I’m in Marketing, why should I care about any of this?

Well, personally, I have found that by thinking about analytics problems as intelligent agents, I am able to instantly see how the concepts listed above are related, and how to apply them most effectively, individually or in concert. Intelligent Agents are a great way to organize your analytics tool box, letting you grab the right tool at the right time. Additionally, since the conceptual focus of an agent is to figure out what action to take, the approach is goal/action oriented rather than data collection/reporting oriented.

The Intelligent Agent

So what is an intelligent agent?  You can think of an agent as being an autonomous entity, like a robot, that takes actions in an environment in order to achieve some sort of goal. If that sounds simple, it is, but don’t let that fool you into thinking that it is not very powerful.

Example: Roomba

[Image: Roomba]

An example of an agent is the Roomba – a robot for vacuuming floors. The Roomba’s environment is the room/floor it is trying to clean. It wants to clean the floor as quickly as possible. Since it doesn’t come with an internal map of your room, it needs to use sensors to observe bits of information about the room that it can use to build an internal model of the room. To do this it takes some time at first to learn the outline of the room in order to figure out the most efficient way to clean.

The Roomba learning the best path to clean a room is similar, at least conceptually, to your marketing application trying to find the best approach to convert your visitors on your site’s or app’s goals.

The Basics

Let’s take a look at the basic components of the intelligent agent and its environment, and walk through the major elements.

First off, we have both the agent, on the left, and its environment, on the right hand side. You can think of the environment as where the agent ‘lives’ and goes about its business of trying to achieve its goals. The Roomba lives in your room.  Your web app lives in the environment that is made up of your users.

What are Goals and Rewards?
The goals are what the agent wants to achieve, what it is striving to do.  Often, agents are set up so that the goals have a value.

When the agent achieves a goal, it gets a reward based on the value of the goal.  So if the goal of the agent is to increase online sales, the reward might be the value of the sale.

Given that the agent has a set of goals and allowable actions, the agent’s task is to learn what actions to take given its observations of the environment – so what it ‘sees’, ‘hears’, ‘feels’, etc.  Assuming the agent is trying to maximize the total value of its goals over time, then it needs to select the action that maximizes this value, based on its observations.

So how does the agent determine how to act based on what it observes? The agent accomplishes this by taking the following basic steps:

  1. Observe the environment to determine its current situation. You can think of this as data collection.
  2. Refer to its internal model of the environment to select an action from the collection of allowable actions.
  3. Take an action.
  4. Observe the environment again to determine its new situation. So, another round of data collection.
  5. Evaluate the ‘goodness’ of its new situation – did it reach a goal and, if not, does it seem closer to or further away from reaching a goal than before it took the last action?
  6. Update its internal model on how taking that action ‘moved’ it in the environment and if it helped it get or get closer to a goal. This is the learning step.

By repeating this process, the agent’s internal model of how the environment responds to each action continuously improves and better approximates each action’s actual impact.
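To make the loop a bit more concrete, here is a minimal Python sketch of the six steps. Everything in it is made up for illustration: the `SimpleAgent` class, the `toy_environment` function, and the conversion rates are assumptions, not Conductrics code. The internal model is just a running average reward per action, and for simplicity the agent picks actions at random rather than consulting that model.

```python
import random


class SimpleAgent:
    """Toy agent whose 'internal model' is the average reward seen per action."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}

    def select_action(self, rng):
        # Step 2: pick from the allowable actions (purely at random in this sketch).
        return rng.choice(self.actions)

    def update(self, action, reward):
        # Step 6: the learning step, an incremental running average of rewards.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n


def toy_environment(action, rng):
    # Hypothetical conversion rates, made up for this example.
    rates = {"A": 0.05, "B": 0.10}
    # Steps 4-5: the new observation is simply "converted or not".
    return 1.0 if rng.random() < rates[action] else 0.0


def run(agent, steps, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        action = agent.select_action(rng)      # steps 1-2
        reward = toy_environment(action, rng)  # steps 3-5
        agent.update(action, reward)           # step 6
    return agent


agent = run(SimpleAgent(["A", "B"]), steps=1000)
```

After the run, `agent.values` holds the agent's current estimate of how well each action converts, which is the piece a smarter controller would use to decide what to do next.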

This is exactly how Conductrics works behind the scenes to go about optimizing your applications. The Conductrics agent ‘observes’ its world by receiving API calls from your application – so information about location, referrer etc.

In a similar vein, the Conductrics agent takes actions by returning information back to your application, with instructions about what the application should do with the user.

When the user converts on one of the goals, a separate call is made back to the Conductrics server with the goal information, which is then used to update the internal models.

Over time, Conductrics learns, and applies, the best course of action for each visitor to your application.

Learning and Control

The intelligent agent has two interrelated tasks – to learn and to control. In fact, all online testing and behavioral targeting tools can be thought of as being composed of these two primary components, a learning/analysis component and a controller component. The controller makes decisions about what actions the application is to take. The learner’s task is to make predictions on how the environment will respond to the controller’s actions.  Ah, but we have a bit of a problem.  The agent’s main objective is to get as much reward as possible. However, in order to do that, it needs to figure out what action to take in each environmental situation.

Explore vs. Exploit

Lets Make a Deal image

The intelligent agent will need to try out each of the possible actions in order to determine the optimal solution. Of course, to achieve the greatest overall success, poorly performing actions should be taken as infrequently as possible. This leads to an inherent tension between the desire to select the high value action against the need to try seemingly sub-optimal but under explored actions. This tension is often referred to as the “Explore vs. Exploit” trade-off and is a part of optimizing in uncertain environments. Really, what this is getting at is that there are Opportunity Costs to Learn (OCL).

To provide some context for the explore/exploit trade-off consider the standard A/B approach to optimization. The application runs the A/B test by first randomly exposing different users to the A/B treatments. This initial period, where the application is gathering information about each treatment, can be thought of as the exploration period. Then, after some statistical threshold has been reached, one treatment is declared the ‘winner’ and is thus selected to be part of the default user experience. This is the exploit period, since the application is exploiting its learnings in order to provide the optimal user experience.

AB/Multivariate Testing Agent

In the case of AB Testing both the learning and controller components are fairly unsophisticated. The way the controller selects the actions is to just pick one of them at random.  If you are doing a standard AB style test then the controller picks from a uniform distribution – all actions have an equal chance of selection.

The learning component is essentially just a report or set of reports, perhaps calculating significance tests.  Often there is no direct communication from the learning module to the controller. In order to take advantage of the learning, a human analyst is required to review the reporting, and then based on results, make adjustments to the controller’s action selection policy.  Usually this means that the analyst will select one of the test options as the ‘winner’, and remove the rest from consideration. So AB Testing can be thought of as a method for the agent to determine the value of each action.
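As a rough illustration of those two pieces, here is a small Python sketch: a controller that picks uniformly at random, and a ‘report’ that runs a standard two-proportion z-test on the results. The function names and the conversion counts are made up for the example, and the z-test is just one common choice of significance test, not necessarily what any particular tool uses.

```python
import math
import random


def uniform_controller(actions, rng):
    # The AB Testing controller: every action has an equal chance of selection.
    return rng.choice(actions)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes the pooled rate is strictly between 0 and 1, otherwise the
    standard error is zero and the statistic is undefined.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: 50 of 1000 convert on A, 70 of 1000 convert on B.
z, p_value = two_proportion_z_test(50, 1000, 70, 1000)
```

Note that nothing here feeds back into `uniform_controller`. It is the analyst who looks at `z` and `p_value` and decides whether to change what the controller does.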

I just quickly want to point out, however, that the AB Testing with analyst approach is not the only way to go about determining and selecting best actions.  There are alternative approaches that try to balance in real-time the learning (exploration) and optimization (exploitation).  They are often referred to as adaptive learning and control. For adaptive solutions, the controller is made ‘aware’ of the learner and is able to autonomously make decisions based on the most recent ‘beliefs’ about the effectiveness of each action. This approach requires that the information stored in the learner is made accessible to the controller component. We will see a bit of this when we look at Multi-armed Bandits in an upcoming post.
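To give a flavor of what ‘making the controller aware of the learner’ can look like, here is a sketch of epsilon-greedy selection, one of the simplest adaptive schemes. To be clear, this is a textbook technique I am using for illustration, not a description of how Conductrics selects actions, and the `values` dictionary and the `epsilon` setting are assumptions.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Pick an action using the learner's current value estimates.

    values:  dict mapping each action to its estimated value (the 'beliefs').
    epsilon: fraction of the time spent exploring at random.
    """
    if rng.random() < epsilon:
        # Explore: try any action, even one that currently looks poor.
        return rng.choice(list(values))
    # Exploit: take the action the learner currently believes is best.
    return max(values, key=values.get)


rng = random.Random(42)
beliefs = {"A": 0.05, "B": 0.10}  # hypothetical estimates from the learner
picks = [epsilon_greedy(beliefs, epsilon=0.1, rng=rng) for _ in range(1000)]
```

With `epsilon=0.1` most of the picks go to the action that currently looks best, while a small share of traffic keeps exploring the alternative.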

Targeting Agents

Maybe you call it targeting, or segmentation, or personalization, but whatever you call it, the idea is different folks get different experiences.  In the intelligent agent framework, targeting is really just about specifying the environment that the agent lives in.

Let’s revisit the AB Testing agent, but we add some user segments to it.

You can see the segmented agent differs in that its environment is a bit more complex. Unlike before, where the AB Test agent just needed to be aware of the conversions (reward) after taking an action, it now also needs to ‘see’ what type of user segment it is as well.
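In code terms, the only real change is that the learner now keys its bookkeeping on the segment as well as the action. Here is a small sketch, with a made-up `SegmentedABLearner` class and made-up segment names:

```python
from collections import defaultdict


class SegmentedABLearner:
    """Tracks trials and conversions per (segment, action) pair."""

    def __init__(self):
        self.trials = defaultdict(int)
        self.conversions = defaultdict(int)

    def record(self, segment, action, converted):
        key = (segment, action)
        self.trials[key] += 1
        self.conversions[key] += int(converted)

    def rate(self, segment, action):
        key = (segment, action)
        n = self.trials.get(key, 0)
        return self.conversions.get(key, 0) / n if n else 0.0

    def best_action(self, segment, actions):
        # The 'winner' can now differ from one segment to the next.
        return max(actions, key=lambda a: self.rate(segment, a))


learner = SegmentedABLearner()
# Hypothetical observations: (segment, action, converted?)
for segment, action, converted in [
    ("new", "A", True), ("new", "A", False), ("new", "B", False),
    ("returning", "A", False), ("returning", "B", True), ("returning", "B", True),
]:
    learner.record(segment, action, converted)
```

Here `learner.best_action("new", ["A", "B"])` and `learner.best_action("returning", ["A", "B"])` can come back with different answers, which is the whole point of adding the segments.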

 

Targeting or Testing? It is the Wrong Question

 

Notice that with the addition of segment based targeting, we still need to have some method of determining what actions to take.   So targeting isn’t an alternative to testing, or vice versa. Targeting is just when you use a more complex environment for your optimization problem.  You still need to evaluate and select the action.  In simpler targeting environments, it might make sense to use the AB Testing approach as we did above.  Regardless, Targeting and Testing shouldn’t be confused as competing approaches –they are really just different parts of a more general problem.

 

Ah, well you may say, ‘hey that is just AB Testing with Segments, not behavioral targeting. Real targeting uses fancy math –  it is a totally different thing.’  Actually, not really.  Let’s look at another targeting agent, but this time instead of a few user segments, we have a bunch of user features.

Now the environment is made up of many individual bits of information, such that there could be millions or even billions of possible unique combinations. Hmm, it is going to get a little tricky to try to run your standard AB style test here. Too many possible micro segments to just enumerate them all in a big table, and even if you did, you wouldn’t have enough data to learn since most of the combinations would have 1 user at most.

 

That isn’t too much of a problem actually, because rather than setting up a big table, we can use approximating functions to represent the map from observed features to the value of each action.

 

Predictive Analytics: Mapping Observed User Features to Actions

Not only does the use of predictive models reduce the size of the internal representation, but it also allows us to generalize to observations that the agent has not come across before. Also we are free to pick whatever functions, models etc. we want here.  How we go about selecting and calculating these relationships is often in the domain of Predictive Analytics.
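Here is a minimal sketch of the idea, using a plain linear model trained by stochastic gradient descent as the approximating function. The feature names, the learning rate, and the choice of a linear model are all assumptions for illustration; in practice you would keep one set of weights per action and could swap in any model you like. For a sense of scale, just 30 yes/no features already give 2**30 (over a billion) possible combinations, while the linear model below needs only one weight per feature.

```python
def predict(weights, features):
    """Estimated value of an action for a user described by `features`."""
    return sum(weights.get(name, 0.0) * x for name, x in features.items())


def sgd_update(weights, features, reward, learning_rate=0.1):
    """Nudge the weights toward the observed reward (squared-error loss)."""
    error = reward - predict(weights, features)
    for name, x in features.items():
        weights[name] = weights.get(name, 0.0) + learning_rate * error * x


# Weights for a single action; every user seen so far was on mobile and converted.
weights = {}
for _ in range(200):
    sgd_update(weights, {"bias": 1.0, "mobile": 1.0}, reward=1.0)

seen = predict(weights, {"bias": 1.0, "mobile": 1.0})
# A feature combination the model never observed still gets a prediction.
unseen = predict(weights, {"bias": 1.0})
```

The prediction for the unseen combination is not necessarily a good one, but the model can at least produce an estimate for it, which a big lookup table cannot do for a cell that has no data.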

Ah, but we still have to figure out how to select the best action. The exploration/exploitation tradeoff hasn’t gone away. If we didn’t care about the opportunity costs to learn, then we could try all the actions out randomly for a time, train our models and then switch off the learning and apply the models.  Of course there is a cost to learn, which is why Google, Yahoo! and other Ad targeting platforms, spend quite a bit of time and resources trying to come up with sophisticated ways to learn as fast as possible.

Summary

Many online learning problems can be reformulated as an intelligent agent problem.

Optimization –  is the discovery of best action for each observation of the environment in the least amount of time. In other words, optimization should take into account the opportunity cost to learn.

Testing – either AB or Multivariate, is just one way, of many, to learn the value of taking each action in a given environment.

Targeting – is really just specifying the agent’s environment. Efficient targeting provides the agent with just enough detail so that the agent can select the best actions for each situation it finds itself in.

Predictive Analytics – covers how to specify which internal models to use and how to best establish the mapping between the agent’s observations, and the actions. This allows the agent to predict what the outcome will be for each action it can take.

I didn’t get to talk about attribution and LTV.  I will save that for another post since this post is already long,  but in a nutshell, you just need to extend the agent to handle sequential decision processes.

What is neat is that even if you don’t use Conductrics, intelligent agents are a great framework to arrange your thoughts when solving your online optimization problems.

If you want to learn more please come sign up for a Conductrics account today at Conductrics



List of Machine Learning and Data Science Resources – Part 2

This is a follow-up to the list of Machine Learning and Data Science resources I put up a little while ago. This post contains some links to resources on clustering and Reinforcement Learning that I didn’t get to in the first post. Like the first one, it’s a bit haphazard, and is not meant to be definitive. Have fun, and feel free to post comments with your favorites.

Cinderella or the Ugly Stepchild – Reinforcement Learning

Almost every Machine Learning text or tutorial begins with something like ‘Machine Learning can be broken out into three primary tasks; Unsupervised Learning, Supervised learning, and Reinforcement Learning.’ After a quick overview of RL, they then blow off any more discussion of it.

Well, let’s correct that a bit.

What is an RL problem? Well, an RL problem is any problem where you are trying to figure out what the best course of action should be in a particular environment; basically it’s a problem of optimal control. A few examples are learning a robot controller, or learning a computer program to play a game like backgammon, or discovering the optimal ads to place in front of someone’s face when they are browsing the web. All of these problems can be thought of as an RL problem.

Unlike supervised learning, where you have the correct answers to train your learner, in RL problems you only get a reward signal back from the environment – some measure of goodness (or badness) associated with some action or policy you have taken. So with RL problems, you need to interact with an environment, in order to learn. Basically, it’s how we as people learn to navigate our world. We do something, and then see if it was good or not. If it was good, we keep doing it, if it is bad, we try something else.

Some related ideas from control theory are dynamic programming and optimal control.
First take a peek at MDPs from Wiki since MDPs are used as a framework for sequential decision problems.

TD, UMASS and the Pixies

While not cutting edge, you can’t go wrong with grounding yourself with Rich Sutton’s and Andy Barto’s intro text on RL. Here is a link to the free online version.
Rich came up with Temporal Difference learning (TD) while he was working on his PhD at UMASS. Speaking of UMASS, check this out! Yeah! FWIW, I took Don Levine’s Avant Garde Film class back in the day, which allegedly inspired the Pixies to write Debaser.





I’m looking at you Stanford.

Occasionally you bump into someone who wants to impress on you their data chops (c’mon, we all do it occasionally). Anyway, maybe they are involved in investing, or they have some startup or something in the valley. If they start dropping the following terms: ‘Stanford’, ‘Coursera’, ‘Andy Ng’ along with the rest of the standard Bigdata buzzwords, start to get worried.
However, if they are talking about Stanford and Prof Ng while talking about Reinforcement learning, that is a whole different story. Why? Helicopters of course! How cool is that?!
Sure, Ng has jumped off into deep learning, but his group at Stanford has done a ton of stuff on applied RL. Take a look at his CS229 classes that cover RL.

Context is King

One of the many benefits to living in New York is that there is a great community of folks involved in Machine Learning. One of them is John Langford. John has both helped build some large scale multi-armed bandit systems at Yahoo! (and I assume now at Microsoft) and helped push work on state of the art algorithms. Take a look at his blog Hunch.net. Also, take a peek at John’s presentation with Alina Beygelzimer on Contextual Bandits (I make a cameo as the annoying question guy).

Also, if you want to play around with writing your own stuff go get John Myles White’s book.

If you like the RL/Bandit stuff, but just want to apply it to your online app, please feel free to play around with Conductrics – you can get a free account!

Group and Mix – Clustering Stuff

What is clustering anyway and what is a good clustering algorithm? Take a look at Shai Ben-David’s talk on the theory of clustering to find out. Surprisingly, there really isn’t a lot of theory around the general problem, but Shai is trying to fix that.

The Magic of Expectation Maximization

If you just want to know how to ‘do’ clustering you prob should have some idea of the EM algorithm. This video from Joaquin Quiñonero Candela covers the EM algorithm for clustering (K-Means, Gaussian Mixture Models). I couldn’t pass it up since he starts with the Supervised, Unsupervised, and Reinforcement learning set up, and then as expected, blows off RL :) Ha!

Information and a Brief Diversion: Kullback and Leibler

How do we measure if things are different? Well, one way is via the KL-divergence. The KL-D comes up all of the time, which is not that surprising since it is the measure of information gain. Also, it’s a great thing to drop into conversations when you want to try to impress folks – hey, sometimes you just want to fit in with the data science gang. Anyway, this little guy is super important and will help form your mental bridge between Stats, Machine Learning, and Information theory. Just remember that it is a measure of divergence, not a metric (for one thing, it is not symmetric), and you will be fine.
For a bit more on information theory, take a look at Mr Information theory himself, David MacKay (and check out those shorts he is rocking!).
Also, after you get into information theory a bit (pun unintended), maybe revisit the idea of BIGDATA in your mind, while thinking about entropy and Kolmogorov complexity. Does it make sense – eh, I am not sure.
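If you want to see the ‘not a metric’ bit for yourself, here is a tiny Python sketch of the KL-divergence for discrete distributions. The two example distributions are made up, and the function assumes that q is non-zero wherever p is non-zero.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.9, 0.1]  # hypothetical distribution
q = [0.5, 0.5]  # hypothetical distribution

forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
```

The two directions give different numbers, so swapping the arguments changes the answer, which is exactly what a proper distance metric would not allow.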

The Russians thought of it first – Convexity and Boyd

You think your fancy machine learning or data science method is so brand spanking new that VCs will shower money on you to get to market? Let Stephen Boyd disabuse you of your pretensions. While not exactly a course on Data Science, you will be surprised how much he covers in this class. Boyd walks you through lots of your favorite ML algos from a mathematical programming perspective (Logistic regression, SVMs, etc.). This different approach can be really helpful to highlight some idea or concept that may have slipped by you. For example, we mucked around with the Schur complement in classes I have taken; I never understood that it was the conditional variance for the multivariate normal until reviewing his class. I think you will find little connections like that happening all of the time with his class. He also does a nice job of walking you through different loss functions (hinge, quadratic, Huber). As a bonus, he is hilarious – I often would chuckle to myself while watching his lectures on my subway rides. No way anyone ever caught on that I was watching Convex Optimization.
Also, you learn that the Russians did it all first.
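For reference, the Schur complement connection can be written out as follows. This is the standard result for a jointly Gaussian vector split into two blocks; the block names a and b are just my notation.

```latex
\[
\begin{pmatrix} x_a \\ x_b \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix},
\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}
\right)
\quad\Longrightarrow\quad
\operatorname{Cov}(x_a \mid x_b) = \Sigma_{aa} - \Sigma_{ab}\,\Sigma_{bb}^{-1}\,\Sigma_{ba}
\]
```

The right-hand side is the Schur complement of the block \Sigma_{bb} in the full covariance matrix, and notice that it does not depend on the observed value of x_b.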

*** Update ***
I reached out to Stephen Boyd and he suggested this link . It has his book online as well as links to the courses. Also, in his words ‘a particularly fun source of stuff, that not many people know about, is the additional exercises for the course, available on the book web site. this is dynamically updated (at least every quarter) and contains tons of exercises, many of them practical, on convex optimization and applications.’ To be honest, these exercises are a bit (well, maybe more than a bit) beyond my capacity, but have at it.
***

Please feel free to comment and sign up for Conductrics – start rolling out your own targeted experiments today!



A List of Data Science and Machine Learning Resources

Every now and then I get asked for some help or for some pointers on a machine learning/data science topic.  I tend to respond with links to resources by folks that I consider to be experts in the topic area.   Over time my list has gotten a little larger so I decided to put it all together in a blog post. Since it is based mostly on the questions I have received, it is by no means complete, or even close to a complete list, but hopefully it will be of some use.  Perhaps I will keep it updated, or even better yet, feel free to comment with anything you think might be of help.

Also, when I think of data science, I tend to focus on Machine Learning rather than the hardware or coding aspects. If you are looking for stuff on Hadoop, or R, or Python, sorry, there really isn’t anything here.

Neo Makes Cheese

Before you do anything else, start boning up on your linear (matrix) algebra. This is the single most important thing you can do to get yourself bootstrapping your ML education.

This is the deal, and I don’t care what anyone else has told you, if you want to have any hope in understanding what is going on in Machine Learning, Data Science, Stats, etc. you have got to get a handle on Linear Algebra. 

Painful! Trust me, I know. I got an ‘F’ the first time I took it in college.  I had no idea what was going on. The only thing I really remember was the professor shouting at us after poor marks on homework and tests, ‘How can you make cheese if you don’t know where milk comes from!? It’s plain, common ordinary horse sense!’

Kinda nuts, and it really didn’t make total sense, but his point was, you have got to have the basics down before you can actually make anything useful.

The flip side is that once you start getting a feel for linear algebra, you can much more easily hop around among various advanced topics. This is because many, but not all, ML topics rest on applications of linear algebra.

Where to start? For me, there really is only one place that I go to get a refresh on a topic that I realize I don’t really understand, and that is Gilbert Strang’s undergrad class at MIT. Just an awesome intro course and it makes me covet the students who get to go to MIT. See his class here Linear Algebra Class

General Machine Learning

Now this resource is a big one – and I think just this link makes this post worth it. As far as I am concerned, www.videolectures.net is one of the most valuable sites on the internet.  Sure, maybe some of the other ‘disruptive’ educational sites are useful, but almost everything Machine Learning is in here and then some – Mother lode, Paydirt, or whatever you want to call it, you just hit it with this. http://videolectures.net/Top/Computer_Science/Machine_Learning/
But don’t hoard it, pass this resource along.

Also, a good first lecture on ML is Iain Murray’s tutorial from a machine learning summer school– Here is the lecture Murray Teaches ML

On to some TOPICS

LDA for Topic Models or How I Learned to Pronounce Dirichlet
One, this is useful, Two, data science folks are on about this so if you want to fit in – or hit up any ‘Big Data’ VCs, you better be able to name drop this, and be able to back it up if you get called out – not that the VC will be able to call you out ;). 
David Blei is probably the best source to start looking for Topic modeling research applications
http://www.cs.princeton.edu/~blei/
http://www.cs.princeton.edu/~blei/topicmodeling.html

Matt Hoffman – Matt is over at ADOBE now, but he wrote some python code for online Topic Models (I think this is his research as well)- check it out http://www.cs.princeton.edu/~mdhoffma/

From LDA to SVD to LSI
A non-Bayesian/probabilistic approach to topic modeling is Latent semantic indexing, where you use a version of SVD (actually I think it is really basically Principal components -which is a related eigendecomposition/factorization). There is a wiki here on LSI to get you started
http://en.wikipedia.org/wiki/Latent_semantic_indexing I know it is normally kinda lame to just link to Wiki, but it is a pretty good overview.

SVD or Recommending Everyone’s Favorite Factorization
If you didn’t feel the need to look over Strang’s course, take a look at this class on the SVD , which is the basis for most every recommendation system – I highly recommend getting at least the basic idea down.

Bayesian Or Frequentist Or Who Cares?

Huge topic that I am not going to really try to flesh out, but Michael Jordan (the one at Berkeley, not Chicago) has a nice lecture contrasting the two: Bayesian or Frequentist

Also check out MJ on the Dirichlet Process – not the best audio, but since he also sets up the Parametric/Non-Parametric and Bayesian/Frequentist  classifications on slide two, it is worth bending your ear a bit. MJ DUNKS!

On to Non Parametric Bayesian approaches

There seems to be a bit of chatter in the startup/ Big data space around Non Parametric Bayesian methods.  If you want to see more after checking out MJ’s talk, take a look at David MacKay’s tutorial on the Gaussian Process

Also check out Gaussian Processes for Machine Learning by Chris Williams and Carl Rasmussen here ‘Gaussian Processes for Machine Learning’.  What is great is that this book is online and free! What is not so great is that it is pretty hairy going, but take a peek at chapter 6, it has a comparison of GPs with Support Vector Machines (SVMs).

PMR Madness! Joints and Moralizing

I took Chris Williams’ class on Probabilistic Models and Reasoning   several years back when I was studying AI at Edinburgh.  Take a look at the slides etc. There is stuff on graphical models, junction tree, etc.  Chris is one of those crazy smart guys who I use in my mind’s eye as a litmus test for when someone is trying to pass themselves off as an expert in the space.  Sort of a where are they on the CW scale – most often it’s pretty low ;-)

Also see Carl’s talk on Gaussians

The First Order of Business – Let’s Get Stochastic

Online vs Offline – everyone is all agog about bigdata, hadoop etc., but if you are interested in more than hardware/IT, you will want to think about how you are going to use data and what types of systems approaches are most appropriate. If you are going to be doing ML on a lot of data you should be aware of stochastic gradient descent (SGD). Léon Bottou is the man for this – here is a video and his home page

Deep Learning or Six Degrees of Geoff Hinton

Here are just a few of those that have been students or post docs at his lab: Yann LeCun (ANNs/Deep Learning), Chris Williams (GPs), Carl Rasmussen (GPs), Peter Dayan (Neuroscience and TD-Learning), Sam Roweis, and recently Iain Murray (see lecture above).

After the NYTimes had a piece on deep learning there was a fair amount of online chatter about it.  What is it? Let Yann LeCun tell you. Yann is a professor over at NYU, and has worked with Neural Nets for quite some time, using energy models and his convolutional net. Take a look at Yann’s presentation he gave to the Machine Learning summer school I attended back in ’08. http://www.cs.nyu.edu/~yann/talks/lecun-20080905-mlss-deep.pdf

** Update **

Yann suggested the following updated links: 1) A recent invited talk at ICML 2012 and 2) some slides from a more recent summer school IPAM. I have not had the time to take a look, but since Yann suggested these personally, check them out.

**

As an aside, one of the great things about those Pascal machine learning summer schools is that you get to hang with these folks informally. So chat SVMs and feature selection with Isabelle Guyon at dinner, lunch with Rich Sutton, and perhaps talking shop with Yann over a glass of Pineau. If you can make it happen, I highly recommend attending one of these, next one looks to be in Germany.

Also, feel free to peruse Geoff Hinton’s site for some goodness on Autoencoders and RBMs.

NLP, but not LDA

I couldn’t figure out how to conjugate the verb ‘to be’ until I was like 12, so, not surprisingly, I never really got into NLP.

I was, however, fortunate to take a class with Philipp Koehn, while I was at Edinburgh, who has helped drive much of the recent work on machine translation – I’ll let him explain it to you here.  You can also get his book if you are interested, Statistical Machine Translation, but to be honest, I haven’t read it.

You may have noticed that he used the term informatics. Here is a description/definition of it on the Informatics web site at Edinburgh. I actually think informatics is a better term than Data Science, but hey, out with the old in with the new(ish).

Named Entity – The Subjects and Maximizing Entropy
If you are into NLP you might want to be able to figure out who the players are in text documents.  You will need to do this
http://en.wikipedia.org/wiki/Named-entity_recognition
Maybe this is good – I have never used it – http://nlp.stanford.edu/software/CRF-NER.shtml
If you want to do your own NE model, MAX Ent models have been used – here is a resource with MAXENT for NLP.  I admit, I don’t know what the state of art is, or if this is still used, so feel free to comment with some better/newer stuff.

Okay, this post is getting long, so I will wrap it up. I didn’t get to Reinforcement learning or Cluster methods (other than LDA), so perhaps I will extend this post, or write a follow-up soon. Please feel free to add thoughts via the comments and if you haven’t yet, please sign up for your free Conductrics account.
