Posts tagged: statistics

A theory of 'networking' but more of a perspective on market research

I get a lot of Twitter followers based on keywords like, eh, ‘Screenplay’ and ‘Consulting,’ and it’s annoying, to say the least. Why? Because I don’t believe in mass-networking, and I will explain why in the rest of this post (there are other reasons, this was just the trigger).

Over the years, I’ve accumulated a lot of experience in market research and I’ve become pretty good at it. From selling myself over the phone in a minute or less, to overcoming gatekeepers, to learning to really listen to people, to arranging personal interviews and transcribing the results, to designing, co-ordinating, and analysing mass-research campaigns: I feel I’ve seen a lot, and I’ve also seen a lot of bad practices.

Why does market research get a bad rap? Because a lot of it focusses on getting as many results in as possible. The goal is “statistical significance,” for the newcomers to this field, which loosely means that the answers of 10 people are less likely to be a fluke than the answers of 2. Of course, you have to take into account that there are different types of market research: those that lead to clear outcomes and those that lead to lots of data that can be analysed and interpreted. Being a practical guy, I much prefer the first, as the other kind often feels like a waste of money to me.
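To put a rough number on why more answers are less of a fluke, here is a minimal sketch using the textbook normal approximation for a survey proportion. The function name and the 50% answer share are illustrative assumptions, and the approximation itself is crude for very small samples.

```python
import math

def margin_of_error(share, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(share * (1 - share) / n)

# Hypothetical survey where half of the respondents answer "yes".
for n in (10, 100, 1000):
    print(n, round(margin_of_error(0.5, n), 3))
```

The margin shrinks with the square root of the sample size, so ten times as many respondents buys only about a threefold gain in precision, and it says nothing about whether the right people were asked.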

How do you design an outcome-focussed research campaign? Pretty simple: you get as close as possible to the outcome and you test it. In tech-world, this would be developing a prototype and testing it. In web-world, this would be something like A/B-marketing, where you design different versions of the same page and test their effectiveness on different samples. Of course, you can conduct plenty of market research beforehand too, but as a favourite lecturer of mine once told me: “How do you research innovation (i.e. something new)? You can’t, because people really don’t know how they feel until the innovation is there.”
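As a hedged illustration of the web-world case, the sketch below compares two page versions with a standard two-proportion z-test. The function name and the visitor and conversion counts are made up for the example; a real campaign would also need to fix the sample size before looking at results.

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal, via the error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 1000 visitors per version, 100 vs 130 conversions.
z, p = ab_test_z(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between the versions is unlikely to be chance alone, which is exactly the kind of clear outcome this type of research is after.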

The other kind of market research, the one that produces a lot of data, is a gold-mine for journalists, analysts, and consultants. They all love to deal with abstractions that can be applied to many different situations. “Research showed that people are getting tired of green advertising. Therefore, we can write an article/report/advice to our clients that green advertising sucks.” The end, pay me.

That is not to say that more results do not provide a less biased perspective on a problem, but it’s just not as simple as asking a lot of people the same questions. There are ways around that, such as collecting demographic and psychographic data, and I don’t want to cheapen that. I’ve written before about how statistics only matter as much as where your data comes from. But I’ve also written that triangulation is a large part of my research philosophy, which means getting different perspectives on the same issue. Yes, kind of like A/B marketing. So, I do desk-research, I do web-based surveys, I do interviews with consumers and experts. All of which gives me a more objective view of the solution to a problem.

Networking, and now we come to the gist of it, is also a philosophy with different flavours. One, the Twitter-kind, focusses on buzz-words, reciprocity (I follow you, you follow me), and the masses. Facebook and LinkedIn are more about: so, how do you know this person? Similarly, in real life, business cards are the equivalent of Twitter: “I’m a consultant, here’s my card, can I have yours?” And friendships, both business and personal ones, are the ones that are about: “so, how do we know each other again?”

I have yet to get much value out of the web, so my view on Twitter may be too cynical. I have also, as yet, received fairly little value from business cards, I should mention. I don’t go browsing through them and call people at random, the same way I don’t tweet at random. Facebook is my number one web-tool, as I use it as a platform to do other things. Similarly, my friends in real life are an important platform for me too, to discuss ideas and hopefully build on those.

I think there is some kind of parallel between what I feel is effective market-research (many different perspectives, not quantity-, but quality-focussed) and networking (essentially the same). Arguably, my stance on networking comes partly from my own personal attitudes, I won’t deny it, but also from my belief, based on my marketing background, that mass-networking just isn’t effective.

But this is just my opinion. What’s yours? Network in mass or Network in class?

Vincent

Challenges of Collaborative Filtering

Previously Vincent wrote about collaborative filtering here on Tech It Easy and made a really good business case on the topic of user-generated content (UGC) versus expert input. Here, I’ll go a bit deeper into the ways collaborative filtering is done and what the challenges are.

For simplicity, I have divided the ways to filter into two. There’s the Pandora way, where the approach is that a song can be explained by about 150 different genes and recommendations are (in very simple terms) other songs in the neighborhood of that song in the multi-dimensional gene space. To achieve this accurately, they use expert opinions. Then there’s the Amazon, Last.fm, Netflix et al. way of clustering users with similar histories and recommending what other people in that cluster have liked.
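The first, Pandora-style approach can be sketched as a nearest-neighbour search in attribute space. This is only an illustration under stated assumptions: the three "genes" per song, their scores, and the function names are hypothetical, and nothing here is Pandora's actual method.

```python
import math

# Hypothetical expert-assigned attribute scores (0 to 1) per song.
SONGS = {
    "song_a": [0.9, 0.1, 0.8],
    "song_b": [0.8, 0.2, 0.7],
    "song_c": [0.1, 0.9, 0.2],
}

def distance(x, y):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def neighbours(seed, songs, radius):
    """Songs within `radius` of the seed song, nearest first."""
    hits = [(distance(songs[seed], vec), name)
            for name, vec in songs.items() if name != seed]
    return [name for dist, name in sorted(hits) if dist <= radius]

print(neighbours("song_a", SONGS, radius=0.5))
```

Note that no listener data appears anywhere in the sketch; the only tuning knob is how large a radius still counts as "similar".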

The huge difference in these approaches is best illustrated by the fact that for the Pandora way to work, you don’t actually need any users. The expert’s role in the latter way is to somehow come up with a way to model these clusters accurately.

The latter is much more interesting, because it’s always a challenge to infer anything from user data. The Pandora way’s “only” major challenge is the assumption that people like similar things (i.e. how big the searched neighborhood should be).
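For contrast, here is a minimal sketch of the user-data approach: predict a rating from what similar users gave. The ratings, names, and the choice of cosine similarity are assumptions for illustration only; production systems use far more careful similarity measures and normalisation.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {movie: rating}.
RATINGS = {
    "ann": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 5, "m2": 5, "m4": 4},
    "cat": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}

def similarity(a, b):
    """Cosine similarity over the movies both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for `movie`."""
    pairs = [(similarity(ratings[user], other), other[movie])
             for name, other in ratings.items()
             if name != user and movie in other]
    total = sum(sim for sim, _ in pairs)
    if total == 0:
        return None  # nobody comparable has rated this movie
    return sum(sim * rating for sim, rating in pairs) / total

print(predict("ann", "m4", RATINGS))
```

Everything the prediction knows comes from overlapping histories, which is why sparse or noisy user data hurts this approach so much more than the expert-labelled one.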

The other main reason for interest in the Amazon/Netflix way is, of course, money. The $1,000,000 Netflix Prize is, simply put, a hunt for a certain RMSE (root mean squared error). When described this way, some interesting questions arise.
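For readers who have not met the metric, RMSE is the square root of the average squared gap between predicted and actual ratings. A minimal sketch, with hypothetical ratings:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predictions against the ratings users actually gave.
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```

Because the errors are squared before averaging, one badly missed rating costs more than several small misses.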

 

For the record, I liked Napoleon Dynamite


One question I think is important is what the theoretical limit for accuracy is in Netflix’s case. To even define it, we would have to assume that all users at Netflix rate, fully rationally, all and only the movies they have seen, on a cardinal scale. That’s a pretty heavy load of assumptions, and I’m pretty sure that’s not all of them. That’s why, even if Netflix could forecast the data accurately, it wouldn’t mean the forecast mirrors users’ true preferences. So what this upper limit on accuracy (or lower limit on RMSE) actually is in Netflix’s case is a good question.

 

For these reasons it shouldn’t be surprising that “just a guy in a garage”, a psychologist employing behavioral decision making assumptions instead of hard rationality, could get such good scores in the Netflix Prize. A pretty good story on that was in Wired a while ago.

For the reasons above, it’s also pretty backwards to think that the problem is fitting the data into the algorithm, so I wouldn’t really call it a “Napoleon Dynamite problem” as the NY Times did recently. But do note that the “Pragmatic Theory” team interviewed in this article, just like “just a guy in a garage”, didn’t actually invent anything new; they just realized they could use a method that others didn’t know or had forgotten about, in this case singular value decomposition. A closely related method is Principal Component Analysis, which is available in pretty much any statistical software package (no, Excel doesn’t count). And yes, you could think of the Pandora way as something similar to factor analysis.
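To give a feel for what singular value decomposition does with ratings, the sketch below finds only the single strongest "taste pattern" in a small, fully filled-in rating matrix, using plain power iteration. The matrix and function names are hypothetical, and real rating data is mostly missing entries, which this sketch ignores entirely.

```python
import math

def top_singular_triplet(matrix, iterations=100):
    """Leading singular value and vectors of `matrix` by power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # u = A v, normalised to unit length
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v = A^T u; its length converges to the leading singular value
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v

def rank_one(matrix):
    """Best rank-1 approximation: sigma * u * v^T."""
    sigma, u, v = top_singular_triplet(matrix)
    return [[sigma * ui * vj for vj in v] for ui in u]

# Hypothetical ratings: rows are users, columns are movies.
RATINGS = [[5, 4, 1], [4, 5, 1], [1, 1, 5]]
for row in rank_one(RATINGS):
    print([round(x, 2) for x in row])
```

Keeping more than one such pattern gives a better fit; the interesting work is deciding how many to keep before the model starts fitting noise.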

One difficulty in Netflix’s case pretty much boils down to what’s in a number. Remember that in this case the teams work only on user rating data, but they are of course free to add more data from other sources as well. This doesn’t change the fact that the only user data they have are users’ ratings.

As a sidenote, I guess that one reason demographics aren’t used is legal issues. Vince pointed out that things like the “Napoleon Dynamite problem” could be solved with more data like demographics and mood. Now, usually more data just means more problems, but let’s forget about that for this discussion.

On this topic, I recently listened to a really interesting lecture about modern consumer analysis by Petri Vasara from Pöyry consulting. They had come up with a neat tool, ConsuNaut (PDF), to show what certain segments are doing at what times (compared to the old “your target audience watches TV x hours a day” way), what their mood was, etc. One “press release friendly” finding of this tool is that the Global Rush Hour, or when most of the world’s people are commuting, is at 18-19 Finnish time (UTC+2).

Anyway, back to the topic. What I also see as a problem is the actual “forecasting” part. Now, this doesn’t affect Netflix that much, because I assume that it is in their interest to get customers to rent whatever movies, even (using the out-of-fashion term) “long tail” ones. Even more so if there are inventory costs involved. What happens when a new movie enters the pool? Remember that for clustering to work, there has to be data, which is pretty sparse for a new movie. How long does it take for a new movie’s recommendations to be accurate, and how does it affect other recommendations?
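One common way to keep a brand-new movie from distorting things is to shrink its average towards the overall mean until enough ratings arrive. This is a generic statistical trick offered as an illustration, not something Netflix is known to use here; the damping constant and all numbers are assumptions.

```python
def damped_mean(ratings, global_mean, damping=25):
    """Average rating shrunk towards the global mean when data is sparse."""
    return (sum(ratings) + damping * global_mean) / (len(ratings) + damping)

# Hypothetical numbers: a new film with three 5-star ratings versus an
# established one with 500 ratings averaging 4.5.
print(damped_mean([5, 5, 5], global_mean=3.6))
print(damped_mean([4.5] * 500, global_mean=3.6))
```

With only three ratings the new film stays close to the global mean, while the established film's score is barely affected. How fast a movie should escape the prior is a judgment call, which is one face of the stability question.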

In other words, how stable is the solution to the problem? How does seeing the latest James Bond, because everyone goes to see that, change the recommendations for someone who doesn’t like other action movies? Is he recommended Transporter 2? Is a fan of Pixar movies offered Disney’s children’s animations, or worse yet, DreamWorks’ animations?

Wall-E

Not Madagascar 2

So, while the Netflix way is about fitting data and finding clusters, Pandora bases its approach on the assumption that all music can be labeled accurately and objectively. The main criticism against this approach, in my opinion, is the post-modern philosophy of subjectiveness. Is there really one truth? (Also, how many genes does it need?)

I was attending a guest lecture by Andrzej P. Wierzbicki on “The Problem of Objective Ranking: Foundations, Approaches and Applications”, where he, for example, discussed the “dangers and errors of the subjectivist reduction of objectivity to power and money”. So he was painting with a broad brush, but there were lots of gems. He also noted that intersubjective rational ranking is difficult and full objectivity is impossible, which should demotivate the Pandora crowd a little.

So, what might on the surface look like a statistical challenge is deep down much more cross-disciplinary, and it goes all the way to our assumptions of reality. This is why it is important to keep in mind the most important thing, the end of all this: the business angle. It is not in Netflix’s or Pandora’s interest to predict anything with 100% accuracy; they only need to do it well enough. Well, not Netflix’s anyway. The whole reason for improving Cinematch is purely economic: they have found out that people actually rent more if the recommendations are good (enough). There’s a reason they’re offering one million dollars for a 10% improvement. I’d love to know how quickly that million pays itself back.

And, really, let’s face it. Most of the collaborative filtering things today are just toys, so none of this really matters. There are a lot of assumptions and approximations, and the results are good enough for the purpose. For example, iTunes’ Genius is certainly flawed and limited, but it’s way better than normal random or shuffle play. But if you want to go that extra mile, then you see that the challenge gets exponentially more difficult.

To top it off, in the end there’s the age-old problem of optimization, which is that on average, the solutions are “good”, but not “interesting” and definitely not varied. But to add “interestingness” we have to add uncertainty, and that’s a whole new world of pain (the Allais paradox being the least of it)… but risk should have its rewards, shouldn’t it?

Kari Silvennoinen is a Ph.D. student at Helsinki School of Economics and is currently working on behavioral decision making topics.

Collaborative filtering: is it better to weigh user-input or expert-input?

For those that don’t know, collaborative filtering is a method of making suggestions for other products, based on your previous shopping habits. It is used by sites/web-apps like Netflix, Pandora Radio, and Amazon, and, I think, Ulik, and is mostly based on user-generated content.

Just working it out logically, you could say several things about user-generated content:

  1. there’s a lot of it, but attention is limited to a few leading sites
  2. not all users are equal, there are demographic, emotional, intelligence, and other factors that affect how users vote.
  3. users are cheap, which also sometimes means that you get what you paid for.

I’m personally not a fan of user-generated content, at least on a massive scale, because of some of the things in that list.

The alternative is the expert-based method, which means that expert-critics analyse a product and give it a rating. It is not often used in a collaborative setting, meaning one where it makes suggestions for other products; e.g. Rotten Tomatoes or Metacritic are sites that collect expert input, but don’t, afaik, suggest other matching movies.

The most famous example of an expert-based collaborative filtering system is Pandora Radio, which is built on top of the Music Genome Project, a collection of 50-or-so music-experts that analyse music and assign attributes to it. Those attributes can then be used to match songs. Users’ input isn’t ignored, they can vote on songs, which affects their future track-lists.

A few characteristics of expert-based systems are:

  1. They entail significant wage-costs for employees that have invested in their expertise. Counter this against the possible income of a service like Pandora—advertising & referral-fees—and there could be a discrepancy.
  2. They cannot rate as much content, as quickly, as a more user-generated system could.
  3. They, on the other hand, maintain a consistent quality, that is unmatched by the varying quality that comes out of user-input.

I’m personally much more of a fan of an expert-based system, but sceptical of its economic merit, looking just at point 1.

Most systems seem to be oriented mainly towards users, which, if you have a dataset like Netflix’s, is a smart way to go about things. That entails some limitations, as the Netflix prize has revealed, namely that it cannot account for “strange” films like “Napoleon Dynamite,” and that it doesn’t take into account any information about the users themselves, such as demographics or mood.

What do you think, audience? Knowing that users are cheap but a-plenty (but also overwhelmed with competing attention-buckets), and experts are few and expensive, is the solution to still go the user-generated route and try to make that work? In my opinion, expert-based systems require different business models than are popular online these days. You cannot get away with charging nothing, expecting users to magically click your advert, and hope to pay those university-educated experts. That, or the margins for your products have to be high enough (e.g. insurance & travel) to make such a system work (not that I think collaborative filtering and insurance really make that much sense: “give me the insurance radio-station please!”… eh no).

Enjoy the weekend!
Vincent
