<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Challenges of Collaborative Filtering</title>
	<atom:link href="http://www.techiteasy.org/2008/12/02/challenges-of-collaborative-filtering/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.techiteasy.org/2008/12/02/challenges-of-collaborative-filtering/</link>
	<description>A Technology and Business Weblog provided to You by a Global Group of Friends.</description>
	<lastBuildDate>Thu, 18 Mar 2010 12:29:11 +0100</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Kari Silvennoinen</title>
		<link>http://www.techiteasy.org/2008/12/02/challenges-of-collaborative-filtering/#comment-2933</link>
		<dc:creator>Kari Silvennoinen</dc:creator>
		<pubDate>Wed, 03 Dec 2008 15:14:26 +0000</pubDate>
		<guid isPermaLink="false">http://jeremyfain.wordpress.com/?p=1473#comment-2933</guid>
		<description>Leafar, good points. What I meant by &quot;toys&quot; was that I&#039;d like to see more profit-generating applications instead of just something added to a web app as an afterthought like &quot;tag clouds&quot; and stuff like that.



I also tried to bring out the point that while news articles on the subject focus a lot on the programming side of things, there&#039;s a strong foundation in, among others, psychology and statistics too, that there are already good algorithms, and that one major challenge is the data in its own right.



I did not focus that much on implementation or user experience, because those are things that I&#039;ve no idea about. For example, how to solve stock vs. flux: I just threw those questions into the air. The approach you described sounds like a practical one.



What do you see as the challenges in this field?</description>
		<content:encoded><![CDATA[<p>Leafar, good points. What I meant by &#8220;toys&#8221; was that I&#8217;d like to see more profit-generating applications instead of just something added to a web app as an afterthought like &#8220;tag clouds&#8221; and stuff like that.</p>
<p>I also tried to bring out the point that while news articles on the subject focus a lot on the programming side of things, there&#8217;s a strong foundation in, among others, psychology and statistics too, that there are already good algorithms, and that one major challenge is the data in its own right.</p>
<p>I did not focus that much on implementation or user experience, because those are things that I&#8217;ve no idea about. For example, how to solve stock vs. flux: I just threw those questions into the air. The approach you described sounds like a practical one.</p>
<p>What do you see as the challenges in this field?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vincent van Wylick</title>
		<link>http://www.techiteasy.org/2008/12/02/challenges-of-collaborative-filtering/#comment-2932</link>
		<dc:creator>Vincent van Wylick</dc:creator>
		<pubDate>Tue, 02 Dec 2008 17:44:26 +0000</pubDate>
		<guid isPermaLink="false">http://jeremyfain.wordpress.com/?p=1473#comment-2932</guid>
		<description>Ha, Mr. Ulike, I was wondering when you were going to drop by. :) Thanks for introducing me to the concept of stock vs. flux, it is indeed a good way to look at the problem dynamically, from the no-users to some-users, to a-constant-flux-of-users perspective.</description>
		<content:encoded><![CDATA[<p>Ha, Mr. Ulike, I was wondering when you were going to drop by. <img src='http://www.techiteasy.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Thanks for introducing me to the concept of stock vs. flux, it is indeed a good way to look at the problem dynamically, from the no-users to some-users, to a-constant-flux-of-users perspective.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: leafar</title>
		<link>http://www.techiteasy.org/2008/12/02/challenges-of-collaborative-filtering/#comment-2934</link>
		<dc:creator>leafar</dc:creator>
		<pubDate>Tue, 02 Dec 2008 13:41:47 +0000</pubDate>
		<guid isPermaLink="false">http://jeremyfain.wordpress.com/?p=1473#comment-2934</guid>
		<description>One missing link: the Wikipedia article

http://en.wikipedia.org/wiki/Collaborative_filtering



What you call the Pandora way is called the content-based approach.



And I have serious doubts about this phrase: &quot;Most of the collaborative filtering things today are just toys so none of this really matters.&quot; I think you miss the point of the user experience by focusing too much on the theoretical approach.



Your new-movie problem should have ended up in a discussion about stock vs. flux: start with the content-based approach, which slowly gives way to the user/user CF approach as the ratings flow in. Like in real life, interest in a James Bond film is based on James Bond, the actors, the director, etc., until we have some reviews from newspapers or friends.



It&#039;s well written but for me it is too much or not enough.



Will be happy to talk with you about it next time you write an article on the subject.</description>
		<content:encoded><![CDATA[<p>One missing link: the Wikipedia article</p>
<p><a href="http://en.wikipedia.org/wiki/Collaborative_filtering" rel="nofollow">http://en.wikipedia.org/wiki/Collaborative_filtering</a></p>
<p>What you call the Pandora way is called the content-based approach.</p>
<p>And I have serious doubts about this phrase: &#8220;Most of the collaborative filtering things today are just toys so none of this really matters.&#8221; I think you miss the point of the user experience by focusing too much on the theoretical approach.</p>
<p>Your new-movie problem should have ended up in a discussion about stock vs. flux: start with the content-based approach, which slowly gives way to the user/user CF approach as the ratings flow in. Like in real life, interest in a James Bond film is based on James Bond, the actors, the director, etc., until we have some reviews from newspapers or friends.</p>
<p>It&#8217;s well written but for me it is too much or not enough.</p>
<p>Will be happy to talk with you about it next time you write an article on the subject.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kari Silvennoinen</title>
		<link>http://www.techiteasy.org/2008/12/02/challenges-of-collaborative-filtering/#comment-2930</link>
		<dc:creator>Kari Silvennoinen</dc:creator>
		<pubDate>Tue, 02 Dec 2008 10:14:03 +0000</pubDate>
		<guid isPermaLink="false">http://jeremyfain.wordpress.com/?p=1473#comment-2930</guid>
		<description>Yes, if you&#039;re taking a sample average, the bigger the sample, the more the errors average out. And yes, random mistakes in users&#039; ratings are tolerable. And yes, your ratings are never &quot;wrong&quot; in a sense. The problem is when those errors are not random, like in the &quot;new movie&quot; overrating problem you described.



It&#039;s all just a number and the guys crunching the numbers are supposed to find out if someone&#039;s just overrating movies. You also make a good point about why treating everyone&#039;s ratings as objective is pretty flawed. But with a large enough sample this averages out, right? =)



The problem is that we&#039;re not looking at aggregates, but trying to predict at the individual level, where averages aren&#039;t that useful. Of course Netflix could optimize its operations based on just aggregate data, and that&#039;s probably worth something too.



When people realise they gave way too high scores just because they were still excited about the movie, do they go back and &quot;correct&quot; them? Or do they think, &quot;well, now that I&#039;ve seen this movie, I think I gave a wrong score to some of those movies&quot;? Is it worth the effort to go back and re-rate movies after they&#039;ve got new information on which to base their &quot;more correct&quot; rating?



The human video clerks making mistakes but also bringing excitement is the uncertainty part I briefly discussed at the end. The inverse of this is why all movies seem to be &quot;safe bets&quot; (sequels, generic story, etc.): I guess it&#039;s easy to estimate how much they will reel in. For &quot;interesting&quot; movies, there&#039;s always the risk. It&#039;s like portfolio management from the movie studios&#039; side: with a high enough volume of safe bets, you can afford a couple of interesting movies.



That&#039;s a really good point that people aren&#039;t really after good movies, but good experiences. One might wonder if it&#039;s possible to forecast a good experience from previous movie ratings (is the experience visible in the number?). And true, if everything&#039;s good, then everything&#039;s mediocre, which raises the question of whether recommendation systems should actually recommend you crap just to recalibrate your ratings. In the long run it&#039;d be good for the user, but in the short run you&#039;re just wasting the user&#039;s time and money... =)



Strictly speaking, this isn&#039;t AI, but one approach that I didn&#039;t cover here was AI-like methods such as machine learning, neural networks, genetic algorithms, etc.



And finally, true, there are countless factors here. One of the challenges is to find out what the important ones are.</description>
		<content:encoded><![CDATA[<p>Yes, if you&#8217;re taking a sample average, the bigger the sample, the more the errors average out. And yes, random mistakes in users&#8217; ratings are tolerable. And yes, your ratings are never &#8220;wrong&#8221; in a sense. The problem is when those errors are not random, like in the &#8220;new movie&#8221; overrating problem you described.</p>
<p>It&#8217;s all just a number and the guys crunching the numbers are supposed to find out if someone&#8217;s just overrating movies. You also make a good point about why treating everyone&#8217;s ratings as objective is pretty flawed. But with a large enough sample this averages out, right? =)</p>
<p>The problem is that we&#8217;re not looking at aggregates, but trying to predict at the individual level, where averages aren&#8217;t that useful. Of course Netflix could optimize its operations based on just aggregate data, and that&#8217;s probably worth something too.</p>
<p>When people realise they gave way too high scores just because they were still excited about the movie, do they go back and &#8220;correct&#8221; them? Or do they think, &#8220;well, now that I&#8217;ve seen this movie, I think I gave a wrong score to some of those movies&#8221;? Is it worth the effort to go back and re-rate movies after they&#8217;ve got new information on which to base their &#8220;more correct&#8221; rating?</p>
<p>The human video clerks making mistakes but also bringing excitement is the uncertainty part I briefly discussed at the end. The inverse of this is why all movies seem to be &#8220;safe bets&#8221; (sequels, generic story, etc.): I guess it&#8217;s easy to estimate how much they will reel in. For &#8220;interesting&#8221; movies, there&#8217;s always the risk. It&#8217;s like portfolio management from the movie studios&#8217; side: with a high enough volume of safe bets, you can afford a couple of interesting movies.</p>
<p>That&#8217;s a really good point that people aren&#8217;t really after good movies, but good experiences. One might wonder if it&#8217;s possible to forecast a good experience from previous movie ratings (is the experience visible in the number?). And true, if everything&#8217;s good, then everything&#8217;s mediocre, which raises the question of whether recommendation systems should actually recommend you crap just to recalibrate your ratings. In the long run it&#8217;d be good for the user, but in the short run you&#8217;re just wasting the user&#8217;s time and money&#8230; =)</p>
<p>Strictly speaking, this isn&#8217;t AI, but one approach that I didn&#8217;t cover here was AI-like methods such as machine learning, neural networks, genetic algorithms, etc.</p>
<p>And finally, true, there are countless factors here. One of the challenges is to find out what the important ones are.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vincent van Wylick</title>
		<link>http://www.techiteasy.org/2008/12/02/challenges-of-collaborative-filtering/#comment-2931</link>
		<dc:creator>Vincent van Wylick</dc:creator>
		<pubDate>Tue, 02 Dec 2008 09:32:52 +0000</pubDate>
		<guid isPermaLink="false">http://jeremyfain.wordpress.com/?p=1473#comment-2931</guid>
		<description>I&#039;m not a statistician, but isn&#039;t it the case that if you study a phenomenon, in this case a user&#039;s voting-history, long enough, then the errors average out? Playing devil&#039;s advocate here—I know what I wrote last time—isn&#039;t that kind of the power of many(!) users, that there are so many of them, hence you can generalise along a huge time and space dimension, as it were? (Devil&#039;s advocate robe off: of course, you&#039;ll need the users first.)



In other words, I think it&#039;s a mistake to draw generalisations from a single instance and a single user. We all have a Bambi, 2001, or Napoleon D. (I only liked the last 5 mins) in our closet; that doesn&#039;t mean that our other ratings are wrong.



About new movies. I don&#039;t know if you ever look at IMDB ratings for blockbusters just as they come out? I&#039;ve come to learn (= be trained) that these movies are usually overrated the first few weeks, after which their ratings normalise—they can&#039;t all be in the IMDB top 100… About new movies, I&#039;ve also come to learn that if only 50 people vote for a movie, they also usually overrate it, probably because they are the film-maker&#039;s friends. In other words, users like me aren&#039;t stupid and will know that new films can&#039;t be rated perfectly through some magic voodoo.



There was another factor I found interesting in the NYTimes-story, which is that human video-clerks make mistakes, but also bring a form of excitement to the table. People like to be surprised and even extremely bad films can lead to some pretty heated debates afterward—any experience is all about the memories, after all, and if everything is &quot;good,&quot; it eventually becomes mediocre.



As far as pay-back is concerned, I think that either the Wired or the NYTimes story mentioned what kind of numbers Netflix is renting out; 1 million is peanuts to them, and it&#039;s money well spent.



Anyway, the more I learn and think about collab. filtering, the more I love it, because there are so many factors that these AIs don&#039;t yet take into account.</description>
		<content:encoded><![CDATA[<p>I&#8217;m not a statistician, but isn&#8217;t it the case that if you study a phenomenon, in this case a user&#8217;s voting-history, long enough, then the errors average out? Playing devil&#8217;s advocate here—I know what I wrote last time—isn&#8217;t that kind of the power of many(!) users, that there are so many of them, hence you can generalise along a huge time and space dimension, as it were? (Devil&#8217;s advocate robe off: of course, you&#8217;ll need the users first.)</p>
<p>In other words, I think it&#8217;s a mistake to draw generalisations from a single instance and a single user. We all have a Bambi, 2001, or Napoleon D. (I only liked the last 5 mins) in our closet; that doesn&#8217;t mean that our other ratings are wrong.</p>
<p>About new movies. I don&#8217;t know if you ever look at IMDB ratings for blockbusters just as they come out? I&#8217;ve come to learn (= be trained) that these movies are usually overrated the first few weeks, after which their ratings normalise—they can&#8217;t all be in the IMDB top 100… About new movies, I&#8217;ve also come to learn that if only 50 people vote for a movie, they also usually overrate it, probably because they are the film-maker&#8217;s friends. In other words, users like me aren&#8217;t stupid and will know that new films can&#8217;t be rated perfectly through some magic voodoo.</p>
<p>There was another factor I found interesting in the NYTimes-story, which is that human video-clerks make mistakes, but also bring a form of excitement to the table. People like to be surprised and even extremely bad films can lead to some pretty heated debates afterward—any experience is all about the memories, after all, and if everything is &#8220;good,&#8221; it eventually becomes mediocre.</p>
<p>As far as pay-back is concerned, I think that either the Wired or the NYTimes story mentioned what kind of numbers Netflix is renting out; 1 million is peanuts to them, and it&#8217;s money well spent.</p>
<p>Anyway, the more I learn and think about collab. filtering, the more I love it, because there are so many factors that these AIs don&#8217;t yet take into account.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
