The Afterthoughts – Gmail forwarding and service interoperability – an interesting observation

“The Afterthoughts” is a series where I revisit some of my older blog entries and see how things have changed between the time I wrote them and now.

Initially, I will choose posts from 2004 to 2006.

So here is the first one in the series:

Post: Gmail forwarding and service interoperability – an interesting observation
Originally posted on: 2005-11-21

The entry explains how, when you connect various services together, you can end up with the same information appearing multiple times.

This is increasingly becoming a problem these days. Services like Twitter and FriendFeed are not solving the problem elegantly, so you see more and more duplicates of, and links to, the original post.

Here is a typical scenario today:
I make a blog entry. In order to ensure that my readers see my post immediately, I have a service that automatically posts a message in Twitter. This is like instantly messaging my friends (actually Twitter followers) telling them, “Look, I made a blog entry”.

Now, I use a lot of Web 2.0 services. So, in order to ensure that all my friends have a single feed to follow my activities, I use some aggregator like FriendFeed or Tumblr.

A friend of mine (let's call him Bob) likes my blog entry and bookmarks it on del.icio.us. Another friend, Andrews, bookmarks it on Magnolia.

Let us now say there is another person, Dave, who is a friend of mine, Bob's and Andrews's. He is following all three of us on FriendFeed.

How many entries of this one post is Dave going to see?
Six in total! Four from me: one from my blog feed directly, one from Twitter, and two from Tumblr (one via the blog post and one via Twitter); then one from Bob via del.icio.us and one from Andrews via Magnolia.

The screenshot shows duplicate entries from Mashable's blog feed and from Twitter:

FriendFeed - problems with aggregation services

Now this is real noise. And it is even more so if Dave was not interested in the blog post to begin with.

So the solution?
FriendFeed allows you to hide specific feeds from specific people. For example, Dave can hide all bookmarks from Bob or all Tumblr entries from me.

FriendFeed's attempt at eliminating duplicates

Now that is not a good solution because not all bookmarks from Bob are duplicates.

Tools like Feedblendr and Blogbridge have solved this problem for simple RSS aggregation. However, things are different when it comes to social networks and aggregation.

So right now there is no simple way of detecting duplicates, and more and more people in the blogosphere are complaining about this, explaining how FriendFeed is more noise than information and why good old Google Reader is still relevant.

Here is one such discussion. As the discussion suggests, it is not just about eliminating duplicates; you also need to merge the discussions/comments on each of these posts, keeping in mind that not everyone is a friend of everyone else.
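To make the duplicate-detection part concrete, here is a minimal sketch in PHP of the naive approach, assuming every aggregated item carries a link back to the original post (the feed structure below is made up for illustration): canonicalize each item's link and keep only the first occurrence. It breaks down the moment a service wraps the original URL in its own short link or permalink page, which is part of why the problem is hard.

<?php
// Naive duplicate elimination: two items are 'the same' if their
// links canonicalize to the same URL. The $feed structure is hypothetical.

function canonicalize($url) {
	$url = preg_replace('#^https?://(www\.)?#', '', $url);  // drop scheme and 'www.'
	$url = preg_replace('#[?\#].*$#', '', $url);            // drop query string and fragment
	return rtrim($url, '/');                                // drop trailing slash
}

function deduplicate($items) {
	$seen = array();
	$result = array();
	foreach ($items as $item) {
		$key = canonicalize($item['link']);
		if (!isset($seen[$key])) {
			$seen[$key] = true;
			$result[] = $item;
		}
	}
	return $result;
}

// The blog post, its Twitter echo and Bob's bookmark all point at the
// same URL here, so only the first survives.
$feed = array(
	array('source' => 'blog',        'link' => 'http://example.com/my-post/'),
	array('source' => 'twitter',     'link' => 'http://example.com/my-post'),
	array('source' => 'del.icio.us', 'link' => 'http://www.example.com/my-post/'),
);
print_r(deduplicate($feed));
?>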

So what has changed over the last 2 years?
If anything, the problem has become tougher. I am sure the startup that does duplicate elimination and gives you a filtered feed, taking your social network into consideration, is going to be the next hyped thing in the Web 2.0 world.

Privacy disasters with aggregation services

Imagine you have a host of aggregation services like FriendFeed, Tumblr, Suprglu and Lifestreams connected to each other, such that each one is reading from your various feeds and republishing the content.

Now imagine a disaster where one of these services, say Twitter, suddenly, because of some flaw, exposes your private messages.

It's like a tsunami that cannot be controlled! Your private data would flow into various input streams in a matter of seconds, and there would be no turning back.

Things will only get worse with activity feeds and Beacon.

The bottom line is: Be careful about where your data is going and what data you put online.

Big fish, small fish – my personal experience of working in a startup

It’s been about 3 months since I joined Ugenie.

Having spent about two and a half years in IBM, which is a mammoth on any scale, and now working in Ugenie, which has a really small employee count, what changes do I see?

I guess most large organizations have similar characteristics, so instead of naming IBM in the rest of my post, I will just call it ‘Big Fish’ to represent all large organizations. Again, I guess nothing is unique about my impressions of Ugenie, and most startups have similar characteristics, so I will call it ‘Small Fish’.

If I look back on the days I spent at Big Fish, I have mixed feelings. There are quite a few things that I gained, but also some that I had to give up.

The immediate change that I saw in Small Fish is the rate at which things move. Ideas emerge by the minute and are implemented within a couple of days, or within a week for the ones that take more time. More time is spent on getting things done than on planning and processes. Long-term goals are perhaps goals for the month, and I guess there is no point thinking about a year down the line or anything on a similar scale! The ‘tomorrows’ and ‘over the next week’ are replaced by ‘now’ and ‘sometime today’. There are fewer ‘meetings’ and they are short. You definitely don’t need a calendar for your meetings.

The second change I see is how large organizations spend lots of money on infrastructure while startups tend to save every penny. I remember the ultra-modern conference rooms, the posh pantries/washrooms, the money that was spent on things like events, all-hands, yearly gifts, interior decoration, posters celebrating “X” day (replace X with Innovation, Mothers or something like that), and so on; I could go on. Don’t get me wrong. I am not saying Small Fish does not spend on its employees, but every penny spent is spent cautiously.

And now let me delve into the differences in terms of:

Breadth/Depth: I guess one of the reasons that made me stick with Big Fish is the breadth of technologies on offer. Every day, you would come across someone working on a project that you never knew existed, or a page on the Intranet that excites you. Every day, you keep widening your breadth of knowledge.

Things are different in Small Fish. Small Fish offers the much-needed depth.

I remember someone telling me how there needs to be a balance between the breadth of our knowledge and the depth in some specific field. This is like the letter ‘T’, with the top horizontal line representing the breadth and the vertical line representing the depth.

Ownership: I don’t see too much of a difference here. While in Big Fish, I used to own the components I developed, and I was responsible for the timely delivery of that component and for keeping the consumers of my work happy. Things are similar in Small Fish, maybe with minor differences. The ownership here tends towards the whole product/application rather than just the component you own. (This is closely related to the Roles/Responsibilities observation below.)

Priorities: Back in 2005, when I was working on product development, there was a phase where I found it difficult to prioritize my tasks. There were a bunch of bugs to fix, some mails to respond to and some bug databases to update, and it seemed like all of them had the same priority.

If I compare that period with the present, I would say it’s quite similar. There is more work than anyone can handle, there are some things that only you can do and the deadlines are sometimes impossible to meet because of various technical/non-technical issues.

However, I seem to be a bit more comfortable in my present position than I was back in 2005. This could be attributed to the fact that I now have two and a half years of experience behind me, or to the fact that I am in a startup now and it is normal for everyone to have their plates full.

Roles/Responsibilities: I would say there is no such thing in a startup. While in Big Fish we had clear responsibilities, and having completed those tasks we could consider our job done, in Small Fish things work differently. There is no such thing as ‘my responsibility’. Or if you really want to put it that way, you would have to say ‘everything is my responsibility’. While many people don’t like that, I see every such occurrence as an opportunity to learn, and I really enjoy it.

Opportunities: There is no dearth of these in Big Fish or Small Fish, but there is a difference. In Big Fish, you need to search for them, or recognize that ‘x’ is opportunity knocking at your door, while in Small Fish you would just take it up, without perhaps realizing that it was an opportunity.

Social network: Well, if we are talking of getting to know people with diverse personalities and skill sets, there is really no end to how many people you can connect with in Big Fish. This is severely restricted in Small Fish. I remember having technical discussions in Big Fish with people who had significantly more experience than me, and I should say that what I learnt then is not something you get from a book. It has definitely added to my experience. It was about ‘learning from the failures/experiences of others’.

Smaller fishes tend to have a younger crowd. So while the teams are dynamic, the number of people with more experience than you and with diverse skill sets is limited. This has nothing to do with the actual people in Small Fish; it is simply a consequence of its size.

Awards/Recognition: Frankly, there was no dearth of it in Big Fish. But on second thoughts, other than the monetary rewards and the benefits of the actual work that you did to earn the award, do these awards really matter to the rest of the world?

Other activities: These are severely restricted in Small Fish. Big Fish invests a lot in its employees, so every day you hear of people attending training or a conference, or even going abroad to learn some technology.

Processes: If you ask anyone working at a startup, especially someone who has worked in a large organization before, I guess one thing they would mention is the processes. What is my take on this?

I would say there needs to be a balance. While too many processes are definitely time-consuming and a pain for the employees, a well-defined process means that everything that needs to be taken care of actually is. So if Big Fish tends towards one end of this spectrum, Small Fish is towards the other end, with the best point being somewhere close to the middle.

So people ask me, do you think it was worth it?
Well, no doubt about that. I guess you have to lose some things to gain others. The things that Small Fish offers are tough to expect from Big Fish, and to a major extent this holds the other way round too. Now, having had a good amount of experience in Big Fish and now some experience in Small Fish, I would say: yeah, it was worth it, and I am glad I decided to join Ugenie.

Now remember that these are my observations and I could be wrong in terms of how various Big Fishes/Small Fishes work or even in terms of the Big Fish where I worked and the Small Fish where I currently work. Also my own opinions might change as I gain more experience at Small Fish and compare it to my experiences in Big Fish.

iRead – a social book discovery revolution

I have been meaning to write a review of iRead for a while now.

iRead is a social book discovery application. It has been quite successful on Facebook and currently has a total install base of about 1.4 million users, mostly from Facebook.

So what do we mean by social book discovery?

iRead is not just about maintaining a bookshelf online. It tries to bring the social aspect into the picture.

‘social’?
iRead depends a lot on your social network. You can share your bookshelf with your friends, and learn what your friends are reading and what their reading tastes are. You can discuss books in various book clubs. You can participate in quizzes or even add your own. You can find out how compatible your reading tastes are with other people in the network.

iRead does not require a separate registration. It is available right in your social network. (As of now the application is available in Facebook, Orkut, MySpace, Hi5 and Bebo.) So when we are talking about friends, we are talking about your friends from the network where you are using iRead. If you use iRead in Facebook, you see your Facebook friends in iRead, while in Orkut you see your Orkut friends. Many a time, all it requires is to just add the application to your profile.

‘book discovery’?
For one, iRead provides recommendations based on your reading tastes. Then there are various other mechanisms by which you can discover new books to read.

Let’s explore some.

Several ways to browse

* You could first start off by searching for books and adding them to your bookshelf. This helps us learn about your tastes and recommend books that you may like.

* When searching, you could either enter the name of the book, or its author, or if you know the ISBN, you could enter that.

* If you want to just browse through the application you could start off by looking at what other iReaders are doing. The home page shows the most recent activity in the network.

News feeds on homepage

* So let’s say you find some interesting book. Just click on the book and you are taken to the book details. Here you get to know how many readers the book has and how many reviews people have written for it, and you get some instant user reviews and an editorial review. You can also find other similar books.

Book details for Da Vinci Code

* If you see that the book is interesting, just click on the ‘See All’ reviews link. This will display all the reviews for the book. Read the ones you like and you will soon learn what the book is about.

Book review page for GEB

* Since there are multiple ways to reach your data, your reviews are never buried. So even if you are writing a review for a book that already has a thousand reviews, you can expect your review to be read by other iReaders.

* If the book interests you, you might want to check out other books by the same author. Just click on the author’s name. This will show all books by the author. You could also click on the small icon next to the author’s name to search for the author in Author’s corner. This will give you other details like the profile of the author, what others think about the author, how many fans the author has etc.

Authors corner

* Author’s corner is a forum for readers to interact with their favorite authors. So if you are the author of a book and are looking for a forum to interact with your readers, this is where you should be. Author’s corner allows authors to maintain their profile, and also learn about their readers’ expectations.

* While reading reviews, you might find that the review from a particular user is very interesting. You might now want to look at this reader’s bookshelf. Many a time, I have found this to be a good mechanism to discover new books. You can get a sense of how close your tastes are by looking at the number of books you have in common. You might then want to look at other reviews by this reader.

* You could also contact the reader by leaving a wall post/scrap.

* You may also want to check out who among your friends is on iRead and what they are reading. Click on the Friends link in the header. If you want to know about your friends’ reading tastes and they are not yet on iRead you could invite them to add the application.

Friends reads on iRead

* For selected books, you could even browse inside the book. A lot of out-of-copyright books are available for free online viewing. Some other selected books are available for limited preview.

Other features worthy of mention

Take your reads with you

The top header on iRead
So what if you are in all these networks and want to use iRead everywhere?
iRead has a feature to import your bookshelf from Facebook into Orkut, MySpace and/or Hi5. Once imported, you will see the same bookshelf in all the networks. However, the friends shown to you depend on the network you are currently in.

Import books from other sources

If you have been maintaining books in some other place, you may want to try importing books using the import books option. The link to this is found below the search box.

Add a book

Can’t find a book you want to add to your bookshelf? You can add it to our catalog. The link to add a book is found below the search box.

So what’s more?!

Happy iReading!

Disclaimer: I work for Ugenie and am part of the iRead application development team. The views expressed here are my own and not necessarily those of Ugenie.

Downloading data using Greasemonkey – Part 2

So I finally found some time to continue my experiments with downloading data from the browser to the server.

This time my target was Orkut. I decided to write a simple script to extract my Orkut profile and then display a subset of these fields on my own site with my own formatting.

I did not write a Greasemonkey script this time, but just used Firebug to write JavaScript. Here is the browser-side script:

var arrayToExtract = new Array('listdark', 'listlight');

for(var z=0;z<arrayToExtract.length;z++){
   // Orkut renders profile rows with these two CSS classes.
   var elements = $$('.'+arrayToExtract[z]);   // Just got lucky here. $$ is available!
   for(var i=0;i<elements.length;i++){
       var item = elements[i].getElementsByTagName('p');
       // Skip rows that do not contain a label/value pair.
       if(item[0] == undefined || item[1] == undefined)
           continue;
       postData(item[0].innerHTML);   // the field label, e.g. 'job description:'
       postData(item[1].innerHTML);   // the field value
   }
}

function postData(data){
   var scriptElement = document.createElement('script');

   // encodeURIComponent keeps '&', '+' and non-ASCII characters from breaking the URL.
   scriptElement.setAttribute('src','http://buzypi.in/backup?data='+encodeURIComponent(data)+'&file=orkut&date='+Date());

   document.body.appendChild(scriptElement);

}

The script above posts the profile fields one by one to the server, and the server captures them and appends them to a file. The server-side code is as follows:

<?php
$file_name = $_REQUEST['file'];
$data = $_REQUEST['data'];
$more = $_REQUEST['more'];

$DIRECTORY = 'data';

$file_with_location = dirname(__FILE__).'/'.$DIRECTORY.'/'.$file_name;

$file_handle = fopen($file_with_location,'a');

fwrite($file_handle,$data);

// Terminate the line only when the client says no more data is coming.
if($more != "true")
   fwrite($file_handle,"\n");

$success_value = fclose($file_handle);

// Wrap the response in a comment so the <script> include parses as valid JavaScript.
echo "/*";
if($success_value === TRUE){
   echo "Successfully appended: ".$data."<br/>";
   if($more == "true"){
      echo "Expect you to send more data";
   }
} else {
   echo "Failed to write data";
}

echo "*/";

?>

Guess what happened when I executed the script?

The data was appended to the file alright, but the ordering of the items was messed up in some places.

Here is a sample:

job description:
work phone:
I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace.
career interests:
...

while the expected output was:

job description:
I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace.
work phone:
career interests:
...

The job description content should have been received before ‘work phone’, but this was not the case.

So what is the solution?

There are 2 things I can think of:
1. Ensure that data posted is atomic.
2. Come up with a simple sliding window protocol arrangement between the browser and the server.

Solution 1 is not always feasible because of the limits on GET URL size. In fact, we might need to split the body just so that it can be posted using GETs. So the only solution that can take care of this is (2).
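For what it's worth, here is a minimal server-side sketch of the direction (2) could take, though it is really a reordering buffer rather than a full sliding window. It assumes the browser script is changed to send two extra (hypothetical) parameters with each request, seq (a 0-based fragment number) and total (the fragment count): the server buffers each fragment on disk under its sequence number and assembles the file only once every fragment has arrived, so the order of arrival no longer matters.

<?php
// Sketch only: 'seq' and 'total' are hypothetical parameters the
// browser script would have to add to each <script> request.

$file_name = basename($_REQUEST['file']);   // basename() guards against path traversal
$data  = $_REQUEST['data'];
$seq   = (int) $_REQUEST['seq'];
$total = (int) $_REQUEST['total'];

$buffer_dir = dirname(__FILE__).'/data/'.$file_name.'.parts';
if(!is_dir($buffer_dir))
	mkdir($buffer_dir, 0777, true);

// Buffer this fragment under its sequence number.
file_put_contents($buffer_dir.'/'.$seq, $data);

// Once all fragments have arrived, write them out in order and clean up.
if(count(glob($buffer_dir.'/*')) >= $total){
	$file_handle = fopen(dirname(__FILE__).'/data/'.$file_name, 'a');
	for($i = 0; $i < $total; $i++){
		fwrite($file_handle, file_get_contents($buffer_dir.'/'.$i)."\n");
		unlink($buffer_dir.'/'.$i);
	}
	fclose($file_handle);
	rmdir($buffer_dir);
}

echo "/* buffered fragment ".$seq." of ".$total." */";
?>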

I will post more entries as I progress. Meanwhile, if you have any better solution to the problem, comment here.

PHP Functional Programming – A code snippet

Given a string of comma-separated values, how do you convert each of them into a link of the form:
<a href="http://--item--.google.com/">--item--</a>
and return a comma-separated list of these strings?

Snippet 1:

<?php
//Make these links to google.com

$string = "news,reader,mail";

$array_of_string = explode(",",$string); // explode() is the right tool for splitting on a plain string

$final = array();

foreach($array_of_string as $item){
	$final[] = "<a href='http://".$item.".google.com/'>$item</a>";
}

echo implode(", ",$final);
?>

Snippet 2 (uses functional constructs):

<?php

//Make these links to google.com

$string = "news,reader,mail";

$array_of_string = explode(",",$string);

echo implode(", ", 
		array_map(
			create_function('$item',
	'return "<a href=\'http://".$item.".google.com/\'>$item";'
					),$array_of_string
			)
		);

?>
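As an aside, on PHP 5.3 and later the same thing can be written with an anonymous function, which avoids the string-quoting gymnastics of create_function(). A sketch:

<?php
//Make these links to google.com - closure version (requires PHP 5.3+)

$string = "news,reader,mail";

echo implode(", ",
	array_map(
		function($item){
			return "<a href='http://".$item.".google.com/'>$item</a>";
		},
		explode(",",$string)
	)
);
?>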

Downloading your data using Greasemonkey

Whenever I use some service over the web, I look for several things. Ease of use and customisability are important factors.

However, the most important thing I consider is vendor lock-in (or rather the lack of it). Let's say I am using a particular mail service (e.g., Gmail). If someday I find a better email service, would it be easy for me to switch to it? How easy is it for me to transfer my data from my old service to my new one?

For services like mail, there are standard protocols for data access, so this is not an issue. However, for the more recent services, like blogging, micro-blogging etc., the most widely used data access mechanism is HTTP, with the data delivered as RSS or Atom.

However, not all services provide data as RSS (or XML or any other parseable form). For example, suppose I keep a list of movies I have watched in some Facebook application, or a list of restaurants I have visited: how do I download this list? If I cannot download it, does it mean I am tied to this application provider forever? What if I have added 200 movies in my original service, and I come across another service that has a better interface and more features, and I want to switch to this new service without losing the data that I have invested time entering in the original one?

In fact, recently, when I tried to download all my Twitters, I realized that this feature has been disabled: you can no longer get your old Twitters in XML format.

So what do we do when a service does not provide data as XML and we need to somehow scrape that data and store it?

This is kind of related to my last blog entry.

So I started thinking of ways in which I could download my Twitters. The solution I thought of initially was using Rhino and John Resig's project (mentioned in my previous blog entry). However, I ran into parse issues like before. So I had to think of alternative ways.

I took advantage of the fact that Twitters are short (no more than 140 characters).

The solution I came up with uses a combination of Greasemonkey and PHP on the server side:

Here is the GM script:
If you intend to use this, do remember to change the URL to post data to.

// ==UserScript==

// @name           Twitter Downloader

// @namespace      http://buzypi.in/

// @author         Gautham Pai

// @include        http://www.twitter.com/*

// @description    Post Twitters to a remote site 

// ==/UserScript==

function twitterLoader (){
	var timeLine = document.getElementById('timeline');
	var spans = timeLine.getElementsByTagName('span');
	var url = 'http://buzypi.in/twitter.php';
	var twitters = new Array();
	for(var i=0;i<spans.length;i++){
		if(spans[i].className != 'entry-title entry-content'){
			continue;
		}
		twitters.push(encodeURIComponent(spans[i].innerHTML)); // safer than escape() for '+', '&' and non-ASCII
	}
	
	for(var i=0;i<twitters.length;i++){
		var last = 'false';
		if(i == twitters.length - 1)
			last = 'true';
		var scriptElement = document.createElement('script');
		scriptElement.setAttribute('src',url+'?last='+last+'&data='+twitters[i]);
		scriptElement.setAttribute('type','text/javascript');
		document.getElementsByTagName('head')[0].appendChild(scriptElement);
	}
}

window.addEventListener('load',twitterLoader,true);

The server side PHP code is:

<?php

$data = $_REQUEST['data'];
//Store data in the DB, CouchDB (or some other location)
$last = $_REQUEST['last'];
if($last == 'true'){
	echo "
	var divs = document.getElementsByTagName('div');
	var j= 0;
	for(j=0;j<divs.length;j++){
		if(divs[j].className == 'pagination')
		break;
	}
	var sectionLinks = divs[j].getElementsByTagName('a');
	var href = '';
	if(sectionLinks.length == 2)
		href = sectionLinks[1].href;
	else
		href = sectionLinks[0].href;
	// Extract the page number with a regex so multi-digit pages work;
	// default to page 1 when the URL has no 'page' parameter.
	var presentPage = parseInt((document.location.href.match(/page=(\d+)/) || [0, '1'])[1]);
	var nextPage = parseInt((href.match(/page=(\d+)/) || [0, '1'])[1]);
	if(nextPage < presentPage)
		alert('No more pages to parse');
	else {
		alert('Changing document location');
		document.location.href = href;
	}
	";
} else {
	echo "
	var recorder = 'true';
	";
}

?>

The GM script scrapes the twitters from a page and posts them to the server using <script> includes. The server stores the twitters in some data store. It also checks if the twitter posted was the last one on the page; if so, it sends back code to move to the next page.

Thus the script, when installed, will post twitters from the most recent to the oldest.

Ok, now how would this work with other services?

The pattern seems to be:
* Get the data elements from the present page – data elements could be movie details, restaurant details etc.
* Post data elements to the server.
** The posting might require splitting the content if the length is more than the maximum length of the GET request URL.
* Identify how you can move to the next page and when to move to the next page. Use this to hint the server to change to the next page.
* Write the server side logic to store data elements.
* Use the hint from the client to change to the next page when required.

The biggest advantage of this method is that we use the browser to authenticate with the remote service and to parse the HTML (which, as I mentioned in my previous post, browsers are best at).