Semantic Web -> Single data source -> The future of search -> Google Base

It has been just two weeks since we were discussing, “What will happen to search engines like Google when the concept of a single data source comes in?”

The concept of single data source would mean that no data would exist in static pages. All the data would reside in some storage unit and the pages would be created (if at all required) at run time based on the users' interests.

Existing search engines work on static pages. How well would that work in Web 2.0? If the only pages on the Internet were dynamic, what would search engines index?

Enter Google… Enter Google Base.

I should have thought of it before. As some “Google 1 hour video” says, Google will never give up. They think way ahead of others!

People are spreading rumors about Google Base. Here is what Slashdot has to say. The comments are interesting as well.

Google stepped in and made an official announcement too.

People at Google are not fools! They know that once the world moves toward the Semantic Web and Web 2.0, the amount of static content is going to be drastically reduced. Search engines would then no longer be able to boast of having indexed 8 million (or billion) pages, and if they did, it would be considered seriously old-fashioned. (Google has in fact stopped putting that number on its home page; why it did so is a different story altogether!)

It seems like Google says, “How can we solve this problem? Ask people to send data to us? Yeah, why not?! Why should we go around and ask people for data? Let us ask them to publish it here. We want all info. We have the capacity to store it all here. Make your data dynamic and we'll instantly show the world the data that you created.” (You publish, we subscribe! Inverse-RSSing hah?)

Smart!!!

Now the question is whether they are really moving toward the Semantic Web. I think they are. I have not had a chance to see Google Base yet, but assuming the rumors are spreading true facts about it, Google Base uses a “name=value” kind of structure, which is a basic prerequisite for representing facts in the Semantic Web.

This could mean that Google would then say, “Just publish it wherever you want, in a definite syntax, and we will take it from there”. The only difference between this way of indexing and the present way is that in the new method, Google can interpret the content much better because the data is structured.
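To make that concrete, here is a minimal Python sketch of why structured “name=value” data is easier to interpret than free text. The item types and attribute names below are my own inventions, not Google Base's actual schema:

```python
# Hypothetical "name=value" structured items, in the spirit of the rumored
# Google Base format (these attribute names are made up for illustration).
items = [
    {"type": "recipe", "main_ingredient": "chicken", "cooking_time": "30 min"},
    {"type": "recipe", "main_ingredient": "paneer", "cooking_time": "45 min"},
    {"type": "job", "role": "editor", "location": "Chennai"},
]

def find(items, **attrs):
    """Return the items whose attributes match every given name=value pair."""
    return [i for i in items if all(i.get(k) == v for k, v in attrs.items())]

# Because the data is structured, "chicken recipes" becomes an exact
# attribute match, not a fuzzy keyword guess over static HTML.
print(find(items, type="recipe", main_ingredient="chicken"))
```

A keyword search over flat pages cannot tell an ingredient apart from, say, a restaurant name; with name=value pairs the distinction is explicit in the data.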

In the world of IPv6 every molecule has its own address

Take it, you take it too… give it to your friends, give it to your relatives… give it to your pets… give it to everyone. The resource that was so scarce just a while ago will be considered abundant in the near future.

Well, I am not an expert in IPv6 or networking, but here goes:

Consider this article on IPv6: The Next Generation Internet Protocol. The author says:

To accommodate almost unlimited growth, and a variety of addressing formats, IPv6 addresses are 128 bits in length. This address space is probably sufficient to uniquely address every molecule in the solar system! (For a full description read the article).

Well, this is not surprising.
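The claim is easy to sanity-check with back-of-the-envelope arithmetic, using Avogadro's number as a rough yardstick for molecule counts:

```python
# IPv6 addresses are 128 bits long, so the space holds 2**128 addresses.
ipv6_addresses = 2 ** 128  # roughly 3.4e38

# Avogadro's number: molecules in one mole of a substance.
avogadro = 6.022e23

# The address space covers hundreds of trillions of moles' worth of molecules.
moles_addressable = ipv6_addresses / avogadro
print(f"{ipv6_addresses:.3e} addresses, enough for {moles_addressable:.3e} moles")
```

So the space dwarfs any everyday molecule count by many orders of magnitude, which is exactly why such comparisons get made.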

This would mean that every folder could have its own IP address, because the number of molecules required to store the fact that a folder exists is much more than one. You might ask me, “Are you crazy? Why should every folder have its own IP address?”, but mind you, if it is there, people will use it.

Enter nanotechnology, and we move beyond molecules into the world of electrons. What happens next? We are already told that nanotechnology lets you control the motion of electrons to perform computations. So now, is it really possible to give every folder its own IP address?

Perhaps not. The reason is that the number of logical states is infinite. A folder has no physical existence. Just because it is static now does not mean it will be so in the future. Folders could be dynamic. No data will ever exist statically, and that is where we are heading.

What am I trying to prove? Well, the point is that comparing the 128-bit IP address space with Avogadro's number, and claiming that every molecule can have its own IP, seems foolish. True, IPv6 solves the problem that IPv4 brought with it, and that should be the end of the discussion. But you can never say an address space is exhaustive (can you?). IP addresses are used to uniquely identify hosts. Wikipedia says, “A host is any machine connected to a computer network, a node that has a hostname.” That need not remain the case. We could have logical entities connecting to the network (take folders themselves as examples). What then?

Bye, bye folders

It occurred to me when Gmail used it. Many other sites/products/services were using this same concept, albeit with different names.

I am talking about the concept of tagging (or labeling). There is a very interesting study on tagging here.

It made me wonder, why at all have folders? Why not have just tags? How about a world without folders?

Well, if you think I am fantasizing then you should look at evidence that people are moving in this direction.

* The much-hyped Reiser4 filesystem supports semantic filesystems.
* WinFS supposedly has this.
* Mac OS X's Spotlight search is a workaround for this.

Two years down the line, you might see that “folders are history”! But the question is what will replace them. Is it tags? Well, in their present form, tags are not quite a replacement for folders. While tags have several advantages, like automatic rule application, they also have some disadvantages compared to folders. One that I can describe right off is context awareness.

Suppose I tag a file “Project”: does it mean it is MY project, SOMEONE ELSE'S project, or PROJECT RESOURCES? Folders make it possible to give data a context, while tags in their present form will not help us here.
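As a rough sketch of how context could be restored to tags, here is a minimal Python example. The (context, tag) pair representation is purely my own speculation, not any existing tagging system:

```python
# A flat tag carries no context: "Project" could mean my project,
# someone else's project, or project resources.
flat_tags = {"report.doc": {"Project"}}

# A hypothetical contextualized tag: (context, tag) pairs instead of
# bare strings, so the same tag can live in different contexts.
contextual_tags = {
    "report.doc": {("mine", "Project")},
    "specs.doc": {("colleague", "Project")},
    "logo.png": {("resources", "Project")},
}

def files_with(tags, context, tag):
    """Find the files carrying a tag within a given context."""
    return sorted(f for f, ts in tags.items() if (context, tag) in ts)

print(files_with(contextual_tags, "mine", "Project"))
```

The point of the sketch is only that one extra dimension (the context) recovers the disambiguation folders give for free, without giving up the many-tags-per-file flexibility.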

A good solution is to have the best of both worlds. The concept of tag clusters is probably what will actually replace folders. It is not yet clear how tag clusters would look, so it would be early to comment on them, but you can expect something soon!

Problems with Podcasts

Podcasts are the new buzz on the WWW. While RSS provides a mechanism to subscribe to textual feeds, podcasts help in subscribing to audio/video content. So, instead of those small orange bars, you will now see colorful iTunes or Odeo images.

However, there is an inherent problem with podcasts: they are not searchable. A typical podcast, for example Slashdot Review, contains many different news items. In this case, it contains all the important stories published on Slashdot that day.

With RSS, if I am not interested in reading a particular news item, I can just skip it and read the next one. But with podcasts, since all the news items are aggregated into a single audio feed, we cannot skip individual items.

However considering the fact that Podcasts are still in their infancy, we can expect a solution soon.

One such solution is to extend the RSS format to include “skip points” for the audio file. By “skip point” I mean a description of which news item starts at what offset. The podcast descriptor would then contain not just the location of the file but also its contents. This would also require a special podcast player that can read and understand the descriptor. Of course, this needs to be standardized so that podcasts from all providers adhere to a single format. Another advantage is that the descriptors could be searched in a standard way, and podcast directories could show news items along with their exact locations inside podcast files.
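To sketch what such a descriptor might look like, here is a hypothetical feed fragment (the skipPoint element and its attributes are made up, not part of any real RSS standard) and the few lines of Python needed to read it:

```python
import xml.etree.ElementTree as ET

# A hypothetical extension to a podcast feed item: each <skipPoint> records
# where a news story starts inside the audio file (offsets in seconds).
descriptor = """
<item>
  <title>Slashdot Review, one episode</title>
  <enclosure url="http://example.com/sdr.mp3" type="audio/mpeg"/>
  <skipPoint offset="0" title="Story one"/>
  <skipPoint offset="95" title="Story two"/>
  <skipPoint offset="210" title="Story three"/>
</item>
"""

item = ET.fromstring(descriptor)
# A skip-point-aware player could seek straight to the story the listener
# wants, and a directory could index each story individually.
skip_points = [(int(sp.get("offset")), sp.get("title"))
               for sp in item.findall("skipPoint")]
print(skip_points)
```

Since the titles and offsets are plain text in the feed, they are exactly the searchable metadata that the audio itself lacks.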

However, one problem with this technique is that it is not easy to make listeners listen to advertisements: it would be easy to skip them if the listener is not interested. A second problem is that the accuracy of the podcast descriptor is in the hands of the provider.

Any other solution?

Windows XP a complete re-write?

This is just humor (no real facts down here… or are there?!):

I remember having read that Windows XP was rewritten from scratch. But the day I found out that it was not possible to create a folder named “con” in Windows XP, I started experimenting.

It is not possible to create this folder on either a FAT or an NTFS partition. This suggests that the problem lies not in the Windows kernel as such, but in the filesystem (considering that the filesystem is separate from the kernel in a microkernel approach). And since NTFS was adopted in Windows XP while FAT was there earlier, NTFS was not a complete rewrite either.
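For context, “con” is one of a family of names that DOS reserved for devices (CON, PRN, AUX, NUL, COM1 through COM9, LPT1 through LPT9), a legacy that Windows XP still carries. A minimal sketch of the check (the helper function is my own, not a Windows API):

```python
# Names reserved by DOS for devices; Windows refuses to create files or
# folders with these names on FAT and NTFS alike, regardless of extension.
RESERVED = ({"CON", "PRN", "AUX", "NUL"}
            | {f"COM{i}" for i in range(1, 10)}
            | {f"LPT{i}" for i in range((1), 10)})

def is_reserved(name):
    """True if a proposed file/folder name collides with a DOS device name."""
    return name.split(".")[0].upper() in RESERVED

print(is_reserved("con"), is_reserved("my_folder"))
```

That a brand-new OS still trips over names from the DOS era is, of course, exactly the point of this section.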

We tried something else: how about creating the folder from Linux on a mounted Windows drive? It was not possible to do this either.

Now comes the question: is Windows XP a real rewrite? I don't think so. Microsoft is known for showing off things that others created ages ago. The recent PDC'05 show is evidence of this. The speaker was demonstrating the new browser, IE 7, and said, “And look what we have here: tabs!!!”

This made me think:

Is it true that Windows XP hangs less than Windows 98? If so, how did Microsoft do this?

This is how they did it:

Windows 98 kernel:

explorer();

Windows XP kernel:

while (1)
{
    if (!stable(explorer))
    {
        kill(explorer);
        explorer();
    }
}

And lo!!! Windows XP does not hang!!!

Random thought

People enjoy coding… how different is it from playing computer games?

Analysis-Paralysis and Information overload

I had this interesting thought today.

How many times has it happened to you that you come up with a brilliant idea, and then, after a lot of research, you realize that someone else is working on it and is way ahead?

But what I felt is that if this continues, you will always be in a state of analysis-paralysis. With information overload, the problem becomes even more intense. (Wanna know more about anti-patterns?)

It is better, therefore, to get into ACTION! This is probably why RSS is a huge success, and so is tagging. While there are groups that design standards, there are groups that actually jump into the playground and implement things. Someday the two groups converge.

And why did I have this thought? Well, tagging is evolving, and you will soon hear about “tag clusters”. While you might feel this is nothing special, the clusters are what give tags a context. And this is where Semantic Web concepts help.