Tuesday, October 25, 2011

Content sharing makes the web a confusing place



Today I opened Blogger, went over the "News from Blogger" section, and noticed 2 nearly identical entries.

This reminds me of some criticism I read a few years back while experimenting with Maven documentation.
It is yet another case of search repetition, or search pollution.

Pollution caused by Feed Sharing

While it amazes me that this time the culprit is Google, it also makes me wonder about the integrity of feed sharing.
Picture a news feed showing the same article over and over again, duplicated simply because you are subscribed to several different newspapers.

The whole idea of the feed is to get info from different sources. But if 2 sources share that information, you automatically get a duplication in your feed.
Looking at it differently, if you are a feed owner, you want to enrich it by sharing content from another popular source.

This means that my feed would have to depend on the subscriber.
If the subscriber is subscribed to feeds A and B, and these feeds share content, the shared content should reach the subscriber only once; otherwise the subscriber gets duplicate feed entries.
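
To make the dependency concrete, here is a minimal sketch of a feed that decides what to send based on what else the subscriber follows. The `Entry` class, its `origin` field and the `entries_for_subscriber` function are all hypothetical names made up for illustration; no real feed format or service is assumed to work this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    title: str
    origin: str  # hypothetical field: name of the feed that first published the entry


def entries_for_subscriber(feed_name, feeds, subscriptions):
    """Return the entries `feed_name` should send to one subscriber.

    A shared entry is skipped when the subscriber already follows the
    feed it originally came from, so it arrives only once.
    """
    followed = set(subscriptions)
    return [
        entry
        for entry in feeds[feed_name]
        if entry.origin == feed_name or entry.origin not in followed
    ]


# Feed B shares "Post 1", which was originally published by feed A.
feeds = {
    "A": [Entry("Post 1", origin="A")],
    "B": [Entry("Post 2", origin="B"), Entry("Post 1", origin="A")],
}

both = entries_for_subscriber("B", feeds, ["A", "B"])  # shared entry is skipped
only_b = entries_for_subscriber("B", feeds, ["B"])     # shared entry is kept
```

Note that feed B can only make this decision if it is told the subscriber's other subscriptions, which is exactly the dependency described above.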

A subscriber-dependent feed won't resolve the repetition in search results, though. If you search for some substring of the published item, you will still find both the original and the shares.

Pollution caused by Templates

This is a simpler case to understand. The example I know is Maven.
Maven offers templates for documentation; however, if they are not modified, you get repetition in search results, much like with shared feed items.

Although it is easier to understand, I think this problem might be tougher to resolve.
I wonder if it could be resolved simply by adding a meta tag to the template that carries an ID for the template.
For example, the Maven templates should have something like:

<meta name="template_ID" content="maven_doc_template"/>

This would enable Google to aggregate these results and reduce the repetition by showing a "show more results like this" link, as it does when there are many results from the same source.
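
As a rough sketch of how an indexer could aggregate on such a tag, the snippet below reads the proposed `template_ID` meta tag with Python's standard `html.parser` module and groups pages that carry the same value. The tag name comes from the example above; the function names and the `example.org` URLs are invented for illustration, and nothing here claims to describe how Google actually works.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TemplateIdParser(HTMLParser):
    """Remembers the content of <meta name="template_ID" ...> if present."""

    def __init__(self):
        super().__init__()
        self.template_id = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "template_ID":
            self.template_id = attributes.get("content")


def template_id_of(html):
    parser = TemplateIdParser()
    parser.feed(html)
    return parser.template_id


def group_results(pages):
    """Group page URLs by template ID; untagged pages stay in their own group."""
    groups = defaultdict(list)
    for url, html in pages.items():
        key = template_id_of(html) or url
        groups[key].append(url)
    return dict(groups)


tagged = '<html><head><meta name="template_ID" content="maven_doc_template"/></head></html>'
pages = {
    "http://example.org/project-a/index.html": tagged,
    "http://example.org/project-b/index.html": tagged,
    "http://example.org/blog.html": "<html><head><title>Blog</title></head></html>",
}
groups = group_results(pages)
```

Here the two pages generated from the unmodified template end up in a single group, which is what a "show more results like this" link would collapse, while the untagged page remains a result of its own.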

The template ID can also resolve the feed duplication problem without any dependency on the subscriber. A feed reader should not show 2 items with the same template ID; the ID should be different for every item, but identical across shares of the same item.
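
On the reader side, the rule could look roughly like the sketch below. It assumes each feed item is a plain dictionary with an optional `"template_id"` key; that key and the `dedupe_items` function are hypothetical, since no existing feed format defines such a field.

```python
def dedupe_items(items):
    """Keep the first item seen for each template ID.

    Items without a template ID are always kept, because the reader
    has no way to tell whether they are shares of one another.
    """
    seen = set()
    unique = []
    for item in items:
        template_id = item.get("template_id")
        if template_id is not None:
            if template_id in seen:
                continue  # a share of an item that is already shown
            seen.add(template_id)
        unique.append(item)
    return unique


items = [
    {"feed": "A", "title": "Post 1", "template_id": "post-1"},
    {"feed": "B", "title": "Post 1 (shared)", "template_id": "post-1"},
    {"feed": "B", "title": "Post 2", "template_id": "post-2"},
    {"feed": "C", "title": "Untagged post"},
]
shown = dedupe_items(items)
```

The reader needs no knowledge of who shared what; it only compares IDs among the items it has already received.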

Conclusion


The goal, as it seems, is to identify the repetition in some way.
Adding a template ID might be good, but you cannot enforce it, while making a feed depend on its subscribers might be a security issue, so we can't rely on that either.

I think a template ID would be nice if added. It can automatically resolve feed item duplication, since it does not depend on the subscriber.
It would also resolve the repetition in new Maven documentation, assuming Maven generates the meta tag and the editor doesn't delete it.
However, it will not fix existing documentation, and you cannot enforce the meta tag.



Comment on what you think about this topic.

