<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Karmona Pragmatic Blog &#187; Semantic Web</title>
	<atom:link href="http://blog.karmona.com/index.php/category/semantic-web/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.karmona.com</link>
	<description>Pragmatic Software Management, Internet Trends, Life and more...</description>
	<lastBuildDate>Wed, 14 Jul 2010 19:40:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Social Graph Challenge</title>
		<link>http://blog.karmona.com/index.php/2008/12/30/the-social-graph-challenge/</link>
		<comments>http://blog.karmona.com/index.php/2008/12/30/the-social-graph-challenge/#comments</comments>
		<pubDate>Tue, 30 Dec 2008 20:28:49 +0000</pubDate>
		<dc:creator>Moti Karmona &#124; מוטי קרמונה</dc:creator>
				<category><![CDATA[Delver]]></category>
		<category><![CDATA[Disruptive Technology]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Social Network]]></category>

		<guid isPermaLink="false">http://blog.karmona.com/?p=336</guid>
		<description><![CDATA[I have been analyzing, dreaming, monitoring, crawling, debugging, reading, breathing, cursing, scaling, visualizing and learning the social graph for the last couple of months, and I thought it might be a good idea to write a little something about The Social Graph Challenge, with a pragmatic twist on a few other common concepts.
 
&#8212;&#8212;&#8212; Blitz Introduction to The Social Graph &#8212;&#8212;&#8212;
The social graph is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-thumbnail wp-image-342" title="The Story Behind The Delver Kid Image" src="http://blog.karmona.com/wp-content/uploads/2008/12/more-kids-150x150.jpg" alt="The Story Behind The Delver Kid Image" width="150" height="150" align="left" />I have been analyzing, dreaming, monitoring, crawling, debugging, reading, breathing, cursing, scaling, visualizing and learning the social graph for the last couple of months, and I thought it might be a good idea to write a little something about <strong>The </strong><strong>Social Graph Challenge</strong>, with a pragmatic twist on a few other <a title="Brad's Thoughts on the Social Graph" href="http://www.bradfitz.com/social-graph-problem/">common</a> <a title="Pragmatic Twist on Social Graph Concepts and Issues | ReadWriteWeb" href="http://www.readwriteweb.com/archives/social_graph_concepts_and_issues.php">concepts</a>.</p>
<p> </p>
<p style="text-align: center; "><strong>&#8212;&#8212;&#8212; Blitz Introduction to The Social Graph </strong>&#8212;&#8212;&#8212;</p>
<p>The social graph is simply a mathematical <a title="Graph Theory" href="http://en.wikipedia.org/wiki/Graph_theory">abstraction</a> in which nodes are people and edges are the relations between them.</p>
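<p>To make the abstraction concrete, here is a minimal sketch of people as nodes and relations as directed, labeled edges. The class name, the method names and the sample people are made up for illustration; this is not how any particular product stores its graph.</p>

```python
class SocialGraph:
    """People are nodes; directed, labeled relations are edges."""

    def __init__(self):
        # person -> {other person: kind of relation}
        self._edges = {}

    def add_relation(self, source, target, kind="knows"):
        # Register both people as nodes, then store the directed edge.
        self._edges.setdefault(source, {})
        self._edges.setdefault(target, {})
        self._edges[source][target] = kind

    def relations_of(self, person):
        # Return a copy so callers cannot mutate the graph by accident.
        return dict(self._edges.get(person, {}))

    def people(self):
        return sorted(self._edges)


graph = SocialGraph()
graph.add_relation("alice", "bob", kind="colleague")
graph.add_relation("bob", "carol", kind="friend")
print(graph.people())
print(graph.relations_of("alice"))
```

<p>An adjacency map like this is fine for a toy example; the size estimates later in the post explain why it would not survive contact with the real thing.</p>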
<p>In the last decade the internet has become more social than anyone expected, with the rapid growth and adoption of social networks, social media and user-generated contributions and interactions.</p>
<p>Nowadays, there is a growing feeling that it is feasible to model the social web and map it into a replica of the real-life social graph.</p>
<p style="text-align: center; "><img class="size-full wp-image-355 aligncenter" style="border: 0px initial initial;" title="Delver Starfish" src="http://blog.karmona.com/wp-content/uploads/2008/12/starfish.jpg" alt="Delver Starfish" width="195" height="130" /></p>
<p style="text-align: center; "><strong>&#8212;&#8212;&#8212; Pragmatic Overview on The Social Graph Challenge &#8212;&#8212;&#8212;</strong></p>
<p style="text-align: center; "><strong><span style="font-weight: normal; "><a title="Modeling the Social Graph" href="#Modeling">Modeling</a> | <a title="Building the Social Graph" href="#Building">Building</a> | <a title="Processing the Social Graph" href="#Processing">Processing</a> | <a title="The Social Graph Size" href="#Size">Size</a> | <a title="Two Cents on Social Graph Architecture" href="#Architecture">Architecture</a></span></strong></p>
<p style="text-align: left;"><strong>(1)<a name="Modeling"></a> Modeling the Social Graph</strong></p>
<p><strong>*** Vocabulary </strong></p>
<p>To better understand how complicated it is to create a vocabulary for expressing metadata about people, their interests, relationships and activities, you should simply pay a quick visit to the <a title="The FOAF Project" href="http://www.foaf-project.org/">FOAF Project</a> <a title="FOAF Technical Spec" href="http://xmlns.com/foaf/spec/">technical specification page</a>.</p>
<p>The FOAF (&#8220;Friend of a Friend&#8221;) <a title="The FOAF Project" href="http://www.foaf-project.org/">Project</a> has the most comprehensive model available today, yet it still lacks some basic modeling granularity, e.g. no time-awareness metadata, no privacy model and a <a title="FOAF Relationship Model | Term-Knows" href="http://xmlns.com/foaf/spec/#term_knows ">poor relationship model</a>.</p>
<p><strong>*** The Social Cloud</strong></p>
<p>It is a common mistake to forget that people are more than just flat internet identities (e.g. a <a title="Moti Karmona | Linkedin" href="http://www.linkedin.com/in/karmona">LinkedIn profile</a>), and to complete the profile modeling we must add all their content to the graph, e.g. personal blog, Flickr images, YouTube videos, Delicious bookmarks, tweets, blog comments etc.</p>
<p>Modeling all these content and consumption types will yield a broader definition (a.k.a. The Social Cloud) with even more complex modeling challenges.</p>
<p style="text-align: center; "><img class="size-full wp-image-345 aligncenter" style="border: 0px initial initial;" title="More Delver Kids" src="http://blog.karmona.com/wp-content/uploads/2008/12/kids.jpg" alt="More Delver Kids" width="276" height="107" /></p>
<p><strong>(2)<a name="Building"></a> Building the Social Graph</strong></p>
<p><strong>*** The Paradigm Shift</strong></p>
<p>While conventional internet crawlers follow hyperlinks within web pages and <a title="Lynx, a text-mode web browser" href="http://www.delorie.com/web/lynxview.cgi?url=http://blog.karmona.com">treat pages as plain text</a>, social crawlers should have social &#8220;awareness&#8221;:</p>
<ul>
<li>Identify and extract fragments of people&#8217;s identities (e.g. social network profiles, blog authors)</li>
<li>Identify relationships (e.g. social network connections, blogroll fans)</li>
<li>Identify relations between content and people (author, bookmark, reference etc.)</li>
</ul>
<p><strong>*** The Standards Dilemma – No Silver Bullet</strong></p>
<p>Besides <a title="FOAF on Wikipedia" href="http://en.wikipedia.org/wiki/FOAF_(software)">FOAF</a>, there are several open standards, like <a title="RSS | Wikipedia" href="http://en.wikipedia.org/wiki/RSS_(file_format)">RSS</a> and <a title="ATOM | Wikipedia" href="http://en.wikipedia.org/wiki/Atom_(standard)">ATOM</a> for content syndication and <a title="Microformats | Wikipedia" href="http://en.wikipedia.org/wiki/Microformats">microformats</a> like <a title="HCard | Wikipedia" href="http://en.wikipedia.org/wiki/HCard">HCard</a> and <a title="XFN | Wikipedia" href="http://en.wikipedia.org/wiki/XHTML_Friends_Network">XFN</a> for profile and network discovery, that seem promising and can help with the identification quest. But although they are being pushed by giants (e.g. <a title="Google Social Graph API" href="http://code.google.com/intl/iw/apis/socialgraph/">Google Social Graph API</a>), adoption is <a title="List of FOAF Containers | Open Social Directory" href="http://web.archive.org/web/20080205184017/http://www.opensocialdirectory.org/wiki/List_of_FOAF_Containers">still</a> <a title="Foaf Sites" href="http://esw.w3.org/topic/FoafSites">low</a> and the published data has many correctness and corruption issues, e.g. <a title="Claimed to be WordPress using XFN" href="http://socialgraph-resources.googlecode.com/svn/trunk/samples/findyours.html?q=http://wordpress.com">all these people</a> claimed to be WordPress.com using the XFN (rel=&quot;me&quot;) microformat.</p>
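<p>As a rough illustration of what XFN discovery looks like from the crawler side, the sketch below collects the links a page marks with rel=&quot;me&quot;, using only Python&#8217;s standard library. The sample markup and the example.com URLs are invented; and as the WordPress.com case shows, a rel=&quot;me&quot; link is only a claim, so a real crawler would still have to verify it (for example by checking for a link back).</p>

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collects href values of links whose rel attribute contains 'me'."""

    def __init__(self):
        super().__init__()
        self.me_links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel is a space-separated list, e.g. rel="me nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "me" in rel_values and href:
            self.me_links.append(href)


def extract_rel_me(html_text):
    parser = RelMeParser()
    parser.feed(html_text)
    return parser.me_links


# Invented sample page: two identity claims and one friend link.
SAMPLE_PAGE = (
    '<a rel="me" href="http://example.com/alice-blog">my blog</a>'
    '<a rel="friend met" href="http://example.com/bob">Bob</a>'
    '<a rel="me nofollow" href="http://example.org/alice-photos">my photos</a>'
)
print(extract_rel_me(SAMPLE_PAGE))
```

<p>Note that the rel value is split into tokens before matching, so &quot;met&quot; (an XFN value meaning the two people have met) is not mistaken for &quot;me&quot;.</p>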
<p><strong>*** The Promise of Structured Sources (a.k.a. The structure myth)</strong></p>
<p>The <strong>Myth</strong>: Most social media sites (e.g. <a title="Moti Karmona | Facebook" href="http://www.facebook.com/profile.php?id=673836059">Facebook</a>, <a title="Moti Karmona | LinkedIn" href="http://www.linkedin.com/in/karmona">LinkedIn</a>, <a title="Moti Karmona | MySpace" href="http://www.myspace.com/moti_karmona">MySpace</a>, <a title="Moti Karmona | Flickr" href="http://flickr.com/people/moti_karmona/">Flickr</a> etc.) have publicly available, structured profile pages, so in principle all that needs to be done is some XPath magic on the HTML DOM to finish the parsing task.</p>
<p><strong>But</strong>… Most of the work isn&#8217;t parsing but data modeling, which requires a deep understanding of each site&#8217;s user model and usage:</p>
<ul>
<li>Many social media sites have <a title="EULA | Wikipedia" href="http://en.wikipedia.org/wiki/EULA">EULA</a> restrictions that prohibit <span style="text-decoration: underline;">any</span> access to or use of the site content, but if you are lucky you will get some official APIs instead.</li>
<li>Social media sites make frequent (~weekly) structural changes to their CSS/HTML.</li>
<li>Social media sites change their data privacy policy often (~monthly) and have complex privacy models, which creates inconsistency in profile, network and content presentation.</li>
</ul>
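<p>To show why the &#8220;XPath magic&#8221; is the easy and fragile part, here is a small sketch using the limited XPath support in Python&#8217;s standard library. Both markup snippets and the class names in them are hypothetical, not taken from any real site, and real-world HTML is rarely well-formed enough for this strict parser.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical profile markup before and after a site redesign.
PROFILE_V1 = '<div class="profile"><span class="full-name">Moti Karmona</span></div>'
PROFILE_V2 = '<div class="profile"><h1 class="name">Moti Karmona</h1></div>'

# The extraction rule written against the first version of the page.
NAME_PATH = ".//span[@class='full-name']"


def extract_name(page_markup):
    """Return the profile name, or None when the rule no longer matches."""
    root = ET.fromstring(page_markup)
    node = root.find(NAME_PATH)
    return node.text if node is not None else None


print(extract_name(PROFILE_V1))
print(extract_name(PROFILE_V2))
```

<p>The same person is on both pages, but the rule silently stops matching after the redesign, which is exactly the kind of weekly breakage mentioned in the list above.</p>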
<p><strong>*** A Few More Challenges with Social Crawling:</strong></p>
<ul>
<li><strong>Privacy-Ownership-Control </strong>- The <a title="The Data Portability Project" href="http://www.dataportability.org/">data</a> is the property of the <a title="A Bill of Rights for Users of the Social Web" href="http://opensocialweb.org/2007/09/05/bill-of-rights/">users</a></li>
<li><strong>Unstructured Sources</strong> &#8211; It isn&#8217;t a trivial task to extract social entities from unstructured sources (e.g. blogs), and it might require offline semantic processing of your collected data.</li>
<li><strong>Cross-Network Relations</strong> &#8211; How to find those important hidden cross-network relations, e.g. between the biggest reliable network graph (e.g. <a title="Moti Karmona | Facebook" href="http://www.facebook.com/profile.php?id=673836059">Facebook</a>) and the richest content contributions (e.g. <a title="State of the Blogosphere 2008 | Karmona.com" href="http://blog.karmona.com/index.php/2008/09/22/technorati-state-of-the-blogosphere/">Blogosphere</a>, YouTube, <a title="Moti Karmona | Flickr" href="http://flickr.com/people/moti_karmona/">Flickr</a> etc.)</li>
<li><strong>Identify Social Signs</strong> (e.g. Social Widgets, Comments, Blogroll etc.)</li>
<li><strong>Social Graph Update Mechanism</strong> and crawler distribution</li>
<li>Profile <a title="Google URL Canonization" href="http://code.google.com/intl/iw/apis/socialgraph/docs/canonical.html ">Canonicalization</a> </li>
<li>&#8230;</li>
</ul>
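<p>Profile canonicalization, the last named item above, is about mapping the many spellings of a profile URL to a single key. The sketch below uses a few generic rules (lowercase host, drop a leading &#8220;www.&#8221;, drop the trailing slash and the fragment). These rules are my assumption for illustration only; real rules are site-specific, and dropping &#8220;www.&#8221; or the port is not safe everywhere.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_profile_url(url):
    """Normalize an absolute profile URL with a few generic, assumed rules."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower() or "http"
    # hostname is already lowercased and excludes any port or credentials.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    # Treat /people/name/ and /people/name as the same profile.
    path = parts.path.rstrip("/") or "/"
    # Keep the query (some sites put the profile id there), drop the fragment.
    return urlunsplit((scheme, host, path, parts.query, ""))


print(canonical_profile_url("HTTP://WWW.Flickr.com/people/moti_karmona/"))
print(canonical_profile_url("http://flickr.com/people/moti_karmona"))
```

<p>Both calls print the same string, so the two spellings end up as one node in the graph instead of two.</p>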
<p style="text-align: center; "><img class="size-full wp-image-346 aligncenter" style="border: 0px initial initial;" title="Delver Rodents" src="http://blog.karmona.com/wp-content/uploads/2008/12/rodents.jpg" alt="Delver Rodents" width="222" height="82" /></p>
<p><strong>(3)<a name="Processing"></a> Processing the Social Graph</strong></p>
<p><strong>*** The Identity Crisis</strong></p>
<ul>
<li><strong>Filtering Impersonation</strong> e.g. <a title="Sites claimed to be TechCrunch using XFN" href="http://socialgraph-resources.googlecode.com/svn/trunk/samples/findyours.html?q=techcrunch.com">all these sites</a> use XFN (<em>rel=&quot;me&quot;</em>) to &#8220;say&#8221; they are <a title="TechCrunch" href="http://www.techcrunch.com">TechCrunch</a></li>
<li><strong>Identify </strong>and model separately <strong>n</strong><strong>on-individual identities</strong> (groups, shared authorship) e.g. <a title="Knitter Blogs" href="http://zimmermaniacs.blogspot.com/">Knitters Blog</a> with 629 knitting contributors :)</li>
<li>Strive to merge identities (a.k.a. profile fusion) when possible, e.g. Moti Karmona on <a title="Moti Karmona | LinkedIn" href="http://www.linkedin.com/in/karmona">LinkedIn</a> and Moti Karmona on <a title="Moti Karmona | Facebook" href="http://www.facebook.com/profile.php?id=673836059">Facebook</a> could be two instances (profiles) of the same person, and merging these profiles will enable:
<ul>
<li>Cross-network connectedness =&gt; Bridging network richness (e.g. <a title="Moti Karmona | Facebook" href="http://www.facebook.com/profile.php?id=673836059">Facebook</a>) and content richness (e.g. <a title="State of the Blogosphere 2008 | Karmona.com" href="http://blog.karmona.com/index.php/2008/09/22/technorati-state-of-the-blogosphere/">Blogosphere</a>)</li>
<li>Richer representation of people through identity aggregation =&gt; Richer networks</li>
</ul>
</li>
<li><strong>The Fusion </strong><strong>Challenge</strong>: You can pay a short visit to the <a title="Social Aggregators | Mashable" href="http://mashable.com/2007/07/17/social-network-aggregators/">nearest social aggregator directory</a>, but you can&#8217;t get away from some more complex algorithms for <a title="Disambiguating Web Appearances of People in a Social Network | Ron Bekkerman" href="http://www.www2005.org/cdrom/docs/p463.pdf">disambiguating web appearances of people</a> with more common names like <a title="Common name like James Smith" href="http://blog.karmona.com/index.php/2008/07/07/mary-and-james-smith/">James Smith</a> who don’t &#8220;play&#8221; in the social aggregation playground (like 98.7% of the graph).</li>
</ul>
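<p>Once some evidence says that two profiles belong to the same person, the bookkeeping side of profile fusion can be sketched with a classic union-find (disjoint-set) structure. This covers only the merging step; deciding <em>when</em> two profiles match is the hard disambiguation problem described above and is not attempted here. The class, its method names and the james-smith profile key are made up for the example.</p>

```python
class ProfileFusion:
    """Union-find over profile keys: merged profiles share one root."""

    def __init__(self):
        self._parent = {}

    def _find(self, profile):
        self._parent.setdefault(profile, profile)
        root = profile
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited profile straight at the root.
        while self._parent[profile] != root:
            next_profile = self._parent[profile]
            self._parent[profile] = root
            profile = next_profile
        return root

    def merge(self, profile_a, profile_b):
        root_a, root_b = self._find(profile_a), self._find(profile_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, profile_a, profile_b):
        return self._find(profile_a) == self._find(profile_b)


fusion = ProfileFusion()
linkedin = "linkedin.com/in/karmona"
facebook = "facebook.com/profile.php?id=673836059"
flickr = "flickr.com/people/moti_karmona"
fusion.merge(linkedin, facebook)
fusion.merge(facebook, flickr)
print(fusion.same_person(linkedin, flickr))
print(fusion.same_person(linkedin, "linkedin.com/in/james-smith"))
```

<p>Merging is transitive, which is both the point and the danger: one wrong match glues two people together, so the evidence behind each merge had better be strong.</p>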
<p><strong>*** </strong><strong>Graph Enrichment </strong></p>
<ul>
<li><strong>I</strong><strong>mplicit Relations</strong> - Enrich the network with “implicit” relationships (colleagues, graduates, neighbors), e.g. I have a LinkedIn profile and all my connections are hidden from public crawlers, but the fact that I work at <a title="Delver - Search Your World" href="http://www.delver.com">Delver</a> is public, so if <a title="Delver - Search Your World" href="http://www.delver.com">Delver</a> is a startup company with fewer than ~50 people then there is a good chance I know all the other workers at <a title="Delver - Search Your World" href="http://www.delver.com">Delver</a> =&gt; This simple heuristic rule can create an implicit relation between me and the other workers of <a title="Delver - Search Your World" href="http://www.delver.com">Delver</a> without me explicitly claiming that I know them (as I did on <a title="Moti Karmona | Facebook" href="http://www.facebook.com/profile.php?id=673836059">Facebook</a>)</li>
<li>Generating the <strong>inverted relations </strong>when needed, e.g. Followed vs. Follower</li>
<li>Deeper, <strong>s</strong><strong>emantic extraction </strong>of social entities from <strong>u</strong><strong>nstructured content</strong></li>
</ul>
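<p>The small-company heuristic from the first bullet can be sketched in a few lines. The function name, the edge label and the sample people are invented, and note one loud assumption: the code only sees the employees it has crawled, so the observed headcount stands in for the real company size.</p>

```python
from collections import defaultdict
from itertools import combinations

MAX_COMPANY_SIZE = 50  # the ~50 people threshold from the heuristic


def implicit_colleague_edges(employer_by_person, max_size=MAX_COMPANY_SIZE):
    """Link every pair of known employees of a sufficiently small company."""
    staff = defaultdict(list)
    for person, employer in employer_by_person.items():
        staff[employer].append(person)

    edges = set()
    for employer, people in staff.items():
        if len(people) >= max_size:
            continue  # too big: colleagues probably do not all know each other
        for person_a, person_b in combinations(sorted(people), 2):
            edges.add((person_a, person_b, "implicit-colleague"))
    return edges


public_employers = {"moti": "Delver", "dana": "Delver", "james": "BigCorp"}
print(implicit_colleague_edges(public_employers))
```

<p>Keeping a distinct label on these edges matters: an implicit relation is a guess, and it should be possible to weigh it lower than (or drop it in favor of) an explicit one.</p>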
<p style="text-align: center; "><img class="size-full wp-image-347 aligncenter" style="border: 0px initial initial;" title="Delver Faces" src="http://blog.karmona.com/wp-content/uploads/2008/12/faces.jpg" alt="Delver Faces" width="328" height="113" /></p>
<p><strong>(4)<a name="Size"></a> The Social Graph Size</strong></p>
<p>Let&#8217;s have some quick (and very dirty) guesstimates:</p>
<p><a title="Internet Statistics" href="http://www.internetworldstats.com/stats.htm">World Population</a> is ~6.7 Billion x <strong>22</strong>% Internet penetration =&gt; <strong>~1.5 Billion internet users</strong></p>
<p>Let&#8217;s say 65% of these users have some kind of presence in Social Media (~20% have more than one) =&gt; <strong>~1 Billion Profiles <span style="font-weight: normal;">x</span><span style="font-weight: normal;"> ~</span>10<span style="font-weight: normal;"> content items per profile</span></strong></p>
<p>+ <strong>1 Billion Profile Nodes <span style="font-weight: normal;">x ~<strong>100 </strong><a title="Dunbars Friends " href="http://blog.karmona.com/index.php/2008/07/07/dunbars-friends/">network relations per profile</a> =&gt; ~<strong>110 Billion Graph Edges + ~10 Billion Graph Nodes</strong></span></strong></p>
<p>It depends heavily on the graph implementation, but with these numbers you can easily find yourself with <strong>~1-2 Terabytes of graph metadata alone</strong> (<span style="text-decoration: underline;">without </span>contents and profiles<span style="color: #ff0000;">*</span>) </p>
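<p>For anyone who wants to replay the guesstimate, here it is as a few lines of arithmetic. The inputs are the numbers from this section; the two things I am assuming are that each content item adds one edge to its profile (which is how 100 Billion relations become ~110 Billion edges) and that an edge costs 16 bytes, i.e. two 8-byte node ids and nothing else.</p>

```python
WORLD_POPULATION = 6.7e9
INTERNET_PENETRATION = 0.22
SOCIAL_MEDIA_SHARE = 0.65
CONTENT_ITEMS_PER_PROFILE = 10
RELATIONS_PER_PROFILE = 100
BYTES_PER_EDGE = 16  # assumption: two 8-byte node ids, no labels or weights

internet_users = WORLD_POPULATION * INTERNET_PENETRATION
social_users = internet_users * SOCIAL_MEDIA_SHARE

# Some users have more than one profile, so round up to ~1 Billion profiles.
profiles = 1e9
content_nodes = profiles * CONTENT_ITEMS_PER_PROFILE

nodes = profiles + content_nodes
# Person-to-person relations plus one (assumed) edge per content item.
edges = profiles * RELATIONS_PER_PROFILE + content_nodes
edge_terabytes = edges * BYTES_PER_EDGE / 1e12

print(f"internet users: {internet_users / 1e9:.2f} Billion")
print(f"social media users: {social_users / 1e9:.2f} Billion")
print(f"graph nodes: {nodes / 1e9:.0f} Billion")
print(f"graph edges: {edges / 1e9:.0f} Billion")
print(f"edge storage: {edge_terabytes:.2f} TB")
```

<p>With these inputs the edges alone come to about 1.76 TB, which lands inside the ~1-2 Terabytes range; labels, weights and timestamps on each edge would push it higher.</p>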
<p style="text-align: center; "><img class="size-full wp-image-348 aligncenter" style="border: 0px initial initial;" title="Delver Diving Suite " src="http://blog.karmona.com/wp-content/uploads/2008/12/diving-suite.jpg" alt="Delver Diving Suite " width="235" height="157" /></p>
<p><strong>(5)<a name="Architecture"></a> Two Cents on Social Graph Architecture</strong></p>
<p>Updating and querying a gigantic, dynamic, distributed, directed, cyclic, colored, weighted graph has &#8220;some&#8221; algorithmic and computational complexity &#8211; a little more complex than a blog post could cover…;-)</p>
<p>You can take a quick look at the tiny (15 Giga, 25 million nodes) <a title="LinkedIn Architecture" href="http://hurvitz.org/blog/2008/06/linkedin-architecture">graph implementation at LinkedIn</a> to get a glimpse of the technological challenge…</p>
<p><span style="color: #ff0000;">*</span> Note: Indexing content and profile data (e.g. for building a <a title="Delver.com - Search Your World" href="http://www.delver.com">Social Search Engine</a>) is an architecture challenge equivalent to that of any modern search engine with a ~10 Billion document <a title="The Size of the Internet" href="http://blog.karmona.com/index.php/2007/09/26/the-size-of-the-internet/">index</a>.</p>
<p style="text-align: center; "><img class="size-full wp-image-349 aligncenter" style="border: 0px initial initial;" title="The Delver Kid" src="http://blog.karmona.com/wp-content/uploads/2008/12/delver-kid.jpg" alt="The Delver Kid" width="192" height="198" /></p>
<p>This is only the tip of the <a title="Delver - Search Your World" href="http://www.delver.com">iceberg</a> but it is more than enough for one blog post ;)</p>
<p>_________</p>
<p>Credit: <span style="text-decoration: underline;">A</span><span style="text-decoration: underline;">ll </span>the images were taken from <a title="Tamar Hak" href="http://tamarhak.com">Tamar Hak</a>&#8217;s <span style="text-decoration: underline;">amazing </span>artwork &#8211; creating The Delver Kid image.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.karmona.com/index.php/2008/12/30/the-social-graph-challenge/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Web Number-Dot-Zero for Dummies</title>
		<link>http://blog.karmona.com/index.php/2007/08/01/web-number-dot-zero-for-dummies/</link>
		<comments>http://blog.karmona.com/index.php/2007/08/01/web-number-dot-zero-for-dummies/#comments</comments>
		<pubDate>Wed, 01 Aug 2007 19:57:00 +0000</pubDate>
		<dc:creator>Moti Karmona &#124; מוטי קרמונה</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Social Network]]></category>
		<category><![CDATA[Web 2.0]]></category>

		<guid isPermaLink="false">http://blog.karmona.com/index.php/2007/08/01/web-number-dot-zero-for-dummies/</guid>
		<description><![CDATA[Well… I couldn&#8217;t resist the urge, so here it is:
Web 1.0 is about browsing… (a.k.a. &#8220;Been there, done that&#8221;)
Web 2.0 is about Social Collaboration, Blogs, Wikis, Mash-ups and 42 other things (a.k.a. &#8220;You are here…&#8221;)
Web 3.0 is about the Semantic Web, Geospatial Web, Artificial Intelligence and a broader range of next-generation technologies and approaches for making [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://bp0.blogger.com/_yHZeAQccbHo/RrDl_MtakoI/AAAAAAAAAfY/zeTprmQJNIA/s1600-h/web2.0.gif"></a><a href="http://blog.karmona.com/wp-content/uploads/2007/08/web2_0.png" title="Web 2.0"><img align="left" src="http://blog.karmona.com/wp-content/uploads/2007/08/web2_0.thumbnail.png" alt="Web 2.0" title="Web 2.0" /></a>Well… I couldn&#8217;t resist the urge, so here it is:</p>
<p><strong>Web 1.0</strong> is about browsing… (a.k.a. &#8220;Been there, done that&#8221;)</p>
<p><strong>Web 2.0</strong> is about Social Collaboration, Blogs, Wikis, Mash-ups and 42 other things (a.k.a. &#8220;You are here…&#8221;)</p>
<p><strong>Web 3.0</strong> is about the Semantic Web, Geospatial Web, Artificial Intelligence and a broader range of next-generation technologies and approaches for making the Web smarter &#8211; towards providing easy, transparent and organized access to the world’s data, information, and knowledge. (a.k.a. Just started working on it ;-)</p>
<p><strong>Web 4.0</strong> has already started to buzz around &#8220;Web OS&#8221; (a.k.a. Too early)</p>
<p><strong>Web 5.0</strong> is whatever comes after Web 4.0… (a.k.a. Really not relevant)</p>
<p><span style="font-weight: bold">Web 6.0</span> is something I want to trademark before anyone else :-)</p>
<p>etc.</p>
<p>My two cents: The web-some-number-dot-zero is really a weird buzz, trying to tag the obvious evolution of the internet, but I will surely do my best to make money out of it…</p>
<p>(4 Pasha ;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.karmona.com/index.php/2007/08/01/web-number-dot-zero-for-dummies/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
