December 30th, 2008 by Moti Karmona | מוטי קרמונה · 2 Comments
I have been analyzing, dreaming, monitoring, crawling, debugging, reading, breathing, cursing, scaling, visualizing and learning the social graph for the last couple of months, and I thought it might be a good idea to write a little something about The Social Graph Challenge, with a pragmatic twist on a few other common concepts.
——— Blitz Introduction to The Social Graph ———
The social graph is simply a mathematical abstraction in which nodes are people and edges are the relations between them.
In the last decade the internet has become more social than anyone expected, with the rapid growth and adoption of social networks, social media and user-generated contributions and interactions.
Nowadays there is a growing feeling that it is feasible to model and map the social web into a replica of the real-life social graph.

——— Pragmatic Overview on The Social Graph Challenge ———
Modeling | Building | Processing | Size | Architecture
(1) Modeling the Social Graph
*** Vocabulary
To better understand how complicated it is to create a vocabulary for expressing metadata about people, their interests, relationships and activities, simply pay a quick visit to the FOAF Project technical specification page.
The FOAF (“Friend of a Friend”) Project has the most comprehensive model available today, and it still lacks some basic modeling granularity, e.g. time-aware metadata, a privacy model and a rich relationship model.
*** The Social Cloud
It is a common mistake to forget that people are more than flat internet identities (e.g. a LinkedIn profile); to complete the profile model we must add all their content to the graph, e.g. personal blog, Flickr images, YouTube videos, Delicious bookmarks, tweets, blog comments etc.
Modeling all these content and consumption types yields a broader definition (a.k.a. The Social Cloud) with even more complex modeling challenges.
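To make the social cloud a little more concrete, here is a minimal Python sketch of one possible data model. It is not FOAF and not Delver's schema; the class and field names (Person, Profile, ContentItem, Relation, since, source_network) are hypothetical. One person owns several network profiles, each profile owns its content items, and every relation carries an optional start date plus the network it was observed on, which addresses the time-awareness gap noted for FOAF.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ContentItem:
    kind: str  # hypothetical labels, e.g. "blog_post", "photo", "video", "bookmark", "tweet"
    url: str


@dataclass
class Profile:
    network: str  # e.g. "linkedin", "facebook", "flickr", "blog"
    url: str
    content: List[ContentItem] = field(default_factory=list)


@dataclass
class Relation:
    target_id: str  # id of the other Person
    kind: str  # e.g. "friend", "colleague", "follows"
    since: Optional[date] = None  # time awareness: when the relation started, if known
    source_network: str = ""  # where the relation was observed


@dataclass
class Person:
    person_id: str
    profiles: List[Profile] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def social_cloud(self) -> List[ContentItem]:
        """All content items across all of this person's profiles."""
        return [item for profile in self.profiles for item in profile.content]
```

Keeping content under profiles (rather than directly under the person) preserves where each item came from, which matters later when profiles are fused.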

(2) Building the Social Graph
*** The Paradigm Shift
While conventional internet crawlers follow hyperlinks within web pages and treat pages as plain text, social crawlers need social “awareness”:
- Identify and extract fragments of people’s identities (e.g. social network profiles, blog authors)
- Identify relationships (e.g. social network connections, blogroll fans)
- Identify relations between content and people (author, bookmark, reference etc.)
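A first step of such a social crawler can be sketched with Python's standard html.parser: collect identity claims (XFN rel="me") and relationship links (the other XFN rel values) from a single page. This is an illustrative sketch, not a production crawler; it only looks at a and link tags, it trusts whatever the page claims, and XFN_RELATIONS is the XFN 1.1 relationship vocabulary minus "me".

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# XFN 1.1 relationship values, excluding "me" (which is handled separately).
XFN_RELATIONS = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart",
}


class SocialLinkExtractor(HTMLParser):
    """Collects rel="me" identity claims and XFN relationship links from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.identity_links: List[str] = []  # profiles this page claims as the same person
        self.relationship_links: List[Tuple[str, str]] = []  # (xfn_value, href)

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel_values = (attributes.get("rel") or "").lower().split()
        if not href or not rel_values:
            return
        if "me" in rel_values:
            self.identity_links.append(href)
        for value in rel_values:
            if value in XFN_RELATIONS:
                self.relationship_links.append((value, href))


def extract_social_links(html: str) -> SocialLinkExtractor:
    parser = SocialLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser
```

A rel attribute may hold several space-separated values (e.g. rel="friend met"), which is why each value is checked separately.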
*** The Standards Dilemma – No Silver Bullet
Besides FOAF, there are several open standards, such as RSS and Atom for content syndication and microformats like hCard and XFN for profiles and network discovery, that seem promising and can help with the identification quest. But although they are being pushed by giants (e.g. the Google Social Graph API), adoption is still low and the data has many correctness and corruption issues, e.g. all these people claimed to be Wordpress.com using the XFN (rel="me") microformat.
*** The Promise of Structured Sources (a.k.a. The structure myth)
The Myth: Most social media sites (e.g. Facebook, LinkedIn, MySpace, Flickr etc.) have publicly available, structured profile pages, so in principle all that needs to be done is some XPath magic on the HTML DOM to finish the parsing task.
But… most of the work isn’t parsing but data modeling, which requires a deep understanding of each site’s user model and usage:
- Many social media sites have EULA restrictions that prohibit any access to or use of the site content, but if you are lucky you will get an official API instead.
- Social media sites make frequent (~weekly) structural changes to their CSS/HTML.
- Social media sites change their data privacy policies often (~monthly) and have complex privacy models, which create inconsistencies in how profiles, networks and content are presented.
*** A Few More Challenges with Social Crawling:
- Privacy, Ownership and Control - the data is the property of the users
- Unstructured Sources – it isn’t a trivial task to extract social entities from unstructured sources (e.g. blogs), and it might require offline semantic processing of your collected data.
- Cross-Network Relations – how to find the important hidden relations across networks, e.g. between the biggest reliable network graph (e.g. Facebook) and the richest content contributions (e.g. the Blogosphere, YouTube, Flickr etc.)
- Identify Social Signs (e.g. Social Widgets, Comments, Blogroll etc.)
- Social graph update mechanism and crawler distribution
- Profile canonicalization
- …
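Profile canonicalization can start with something as simple as normalizing profile URLs so that trivially different spellings map to one key. The rules below are assumptions chosen for illustration: lowercase the scheme and host, drop a leading "www.", default ports, user info, fragments and trailing slashes. The query string is deliberately kept, because some sites identify a profile by a query parameter (e.g. ?id=).

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_profile_url(url: str) -> str:
    """Map trivially different spellings of a profile URL onto one canonical string."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assume http for scheme-less URLs
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # .hostname also drops user info
    if host.startswith("www."):
        host = host[len("www."):]
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"  # paths stay case-sensitive
    return urlunsplit((scheme, netloc, path, parts.query, ""))  # fragment dropped
```

Real canonicalization needs per-site rules on top of this (e.g. mapping a numeric id and a vanity name of the same profile together), which generic URL normalization cannot know.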

(3) Processing the Social Graph
*** The Identity Crisis
- Filtering impersonation, e.g. all these sites use XFN (rel="me") to “say” they are TechCrunch
- Identify non-individual identities (groups, shared authorship) and model them differently, e.g. the Knitters Blog with 629 knitting contributors :)
- Strive to merge identities (a.k.a. profile fusion) when possible, e.g. Moti Karmona on LinkedIn and Moti Karmona on Facebook could be two instances (profiles) of the same person, and merging these profiles will enable:
- Cross-network connectedness => bridging network richness (e.g. Facebook) and content richness (e.g. the Blogosphere)
- Richer representation of people through identity aggregation => richer networks
- The Fusion Challenge: you can pay a short visit to the nearest social aggregator directory, but you can’t get away from more complex algorithms for disambiguating the web appearances of people with common names, like James Smith, who doesn’t “play” in the social aggregation playground (like 98.7% of the graph).
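One common heuristic that helps with both impersonation and fusion is to trust a rel="me" claim only when it is reciprocated (A links to B with rel="me" and B links back to A), since an impersonator cannot make TechCrunch link back to them. The verified pairs can then be merged into identity clusters with a union-find structure. This sketch assumes profiles are already canonical string ids; it does not attempt the harder name-based disambiguation (the James Smith case).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def verified_me_links(claims: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Keep rel="me" claims (a -> b) only when b also claims a; each pair is returned once."""
    claim_set = set(claims)
    return {(a, b) for (a, b) in claim_set if (b, a) in claim_set and a < b}


class UnionFind:
    """Disjoint sets of profile ids; each set is one fused identity."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def fuse_profiles(claims: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    claims = list(claims)
    uf = UnionFind()
    for a, b in claims:  # register every profile, including one-sided (unverified) claims
        uf.find(a)
        uf.find(b)
    for a, b in verified_me_links(claims):
        uf.union(a, b)
    groups: Dict[str, Set[str]] = defaultdict(set)
    for profile in list(uf.parent):
        groups[uf.find(profile)].add(profile)
    return list(groups.values())
```

Reciprocity is a filter, not a proof: two sites controlled by the same spammer can still vouch for each other.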
*** Graph Enrichment
- Implicit Relations - enrich the network with “implicit” relationships (colleagues, graduates, neighbors). E.g. I have a LinkedIn profile whose connections are all hidden from public crawlers, but the fact that I work at Delver is public; if Delver is a startup with fewer than ~50 people, there is a good chance I know all the other Delver employees => this simple heuristic rule can create an implicit relation between me and the other Delver employees without my explicitly claiming that I know them (as I did on Facebook)
- Generate the inverted relations when needed, e.g. Followed vs. Follower
- Deeper semantic extraction of social entities from unstructured content
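The first two enrichment rules can be sketched as plain set operations. The threshold of 50 comes from the Delver example above; the company_size input, the fallback to the observed head count when the real size is unknown, and the "follows" / "followed_by" edge labels are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

SMALL_COMPANY_THRESHOLD = 50  # "less than ~50 people", from the Delver example


def implicit_colleague_edges(
    employer_of: Dict[str, str],
    company_size: Dict[str, int],
    threshold: int = SMALL_COMPANY_THRESHOLD,
) -> Set[Tuple[str, str]]:
    """Connect every pair of people who publicly work at the same small company."""
    staff: Dict[str, List[str]] = {}
    for person, company in employer_of.items():
        staff.setdefault(company, []).append(person)
    edges: Set[Tuple[str, str]] = set()
    for company, people in staff.items():
        # Fall back to the observed head count when the real size is unknown (assumption).
        size = company_size.get(company, len(people))
        if size < threshold:
            edges.update(combinations(sorted(people), 2))
    return edges


def invert(
    edges: Iterable[Tuple[str, str, str]], inverse_kind: Dict[str, str]
) -> Set[Tuple[str, str, str]]:
    """For each (source, kind, target) with a known inverse, emit (target, inverse, source)."""
    return {(dst, inverse_kind[kind], src) for src, kind, dst in edges if kind in inverse_kind}
```

The colleague rule is quadratic in company size, which is one more reason to apply it only below a small threshold.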

(4) The Social Graph Size
Let’s have some quick (and very dirty) guesstimates:
World population is ~6.7 billion x 22% internet penetration => ~1.5 billion internet users
Let’s say 65% of these users have some kind of presence in social media (~20% have more than one) => ~1 billion profiles x ~10 content items per profile => ~10 billion content nodes
+ ~1 billion profile nodes x ~100 network relations per profile => ~100 billion relation edges + ~10 billion content edges => ~110 billion graph edges and ~11 billion graph nodes
It is highly dependent on the graph implementation, but with these numbers you can easily find yourself with ~1-2 terabytes of graph metadata alone (without contents and profiles*).
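The same guesstimate, redone as arithmetic. The per-edge storage cost is an assumption that the text does not state: roughly 8-16 bytes per edge (e.g. one or two 64-bit node ids), which is one way to land in the ~1-2 terabyte range.

```python
WORLD_POPULATION = 6.7e9
INTERNET_PENETRATION = 0.22
SOCIAL_MEDIA_SHARE = 0.65

internet_users = WORLD_POPULATION * INTERNET_PENETRATION  # ~1.47 billion
social_media_users = internet_users * SOCIAL_MEDIA_SHARE  # ~0.96 billion

# The text rounds to ~1 billion profiles (some users have more than one profile).
PROFILES = 1e9
CONTENT_PER_PROFILE = 10
RELATIONS_PER_PROFILE = 100

content_nodes = PROFILES * CONTENT_PER_PROFILE  # ~10 billion
relation_edges = PROFILES * RELATIONS_PER_PROFILE  # ~100 billion
authorship_edges = content_nodes  # one profile -> content edge per item
total_edges = relation_edges + authorship_edges  # ~110 billion
total_nodes = PROFILES + content_nodes  # ~11 billion


def graph_terabytes(edges: float, bytes_per_edge: int) -> float:
    """Raw edge storage in terabytes (10**12 bytes), ignoring indexes and overhead."""
    return edges * bytes_per_edge / 1e12


print(f"edges ~{total_edges / 1e9:.0f}B, nodes ~{total_nodes / 1e9:.0f}B, "
      f"storage ~{graph_terabytes(total_edges, 8):.2f}-{graph_terabytes(total_edges, 16):.2f} TB")
```

With these inputs it prints ~110B edges, ~11B nodes and ~0.88-1.76 TB, before any indexes, edge attributes (type, weight, timestamps) or replication.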

(5) Two Cents on Social Graph Architecture
Updating and querying a gigantic, dynamic, distributed, directed, cyclic, colored, weighted graph involves “some” algorithmic and computational complexity, a little more than a blog post could cover… ;-)
You can take a quick look at the tiny (15 GB, 25 million nodes) graph implementation at LinkedIn to get a glimpse of the technological challenge…
* Note: Indexing content and profile data (e.g. for building a social search engine) is an architectural challenge equivalent to that of any modern search engine with a ~10 billion document index.
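To give a feel for the querying side, here is a single-machine toy: a compressed sparse row (CSR) adjacency layout, which stores all out-edges in two flat integer arrays instead of one list per node, plus a breadth-first search for "degrees of separation". This is not LinkedIn's implementation; a graph with ~110 billion edges would additionally need compression, partitioning across machines and incremental updates, none of which is shown here.

```python
from array import array
from collections import deque
from typing import List, Tuple


class CSRGraph:
    """Directed graph as CSR: targets[offsets[v]:offsets[v + 1]] are v's out-neighbors."""

    def __init__(self, num_nodes: int, edges: List[Tuple[int, int]]) -> None:
        counts = [0] * (num_nodes + 1)
        for src, _ in edges:
            counts[src + 1] += 1
        for i in range(num_nodes):  # prefix sums -> start offset of each node
            counts[i + 1] += counts[i]
        self.offsets = array("q", counts)
        next_slot = counts[:-1]  # next free position per node while filling
        targets = [0] * len(edges)
        for src, dst in edges:
            targets[next_slot[src]] = dst
            next_slot[src] += 1
        self.targets = array("q", targets)  # 8 bytes per edge

    def neighbors(self, v: int) -> array:
        return self.targets[self.offsets[v]:self.offsets[v + 1]]

    def degrees_of_separation(self, source: int, target: int) -> int:
        """Breadth-first search over out-edges; returns -1 if target is unreachable."""
        if source == target:
            return 0
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in self.neighbors(v):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    if w == target:
                        return dist[w]
                    queue.append(w)
        return -1
```

CSR is compact and fast to scan but expensive to update in place, which is exactly the tension with a "dynamic" social graph.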

This is only the tip of the iceberg but it is more than enough for one blog post ;)
_________
Credit: All the images were taken from Tamar Hak’s amazing artwork – creating The Delver Kid image.
Tags: Delver · Disruptive Technology · Search · Semantic Web · Social Network · Web 3.0
October 20th, 2008 by Moti Karmona | מוטי קרמונה · 4 Comments
I am very “proud” to introduce the ultimate geek widget: Base64 Encode / Decode Online Widget
Q. Where can I see this dark magic?
A. Here… :)
Q. How can I add this cool Base64 widget to my blog?
A. Simply copy-paste this little script:
___________________________________________
<script type="text/javascript" src="http://blog.karmona.com/base64widget.js"></script>
___________________________________________
Q. Does this blog widget support ALL blog platforms?
A. Sure… (including dasBlog :)
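The widget's JavaScript source isn't reproduced here, so this is a Python sketch of the transformation it performs, assuming the standard RFC 4648 alphabet: every 3 input bytes (24 bits) become 4 characters of 6 bits each, and a short final group is padded with "=". Decoding is delegated to the standard base64 module.

```python
import base64

# Standard Base64 alphabet (RFC 4648, section 4).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64encode_manual(data: bytes) -> str:
    """Encode bytes to Base64 by hand: 3 bytes (24 bits) -> 4 six-bit characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pad the last group with zero bytes so it is always 24 bits wide.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Only len(chunk) + 1 characters carry real bits; replace the rest with "=".
        chars[len(chunk) + 1:] = ["="] * (3 - len(chunk))
        out.append("".join(chars))
    return "".join(out)


def b64decode_text(encoded: str) -> str:
    """Decode with the standard library and interpret the bytes as UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")
```

For example, "Man" encodes to "TWFu", while "M" alone needs two padding characters: "TQ==".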
Please contact me if you have any issues / questions / suggestions,
Have fun!
Tags: Blogging · Delver · Development · Tools · Widgets
October 4th, 2008 by Moti Karmona | מוטי קרמונה · 2 Comments

When “The Moscow Cats Theater” came to New York, the Russian clown Yuri Kuklachev was interviewed: “the secret of training them is realizing that you can’t force cats to do anything [...] If the cat likes to sit you can’t force her to do anything else [...] Each cat likes to do her own trick [...] Maruska is the only one who does the handstand. I find the cat and see what they like to do and use that in the show [...] I have a cat now that loves to be in the water…”
– REUTERS, 2006
__________________________________________
Personally, I think that managing engineers is much more complicated than herding cats (although I haven’t yet had the twisted pleasure of herding a cat).
When you go out of your way to hire the best people around, soon enough you will find yourself herding superior, class-A, hyper-developed mutant Ligers* who are much more knowledgeable than the herder (a.k.a. you).
In this environment you have to learn to simply trust your people (although this is not simple at all :), mark out the vision, let them loose and only help clear the stones from their way (this concept is best described as the Open Kimono** policy in Peopleware).
Well… Managing the Delver engineers is like herding legendary Ligers: you need to make a superior effort to see what these Ligers “like to do”, and run fast enough to set the vision and move the rocks out of the way.
__________________________________________
* The Liger is a (huge) hybrid cross between a male lion and a female tiger.
** Open Kimono Attitude: You take no steps to defend yourself from the people you have put in positions of trust.
By the way, the best answer I found on the origin of the term “Herding Cats” was on Google Answers.
Tags: Delver · Development · Leadership · Management · People · Peopleware · Project Management · Software Management
July 15th, 2008 by Moti Karmona | מוטי קרמונה · No Comments
Tags: Delver · Internet
April 12th, 2008 by Moti Karmona | מוטי קרמונה · No Comments

* My daughter’s first words were ‘Aaa…Baa’ :-)
* The Delver Alpha was released and deployed – Want an invite?
* Ron Gross, Ofer Egozi and Tal Shiri have joined Delver’s R&D team.
* Video Bitz is a striking success story
Tags: Delver · Recruiting · Software Management
February 1st, 2008 by Moti Karmona | מוטי קרמונה · No Comments
Updates from “Rabbit-Hole”
We came out of stealth mode at Demo Conference…
We have a new Facebook page @ http://www.facebook.com/pages/Herseliya-Israel/Delver/7851012277, our IT operation is ready… and we are still looking for smart people to join us…
Tags: Delver · Demo
January 4th, 2008 by Moti Karmona | מוטי קרמונה · No Comments

“Oh dear! Oh dear! I shall be too late!” (Alice’s Adventures in Wonderland)
I am currently deep down the Rabbit-Hole…
Cya after the Delver* Alpha ;-)
– Moti
Tags: Delver
November 18th, 2007 by Moti Karmona | מוטי קרמונה · No Comments
Delver is looking for great people to join our A-Team (Excuse my “eighties”)
Please email us (jobs @ delver. com) with your resume if you are ready for the Delver challenge.
Good Luck!
Tags: Delver
November 8th, 2007 by Moti Karmona | מוטי קרמונה · 1 Comment
My rough estimate is that the number of time-estimation techniques conceived so far exceeds the number of software project managers in the world by at least an order of magnitude, and this post is my humble four-cents contribution on how to do pragmatic time estimation for software projects (I just finished one at Delver).
- Start with the mother of all lists to store your product manager’s wish list – we use the eScrum Product Backlog to store our work items
- Prioritize them – we use 0-Yesterday; 1-Must; 2-Important; 3-Nice-to-Have and 4-“Forget-About-It”… ;-)
- Get relative estimates for all items
- Granularity is the bronze bullet of time estimation – strive for the finest granularity possible within a reasonable time frame, e.g. we usually aim for 2-5 day granularity within a 2-3 day time-boxed estimation period, since planning at the finest granularity without a reasonable time box might take twice as long as doing the planned work (a.k.a. The Estimation Paradox)
- Experience can turn your bronze bullet into a silver one (yes, yes, a silver one) – relative estimates are made against a common scale of known work items from the team’s history, e.g. we use Scrum “Story Points” and constantly measure the team’s velocity to adjust our time estimates
- The Fibonacci sequence (0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 etc.) can be used to “embed” the complexity and risk of rough estimates (those without sufficient drill-down), e.g. if your estimate for a specific task reaches ~40 days, your pragmatic estimate should be around 55 days (= the next Fibonacci number up), since it is reasonable to believe that the (insufficient) granularity conceals risks, complexity and unknown issues that require Fibonacci-like “buffering”
- Strive to consolidate your time estimation techniques into a single, very simple one – different time estimation conventions within the same development team are the 2nd reason for time delays (I will give a $0.95 grant to anyone who can guess the 1st reason)
- I know I am different, but personally I prefer “pragmatic hours” over the usual Agile “ideal hours”, and I like to start the project with 1 “Story Point” = 1 “Pragmatic Day”, as long as everyone understands this will change as soon as the project starts, at which point you return to velocity tracking to calculate the end date
- Don’t be naïve (a.k.a. “Ideal Days”); naïveté comes in two known flavors:
- Optimistic time estimates that assume 24/7 concentrated programming with no outside interference (a.k.a. no such thing)
- “Stupid” hand-waving time estimates, a.k.a. “It is only 10 min to code this” (but ~5 days to integrate, review, design, test, make schema and DAL changes, add I18N support, style etc.)
- Get the rough project estimate (in months, when 1 “Story Point” = 1 “Pragmatic Day”) = sum of all product backlog story points / 22 (work days per month) / number of relevant people
- Usually this calculation will show that you don’t have enough time for the project (even before adding a buffer for project dependencies, which can be added later)
- Start the “Tradeoff Game” – try to cut items (content) based on their relative ROI
- Revalidate your priorities, since they will be the main tool (besides dependencies) for creating the project work plan.
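Three of the steps above can be sketched in code: Fibonacci "buffering" of rough estimates, the rough project estimate (assuming 1 story point = 1 pragmatic day, so the result is in months), and a greedy version of the "Tradeoff Game". The BacklogItem fields, the value score used for ROI, and the rule of cutting the lowest-priority, then lowest-ROI item first are assumptions for illustration, not the eScrum model.

```python
from dataclasses import dataclass
from typing import List

WORK_DAYS_PER_MONTH = 22


def fibonacci_buffer(estimate_days: float) -> int:
    """Round a rough estimate up to the smallest Fibonacci number >= it (e.g. ~40 -> 55)."""
    a, b = 0, 1
    while a < estimate_days:
        a, b = b, a + b
    return a


@dataclass
class BacklogItem:
    name: str
    priority: int  # 0-Yesterday, 1-Must, 2-Important, 3-Nice-to-Have, 4-"Forget-About-It"
    points: float  # story points; assumed > 0, with 1 point = 1 pragmatic day
    value: float  # hypothetical business-value score, used only for ROI = value / points


def rough_estimate_months(items: List[BacklogItem], people: int) -> float:
    """Sum of story points / 22 work days per month / number of relevant people."""
    return sum(item.points for item in items) / WORK_DAYS_PER_MONTH / people


def tradeoff_game(items: List[BacklogItem], people: int, months_available: float) -> List[BacklogItem]:
    """Greedily cut items until the backlog fits the available capacity."""
    capacity = months_available * WORK_DAYS_PER_MONTH * people
    # Most important first: lower priority number, then higher ROI.
    keep = sorted(items, key=lambda item: (item.priority, -item.value / item.points))
    while keep and sum(item.points for item in keep) > capacity:
        keep.pop()  # cut the least important, lowest-ROI item
    return keep
```

Because the cut order depends entirely on the priorities, it is worth rerunning the tradeoff after every round of priority revalidation.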
As I see it, estimating software projects within a realistic time frame is a statistical prediction of a chaotic series of time-delaying events and will never be straightforward or easy; you can only do your best at estimation, then track the project as it goes and make the needed adaptations along the way, based on crystal-clear project priorities.
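Tracking the project as it goes usually means re-forecasting from measured velocity. A minimal sketch, assuming the forecast uses the average velocity of the last few sprints (the 3-sprint window is an arbitrary choice) and that the remaining work is expressed in the same story points:

```python
import math
from statistics import mean
from typing import List


def forecast_remaining_sprints(
    velocities: List[float], remaining_points: float, window: int = 3
) -> int:
    """Sprints still needed, using the mean velocity of the last `window` sprints."""
    recent = velocities[-window:]
    if not recent or mean(recent) <= 0:
        raise ValueError("need at least one sprint with positive velocity")
    return math.ceil(remaining_points / mean(recent))
```

Rerun it at the end of every sprint; a forecast that keeps growing is the early signal to play the Tradeoff Game again.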
Good Luck!
Tags: Agile · Delver · Development · Management · Planning · Project Management · Scrum · Software Management
November 3rd, 2007 by Moti Karmona | מוטי קרמונה · 1 Comment
My boss has returned from the recent TechCrunch event with a “brand new multimedia and Internet-enabled quad-band GSM EDGE-supported” iPhone and (yet again) missed the trendy next-generation iPhone-killer gadget - iPoor @ http://ipoor.org
Tags: Delver · Simplicity · gadgets