Karmona Pragmatic Blog

Don't get overconfident… Tiny minds also think alike

Karmona Pragmatic Blog

The GeoSpatial Cloud

November 1st, 2010 by Moti Karmona | מוטי קרמונה · 3 Comments

In continue to my geo distance post, I have decided to post something on the simplest cloud architecture (RDS) for location based services (LBS)

IMHO, (to cut a long story short :) Amazon RDS is more than enough for most geo location applications.

Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud.

Amazon RDS gives you access to the full capabilities of a familiar MySQL database. This means the code, applications, and tools you already use today with your existing MySQL databases work seamlessly with Amazon RDS.

Two preliminary steps:
* Start a small DB instance with the kind help of AWS management console (image above)
* Use the RDS instance as if it is your “disruptive”  MySQL instance e.g. manage it using MySQL Workbench 5.2.29

“Flirting” with MySQL spatial capabilities (which seems to be “fully” supported by AWS RDS)

* Create the (MyISAM) Table with spatial index

CREATE TABLE `locations` (
`lat` decimal(10,6) DEFAULT NULL,
`long` decimal(10,6) DEFAULT NULL,
`loc` point NOT NULL,
`name` varchar(45) DEFAULT NULL,
SPATIAL KEY `loc` (`loc`)

* Insert few values to populate your table

INSERT INTO locations VALUES(1,40.748433, -73.985655, GeomFromText(‘POINT(40.748433 -73.985655)’), ‘The Empire State Building‘);
INSERT INTO locations VALUES(2, 40.689166, -74.044444, GeomFromText(‘POINT(40.689166 -74.044444)’), ‘The Statue of Liberty’);
INSERT INTO locations VALUES(3, 40.758611, -73.979166, GeomFromText(‘POINT(40.758611 -73.979166)’), ‘Rockefeller Center’);
INSERT INTO locations VALUES(4, 40.757266, -73.985838, GeomFromText(‘POINT(40.757266 -73.985838)’), ‘Times Square’);
INSERT INTO locations VALUES(5, 40.7527, -73.9818, GeomFromText(‘POINT(40.7527 -73.9818)’), ‘New York Public Library’);

* Execute a simple test drive query which returns all the locations and their distance from ‘The Empire State Building’ + Comparing two distance calculation methods (1) MySQL euclidean calculation (2) Haversine calculation (results below)

SELECT locations.name,
GLength(LineStringFromWKB(LineString(locations.loc, GeomFromText(‘POINT(40.748433 -73.985655)’))))*100
AS euclidean,
6378.1 * 2 * ASIN(SQRT(
POWER(SIN(RADIANS(40.748433 – locations.lat) / 2),2) +
COS(RADIANS(40.748433)) * COS(RADIANS(locations.lat) )
* POWER(SIN((RADIANS(-73.985655 – locations.long)) /2), 2) )) AS haversine
FROM locations
— HAVING euclidean < 1
— ORDER BY euclidean ASC LIMIT 10;

Two Surprises/Issues

(1) SRID (Spatial Reference Identifier) support in MySQL is a disgrace – In MySQL, the SRID value is just an integer associated with the geometry value. All calculations are done assuming Euclidean (planar) geometry.

Possible Workarounds: (1) Euclidean calculation can be enough (2) Use Haversine function if you need the accuracy

(2) MySQL ‘Where’ clause can’t use column aliases for filtering

Possible Workarounds: (1) Use ‘Having’ clause (see above) (2) Use the explicit function or field and not the alias

*** *** *** *** *** *** *** *** *** *** *** ***

Related/Interesting reference – Geo Distance Search with MySQL (Presentation | 2008)

Important note: To help new AWS customers get started in the cloud, AWS is introducing a new free usage tier. Beginning November 1, new AWScustomers will be able to run a free Amazon EC2 Micro Instance for a year, while also leveraging a new free usage tier for Amazon S3, Amazon Elastic Block Store, Amazon Elastic Load Balancing, and AWSdata transfer – Very Exciting Times!!!

Disturbing unrelated fact: Starting in 1931, every graduate of the Japanese Naval Academy was asked: “How would you carry out a surprise attack on Pearl Harbor?”

Tags: Cloud · Development · Disruptive Technology · Geo

3 responses so far ↓

  • 1  Shlomo // Nov 2, 2010 at 9:33 am

    Very nice article :)

    Regarding the two issues you mentioned – if you’re using the Haversine function, you are performing a table scan. The solution is to use both that, and the mysql myisam index to filter data. One does the rough filtering and the other one the finer one.

    The “issue” with mysql is an issue with any database – you actually evaluate the “select” part of the query, last. So the aliases are not set until after the “where” part. That’s the ANSI standard.
    I also believe that it won’t re-compute the whole computation twice, if it sees it’s the same. At least SQL Server wouldn’t.

    You also did not mention that a big no-no with mysql here is that it need the table to be MyISAM which has it’s set of problems. PostgreSQL for example, has this support built in it’s (only) storage engine, which is transactional.

  • 2  Moti Karmona | מוטי קרמונה // Nov 2, 2010 at 3:44 pm

    Hi Shlomo,

    Thanks! How are you? :)
    All is true but can you please clarify which set of problems make MyISAM a big no-no?

    — Moti

  • 3  Shlomo // Nov 3, 2010 at 12:31 pm

    I’m good, getting better :)

    Well, you have the issue of locking for starters – MyISAM only has full table locks. No transactions, no crash recovery…
    In a sentence, it’s not durable, but it takes up less space on disk and is faster for non-concurrent workloads.
    Most OLTP apps use InnoDB and that’s the default storage engine in the 5.5 version even for system tables.

Leave a Comment

Allowed tags <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>