<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Santiago Palladino &#187; Database</title>
	<atom:link href="http://weblogs.manas.com.ar/spalladino/category/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://weblogs.manas.com.ar/spalladino</link>
	<description>Another spot on the blogosphere</description>
	<lastBuildDate>Wed, 18 Aug 2010 20:30:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>SimpleDb SQL-Like Query Language</title>
		<link>http://weblogs.manas.com.ar/spalladino/2008/12/18/simpledb-sql-like-query-language/</link>
		<comments>http://weblogs.manas.com.ar/spalladino/2008/12/18/simpledb-sql-like-query-language/#comments</comments>
		<pubDate>Thu, 18 Dec 2008 13:17:06 +0000</pubDate>
		<dc:creator>spalladino</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[SimpleDb]]></category>

		<guid isPermaLink="false">http://weblogs.manas.com.ar/spalladino/?p=24</guid>
		<description><![CDATA[In my last post, I blogged about non-first normal form, SimpleDb in particular, and the problems that arose in the query language when dealing with multiple attributes, such as &#8220;a != b&#8221; not being the same as &#8220;not a = b&#8221;. The SimpleDb team has now released a comfortable SQL like query language to be [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://weblogs.manas.com.ar/spalladino/?p=23">last post</a>, I blogged about non-first normal form, SimpleDb in particular, and the problems that arose in the query language when dealing with multiple attributes, such as <em>&#8220;a != b&#8221;</em> not being the same as <em>&#8220;not a = b&#8221;</em>. </p>
<p>The SimpleDb team has now released a comfortable <a href="http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/">SQL-like query language</a> to be used for selecting data, in addition to the old language.</p>
<p>What is most interesting about this language is how the multi-valued attributes issue is resolved, a problem that SQL (luckily) does not have to deal with.</p>
<blockquote><p><font face="verd">Each attribute is considered individually against the comparison conditions defined in the predicate. Item names are selected if <em>any</em> of the values match the predicate condition. To change this behavior, use the <strong>every()</strong> operator to return results where <em>every</em> attribute matches the query expression.</font></p>
</blockquote>
<p>The simple addition of the <em>every</em> keyword allows easy querying over multi-valued attributes.</p>
<p>Therefore, the query <font face="Courier New">select * from domain where stamp &gt; '100'</font> would return an item with two stamps valued 50 and 150, because one of them matches. But <font face="Courier new">select * from domain where <strong>every</strong>(stamp) &gt; '100'</font> would not return it, since 50 does not.</p>
<p>It also allows for some interesting uses, such as <font face="Courier New">select * from domain where every(tag) in ('work', 'sports')</font>, which will return only the items whose tags are all drawn from work and sports, since <em>every</em> tag in the item must be contained in the set for the item to be returned.</p>
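<p>The difference between the default "any" matching and <em>every()</em> can be sketched in a few lines of Python. This is only an illustration of the semantics quoted above, not SimpleDb code: the helper names are made up, values are compared as integers for readability (SimpleDb itself compares strings), and the treatment of an item with no values at all is an assumption.</p>

```python
import operator


def matches_any(values, compare, operand):
    """Default behaviour: true if at least one value satisfies the comparison."""
    return any(compare(v, operand) for v in values)


def matches_every(values, compare, operand):
    """every(): true only if all of the values satisfy the comparison."""
    # Assumption: an item without any value for the attribute is not returned.
    return bool(values) and all(compare(v, operand) for v in values)


def every_in(values, allowed):
    """every(attr) in (...): all of the values must belong to the allowed set."""
    return bool(values) and set(values).issubset(allowed)


stamps = {50, 150}
print(matches_any(stamps, operator.gt, 100))    # True
print(matches_every(stamps, operator.gt, 100))  # False

print(every_in({"work", "sports"}, {"work", "sports"}))  # True
print(every_in({"work", "family"}, {"work", "sports"}))  # False
```

<p>Note that under this reading an item tagged just with work also passes the <em>every(tag) in</em> check, since its single tag is in the set.</p>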
<p>In a few words, an excellent addition to the engine by the SimpleDb team. Some minor quirks are still present (it is still necessary to include an attribute in the where clause in order to be able to sort by it), but overall it is a great piece of work.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblogs.manas.com.ar/spalladino/2008/12/18/simpledb-sql-like-query-language/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Non First Normal Form</title>
		<link>http://weblogs.manas.com.ar/spalladino/2008/11/20/non-first-normal-form/</link>
		<comments>http://weblogs.manas.com.ar/spalladino/2008/11/20/non-first-normal-form/#comments</comments>
		<pubDate>Thu, 20 Nov 2008 20:54:44 +0000</pubDate>
		<dc:creator>spalladino</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[SimpleDb]]></category>

		<guid isPermaLink="false">http://weblogs.manas.com.ar/spalladino/?p=23</guid>
		<description><![CDATA[Normalization is one of the key concepts involved when designing a good relational database model. There is quite a lot of theory behind this relatively simple concept as you can see from the wikipedia article. To make a long story short, you have to evaluate the dependencies between different attributes (for instance, all attributes in [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Database_normalization">Normalization</a> is one of the key concepts involved when designing a good relational database model. There is quite a lot of theory behind this relatively simple concept as you can see from the wikipedia article. </p>
<p>To make a long story short, you have to evaluate the dependencies between different attributes (for instance, all attributes in a table depend on the primary key, but there can be many more dependencies among the attributes).</p>
<p>These dependencies determine the degree of normalization of the model, ranging from first normal form up to the sixth. Roughly speaking, first normal form only requires a primary key and atomic (single-valued) attributes, so nearly everything you will do in a relational database will be in first normal form.</p>
<p>There are, however, cases in which first normal form is violated, such as non-relational databases. These are said to be in <a href="http://en.wikipedia.org/wiki/Database_normalization#Non-first_normal_form_.28NF.C2.B2_or_N1NF.29">non-first normal form</a>, and are characterized by multi-valued attributes.</p>
<table cellspacing="0" cellpadding="2" width="271" border="1">
<tbody>
<tr>
<td valign="top" width="98"><strong>Item</strong></td>
<td valign="top" width="171"><strong>Colours</strong></td>
</tr>
<tr>
<td valign="top" width="98">Doll</td>
<td valign="top" width="171">Pink, Red, White</td>
</tr>
<tr>
<td valign="top" width="98">Action Figure</td>
<td valign="top" width="171">Blue, Black, Brown</td>
</tr>
</tbody>
</table>
<p>This case would be solved in a relational database by creating an ItemColours table containing item-id and all the colours available (or even better, their keys). But non-relational databases, such as <a href="http://aws.amazon.com/simpledb/">SimpleDb</a>, can solve this in a single table (or domain, as they call it).</p>
<p>What&#8217;s more, SimpleDb doesn&#8217;t even require you to define the columns for each domain, as each item inserted can have its own attributes. Therefore, the domain ends up being a collection of items, each item being a set of multi-valued attributes.</p>
<table cellspacing="0" cellpadding="2" width="445" border="1">
<tbody>
<tr>
<td valign="top" width="95">ItemName</td>
<td valign="top" width="348">Attributes</td>
</tr>
<tr>
<td valign="top" width="100">John Doe</td>
<td valign="top" width="348">Age=20; Phone=555,556,557; Car=Escort,Mondeo</td>
</tr>
<tr>
<td valign="top" width="104">Alice Doe</td>
<td valign="top" width="348">Age=30; Phone=333; Book=DaVinci</td>
</tr>
</tbody>
</table>
<p>This leads to some very interesting schemas, which may not seem very natural to someone used to a standard relational (SQL) database.</p>
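<p>One way to picture such a domain is as a nested mapping: item name, then attribute name, then a set of values. The Python sketch below models the table above in memory; the <em>put</em> helper is hypothetical and has nothing to do with the real SimpleDb API.</p>

```python
from collections import defaultdict

# domain: item name -> attribute name -> set of values
domain = defaultdict(lambda: defaultdict(set))


def put(item, attribute, *values):
    """Add one or more values to an attribute of an item (hypothetical helper)."""
    domain[item][attribute].update(values)


put("John Doe", "Age", "20")
put("John Doe", "Phone", "555", "556", "557")
put("John Doe", "Car", "Escort", "Mondeo")
put("Alice Doe", "Age", "30")
put("Alice Doe", "Phone", "333")
put("Alice Doe", "Book", "DaVinci")

# Each item carries its own attributes; no columns are declared up front.
print(sorted(domain["John Doe"]["Phone"]))  # ['555', '556', '557']
print(sorted(domain["Alice Doe"]))          # ['Age', 'Book', 'Phone']
```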
<p>What&#8217;s most interesting about this is the query language. The fact that attributes are multi-valued leads to some unexpected results, such as <em>&#8216;not a = b&#8217;</em> having <strong>different</strong> semantics than <em>&#8216;a != b&#8217;</em>. Let&#8217;s see why by analyzing the <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1231&amp;categoryID=152">SimpleDb query language</a>.</p>
<p>Without diving into the grammar definition, a SimpleDb query can be represented by something like:</p>
<div style="border-right: gray 1px solid; padding-right: 4px; border-top: gray 1px solid; padding-left: 4px; font-size: 8pt; padding-bottom: 4px; margin: 20px 0px 10px; overflow: auto; border-left: gray 1px solid; width: 97.5%; cursor: text; max-height: 200px; line-height: 12pt; padding-top: 4px; border-bottom: gray 1px solid; font-family: consolas, 'Courier New', courier, monospace; background-color: #f4f4f4">
<pre style="padding-right: 0px; padding-left: 0px; font-size: 8pt; padding-bottom: 0px; margin: 0em; overflow: visible; width: 100%; color: black; border-top-style: none; line-height: 12pt; padding-top: 0px; font-family: consolas, 'Courier New', courier, monospace; border-right-style: none; border-left-style: none; background-color: #f4f4f4; border-bottom-style: none">not? [<span style="color: #006080">'att1'</span> comp <span style="color: #006080">'val1'</span> op <span style="color: #006080">'att1'</span> comp <span style="color: #006080">'val2'</span> op ...]
union|intersection
not? [<span style="color: #006080">'att2'</span> comp <span style="color: #006080">'val1'</span> op <span style="color: #006080">'att2'</span> comp <span style="color: #006080">'val2'</span> op ...]
union|intersection
...
</pre>
</div>
<p>That is, the query is composed of predicates (a predicate is anything between square brackets) joined by set operations, union or intersection, with each predicate optionally negated.</p>
<p>All comparisons within a single predicate must be made against a single attribute; they may be equals, not equals, greater than, etc. The boolean operators that join them are the usual ones: and|or.</p>
<p>So, if in our example we want all items (well, they are people, but they won&#8217;t mind if we treat them like items, will they?) aged above 25, we just query <em>['Age' &gt; '25']</em>. Range query? <em>['Age' &gt; '25' and 'Age' &lt; '35']</em>. Multi-predicate query? <em>['Age' &gt; '25' and 'Age' &lt; '35'] intersection ['Book' starts-with 'Da']</em>. So far so good.</p>
<p>Now to the tricky cases. Because attributes are multi-valued, the engine evaluates each predicate against <strong>all</strong> of the values of the corresponding attribute, one value at a time, and adds the item if <strong>any</strong> single value satisfies the predicate. </p>
<p>Let&#8217;s suppose we have a numeric attribute (let&#8217;s forget for now that SimpleDb does not have typed attributes and all are considered strings) called <em>Foo</em>. And we have the following items:</p>
<table cellspacing="0" cellpadding="2" width="274" border="1">
<tbody>
<tr>
<td valign="top" width="85"><strong>Item Name</strong></td>
<td valign="top" width="187"><strong>Foo</strong></td>
</tr>
<tr>
<td valign="top" width="89">A</td>
<td valign="top" width="187">25</td>
</tr>
<tr>
<td valign="top" width="92">B</td>
<td valign="top" width="187">25, 35</td>
</tr>
<tr>
<td valign="top" width="94">C</td>
<td valign="top" width="187">10, 40</td>
</tr>
<tr>
<td valign="top" width="96">D</td>
<td valign="top" width="187">5</td>
</tr>
<tr>
<td valign="top" width="96">E</td>
<td valign="top" width="187">&nbsp;</td>
</tr>
</tbody>
</table>
<p>The query <em>['Foo' &gt; '20' and 'Foo' &lt; '30']</em> will return both items A and B, because the 25 value of B makes the predicate true, and since one matching value is enough, B is added.</p>
<p>Now, if we pick the query <em>['Foo' &gt; '20'] intersection ['Foo' &lt; '30']</em>, it will return A, B <strong>and C</strong>. This is because the first predicate will match 40 in C, and the second one will match 10, and therefore C is added. Since the conditions are expressed in different predicates (although they refer to the same attribute), they do not have to be satisfied by the same value.</p>
<p>Therefore <em>['Foo' &gt; '20' and 'Foo' &lt; '30']</em>&nbsp; <strong>is not the same as</strong> <em>['Foo' &gt; '20'] intersection ['Foo' &lt; '30']</em>.</p>
<p>An excellent example for understanding these differences is the one shown in the already mentioned Query 101 article. Consider <em>['Foo' = '25' and 'Foo' = '35']</em>. This will <strong>never return anything</strong>, no matter the dataset, since we are requesting all items that have a value for Foo that is simultaneously 25 and 35.</p>
<p>On the other hand, <em>['Foo' = '25'] intersection ['Foo' = '35']</em>, will return B.</p>
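<p>The difference between <em>and</em> inside one predicate and <em>intersection</em> across predicates can be reproduced with a small in-memory model. The Python sketch below is a simplification, not the SimpleDb engine: the <em>predicate</em> helper is hypothetical, only <em>and</em> is modelled, and the Foo values of the sample items are treated as integers.</p>

```python
import operator

# Sample items from the table above: item name -> values of Foo
items = {"A": {25}, "B": {25, 35}, "C": {10, 40}, "D": {5}, "E": set()}


def predicate(*conditions):
    """Items having one single value that satisfies all conditions (joined by 'and')."""
    return {
        name
        for name, values in items.items()
        if any(all(compare(v, operand) for compare, operand in conditions) for v in values)
    }


gt, lt, eq = operator.gt, operator.lt, operator.eq

# ['Foo' greater than 20 and 'Foo' less than 30]: one value must satisfy both
print(sorted(predicate((gt, 20), (lt, 30))))                          # ['A', 'B']
# Same conditions as two predicates: different values may satisfy each one
print(sorted(predicate((gt, 20)).intersection(predicate((lt, 30)))))  # ['A', 'B', 'C']

# ['Foo' = 25 and 'Foo' = 35]: no single value can equal both
print(sorted(predicate((eq, 25), (eq, 35))))                          # []
# ['Foo' = 25] intersection ['Foo' = 35]
print(sorted(predicate((eq, 25)).intersection(predicate((eq, 35)))))  # ['B']
```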
<p>Now to the initial case, the not equals. Let&#8217;s pick the query <em>['Foo' != '25']</em>. It will clearly not return A. However, it will return B, because when the predicate is evaluated on the 35 value, it is true, so B is added to the result.</p>
<p>The query <em>not ['Foo' = '25']</em> will have the expected behaviour, and return items C and D. <strong>It will also return E</strong>, because E has no value defined for Foo, so any comparison over Foo will be false, and its negation, true.</p>
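<p>The same kind of toy model shows why <em>!=</em> and <em>not ... =</em> diverge. In the Python sketch below, negation is modelled as the complement of the predicate's result within the whole domain; that reading, the helper names and the integer values are assumptions made for illustration only.</p>

```python
import operator

# Sample items from the table above: item name -> values of Foo
items = {"A": {25}, "B": {25, 35}, "C": {10, 40}, "D": {5}, "E": set()}


def predicate(compare, operand):
    """Items for which at least one value of Foo satisfies the comparison."""
    return {name for name, values in items.items() if any(compare(v, operand) for v in values)}


def negate(result):
    """not [...]: every item of the domain that the predicate did not select."""
    return set(items) - result


# ['Foo' != 25]: B matches through its 35 value; E has no value to match
print(sorted(predicate(operator.ne, 25)))          # ['B', 'C', 'D']
# not ['Foo' = 25]: everything except A and B, including the empty item E
print(sorted(negate(predicate(operator.eq, 25))))  # ['C', 'D', 'E']
```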
<p>Manipulating data is also non-trivial. Whenever you update an item, you must specify whether the new values should be added to the existing ones or replace them.</p>
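<p>As a rough illustration of the add-or-replace choice, the Python sketch below models an item as a mapping from attribute names to sets of values. The <em>put_attribute</em> function and its <em>replace</em> flag are hypothetical names for this example, not the actual SimpleDb call.</p>

```python
def put_attribute(item, attribute, values, replace=False):
    """Store values in item, a dict of attribute name -> set of values."""
    if replace:
        # Discard whatever was stored before and keep only the new values
        item[attribute] = set(values)
    else:
        # Keep the existing values and add the new ones to them
        item.setdefault(attribute, set()).update(values)


john = {"Phone": {"555", "556"}}

put_attribute(john, "Phone", ["557"])
print(sorted(john["Phone"]))  # ['555', '556', '557']

put_attribute(john, "Phone", ["999"], replace=True)
print(sorted(john["Phone"]))  # ['999']
```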
<p>To sum up, working with denormalized data may be appealing during the modeling process due to its flexibility, but it requires extra care when querying and manipulating the data. </p>
<p>Nevertheless, bear in mind that the lack of normalization enforcement frees up a lot of database resources, which (supposedly) allows for much greater scalability and for handling huge amounts of data. And <strong>this </strong>is what you should have in mind when you consider whether to use a relational or a completely denormalized database, since this will have the greatest impact on your users. </p>
<p>Remember that unnatural queries can be dealt with much more easily than a crippled database already deployed to production.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblogs.manas.com.ar/spalladino/2008/11/20/non-first-normal-form/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
