ORM's are better than databases, right?



ORM's are clearly better, aren't they?

Pulling a database into a nice set of classes is the new thing. It's called ORM -- object relational mapping. Microsoft's Entity Framework is an ORM for C#. Hibernate is one for Java. The idea is that databases are a pain to use, where-as classes are both easier and more flexible. So of course we let the system translate to and from yucky SQL and database tables. But now I'm wondering, wouldn't learning SQL be easier?

On the "regular programming is just better than SQL" idea: suppose we want to find something complicated, like "parties with an odd number of guests, where none are friends, or friends of friends". In a program, we can grind out each part, bolt on the limitations, and we'll get something ugly but working. Solving that in SQL involves a stroke of inspiration, I assume, and may not be possible (can SQL check even/odd?) As someone who took a database course a long time ago and once taught Access, I can say that's definitely true.

SQL is clunky and awkward

This first complaint is legitimate. Possibly the only one that will be: SELECT isn't a great word, since it doesn't do the selecting. WHERE selects. SELECT merely tells us what parts to display. SELECT id FROM people WHERE name<>"unknown"; selects people with names, and displays only their id's. Literally the first thing people see about SQL is the confusing SELECT keyword.

But the real problem isn't looks -- it's the backwards way SQL works. Let's start with Cats and the foods they like. This is what convinces most people that databases are clunky. Here's the way it should look, as a class:

class Cat {
  int ID; // someone somewhere will probably need this
  string name="Fluffy"; // the default name
  List<string> Foods; //  <-- a lovely list of this cat's favorite foods

List<Cat> Cats; // create the actual list

Next is the database method. Notice how -- this is crazy -- cats don't say what they eat. We need another table for that:

create table Cats (
  id int primary key,
  name varchar(20) default 'fluffy' /* databases need to limit the size of strings */

create table catEats (
  food varchar(20),
  cat_id int  /* this must match the id of some Cat */

Database rows must be a fixed length. There's no way for each Cat to include a list of foods (well, sort of). We're forced to make a separate table. To a programmer, catEats is a frankenstein where every cat's personal food lists are jammed together.

For a little context, and since I'm terrible at databases, here's how we fill those tables.
insert  into cats (id, name) values (101, "Boots");
insert  into cats (id, name) values (102, "Max");
insert into catEats (food, cat_id) values ("mice", 101);  /* Boot likes to eat mice */
insert into catEats (food, cat_id) values ("milk", 101);  /* ...and milk */
insert into catEats (food, cat_id) values ("corn", 102);  /* Max is corn-fed */

This is how our lovely individual food lists look in a database:

select * from catEats;
| food   | cat_id |
| mice   |    101 |  // in a working database, 
| milk   |    101 |  // these wouldn't even be grouped by cat_id,
| corn   |    102 |  // making it even harder to read
| bread  |    103 |
| mice   |    103 |
| apples |    103 |

As coders, we've had to use this as a hack sometimes. We know the problem: cat_id is a contextless number. We need to be told it's into the table cats and has to match id. Bleh. But it turns out database people know this, and created the Foreign Key concept:

// better definition of catEats table:
create table catEats (
  food varchar(20),
  cat_id int,
  /* this is mark-up for the cat_id field above */
  /* tells us that cat_id is meant to match an "id" in the "cats" table: */
  foreign key (cat_id) references cats(id)

Even with the hints from Foreign Key, catEats as a separate table still feels wrong. But the proof is how well it works. Let's find the foods eaten by the third cat:

select food from catEats
  where cat_id=103; /* assume 103 is the correct cat# */

Not too bad. But let's defeat SQL by asking for foods with 4 or more letters:

select distinct food from catEats
  where length(food)>3;

What? SQL has a string-length function? I should have known. And it can easily avoid duplicate items with distinct? Well, that makes sense, having a single word for a common operation. But this next one is going to be a real problem: "names of cats that eat mice". The code version is pretty fun:

// cats that eat mice:
for(int c=0; c<Cats.length; c++) {
  if(Cats[i].Food.contains("mice")) print( Cats[i].name );

Meanwhile, the database version requires a dreaded JOIN. Cat names are in one table, while food names are in the other. We need to manually match them on the shared cat-ID. Here's a basic join to get rows like "Snowball, mice":

 select name, food from
  cats join catEats
    on cats.id=catEats.cat_id; /* get row with matching cat_id's */

It's a freaking nested loop. We to check every cat against every food row for matching id's. Bleh. The end result doesn't even look good. We're trying to attach foods to their cats, but this is the best we can do:

// cats join catEats on matching id's:
Fluffy mice
Fluffy catnip
Boots mice
Boots cheese
Boots phone cords

Terrible as that is, let's finish it by removing non-mice rows and duplicate cat names. This gets a list of cat names, like "Fluffy, Boots, Henry", but only the mice-eaters:

 select distinct name from
  cats join catEats
    on cats.id=catEats.cat_id
  where food="mice"; /* <--- the new line */

The funny thing is, it doesn't work like we think it does. In this case, the WHERE goes 1st, getting rid of non-mice foods from the food list. The JOIN simply replaces a cat_id with the name -- it has preset links to avoid a loop through cats. SQL is a descriptive language -- you describe the result you want, and it figures out the best way of doing it. SQL looks a lot slower than it is -- if you try to read it top-to-bottom as a series of function calls. Which is apparently what I was always trying to do.

There's also another way to write "cats that eat mice" in SQL. That doesn't seem right. Code is flexible. Databases queries aren't -- they're like finding the exact words to cast a magic spell. But this is different and works:

// names of cats that eat mice, version #2:
select distinct name from cats, catEats
  where food="mice" and id=cat_id; /* <-- what? an AND?

It turns out that select from cats, catEats does a full, every-pair JOIN (including cats with foods they don't eat). At the end we strip out all but the mice-eaters. Also food="mice" and id=cat_id? Who knew SQL had an AND?

Anytime we need to relate cat names to their foods, we'll need that JOIN. It turns out that SQL has functions, except better, which it calls VIEW's. We can write one for that join:

/* creates catOneFood as an alias for the join: */
create view catOneFood as  /* <--- catOneFood is the alias */
  select cats.id, name, food from
  cats join catEats
  on cats.id=catEats.cat_id;

It seems obvious we'd pre-make that view. So "names of cats that eat mice" is merely this:

/*  shorter "casts who eat mice":  */
select distinct name from catOneFood where food="mice";

As coders, we can still laugh at how a VIEW is a global variable. Silly SQL can't have local variables ... what, it can?!? Arrg. WITH is the same as VIEW, but only for one query. Below, we make a temporary table of long-name cats, then use it to pick out foods they eat:

/* foods eaten by long-name cats, solved by first making a local table: */
with longNameCates as ( /* longNameCats is an alias */
  select name, id from cats
    where length(name)>5
/* now the real query: */
select food from
  longNameCats join catEats /* <-- using the alias */
  where longNameCats.id=catEats.cat_id;

If that's not enough, there's even an alternate syntax, which looks pretty good. We can compute a temporary table anywhere, as long as we use parens and add "as myAlias" to the end:

/* yet another way to write long nat-name foods: */
select food from
  (select name, id from cats where length(name)>5) as t2
  join catEats
  on t2.id=catEats.cat_id;

If I didn't know better, I'd say SQL was written in the general spirit of a real programming language.

SQL has conditional expressions??

That length function is making me so mad. As we know, function calls and flexible tests are things code can do, and SQL can't. Let's rummage through SQL's various tests to prove how limited its conditional expressions are.

It has AND and OR. I should have known that. It's also got something named IN, which is merely an "or" shortcut:

// foods fluffy and boots can eat:
select distinct food from catOneFood where name="fluffy" or name="boots";
// IN is nothing -- just a shortcut for OR:
  "   "   "  "  where name IN ("fluffy", "boots");

SQL has the full complement of less-than and so on. Well, fine. after I saw length(name)>5 I assumed length(name)<=3 was also allowed. Charmingly, not-equals is the old-fashioned <>. BETWEEN is another shortcut for silly SQL-users who can't figure out amt>=5 AND and<=10 (which is also legal sql). They have to use amt BETWEEN 5 and 10.

But then we come to LIKE. It checks for regular expressions. Regular freaking expressions! I can barely do that in coding languages that aren't Perl. In SQL it's easy:

// (underscore is "any one thing" and % is "any number of anything")
select name from cats
  where name like '_h%'; /* sheena chip zharf */

  where name like '%Z'; /* ends with Z */

I don't even know why I'm trying to explain it. It's regular-expressions. They're just great. I was going to get mad that mySQL forces you to use a different syntax for more complex reg-exps, except that's what everyone else does. And the alternate syntax is just fine:

select name from cats
  where regexp_like(name, '[aeiou]$';  /* ends with a vowel */

  where regexp_like(name, '^[a-z]*[a-z0-9]$';  /* only letters, may end in number */

Hmmm. To sum up, SQL has the boolean logic of a normal computer language, and easy-to-use regular expressions. SQL is still clearly inferior to being able to write a program, but not as inferior as I first thought.

Arg. It has pivot tables

It turns out databases' ridiculous consolidated-list tables are actually useful. Who knew! I mean, besides the inventor and everyone who's used them for the past 40 years. But beyond that, it's a complete surprise that SQL does some things better than I can do in code. Let's take the crude catEats table again, looking something like this:

cat_id food
------ ----
101     mice
101     catnip
102     mice
102     cheese
102     phone cords

If SQL wasn't so terrible, this would be several small lists. Cat#101 would have food lisr (mice, catnip), cat 102 would have foods list (mice, cheese, phone cords), and so on. Or maybe each food would have a list of the cats which eat it. It turns out that SQL's odd format lets you do both, equally well.

GROUP BY lets you slice a sublist using any category you want, then lets you pull aggregate data out. At first it looks like another way to remove duplicates:

/* a useless group by, not showing its real purpose: */
select cat_id from catEats
  group by food;

// mice
// catnip
// ... each food is listed once

So we found a longer way of writing: select distinct food from catEats;. But we're not seeing anything interesting because we didn't ask. The rows in a GROUP BY are actually summaries of that group. In this case, various stats about every row with that food. Below we ask for some:

select food, count(*), max(cat_id), sum(cat_id)
  from catEats
  group by food;

// mice 2 102 203   // 2 cats eat mice, 102 is highest that does, sum is 203
// catnip 1 101 101 // a single cat (clearly cat #101) eats catnip
// ...

You can specify the operation and the column it's on, giving lots of options: max(wt) would be the heaviest food, and sum(price) is the cost to buy 1 of each. max(length(food)) is the longest food name each cat will eat:

/* get list of cat names and len of longest foods: */
select name, max(length(food)) as fLen /* column heading will be fLen */
  from catOneFood
  group by name

// fluffy 6  <-- fluffy might eat "rice" and "donuts"
// max 4  <-- max might possible eat "ham", "corn", and "tin"
// boots 6 <-- boots may eat "donuts" and "x"

Did you catch that nested function call? max(length(food))? SQL can nest function calls. I'm having more and more trouble hating it.

Getting the name of foods with that length is an extra step: match the max length to food-names with that length:
/* cat names and len of longest foods */
with catAndFoodLen as ( /* <-- catAndOneFood is an alias */
  select name, max(length(food)) as fLen
    from catOneFood
    group by name
/* use "fluffy 6" to filter for only fluffy's length 6 foods: */
select catOneFood.name, catOneFood.food from
  catOneFood join catAndFoodLen
    on catOneFood.name=catAndFoodLen.name and

But sql is still terrible. Get this: if you use group-by you have to replace WHERE with HAVING. Jerks. Here's "foods eaten by at least 2 cats":

select food count(cat_id) from catEats
  group by food   /* sample row: "mice 4" */ 
  having count(cat_id)>1; /* the same as a WHERE, removes some rows */

The thing is, there are actually 2 places we can filter rows: before the group-by, and after. Rows in the original table, or rows containing the computed summary. WHERE vs. HAVING. They're similar, but different. This uses both to get foods eaten by at least two 200-series cats:

/* foods eaten by at least 2 cats with id's of 200 or more: */
select food count(cat_id) from catEats
  where cat_id>=200  /* before we group by food, remove low-id cats */
  group by food
  having count(cat_id)>1;  /* remove low-count rows from the group-by summary */ 

The positions of WHERE and HAVING even help show us when they are applied.

The last thing I can complain about is the way count and max can be used outside of group-by. select count(*) from cats; gives the total amount of cats. By comparison, select name, count(*) from cats group by name; is a list with how many times each cat name occurs. That's confusing, but not really. I already know I need to learn not to read sql front-to-back.

I may have found the real problem with sql: it makes too much darned sense. Computer languages are filled with bits of oddness -- shortcuts that are actually longcuts, patches and work-arounds. How are we expected to learn a language without a bunch of funny quirks?

Who owns what?

Originally we created catEats because we weren't able to give each cat a personal food list. In our minds, it's not a real table. It's a hack to store sub-tables.

Databases have a whole thing to describe this. When you have a single foreign key into another table, relational databases call that a many-to-one relationship, and draw it with a V-shaped arrow:

 cats -------< catEats

We can imagine this as sublists. The V, and the direction it faces, represents how each cat has a V around all of its foods:

         /  101 mice
        /   101 catnip
cat #101    101 bread
        \   ...
         \  ...
         /  102 rice
cat #102    102 shoes
         \  102 mice

It's sooo pretentious, especially calling it many-to-one. Each food-item has an arrow to 1 cat. That's the "one". If we draw them as lines and think of them as 2-way, then each cat could have many arrows coming out of it to foods. That's the "many". Yeesh! Why not call it sublist ownership, which is what it is?

But suppose we have a list of states and every cat lives in one. Cats now have a state_id field. Technically that's another many-to-one arrow. Our new picture looks like this:

states -------< cats -------< catEats

How the SQL adding this would look:
  /* there would be a master table for every state */
  create table states ( id int, stateName varchar(16) );
  insert into states (id, stateName) values (0,"catLand");
  /* grow the cat table to have state_id's: */
  alter table cats add state_id int;  /* add extra to cats for the state_id */
  update cats set state_id=1 where id=101; /* start adding state_id's to cats */

By my rules, this means cats is supposed to be split into sub-lists for each state. But that's not right. Cats is the main list. The link to where a cat lives is only an attribute. So what does the ---< many-to-one arrow actually mean? It could be two different things, so how do you know?

Obviously, it means whichever we need it to mean, depending on what we're trying to figure out right now. select states.id, count(*) from cats group by state_id; treats cats as if they were grouped by state. select name from cats where state_id=2; is like looking at state#2's personal cat-list.

What about "cats from states starting with a vowel?" Here's sql where we're looking through cats as one list, checking every state:

select name from
  cats join states
    on cats.state_id=states.id
  where regexp_like(states.stateName, ^[aeiou]);

On the other hand, we often imagine the WHERE as going first. Which means we're finding the vowel-states, then combining their cat sub-lists.

"How many different foods do the cats in each state eat?" is a little of each. It groups cats by state, then combines the foods for only those cats into a single list to get the count:

/* these 1st two lines replace the state_id with the stateName: */
select stateName, amt from
  states join
  /* this does the work: gets "state_id unique-food-count": */
  (select cats.state_id, count(distinct food) as amt
     from cats join catEats on id=cat_id
     group by state_id) as t2
   on states.id=t2.state_id;

So what does all of this mean?

There are lots of ways to arrange data. When we try to solve any given problem, we'll pick the way that makes it simplest. In a database, each particular query shapes the data into whatever form we like. Our only option for the actual storage is in a table, but that's fine. It reminds us that each time we have to chose to see it as a big list, or sublists, or whatever else.

In code it seems great that we can store data in the "best" way. But that's not so great. We still need to arrange it in various ways to solve various problems. It's optimized for the most common way, but it's going to be more of a pain for the problems where it isn't.

Links everywhere

The example with the foods cats eat is clearly wrong. Listing the name of the food is silly. We should have a master food list. Instead of using the name of a food, use the food's id#.

/* master list of foods: */
create table foods (
  id int primary key,
  name varchar(20) default 'chicken'
  vitA float /* amount of vitamin A */

In the modified catEats, "102 6" means cat#102 eats food#6. The new picture just a mess. catEats wants to be sliced up into cats, and also into foods. That makes no sense:

cats---< catEats >--- foods

To get a list with names of cats and foods, we'll need to join all three tables on the id's, which is a monstrosity:

select cats.name, foods.name from
  cats join catEats on cats.id=catEats.cat_id
    join foods on food.id=catEats.food_id;
  3  | -                 101
tuna |  \               /fluffy   => tuna, fluffy
         \food:3   | --/
          cat #101 |

It's just plain silly. The catEats table doesn't have any real data. It only holds the 2 ends of an arrow. Oh, wait. That's an edge-list. It's a perfectly fine thing. Programmers love edge lists, and we store them exactly that way -- as pairs of end-points. The picture still doesn't make visual sense to me, but "table with 2 fat-end V's coming from it means an edge-list" is an easy rule to remember.

But the catEats design is still bad, since it's symmetric. It shouldn't be. I created it to give each cat a food-list, but the picture makes it look as if each food also has a cat list. Rats -- I just realized that's a good thing. It's another option. I can get tuna's cat-list for "cats that eat tuna":

/* look at tuna's list of the cats that eat it: */
select cats.name from
  cats join catEats on cats.id=catEats.cat_id
  where catEats.food_id=3; /* 3 is tuna */

Maybe my basic problem with sql is I suck at abstract thinking. If I saw Foods[3].Cats[i] in a program, I'd know we were looking at the cat sub-list for that food. In sql that's still a thing, but it's not spelled out.

I can still prove SQL is terrible. What if each cat, next to the state_id, had a breed_id? We get this picture, where cats has two fat-end V's coming from it:

// each cat has 1 state and 1 breed:
states -------< cats
breeds ----------^

// same picture, with breeds moved: 
states ------- < cats > ------- breeds

It looks suspiciously as if cats is an edge-list linking states and breeds, which we know it isn't since we made it. Oh, gah! SQL is once again right and I'm wrong. The picture doesn't say cats is an edge-list. It's showing us that cats can be used as one. We could use it to find the number of states where each breed can be found ("tabby 46" means tabbies are found in most states):

select breedName, count(distinct cats.state_id) from
  breeds join cats on breeds.id=cats.breed_id
    group by breedName; /* <-- grouping by breed */

That's pretty cool, but relational databases still muck it up. They say states and breeds have a many-to-many relationship. That's just dumb. There's no relationship between states and breeds. You'd have to use the links in cats ... oh, wait, that's what we've been doing.

In fact, many-many is even a good, neutral way of saying it. It's letting us know we could group breeds by state, or the reverse, or could just think of each item having 0+ arrows to the other table.

SQL doesn't have sophisticated logic

I didn't mention this earlier, since it's so obvious -- programs can do math, IF's and other logic that databases simply can't. We can look up to see that besides length(name), SQL has ... what!! It's got a ton:

 /* toLower and toUpper: */
select upper(name) from cats; /* FLUFFY */

/* padding, left and right: */
select lpad(name,10,"-") from cats /* ----fluffy */

/* string replace: */
 select replace(name,'f','X') from cats; /* XluXXy */
 /* it even works on strings? Arrg: */
 select replace(name,'ff','X') from cats; /* fluXy */

/* ordering with strcmp: */
select * from cats where strcmp(name, "h")>=0; 

But we still have it beat on math. SQL can't find names with an even number of letters. What! It can? Using the same math a program would?:

/* even-length cat names: */
select * from cat where length(name)%2=0;

/* actual math: */
select * from cats as c1 join cats as c2
  where c1.id-c2.id>2;
       ^^^^^^^^^^^^^ Are you kidding me?

Its even got IF's:

/* spreadsheet-style nested value-returning if's: */
select name, if(shipType=0,"boat", if(shipType=1,"ship","trawler")) from fleet;

/* an IF to choose the WHERE test: */
select * from cats
  where if(length(name)>5, state_id=1, state_id=2);

Are ORM's better?

I guess not. An ORM can convert our database into classes where each cat has a personal sublist of what it eats. That seems pretty cool. But SQL can also do that with an appropriate WHERE or GROUP BY. Sql can also easily treat catEats as 1 big list, or sublists on all sorts of criteria.

The way we arrange data in classes communicates the purpose. If there's no master list of all cats, instead a cat-list for each state, that tells us the program is mostly about the regional distribution of cats. But when do we need to be told that?

Putting our tables into classes gives us editor support. That's nice. SQL is just text strings -- the editor can't auto-complete a table name or flag WJERE. But SQL queries aren't that long, and your program can build them out of smaller strings. You could even write functions to help.

I think ASP.net is a key to understanding the ORM idea. It lets you write round-trip webpages using only C#. It translates it to and from javascript. An ASP.net programmer need know nothing about the web. As you'd guess, it's clunky, but there's an appeal to a having to learn nothing new. javascript and html are pretty easy, but apparently can be scary. SQL is worse -- it looks like COBOL, isn't case-sensitive, isn't imperative. If "make programmers learn another language" is a deal-breaker, an ORM looks pretty good.




Note 1: We could create fields food1, food2, and food3. But come-on. As programmers, that's the same as trying to make a list using Food f1, f2, f3;. It's terrible in a DB for the same reasons it is in a program.



Comments. or email adminATtaxesforcatses.com