C#'s database solution



C#'s solution to database queries is LINQ. It's odd and awkward in the delightful C# way. First, it has two different syntaxes: one is standard functions, the other is almost SQL. Second, it invents a syntax for anonymous classes. Third, instead of producing a list, it creates iterators which can be used to get the result.

Finally, the new commands work on regular lists of classes. In theory those lists of classes came from a database, sucked in ORM-style. But they don't need to be. The new Where, Join, and GroupBy commands are also C#'s list-searching functions.

Anonymous class, new constructors

A typical query uses SELECT and produces a new type, probably a type that never existed before. For example SELECT name, age from cats produces a (string, int) list. C# has tuples now, but didn't when linq was written, so it needed a special syntax for anonymous types:

var t1 = new { x = 1, y = "abc" }; // creates (x:1, y:"abc")  t1.x t1.y

var t2 = new { C[0].name, cow }; // (name:"bob", cow:"bessy")  t2.name t2.cow

They're declared using the var keyword since there's no syntax for the actual type. As shown, the field name matches either the var or field name used to make it, or you can use myName= to assign a name.

Since classes auto-generated from a database won't have constructors, linq gives us an on-the-fly constructor. It uses roughly the same syntax as anonymous variables but includes the class name after the new:

new Cat_t { ID = -1, name = "fluffy", typeID = 0 }
// same as (if you have it): new Cat_t(-1, "fluffy", 0);

Whenever you see new followed by a curly-brace it's one of these alternate constructors.


The version of commands that look like functions are "method syntax". The ones that look like SQL are called "query syntax", which are later. Where is an easy one, it's basically a Find:

 int[] N = {2,3,6,7,8,10,13};
 int[] Nmiddle = N.Where(i => i>3 && i<12).ToArray(); // {6,7,8,10}

Yikes! What's that ToArray() doing there? It's because all of these actually create an iterator. The actual return type is IEnumerable<int>. ToArray() forces it to unroll. The other is ToList().

Select uses the new anonymous type syntax. It converts each item into something else:

var C2 = C.Select(c => new {c.name,c.typeID}); // (name:Fluffy, typeID:4)

Declaring the type as var is required, since we don't have a name for the new class. And, of course, C2 is actually an iterator. We'll want a foreach to read it. Also notice that Select is really just an ApplyToAll. This creates a list with every item doubled, and another where they're replaced with 2's:

int[] N2 = N.Select(n=>n*2).ToArray();

var Allqs = N.Select( x => 2 ); // same length, but all 2's

OrderBy is merely Sort, but with a difference. Instead of a compare function, you assign an integer to each item. It sorts low to high. This sorts C in reverse ID order:

var C3 = C.OrderBy(c => -c.ID ); // each item has sort key: -c.ID

SQL's GroupBy is something new for C#. It breaks the lists into several lists having the same key. Each sublist has the key in c.Key. There's no explicit field for the actual list -- the item's iterator is set to produce it:

var C4 = C.GroupBy(c => c.typeID, c => c);
foreach(var c in C4) {
  Write(c.Key + ": ");
  foreach(var c2 in c) Write(c2.name);

The second parameter is a required SELECT. c=>c gives no change. But we could use c=> new { c.name, c.age }

As a check, here's how it would look if we turned the iterators into actual lists. Turning the main thing into a list is one step, then changing each part is another:

var C5 = C.GroupBy(c => c.typeID,c => c).ToArray(); // to an array
Write(C5[2].Key + ": "); // testing: C5 is an array
var C5_2 = C5[2].ToArray(); // runs the iterator, gets only the list, not the key
for(int i=0; i<C5_2.Length; i++) Write(C5_2[i].name);


Programs rarely need JOIN's. In a database, each Cat knows its type only from an ID into the cat-type table. But in a program we'd have a pointer. Even so, it's fun the see how JOIN functions work:

This SQL join translates into this C# query:

SELECT Cats.name, CatTypes.val
  FROM Cats JOIN CatTypes ON Cats.type = CatTypes.ID

var q3 = Cats.Join(CatTypes, c => c.type, ct => ct.ID, (c,ct) => new { c.name, ct.val });
// Ex: {(Fluffy,persian), (Max,Tabby)}
// Semantics is:
// JOIN( 2ndList, keyFromList1, keyFromList2, selectPartsFromEither)

Notice how for inputs 2 and 3 we need functions to extract the keys used to match in the JOIN. The fourth input is once again a required SELECT. Notice how it takes both lists as the input.

Even more rare, but fun, what if our join was on two things. In SQL: ON T.taste=S.taste AND T.smell=S.smell. In our program, each creates a (taste,smell) object to match:

      p=>new {T=p.prefTaste, S=p.prefSmell }, // match (taste,smell) from f and p
      f=>new {T=f.myTaste, S=f.mySmell},
      (p,f)=> new { pName=p.name, fName=f.name } // (steve, apple)

"Fluent" style

Chaining function calls together, instead of nesting them, is fun. It's not any better than any other way, but it's fun. The style is to put each extra call on a new line, intended:

    .Join(CatTypes,c=>c.typeID, ct=>ct.ID, (c, ct)=> new { Name = c.name }
    .Where(...and so on...)

Query style

For people who don't want to use real SQL, but want to use an SQL-like syntax within C#, there's a completely different syntax. Unlike SQL, the SELECT must be at the end. As before, they use the anonymous class syntax:

// (name,ID) of all type-1 cats:
var q4 = from c in C where c.typeID == 1 select new { c.name, c.ID };

This doesn't add anything. It compile into the "method" syntax.


The Zip function is an oddball here. Many languages have a Zip which operators on pairs of items. But it's not part of SQL. Linq's Zip doesn't just make tuples, it requires a combine function:

A.Zip(B, (a, b)=>new {a, b}); // {{1,10},{2,20} ... } // basic zip

IntList1.Zip( IntList2, (i1, i2) => i1+i2 ); // adds each pair

I assume it's in Linq because it often requires an anonymous class.

Connecting to a database

Database imports can automatically create the classes and turn each Table into a List. This is done on-demand, when you try to read a query. It also creates some shortcuts. Anything listed as a Foreign Key can be turned into a link:

class Person {
  int ID;
  string name;
  int countryID; // foreign key. No need to use, since:
  EntityRef<Country> myCountry; // built from countryID
  // this tells the system to link myCountry using countryID:
  [Association(Storage = "myCountry", ThisKey = "countryID")]

You may notice how this requires a wrapper. Instead of linking to our Country, we need to use EntityRef<Country>.

More complex, entries in another table with a Foreign key to you can be converted into a list, on you. In other words, a country can get a list of its citizens:

class Country {
  int ID;
  string name;
  EntitySet<Person> Citizens; // automatically generated from the Persons list
  // this tells the system to set it up:
  [Association(Storage = "Citizens", OtherKey = "countryID")]

That one is sneaky. The [Association] tag doesn't tell us which table it comes from, but we know it's from the Persons table, since the type is EntitySet<Persons>. Altogether, our lists are pre-joined. Persons[0].myCountry.name is all set up for us.

You're expected to create a class for your database, subclassed from the generic DataContext. Person and Country classes are hopefully auto-generated from your GUI selections. This links them up:

  public partial class myDataBase : DataContext {
      public Table<Person> People;
      public Table<County> Countries;

  myDataBase DB = new myDataBase("path/DBname.mdf");
  // all tables inside of DB are now created and linked

The other way is to create naked tables and full them yourself:

DataContext DB = new DataContext("path/"name.mdf");
// now manually declare and fill all tables:
Table<Customer> Customers = db.GetTable<Customer>();

Obviously, there's a lot more. The point is that we're sucking the database into regular data structures before using it.