Linq and programming

home

 

As we know, you query a database by sending text SQL commands to the database server. As we also know, one of microsoft's design goals is to add features which only work with other microsoft products.

The "Linq" addition to C# allows database use from inside the program, which is mildly amusing. But it also allows adds shortcuts for a more programmer-y style of use. It allows programmers to query using a functional syntax, which might look a little nicer than SQL style. Finally it generalizes the commands for use on regular containers and adds some common list operations.

That last thing is a little annoying. Since we might be using a database, there are any commands which mutate you. Everything makes a copy. In fact, they make an enumerator which eventually makes the copy.

A general overview of Linq commands and database use:

The commands work on simple Lists

Linq sql commands are run from a List. They expect to possibly use several other lists: Cats.Join(Owners, ...). The connection to a database simply exposes the DB's tables as several lists. A sample command on an actual List of Cats using the Where command:

  public class Cat { public int ID; public string name; public int typeID; }
  List Cats; // filled the usual way

  // gets a list of only the type1 Cats:
  IEnumerable Cats1 = Cats.Where(c=>c.typeID==1);

Again, note how it returns an an Enumerable. They also use the trick where it's a function generating the items. You'll generally want to end them with ToArray() or ToList():

  List C1 = Cats.Where(c=>c.name[0]=='x' && c.typeID=0).ToList();

Anon classes and SELECT

Select is merely an apply-to-all. It transforms each element into whatever you say. For example this doubles every integer in N:

int[] N = {5,8,12,3};
int[] N2 = N.Select(n=>n*2).ToArray(); // {10,16,24,6}

Of course, Select was made to trim some fields away. To do that, C# added a syntax to create anonymous classes (they have tuples now, but didn't when Linq was written):

  // a1 is an anonymous <int,int> class with fields named x and y:
  var a1 = new { x=5, y=3 };
  a1.x; // 5

We can use those to build something with only the fields we need. This trims the ID from a Cat, giving only a <string,int> anon type:

  var c2=Cats.Select(c=> new { name=c.name, typeID=c.typeID });
  // converts element (1,"fluffy",3) into ("fluffy",3)

Anything copying SQL style will use a Select like this.

Join two lists making list of anon class

Joins take a second list as input. A typical one links Cat's and CatBreeds (each Cat has the int ID# for its breed -- like a database table). Since JOIN creates a new table, we require its last input describes how to make it, using the same style as a select:

  public class Cat { public int ID; public string name; public int breedID; }
  public class CatBreed { public int ID; public string breed; }
  List Cats;  List CatTypes;

  var C1 = Cats.Join(CatTypes, // the list to join ourself with
       c=>c.typeID, ct=>ct.ID, // the two items to compare for equals
       // describe how to create the result, select-style:
      (c,ct)=> new { name=c.name, breed=ct.theType });

As usual, we don't need to merely select some fields, we create whatever we want.

Fluent

They like to chain these things. You don't need to, but it's fun. They use "fluent" to describes the chaining and the indenting style:

Cats.Where(c=>c.typeID==1)
    .Join(CatTypes,c=>c.typeID, ct=>ct.ID, (c, ct)=> new { Name = c.name }
    .Where(...and so on...)
 );
 

Join tricks

Other clever bits are getting a join where several items must match. The trick is to have each table's "make my key" function create an identical tuple. This joins people with the fruits that have their preferred taste and smell:

  People.Join(
      Fruits,
      p=>new {T=p.prefTaste, S=p.prefSmell }, // both of these
      f=>new {T=f.myTaste, S=f.mySmell},    // ...have same type
      (p,f)=> new { pName=p.name, fName=f.name }
  );

Connecting to a database

Database imports can automatically create the classes and turn each Table into a List. This is done on-demand, when you try to read a query. It also creates some shortcuts. Anything listed as a Foreign Key can be turned into a link:

class Person {
  int ID;
  string name;
  int countryID; // foreign key. No need to use, since:
  
  EntityRef myCountry; // built from countryID
  // this tells the system to link myCountry using countryID:
  [Association(Storage = "myCountry", ThisKey = "countryID")]
}

You may notice how this requires a wrapper. Instead of linking to our Country, we need to use EntityRef.

More complex, entries in another table with a Foreign key to you can be converted into a list, on you. In other words, a country can get a list of its citizens:

class Country {
  int ID;
  string name;
  
  EntitySet Citizens; // automatically generated from the Persons list
  // this tell the system to set it up:
  [Association(Storage = "Citizens", OtherKey = "countryID")]
}

That one is sneaky. The [Association] tag doesn't tell us which table it comes from, but we know it's from the Persons table, since the type is EntitySet<Persons>.

This extra work means you can generally write queries the "normal" way. There's no need for a join here:

People.Where(p=>myCountry.name=="Goatland").
  Select( c=> new { Name = c.name }
 );
// using the autogenerated myCountry

You're expected to create a class for your database, subclassed from the generic DataContext. With any luck classes Person and Country were auto-generated. This links them up:

  public partial class myDataBase : DataContext {
      public Table People;
      public Table Countries;
      ...
  }

  myDataBase DB = new myDataBase("path/name.mdf");
  // all tables are inside of DB are now created and linked

The other way is to create naked tables and full them yourself:

DataContext DB = new DataContext("path/"name.mdf");
// now manually declare and fill all tables:
Table Customers = db.GetTable();

Other misc list commands

GroupBy is a fun command which slices the items into equivalence categories. 10 cats can be turned into 4 lists, each with every cat of that breed. The first input finds the category for one Cat. The next describes what goes into each sublist, and the 3rd describes how to create items in the main list.

This groups Cats by breed:

// make a a list with an entry for each breed, holding cats of that breed:
Cats.GroupBy(
    c => c.typeID, // makes groups based on each cat's breed
    c => c, // each sublist item is the actual cat
    // NOTE: the following is called at the end. typeList is a completed sublist
    (typeID, typeList) => // each main list item contains:
      new {
        Breed = typeID, // the key used to make it
        CatsOfThisType = typeList // the sublist, as is
      }
);
 

The second input could instead be c+>new{ID=c.ID, name=c.name}. That would remove the breed from each item in the Cat sublists.

Notice how you manually assemble everything. The main items don't even need to include the key, or the sublist. Suppose the 3rd input was (key, list)+>new{Count=list.Count()}. Then our end result would be, essentially, ({1},{4},{3}{2}}. How the cats divide among breeds, without knowing which ones.

OrderBy is just a sort, and Sort is better. Like the rest of these, OrderBy makes a sorted copy. Instead of giving it a comparer function, each item creates a single key:

// sort Cats in reverse breed order (sprt by 99-the breed ID):
List C = Cats.OrderBy(c=>99-c.typedID).ToList();

Zip is mostly a regular zip -- it operates on pairs of items from 2 lists using the shortest length. The difference is you have to give it a combine function. For examples:

IntList1.Zip(IntList2, (i1, i2)=>i1+i2); // adds pairs
A.Zip(B, (a, b)=>new {n1=a, n2=b}); // {{1,10},{2,20} ... } // basic zip
// NOTE: in C# 7.0, that can return a tuple

What most people call a FlatMap is SelectMany in C#. I can't test it since it's too new for my C# version.