Friday, August 3, 2007

A Perl Hacker's Foray into .NET

What Is .NET?

When something's as incredibly hyped as Microsoft's .NET project, it's hard to convince people that there's a real working technology underneath it. Unfortunately, Microsoft doesn't do itself any favors by slapping the .NET moniker on anything it can. So let's clarify what we're talking about.

.NET is applied to anything with the broad notion of "Web services" -- from the Passport and Hailstorm automated privacy-deprivation services and the Web-service-enabled versions of operating systems and application products to the C# language and the Common Language Runtime. But there is an underlying theme and it goes like this: The .NET Framework is an environment based on the Common Language Runtime and (to some extent) the C# language, for creating portable Web services.

So for our exploration, the components of the .NET Framework that we care about are the Common Language Runtime and the C# language. And to nail it down beyond any doubt, these are things that you can download and use today. They're real, they exist and they work.

The .NET CLR

Let's begin with the CLR. The CLR is, in essence, a virtual machine for C# much like the Java VM, but which is specifically designed to allow a wide variety of languages other than C# to run on it. Does this ring any bells with Perl programmers? Yes, it's not entirely dissimilar to the idea of the Parrot VM, the host VM for Perl 6 but designed to run other languages as well.

But that's more or less where the similarity ends. For starters, while Parrot is chiefly intended to be run as an interpreted VM with a "bolted-on" JIT, the CLR is expected to be JITted from the get-go. Microsoft seems to want to avoid the accusations of slowness leveled at Java by effectively requiring JIT compilation.

Another "surface" distinction between Parrot and the CLR is that the languages supported by the CLR are primarily statically typed languages such as C#, J# (a variant of Java) and Visual Basic .NET. The languages Parrot aims to support are primarily dynamically typed, allowing run-time compilation, symbolic variable access (try doing ${"Package::$var"} in C#...), closures, and other relatively wacky operations.

To address these sorts of features, the Project 7 research project was set up to provide .NET ports for a variety of "academic" languages. Unfortunately, it transpires that this has highlighted some limitations of the CLR, and so almost all of the implementations have had to modify their target languages slightly or drop difficult features. For instance, the work on Mercury turned up some deficiencies in CLR's Common Type System that would also affect a Perl implementation. We'll discuss these deficiencies later when we examine how Perl and the .NET Framework can interact.

But on the other hand, let's not let this detract from what the CLR is good at - it can run a variety of different languages relatively efficiently, and it can share data between languages. Let's now take a look at C#, the native language of the CLR, and then see how we can run .NET executables on our favourite free operating systems.

C#

C# is Microsoft's new language for the .NET Framework. It shares some features with Java, and in fact looks very much like Java at first glance. Here's a piece of C# code:


using System;

class App {
    public static void Main(string[] args) {
        Console.WriteLine("Hello World");
        foreach (String s in args) {
            Console.WriteLine("Command-line argument: " + s);
        }
    }
}

Naturally, the Java-like features are quite obvious to anyone who's seen much Java - everything's in a class, and there's an explicitly defined Main function. But what's this? A Perl-like foreach loop! And that using declaration seems strangely familiar.

Now, don't get me wrong. I'm not trying to claim that C# is some bastard offspring of Perl and Java, or even that C# really has that much in common with Perl; it doesn't. But it is a well-designed language that does have a bunch of "programmer-friendly" language features that traditionally made "scripting" languages like Perl or Python faster for rapid code prototyping.

Here's some more code, which forms part of a game-of-life benchmarking tool we used to benchmark the CLR against Parrot.


static String generate(String input) {
    int cell, neighbours;
    int len = input.Length;
    String output = "";
    cell = 0;
    do {
        neighbours = 0;
        // Count live cells among the eight neighbours, wrapping around the ends of the string.
        foreach (int offset in new Int32[] {-16, -15, -14, -1, 1, 14, 15, 16}) {
            int pos = (offset + len + cell) % len;
            if (input.Substring(pos, 1) == "*")
                neighbours++;
        }
        if (input.Substring(cell, 1) == "*") {
            // A live cell dies with fewer than two or more than three neighbours.
            output += (neighbours < 2 || neighbours > 3) ? " " : "*";
        } else {
            // A dead cell comes alive with exactly three neighbours.
            output += (neighbours == 3) ? "*" : " ";
        }
    } while (++cell < len);
    return output;
}

This runs one generation of the game of life, taking an input playing field and building an output string. What's remarkable about this is that I wrote it after a day of looking at C# code, with no prior exposure to Java. C# is certainly easy to pick up.
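The offset arithmetic in generate() is easier to follow in isolation. The following is only a sketch: the offsets imply rows 15 cells wide, and the 225-cell (15 x 15) field size used here is an assumption, since the playing field's dimensions aren't stated. NeighbourDemo and NeighbourIndices are made-up names for illustration.

```csharp
using System;

static class NeighbourDemo {
    // Same offsets as generate(); they imply rows that are 15 cells wide.
    static readonly int[] Offsets = { -16, -15, -14, -1, 1, 14, 15, 16 };

    // Returns the eight flat-string positions inspected for one cell.
    // Adding len before taking the remainder keeps the result non-negative.
    public static int[] NeighbourIndices(int cell, int len) {
        int[] result = new int[Offsets.Length];
        for (int i = 0; i < Offsets.Length; i++) {
            result[i] = (Offsets[i] + len + cell) % len;
        }
        return result;
    }

    public static void Main() {
        // Assumed field size: 15 x 15 = 225 cells.
        int[] indices = NeighbourIndices(0, 225);
        string[] text = Array.ConvertAll(indices, i => i.ToString());
        Console.WriteLine(string.Join(", ", text));
        // prints: 209, 210, 211, 224, 1, 14, 15, 16
    }
}
```

The first cell's "upper" neighbours wrap round to the end of the string, which is why the playing field behaves as if its edges were joined.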

What can Perl learn from C#? That's an interesting question, especially as the Perl 6 design project is ongoing. Let's have a quick look at some of the innovations in C# and how we might apply them to Perl.

Strong Names

We'll start with an easy one, since Larry has already said that something like this will be in Perl 6: To avoid versioning clashes and interface incompatibilities, .NET has the concept of "strong names." Assemblies -- the C# equivalent of Java's jar files -- have metadata containing their name, version number, md5sum and cryptographic signature, meaning you can be sure you're always going to get the definitions and behavior you'd expect from any third-party code you run. More generally, assemblies support arbitrary metadata that you can use to annotate their contents.
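Some of this identity metadata can be read back at run time through reflection. The sketch below inspects whichever assembly happens to define System.Object. That choice, and the StrongNameDemo and CoreName names, are just for illustration, and the values printed will differ from one runtime to the next.

```csharp
using System;
using System.Reflection;

static class StrongNameDemo {
    // Returns the identity of the assembly that defines System.Object.
    public static AssemblyName CoreName() {
        return typeof(object).Assembly.GetName();
    }

    public static void Main() {
        AssemblyName name = CoreName();
        Console.WriteLine("Name:    " + name.Name);
        Console.WriteLine("Version: " + name.Version);

        // The public key token is a short hash of the signing key;
        // it is null or empty when the assembly is not strong-named.
        byte[] token = name.GetPublicKeyToken();
        Console.WriteLine("Token:   " +
            (token == null || token.Length == 0
                ? "(none)"
                : BitConverter.ToString(token)));
    }
}
```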

This approach to versioning and metadata in Perl 6 was highlighted in Larry's State of the Onion talk this year, and is also the solution used by JavaScript 2.0, as described by Waldemar Horwat at his LL1 presentation, so it seems to be the way the language world is going.

Properties

C# supports properties, which are class fields with explicit get/set methods. This is slightly akin to Perl's tying, but much, much slicker. Here's an example:


private int MyInt;
public int SomeInt {
    get {
        Console.WriteLine("I was got.\n");
        return MyInt;
    }
    set {
        Console.WriteLine("I was set.\n");
        MyInt = value;
    }
}

Whenever we access SomeInt, the get accessor is executed, and returns the value of the underlying MyInt variable; when we write to it, the corresponding set accessor is called. Here's one suggested way we could do something similar in Perl 6:


my $myint;
our $SomeInt :get(sub{ print "I was got!\n"; $myint })
             :set(sub{ print "I was set!\n"; $myint = $^a });
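Back on the C# side, it may help to see exactly when each accessor fires. This is a minimal sketch rather than anything from the CLR documentation: the Tracked class and its Gets and Sets counters are invented here, and stand in for the Console.WriteLine calls so that the calls can be counted.

```csharp
using System;

class Tracked {
    private int MyInt;
    public int Gets;  // hypothetical counters, added for illustration
    public int Sets;

    public int SomeInt {
        get { Gets++; return MyInt; }
        set { Sets++; MyInt = value; }
    }
}

static class PropertyDemo {
    public static void Main() {
        Tracked t = new Tracked();
        t.SomeInt = 41;   // plain assignment runs set
        t.SomeInt++;      // increment runs get, then set
        Console.WriteLine(t.SomeInt);                            // runs get; prints 42
        Console.WriteLine("gets=" + t.Gets + " sets=" + t.Sets); // prints gets=2 sets=2
    }
}
```

To the caller, SomeInt looks exactly like an ordinary field, which is what makes this slicker than an explicit tie.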

C# actually takes this idea slightly further, providing "indexers", which are essentially tied arrays:


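private String realString;
// An indexer has no name of its own: it is declared as this[...] on the
// enclosing class and invoked by subscripting an instance of that class.
public String this[int idx] {
    get {
        return realString.Substring(idx, 1);
    }
    set {
        realString = realString.Substring(0, idx) + value + realString.Substring(idx + 1);
    }
}

// substrString is an instance of the class that declares the indexer
substrString[12] = "*"; // substr($string, 12, 1) = "*";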

Object-Value Duality

Within the CLR's Common Type System (CTS), there are two distinct types (as it were) of types: reference types and value types. Value types are the simple, honest-to-God values: integers, floating point numbers, booleans, and so on. Reference types, on the other hand, are objects, arrays, strings and the like, all of which are reached through a reference.

Now for the twist: any value type can be converted to a reference type and back again. So, if you've got an int counter;, then you can "box" it as an object like so: Object CounterObj = counter;. (C#'s int is simply an alias for the CLR's System.Int32 type, and that is the type the boxed object reports.) This gives us the flexibility of objects when we need to, for instance, call methods on them, but the speed of fixed values when we're doing tight loops on the stack.
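A short sketch of the round trip makes the copy semantics explicit. BoxingDemo and BoxedCopySurvives are names made up for this example.

```csharp
using System;

static class BoxingDemo {
    // Boxes a value, changes the original, and returns the unboxed copy.
    public static int BoxedCopySurvives(int start) {
        int counter = start;
        object counterObj = counter;  // boxing: the value is copied into an object
        counter++;                    // the boxed copy is unaffected
        return (int)counterObj;       // unboxing needs an explicit cast
    }

    public static void Main() {
        Console.WriteLine(BoxedCopySurvives(42));  // prints 42, not 43

        object boxed = 42;
        Console.WriteLine(boxed.GetType());        // prints System.Int32
    }
}
```

Because boxing copies the value, later changes to the original variable are not visible through the boxed object.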

While Perl is and needs to remain an essentially untyped language, optional explicit typing definitions combined with object-value duality could massively up Perl's flexibility as well as bringing some potential optimizations.
