Monday, July 9, 2007

Case sensitivity

The issue of case sensitivity in programming languages is one of those religious wars that we developers get into every now and then. Some people, the kind who were brought up on Unix, C, and all that, swear by it. Others think it is the worst travesty to befall computing since software patents. The argument is especially prevalent in the .NET world, where you have two main languages: C#, which is case sensitive, and VB.NET, which isn’t.

I am not particularly keen on case sensitivity myself, though most of my work is done in case sensitive languages. It could be argued that it trains the mind to pay close attention to small details, but code already has so many small details to take careful note of that programming does that anyway, and case sensitivity just adds to the burden. It doesn’t necessarily make your code look any tidier either. In fact, it can cause more problems than it solves. Come to that, will someone please enlighten me as to exactly what problems case sensitivity is supposed to solve in the first place?

Take for instance this C# code snippet:

class Foo {
    private int bar = 0;

    public int Bar {
        get { return bar; }
    }
}

Now this is all well and good: a private field encapsulated as a read-only property. This is the kind of thing that you encounter daily when you are working with C# code. When I was less experienced I used this convention all the time: camelCase for the fields and PascalCase for the properties. However, one simple typo can spell disaster:

class Foo {
    private int bar = 0;

    public int Bar {
        get { return Bar; }
    }
}

Spot the difference? The getter for Bar, rather than returning the contents of the field, will now call itself, giving unwanted recursion and a stack overflow. And do you think IntelliSense makes it any better? In your dreams. This problem bit me several times, simply because IntelliSense sneakily changed the case of bar to Bar and I didn’t notice. Eventually I wised up and started prefixing the private fields with an underscore:



class Foo {
    private int _bar = 0;

    public int Bar {
        get { return _bar; }
    }
}
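Another way round the same trap, offered as a sketch rather than a recommendation: if your compiler supports auto-implemented properties (an assumption on my part, since they only arrive with C# 3.0 and are not used in the snippets above), you can drop the hand-written backing field altogether. With no second identifier in the class, there is nothing for a change of case to confuse.

```csharp
// Assumes a C# 3.0 or later compiler (auto-implemented properties).
public class Foo
{
    // The compiler generates a hidden backing field, so the class has no
    // "bar" for "Bar" to be mistaken for.
    public int Bar { get; private set; }

    public Foo()
    {
        // The private setter keeps the property read-only to callers.
        Bar = 0;
    }
}
```

Callers still just read `new Foo().Bar`; the only change is that the field no longer has a name you can mistype.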


Now this is a simple example, but there are other more complex ones that I could give. And because most people don’t tend to notice the exact case of identifiers, it is all too easy to end up getting the wrong one — or even, particularly if you are maintaining someone else’s code, to fail to notice that there is a wrong one to get in the first place.

Naming conventions can help with this. Both .NET and Java have specific standards, but even then there is still scope for ambiguity. You are supposed to write identifiers in PascalCase or camelCase depending on their visibility and purpose, with the first letter of each word in the identifier capitalised. However, in some cases it isn’t that clear whether you should consider an identifier as one word or two. Do you write Filename or FileName, for instance?

It would be easy if all languages were case insensitive, like VB, Delphi or Fortran. Unfortunately, even these languages often have to communicate with other languages and platforms. In .NET in particular, you may have to slap a [CLSCompliant] attribute on your assembly one day so that it can interoperate with someone else’s code in a language on the other side of the Great Divide. When that happens, if you have a namespace called ee.cummings in one place and EE.Cummings in another, both VB.NET and C# will choke on it. Alternatively, you may need to port your code from a language that is case insensitive to one that is case sensitive, or vice versa.
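To make the [CLSCompliant] point a little more concrete, here is a rough sketch. The Poem class and its members are invented purely for illustration. With the attribute applied, the C# compiler should report warning CS3005 for the public pair that differs only by case, while saying nothing about the private pair.

```csharp
using System;

// Ask the compiler to check publicly visible members against the CLS rules.
[assembly: CLSCompliant(true)]

// Hypothetical class, not taken from any real library.
public class Poem
{
    // Private members are not checked, so this pair compiles silently.
    private int stanzas = 0;
    private int Stanzas { get { return stanzas; } }

    // Public members that differ only by case draw warning CS3005.
    public int lineCount = 0;
    public int LineCount { get { return lineCount; } }
}
```

The code still compiles and runs; the warning is only a hint that a case insensitive language such as VB.NET would not be able to tell lineCount and LineCount apart.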

The problem with case sensitivity is that it is so pervasive. All the C-like languages are case sensitive, at least in part (see footnote 1), as are Python and Ruby. That accounts for nine of the current top ten programming languages. Some important cross-language protocols such as XML and SOAP are case sensitive. If you expose some methods as a web service, they are case sensitive. Some bits of URLs may or may not be case sensitive depending on the underlying operating system and the web programmer’s predilections. And then there are those times when you can’t quite remember which is which off the top of your head.
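The XML case is easy to check for yourself. The following is a minimal sketch using XmlDocument from System.Xml; the XmlCaseDemo class and the Poet and Name elements are made up for the example.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class XmlCaseDemo
{
    // Returns true if the XPath expression finds a node in the sample document.
    public static bool Finds(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Poet><Name>cummings</Name></Poet>");
        return doc.SelectSingleNode(xpath) != null;
    }

    // Returns true if the parser accepts the text as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // Thrown, for example, when an end tag does not match its start tag.
            return false;
        }
    }
}
```

Here `XmlCaseDemo.Finds("/Poet/Name")` finds the element but `XmlCaseDemo.Finds("/Poet/name")` does not, and `XmlCaseDemo.IsWellFormed("<Name>x</name>")` is false because the start and end tags differ in case.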

To avoid problems, I operate with two basic guidelines, regardless of what language or platform I am using.

1. When choosing new identifier names, assume that the language is case insensitive. Don’t choose names for your identifiers that vary by case alone. This means that if the language is case insensitive, or your code has to interface with or be ported to such a language, you will avoid name collisions.

2. When using existing identifiers, assume that the language is case sensitive. Be consistent in the case you use when referencing identifiers. This means that if the language turns out to be case sensitive after all, your variables will be found correctly. IntelliSense and other similar technologies can help here, provided you have stuck to (1).
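The two guidelines above could be checked mechanically. Here is a minimal sketch, assuming you already have the declared and referenced identifier names as plain strings (extracting them from real source code is not shown). The CaseCheck class and its method names are my own invention for this example, not part of any existing tool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class CaseCheck
{
    // Guideline 1: group declared names that differ only by case.
    public static List<List<string>> FindCaseCollisions(IEnumerable<string> declared)
    {
        Dictionary<string, List<string>> groups =
            new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (string name in declared)
        {
            List<string> group;
            if (!groups.TryGetValue(name, out group))
            {
                group = new List<string>();
                groups[name] = group;
            }
            // Record each distinct spelling once.
            if (!group.Contains(name)) group.Add(name);
        }

        List<List<string>> collisions = new List<List<string>>();
        foreach (List<string> group in groups.Values)
        {
            if (group.Count > 1) collisions.Add(group);
        }
        return collisions;
    }

    // Guideline 2: list references that match a declaration only when case is ignored.
    public static List<string> FindCaseMismatches(
        IEnumerable<string> declared, IEnumerable<string> referenced)
    {
        HashSet<string> exact = new HashSet<string>(declared, StringComparer.Ordinal);
        HashSet<string> loose = new HashSet<string>(declared, StringComparer.OrdinalIgnoreCase);

        List<string> mismatches = new List<string>();
        foreach (string name in referenced)
        {
            if (!exact.Contains(name) && loose.Contains(name)) mismatches.Add(name);
        }
        return mismatches;
    }
}
```

Given the declarations bar, Bar and baz, the first method reports one collision group containing bar and Bar. Given a declaration fileName and a reference to FileName, the second method flags FileName as a case mismatch.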

Love it or loathe it, case sensitivity is here to stay. We still have to live with it, so we might as well just get used to it. Still, it would be more bearable if programming languages and operating systems enforced both these rules, rather than just one or the other. This happens in the .NET framework when you mark your assembly with the [CLSCompliant] attribute, but this only extends to publicly visible members, so you still have scope for the problem outlined in the examples above. Even better would be for you to be able to choose for yourself which case sensitivity rules you wanted to use as a compiler or interpreter option.

1 PHP variables are case sensitive, as are user-defined constants by default; function and class names are case insensitive. Some implementations of JavaScript are stricter about case sensitivity than others.
