If you’re not already an initiate to programming, you may not be clear on what you’re getting into when you set out to learn to program, so here’s an overview.
Most obviously, you must learn a programming language, which comes down to learning four things:
a syntax: The syntax of a language is the set of rules governing how code in that language is written.
a standard library: Programming in all languages is in some way modular, i.e. a piece of code can use other pieces of code. Rather than write all of a program’s code from scratch, programmers use pre-existing code from code libraries. Virtually every modern language has its own standard library, a library which is included as a default part of the language. There typically exist many libraries for a language, but only the language’s standard library is universally known and used by all programmers of that language.
code idioms: Given a syntax and a standard library, there then exists in a language some informal set of idioms (recurring patterns) of how to code in the language, which programmers write over and over again. For various reasons, these idioms can’t be expressed as library code, and they are not built into the language or officially specified in the language specification, but every good programmer uses established idioms wherever possible. An idiom finds favor because it has been proven over time to be the best and/or most obvious way of doing something. (I use the term ‘idiom’ to mean a ‘small-scale pattern’, something which is expressed in a single place in code. In contrast, the term ‘pattern’ is used to refer to ‘design patterns’, a specific set of large-scale design strategies used in programming that are generally not particular to any language. Design patterns are also a very good thing to learn, but they aren’t a high priority early in your education.) You won’t hear much talk about idioms among programmers because, in all, there just aren’t that many of them in most languages, and they’re fairly natural to pick up.
a set of tools: To write their programs, programmers use tools (programs):
- programmers do most of their work typing code in text-editor programs
- code typed by the programmer needs to be translated into a form runnable by the computer; this translation is performed by compiler, interpreter, assembler, and linker programs (which of these are used depends upon the programming language)
- debugger programs allow programmers to see exactly what happens when their programs run so they can ferret out bugs
- profiler programs measure the performance of programs, showing how long each portion of code takes to execute, so programmers can see which portions are worth optimizing
- in some areas of programming, code generators can sometimes help speed up development by doing some of the coding work for you
- version control systems help programmers keep track of the changes they make to their code as they develop it, making it easier to coordinate with other programmers and to revert to earlier versions of their code
- programs which integrate some or all of the functionality listed above are called IDEs (Integrated Development Environments)
(By the way, don’t worry about tool costs: very high-quality tools are freely available for download for nearly all languages used today; in fact, with the notable exception of Microsoft languages—C# and Visual Basic—the preferred tools are the free and open source ones, while commercial tools are increasingly disfavored and are disappearing.)
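To make the first three items a little more concrete, here’s a small sketch in Python (picked only because it’s one of the languages recommended below; any language would do). The `def` keyword, the colons, and the indentation are syntax; `collections.Counter` comes from the standard library; and looping with `enumerate()` rather than over `range(len(...))` is one of the idioms Python programmers expect to see. The two function names are made up for illustration.

```python
from collections import Counter  # standard library: ships with every Python install


# Hypothetical helper, for illustration only.
def most_common_words(text, n=3):
    """Return the n most frequent words in text, ignoring case."""
    # Syntax: 'def', the colon, and the indentation are rules of the language itself.
    words = text.lower().split()
    # Standard library: Counter does the tallying so we don't have to write it.
    return Counter(words).most_common(n)


# Hypothetical helper, for illustration only.
def numbered(lines):
    """Prefix each line with a 1-based line number."""
    # Idiom: enumerate() instead of 'for i in range(len(lines))'.
    return [f"{i}: {line}" for i, line in enumerate(lines, start=1)]


if __name__ == "__main__":
    print(most_common_words("the cat and the hat and the bat"))
    print(numbered(["first line", "second line"]))
```

None of this is hard to write from scratch, which is exactly the point: the library and the idiom save you from re-deciding small things every time.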
So the question you probably have now is, ‘Which language should I learn?’ The correct answer is that you should learn more than one:
- C: I start with C mainly because: 1) it is the lingua franca of programming; 2) learners can understand what’s really going on in the computer when they run their program written in C; and 3) most features of its syntax show up nearly verbatim in other major languages, such as Java. For purely pedagogical reasons, C is a language you really must know, even if you never do any serious programming in it: you’re a better programmer simply knowing what programming in C is like. On the other hand, I strongly discourage you from attempting to write anything but trivial programs in C early in your learning phase, for it takes a good amount of programming experience and knowledge to write even mildly sophisticated end-user programs (such as programs with graphical user interfaces) in C. Once you attain a decent amount of comfort with basic C, you’re much better off focusing on some other language, such as Java or Python.
- Java is steadily replacing C as the lingua franca of programming. It is an object-oriented language with a very large standard library, so writing programs is significantly easier in Java than in C. Java is the most common choice of a first language to teach students today because Java protects programmers from misusing memory. I think this choice is a mistake, however, because it’s important as a programmer to understand what exactly Java is doing for you, and you can’t really understand that unless you study C. Worse, Java has a rather complex set of advanced syntactic features, and the set of tools and libraries typically used by experienced Java programmers is large and complex. These facets make it hard for learners to get up to speed with real-world code examples. Despite these flaws, Java, unlike C, is a good language to attempt writing your first real programs in.
- Python is a language with an exceptionally elegant syntax which avoids clutter and nasty ‘corner cases’ (meaning situations where the language has to have complex rules to cover all possible cases). Expressing your ideas in Python is arguably easier than in any other language out there, so writing useful programs in Python is as easy as it gets. On the downside, Python code runs very slowly compared to C and Java code, so if you want to be a ‘real programmer’ who can write processing-intensive programs, knowing Python alone is probably not enough.
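As a rough illustration of what ‘avoids clutter’ means in practice, here’s a hypothetical function that filters and sorts a list of (name, score) records. In C, the same task would typically involve explicit array lengths, a comparison function for `qsort`, and manual memory management for the result; in Python it’s a few lines. This is a sketch of style, not a performance comparison.

```python
# Hypothetical example: the record format and threshold are made up for illustration.
def top_students(scores, threshold):
    """Return the names of students scoring at least `threshold`, best first."""
    # Filtering: a list comprehension with tuple unpacking, no index variables.
    passing = [(name, score) for name, score in scores if score >= threshold]
    # Sorting: pass a key function instead of writing a comparison routine.
    passing.sort(key=lambda pair: pair[1], reverse=True)
    # Indentation alone marks the block; no braces or type declarations needed.
    return [name for name, _ in passing]


if __name__ == "__main__":
    print(top_students([("ann", 91), ("bob", 72), ("cy", 85)], 80))
```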
I consider these three languages the most essential to learn, not just because of their popularity but because of how well they suit their roles. The best order to tackle them in is either C then Java then Python or the other way around: if you’re impatient and want to write something neat fairly early in your education, start with Python; if you’re patient and insist on understanding how things really work, start with C.
Here are some other languages to be aware of:
- C# (pronounced ‘C sharp’) is essentially Microsoft’s imitation of Java. Despite what the name implies, it is only tangentially related to C. While an improvement over Java in some ways, it’s not different enough from Java for it to matter much which of them you learn. (Many programmers will insist you stay away from Microsoft’s proprietary technologies, though there actually is an open source implementation of C# available called Mono.) I recommend Java because it’s the older, more established language with the broader reach and because better free tools are available for it. However, once you get comfortable with Java, taking a look at C# wouldn’t hurt because a lot of software companies out there are Microsoft shops, and most of them will be using C# by now.
- C++ is essentially C with object-oriented programming features grafted on. The result is messy but not without its virtues. Because of its performance advantages, C++ is used for end-user applications with high performance requirements, notably games. However, if you’re learning programming to make games, you’re kidding yourself if you start with C++: it’ll be years before you develop the skills to make games complex enough to stress a modern system, so you should focus your learning on an easier-to-use language. As you develop better program-design skills, they will translate to C++ when the time comes; in the meantime, you should broaden your general programming knowledge.
- Visual Basic: The basic problem with Visual Basic is that it’s really not that basic at all anymore. In its origins, VB was a language that was going to bring programming to the masses. Since then, the features of ‘real’ programming languages, like C++ and Java, have been grafted onto VB in ugly ways, resulting in a language which is just about as complex but not as elegant and therefore not used much except for writing ‘in-house’ software (software which is written for the internal use of a company/organization). The joke goes that a serious programmer doesn’t dare publish programs written in VB to the wider public lest he get laughed at for using VB. Despite its unfair reputation as a toy language, VB can be used to do real work; there’s just little reason to choose it over Java and C# on the merits. (This judgment is possibly just aesthetic bias: I and most other programmers much prefer the look of Java and C# code to that of VB.)
- Ruby has been stealing glory from Python lately, largely because of the popularity of ‘Ruby on Rails’, a framework for developing web applications (a framework is a glorified, overgrown library). In the end, the two languages are so similar that learning one will make learning the other quite simple. Ruby arguably has a few feature advantages, but Python takes the edge in elegance and carefully thought out syntax. If you want to do web programming, I recommend the Django framework for Python as a replacement for Rails.
- Perl is like Python’s deformed cousin. It’s a convoluted mess dearly loved by its partisans but inhospitable to first-time programmers. Currently, Perl is in limbo as the long-awaited version 6 has been years late in coming; meanwhile, Python and Ruby have been stealing Perl’s niche, and I predict Perl will never recover, eventually fading into perennial also-ran status.
- PHP, due to the popularity of the web, is the first language of a surprising number of people these days. This is unfortunate because PHP has little reason for existing except to plug a hole that shouldn’t have existed in the first place. In almost all respects, it’s otherwise an undistinguished amalgam of standard language features taken from other languages and inelegantly smashed together. Still, PHP is a popular choice for writing open source web applications, such as the blogging software that runs this site (WordPress) and the MediaWiki software that runs Wikipedia. I find this unfortunate, but PHP had the right features at the right time, and there’s no arguing with good timing.
Of course, learning to program involves quite a bit more than just learning a language. Additionally, there’s:
terminology / way of speaking
best coding practices (especially how to write readable code)
design (even if you know a language inside out and write meticulously crafted code, you may still be clueless about writing programs in the large; without learning good design practices, your programs will become unreadable and unmodifiable rat’s nests as they grow in size)
a bit of computer science (especially fundamental data structures and algorithms)
operating system concepts
A note about this last item: whatever the language, the real difficulty of learning standard libraries (aside from the fact that many of them are very large) lies in the fact that their most important parts are heavily intertwined with concepts not particular to the language. For example, every standard library includes code for reading and writing files, but files are really an operating system concept, not a language concept. Unfortunately, standard library documentation tends to assume the reader is already familiar with these underlying concepts because it assumes an audience of programmers experienced in other languages. When the documentation does attempt to elucidate these concepts, it tends to do a half-assed job; and even when it explains them well, this is still a suboptimal way of learning because it conflates the concepts with the particulars of that library and language. So you should try to learn OS concepts in their own right, independently of any language; there are a few good books about operating system design, in particular Modern Operating Systems (2nd edition), by Andrew S. Tanenbaum, and (for when you’re comfortable with C) Linux Kernel Development, by Robert Love.
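Here’s a small Python sketch of that layering. `os.open`, `os.write`, and `os.close` are thin wrappers that, on POSIX systems, map closely onto the operating system’s `open`/`write`/`close` calls and deal in raw file descriptors (small integers the OS uses to name open files); the built-in `open()` builds a friendlier file object on top of the same mechanism. The helper names are my own, and the loop around `os.write` is there because a single write call is allowed to write fewer bytes than requested.

```python
import os
import tempfile


# Hypothetical helper: write bytes using the low-level, descriptor-based calls.
def write_with_os_calls(path, data):
    """Write `data` (bytes) to `path` and return the number of bytes written."""
    # The OS hands back a file descriptor: a small integer naming the open file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):  # os.write may write fewer bytes than asked
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)  # give the descriptor back to the OS
    return written


# Hypothetical helper: read the file back through the high-level built-in open().
def read_with_builtin_open(path):
    """Return the full contents of `path` as bytes."""
    with open(path, "rb") as f:
        # The file object wraps an OS descriptor too; f.fileno() would return it.
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")
        print(write_with_os_calls(path, b"hello, OS"))
        print(read_with_builtin_open(path))
```

Once you know what a file descriptor is, the documentation for `open()` in almost any language reads very differently.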
This same advice (‘learn concepts, not libraries’) applies to a few other domains you’ll encounter, including networking, databases, and web standards. For example, before devoting a lot of time to learning how to work with a database using Java’s standard libraries, learn about databases independently of any language.
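For example, in the following Python sketch almost everything of substance is plain SQL, which you’d write nearly identically from Java (via JDBC) or any other language; only `sqlite3.connect`, `execute`, `executemany`, and `fetchall` belong to Python’s standard library. The tables and rows are invented for illustration, and an in-memory SQLite database is used so the example needs no setup.

```python
import sqlite3


# Hypothetical example: the schema and data exist only for this illustration.
def demo_join():
    """Create two small tables, join them, and return the joined rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        cur = conn.cursor()
        # The SQL strings are the database concepts: tables, keys, joins.
        cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
        cur.execute("CREATE TABLE books (title TEXT, author_id INTEGER REFERENCES authors(id))")
        cur.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                        [(1, "Tanenbaum"), (2, "Love")])
        cur.executemany("INSERT INTO books (title, author_id) VALUES (?, ?)",
                        [("Modern Operating Systems", 1),
                         ("Linux Kernel Development", 2)])
        cur.execute("SELECT books.title, authors.name FROM books "
                    "JOIN authors ON books.author_id = authors.id "
                    "ORDER BY books.title")
        # Only the connect/execute/fetch plumbing is specific to Python's library.
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for title, author in demo_join():
        print(title, "by", author)
```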
I should end by mentioning ‘Pidgin’, my in-the-works language intended strictly for educational purposes. (Yes, the Gaim people are using the name ‘Pidgin’, but I came up with the name earlier; besides, it’s a name most appropriate for a language.) My LearnProgramming.tv project has been on hiatus, partly because I’ve been reconsidering its approach ever since I had the epiphany that what’s really needed is a better language for beginners. There’s nothing to show yet, but implementation shouldn’t take long, as the syntax is extremely simple (it uses prefix notation for expressions) and I’m simply translating it into Python. If you’re looking to learn to program, please leave a comment; I’m looking for some guinea pigs.
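For readers unfamiliar with prefix notation: the operator comes before its operands, so `1 + 2 * 3` is written `+ 1 * 2 3`, and because every operator here takes exactly two operands, no precedence rules or parentheses are needed. The Python sketch below evaluates such expressions for four arithmetic operators. It is a generic illustration of prefix notation only, not Pidgin’s actual syntax, which isn’t shown here; the token format and operator table are assumptions.

```python
import operator

# Hypothetical operator table: each operator takes exactly two operands.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_prefix(tokens):
    """Evaluate a prefix expression given as tokens, e.g. ['+', '1', '*', '2', '3']."""
    def parse(pos):
        token = tokens[pos]
        if token in OPS:
            # Operator first, then its two operands, each possibly a sub-expression.
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return OPS[token](left, right), pos
        return float(token), pos + 1  # anything else must be a number literal

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("extra tokens after a complete expression")
    return value


if __name__ == "__main__":
    print(eval_prefix("+ 1 * 2 3".split()))  # infix: 1 + 2 * 3
    print(eval_prefix("* + 1 2 3".split()))  # infix: (1 + 2) * 3
```

Note how the second example needs no parentheses: the grouping is implied entirely by operator order.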