Thursday, October 18, 2012

On the Harm of Object-Oriented Programming

This is an excerpt of an essay that I wrote for a class, prompted by Dijkstra's famous goto letter. I must admit that it was written in a very "stream-of-conscious" type of way, so it may not be elegant writing, but I think it's worth putting down anyways.

I believe today that the reach of computer science (which I should probably refer to as software engineering as the discussions of goto is more a practical than a theoretical) is sufficiently widespread that is has divided the industries of programming so much that it is not useful or plausible to talk of a single language feature or programming practice being useful or harmful. What is useful for a web application could be considered harmful for an embedded software application, and vice-versa. So I would limit my response to the area that I most familiar: desktop application development.

In my opinion, one of the most harmful programming features is single-paradigm object-oriented programming. Object-oriented programming has been advocated, pushed, and taught so strongly for the past 20 years that it is almost universally used to model any problem we need to tackle as programmers. However, like any tool, there are strengths and weaknesses. At the heart of object-oriented programming, the idea is to model every piece of the program as realistically as possible to a noun that as humans we can understand (which implies that we are attempting to draw a parallel between real life objects and our programs). The consequence is that we group pieces of data that compose that noun together in memory. This has significant downsides on modern desktop computers though. This grouping of data that object-oriented programming encourages is essentially the memory layout referred to as  “array of structures” (AOS). Consider an AOS layout for a list of particles being used to model an explosion effect in a video game. In this setup, you may have a geometric vector for a position, velocity, force stored in this object. Presumably, since the particle has these attributes, you need to perform a numerical integration of these attributes over time. This is a loop over your particles where at each iteration you access each attribute, perform the necessary computation, and store the result. In the AOS layout, these properties are stored contiguously in memory and further each particle is stored contiguously in memory. That means on current desktop PCs, where memory access is expensive and the hardware has a lot of builtin mechanisms to reduce the costs, the particle array will only utilize one cache line. On the contrary, let’s reconsider the system we’re building. We designed our program so that each particle is an object. But when will there only exist one particle? The answer is never. The program can be better redesigned to account for the fact that particles are only ever used in mass. So the solution is to split each attribute of a particle into an array of each attribute (the so called “structure of array” layout), and now when iterating over the particle data there will be a cache line utilized by each array, significantly reducing the amount of cache miss and increasing the performance of the application. Arguably you can still have a “ParticleManager” object, so all hope isn’t lost for object-oriented programming; but the point is that single paradigm object-oriented programming languages are so taught and widely used that the mindset we’ve learned has lead us to design our particle system blindly, thinking about real-world abstraction instead of an abstraction that makes sense for both us and the computer. On a final note, I talk of single paradigm object-oriented programming because it is those languages that force us to think in terms of object and don’t allow us to escape the paradigm when it no longer makes sense.