I Hate Object Oriented Programming

Tagged As: C, Computer Science, OOP, and Programming

Introduction

When I was first learning to program, Object Oriented Programming (OOP) was the paradigm everyone was supposed to adopt. Java was the hotness and C++ was favored over C. With no personal experience as a point of reference to disregard the industry hype, I tried learning about all the promised goodness OOP was supposed to provide through encapsulation and inheritance.

My reasons for disliking OOP are less grounded in technical philosophy or academic computer science. There are plenty of well written and documented diatribes against the OOP paradigm addressing its fundamental nature and the abomination of tools programmers have to work with. For that discussion, scroll past my rant towards the bottom.

Personally, the whole OOP house of cards crashed down for me when I looked at it from an interfacing-to-assembly-language point of view. Now obviously the raw interface "wasn't OOP" ... but it did expose that after the compiler performed its magic, underneath the hype was a hidden pointer and a vector table. That was it. That's not necessarily a truly valid reason to hate OOP, though. Assembly language programmers have always been able to "do anything" and "manipulate anything" because they have such raw access to the bare metal. I do appreciate that the OOP compilers were doing a lot of work to offer high-level language features. But in the end, it hid the fact that behind the curtain was an extra function parameter - a this pointer - and, for virtual methods, a function mapping table that allowed procedures to be inherited and overridden.
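
As a rough illustration of that "hidden pointer and vector table," here is a hand-rolled sketch of what a compiler typically does for virtual methods. Shape, ShapeVTable, call_area, and the other names are hypothetical and exist only for this example; real compilers pick their own layout.

```cpp
#include <cassert>

// Hypothetical names throughout. This mimics the usual scheme for virtual
// methods; it is not any particular compiler's actual layout.
struct Shape;

// The "vector table": one function pointer per overridable operation.
struct ShapeVTable
{
    int (*area)( const Shape *self );
};

// Every object carries a hidden pointer to the table for its type.
struct Shape
{
    const ShapeVTable *vptr;
    int w, h;
};

static int rect_area( const Shape *self )     { return self->w * self->h; }
static int triangle_area( const Shape *self ) { return self->w * self->h / 2; }

static const ShapeVTable rect_table     = { rect_area };
static const ShapeVTable triangle_table = { triangle_area };

// What "shape.area()" boils down to: look up the slot, pass the object along.
int call_area( const Shape *s ) { return s->vptr->area( s ); }

int demo_dispatch( )
{
    Shape r = { &rect_table, 4, 6 };
    Shape t = { &triangle_table, 4, 6 };
    assert( call_area( &r ) == 24 );
    assert( call_area( &t ) == 12 );
    return call_area( &r ) + call_area( &t );
}
```

Calling demo_dispatch() from main() exercises both table entries through the same call_area() call site, which is all "polymorphism" amounts to at this level.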

To be fair, there are huge differences to the implementations of OOP in C++ vs Java vs Python, etc. For the sake of this particular rant, I am focused on the compiled language C++ vs C as opposed to scripted languages in order to look at whether the end result is significant. The OOP vs non-OOP supporters are extremely polar to one another and their debate will likely never be settled. OOP certainly has a place for rapid prototyping, proofs of concept, and scripted languages not focused on performance - all reasons I have used it myself despite the self loathing involved in admitting it was handy "that time."

For anyone that has never explored what happens, take these two simple C and C++ programs to define a 3D point as an example. The first uses a structure to hold the x, y, z coordinates and has separate functions to manipulate them. The second uses an object to hold the x, y, z coordinates and a mapping to the methods that manipulate them.

NOTE: Each program will be compiled using the gcc and g++ tools respectively in 32bit mode using the cdecl calling convention. Why? It's just very easy to see the difference when the compiler isn't doing a lot of casting small values against 64bit addresses or fast call register optimizations.

OOPsucks1.c


#include <stdio.h>
#include <stdlib.h>

#ifdef __GNUC__
#define __cdecl __attribute__((__cdecl__))
#endif

struct Point
{
   int x, y, z;
};

int __cdecl set (struct Point *p, int x, int y, int z)
{
   p->x = x;
   p->y = y;
   p->z = z;
   return 0;
}

int __cdecl movex (struct Point *p, int x)
{
   p->x = p->x + x;
   return 0;
}

int __cdecl movey (struct Point *p, int y)
{
   p->y = p->y + y;
   return 0;
}

int __cdecl movez (struct Point *p, int z)
{
   p->z = p->z + z;
   return 0;
}

int __cdecl show (struct Point *p)
{
   printf("Point is at (%d,%d,%d)\n", p->x, p->y, p->z);
   return 0;
}

int main(int argc, char **argv)
{
   struct Point   *p1;

   p1 = malloc(sizeof(struct Point));
   set (p1, 1, 2, 3);
   show (p1);
   movex (p1, 5);
   movey (p1, -5);
   movez (p1, 10);
   show (p1);
   free (p1);
   return 0;
}

This can be compiled into 32bit assembly using the command:

gcc -fverbose-asm -masm=intel -m32 -mpreferred-stack-boundary=2 -S OOPsucks1.c

This can be compiled into a 32bit executable using the command:

gcc -fverbose-asm -masm=intel -m32 -mpreferred-stack-boundary=2 OOPsucks1.c -o OOPsucks1

OOPsucks2.cpp


#include <iostream>
#include <sstream>
#include <string>
using namespace std;

class Point
{
   public:
    void set( int newX, int newY, int newZ );
    void show( ) const;
    void movex ( int modX );
    void movey ( int modY );
    void movez ( int modZ );

   protected:
    int x;
    int y;
    int z;
};

void Point::set( int newX, int newY, int newZ )
{
    x = newX;
    y = newY;
    z = newZ;
};

void Point::movex( int modX )
{
    x = x + modX;
};

void Point::movey( int modY )
{
    y = y + modY;
};

void Point::movez( int modZ )
{
    z = z + modZ;
};

void Point::show( ) const
{
    cout << "Point is at (" << x
         << ',' << y
         << ',' << z
         << ')' << endl;
};

int main()
{
    Point p1;

    p1.set( 1, 2, 3 );
    p1.show( );
    p1.movex( 5 );
    p1.movey( -5 );
    p1.movez( 10 );
    p1.show( );
    return 0;
}

This can be compiled into 32bit assembly using the command:

g++ -fverbose-asm -masm=intel -m32 -mpreferred-stack-boundary=2 -S OOPsucks2.cpp

This can be compiled into a 32bit executable using the command:

g++ -fverbose-asm -masm=intel -m32 -mpreferred-stack-boundary=2 OOPsucks2.cpp -o OOPsucks2

Calling Convention Analysis

The code is much longer than necessary, but serves to get a somewhat apples-to-apples comparison of functionality into something that will compile and run. For the calling analysis, we only need to look at how each program initializes the structure/object Point and sets its values.

First, look at the assembly code generated for OOPsucks1.s around lines 49-50. To recap, we call malloc() to reserve some memory for the structure, assign it to a pointer, and then call the set() function. Due to the cdecl calling convention, the parameters are pushed onto the stack in reverse order from right to left with the pointer to the structure being last.


# OOPsucks1.c:49:    p1 = malloc(sizeof(struct Point));
        push    12
        call    malloc@PLT
        add     esp, 4
        mov     DWORD PTR -8[ebp], eax
# OOPsucks1.c:50:    set (p1, 1, 2, 3);
        push    3
        push    2
        push    1
        push    DWORD PTR -8[ebp]
        call    set
        add     esp, 16

Secondly, look at the assembly code generated for OOPsucks2.s for lines 55-58. In terms of purpose, they are quite similar. The first three instructions are not about the object at all: they load the stack-protector canary from gs:20 (the compiler's temporary tmp92) and store it at -4[ebp]. The object p1 itself simply lives in main's stack frame at -16[ebp], reserved when the frame was set up. The parameters are pushed in accordance with the cdecl convention just like the example in C. The difference is how the object to manipulate is referenced. Instead of loading a pointer variable, the compiler computes the object's address with the lea (Load Effective Address) instruction and pushes that. This is a consequence of p1 being a local variable rather than a malloc() result, so a C program using a local struct would do the same. Then the set() method is called, which the compiler mangled into _ZN5Point3setEiii.


# OOPsucks2.cpp:55: {
        mov     eax, DWORD PTR gs:20    # tmp92, MEM[( unsigned int *)20B]
        mov     DWORD PTR -4[ebp], eax
        xor     eax, eax
# OOPsucks2.cpp:58:     p1.set( 1, 2, 3 );
        push    3
        push    2
        push    1
        lea     eax, -16[ebp]
        push    eax
        call    _ZN5Point3setEiii
        add     esp, 16

So what's the big deal? So far, other than making a wildly more obfuscated compilation to assembly, the two paradigms are not that different. A reservation was made in memory for the structure and object respectively. The x, y, z coordinates were passed as expected. The only difference is how the high level language presented the "and do something to what" for the developer. In C, a programmer very explicitly chooses a function and passes a pointer to the structure that will be modified. In C++, a programmer is taught the object receives some parameters to a method and thanks to the magic of encapsulation, internally picks the correct method (this means more in the event of inheritance or overloading) for attribute manipulation. C++ converts the "object.method" reference into a hidden pointer parameter known as this and passes it. Thus, under the covers, what is actually happening is that instead of the programmer thinking about which data structure they wanted to point at, the compiler derived that from the object and "saved the step" of explicitly passing it as a parameter. The resultant code is not much different than just knowing what you're doing and coding it yourself.
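
To make the hidden parameter visible from the high-level side, here is a small sketch using a trimmed copy of the Point class. The getx() accessor and the demo_explicit_this() wrapper are additions for this example only, and std::invoke assumes a C++17 compiler. It spells out the same method call three ways, two of which name the object explicitly.

```cpp
#include <cassert>
#include <functional>

class Point
{
   public:
    void set( int newX, int newY, int newZ ) { x = newX; y = newY; z = newZ; }
    void movex( int modX ) { x = x + modX; }
    int  getx( ) const { return x; }   // accessor added for this sketch only

   protected:
    int x = 0, y = 0, z = 0;
};

int demo_explicit_this( )
{
    Point p1;

    // Usual spelling: the object sits to the left of the dot.
    p1.set( 1, 2, 3 );

    // Same kind of call with the object's address written out as the
    // first argument, which is the slot the hidden this pointer occupies.
    std::invoke( &Point::movex, &p1, 5 );

    // A pointer-to-member turns the choice of method into an ordinary value.
    void (Point::*op)( int ) = &Point::movex;
    (p1.*op)( 10 );

    assert( p1.getx( ) == 16 );        // 1 + 5 + 10
    return p1.getx( );
}
```

None of the three spellings changes which function runs; they only differ in how visibly the object is handed over.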

Procedure vs Method Analysis

What happens when the function / method begins execution reveals that there is no run-time difference between OOP and non-OOP. All of the difference comes purely from how the compiler loads a pointer to the object in question and selects the appropriate method. I argue that a developer who already knows which function they want to call should just call that function and be done with it.

First, look at the assembly code generated for the set() function for the non-OOP example.


set:
        endbr32
        push    ebp
        mov     ebp, esp
# OOPsucks1.c:15:    p->x = x;
        mov     eax, DWORD PTR 8[ebp]
        mov     edx, DWORD PTR 12[ebp]
        mov     DWORD PTR [eax], edx
# OOPsucks1.c:16:    p->y = y;
        mov     eax, DWORD PTR 8[ebp]
        mov     edx, DWORD PTR 16[ebp]
        mov     DWORD PTR 4[eax], edx
# OOPsucks1.c:17:    p->z = z;
        mov     eax, DWORD PTR 8[ebp]
        mov     edx, DWORD PTR 20[ebp]
        mov     DWORD PTR 8[eax], edx
# OOPsucks1.c:18:    return 0;
        mov     eax, 0
# OOPsucks1.c:19: }
        pop     ebp
        ret

Second, look at the assembly code generated for the set() method for the OOP example.


_ZN5Point3setEiii:
        endbr32
        push    ebp
        mov     ebp, esp
# OOPsucks2.cpp:26:     x = newX;
        mov     eax, DWORD PTR 8[ebp]
        mov     edx, DWORD PTR 12[ebp]
        mov     DWORD PTR [eax], edx
# OOPsucks2.cpp:27:     y = newY;
        mov     eax, DWORD PTR 8[ebp]
        mov     edx, DWORD PTR 16[ebp]
        mov     DWORD PTR 4[eax], edx
# OOPsucks2.cpp:28:     z = newZ;
        mov     eax, DWORD PTR 8[ebp]
        mov     edx, DWORD PTR 20[ebp]
        mov     DWORD PTR 8[eax], edx
# OOPsucks2.cpp:29: };
        nop
        pop     ebp
        ret

In the non-OOP listing, the eax register was loaded from the first argument slot on the stack with mov eax, DWORD PTR 8[ebp], which holds the pointer p. Likewise, in the OOP listing, the same instruction loads the compiler generated pointer this. It's exactly the same thing except that one developer believes they are an enlightened Object Oriented Programmer and the other believes they are a crusty die-hard. There is no magic to a method, and most C and Assembly Language programmers know they can just call one manually if they prep the parameters to the calling convention correctly.
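
The same point can be checked without reading assembly. The sketch below assumes a typical platform where three ints carry no padding; CPoint, cpoint_set(), and demo_same_bytes() are hypothetical names for this example. It compares the raw bytes of a C-style record and a trimmed copy of the Point class after the same assignment.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct CPoint { int x, y, z; };        // the C-style record

void cpoint_set( CPoint *p, int x, int y, int z ) { p->x = x; p->y = y; p->z = z; }

class Point                            // trimmed copy of the class from OOPsucks2.cpp
{
   public:
    void set( int newX, int newY, int newZ ) { x = newX; y = newY; z = newZ; }

   protected:
    int x, y, z;
};

// No virtual methods means no hidden table pointer: same size, plain layout.
static_assert( sizeof( Point ) == sizeof( CPoint ), "class adds no storage" );
static_assert( std::is_standard_layout<Point>::value, "laid out like a C struct" );

bool demo_same_bytes( )
{
    CPoint a;
    Point  b;
    cpoint_set( &a, 1, 2, 3 );
    b.set( 1, 2, 3 );

    // Compare the raw object bytes (assumes three ints with no padding).
    bool same = std::memcmp( &a, &b, sizeof a ) == 0;
    assert( same );
    return same;
}
```

The static_assert lines would stop holding as soon as Point gained a virtual method, since that is the point where the table pointer starts occupying space in every object.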

What About Inheritance

For just the simple case example above, there's truly no reason to use OOP over the non-OOP paradigm. Looking at inheritance and polymorphism, however, is where the argument for OOP begins to emerge at the high-level. For these examples, the data regarding a point will be extended to include a velocity. Therefore, the code to manipulate and show this new data must also be modified and added.


struct MovingPoint
{
   int x, y, z;
   int velocity;
};

To "extend" the structure in non-OOP requires one of two approaches. Both require defining an entirely new structure, but one can be a clean rewrite [above] and the other nested [below]. The nested approach certainly uglifies the code, forcing something like p->coordinates.x instead of the more intuitive p->x that an OOP developer gets to write for an extension.


struct MovingPoint
{
   struct Point coordinates;
   int velocity;
};
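
One thing the nested form buys in exchange for the uglier field access: routines written for Point keep working on the embedded member. A minimal sketch, where point_movez() is a plain Point routine and demo_nested() is a hypothetical wrapper for this example:

```cpp
#include <cassert>

struct Point
{
   int x, y, z;
};

struct MovingPoint
{
   struct Point coordinates;
   int velocity;
};

// Written once, for Point only.
int point_movez( struct Point *p, int z )
{
   p->z = p->z + z;
   return 0;
}

int demo_nested( )
{
   struct MovingPoint mp = { { 1, 2, 3 }, 7 };

   // Reuse the Point routine by handing it the embedded member.
   point_movez( &mp.coordinates, 10 );

   assert( mp.coordinates.z == 13 );
   assert( mp.velocity == 7 );       // untouched by the Point routine
   return mp.coordinates.z;
}
```

The caller still has to know that coordinates exists and pass it explicitly, which is exactly the step an inheriting class hides.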

The notion of inheritance means a developer defines a new object as an extension of a base class - thereby inheriting the base's methods and attributes. Syntactically, it's clean. The following example adds prototypes for a new method and an additional data attribute.


class MovingPoint: public Point
{
   public:
    void set( int newX, int newY, int newZ, int newVelocity );
    void changevelocity( int modVelocity );
    void show( ) const;

   protected:
    int velocity;
};

Perhaps for the original developer this makes sense as they had to explicitly know what object they wanted to extend from. But to another developer who was not aware of the base class, is this really more obvious? While the inheritance model definitely presents a simple, extended object for attribute referencing, etc., just looking at the object definition does not make it obvious which methods and attributes it now has. Remember that in OOP, these chains of inheritance can often be several levels deep, requiring quite a bit of backwards research to know what your object really consists of. Furthermore, defining a subclass required writing a new definition anyway. Is it really all that much more work to just define a new, independent data structure that consists of the desired fields and is self-evident?

Method Overloading

To the credit of OOP, three methods did not have to be rewritten at all, two were overloaded to address the additional velocity attribute, and one new method was added.

In the non-OOP code, set(), movex(), movey(), movez(), and show() all had to be rewritten, plus changevelocity() was added. Admittedly, each required a new label to differentiate which function was for which data type, whereas the OOP compiler handles that for you. The various moven() functions were an annoying case because they're essentially identical code with a new function name. NOTE: Yes, in C it would be possible to reuse the same functions via pointer and offset operations so long as the structure fields were in the same positions - but honoring DRY (don't repeat yourself) that way is equally ugly and prone to future errors when types are altered.


int __cdecl point_movez (struct Point *p, int z)
{
   p->z = p->z + z;
   return 0;
}

int __cdecl mpoint_movez (struct MovingPoint *p, int z)
{
   p->z = p->z + z;
   return 0;
}

While method inheritance in OOP can be cleaner, as soon as methods begin to get overloaded, the advantage over just implementing what you need again erodes quickly. From an under the hood perspective, the calling convention is still exactly the same as detailed above. For example, the compiler emitted a single method under the mangled name _ZN5Point5movexEi and resolved calls on both Point and MovingPoint objects to it. And for the redefined show(), there is a _ZNK5Point4showEv for the Point object and a _ZNK11MovingPoint4showEv for the MovingPoint object.
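
A small sketch of that resolution, using trimmed copies of the two classes. Here show() returns a string instead of printing, and getx() and demo_static_dispatch() are additions for this example only. Because nothing is declared virtual, which show() runs is decided at compile time from the type the caller names.

```cpp
#include <cassert>
#include <string>

class Point
{
   public:
    void movex( int modX ) { x = x + modX; }
    std::string show( ) const { return "Point"; }        // returns text so the sketch is checkable
    int getx( ) const { return x; }                       // accessor added for this sketch only

   protected:
    int x = 0;
};

class MovingPoint: public Point
{
   public:
    std::string show( ) const { return "MovingPoint"; }  // redefines (hides) Point::show

   protected:
    int velocity = 0;
};

int demo_static_dispatch( )
{
    MovingPoint mp;
    mp.movex( 5 );                             // inherited: the one Point::movex is called

    const Point &asPoint = mp;                 // same object, viewed through the base type
    assert( mp.show( ) == "MovingPoint" );
    assert( asPoint.show( ) == "Point" );      // non-virtual: chosen by the static type
    assert( mp.Point::show( ) == "Point" );    // the base version is still reachable by name
    return mp.getx( );
}
```

Marking show() as virtual in Point would change the second assertion, and that is the case where the function table comes into play.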

The bottom line is that a developer paying attention to which data fields and functions are actually necessary can achieve OOP-like effects without jumping on the bandwagon and creating an obscure encapsulation mess of inheritance and polymorphism. At the assembly code level, the difference between calling an OOP method and a non-OOP function for a record comes down to whether you are smart enough to pass a pointer to the data you want to manipulate or want the compiler to hide it from you.

A Lot Of People Hate OOP

In 2019, Ilya Suzdalnitski posted on Medium an essay titled, "Object Oriented Programming - The Trillion Dollar Disaster." Perhaps a bit extreme in the title, Suzdalnitski writes that reliable code requires simplicity and that large scale OOP projects are anything but simple.

"Instead of reducing complexity, it encourages promiscuous sharing of mutable state and introduces additional complexity with its numerous design patterns. OOP makes common development practices, like refactoring and testing, needlessly hard."

The article goes on to discuss shared mutability and failures of true encapsulation being the bane of development. Whereas in non-OOP, developers track the data and manipulate it explicitly on demand, with OOP they were trained that the object handles it, which resulted in a more cavalier attitude towards mistakes. The ability of objects to be altered from so many vectors, up and down the inheritance chain or through parallel references in multi-processing, leads to unexpected and unpredictable data states. This notion, according to Suzdalnitski, is what drives many organizations to throw their hands up at sustainment and refactor projects to get towards a more predictable and reliable outcome.

A lot of the hate towards OOP really stems from the implementations of OOP offered by the languages. C++ in particular tends to receive the brunt of the displeasure, with perhaps one of the more famously quoted outbursts coming from Linus Torvalds in 2007 to the gmane.comp.version-control.git mailing list.

"C++ is a horrible language. It's made more horrible by the fact that a lot of substandard programmers use it, to the point where it's much much easier to generate total and utter crap with it." ~ Linus Torvalds

In 2016, Charles Scalfani wrote on Medium "Goodbye Object Oriented Programming," an article providing examples of where OOP simply fails. Quite glaringly, his examples were not hard-to-reach edge cases but completely simple situations that present themselves too frequently. Scalfani discusses the deeply nested inheritance problem and how upstream modifications to the parent classes can have devastating downstream effects on the behavior of inheriting children - the Fragile Base Class problem. He addresses the Diamond Problem where multiple inheritance creates conflict, particularly on method selection. His continued examples are simple and commonplace. Personally, I like his cited quote from the creator of Erlang for the ultimate issue.

"The problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." ~ Joe Armstrong

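For readers who have not run into the Fragile Base Class problem, here is a minimal sketch of the kind of failure involved. Bag, CountingBag, and demo_fragile_base() are hypothetical names invented for this example and are not taken from Scalfani's article. The derived class counts insertions, and silently double counts because the base class happens to implement addAll() in terms of add().

```cpp
#include <cassert>
#include <vector>

class Bag
{
   public:
    virtual ~Bag( ) = default;
    virtual void add( int v ) { items.push_back( v ); }

    // Implementation detail the subclass cannot see from the interface:
    // addAll() happens to be built on top of add().
    virtual void addAll( const std::vector<int> &vs )
    {
        for ( int v : vs ) add( v );
    }

   protected:
    std::vector<int> items;
};

class CountingBag: public Bag
{
   public:
    void add( int v ) override { ++count; Bag::add( v ); }

    void addAll( const std::vector<int> &vs ) override
    {
        count += static_cast<int>( vs.size( ) );   // counts the batch...
        Bag::addAll( vs );                          // ...then the base calls add(), counting again
    }

    int count = 0;
};

int demo_fragile_base( )
{
    CountingBag bag;
    bag.addAll( { 1, 2, 3 } );
    assert( bag.count == 6 );      // three items inserted, six counted
    return bag.count;
}
```

If Bag::addAll() were later rewritten to append to items directly, the count would drop to three without a single line of CountingBag changing, which is what makes the base class fragile.
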
Brian Will put together the YouTube video "Object-Oriented Programming is Bad," visually addressing the issues of OOP. Yegor Bugayenko collected a series of quotes from prominent computer scientists in the field - all basically maligning OOP. Dijkstra, Graham, Raymond, Allman, and Armstrong to name but a few. Not too long after the Suzdalnitski essay began spreading across forums, David Cassel analyzed the explosive comment threads where developers immediately began arguing for and against OOP. Certainly a lot of the proponents of OOP argued that the haters simply did not know how to use it properly. The "woke from OOP" revelations are hardly new, as many of the same arguments were written about before, like John Barker's "All Evidence Points to OOP Being Bullshit" or community edited wikis maintaining long-running enumerations cataloging OOP's failures.

NOTE: I certainly fall into the improper use of OOP pool and acknowledge my examples in the preamble are certainly not pure but were meant to demonstrate that under the hood, there was nothing special.

Alan Kay is credited as the father of OOP for his work at ARPA in the 1960s. Nearly three decades later in a keynote speech to OOPSLA, he basically lamented that modern OOP, as presented in the popular programming languages, was not at all what it was meant to be. There were many things OOP was supposed to be and OOP, in his words, was supposed to be better at data isolation, extreme late-binding, and intent messaging to encapsulated things.

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." ~ Alan Kay

Bottom Line

I hate OOP and many other developers do, too. Nevertheless, in January of 2021, only two of the top ten programming languages were not OOP-centric. People are different and the problems they solve are different so there will never be a definitive conclusion on the matter and thus, the debates will rage on ad infinitum.