Tuesday, February 5, 2008

Evolving the Object Oriented Paradigm Part IV: Contracts

This piece and the next will deal with limiting the "domain" of your functions as well as your objects. It was going to be one long article but this first part is a normal length by itself. Today, we'll start off with design by contract.

Design by contract is nothing new. It's been around for a long time. And yet many languages today hesitate to build contracts into the language and end up never adding them (D is the exception; it handles contracts very well).

"Design by contract" (DBC) deals primarily with functions (or methods, although DBC isn't necessarily an OOP feature). DBC monitors a function's pre-conditions (which typically check the incoming parameters) and post-conditions (which typically check the return value and side effects of the function). For example, let's say we're writing a function called sqrt() which takes in a float and returns its square root as a float. We made our own square root algorithm because we're really smart. And since we're smart, we don't want a negative number to be processed and we want to verify that our algorithm's return value is correct.

Typical Solution

We could do that like this:


float sqrt(float f)
{
if(f < 0.0)
throw new Exception("sqrt error: negative number");

// insert algorithm here

if(result * result != f)
throw new Exception("sqrt error: algorithm incorrect");
else
return result;
}


We got some clutter in there from if-statements, but it's still fairly easy to understand. Right?
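One caveat about the post-condition above: with floats, result * result will rarely equal f exactly, so the check as written would reject most correct answers. Here is a minimal sketch of a tolerance-based version in Java. The SqrtCheck and closeEnough names, the tolerance value, and the use of Math.sqrt as a stand-in algorithm are all my assumptions for illustration.

```java
// Hypothetical helper: the names and the tolerance are illustrative assumptions.
final class SqrtCheck {
    // Relative tolerance; 1e-5 is an arbitrary choice for 32-bit floats.
    static final float TOLERANCE = 1e-5f;

    // Returns true when result * result is within the tolerance of f.
    static boolean closeEnough(float result, float f) {
        float squared = result * result;
        return Math.abs(squared - f) <= TOLERANCE * Math.max(1.0f, Math.abs(f));
    }

    static float sqrt(float f) {
        // Pre-condition: no negative input.
        if (f < 0.0f)
            throw new IllegalArgumentException("sqrt error: negative number");

        // Stand-in for "our own" algorithm.
        float result = (float) Math.sqrt(f);

        // Post-condition: the result squares back to (roughly) the input.
        if (!closeEnough(result, f))
            throw new IllegalStateException("sqrt error: algorithm incorrect");
        return result;
    }
}
```

The unchecked exception types are used here only so the sketch compiles without a throws clause.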

As it turns out, we're really, really smart and we want to write two square root algorithms. We want them to have the same interface, so we make a class for each algorithm and create an interface for them to share. However, we want the same pre- and post-conditions to apply to both algorithms, so we'll start with something like this:

interface SqrtAlgorithm
{
float sqrt(float f);
}

class AlgorithmA implements SqrtAlgorithm
{
public float sqrt(float f)
{
if(f < 0.0)
throw new Exception("sqrt error: negative number");

// insert algorithm A here

if(result * result != f)
throw new Exception("sqrt error: algorithm incorrect");
else
return result;
}
}

class AlgorithmB implements SqrtAlgorithm
{
public float sqrt(float f)
{
if(f < 0.0)
throw new Exception("sqrt error: negative number");

// insert algorithm B here

if(result * result != f)
throw new Exception("sqrt error: algorithm incorrect");
else
return result;
}
}


This is correct. But now I have duplicate code running amok. Duplicating something once isn't too terrible. But if we were really, really, really smart and wrote 10 square root algorithms, the same checks would be copied 10 times. And in other situations, your pre- and post-conditions may change over time, forcing you to update every copy for every change.

Surely, there must be an already-existing, easier way. And there is: the Template pattern.

Template Solution

The Template pattern (usually called the Template Method pattern) is another design pattern. It provides the skeleton of an algorithm in a base class and leaves the concrete steps to its subclasses. So let's apply the Template pattern to our classes:

abstract class SqrtAlgorithm
{
protected abstract float _the_real_sqrt(float f);

public final float sqrt(float f)
{
if(f < 0.0)
throw new Exception("sqrt error: negative number");

float result = _the_real_sqrt(f);

if(result * result != f)
throw new Exception("sqrt error: algorithm incorrect");
else
return result;
}
}

class AlgorithmA extends SqrtAlgorithm
{
protected float _the_real_sqrt(float f)
{
// insert algorithm A here
}
}

class AlgorithmB extends SqrtAlgorithm
{
protected float _the_real_sqrt(float f)
{
// insert algorithm B here
}
}


So we had to change SqrtAlgorithm from an interface to an abstract class. SqrtAlgorithm's sqrt() method must be "final" (not overridable) to stop subclasses from overriding the template. We also have to rename sqrt() in AlgorithmA and AlgorithmB to _the_real_sqrt() because that's the method the template calls, and we have to hide _the_real_sqrt() from the public so it can't be called directly, skipping over our template. If you have a good IDE with good renaming/refactoring abilities, this may not be a big deal. But I've permanently attached all my algorithms to a base class when an interface was fine before, which I don't like.

Design by Contract Solution

Typically, contracts are two clauses added to the function definition to state the pre-conditions and post-conditions (although D allows blocks of statements which you would fill with asserts). I'll follow the typical syntax for this model for now, but the syntax isn't really important at the moment.

In my new model, the version of the above code using contracts would look like this:

abstract concept SqrtAlgorithm
{
interface
{
float sqrt(float f)
requires f >= 0.0
ensures result * result == f;
}
}

concept AlgorithmA : SqrtAlgorithm
{
implementation
{
// contracts inherited
public float sqrt(float f)
{
// insert algorithm A here
}
}
}

concept AlgorithmB : SqrtAlgorithm
{
implementation
{
// contracts inherited
public float sqrt(float f)
{
// insert algorithm B here
}
}
}


In this typical syntax design, the "requires" clause handles pre-conditions and "ensures" handles post-conditions with "result" automatically being the return value inside the ensures clause. The conditions are only stated once by the interface because all inheriting methods inherit the conditions as well. If either clause doesn't pass (meaning its expression evaluates to false), then an exception would automatically be thrown stopping further progress.
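Java has no requires/ensures clauses, but the "stated once, inherited by every implementation" idea can be roughly approximated. Below is a sketch, assuming Java 8 or later for default methods. The names SqrtContract, compute, NewtonSqrt, and BrokenSqrt are hypothetical, the post-condition is relaxed to a tolerance because of float rounding, and unlike a real contract nothing stops a caller from invoking compute() directly.

```java
// Hypothetical emulation of inherited contracts using a Java 8+ default method.
interface SqrtContract {
    // Implementations supply only the algorithm.
    float compute(float f);

    // The pre- and post-conditions are written once, here.
    default float sqrt(float f) {
        if (!(f >= 0.0f)) // "requires f >= 0.0" (also rejects NaN)
            throw new IllegalArgumentException("requires f >= 0.0");

        float result = compute(f);

        // "ensures result * result == f", relaxed to a tolerance for floats
        if (Math.abs(result * result - f) > 1e-4f * Math.max(1.0f, f))
            throw new IllegalStateException("ensures result * result == f");
        return result;
    }
}

// One algorithm: a fixed number of Newton iterations (enough for modest inputs).
class NewtonSqrt implements SqrtContract {
    public float compute(float f) {
        if (f == 0.0f) return 0.0f;
        float x = f;
        for (int i = 0; i < 30; i++)
            x = 0.5f * (x + f / x);
        return x;
    }
}

// A deliberately wrong algorithm, to show the post-condition firing.
class BrokenSqrt implements SqrtContract {
    public float compute(float f) { return f / 2.0f; }
}
```

Calling new NewtonSqrt().sqrt(9.0f) passes both checks, while new BrokenSqrt().sqrt(9.0f) fails the post-condition without either class repeating the checks.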

Much simpler right?

Contracts are an easy way to validate that classes are properly implementing or overriding your methods, and they double as easy validation for debugging. Plus, a nice compiler would have a switch to turn contract checking on and off, because some contracts are only useful for debugging while others are constraints you'll want to leave in.
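Java's assert statement already behaves roughly like such a switch: assertions are skipped unless the JVM is started with -ea (or -enableassertions). Here is a small sketch of debug-only contracts written that way. The class and method names are hypothetical, and Math.sqrt stands in for the real algorithm.

```java
// Hypothetical example: contracts written as asserts are toggled by the JVM's -ea flag.
final class SwitchableContract {
    static float sqrt(float f) {
        // Debug-only pre-condition: checked only when assertions are enabled.
        assert f >= 0.0f : "requires f >= 0.0";

        float result = (float) Math.sqrt(f); // stand-in for the real algorithm

        // Debug-only post-condition, with a tolerance for float rounding.
        assert Math.abs(result * result - f) <= 1e-4f * Math.max(1.0f, f)
                : "ensures result * result == f";
        return result;
    }

    // Reports whether this JVM was started with assertions enabled.
    static boolean contractsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment runs only under -ea
        return enabled;
    }
}
```

Constraints that should never be switched off would stay as ordinary if checks that throw.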

Next article will be on type domains and constraints.

-Kaja

Monday, February 4, 2008

Evolving the Object Oriented Paradigm Part III: States

Continuing off the recent parts of this article, I'll be talking about the lack of "states" in the object-oriented paradigm as well as the State design pattern alternative and its flaws. Then I'll suggest an addition to the concept model from the last part in this series that accommodates states.

Objects should have the ability to change their behavior at run-time based on their "state". For example, a shark behaves differently when he's hungry than when he's not. Its behavior changes, but unlike typical OOP, you're not subclassing. You don't want to write a "HungryShark" class and allocate a new one to replace the non-hungry Shark. It's the same Shark. But now he's hungry. Another example is a Stack class that behaves differently depending on whether it's empty, full, or somewhere in between. We'll be looking at this example from three approaches: "state-less" design, the State design pattern, and my proposal.

State-less design

We'll start by looking at some code which serves as a Stack collection class that uses an internal array for storage. For this example, the Stack will have a maximum height which is passed into the constructor during initialization. And we'll show the first two designs in Java here and use my model for the end proposal.

    class Stack
{
protected Object[] items;
protected int height;

public Stack(int maxSize)
{
items = new Object[maxSize];
height = 0;
}

public void clear()
{
// only clear if not already empty
if(height != 0)
{
// memory wasting solution
items = new Object[items.length];
height = 0;
}
}

public boolean empty() { return height == 0; }

public int getHeight() { return height; }

public Object peek()
{
if(height == 0)
throw new Exception("illegal peek: empty stack");

return items[height - 1];
}

public Object pop()
{
if(height == 0)
throw new Exception("illegal pop: empty stack");

Object o = items[height - 1];
height--;
return o;
}

public void push(Object o)
{
if(height == items.length)
throw new Exception("illegal push: stack full");

items[height++] = o;
}
}


This is an example of "state-less design". Do you see how often we have to check the height? That's right. In almost every method. That's way too often, but unfortunately, this is a very typical solution to designing this sort of class. This design wastes a lot of time determining whether the stack is empty (or full, for push()), and those are only the extreme cases.

The State Pattern

One of the fundamental OO design patterns is the State pattern. It provides an object-oriented way to change an object's behavior depending on its "state", as I mentioned in the beginning. For example, in the above Stack class, a call to pop() when the stack is empty is very different from one when it's not empty.

The solution to this is the State pattern. Using it, we would have our client class maintain a "state" variable that keeps track of which state it is currently in. Each state then gets its own distinct class that inherits a common State interface. And in this example, we'll have a Context class which holds the state and the data so the state classes can modify the Stack's state.

    class Stack
{
private StackContext context;

public Stack(int maxSize)
{
context = new StackContext(maxSize);
}

public void clear()
{
context.getState().clear();
}

public boolean empty()
{
return context.getState().empty();
}

public int getHeight()
{
return context.getState().getHeight();
}

public Object peek()
{
return context.getState().peek();
}

public Object pop()
{
return context.getState().pop();
}

public void push(Object o)
{
context.getState().push(o);
}
}

class StackContext
{
public Object[] items;
public int height;
protected StackState state;

public StackContext(int maxSize)
{
items = new Object[maxSize];
height = 0;
state = new EmptyState(this);
}

public void setState(StackState s)
{
state = s;
}

public StackState getState()
{
return state;
}
}

abstract class StackState
{
protected StackContext context;

public StackState(StackContext c)
{
this.context = c;
}

public abstract void clear();
public abstract boolean empty();
public abstract int getHeight();
public abstract Object peek();
public abstract Object pop();
public abstract void push(Object o);
}

class EmptyState extends StackState
{
public EmptyState(StackContext context)
{
super(context);
}

public void clear()
{
// already empty. do nothing.
}

public boolean empty() { return true; }

public int getHeight() { return 0; }

public Object peek()
{
throw new Exception("illegal peek: empty stack");
}

public Object pop()
{
throw new Exception("illegal pop: empty stack");
}

public void push(Object o)
{
context.items[context.height++] = o;
context.setState(new DefaultState(context));
}
}

// We'll call the state that is non-empty and non-full the "default state"
class DefaultState extends StackState
{
public DefaultState(StackContext context) {
super(context);
}

public void clear()
{
// memory wasting solution
context.items = new Object[context.items.length];
context.height = 0;
context.setState(new EmptyState(context));
}

public boolean empty() { return false; }

public int getHeight() { return context.height; }

public Object peek()
{
return context.items[context.height - 1];
}

public Object pop()
{
Object o = context.items[context.height - 1];
context.height--;
if(context.height == 0)
context.setState(new EmptyState(context));
return o;
}

public void push(Object o)
{
context.items[context.height++] = o;
if(context.height == context.items.length)
context.setState(new FullState(context));
}
}

// inherits some behavior from the "default state"
class FullState extends DefaultState
{
public FullState(StackContext context) {
super(context);
}

public Object pop()
{
Object o = super.pop();
context.setState(new DefaultState(context));
return o;
}

public void push(Object o)
{
throw new Exception("illegal push: stack full");
}
}


Wow, that is ugly, isn't it? Almost makes you want to quit programming right now to avoid it. For a pattern that's supposed to simplify dynamic behavior, that's a lot of new classes and code. I had to add an abstract StackState class that is inherited by our three state classes: EmptyState, DefaultState, and FullState. Also, I needed to create a StackContext class to hold the stack's state and had to move all the stack's private data into the context. This was done so that the state classes could modify the Stack's state and internals without having to make them public, which would allow anyone to change the Stack's state to anything at any time (and yes, in Java, I could have used inner classes to get around contexts, but pretend we don't have inner classes). While some classes may make their state publicly viewable, others will not. I chose to hide the state for this example because it shows the more complicated side of this pattern.

I like the State pattern and use it when I can, but that's a lot of work to get rid of a few if-else/switch statements if you ask me. We have new "if" statements to check the state in push() and pop(), but the other methods no longer have to check the state, which is nice. While each of the states' methods is simpler, the whole design is a lot more complicated. This design needs six classes to represent essentially one concept that changes its behavior between 3 states. We're definitely clogging up the namespace now, and classes that use the State pattern typically have more than 3 states, so this isn't very easy to maintain if it needs to grow.

Also note that none of these states adds any data of its own. Each StackState class only needs to hold onto the context. And yet, every time we switch states we have to allocate a new state object. So in the State design pattern, you're not really changing your behavior; you're allocating a "behavior object" (so to speak) that's really doing the work for the actual object.
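Since the states carry no data, one way to soften the allocation cost in plain Java is to share a single instance of each state. Here is a sketch using an enum, where each constant overrides only the behavior that differs. BoundedStack and its State enum are hypothetical names, the transition helper is my own simplification, and only push, pop, and empty are shown for brevity.

```java
// Hypothetical sketch: stateless State objects shared as enum constants,
// so switching states never allocates.
final class BoundedStack {
    private final Object[] items;
    private int height = 0;
    private State state = State.EMPTY;

    BoundedStack(int maxSize) {
        if (maxSize < 1)
            throw new IllegalArgumentException("maxSize must be at least 1");
        items = new Object[maxSize];
    }

    boolean empty() { return state == State.EMPTY; }
    void push(Object o) { state.push(this, o); }
    Object pop() { return state.pop(this); }

    // Picks the state that matches the current height.
    private void updateState() {
        if (height == 0) state = State.EMPTY;
        else if (height == items.length) state = State.FULL;
        else state = State.DEFAULT;
    }

    private enum State {
        DEFAULT,
        EMPTY {
            @Override Object pop(BoundedStack s) {
                throw new IllegalStateException("illegal pop: empty stack");
            }
        },
        FULL {
            @Override void push(BoundedStack s, Object o) {
                throw new IllegalStateException("illegal push: stack full");
            }
        };

        // Default behavior, inherited by every state that does not override it.
        void push(BoundedStack s, Object o) {
            s.items[s.height++] = o;
            s.updateState();
        }

        Object pop(BoundedStack s) {
            Object o = s.items[--s.height];
            s.items[s.height] = null; // drop the reference so it can be collected
            s.updateState();
            return o;
        }
    }
}
```

This removes the context class and the per-switch allocation, but the stack's data still has to be passed into every state method by hand.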

The new "state"

Now it's time for the proposal. If you haven't guessed by now, the proposal is about incorporating states into the OO model. Imagine languages with built-in state support based on Cecil's "predicate objects", only simplified. Imagine something like this:

    concept Stack
{
interface
{
void clear();
bool empty();
int getHeight();
object peek();
object pop();
void push(object o);
}

implementation
{
protected Object[] items;
protected int height;

public Stack(int maxSize)
{
items = new Object[maxSize];
height = 0;

// Every object has a builtin "state".
// This class starts in the EmptyStack state.
state = EmptyStack;
}

// Basic method implementations of the interface
// are known as the "default state"
// In this example, same as previous "DefaultState" class
// A class will start in the default state if no
// state set in constructor.

public void clear()
{
// memory wasting solution
items = new object[items.length];
height = 0;
state = EmptyStack;
}

public bool empty() { return false; }

public int getHeight() { return height; }

public Object peek()
{
return items[height - 1];
}

public Object pop()
{
object o = items[height - 1];
height--;
if(height == 0)
state = EmptyStack;
return o;
}

public void push(Object o)
{
items[height++] = o;
if(height == items.length)
state = FullStack;
}

// Formal definition of the EmptyStack state.
// states implicitly "inherits" the methods
// from the "default state"
// and can override any it wishes.
state EmptyStack
{
void clear()
{
// already empty. do nothing.
}

bool empty() { return true; }

int getHeight()
{
return 0;
}

object peek()
{
throw new Exception("illegal peek: empty stack");
}

object pop()
{
throw new Exception("illegal pop: empty stack");
}

void push(object o)
{
// Revert to the default state
state = default;

// Calls the default push() instead of
// this push() since state changed back to default
push(o);
}
}

// The stack-full state.
// Only needs to override two methods.
state FullStack
{
void push(object o)
{
throw new Exception("illegal push: stack full");
}

object pop()
{
state = default;
return pop();
}
}
}
}


We invented a solution much simpler than using the State Design Pattern by hand but still bigger and more flexible than the original Stack class design.

So how does it do this? Well, it's important to note that the states belong to an implementation, so they can access the object's internals and state without needing a context or publicly exposing the internals (sort of like inner classes). However, states are not types. You cannot declare an EmptyStack variable or call an EmptyStack constructor. States don't even exist outside the implementation (except in an implementation that inherits it). If they were types, this:
    EmptyStack s = new EmptyStack();
s.push(new Object());


would break static typing on the second line because it's no longer an EmptyStack, and this would cause the world as we know it to implode (or so I've been told).

As of right now in the design, states do not have member variables (aka fields). This allows state switching to occur without any memory allocations at all. However, it would be easy to add fields to be internal to the state which would require an allocation to make room for those fields during a switch or we could follow the Cecil route and have the object size padded on allocation to fit the fields on its largest state (like the C/C++ "union"). The second option requires no memory allocations to switch but may have some unused space depending on the frequency of the state and size of all its fields. If states had fields, they would get their own constructor-esque functions called when switching to that state.

I really wish I had these now.

-Kaja

Thursday, January 31, 2008

Evolving the Object Oriented Paradigm Part II: Concepts

In the last article, I talked about problems of properly modeling conceptual types (in that case, Shape, Rectangle, Rhombus, and Square) to actual types in a typical object-oriented language. I wrote about the rigidity of its inheritance and difficulty of evolving your code or using third-party code.
In this article, I'll propose two simple changes to the existing OOP model that fix all the problems I addressed. In future articles, I'll be proposing more new features but for now I'll be focusing on just two.

The "concept"

In the last article, I used the words "concept" and "conceptually" a lot. It is my opinion that the code should be as close as possible to the conceptual idea/model (which is a tenet of something called "concept programming", which takes this idea to a whole new level that I will not).

For this first proposal, I have thrown out classes and interfaces as the base units of the object-oriented paradigm. Now you're asking, "Thrown out? How can you have OOP without classes and interfaces?" Calm down. They're not gone. But they're no longer separate units. Instead, classes and interfaces have been merged into a new unit called the "concept" (this may be a temporary name; it is not to be confused with concepts from concept-oriented programming or concept programming, which are also not the same thing as each other).

A concept therefore consists of two pieces: the "interface" and the "implementation" (Not the same as the Objective-C equivalents of these words either. It's hard to come up with original names).

The "interface"

Let's take a look at a Shape concept and explain by example:

   abstract concept Shape
{
interface
{
int numSides();
float getArea();
}
}


An interface is a series of public methods that must be implemented by all concepts that inherit Shape. It is similar to interfaces in languages like Java, except that these interfaces are not a separate type (so you can't name the interface). A concept's interface is enclosed in a block as seen above.

An "abstract" concept either has no implementation or an incomplete one that can't be instantiated. More on implementations in a bit. Like the last article, Shape is abstract and will have no implementation so this is all we need to specify for Shape.

Concept inheritance

As with any OO language, there has to be some type of inheritance. Concept inheritance describes an "is a" relationship.

We need to declare our Rectangle concept which inherits Shape so let's declare just the interface part for now:

   concept Rectangle : Shape
{
interface
{
float getHeight();
float getWidth();
}

// implementation to be filled in later
}


(Concept inheritance is represented by ":" in this example but that's irrelevant. Used for simplicity.)

A Rectangle "is a" Shape. But concept inheritance is "interface inheritance" not class/implementation inheritance. Rectangle inheriting Shape means Rectangle is agreeing to implement Shape's interface (in addition to Rectangle's own interface).

Now some of you are asking, "No more implementation inheritance?" Don't worry. It's here. We'll get to it later.

The "implementation"

Now, let's fill in all of Rectangle:

   concept Rectangle : Shape
{
interface
{
float getHeight();
float getWidth();
}

implementation
{
protected float height, width;

public Rectangle(float h, float w)
{
height = h;
width = w;
}

public int numSides() { return 4; }
public float getHeight() { return height; }
public float getWidth() { return width; }
public float getArea() { return height * width; }
}
}


An "implementation" is the default implementation of a concept's interface within that concept. It's like the typical definition of a "class" except it's bound to the concept like the interface and must implement all of its concept's interface (unless it's abstract). Rectangle fulfills its inheritance requirement because it implements the two methods from Shape and the two methods in Rectangle's interface.

Going in the same order as the last article, we'll add Square next. Remember in the last article that we had to create an IRectangle interface and modify Rectangle to implement the new interface?

   concept Square : Rectangle
{
implementation
{
protected float length;

public Square(float l)
{
length = l;
}

// Methods to implement the combined interface from Shape and Rectangle
public int numSides() { return 4; }
public float getArea() { return length * length; }
public float getHeight() { return length; }
public float getWidth() { return length; }

// We'll talk about this in a moment
public float getPerimeter() { return 4 * length; }
}
}


There are a couple of things to note here. First, Square does not specify an interface block because it has nothing of its own to add to the Rectangle interface. Second, we didn't have to create any new types. A concept always has an interface and therefore can always act like one.

Some of you are asking, "Well what if a coder creates a new concept which doesn't inherit anything and doesn't specify any interface but specifies an implementation?" The answer is that only the methods specified in the interface are publicly callable on a concept variable. For example, in my new Square concept, there is a public getPerimeter() method in the implementation but not in its inherited interface, so it's not visible on the Square type. So the second line:

   Square s = new Square(6.0);
float p = s.getPerimeter();


will cause a compile-time error saying getPerimeter() is not a method of Square's interface.

But sometimes you'll want to call getPerimeter() publicly. Sometimes you want to know if a variable is part of the concept just by the interface ("conceptually a Square") or whether it uses (or inherits) the concept's implementation ("physically a Square"). I'll use the type modifier "actual" to denote "physical" relationships.

   Rectangle r1 = new Rectangle(3.0, 4.0); // Legal
actual Rectangle r2 = new Rectangle(3.0, 4.0); // Legal

Rectangle r3 = new Square(6.0); // Legal
actual Rectangle r4 = new Square(6.0); // Illegal. Square does not inherit Rectangle's implementation.

Square s1 = new Square(6.0); // Legal
float p1 = s1.getPerimeter(); // Illegal. getPerimeter() not in Square interface

actual Square s2 = new Square(6.0); // Legal
float p2 = s2.getPerimeter(); // Legal


So this "actual" keyword lets the programmer distinguish between which of the two inheritance types the object really is if needed. More on implementation inheritance right after this bold header:

Implementation inheritance

Implementation inheritance is a wonderful thing. Sometimes. When it's not abused. Everyone's seen it. Blindly inheriting file streams, collections, etc. and not changing any of its behavior rather than using them normally. People overriding one method with no idea about the base class internals and what effects this will have on the object and on other methods. It happens.

I'm hoping this new concept design will encourage implementing the interface as opposed to automatically inheriting. But implementation inheritance is still necessary, so of course, it's included in this little design.

But first, let's define a Rhombus concept like the last article and show the Square implementing Rhombus's interface first.

   concept Rhombus : Shape
{
interface
{
float getSideLength();
}

implementation
{
protected float length;

public Rhombus(float l)
{
length = l;
}

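public int numSides() { return 4; }
// Simplification: side * side is the area only when the angles are right angles.
public float getArea() { return length * length; }
public float getSideLength() { return length; }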
}
}

concept Square : Rectangle, Rhombus
{
implementation
{
// rest of the implementation is the same as previous Square definition

public float getSideLength() { return length; }
}
}


Notice again that we didn't have to define an IRhombus type. We have 4 actual types representing 4 conceptual types.

Also note that Square can inherit multiple concepts easily because we're just inheriting the interfaces and combining them.

Now we'll change Square to inherit the implementation of Rhombus in order to show off implementation inheritance. Implementation inheritance is much like class inheritance in other languages. To change over, all we need to do is write in the inheritance syntax and remove all fields and methods from Square that Rhombus already has (which leaves only the Rectangle-specific methods).

   concept Square : Rectangle, Rhombus
{
implementation : Rhombus // says inherit Rhombus' implementation
{
public Square(float l)
{
super(l);
}

public float getHeight() { return length; }
public float getWidth() { return length; }
}
}


We just switched from implementing the Square interface manually to inheriting from Rhombus to do most of the implementation for us very easily without creating any new types or making old interface types useless.

Behold the flexibility of a concept.

Factories

Up until now, I have solved most of the problems from the last article except for having to use the factory pattern to ensure that:

   IRectangle rect = new Rectangle(6.0, 6.0);


really became a Square. So we settled for this factory method:

   IRectangle rect = Rectangle.create(6.0, 6.0);


But why must we settle? Why can't I do this with concepts:

   Rectangle r = new Rectangle(6.0, 6.0);
Square s = (Square) r;


Well you can do this by using the factory pattern built into this little language here. First we'll add a private constructor to the Rectangle implementation to make our next step legal:

   private Rectangle() { }


Simple. Now we'll introduce the "factory constructor". This is a special type of constructor that returns an object. That object must be a non-null object of its declaring concept (in this case Rectangle).

   public factory Rectangle(float h, float w)
{
if(h == w)
{
return new Square(h);
}
else
{
actual Rectangle r = new Rectangle();
r.height = h;
r.width = w;
return r;
}
}


The factory constructor uses the keyword "factory" to distinguish itself from a typical constructor. Unlike the typical constructor, the factory constructor allocates the memory it needs during the constructor, not before. At the end of the factory constructor there is a built-in check to make sure the returned value is not null. The private Rectangle constructor is still a typical constructor.

So the factory constructor looks like a normal constructor and is very simple to use:

   Rectangle r1 = new Rectangle(3.0, 4.0); // Rectangle
Square s1 = (Square) r1; // Type conversion error. Not a Square

Rectangle r2 = new Rectangle(6.0, 6.0); // Square
Square s2 = (Square) r2; // Works fine.


See how much that simplifies things? Usually, one does not plan on the factory pattern until it's needed, and then has to go back and change existing code to use factory methods instead of constructors. With factory constructors, callers never have to change.
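For comparison, here is roughly what the same idea costs in plain Java: a static factory method that callers must remember to use instead of new. This is only a sketch, assuming Java 8 or later for static interface methods. The Rect, PlainRect, and Sq names and the create() method are placeholders I picked to keep it self-contained.

```java
// Hypothetical Java equivalent of the factory constructor, using a static factory method.
interface Rect {
    float getHeight();
    float getWidth();

    // Returns a Sq when the sides match, otherwise a PlainRect.
    static Rect create(float h, float w) {
        if (h == w)
            return new Sq(h);
        return new PlainRect(h, w);
    }
}

final class PlainRect implements Rect {
    private final float height, width;
    PlainRect(float h, float w) { height = h; width = w; }
    public float getHeight() { return height; }
    public float getWidth() { return width; }
}

final class Sq implements Rect {
    private final float length; // one field instead of two
    Sq(float l) { length = l; }
    public float getHeight() { return length; }
    public float getWidth() { return length; }
}
```

Here Rect.create(6.0f, 6.0f) yields a Sq and Rect.create(3.0f, 4.0f) yields a PlainRect, but any code that was already written with new has to be changed to call create().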

And you know what else it's great for? Object pooling/caching. For example, did you know that in Java 5, autoboxing an "int" to an "Integer" calls the static Integer.valueOf(int) method and not Integer's constructor (the other Number wrapper classes do the same thing)? Why does it do this? Because the Integer class keeps a static array of Integer objects (Byte, Short, and Long do this too) for values -128 to 127, and if the passed-in value is in that range, you get the corresponding object from that array; otherwise it allocates a new one. So Integer.valueOf(0) always returns the same object (for that class loader). And if they had factory constructors, they could move that logic into the constructor syntax and save some memory for people who don't know to use Integer.valueOf().
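The caching is easy to observe. In this tiny sketch (the class and method names are mine), two calls to Integer.valueOf() with the same small value hand back the same object. Only values from -128 to 127 are used, since that is the range the Integer.valueOf() documentation promises to cache; the upper bound can be raised by a JVM option, so larger values are left out on purpose.

```java
// Demonstrates the Integer cache described above; the class name is illustrative.
final class IntegerCacheDemo {
    // True when two boxings of the same value share one object.
    static boolean sameObject(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(sameObject(0));    // prints "true"
        System.out.println(sameObject(127));  // prints "true"
        System.out.println(sameObject(-128)); // prints "true"
    }
}
```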

Review

I did not introduce a single revolutionary thought about the object-oriented paradigm in this article. I glued two existing standalone constructs together into the new standalone construct called a concept. I added a special syntax and a tiny bit of new constructor behavior to make factory constructors. That's it. And look at all the things from the last article I solved. These two little ideas provide a great deal more flexibility than the old model.

So why has the object-oriented paradigm stopped evolving? Why are other languages not looking for things like this? Why are they just carbon copying everything?

In upcoming articles, I'll be adding more features to this new model. Some of them new. Some of them reused and modified to fit this model. But all of them about simplifying and adding flexibility to the OO model.

-Kaja

Wednesday, January 30, 2008

Evolving the Object Oriented Paradigm Part I

Object oriented programming is a great paradigm. Its biggest problem is that it's dead. "What?" you ask. "Object oriented programming is the majority of the industry and continues to grow with new OO languages with new features all the time!" This is true. But the OO paradigm itself has stopped evolving. The cornerstones of OOP like classes, interfaces, inheritance, etc. are virtually carbon copies of each other across languages. Not many languages work towards improving these cornerstones. What was the last language to really improve upon the idea of a "class"? We hide the missing features and rigidity of OOP with design patterns, mixins, delegates, functional programming concepts, etc., which are all good to have, but we shouldn't need to rely on them.

Proving by example

In this article, I'm going to walk through, piece by piece, a supposedly simple relationship of just 3 types: Shape, Rectangle, and Square. For the sake of this article, we'll be talking mostly about statically-typed languages like Java, C#, D, etc., and all three types are immutable (meaning you can't change the shape's internal data after it's instantiated). Sounds easy, right? Well, let's see, and at the end I'll describe how I think OOP could be improved to easily accommodate the problems.


So let's start with the basic type: Shape. A Shape in this example has only two methods: numSides(), which returns its number of sides, and getArea(), which returns its area. The Shape class has no data itself at the moment and can't be instantiated, so the designer has to choose between an abstract Shape class or a Shape interface (to some this is an easy decision, but at this point it's more about personal preference).


For this example, I'll assume Shape will never have any data or method implementations so I'll make an interface:


   interface Shape
   {
       int numSides();
       float getArea();
   }

Okay. One type done. Easy right? Keep reading.

Now it's Rectangle's turn. I want rectangles to be a concrete (or non-abstract) type so I make a class:

   class Rectangle implements Shape
   {
       protected float height, width;

       public Rectangle(float h, float w)
       {
           height = h;
           width = w;
       }

       public int numSides() { return 4; }
       public float getHeight() { return height; }
       public float getWidth() { return width; }
       public float getArea() { return height * width; }
   }

NOTE: I'm using Java syntax just to be uniform but it doesn't matter.

Now you're saying, "Two types done and I fail to see a problem." Keep reading.


Now it's time for Square. Square represents a fun quirk of OOP. Conceptually, a square is a rectangle where all of its sides are equal. When represented in a programming language, it needs less internal data than its parent class. Squares really only need the side length. Now you could just have a Square constructor that takes in a length and passes it on to Rectangle's constructor as both the height and width. However, keeping both the height and the width is redundant. Now, one extra float per Square is trivial on a modern machine. But imagine you need tens of thousands of them. And imagine this example is about two other classes in the same situation, only the subclass needs 1 kilobyte less data than its parent class (maybe the parent uses a 1 kB array for part of an algorithm that the subclass doesn't use). Now you are saving megabytes of memory in your program. And that can be significant. Especially if your project has a memory usage limit placed on it by your customer (those are always fun).


Since this isn't a third-party supplied class, we can make an IRectangle interface (I hate .NET for this naming convention) that Rectangle and Square will share and we'll place getHeight() and getWidth() in there:


   interface IRectangle extends Shape
   {
       float getHeight();
       float getWidth();
   }

   class Rectangle implements IRectangle
   {
       // Implementation is the same as previous example.
   }

   class Square implements IRectangle
   {
       protected float length;

       public Square(float l)
       {
           length = l;
       }

       public int numSides() { return 4; }
       public float getArea() { return length * length; }
       public float getHeight() { return length; }
       public float getWidth() { return length; }
   }

Under Java naming conventions, you may keep the interface called "Rectangle" and call the class "RegularRectangle" or "BaseRectangle" or some other fun name. No matter the naming convention, you just introduced an unnecessary type. We have 4 actual types to represent 3 conceptual types. And if you had any code before Square was created that used the Rectangle class, you have to go back and change it.

And if Rectangle was from a third-party library that we couldn't change, the best non-inheritance method would be to create an adapter/wrapper class that took in a Rectangle and implemented our IRectangle interface:


   class RectangleWrapper implements IRectangle
   {
       protected Rectangle rect;

       public RectangleWrapper(Rectangle r)
       {
           rect = r;
       }

       public int numSides() { return rect.numSides(); }
       public float getArea() { return rect.getArea(); }
       public float getHeight() { return rect.getHeight(); }
       public float getWidth() { return rect.getWidth(); }
   }

Hooray for unnecessary classes. Don't get me wrong. The Adapter pattern is great but it's ridiculous sometimes that you have to produce things like what's above. And we introduced yet another type. It took 5 actual types to represent 3 conceptual types.


But wait. There's more.

Okay, going back to our previous 4 type solution. Think that's all of the problems? Thinking it's only one extra type, what's the big deal? Read on.

Okay, let's imagine we have our 4 types: Shape, IRectangle, Rectangle, and Square. And someone (maybe the customer) decides this little library needs a Rhombus type. Piece of cake right?

Conceptually, a rhombus is a 4-sided polygon with all sides of an equal length. By definition, a square is a rhombus. But a square is also still a rectangle. From the current setup you have two options.

The first option is to declare an IRhombus interface and a Rhombus class, and make Square also implement IRhombus. This can be seen here:

   interface IRhombus extends Shape
   {
       float getSideLength();
   }

   class Rhombus implements IRhombus
   {
       protected float length;

       public Rhombus(float l)
       {
           length = l;
       }

       public int numSides() { return 4; }
       // Simplified: a real rhombus's area also depends on its interior angle.
       public float getArea() { return length * length; }
       public float getSideLength() { return length; }
   }

   class Square implements IRectangle, IRhombus
   {
       // rest of the implementation is the same

       public float getSideLength() { return length; }
   }

This solution works. It's flexible. But we had to add 2 types. We're up to 6 real types representing 4 conceptual types.


The second option is to not have an IRhombus interface and just have Square inherit Rhombus because they have the same internal data. But conceptually, is a Square a Rhombus acting like a Rectangle? Or is it a Rectangle acting like a Rhombus? It's neither, because a Square IS both at the same time. And since most statically typed OOP languages now disallow multiple inheritance, one must sacrifice the true definition for the "next best" definition. In this very simple example, it's clear which to pick but that's not always the case. If these Shape classes were more in depth (maybe so they could be drawn onto a screen), a Rhombus would need extra data like the internal angles of the shape, which Square would not need.
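Here is a rough sketch of what that second option could look like. It keeps the simplified Rhombus that stores only a side length, and Square has to pick Rhombus as its one parent class while getting its "rectangle-ness" through the IRectangle interface. Treat it as an illustration of the trade-off, not a recommended design:

```java
interface Shape {
    int numSides();
    float getArea();
}

interface IRectangle extends Shape {
    float getHeight();
    float getWidth();
}

class Rhombus implements Shape {
    protected float length;

    public Rhombus(float l) { length = l; }

    public int numSides() { return 4; }
    // Simplification: only correct when the angles are 90 degrees.
    public float getArea() { return length * length; }
    public float getSideLength() { return length; }
}

// Square inherits Rhombus's data and only "acts like" a rectangle.
class Square extends Rhombus implements IRectangle {
    public Square(float l) { super(l); }

    public float getHeight() { return length; }
    public float getWidth() { return length; }
}
```

The moment Rhombus grows an angle field, every Square carries it around too, which is exactly the redundant-data problem we were trying to avoid.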

And all we did was add a Rhombus. We completely ignored plenty of 4 sided shapes: trapezoids, parallelograms, rhomboids, kites, and quadrilaterals that don't fit into any other category. Then we need round shapes, triangles, pentagons, hexagons, etc.

It gets worse.

Imagine someone using these classes wrote the following code:

   Rectangle rect = new Rectangle(6.0f, 6.0f);


This is perfectly legal. It's still a valid rectangle. But it's also a square. However any type checks of this object against a Square will fail because someone declared it with Rectangle's constructor. This can happen a lot if the passed in height and width are unknown at compile time and stored in variables.
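To make the failed type check concrete, here is a minimal sketch built on the Shape, IRectangle, Rectangle, and Square types from earlier. The TypeCheckDemo class and its isSquare() helper are names invented just for this demo:

```java
interface Shape {
    int numSides();
    float getArea();
}

interface IRectangle extends Shape {
    float getHeight();
    float getWidth();
}

class Rectangle implements IRectangle {
    protected float height, width;

    public Rectangle(float h, float w) { height = h; width = w; }

    public int numSides() { return 4; }
    public float getArea() { return height * width; }
    public float getHeight() { return height; }
    public float getWidth() { return width; }
}

class Square implements IRectangle {
    protected float length;

    public Square(float l) { length = l; }

    public int numSides() { return 4; }
    public float getArea() { return length * length; }
    public float getHeight() { return length; }
    public float getWidth() { return length; }
}

class TypeCheckDemo {
    // Hypothetical helper: the kind of type check client code might do.
    static boolean isSquare(IRectangle r) {
        return r instanceof Square;
    }

    public static void main(String[] args) {
        IRectangle rect = new Rectangle(6.0f, 6.0f);
        // Geometrically a square, but the runtime type says otherwise.
        System.out.println(rect.getHeight() == rect.getWidth()); // true
        System.out.println(isSquare(rect));                      // false
    }
}
```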

Typical solution to this? Factory pattern. The factory pattern is either a method or a class of methods that takes in input and decides which class to allocate and return. We'll put it in the Rectangle class (because we can't put it in IRectangle where it belongs) as a static method to avoid creating a factory class and for another reason you'll see in a moment.

   // The following is part of the Rectangle class

   public static IRectangle create(float height, float width)
   {
       if(height == width)
           return new Square(height);
       else
           return new Rectangle(height, width);
   }

   static void test()
   {
       IRectangle r1 = new Rectangle(3.0f, 4.0f); // Rectangle
       IRectangle r2 = new Rectangle(6.0f, 6.0f); // Rectangle

       IRectangle r3 = Rectangle.create(3.0f, 4.0f); // Rectangle
       IRectangle r4 = Rectangle.create(6.0f, 6.0f); // Square
   }



And to truly prevent the "public" and inheriting classes from using Rectangle's constructor to make Squares like r2 above, we need to make the constructor private so they can only use the factory method. If we had used a separate factory class, we would have to keep Rectangle's constructor public, which takes away the added safety the static method has. So now that we have the factory method, any old code using "new Rectangle" (and preferably "new Square" too for uniformity) needs to be changed to use our factory method.
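Put together, the locked-down version could look something like the sketch below. It only shows the idea: Rectangle's constructor is private, so create() is the only way in from outside the class. Square's constructor is left public here to keep the example short.

```java
interface Shape {
    int numSides();
    float getArea();
}

interface IRectangle extends Shape {
    float getHeight();
    float getWidth();
}

class Square implements IRectangle {
    protected float length;

    public Square(float l) { length = l; }

    public int numSides() { return 4; }
    public float getArea() { return length * length; }
    public float getHeight() { return length; }
    public float getWidth() { return length; }
}

class Rectangle implements IRectangle {
    protected float height, width;

    // Private: code outside this class cannot write "new Rectangle(...)".
    private Rectangle(float h, float w) { height = h; width = w; }

    // The factory method decides which concrete class to allocate.
    public static IRectangle create(float height, float width) {
        if (height == width)
            return new Square(height);
        else
            return new Rectangle(height, width);
    }

    public int numSides() { return 4; }
    public float getArea() { return height * width; }
    public float getHeight() { return height; }
    public float getWidth() { return width; }
}
```

A side effect worth noticing: with only a private constructor, nobody outside Rectangle can subclass it anymore either.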

And we need to do the same thing to Rhombus.

The Point

Why is good object-oriented design this complicated to change? Unless you design everything perfectly before you write any code (good luck with that in the real world), your code will be constantly evolving. New features and new/different/misunderstood customer requirements evolve the software past the initial design. Changes made today may be taken back out in a month.

The object-oriented paradigm needs a language that can adapt to these things. In the next article, I'll present a proposal for a simpler OOP that deals with all these problems. And it represents Shape, Rectangle, Rhombus, and Square without redundant data (and regardless of third-party or not) in exactly 4 types.

See you tomorrow for Part II.

-Kaja

Sunday, January 27, 2008

Numbers in an Object-Oriented World, Part II

Yes, I have a blog again. I need a place to rant as I design PX. Not spending money on getting my domain name back though when I don't know if this will last.

Over a year ago (on my old site), I wrote a post on representing numbers in an object oriented language. It talked about how most languages ignore the fact that every integer is a rational. Every rational is a real. Every real is complex. And all these are numbers. You could invent a hierarchy, but it would have to consist of interfaces or data-less abstract classes, because an integer requires far less data than a complex number and having to inherit the complex number's two floating point fields (or whatever they are) isn't what people want to hear.

At least Java made a Number base class. A tiny step in the right direction. C# ignores number hierarchies completely (and don't even get me started on what a copout boxing and unboxing is). But Java misses out on so much more that can be put in there. Java has five integer-data classes: Byte, Short, Integer, Long, and BigInteger. What if I just wanted to know that my Number object was an integer and didn't care which precision it has? Do they share a common Integer interface? Of course not.
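Without a shared interface, the best you can do is check each class by hand. A small sketch of what that ends up looking like (IntegerCheck and isInteger() are made-up names, and it only covers the five classes listed above):

```java
class IntegerCheck {
    // No common integer type exists, so every class is tested one by one.
    static boolean isInteger(Number n) {
        return n instanceof Byte
            || n instanceof Short
            || n instanceof Integer
            || n instanceof Long
            || n instanceof java.math.BigInteger;
    }
}
```

Any integer-like Number subclass that isn't on this list gets missed, which is the whole problem.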

Operator overloading exists only to justify mathematical operators in an OO language. But really I only like operator overloading for numeric types. C++'s operator abuse is the prime reason. Operator overloading also presents a fun challenge for number classes. The seemingly simplistic type promotion for operations that we take for granted (such as converting an int to a float because it's being added to another float) makes truly representing numbers difficult when a number's true type isn't known until run time. Also you need to account for everything. For example:

   Number x = someFunc();
   Number y = x + 32.0;

If "x" is an Integer, the typical rules say it gets promoted to Float, and then added. Should Integer's operators know this? Because if so, Float's operators must know to promote Integers too if the expression were reversed to "32.0 + x". Should Integer delegate to Float's operators and trust they will know what to do with Integers? Suppose x is an object from some custom number class; the Float then needs to know to delegate the operation to the custom number class. And in turn, if the custom number class doesn't know how to add to floats, it may delegate back to Float and it repeats in a never-ending cycle. So many dilemmas from so simple a mathematical problem.
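The never-ending cycle is easy to reproduce. Java has no operator overloading, so this sketch uses an add() method instead of "+". Everything in it (Num, FloatNum, CustomNum, Delegation, and the depth counter) is hypothetical and exists only to make the ping-pong visible. Without the cut-off, the two classes would keep handing the operation to each other until the stack overflowed.

```java
// Hypothetical number interface; "depth" counts hand-offs between operands.
interface Num {
    Num add(Num other, int depth);
}

class Delegation {
    static final int MAX_HANDOFFS = 8; // arbitrary cut-off for the demo

    // Passes the operation to the other operand, or gives up if we keep bouncing.
    static Num handOff(Num target, Num arg, int depth) {
        if (depth >= MAX_HANDOFFS) {
            throw new IllegalStateException("delegation cycle: nobody knows how to add these");
        }
        return target.add(arg, depth + 1);
    }
}

class FloatNum implements Num {
    final float value;

    FloatNum(float value) { this.value = value; }

    public Num add(Num other, int depth) {
        if (other instanceof FloatNum) {
            return new FloatNum(value + ((FloatNum) other).value);
        }
        // Unknown type: trust the other operand to know what to do.
        return Delegation.handOff(other, this, depth);
    }
}

class CustomNum implements Num {
    final int value;

    CustomNum(int value) { this.value = value; }

    public Num add(Num other, int depth) {
        if (other instanceof CustomNum) {
            return new CustomNum(value + ((CustomNum) other).value);
        }
        // Doesn't know floats either, so it hands the problem right back.
        return Delegation.handOff(other, this, depth);
    }
}
```

Calling new FloatNum(32.0f).add(new CustomNum(1), 0) bounces between the two classes until the cut-off throws. Adding two FloatNums works fine because one side actually knows the answer.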

Another unfortunate side effect of numbers in OOP is "static classes" (meaning classes with all static data and methods, no constructors, and can't be inherited). It's like the language designers thought they were done and then wondered "what the hell do we do with the math functions like pow and sin?" Good OO practice would say throw them in a normal class that can be instantiated and have a Math interface or something for usability. But the Math object wouldn't need a state, and nobody wanted a stateless Math class where you had to make a Math object before using a math function. So they threw them in a "static class" which serves no purpose other than being a namespace. I mean come on. A Java book I own defines a class as "a group of objects that share common state and behavior. A class is an abstraction or description of an object." Well, the Math class can have no objects. It fails this definition on so many levels. Is it really that important that every method be in a class? Couldn't they settle for modules?

So why am I ranting? Why is this important to me? Because in my PX design decisions, this is what I'm thinking about. How can I make a (statically-typed) language flexible enough to support true object oriented numbers?

In the future, I'll probably be showing off various PX ideas like replacing the typical Java/C# "class" and "interface" with a more flexible model.

-Kaja