Jan 26, 2007

You're being lied to

If you're among the crowd who have migrated an OOP-based application from PHP4 to PHP5, then I'm sure you've heard the expression "Objects are copied by reference by default in PHP5". Whoever told you that was lying.

Now, to be fair, it's an innocent lie, since objects do behave in a reference-like manner, but references are NOT what they are. Let's start with a simple illustration proving that they aren't references:

<?php
$a = new stdClass;
$b = $a;
$a->foo = 'bar';
var_dump($b);
/* Notice at this point, that $a and $b are,
* indeed sharing the same object instance.
* This is their reference-like behavior at work.
*/

$a = 'baz';
var_dump($b);

/* Notice now, that $b is still that original object.
* Had it been an actual reference with $a,
* it would have changed to a simple string as well.
*/
?>

What's going on here? Well, the answer is easiest to explain by describing the underlying structure of objects. In PHP5, a variable containing an object identifies the instance by storing a simple numeric value. When an action is going to be performed on an object, that numeric value is used with a lookup table to retrieve the actual instance. In PHP4, by contrast, a variable containing an object identifies that object by carrying around the actual properties table itself. What this means in practice is that when you assign (not by reference) a PHP5 object to a new variable, that integer handle is copied into the new variable, but it still points at the same instance, because it's still the same number. Assigning a PHP4 object, however, means copying all the properties, effectively generating a new instance, since changes to one will not affect the other.
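If you want to look at those handles yourself, here's a minimal sketch. It assumes PHP 7.2 or later, because spl_object_id() doesn't exist in the PHP5 releases this post is about (spl_object_hash() is the older alternative), and the variable names are only for illustration. It compares a plain assignment against clone, which is how you ask for an actual second instance:

```php
<?php
$a = new stdClass;
$b = $a;          // plain assignment: only the handle is copied
$c = clone $a;    // clone: a distinct (shallow-copied) instance

$a->foo = 'bar';  // modify the instance that $a and $b both identify

var_dump(spl_object_id($a) === spl_object_id($b)); // bool(true)
var_dump(spl_object_id($a) === spl_object_id($c)); // bool(false)
var_dump(isset($b->foo));                          // bool(true)
var_dump(isset($c->foo));                          // bool(false)
```

Assignment leaves you with two variables carrying the same identifier, while clone hands back an instance with an identifier of its own.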

To put this another way, PHP4 objects are basically Arrays with functions associated with them, while PHP5 objects are basically Resources (a la MySQL result handles, or file pointers), again with functions loosely associated with them. Consider the following code in PHP4 (or any version):

<?php
$fp = fopen('foo.txt', 'w');
$otherVar = $fp;
fwrite($fp, "One\n");
fwrite($otherVar, "Two\n");
fclose($fp);

/* This fails, because the file is closed */
fwrite($otherVar, "Three\n");
?>

You'd fully expect data to be written to the same file, as though you'd used $fp everywhere, rather than interchanging the variables, right? Well, PHP5 objects are the same. The instance itself isn't duplicated when you assign to a new variable, just the unique identifier.
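The same thing happens when an object is passed to a function without an ampersand. Here's a short sketch of that, with two hypothetical helper functions (rename_in_place() and replace_local() are my own illustrative names, not anything from PHP itself):

```php
<?php
// Hypothetical helper: changes a property through the copied handle.
function rename_in_place($obj) {
    $obj->name = 'changed';     // same instance the caller is holding
}

// Hypothetical helper: rebinds its own local variable to a new object.
function replace_local($obj) {
    $obj = new stdClass;        // only the local copy of the handle changes
    $obj->name = 'replaced';
}

$o = new stdClass;
$o->name = 'original';

rename_in_place($o);
var_dump($o->name); // string(7) "changed"

replace_local($o);
var_dump($o->name); // string(7) "changed"
```

The first call reaches the caller's instance because the handle was copied into the function. The second call has no effect on the caller, which is exactly what would not be true if $obj were a real reference.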


I'm lying to you also

"Copying" a variable doesn't exactly mean copying. Take the following code block:

<?php
$a = 'foo';
$b = $a;
$a = 'bar';
?>

Now, you know PHP well enough to know that by the end of this code block, the value of $b will still be 'foo'. What you may not know is that the original copy of 'foo' that was in $a was never actually duplicated.

To understand what PHP is doing, you need to understand the internal structure of the variable and how it relates to userspace visible variable names ('a' and 'b' in this case). First off, the actual contents of a variable (known as a zval) consist of four parts: type (e.g. NULL, Boolean, Integer, Float, String, Array, Resource, Object), a specific value (e.g. 123, 3.1415926535, etc.), is_ref, a flag indicating whether the value is a reference or not, and refcount, which tells how many labels are sharing this value.

What you think of as a variable (e.g. $x) is actually just a label; that label ('x' in this case) is used as a lookup to find the zval which contains the actual value. These are just like keys in an associative array; in fact, the mechanisms are identical.

With me so far? Good. Now, when you first create a variable (e.g. $x = 123;), PHP allocates a new zval for it, stores the specific value, and associates the label with the value:

  'x' => zval ( type       => IS_LONG,
                value.lval => 123,
                is_ref     => 0,
                refcount   => 1 )

So far, refcount is 1 since the zval value is only being referenced by one label. If we now put this value into a full-reference set using $y =& $x;, the same zval is reused. It's simply associated with a new label and its reference counters are adjusted accordingly.

  'x' => zval ( type       => IS_LONG,
       |        value.lval => 123,
       |        is_ref     => 1,
       |        refcount   => 2 )
  'y' /

This way, when you later change the value of $x, $y appears to change as well because it's looking at the same internal value. But what if we hadn't done a reference assignment? What if we'd done a normal assignment, $y = $x;, instead? Surprisingly, the result would be almost the same.

  'x' => zval ( type       => IS_LONG,
       |        value.lval => 123,
       |        is_ref     => 0,
       |        refcount   => 2 )
  'y' /

Again, the original zval associated with $x is reused; the only difference this time is that is_ref is not set to 1. This is known as a copy-on-write reference set (as opposed to the full-reference set described above). This 0 flag tells the engine that if anyone tries to change this value (regardless of which label they use to reach it), the change has to be made on a separate zval so that the other labels sharing the original are left alone. Here's what happens if we take that current state and run $x = 456;

  'y' => zval ( type       => IS_LONG,
                value.lval => 123,
                is_ref     => 0,
                refcount   => 1 )
  'x' => zval ( type       => IS_LONG,
                value.lval => 456,
                is_ref     => 0,
                refcount   => 1 )

$x has been disassociated from the original zval (thus dropping its refcount back to 1), and a new zval has been created for it.
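You can watch copy-on-write happen from userspace, at least roughly. The sketch below is my own illustration rather than anything the engine documents: it uses a large string instead of an integer so that a real duplication shows up in memory_get_usage(), and the exact byte counts will differ between PHP versions and builds, so treat the numbers as indicative only.

```php
<?php
// A large value, so that an actual duplication is easy to spot.
$x = str_repeat('a', 1000000);

$before = memory_get_usage();
$y = $x;                            // plain assignment: value is shared
$afterAssign = memory_get_usage();
$y[0] = 'b';                        // first write through $y: separation
$afterWrite = memory_get_usage();

echo "Growth after assignment: ", $afterAssign - $before, " bytes\n";
echo "Growth after write:      ", $afterWrite - $afterAssign, " bytes\n";

var_dump($x[0]); // string(1) "a"
var_dump($y[0]); // string(1) "b"
```

The assignment should cost next to nothing, while the write should cost roughly the size of the string, because that is the moment the engine finally has to give $y a value of its own.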


Why referencing when you don't have to is a bad idea

Let's consider one more situation. Take a look at this code block:

<?php
$a = 'foo';
$b = $a;
$c = &$a;
?>

At the first instruction, a single zval is created, associated to a single label:

  'a' => zval ( type          => IS_STRING,
                value.str.val => 'foo',
                is_ref        => 0,
                refcount      => 1 )

At the second instruction, that zval is associated with a second label. So far so good:

  'a' => zval ( type          => IS_STRING,
       |        value.str.val => 'foo',
       |        is_ref        => 0,
       |        refcount      => 2 )
  'b' /

At the third instruction, however, we run into problems. Since this zval is already tied up in a copy-on-write reference set which includes $b, that zval can't simply be promoted to is_ref==1. Doing so would drag $b into $a and $c's full-reference set, and that would be wrong. To resolve this, the engine is forced to duplicate that zval into two identical copies, from which it can begin to shuffle around reference flags and counts:

  'b' => zval ( type          => IS_STRING,
                value.str.val => 'foo',
                is_ref        => 0,
                refcount      => 1 )
  'a' => zval ( type          => IS_STRING,
       |        value.str.val => 'foo',
       |        is_ref        => 1,
       |        refcount      => 2 )
  'c' /

Now you've got two copies of the same literal value, so you're wasting memory on the storage and processing time on actually making the duplication. Since a LOT of events lead to copy-on-write uses (including simply passing an argument to a function), this sort of forced duplication actually happens very commonly when you start involving actual references.


The moral of the story

Assigning values by reference when you don't need to (that is, when you have no intention of later modifying the original value through a different label) is NOT a case of you outsmarting the silly engine and gaining speed and performance. It's the opposite: it's you TRYING to outsmart the engine and failing, because the engine is already doing a better job than you think.
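To make that concrete, here's a rough sketch of handing a big array to a function by value, with no ampersand in sight. The function name sum_and_measure() is hypothetical, and the memory figure is only indicative since it depends on your PHP version and build:

```php
<?php
// Hypothetical helper: receives the array by value and only reads from it.
function sum_and_measure(array $items) {
    $total = array_sum($items);             // read-only: no separation needed
    return array($total, memory_get_usage());
}

$data = range(1, 100000);

$before = memory_get_usage();
list($total, $inside) = sum_and_measure($data);   // passed by value

echo "Growth while inside the function: ", $inside - $before, " bytes\n";
```

If passing by value really meant duplicating the array, the growth reported here would be on the order of the array's own size. Because the function never writes to $items, the engine just shares the caller's value for the duration of the call.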

How does this reflect on objects? They're not special. They're not different from other variables. They are not pretty snowflakes. In this code block:

<?php
$a = new stdClass;
$b = $a;
?>

The labels are still placed into copy-on-write reference sets. What's important is that even when a duplication does occur, (A) only that unique integer is copied (which is cheap), and (B) the duplicated integer still points to the same place. Hence you get reference-like behavior, but not an actual reference by default.
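For contrast, here's what an actual reference to an object variable looks like next to a plain assignment. This is a small sketch with variable names of my own choosing, building on the very first example in this post:

```php
<?php
$a = new stdClass;
$copy = $a;     // plain assignment: $copy gets its own copy of the handle
$ref  =& $a;    // real reference: $ref and $a are now the same variable

$a = 'baz';     // replace whatever $a holds with a string

var_dump($copy instanceof stdClass); // bool(true)
var_dump($ref);                      // string(3) "baz"
```

$copy keeps identifying the original object, while $ref follows $a to the new string. That difference is the whole distinction between reference-like behavior and a real reference.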

Hungry for more? Check out my coverage of the zval.

12 comments:

  1. Hey Sara,

    This is great stuff, thanks. I've been coding in PHP for 4 years now after 15+ years in other languages and while I knew the ins and outs of the other key languages I coded in I simply have not ever learned the details of PHP and reference assignment but recently I've really needed to know it. So thanks!

    -Mike

  2. Wouldn't this be read only, and thus not allow any use of fwrite... in your example above?

    $fp = fopen('foo.txt', 'r');

  3. This is the best article I have read so far about PHP references.

  4. Hi Sara,

    This is really a great article about how PHP handles variables and references.
    But, unfortunately, your "Check out my coverage of the zval." link appears to be dead:

    >Error: 404
    >Sorry, no posts matched your criteria.

    If you still have a copy, could you please either post it on your blog or renew link in the article?

  5. cool article, you should blog more about this stuff! and i see you haven't blogged recently:(

  6. The updated link to "Check out my coverage of the zval":
    http://devzone.zend.com/article/1022
    that redirects to:
    http://devzone.zend.com/317/extension-writing-part-ii-parameters-arrays-and-zvals/

  7. Amazing, you should definitely blog more!

    Thanks a lot.

  8. I have a question about it.

    If I do:

    class MyClass {

    private $someObject;

    public function __construct( SomeObject $someObject ) {

    $this -> someObject =& $someObject;
    }
    }

    Am I using for class property the same allocated space in this huge table designed to $someObject?

  9. The very first example seems to contradict your premise. That IS how references work... If you have two variables referencing the same thing, and you change one to reference something else, the first does not (and should not) change to point to the new thing also. That would completely defeat the purpose of being able to use multiple handles to refer to the same thing.

  10. I'm hungry for more! But the "coverage of the zval" link at the bottom of the page appears to be broken, unfortunately.

  11. Excellent!
    Thanks. I will read your zval article tomorrow...

  12. Great great article that made it look so easy to understand the zvals and how php works internally. I had read about zvals elsewhere but none made much sense. Thanks for super article. You must keep on writing. Thanks
