Why collections?


Hack has introduced a set of new collection objects to replace the PHP array. These collections include Vector, Set, Map and Pair. If you are familiar with Java or C# you probably know the benefits of each of these collections already. You might be thinking:

Why would you want to replace the PHP array? I use it for everything...

That is the problem. PHP arrays are used everywhere and for everything. The keys might be ordered, unordered, strings, integers. A value might be a boolean, string or another array. It is impossible to predict the behavior of an array. PHP arrays are really “one size fits all”.

So, why collections?

Each collection object solves a specific problem. When you need to have a collection of unique values you should use a Set. If you need a mini data storage that you refer to with keys, you probably should use a Map.

Using collection objects has multiple benefits over the PHP array:

  • Better performance, because the collection objects can be cached more aggressively.
  • PHP arrays are not objects, so a function that receives an array as a parameter works on its own copy. Collection objects are passed by reference, so no copy is made.
  • You may use the type checker with the collection objects.
  • Collection objects have methods such as filter(), map() and contains() that make them easier to work with.
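
The second and fourth points can be illustrated with a minimal sketch. The helper addCherry() and the fruit names are made up for this example, and the code assumes an HHVM version that accepts array() literals and top-level statements, like the other examples in this post.

```hack
<?hh

// Hypothetical helper: appends a value to whatever it is given.
function addCherry($fruits)
{
    $fruits[] = 'cherry';
}

function demo()
{
    $array = array('apple', 'banana');
    $vector = Vector {'apple', 'banana'};

    addCherry($array);  // the function modifies its own copy
    addCherry($vector); // the function modifies the same object

    echo count($array)."\n";  // 2
    echo count($vector)."\n"; // 3

    // filter() and map() return new Vectors and leave $vector untouched.
    $long = $vector->filter($name ==> strlen($name) > 5);
    $upper = $vector->map($name ==> strtoupper($name));

    echo count($long)."\n"; // 2 (banana and cherry)
    echo $upper[0]."\n";    // APPLE
}

demo();
```

The array keeps its two elements because the function only ever saw a copy, while the Vector grows to three.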

Immutable instances

Most of the collection objects have an immutable version. Say that you have a Map that you have populated with some data. At some point in the Map object’s lifecycle you know that it should not be changed any more. Then you may create an ImmMap that does not allow any changes.

function getFruits()
{
    $basket = Map {'apples' => 4, 'oranges' => 8, 'peaches' => 2};
    $basket['bananas'] = 12;

    return $basket;
}

function askAStrangerToHoldBasket($basket)
{
    // ...
    return $basket;
}

$basket = getFruits();
$returnedBasket = askAStrangerToHoldBasket($basket->toImmMap());

// You can be sure that the stranger has not added or removed anything.

Immutable objects can be cached more aggressively because we know for sure that they will not change their state. They are also thread safe. But the most obvious reason is simplicity: they are easier to understand and predict.
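
A short sketch of what toImmMap() gives you: the ImmMap is a snapshot, so later changes to the original Map do not show up in it. The variable names are only illustrative.

```hack
<?hh

function demo()
{
    $basket = Map {'apples' => 4, 'oranges' => 8};
    $frozen = $basket->toImmMap();

    // Changing the original does not affect the immutable copy.
    $basket['bananas'] = 12;

    echo count($basket)."\n"; // 3
    echo count($frozen)."\n"; // 2
    var_dump($frozen->containsKey('bananas')); // bool(false)
}

demo();
```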

If you design a card game, you could make each card an immutable object, since you know the cards are not going to change their values or suits. You will, however, never encounter a situation where you are required to use immutable objects. You could just as easily design the card game with mutable objects.

Vector

A Vector is an object with keys and values. The keys are integers starting from zero. As in Java and C#, you may access each element in the vector in O(1) (constant time). You may also append elements to the end of the vector in O(1).

function foobar()
{
    $v = Vector {'Foo', 'Bar'};
    $v[] = 'Baz';

    echo $v[1]."\n\n";

    $v->add('Biz');
    var_dump($v);
}

foobar();

Output:

Bar


object(HH\Vector)#1 (4) {
[0]=>
string(3) "Foo"
[1]=>
string(3) "Bar"
[2]=>
string(3) "Baz"
[3]=>
string(3) "Biz"
}

As you can see, you may use the square bracket syntax or methods such as add(), get() and set(). The square bracket syntax is preferable because it is faster and familiar to PHP developers.

Use a vector when you need any collection of elements where the index key is not important. This is the common replacement for a non-associative PHP array.

Be warned, though: when you remove an element from a Vector, the elements after it are shifted down and get new keys. If you try to access an element with a key that does not exist, you will get an OutOfBoundsException.

function foobar()
{
    $v = Vector {'Foo', 'Bar', 'Baz'};
    echo $v[2]."\n\n";
    var_dump($v);

    echo "\n -- \n";

    $v->removeKey(0);
    var_dump($v);

    // Throws an OutOfBoundsException
    echo $v[2]."\n";
}

foobar();

Outputs:

Baz


object(HH\Vector)#1 (3) {
[0]=>
string(3) "Foo"
[1]=>
string(3) "Bar"
[2]=>
string(3) "Baz"
}


--
object(HH\Vector)#1 (2) {
[0]=>
string(3) "Bar"
[1]=>
string(3) "Baz"
}
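
If you would rather not rely on the exception, a sketch like the following checks for the key first. It uses containsKey() and get(), where get() returns null for a missing key, and it catches the exception only for the square bracket access.

```hack
<?hh

function demo()
{
    $v = Vector {'Foo', 'Bar', 'Baz'};
    $v->removeKey(0); // 'Bar' and 'Baz' move to keys 0 and 1

    var_dump($v->containsKey(2)); // bool(false)
    var_dump($v->get(2));         // NULL

    try {
        echo $v[2]."\n";
    } catch (OutOfBoundsException $e) {
        echo "No element at key 2\n";
    }
}

demo();
```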

Map

A Map is like a Vector, but you are in control of the keys. At the time of writing you may only use strings and integers as keys, but you will eventually be able to use any object. Accessing an element in a Map is slower than in a Vector, because Hack first has to look up the key and then access the value. HHVM implements Map as a hash table, so a lookup takes O(1) time on average, with a higher constant cost than indexing into a Vector.

function foobar()
{
    $v = Map {'Foo' => 'Good', 'Bar' => 'Better'};
    $v['Biz'] = 'Okey';
    var_dump($v);
    echo "\n\n";

    $v->removeKey('Foo');
    var_dump($v);

    // Throws an OutOfBoundsException
    //echo $v['Baz']."\n";
}

foobar();

Outputs:

object(HH\Map)#1 (3) {
["Foo"]=>
string(4) "Good"
["Bar"]=>
string(6) "Better"
["Biz"]=>
string(4) "Okey"
}


object(HH\Map)#1 (2) {
["Bar"]=>
string(6) "Better"
["Biz"]=>
string(4) "Okey"
}
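
To read a key that may be missing without triggering the exception, you could use containsKey() or get(), as in this small sketch. It also shows that a Map keeps its entries in insertion order when you iterate over it.

```hack
<?hh

function demo()
{
    $m = Map {'Foo' => 'Good', 'Bar' => 'Better'};

    var_dump($m->containsKey('Baz')); // bool(false)
    var_dump($m->get('Baz'));         // NULL, no exception

    // Entries come back in the order they were inserted.
    foreach ($m as $key => $value) {
        echo $key.' => '.$value."\n";
    }
}

demo();
```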

Set

A Set is a group of values without any keys. You may only store strings and integers; more data types will be implemented later. Checking whether a Set contains a value requires a lookup. Because a Set is hash-based like a Map, that lookup takes O(1) time on average. You may create a Set from a Map or Vector by calling toSet().

The special feature of a Set is that it cannot contain duplicates. Each value is unique. If you try to add the same value twice, the second attempt is simply ignored. You might have used an array like this before:

//php
function getFruits(array $stores)
{
    $fruits = array();
    foreach ($stores as $store) {
        foreach ($store->getFruits() as $fruit) {
            if (/* condition */) {
                $fruits[$fruit->getName()] = true;
            }
        }
    }

    // Return a list of unique fruit names that meet the conditions
    return array_keys($fruits);
}

This is where you should use a Set.

//hack
function getFruits(array $stores)
{
    $fruits = Set {};
    foreach ($stores as $store) {
        foreach ($store->getFruits() as $fruit) {
            if (/* condition */) {
                $fruits[] = $fruit->getName();
            }
        }
    }

    // Return a Set of unique fruit names that meet the conditions
    return $fruits;
}
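
A smaller, self-contained sketch of the same behaviour, with made-up fruit names: adding a value twice leaves the Set unchanged, and toSet() removes duplicates from a Vector.

```hack
<?hh

function demo()
{
    $fruits = Set {};
    $fruits[] = 'apple';
    $fruits[] = 'banana';
    $fruits[] = 'apple'; // already present, so nothing changes

    echo count($fruits)."\n";             // 2
    var_dump($fruits->contains('banana')); // bool(true)

    // Converting a Vector drops the repeated values.
    $names = Vector {'kiwi', 'pear', 'kiwi'};
    $unique = $names->toSet();
    echo count($unique)."\n"; // 2
}

demo();
```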

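When you are working with Sets you may use array_intersect() to find the values they have in common. Note that the result is a plain PHP array, not a Set.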

function foobar()
{
    $foo = Set {'A', 'B', 'C'};
    $bar = Set {'B', 'D', 'A'};

    $baz = array_intersect($foo, $bar);
    var_dump($baz);
}

foobar();

Outputs:

array(2) {
["A"]=>
string(1) "A"
["B"]=>
string(1) "B"
}
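
If you want the result to stay a Set, one option is to filter one Set by membership in the other. This is only a sketch built from filter() and contains(), not a dedicated intersection method.

```hack
<?hh

function demo()
{
    $foo = Set {'A', 'B', 'C'};
    $bar = Set {'B', 'D', 'A'};

    // Keep the values of $foo that are also present in $bar.
    $baz = $foo->filter($value ==> $bar->contains($value));

    echo count($baz)."\n"; // 2
    foreach ($baz as $value) {
        echo $value."\n"; // A, then B (the order of $foo)
    }
}

demo();
```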

Pair

A Pair is like an immutable vector with exactly 2 elements. I have not yet figured out any good use case for a Pair. In many cases it will be better to use a tuple or a shape.
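
For reference, creating and reading a Pair looks like the sketch below. The helper getMostCommonFruit() and its values are invented for the example.

```hack
<?hh

// Hypothetical helper that returns two related values at once.
function getMostCommonFruit()
{
    return Pair {'bananas', 12};
}

function demo()
{
    $p = getMostCommonFruit();

    echo $p[0]."\n";     // bananas
    echo $p[1]."\n";     // 12
    echo count($p)."\n"; // 2
}

demo();
```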
