Bohuslav Šimek

A guide to practical usage of PHP FFI

December 8, 2023

PHP 7.4 introduced many exciting features, such as typed properties, preloading, and the foreign function interface (FFI). Some of these features have been highly anticipated, like typed properties, while others were more surprising. Regardless, it is clear that out of all the features mentioned, FFI is the least self-explanatory.

So, what is FFI? According to Wikipedia, it is "a mechanism by which a program written in one programming language can call routines or make use of services written in another." However, this is a textbook definition, and it doesn't tell us what FFI is good for. A more relevant question might be: What problems does FFI solve in the real world?

FFI can be helpful if you want to reuse code that has been written in a different programming language. For example, you might have a connector to a database that has not yet been ported to PHP. Another potential use of FFI is to speed up certain parts of your code. For instance, you might write an algorithm in a programming language that provides better performance for a specific task, such as performing complicated scientific calculations. Additionally, FFI can allow you to do things that are not supported in your language, such as communicating with hardware or directly accessing memory.

You may be wondering, "Can't we already do all of this with PHP extensions?" The answer is yes. We can accomplish all of the tasks mentioned above using PHP extensions. However, FFI still has its advantages, such as easier usage, maintenance, and deployment, as well as better portability across PHP versions. The main reason for this is that everything can be done in plain PHP, so there is no need to set up a compilation toolkit or change deployment procedures.

Note: All of the provided code samples require PHP 8.1 or newer and either Linux or Windows with WSL. They cannot be run on standard Windows or macOS. The complete source code is available at https://github.com/kambo-1st/php-ffi-duckdb-article, along with a prepared Docker image.

First steps

We can illustrate the basic usage of FFI by rewriting the abs() function, which returns the absolute value of a given number:

<?php

$ffi = FFI::cdef(
    'int abs(int j);', // function declaration in C language
    'libc.so.6' // library from which the function will be called
);

var_dump($ffi->abs(-42)); // int(42)

The first step is to create a proxy object between the library and PHP. One of the main challenges with FFI is mapping functions from one programming language to another. The authors of FFI in PHP chose to handle this by parsing a raw C function definition. Therefore, the first argument of the cdef() function contains a function declaration in C language. The second parameter is the name of the library from which the function will be called. In this case, we are using functions from the standard C library.

But what if we want to use more functions? Do we have to put them all in the cdef() call? No, there is another way to load these definitions, which is by using the FFI::load() function. This function loads a file with all of the definitions and returns an instance of the FFI object. It expects just one parameter: the path to the so-called header file.

Header files are a concept that does not exist in PHP. They contain function declarations, structure definitions, and macro definitions to be shared between multiple source files. This is important because C requires a forward declaration for every function, structure, or enum that is used.
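
As a rough illustration of FFI::load(), the sketch below writes a small header to a temporary file and loads it. The temporary file name and the choice of abs() and atoi() are assumptions made for this example. The #define FFI_LIB line is the special directive that tells FFI::load() which library to open; it assumes a glibc-based Linux system where libc.so.6 is available.

```php
<?php

// Hypothetical header file, created only for this demonstration.
$headerPath = sys_get_temp_dir() . '/ffi-demo-' . getmypid() . '.h';

file_put_contents($headerPath, <<<'HEADER'
#define FFI_LIB "libc.so.6"

int abs(int j);
int atoi(const char *nptr);
HEADER);

// FFI::load() parses the declarations and opens the library named by FFI_LIB.
$libc = FFI::load($headerPath);

// The header is only needed while loading, so it can be removed again.
unlink($headerPath);

var_dump($libc->abs(-42));   // int(42)
var_dump($libc->atoi('17')); // int(17)
```

In a real project the header would be a regular file kept next to the PHP sources rather than something generated at runtime.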

Function signatures in PHP and C are quite similar, as PHP is a C-style language. However, there are some differences:

  • Variable types must always be defined.
  • The return type is declared before the function name.
  • Some data types are not supported in PHP and vice versa.

The function is then called through a method with the same name on the proxy object. Simple data types like int, float, and bool are automatically converted during the function call.
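
To make the mapping more concrete, here is a minimal sketch with several declarations in a single cdef() call. The three functions (abs(), toupper(), and strcmp()) were picked for this illustration only, and the library name again assumes a glibc-based Linux system.

```php
<?php

// Each C declaration states the return type first and a type for every parameter.
$libc = FFI::cdef('
    int abs(int j);
    int toupper(int c);
    int strcmp(const char *s1, const char *s2);
', 'libc.so.6');

// PHP integers are converted to C ints and back without any extra work.
var_dump($libc->abs(-7));                  // int(7)
var_dump(chr($libc->toupper(ord('a'))));   // string(1) "A"

// PHP strings passed as parameters are handed over as C strings.
var_dump($libc->strcmp('php', 'php'));     // int(0)
```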

A header file can also contain macro definitions. In C, a macro is a fragment of code that has been given a name. Before compilation, the macro is evaluated and replaced by the code it represents. Macros are often used to define constants and create short, simple functions. However, FFI does not support C preprocessor directives, so they cannot be included in the header file loaded by FFI. The only exceptions are the two special defines FFI_LIB and FFI_SCOPE, which FFI::load() reads itself.

All of these things may seem complicated and a bit dry, so it would be better to demonstrate them using a practical example. One such example is DuckDB.

DuckDB

DuckDB is an in-process, column-based, SQL OLAP database written in C++ that has no dependencies and is distributed as a single file. It can also be described as "The SQLite for Analytics" with a PostgreSQL flavor. Its main strength is its columnar-vectorized query execution engine, which makes it particularly well-suited for performing analytical queries. However, DuckDB does not have direct support for PHP, making it an ideal use case for the PHP Foreign Function Interface (FFI).

Figure 1: Comparison between different databases.

DuckDB and FFI

If we are trying to use a new library, it is usually best to start with simple examples. This is also the case with DuckDB. In this example, we will:

  1. create a database in memory,
  2. insert some data and
  3. query stored data.

The first step is to download a prebuilt library for Linux from the DuckDB website (https://duckdb.org/docs/installation/ or direct link https://github.com/duckdb/duckdb/releases/download/v0.6.1/libduckdb-linux-amd64.zip). The library archive also includes a header file with all function signatures and structure definitions. Unfortunately, we cannot use this header file directly, as it contains macros that are not currently supported by PHP FFI. We could evaluate these macros manually, but it is more efficient to let the C preprocessor do that. Even so, we need to make a few changes first. Inside the header file, there are three include directives:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

All of these directives must be commented out or removed from the file. If we don’t do this, the preprocessor will copy the content of these files into our header file. That will not work, as they contain not only function signatures but also complete functions for inlining. After that, we are ready to let the C preprocessor expand the macros. One of the most common compilers in the Linux world is GCC, which ships with the cpp preprocessor, so the example will use it. There is no need to install it; it is preinstalled in the provided Docker image.

Next, we start the new header file with the #define FFI_LIB "./libduckdb.so" directive, which tells PHP where the library is located (here, the current folder). We can use the following command:

echo '#define FFI_LIB "./libduckdb.so"' >> duckdb-ffi.h

Then we can use the following command for resolving the macros:

cpp -P -C -D"__attribute__(ARGS)=" duckdb.h >> duckdb-ffi.h
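
Before loading the generated file, it can be useful to check that no preprocessor directives survived. The helper below is a hypothetical sketch, not part of PHP FFI or DuckDB: findUnsupportedDirectives() is a name invented for this example, and it only performs a simple line-based scan that does not understand C comments.

```php
<?php

/**
 * Hypothetical helper: lists lines that start with a preprocessor directive
 * other than the FFI_LIB and FFI_SCOPE defines understood by FFI::load().
 *
 * @return array<int, string> offending lines, keyed by 1-based line number
 */
function findUnsupportedDirectives(string $header): array
{
    $unsupported = [];

    foreach (preg_split('/\R/', $header) as $index => $line) {
        $trimmed = ltrim($line);

        // Only lines beginning with "#" are preprocessor directives.
        if ($trimmed === '' || $trimmed[0] !== '#') {
            continue;
        }

        // These two defines are allowed in a header loaded by FFI.
        if (preg_match('/^#\s*define\s+FFI_(LIB|SCOPE)\b/', $trimmed) === 1) {
            continue;
        }

        $unsupported[$index + 1] = $trimmed;
    }

    return $unsupported;
}

$sample = "#define FFI_LIB \"./libduckdb.so\"\n#include <stdint.h>\nint abs(int j);\n";
print_r(findUnsupportedDirectives($sample));
```

For the real file, the same function could be called with file_get_contents('duckdb-ffi.h'); an empty array suggests the header is ready for FFI::load().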

Now we are ready to call DuckDB! We will load our header file by calling the load() method and then create two structures, database and connection, by calling the new() method.

$duckDbFFI = FFI::load('duckdb-ffi.h');

$database   = $duckDbFFI->new("duckdb_database");
$connection = $duckDbFFI->new("duckdb_connection");

$result = $duckDbFFI->duckdb_open(null, FFI::addr($database));

if ($result === $duckDbFFI->DuckDBError) {
    $duckDbFFI->duckdb_disconnect(FFI::addr($connection));
    $duckDbFFI->duckdb_close(FFI::addr($database));
    throw new Exception('Cannot open database');
}

Structures are like objects, but without methods. We will discuss their specific usage in more detail later. However, we should talk about data types. Since we are calling the library with the C API, we need to use C data types. They can be roughly divided into three categories:

  1. Basic data types (integer, float, etc.)
  2. Extended data types (arrays, strings, and pointers)
  3. User-defined types (enums, structures)

Basic data types are subject to automatic conversion and do not require any further action. Extended data types require more attention. They must be created with the new() method on the FFI instance. Finally, there are user-defined types, which must be defined in advance using the cdef() function or in the header files. We can then create them with the new() method, as we can already see in the case of the duckdb_database and duckdb_connection structures. Enums can be a little special, as they can be directly accessed on instances of FFI objects. In the case of DuckDB, we will encounter the duckdb_state enum with two possible values: one representing success and one representing failure.

typedef enum { DuckDBSuccess = 0, DuckDBError = 1 } duckdb_state;

Both of the values can be accessed on an FFI instance as properties. This will come in handy when calling DuckDB functions: we can just compare the returned value against the DuckDBError or DuckDBSuccess enum value.
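
The three categories can be tried out without any library at all. The following sketch uses a made-up enum, demo_state, that is loosely modelled on duckdb_state; the names DemoSuccess and DemoError are assumptions for this example only.

```php
<?php

// Hypothetical user-defined type; no shared library is needed for this demo.
$ffi = FFI::cdef('
    typedef enum { DemoSuccess = 0, DemoError = 1 } demo_state;
');

// Enum values are readable as properties of the FFI instance.
var_dump($ffi->DemoSuccess); // int(0)
var_dump($ffi->DemoError);   // int(1)

// A scalar C value is wrapped in a CData object; its PHP value lives in ->cdata.
$counter = $ffi->new('int');
$counter->cdata = 5;
var_dump($counter->cdata);   // int(5)

// A fixed-size C array is created with new() and indexed like a PHP array.
$numbers = $ffi->new('int[3]');
for ($i = 0; $i < 3; $i++) {
    $numbers[$i] = ($i + 1) * 10;
}
var_dump($numbers[2]);       // int(30)
```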

Now we have all the information we need to open a connection to the database. We will do this using the duckdb_open() method. The method has the following signature:

duckdb_state duckdb_open(const char *path, duckdb_database *out_database);

The function returns an enum value from duckdb_state representing the operation status. Fortunately, we now know how to handle this. The more interesting parts are the parameters path and out_database. The first parameter, path, represents the file path to the database. Since we are creating our database in memory, we can safely ignore it and pass null as a value.

The second parameter is more interesting. It clearly expects a duckdb_database structure, but the name of the parameter is prefixed by an asterisk. This means that the value is expected as a pointer to the duckdb_database structure. A pointer is a variable that stores the memory address of another variable. Unsurprisingly, PHP does not have pointers. The closest equivalent in the PHP world is a reference, but they are not the same. There is a whole page in the documentation about this: https://www.php.net/manual/en/language.references.arent.php.
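
A small sketch may help to show what a pointer means in practice. It uses only built-in C types, so no library is loaded; the variable names are chosen for this illustration.

```php
<?php

// An FFI instance without declarations is enough for built-in C types.
$ffi = FFI::cdef();

$value = $ffi->new('int');
$value->cdata = 10;

// $pointer holds the memory address of $value, not a copy of its content.
$pointer = FFI::addr($value);

// Writing through the pointer changes the original C variable.
$pointer[0] = 42;

var_dump($value->cdata); // int(42)
```

The pointer stays valid only as long as $value exists, which is one reason why pointers need more care than PHP references.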

$result = $duckDbFFI->duckdb_connect($database, FFI::addr($connection));
if ($result === $duckDbFFI->DuckDBError) {
    $duckDbFFI->duckdb_disconnect(FFI::addr($connection));
    $duckDbFFI->duckdb_close(FFI::addr($database));
    throw new Exception('Cannot connect to database');
}

$result = $duckDbFFI->duckdb_query(
    $connection,
    'CREATE TABLE integers(i INTEGER, j INTEGER);',
    null
);
if ($result === $duckDbFFI->DuckDBError) {
    // Error handling, memory clean up
}

$result = $duckDbFFI->duckdb_query(
    $connection,
    'INSERT INTO integers VALUES (3,4), (5,6), (7, NULL) ',
    null
);
if ($result === $duckDbFFI->DuckDBError) {
    // Error handling, memory clean up
}

In C, arrays and strings are always passed to functions as pointers. In the PHP implementation of FFI, pointers can be obtained only for C data types such as arrays, strings, or structures by calling the static addr() method. With all of this in mind, this is how the invocation of duckdb_open() should look:

$result = $duckDbFFI->duckdb_open(null, FFI::addr($database));
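
The same "output parameter" pattern can be tried with the C standard math library, which does not require DuckDB. This is only a sketch under assumptions: it uses frexp() as a stand-in for duckdb_open(), and it assumes a glibc-based Linux system where the math library is available as libm.so.6.

```php
<?php

// frexp() splits a number into a mantissa (returned) and an exponent (written via pointer).
$libm = FFI::cdef('double frexp(double x, int *exponent);', 'libm.so.6');

// The C function needs somewhere to write its second result.
$exponent = $libm->new('int');

// FFI::addr() turns the C int into the int* that the function expects.
$mantissa = $libm->frexp(8.0, FFI::addr($exponent));

// 8.0 equals 0.5 * 2^4
var_dump($mantissa);        // float(0.5)
var_dump($exponent->cdata); // int(4)
```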

The next few lines are somewhat similar. After opening the database, we have to connect to it by calling the duckdb_connect() function with the opened database and a pointer to the connection structure. Now we can finally execute queries on the database with the duckdb_query() function. This function expects the connection structure itself, not a pointer to it, which can be rather surprising.

duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);

Another interesting thing is the data type of the second parameter, const char *query, which represents an immutable string containing a database query. Working with strings in C is not as straightforward as it is in PHP. Strings are passed to functions as pointers to the char type and are implemented as arrays of characters terminated by a null byte. PHP FFI will automatically convert PHP strings into C strings, but only for function parameters. Returned strings are not automatically converted.
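
Both directions can be observed with getenv() from the C standard library. The sketch below is an illustration only; the environment variable names are made up for the example, and libc.so.6 assumes a glibc-based Linux system.

```php
<?php

$libc = FFI::cdef('char *getenv(const char *name);', 'libc.so.6');

// Set a variable from PHP so that the C function has something to return.
putenv('FFI_DEMO_GREETING=hello');

// The PHP string argument is converted to a C string automatically.
$pointer = $libc->getenv('FFI_DEMO_GREETING');

// The returned char* is a CData object and has to be converted explicitly.
$text = FFI::string($pointer);
var_dump($text); // string(5) "hello"

// A NULL pointer returned from C arrives in PHP as null.
var_dump($libc->getenv('FFI_DEMO_MISSING_VARIABLE')); // NULL
```

The string returned by getenv() belongs to the C runtime, so unlike the DuckDB example later in this article it must not be freed.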

$queryResult = $duckDbFFI->new('duckdb_result');

$result = $duckDbFFI->duckdb_query($connection, 'SELECT * FROM integers; ', FFI::addr($queryResult));

if ($result === $duckDbFFI->DuckDBError) {
    $error = "Error in query: ".$duckDbFFI->duckdb_result_error(FFI::addr($queryResult));
    $duckDbFFI->duckdb_destroy_result(FFI::addr($queryResult));
    $duckDbFFI->duckdb_disconnect(FFI::addr($connection));
    $duckDbFFI->duckdb_close(FFI::addr($database));
    throw new Exception($error);
}

Thanks to this, there is no need for any further action when creating the integers table in DuckDB:

$result = $duckDbFFI->duckdb_query($connection, 'CREATE TABLE integers(i INTEGER, j INTEGER);', null);

Inserting values will be pretty much the same, only the query should be changed to INSERT. So far, we have been just executing queries and not caring about any returned values. This will change with the next query, in which we try to select the inserted values. We will have to provide a pointer to the duckdb_result structure as the third parameter. After the execution of duckdb_query(), the duckdb_result structure will contain the result data and additional information about the result.

This brings us back to structures in PHP FFI. As we already mentioned, they are like objects, but without methods. They have to be defined in advance, and their main purpose is to combine data items of different kinds. Properties of the structure can be accessed in the same way as an object property in regular PHP.

typedef struct {
    // deprecated, use duckdb_column_count
    idx_t __deprecated_column_count;
    // deprecated, use duckdb_row_count
    idx_t __deprecated_row_count;
    // deprecated, use duckdb_rows_changed
    idx_t __deprecated_rows_changed;
    // deprecated, use duckdb_column_ family of functions
    duckdb_column *__deprecated_columns;
    // deprecated, use duckdb_result_error
    char *__deprecated_error_message;
    void *internal_data;
} duckdb_result;

To view the properties of a duckdb_result, we can refer to the header file and search for the definition of duckdb_result. Unfortunately, all the interesting properties have been deprecated in DuckDB version 0.3.2 in favor of individual functions. However, for the sake of example, we can still get the number of columns in the result by accessing the property __deprecated_column_count of the duckdb_result object: $queryResult->__deprecated_column_count. There is no need to take any additional action when accessing this property; it will simply return the number two, as there are two columns. As mentioned before, simple data types are automatically converted.
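
Property access on structures can also be tried in isolation. The structure demo_result below is a simplified, hypothetical stand-in for duckdb_result, defined only for this sketch.

```php
<?php

// Hypothetical structure, loosely modelled on duckdb_result.
$ffi = FFI::cdef('
    typedef struct {
        unsigned long long column_count;
        unsigned long long row_count;
        int has_error;
    } demo_result;
');

// Memory created by new() starts out zeroed.
$result = $ffi->new('demo_result');
$result->column_count = 2;
$result->row_count = 3;

// Fields are read like regular object properties and arrive as PHP integers.
var_dump($result->column_count); // int(2)
var_dump($result->has_error);    // int(0)

// A pointer to the structure exposes the same fields.
$pointer = FFI::addr($result);
$pointer->row_count = 4;
var_dump($result->row_count);    // int(4)
```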

Now, it's time to correctly retrieve the row and column counts, as we will need them eventually. The number of rows can be obtained using the function duckdb_row_count() and the number of columns can be obtained using the function duckdb_column_count(). Both of these functions expect a pointer to a duckdb_result object and will return integers. There is no need for any type conversion.

Another interesting function for working with the result structure is duckdb_result_error. It returns the error message in a human-readable format. We can try changing the SELECT statement into something invalid and see how DuckDB will inform us about the invalid query.

echo "Number of columns: ".$queryResult->__deprecated_column_count."\n";

$rowCount = $duckDbFFI->duckdb_row_count(FFI::addr($queryResult));
$columnCount = $duckDbFFI->duckdb_column_count(FFI::addr($queryResult));

for ($row = 0; $row < $rowCount; $row++) {
    for ($column = 0; $column < $columnCount; $column++) {
        $value = $duckDbFFI->duckdb_value_varchar(FFI::addr($queryResult), $column, $row);
        echo ($value !== null ? FFI::string($value)  : '')." ";
        $duckDbFFI->duckdb_free($value);
    }

    echo "\n";
}

$duckDbFFI->duckdb_destroy_result(FFI::addr($queryResult));
$duckDbFFI->duckdb_disconnect(FFI::addr($connection));
$duckDbFFI->duckdb_close(FFI::addr($database));

After populating the duckdb_result structure, we can finally print the data. However, we cannot directly traverse the result; we have to use specific functions again. We can get the data by using the duckdb_value_varchar() function. This function takes three parameters: a pointer to the result, the column position, and the row position.

The result of the duckdb_value_varchar function is a string that must be converted into a regular PHP string using the FFI::string() function and, more importantly, must be freed from memory by calling $duckDbFFI->duckdb_free().

One of the biggest differences between PHP and C is the need to call a free function. This is the most complicated area of PHP FFI, as the need to free memory is often not immediately apparent due to the different memory management models in the two languages. Please note that a comprehensive description of the C memory model is beyond the scope of this article.

PHP is a memory-safe language with a garbage collector, so we don't need to worry about how memory is allocated and freed. The only thing we usually need to consider is the memory limit defined in the PHP configuration. This is not the case with C, where we have to manage memory manually, use pointers, and manipulate memory directly.

Because FFI is just a bridge between PHP and C, we will also have to deal with at least some aspects of the C memory model. This is especially true when passing arrays or strings to C functions or receiving them back. PHP FFI tries, to some degree, to manage memory for us. For example, all complex data types created by the FFI new() method are by default managed by PHP (in the FFI terminology, they are “owned”), so we do not have to care about their life cycle.
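
The difference between owned and non-owned data can be sketched with the second argument of new(). The variable names are chosen for this example only.

```php
<?php

$ffi = FFI::cdef();

// Owned (the default): PHP releases the memory when $owned is no longer used.
$owned = $ffi->new('int[3]');
$owned[0] = 1;

// Non-owned: passing false means we are responsible for releasing the memory.
$manual = $ffi->new('int[3]', false);
$manual[0] = 7;
$first = $manual[0];

// Without this call the array would stay allocated until the process ends.
FFI::free($manual);

var_dump($owned[0]); // int(1)
var_dump($first);    // int(7)
```

After FFI::free(), $manual must not be read or written again.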

However, sometimes this is not enough and we need to be especially careful when dealing with pointers obtained using the FFI::addr() method, individual elements of C arrays and structures, and most data structures returned by C functions.

C pointers are always "non-owned." This can easily lead to the creation of "dangling pointers," which are pointers that do not point to a valid variable. This can occur when the variable being referenced is destroyed, but the pointer still points to its memory location.

One of the most complicated topics is probably handling complex data types returned by C functions. In C, values can be returned in two ways: by value or as a pointer. As previously mentioned, arrays and strings in C are always passed or returned as pointers. The most important question when working with these types is: Who is responsible for allocating and deallocating the memory? In most cases, it is PHP FFI, but sometimes it is the called function.

The DuckDB function duckdb_value_varchar() allocates memory without any cooperation from PHP, so after we use it, we have to free the allocated memory using the function duckdb_free(): $duckDbFFI->duckdb_free($value);. Why do we have to do this? We don't know how long the returned string will be, so PHP cannot allocate the necessary memory ahead of time. This is the sole responsibility of the duckdb_value_varchar() function. The duckdb_free() function is the only reliable way to free this memory, as it is provided by the authors of DuckDB. Information about how memory is handled is usually described in the function documentation provided by the library author, and DuckDB is no exception. We can find information about memory handling in DuckDB documentation, specifically here: duckdb_value_varchar. If we forget to call duckdb_free(), we will create a memory leak.
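
The same ownership rule can be practised without DuckDB. In the sketch below, strdup() from the C standard library plays the role of duckdb_value_varchar() and free() plays the role of duckdb_free(). The helper copyViaC() is a hypothetical name for this example, and libc.so.6 assumes a glibc-based Linux system.

```php
<?php

$libc = FFI::cdef('
    char *strdup(const char *s);
    void free(void *ptr);
', 'libc.so.6');

/**
 * Hypothetical helper: lets C allocate a copy of the string, converts it
 * to a PHP string, and always releases the C memory afterwards.
 */
function copyViaC(FFI $libc, string $text): string
{
    // strdup() allocates the memory itself, so PHP does not own it.
    $pointer = $libc->strdup($text);
    if ($pointer === null) {
        throw new RuntimeException('strdup() could not allocate memory');
    }

    try {
        // FFI::string() copies the bytes into a regular PHP string.
        return FFI::string($pointer);
    } finally {
        // The memory must go back through the matching C function.
        $libc->free($pointer);
    }
}

var_dump(copyViaC($libc, 'hello')); // string(5) "hello"
```

The try/finally block is a design choice for this sketch: it keeps the free() call next to the allocation, so the memory is released even if the conversion throws.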

After closely inspecting the documentation, we will quickly realize that the duckdb_value_varchar() function is not the only one that needs to be cleaned up after use. Some other steps must be taken for query results, as well as for structures representing a connection and database. Therefore, our example should end with calls to the duckdb_destroy_result() function to remove the result structure from memory, and to the duckdb_disconnect() and duckdb_close() functions. This concludes our short example.

Ending notes

In this post, we have demonstrated that PHP FFI can be a powerful tool when interacting with code written in C. With just a few lines of code, we were able to connect to a database, insert data, and perform a full-fledged query. Previously, we would have had to create a C extension with a lot of boilerplate code to achieve this. However, using FFI does come with some challenges. We must be mindful of the differences in how PHP and C handle memory, and we must also be aware of the similarities between the two languages, as they can create a false sense of security.

It may come as a surprise to PHP programmers, but C is extremely unforgiving. We can easily create problems for ourselves in a variety of creative ways, such as creating memory leaks, crashing the script, or bypassing most of PHP's security measures. Using FFI, and especially FFI pointers, can even fundamentally change the way PHP works and lead to wildly unpredictable situations. Therefore, we must exercise caution when attempting to communicate with C code and be sure of what we are doing. It may be a cliche, but it is completely reasonable to say that with great power comes great responsibility.