This library implements a fuzzer for PHP, which can be used to find bugs in libraries

Nikita Popov

Last update: Dec 25, 2022


Overview

PHP Fuzzer

This library implements a fuzzer for PHP, which can be used to find bugs in libraries (particularly parsing libraries) by feeding them "random" inputs. Feedback from edge coverage instrumentation is used to guide the choice of "random" inputs, such that new code paths are visited.

Installation

Phar (recommended): You can download a phar package of this library from the releases page. Using the phar is recommended, because it avoids dependency conflicts with libraries using PHP-Parser.

Composer: composer global require nikic/php-fuzzer

Usage

First, a definition of the target function is necessary. Here is an example target for finding bugs in microsoft/tolerant-php-parser:

<?php // target.php

/** @var PhpFuzzer\Fuzzer $fuzzer */

require 'path/to/tolerant-php-parser/vendor/autoload.php';

// Required: The target accepts a single input string and runs it through the tested
//           library. The target is allowed to throw normal Exceptions (which are ignored),
//           but Error exceptions are considered as a found bug.
$parser = new Microsoft\PhpParser\Parser();
$fuzzer->setTarget(function(string $input) use($parser) {
    $parser->parseSourceFile($input);
});

// Optional: Many targets don't exhibit bugs on large inputs that can't also be
//           produced with small inputs. Limiting the length may improve performance.
$fuzzer->setMaxLen(1024);
// Optional: A dictionary can be used to provide useful fragments to the fuzzer,
//           such as language keywords. This is particularly important if these
//           cannot be easily discovered by the fuzzer, because they are handled
//           by a non-instrumented PHP extension function such as token_get_all().
$fuzzer->addDictionary('example/php.dict');

The fuzzer is run against a corpus of initial "interesting" inputs, which can for example be seeded based on existing unit tests. If no corpus is specified, a temporary corpus directory will be created instead.

# Run without initial corpus
php-fuzzer fuzz target.php
# Run with initial corpus (one input per file)
php-fuzzer fuzz target.php corpus/

If fuzzing is interrupted, it can later be resumed by specifying the same corpus directory.

Once a crash has been found, it is written into a crash-HASH.txt file. It is provided in the form it was originally found, which may be unnecessarily complex and contain fragments not relevant to the crash. As such, you likely want to reduce the crashing input first:

php-fuzzer minimize-crash target.php crash-HASH.txt

This will produce a sequence of successively smaller minimized-HASH.txt files. If you want to quickly check the exception trace produced for a crashing input, you can use the run-single command:

php-fuzzer run-single target.php minimized-HASH.txt

Finally, it is possible to generate an HTML code coverage report, which shows which code blocks in the target are hit when executing inputs from a given corpus:

php-fuzzer report-coverage target.php corpus/ coverage_dir/

Additional configuration options can be shown with php-fuzzer --help.

Bug types

The fuzzer by default detects three kinds of bugs:

Error exceptions thrown by the fuzzing target. While Exception exceptions are considered a normal result for malformed input, uncaught Error exceptions always indicate a programming error. They are most commonly produced by PHP itself, for example when calling a method on null.
Thrown notices and warnings (unless they are suppressed). The fuzzer registers an error handler that converts these to Error exceptions.
Timeouts. If the target runs longer than the specified timeout (default: 3s), it is assumed that the target has gone into an infinite loop. This is realized using pcntl_alarm() and an async signal handler that throws an Error on timeout.
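The three checks can be approximated in plain PHP. What follows is a minimal sketch, not PHP-Fuzzer's actual implementation. The function names installErrorHandler(), armTimeout() and runTarget() are hypothetical. The timeout part is skipped when the pcntl extension is unavailable, as on Windows.

```php
<?php
// Hypothetical sketch of the three bug classes; not PHP-Fuzzer's own code.

// Notices and warnings: convert them into Error exceptions, unless suppressed with @.
function installErrorHandler(): void
{
    set_error_handler(function (int $errno, string $errstr, string $file = '', int $line = 0): bool {
        // The @ operator masks error_reporting() during the call, so this bit is then unset.
        if (!(error_reporting() & $errno)) {
            return false; // fall back to PHP's default handling, which stays silent under @
        }
        throw new Error("[$errno] $errstr in $file on line $line");
    });
}

// Timeouts: an async SIGALRM handler throws an Error once $seconds have elapsed.
// Callers should cancel the alarm with pcntl_alarm(0) after the input has finished.
function armTimeout(int $seconds): bool
{
    if (!function_exists('pcntl_alarm')) {
        return false; // no pcntl (e.g. Windows): timeouts are not detected
    }
    pcntl_async_signals(true);
    pcntl_signal(SIGALRM, function (): void {
        throw new Error('Timeout');
    });
    pcntl_alarm($seconds);
    return true;
}

// Error exceptions: an Exception is a normal rejection of the input, an Error is a bug.
function runTarget(callable $target, string $input): ?Throwable
{
    try {
        $target($input);
    } catch (Exception $e) {
        return null;
    } catch (Error $e) {
        return $e;
    }
    return null;
}
```

In a driver loop, installErrorHandler() would be called once. armTimeout(3) would be called before each runTarget() call, matching the 3s default mentioned above.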

Notably, none of these check whether the output of the target is correct, they only determine that the target does not misbehave egregiously. One way to check output correctness is to compare two different implementations that are supposed to produce identical results:

$fuzzer->setTarget(function(string $input) use($parser1, $parser2) {
    $result1 = $parser1->parse($input);
    $result2 = $parser2->parse($input);
    if ($result1 != $result2) {
        throw new Error('Results do not match!');
    }
});

Technical

Many of the technical details of this fuzzer are based on libFuzzer from the LLVM project. The following describes some of the implementation details.

Instrumentation

To work efficiently, fuzzing requires feedback regarding the code paths that were executed while testing a particular fuzzing input. This coverage feedback is collected by "instrumenting" the fuzzing target. The include-interceptor library is used to transform the code of all included files on the fly. The PHP-Parser library is used to parse the code and find all the places where additional instrumentation code needs to be inserted.

Inside every basic block, the following code is inserted, where BLOCK_INDEX is a unique, per-block integer:

$___key = (\PhpFuzzer\FuzzingContext::$prevBlock << 28) | BLOCK_INDEX;
\PhpFuzzer\FuzzingContext::$edges[$___key] = (\PhpFuzzer\FuzzingContext::$edges[$___key] ?? 0) + 1;
\PhpFuzzer\FuzzingContext::$prevBlock = BLOCK_INDEX;

This assumes that the block index fits in 28 bits and counts the number of (prev_block, cur_block) pairs that are observed during execution. Unfortunately, the generated code is fairly expensive, due to the need to deal with uninitialized edge counts and the use of static properties. In the future, a PHP extension could collect the coverage feedback much more efficiently.
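The following sketch reproduces the inserted code as an ordinary class, to make the key packing concrete. EdgeCounter, enterBlock() and decode() are hypothetical names, and the initial previous-block value of 0 is an assumption. The real instrumentation uses the static \PhpFuzzer\FuzzingContext properties shown above.

```php
<?php
// Hypothetical stand-in for the instrumentation state above (not FuzzingContext itself).
final class EdgeCounter
{
    /** @var array<int, int> packed (prev_block, cur_block) key => hit count */
    public array $edges = [];
    public int $prevBlock = 0; // assumed initial value

    // Equivalent of the three lines inserted into each basic block.
    public function enterBlock(int $blockIndex): void
    {
        $key = ($this->prevBlock << 28) | $blockIndex;
        // `?? 0` covers edges that have not been seen yet.
        $this->edges[$key] = ($this->edges[$key] ?? 0) + 1;
        $this->prevBlock = $blockIndex;
    }

    // Split a key back into [prev_block, cur_block]; valid while indices fit in 28 bits.
    public static function decode(int $key): array
    {
        return [$key >> 28, $key & 0xFFFFFFF];
    }
}
```

Executing blocks 1, 2, 1, 2, 3 in that order records four distinct edges, and the edge (1, 2) is counted twice.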

In some cases, basic blocks are part of expressions, in which case we cannot easily insert additional code. In these cases we instead insert a call to a method that contains the above code:

if ($foo && $bar) { ... }
// becomes
if ($foo && \PhpFuzzer\FuzzingContext::traceBlock(BLOCK_INDEX, $bar)) { ... }
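The call has to wrap the right operand because && evaluates that operand only when $foo is true. The block is therefore recorded only when it actually runs. A small sketch illustrates this, using a hypothetical Trace class in place of FuzzingContext and a hypothetical bothTrue() function:

```php
<?php
// Hypothetical illustration of the traceBlock() rewrite; names are assumptions.
final class Trace
{
    /** @var list<int> block indices in the order they were entered */
    public static array $blocks = [];

    // Record that $blockIndex was entered, then pass $value through unchanged.
    public static function traceBlock(int $blockIndex, $value)
    {
        self::$blocks[] = $blockIndex;
        return $value;
    }
}

function bothTrue(bool $foo, bool $bar): bool
{
    // Original code: return $foo && $bar;
    // The right operand is its own basic block (index 7 here), so it is wrapped.
    return $foo && Trace::traceBlock(7, $bar);
}
```

Since the expression keeps its value and its short-circuit behavior, the instrumented code behaves like the original apart from the recorded block.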

In the future, it would be beneficial to also instrument comparisons, such that we can automatically determine dictionary entries from comparisons like $foo == "SOME_STRING".

Features

Fuzzing inputs are considered "interesting" if they contain new features that have not been observed with other inputs that are already part of the corpus. This library uses coarse-grained edge hit counts as features:

ft = (approx_hits << 56) | (prev_block << 28) | cur_block

The approximate hit count reduces the actual hit count to 8 categories (based on AFL):

0: 0 hits
1: 1 hit
2: 2 hits
3: 3 hits
4: 4-7 hits
5: 8-15 hits
6: 16-127 hits
7: >=128 hits
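Here is a sketch of the bucketing and feature packing described above. It assumes 64-bit PHP integers. approxHits() and features() are illustrative names, not the library's API.

```php
<?php
// Sketch of the feature encoding above; names are illustrative, not PHP-Fuzzer's API.
// Assumes 64-bit integers: bucket 7 << 56 still fits below PHP_INT_MAX.

// Reduce a raw hit count to one of the 8 AFL-style buckets listed above.
function approxHits(int $hits): int
{
    if ($hits <= 3) {
        return max($hits, 0); // 0, 1, 2 and 3 hits map to themselves
    }
    if ($hits <= 7) {
        return 4;
    }
    if ($hits <= 15) {
        return 5;
    }
    if ($hits <= 127) {
        return 6;
    }
    return 7;
}

/**
 * Turn edge hit counts (keyed by (prev_block << 28) | cur_block) into features.
 *
 * @param array<int, int> $edges
 * @return list<int>
 */
function features(array $edges): array
{
    $features = [];
    foreach ($edges as $edgeKey => $hits) {
        $features[] = (approxHits($hits) << 56) | $edgeKey;
    }
    return $features;
}
```

Because of the bucketing, an edge hit 5 times and the same edge hit 6 times produce the same feature. An input differing only in that way is therefore not interesting, while 9 hits lands in a new bucket.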

As such, each input is associated with a set of integers representing features. Additionally, it has a set of "unique features", which are features not seen in any other corpus inputs at the time the input was tested.

If an input has unique features, then it is added to the corpus (NEW). If an input B was created by mutating an input A, but input B is shorter and has all the unique features of input A, then A is replaced by B in the corpus (REDUCE).
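The NEW/REDUCE decision can be sketched as follows. Corpus, CorpusEntry and consider() are hypothetical names, and features are plain integers as in the formula above. The sketch ignores details such as writing inputs to the corpus directory.

```php
<?php
// Minimal sketch of the NEW / REDUCE rules; class and method names are hypothetical.
final class CorpusEntry
{
    public string $input;
    /** @var array<int, true> features no other entry had when this input was tested */
    public array $uniqueFeatures;

    public function __construct(string $input, array $uniqueFeatures)
    {
        $this->input = $input;
        $this->uniqueFeatures = $uniqueFeatures;
    }
}

final class Corpus
{
    /** @var array<int, true> every feature seen so far */
    private array $seen = [];
    /** @var list<CorpusEntry> */
    public array $entries = [];

    /**
     * @param list<int> $features features observed when running $input
     * @param CorpusEntry|null $parent entry that $input was mutated from
     * @return string 'NEW', 'REDUCE' or '' when nothing changed
     */
    public function consider(string $input, array $features, ?CorpusEntry $parent): string
    {
        $unique = [];
        foreach ($features as $ft) {
            if (!isset($this->seen[$ft])) {
                $unique[$ft] = true;
            }
        }
        if ($unique !== []) {
            $this->seen += $unique;
            $this->entries[] = new CorpusEntry($input, $unique);
            return 'NEW';
        }
        // REDUCE: B is shorter than A and still has all of A's unique features.
        if ($parent !== null && strlen($input) < strlen($parent->input)
            && array_diff_key($parent->uniqueFeatures, array_flip($features)) === []) {
            $parent->input = $input; // replace A by B
            return 'REDUCE';
        }
        return '';
    }
}
```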

Mutation

On each iteration, a random input from the current corpus is chosen, and then mutated using a sequence of mutators. The following mutators (taken from libFuzzer) are currently implemented:

EraseBytes: Remove a number of bytes.
InsertByte: Insert a new random byte.
InsertRepeatedBytes: Insert a random byte repeated multiple times.
ChangeByte: Replace a byte with a random byte.
ChangeBit: Flip a single bit.
ShuffleBytes: Shuffle a small substring.
ChangeASCIIInt: Change an ASCII integer by incrementing/decrementing/doubling/halving.
ChangeBinInt: Change a binary integer by adding a small random amount.
CopyPart: Copy part of the string into another part, either by overwriting or inserting.
CrossOver: Cross over with another corpus entry with multiple strategies.
AddWordFromManualDictionary: Insert or overwrite with a word from the dictionary (if any).
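Three of the simpler mutators, plus a loop that applies a short random sequence of them, might look like this. The sketch is based only on the one-line descriptions above. The function names and the choice of 1 to 5 rounds are hypothetical, and PHP-Fuzzer's own mutators may choose sizes and positions differently.

```php
<?php
// Hypothetical sketch of three mutators from the list above, driven by mt_rand().

// EraseBytes: remove a random run of bytes, keeping at least one byte.
function eraseBytes(string $s): string
{
    $len = strlen($s);
    if ($len <= 1) {
        return $s;
    }
    $n = mt_rand(1, $len - 1);
    $pos = mt_rand(0, $len - $n);
    return substr($s, 0, $pos) . substr($s, $pos + $n);
}

// InsertByte: insert one random byte at a random position (possibly at the end).
function insertByte(string $s): string
{
    $pos = mt_rand(0, strlen($s));
    return substr($s, 0, $pos) . chr(mt_rand(0, 255)) . substr($s, $pos);
}

// ChangeBit: flip a single bit of a random byte.
function changeBit(string $s): string
{
    if ($s === '') {
        return $s;
    }
    $pos = mt_rand(0, strlen($s) - 1);
    $s[$pos] = chr(ord($s[$pos]) ^ (1 << mt_rand(0, 7)));
    return $s;
}

// Apply 1 to 5 randomly chosen mutators, then enforce the maximum length.
function mutate(string $s, int $maxLen): string
{
    $mutators = ['eraseBytes', 'insertByte', 'changeBit'];
    $rounds = mt_rand(1, 5);
    for ($i = 0; $i < $rounds; $i++) {
        $s = $mutators[mt_rand(0, count($mutators) - 1)]($s);
    }
    return substr($s, 0, $maxLen);
}
```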

Mutation is subject to a maximum length constraint. While an overall maximum length can be specified by the target (setMaxLen()), the fuzzer also performs automatic length control (--len-control-factor). The maximum length is initially set to a very low value and then increased by log(maxlen) whenever no action (NEW or REDUCE) has been taken for the last len_control_factor * log(maxlen) runs.

The higher the length control factor, the more aggressively the fuzzer will explore short inputs before allowing longer inputs. This significantly reduces the size of the generated corpus, but makes initial exploration slower.
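Here is a sketch of this length control loop. LengthControl is a hypothetical class. The text above does not pin down several details, so the following are assumptions: the natural logarithm, the rounding down to a step of at least 1, and the initial limit of 4 bytes.

```php
<?php
// Sketch of automatic length control; the log base, rounding and start value are assumptions.
final class LengthControl
{
    private int $maxLen;      // overall limit, e.g. from setMaxLen()
    private int $factor;      // --len-control-factor
    private int $currentMax;  // limit currently applied to mutations
    private int $runsSinceAction = 0;

    public function __construct(int $maxLen, int $factor, int $initialMax = 4)
    {
        $this->maxLen = $maxLen;
        $this->factor = $factor;
        $this->currentMax = max(1, min($initialMax, $maxLen));
    }

    public function currentMax(): int
    {
        return $this->currentMax;
    }

    // Call once per run; $tookAction is true when the run caused a NEW or REDUCE.
    public function recordRun(bool $tookAction): void
    {
        if ($tookAction) {
            $this->runsSinceAction = 0;
            return;
        }
        $this->runsSinceAction++;
        $step = max(1, (int) log($this->currentMax));
        if ($this->currentMax < $this->maxLen && $this->runsSinceAction >= $this->factor * $step) {
            $this->currentMax = min($this->maxLen, $this->currentMax + $step);
            $this->runsSinceAction = 0;
        }
    }
}
```

A larger factor means more idle runs at each length before the limit grows. This is the trade-off described above: a smaller corpus, but slower initial exploration.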

Findings

Comments

Uncaught error on unlink (temporary corpus file does not exist)

NEW    run: 15248 (17734/s), ft: 1229 (1429/s), corp: 196 (972b), len:  9/11, t: 1s, mem: 11mb
NEW    run: 15452 (17690/s), ft: 1233 (1412/s), corp: 197 (982b), len: 10/11, t: 1s, mem: 11mb
PHP Fatal error:  Uncaught Error: [2] unlink(/tmp/corpus-368695453/bc38758e6172b0db93523ee51ee8551c.txt): No such file or directory in phar:///home/yanmii/GitHub/php-fuzzer.phar/vendor/nikic/include-interceptor/src/Stream.php on line 246 in phar:///home/yanmii/GitHub/php-fuzzer.phar/src/Fuzzer.php:433
Stack trace:
#0 [internal function]: _HumbugBox4cfab3638a0b\PhpFuzzer\Fuzzer->_HumbugBox4cfab3638a0b\PhpFuzzer\{closure}()
#1 phar:///home/yanmii/GitHub/php-fuzzer.phar/vendor/nikic/include-interceptor/src/Stream.php(246): unlink()
#2 phar:///home/yanmii/GitHub/php-fuzzer.phar/vendor/nikic/include-interceptor/src/Stream.php(44): _HumbugBox4cfab3638a0b\Nikic\IncludeInterceptor\Stream->_HumbugBox4cfab3638a0b\Nikic\IncludeInterceptor\{closure}()
#3 phar:///home/yanmii/GitHub/php-fuzzer.phar/vendor/nikic/include-interceptor/src/Stream.php(247): _HumbugBox4cfab3638a0b\Nikic\IncludeInterceptor\Stream->runUnwrapped()
#4 [internal function]: _HumbugBox4cfab3638a0b\Nikic\IncludeInterceptor\Stream->unlink()
#5 phar:///home/yanmii/Gi in phar:///home/yanmii/GitHub/php-fuzzer.phar/src/Fuzzer.php on line 433

I wasn't able to reproduce this again, maybe caused by some race condition. I verified that that file in /tmp doesn't exist after the crash.

Edit: Randomly happened again:

REDUCE run: 611548 (6376/s), ft: 3602 (38/s), corp: 893 (34kb), len:  24/243, t: 96s, mem: 19mb
PHP Fatal error:  Uncaught Error: [2] unlink(/tmp/corpus-1784235601/d286bf815c4f8f3edfa26afc05978079.txt)

opened by yanmii-is 3

Windows is not supported due to pcntl

If it can't be supported, this could be mentioned in the readme.

PHP Fatal error:  Uncaught Error: Call to undefined function pcntl_signal() in phar://php-fuzzer.phar/src/Fuzzer.php:415

opened by xPaw 2

The latest phar is broken (0.0.6)

I installed php-fuzzer with phive (on PHP 7.4, 8.0, 8.1):

Phive 0.15.2 - Copyright (C) 2015-2022 by Arne Blankerts, Sebastian Heuer and Contributors
Downloading https://api.github.com/rate_limit
Downloading https://api.github.com/repos/nikic/php-fuzzer/releases?per_page=100
Downloading https://github.com/nikic/PHP-Fuzzer/releases/download/v0.0.6/php-fuzzer.phar
Linking /home/runner/work/toolbox/toolbox/build/tools/.phive/phars/nikic/php-fuzzer-0.0.6.phar to /home/runner/work/toolbox/toolbox/build/tools/.phive/tmp/7a7e997c6f3cb8ddda592806470aa30d/php-fuzzer

php-fuzzer --help fails with:

PHP Fatal error:  Uncaught Error: Class "_HumbugBox74f46bc5bdc1\PhpParser\Parser\Php7" not found in phar:///home/runner/work/toolbox/toolbox/build/tools/.phive/phars/nikic/php-fuzzer-0.0.6.phar/src/Instrumentation/Instrumentor.php:19
Stack trace:
#0 phar:///home/runner/work/toolbox/toolbox/build/tools/.phive/phars/nikic/php-fuzzer-0.0.6.phar/src/Fuzzer.php(48): _HumbugBox74f46bc5bdc1\PhpFuzzer\Instrumentation\Instrumentor->__construct()
#1 phar:///home/runner/work/toolbox/toolbox/build/tools/.phive/phars/nikic/php-fuzzer-0.0.6.phar/bin/php-fuzzer(14): _HumbugBox74f46bc5bdc1\PhpFuzzer\Fuzzer->__construct()
#2 /home/runner/work/toolbox/toolbox/build/tools/.phive/phars/nikic/php-fuzzer-0.0.6.phar(14): require('...')
#3 {main}
  thrown in phar:///home/runner/work/toolbox/toolbox/build/tools/.phive/phars/nikic/php-fuzzer-0.0.6.phar/src/Instrumentation/Instrumentor.php on line 19

opened by jakzal 1


Crashes on php8.1: Node\FunctionLike has getStmts(), not stmts property

https://github.com/nikic/PHP-Fuzzer/blob/929b09c27cab5492b0d2a6cd99b4ce9afdf994db/src/Instrumentation/Visitor.php#L25

It's crashing on php 8.1:

PHP Fatal error: Uncaught Error: [2] Undefined property: PhpParser\Node\Expr\ArrowFunction::$stmts in PHP-Fuzzer\src\Instrumentation\Visitor.php on line 33

opened by xPaw 0

the main Fuzzer class should not be scoped to a generated namespace in the phar

The fuzzer class is the API used in userland for the $fuzzer variable in the target file. Scoping it makes it impossible for projects to define a type for that $fuzzer variable, as the variable will have a different type when using the phar or when using composer.

opened by stof 1

[Idea] Configure the whitelisted exception base class

Currently, the fuzzer accepts any Exception being thrown. However, for parsing libraries, it is common to define a contract about throwing a particular kind of exception (for instance, in scssphp/scssphp, we say that any failure will be reported by an exception implementing ScssPhp\ScssPhp\Exception\SassException). It would be great to be able to configure this, so that any thrown Exception that is not a subtype of the whitelisted exception type would also be considered a bug in the library. Note that this needs to support passing an interface as the whitelist, not only the name of a class extending \Exception (our SassException is actually an interface implemented by our internal exception classes).

opened by stof 3
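Until such an option exists, a similar check can be approximated inside the target itself, since the fuzzer already treats Error as a bug. In this sketch, restrictExceptions() is a hypothetical helper. ParserFailure stands in for an interface like SassException, and SyntaxFailure for a class implementing it.

```php
<?php
// Hypothetical workaround: enforce an exception contract inside the target.
// ParserFailure stands in for an interface such as SassException.
interface ParserFailure
{
}

final class SyntaxFailure extends RuntimeException implements ParserFailure
{
}

// Wrap $target so that any Exception not implementing $allowed is rethrown as an Error,
// which the fuzzer reports as a bug. Allowed exceptions are rethrown unchanged.
function restrictExceptions(callable $target, string $allowed): Closure
{
    return function (string $input) use ($target, $allowed): void {
        try {
            $target($input);
        } catch (Exception $e) {
            if ($e instanceof $allowed) {
                throw $e;
            }
            throw new Error('Unexpected ' . get_class($e) . ': ' . $e->getMessage(), 0, $e);
        }
    };
}
```

In target.php the wrapped closure would then be passed to setTarget(), with an argument such as \ScssPhp\ScssPhp\Exception\SassException::class. Because instanceof accepts interface names, this also covers the interface case mentioned in the issue.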

Releases (v0.0.6)

v0.0.6 (Aug 9, 2022)
Upgrade to the 4.x release of ulrichsg/getopt-php to fix PHP 8.1 compatibility

Fix instrumentation of arrow functions

Generate coverage overview

v0.0.5 (Sep 12, 2020)
Fixed unlink errors that would occasionally abort fuzzing (#5).

Added shutdown handler to catch fatal errors during fuzzing.

v0.0.4 (Dec 30, 2019)
Make pcntl optional, allowing PHP-Fuzzer to be used on Windows.

Update include-interceptor dependency for Windows fixes.

Disable interception of phar to avoid a PHP bug.

v0.0.3 (Dec 29, 2019)
Remove stray var_dump().

Add mutator for binary integers.

Make corpus argument optional. A temporary directory will be used if not provided.

Switch to nikic/include-interceptor to fix include interception bugs.

v0.0.2 (Dec 26, 2019)
Handle timeouts as crashes using pcntl.

Handle notices/warnings as crashes with a custom error handler.

Make instrumentation line-number preserving.

Fix instrumentation in the phar version.

v0.0.1 (Dec 25, 2019)

Initial release and a place to put the phar.

Owner

Nikita Popov


