🐼 Framework agnostic package using asynchronous HTTP requests and PHP generators to load paginated items of JSON APIs into Laravel lazy collections.

🐼 Lazy JSON Pages

Need to load heavy JSON with no pagination? Consider using Lazy JSON instead.

Install

In a Laravel application, all you need to do is require the package:

composer require cerbero/lazy-json-pages

Otherwise, you also need to register the lazy collection macro:

use Cerbero\LazyJsonPages\Macro;
use Illuminate\Support\LazyCollection;

LazyCollection::macro('fromJsonPages', new Macro());

Usage

Loading paginated items of JSON APIs into a lazy collection is possible by calling the collection itself or the included helper:

$items = LazyCollection::fromJsonPages($source, $path, $config);

$items = lazyJsonPages($source, $path, $config);

The source from which paginated items are fetched can be either a PSR-7 request or a Laravel HTTP client response:

// the Guzzle request is just an example, any PSR-7 request can be used as well
$source = new GuzzleHttp\Psr7\Request('GET', 'https://paginated-json-api.test');

// Lazy JSON Pages integrates well with Laravel and supports its HTTP client responses
$source = Http::get('https://paginated-json-api.test');

Lazy JSON Pages only changes the page query parameter when fetching pages. This means that if the first request was authenticated (e.g. via bearer token), the requests to fetch the other pages will be authenticated as well.
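
To illustrate the idea of changing only the page query parameter, here is a minimal plain-PHP sketch. The withPage() helper is hypothetical and is not part of the package. It only rewrites the URL, whereas a PSR-7 request would also carry its headers (e.g. the bearer token) over to the next pages:

```php
<?php

// Hypothetical helper, not part of the package: returns the given URL
// with only the page query parameter replaced. For brevity it ignores
// ports, credentials and fragments.
function withPage(string $url, int $page, string $pageName = 'page'): string
{
    $parts = parse_url($url);

    // turn the query string into an array, keeping every other parameter
    parse_str($parts['query'] ?? '', $query);
    $query[$pageName] = $page;

    return sprintf(
        '%s://%s%s?%s',
        $parts['scheme'],
        $parts['host'],
        $parts['path'] ?? '',
        http_build_query($query)
    );
}

echo withPage('https://paginated-json-api.test/users?sort=name&page=1', 2), PHP_EOL;
// prints https://paginated-json-api.test/users?sort=name&page=2
```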

The second argument, $path, is the key within JSON APIs holding the paginated items. The path supports dot-notation so if the key is nested, we can define its nesting levels with dots. For example, given the following JSON:

{
    "data": {
        "results": [
            {
                "id": 1
            },
            {
                "id": 2
            }
        ]
    }
}

the path to the paginated items would be data.results. All nested JSON keys can be defined with dot-notation, including the keys to set in the configuration.
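
As a rough illustration of how a dot-notation path maps onto nested JSON keys, the sketch below resolves data.results against the JSON above. The resolvePath() helper is hypothetical and is not the way the package reads JSON; it only shows how each dot descends one nesting level:

```php
<?php

// Hypothetical helper, not the package's implementation: walks a decoded
// JSON array following a dot-notation path and returns the value found,
// or null when any segment is missing.
function resolvePath(array $json, string $path)
{
    $value = $json;

    foreach (explode('.', $path) as $key) {
        if (!is_array($value) || !array_key_exists($key, $value)) {
            return null;
        }

        $value = $value[$key];
    }

    return $value;
}

$json = json_decode('{"data": {"results": [{"id": 1}, {"id": 2}]}}', true);

$items = resolvePath($json, 'data.results');
// $items is [['id' => 1], ['id' => 2]]
```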

APIs are all different so Lazy JSON Pages allows us to define tailored configurations for each of them. The configuration can be set with the following variants:

// assume that the integer indicates the number of pages
// to be used when the number is known (e.g. via previous HTTP request)
lazyJsonPages($source, $path, 10);

// assume that the string indicates the JSON key holding the number of pages
lazyJsonPages($source, $path, 'total_pages');

// set the config with an associative array
// both snake_case and camelCase keys are allowed
lazyJsonPages($source, $path, [
    'items' => 'total_items',
    'per_page' => 50,
]);

// set the config through its fluent methods
use Cerbero\LazyJsonPages\Config;

lazyJsonPages($source, $path, function (Config $config) {
    $config->items('total_items')->perPage(50);
});

The configuration depends on the type of pagination. Various paginations are supported, including length-aware and cursor paginations.

Length-aware paginations

The term "length-aware" indicates all paginations that show at least one of the following numbers:

  • the total number of pages
  • the total number of items
  • the number of the last page

Lazy JSON Pages only needs one of these numbers to work properly. When setting the number of items, we can also define the number of items shown per page (if we know it) to save some more memory. The following are all valid configurations:

// configure the total number of pages:
$config = 10;
$config = 'total_pages';
$config = ['pages' => 'total_pages'];
$config->pages('total_pages');

// configure the total number of items:
$config = ['items' => 500];
$config = ['items' => 'total_items'];
$config = ['items' => 'total_items', 'per_page' => 50];
$config->items('total_items');
$config->items('total_items')->perPage(50);

// configure the number of the last page:
$config = ['last_page' => 10];
$config = ['last_page' => 'last_page_key'];
$config = ['last_page' => 'https://paginated-json-api.test?page=10'];
$config->lastPage(10);
$config->lastPage('last_page_key');
$config->lastPage('https://paginated-json-api.test?page=10');

Depending on the API, the last page may be indicated as a number or as a URL. Lazy JSON Pages supports both.
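
A last page expressed as a URL can be reduced to a number by reading its page query parameter. The following is only a sketch of that idea in plain PHP: lastPageNumber() is a hypothetical helper, and the package may resolve the value differently:

```php
<?php

// Hypothetical helper, not the package's implementation: accepts the last
// page either as a number or as a URL and returns the page number.
function lastPageNumber($lastPage, string $pageName = 'page'): int
{
    if (is_numeric($lastPage)) {
        return (int) $lastPage;
    }

    // extract the query string from the URL and read the page parameter
    parse_str((string) parse_url($lastPage, PHP_URL_QUERY), $query);

    return (int) ($query[$pageName] ?? 0);
}

echo lastPageNumber(10), PHP_EOL;                                        // 10
echo lastPageNumber('https://paginated-json-api.test?page=10'), PHP_EOL; // 10
```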

By default this package assumes that the name of the page query parameter is page and that the first page is 1. If that is not the case, we can update the defaults by adding this configuration:

$config->pageName('page_number')->firstPage(0);
// or
$config = [
    'page_name' => 'page_number',
    'first_page' => 0,
];

When dealing with a lot of data, it's a good idea to fetch only 1 item (or a few if 1 is not allowed) on the first page to count the total number of pages/items without wasting memory and then fetch all the calculated pages with many more items.

We can do that with the "per page" setting by passing:

  • the new number of items to show per page
  • the query parameter holding the number of items per page

use GuzzleHttp\Psr7\Request;

$source = new Request('GET', 'https://paginated-json-api.test?page_size=1');

$items = lazyJsonPages($source, $path, function (Config $config) {
    $config->pages('total_pages')->perPage(500, 'page_size');
});

Some APIs do not allow requesting only 1 item per page. In these cases we can specify the number of items present on the first page as the third argument:

$source = new Request('GET', 'https://paginated-json-api.test?page_size=5');

$items = lazyJsonPages($source, $path, function (Config $config) {
    $config->pages('total_pages')->perPage(500, 'page_size', 5);
});

As always, we can either set the configuration through the Config object or with an associative array:

$config = [
    'pages' => 'total_pages',
    'per_page' => [500, 'page_size', 5],
];

From now on we will just use the object-oriented version for brevity. Also note that the "per page" strategy can be used with any of the configurations seen so far:

$config->pages('total_pages')->perPage(500, 'page_size');
// or
$config->items('total_items')->perPage(500, 'page_size');
// or
$config->lastPage('last_page_key')->perPage(500, 'page_size');
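
To see why the "per page" strategy reduces the number of requests, consider the arithmetic behind it. The sketch below is an assumption about the reasoning, not the package's code: pagesAfterResize() is a hypothetical helper that estimates the total number of items from the first response and divides it by the new page size:

```php
<?php

// Hypothetical helper: estimates how many pages are needed once the page
// size changes. It assumes every page holds $firstPageItems items, so the
// result may overestimate when the last page is only partially filled.
function pagesAfterResize(int $totalPages, int $firstPageItems, int $newPerPage): int
{
    $totalItems = $totalPages * $firstPageItems;

    return (int) ceil($totalItems / $newPerPage);
}

// 1,234 pages of 1 item each fit into 3 pages of 500 items
echo pagesAfterResize(1234, 1, 500), PHP_EOL; // 3

// 200 pages of 5 items each fit into 2 pages of 500 items
echo pagesAfterResize(200, 5, 500), PHP_EOL; // 2
```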

Cursor and next-page paginations

Some APIs show only the number or cursor of the next page on each page. We can tackle this kind of pagination by indicating the JSON key holding the next page:

$config->nextPage('next_page_key');

The JSON key may hold a number, a cursor or a URL. Lazy JSON Pages supports all of them.
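
Next-page paginations lend themselves to PHP generators, because each page is only known after the previous one has been read. The sketch below shows the general technique with an in-memory stand-in for the API. The itemsByCursor() function, the $pages array and the data key are all hypothetical and do not reflect the package's internals:

```php
<?php

// Hypothetical stand-in for an API: each cursor maps to a decoded JSON page.
$pages = [
    'start' => ['data' => [1, 2], 'next_page_key' => 'abc'],
    'abc'   => ['data' => [3, 4], 'next_page_key' => 'def'],
    'def'   => ['data' => [5], 'next_page_key' => null],
];

// Yields items page by page, following the next-page key until it is empty.
// $fetch is any callable returning the decoded page for a given cursor.
function itemsByCursor(callable $fetch, ?string $cursor, string $nextPageKey): Generator
{
    while ($cursor !== null) {
        $page = $fetch($cursor);

        foreach ($page['data'] as $item) {
            yield $item;
        }

        $cursor = $page[$nextPageKey] ?? null;
    }
}

$items = itemsByCursor(fn (string $cursor) => $pages[$cursor], 'start', 'next_page_key');

foreach ($items as $item) {
    echo $item, PHP_EOL; // prints 1 to 5, one page at a time
}
```

Only one page is held in memory at any time, which is the same property that makes lazy collections a good fit for this kind of pagination.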

Fine-tuning the pages fetching process

Lazy JSON Pages provides a number of settings to adjust the way HTTP requests are sent to fetch pages. For example, pages can be requested in chunks, so that only a few streams are kept in memory at once:

$config->chunk(3);

The configuration above fetches 3 pages concurrently, loads the paginated items into a lazy collection and proceeds with the next 3 pages. Chunking benefits memory usage at the expense of speed. No chunking is set by default, but it is recommended when dealing with a lot of data.
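
The effect of chunking on the order of requests can be pictured with plain PHP. This is only a sketch of the grouping, assuming a hypothetical API with 10 pages whose first page has already been fetched; no HTTP request is actually sent:

```php
<?php

// Pages 2 to 10 still need to be fetched after the first page.
$pages = range(2, 10);

// With a chunk size of 3, pages are grouped in batches of up to 3.
$chunks = array_chunk($pages, 3);

foreach ($chunks as $chunk) {
    // a real implementation would send these requests concurrently,
    // yield their items and only then move on to the next chunk
    echo 'fetching pages ', implode(', ', $chunk), PHP_EOL;
}
// prints:
// fetching pages 2, 3, 4
// fetching pages 5, 6, 7
// fetching pages 8, 9, 10
```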

To minimize memory usage, Lazy JSON Pages can fetch pages synchronously, i.e. one by one. Beware that this is also the slowest solution:

$config->sync();

We can also set how many HTTP requests we want to send concurrently. By default 10 pages are fetched asynchronously:

$config->concurrency(25);

Every HTTP request has a timeout of 5 seconds by default, but some APIs may be slow to respond. In this case we may need to set a higher timeout:

$config->timeout(15);

When a request fails, it has up to 3 attempts to succeed. This number can of course be adjusted as needed:

$config->attempts(5);

The backoff strategy allows us to wait some time before sending other requests when one page fails to be loaded. The package provides what it calls an exponential backoff by default: when a request fails, it gets retried after 0, 1, 4, 9 seconds and so on, so the delay grows with the square of the retry count. This strategy can also be overridden:

$config->backoff(function (int $attempt) {
    return $attempt ** 2 * 100;
});

The above backoff strategy will wait for 100, 400, 900 milliseconds and so on.
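
To make the interplay between attempts and backoff more concrete, here is a minimal retry loop in plain PHP. The retry() helper is hypothetical and is not the package's implementation. It assumes, like the example above, that the backoff callable receives the attempt number and returns milliseconds:

```php
<?php

// Hypothetical retry helper, not the package's implementation: calls
// $request up to $attempts times and waits $backoff($attempt) milliseconds
// after each failure. The last failure is rethrown.
function retry(callable $request, int $attempts, callable $backoff)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $request($attempt);
        } catch (Throwable $e) {
            if ($attempt >= $attempts) {
                throw $e;
            }

            // usleep() expects microseconds
            usleep($backoff($attempt) * 1000);
        }
    }
}

$backoff = fn (int $attempt) => $attempt ** 2 * 100;

// delays after the 1st, 2nd and 3rd failed attempt, in milliseconds
$delays = array_map($backoff, [1, 2, 3]); // [100, 400, 900]
```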

Putting it all together, this is one of the possible configurations:

use Cerbero\LazyJsonPages\Config;
use GuzzleHttp\Psr7\Request;

$source = new Request('GET', 'https://paginated-json-api.test?page_size=1');

$items = lazyJsonPages($source, 'data.results', function (Config $config) {
    $config
        ->pages('total_pages')
        ->perPage(500, 'page_size')
        ->chunk(3)
        ->timeout(15)
        ->attempts(5)
        ->backoff(fn (int $attempt) => $attempt ** 2 * 100);
});

$items
    ->filter(fn (array $item) => $this->isValid($item))
    ->map(fn (array $item) => $this->transform($item))
    ->each(fn (array $item) => $this->save($item));

Handling errors

As seen above, we can mitigate potentially faulty HTTP requests with backoffs, timeouts and retries. When we reach the maximum number of attempts and a request keeps failing, an OutOfAttemptsException is thrown.

When caught, this exception provides information about what went wrong, including the actual exception that was thrown, the pages that failed to be fetched and the paginated items that were loaded before the failure happened:

use Cerbero\LazyJsonPages\Exceptions\OutOfAttemptsException;

try {
    $items = lazyJsonPages($source, $path, $config);
} catch (OutOfAttemptsException $e) {
    // the actual exception that was thrown
    $e->original;
    // the pages that failed to be fetched
    $e->failedPages;
    // a LazyCollection with items loaded before the error
    $e->items;
}

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

composer test

Contributing

Please see CONTRIBUTING and CODE_OF_CONDUCT for details.

Security

If you discover any security-related issues, please email [email protected] instead of using the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.
