JsonCollectionParser - Event-based parser for large JSON collections (consumes a small amount of memory)

Overview

JsonCollectionParser

Event-based parser for large JSON collections (consumes a small amount of memory). Built on top of JSON Streaming Parser.

This package complies with the PSR-4 autoloading standard and the PSR-12 coding style, and it can parse PSR-7 message interfaces. If you notice compliance oversights, please send a patch via pull request.

Installation

You will need Composer to install the package:

composer require maxakawizard/json-collection-parser:~1.0

Input data format

Data must be in one of the following formats:

Array of objects (valid JSON):

[
    {
        "id": 78,
        "title": "Title",
        "dealType": "sale",
        "propertyType": "townhouse",
        "properties": {
            "bedroomsCount": 6,
            "parking": "yes"
        },
        "photos": [
            "1.jpg",
            "2.jpg"
        ],
        "agents": [
            {
                "name": "Joe",
                "email": "joe@example.com"
            },
            {
                "name": "Sally",
                "email": "sally@example.com"
            }
        ]
    },
    {
        "id": 729,
        "dealType": "rent_long",
        "propertyType": "villa"
    },
    {
        "id": 5165,
        "dealType": "rent_short",
        "propertyType": "villa"
    }
]

Sequence of object literals:

{
    "id": 78,
    "dealType": "sale",
    "propertyType": "townhouse"
}
{
    "id": 729,
    "dealType": "rent_long",
    "propertyType": "villa"
}
{
    "id": 5165,
    "dealType": "rent_short",
    "propertyType": "villa"
}

Sequence of object and array literals:

[[{
    "id": 78,
    "dealType": "sale",
    "propertyType": "townhouse"
}]]
{
    "id": 729,
    "dealType": "rent_long",
    "propertyType": "villa"
}
[{
    "id": 5165,
    "dealType": "rent_short",
    "propertyType": "villa"
}]

Sequence of object and array literals (some objects in subarrays, comma-separated):

[
{
    "id": 78,
    "dealType": "sale",
    "propertyType": "townhouse"
},
{
    "id": 729,
    "dealType": "rent_long",
    "propertyType": "villa"
}
]
{
    "id": 5165,
    "dealType": "rent_short",
    "propertyType": "villa"
}

Usage

Function as callback:

function processItem(array $item)
{
    is_array($item); //true
    print_r($item);
}

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', 'processItem');

Closure as callback:

$items = [];

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', function (array $item) use (&$items) {
    $items[] = $item;
});
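
Collecting every item into $items, as in the closure above, keeps the whole collection in memory, which may defeat the purpose for very large files. A minimal sketch of a callback that keeps only a running aggregate instead; the $countsByDealType and $countItem names and the sample items are illustrative, and the parse() call is left commented out because it needs the package and a real file:

```php
<?php
// Running aggregate: memory use grows with the number of distinct
// deal types, not with the number of items in the file.
$countsByDealType = [];

$countItem = function (array $item) use (&$countsByDealType): void {
    // Items without a "dealType" key are grouped under "unknown" (an assumption).
    $type = $item['dealType'] ?? 'unknown';
    $countsByDealType[$type] = ($countsByDealType[$type] ?? 0) + 1;
};

// With the package installed, the closure would be passed to the parser:
// $parser = new \JsonCollectionParser\Parser();
// $parser->parse('/path/to/file.json', $countItem);

// Stand-in for the parser: feed the callback a few sample items by hand.
foreach ([
    ['id' => 78, 'dealType' => 'sale'],
    ['id' => 729, 'dealType' => 'rent_long'],
    ['id' => 5165, 'dealType' => 'sale'],
] as $item) {
    $countItem($item);
}

print_r($countsByDealType);
```

The same idea applies to writing each item to a database or output file inside the callback rather than buffering it.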

Static method as callback:

class ItemProcessor {
    public static function process(array $item)
    {
        is_array($item); //true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', ['ItemProcessor', 'process']);

Instance method as callback:

class ItemProcessor {
    public function process(array $item)
    {
        is_array($item); //true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$processor = new \ItemProcessor();
$parser->parse('/path/to/file.json', [$processor, 'process']);

Receive items as objects:

function processItem(\stdClass $item)
{
    is_array($item); //false
    is_object($item); //true
    print_r($item);
}

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects('/path/to/file.json', 'processItem');

Receive chunks of items as arrays:

function processChunk(array $chunk)
{
    is_array($chunk);    //true
    count($chunk) === 5; //true

    foreach ($chunk as $item) {
        is_array($item);  //true
        is_object($item); //false
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->chunk('/path/to/file.json', 'processChunk', 5);

Receive chunks of items as objects:

function processChunk(array $chunk)
{
    is_array($chunk);    //true
    count($chunk) === 5; //true

    foreach ($chunk as $item) {
        is_array($item);  //false
        is_object($item); //true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->chunkAsObjects('/path/to/file.json', 'processChunk', 5);

Pass stream as parser input:

$stream = fopen('/path/to/file.json', 'r');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($stream, 'processItem');
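
Since a stream resource is accepted as input, JSON that is already held in a string can be wrapped in an in-memory stream first. A minimal sketch using PHP's built-in php://memory wrapper; the $json value is illustrative, and the parser call is commented out because it requires the package:

```php
<?php
$json = '[{"id": 78, "dealType": "sale"}, {"id": 729, "dealType": "rent_long"}]';

// php://memory is a read/write stream held in RAM
// (php://temp is similar but spills to a temporary file for larger payloads).
$stream = fopen('php://memory', 'r+');
fwrite($stream, $json);
rewind($stream); // move the pointer back to the start so a reader sees the data

// $parser = new \JsonCollectionParser\Parser();
// $parser->parseAsObjects($stream, 'processItem');

// Stand-in for the parser: confirm the stream yields the original JSON.
$roundTrip = stream_get_contents($stream);
fclose($stream);
```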

Pass PSR-7 MessageInterface as parser input:

use Psr\Http\Message\MessageInterface;

/** @var MessageInterface $resource */
$resource = $httpClient->get('https://httpbin.org/get');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($resource, 'processItem');

Pass PSR-7 StreamInterface as parser input:

use Psr\Http\Message\MessageInterface;

/** @var MessageInterface $resource */
$resource = $httpClient->get('https://httpbin.org/get');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($resource->getBody(), 'processItem');

Supported formats

  • .json - raw JSON
  • .gz - GZIP-compressed JSON (requires the zlib PHP extension)
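
To try GZIP input without an existing archive, a small compressed fixture can be produced with the zlib extension itself. The sketch below writes a .json.gz file and reads it back through PHP's compress.zlib:// stream wrapper; the file name and contents are illustrative, and how the package itself opens .gz files is not shown here:

```php
<?php
$json = '[{"id": 78}, {"id": 729}]';
$path = sys_get_temp_dir() . '/json-collection-sample-' . uniqid() . '.json.gz';

// gzencode() returns the data in gzip format (requires the zlib extension).
file_put_contents($path, gzencode($json));

// compress.zlib:// transparently decompresses while reading.
$stream = fopen('compress.zlib://' . $path, 'r');

// The decompressing stream could be handed to the parser like any other stream:
// $parser = new \JsonCollectionParser\Parser();
// $parser->parse($stream, 'processItem');

// Stand-in for the parser: read the decompressed contents back.
$decoded = stream_get_contents($stream);
fclose($stream);
unlink($path); // clean up the temporary fixture
```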

Supported sources

  • file
  • string
  • stream / resource
  • PSR-7 HTTP message interface

Running tests

composer test

License

This library is released under the MIT license.

Comments
  • Support for document streams, objects, objects in subarrays

    Currently the \JsonCollectionParser\Listener supports only:

    • objects inside array: [ { } , { } , { } , ... ]

    It seems reasonable to add support for more forms of input data, at least the object-based ones:

    • object: { }
    • stream of objects: { } { } { } ...
    • objects in subarrays: [ [ { } , { } , ... ] ]
    • and combination of above, e.g: [ { } , { } ] { }

    Except for the first case, the data is actually a concatenation of JSON documents, a frequent case when working with stream data.

    The underlying library unfortunately didn't support multiple documents, hence this PR: https://github.com/salsify/jsonstreamingparser/issues/60

    If and when it is accepted, please check out and evaluate this branch: https://github.com/OnkelTem/JsonCollectionParser/tree/documents-stream-support. It implements the cases mentioned above.

    opened by OnkelTem 6
  • 500mb json file timeout error

    I have a 500 MB JSON file, which is also available here: https://archive.scryfall.com/json/scryfall-all-cards.json. When I try to process this file, I need to raise the max execution time to something like 1000 seconds; otherwise I get a timeout in the salsify parser.

    Is there any way to solve this? It's great that the file is not loaded into memory, but many servers don't allow raising the execution time to such high values either.

    The error hits at Parser.php Line 152, 197, or 201.

    Thanks

    opened by kLOsk 5
  • PHP8 support

    Hi :wave:

    I've noticed that support for PHP8 is implemented, but not released even though CHANGELOG mentions v1.8.0.

    Just wanted to see if there is a plan to release a new version with PHP8 support soon?

    Thanks for your work on this great package! Cheers!

    opened by nikazooz 3
  • Support for list of objects {}, {},...

    I have some JSON data that is presented as a list of objects, e.g.: {object}, {object},

    So in order to process them I have to use sed to add [ at the beginning and ] at the end.

    It seems very trivial, so it would be great if the parser could support files that are not wrapped in an array but otherwise perfectly represent the format.

    Thanks

    opened by kLOsk 1
  • Input data format descend

    Hello,

    Is it possible to descend in JSON structure? Suppose I have data format like:

    {
        "objects": [
            {
                "uid": 1,
                "name": "Name 1"
            },
            {
                "uid": 2,
                "name": "Name 2"
            }
        ]
    }

    That way I could walk through the items in objects. Thank you

    opened by ogrosko 3
  • Support for parsing progress

    JsonStreamingParser is able to report the progress of file parsing. It would be really handy to have this progress information, because processing a large file can take a lot of time.

    All you need to do is add a "filePosition" function to your Listener and propagate the progress values to the callback (or add another callback).

    opened by peterpp 3
Releases(1.9.0)
Owner
Max Grigorian