High performance Clickhouse PHP lib with progress tracking, parametric queries and compression support

Overview

Clickhousy

High performance Clickhouse PHP library featuring:

  • Tiny memory footprint based on static class (times better than smi2 client)
  • High level methods to fetch rows, single row, array of scalar values or single value.
  • Parametric queries (native Clickhouse sql-injection protection).
  • Long queries progress update through callback function.
  • Large resultsets "step-by-step" reading/processing without memory overflow.
  • Large datasets inserts without memory overflow.
  • Custom logging and error handling.
  • HTTP native compression for traffic reduction.
  • Curl based (shell curl command required for large inserts).

Quick start

Clone or download library:

git clone https://github.com/mrcrypster/clickhousy.git

Import lib and use it:

require 'clickhousy/clickhousy.php';
$data = clickhousy::rows('SELECT * FROM table LIMIT 5');

Connection & database

Clickhousy works over Clickhouse HTTP protocol, so you just need to set connection url (by default it's localhost:8123):

clickhousy::set_url('http://host:port/'); # no auth
clickhousy::set_url('http://user:password@host:port/'); # auth

And then select database (default by default):

clickhousy::set_db('my_db');

Data fetch

Use predefined methods to quickly fetch needed data:

clickhousy::rows('SELECT * FROM table');         # -> returns array or associative arrays
clickhousy::row ('SELECT * FROM table LIMIT 1'); # -> returns single row associative array
clickhousy::cols('SELECT id FROM table');        # -> returns array of scalar values
clickhousy::col ('SELECT count(*) FROM table');  # -> returns single scalar value

Reading large datasets

If your query returns many rows, use reading callback in order not to run out of memory:

clickhousy::query('SELECT * FROM large_table', [], null, null, function($packet) {
  // $packet will contain small portion of returning data
  foreach ( $packet as $row ) {
    // do something with $row (array of each result row values)
    print_r($row) # [0 => 'col1 value', 1 => 'col2 value', ...]
  }
});

Safe query execution

In order to fight SQL injections, you can use parametric queries:

$count = clickhousy::col(
  'SELECT count(*) FROM table WHERE age > {age:UInt32} AND category = {cat:String}',
  ['age' => 30, 'cat' => 'Engineering']
);

Writing data

Though writing usually happens somewhere else (not on PHP side), Clickhousy has shell curl wrapper to write massive datasets to Clickhouse. To save memory, this is done through "write buffers" (temp files on disk), which can be as large as your disk allows it to be:

$b = clickhousy::open_buffer('table');  # open insert buffer to insert data into "table" table
                                        # buffer is a tmp file on disk, so no memory leaks

for ( $k = 0; $k < 100; $k++ ) {        # repeat generation 100 times
  $rows = [];

  for ( $i = 0; $i < 1000000; $i++ ) {
    $rows[] = [md5($i)];                # generate 1 million rows in single iteration
  }

  clickhousy::insert_buffer($b, $rows); # insert generated 1 million rows into buffer
}

$result = clickhousy::flush_buffer($b); # sends buffer (100m rows) content to Clickhouse

After insert is executed, $result variable gets summary from Clickhouse:

Array
(
    [read_rows] => 10000
    [read_bytes] => 410000
    [written_rows] => 10000
    [written_bytes] => 410000
    [total_rows_to_read] => 0
)

Custom queries

Generic query method is available for any read or write query:

clickhousy::query('INSERT INTO table(id) VALUES(1)');
clickhousy::query('TRUNCATE TABLE table');
clickhousy::query('SELECT NOW()');

For SELECT queries it will return resultset together with system information:

$result_set = clickhousy::query('SELECT * FROM numbers(5)');
print_r($result_set);

Will output:

Array
(
    [meta] => Array
        (
            [0] => Array
                (
                    [name] => number
                    [type] => UInt64
                )

        )

    [data] => Array
        (
            [0] => Array
                (
                    [number] => 0
                )
            ...
        )

    [rows] => 2
    [rows_before_limit_at_least] => 2
    [statistics] => Array
        (
            [elapsed] => 0.000237397
            [rows_read] => 2
            [bytes_read] => 16
        )

)

Method also supports parametric queries:

$res = clickhousy::query('SELECT * FROM table WHERE age > {age:Int32}', ['age' => $_GET['age']]);

Inserting data from files

You can insert data from files using $post_buffer argument pointing to a file:

$res = clickhousy::query('INSERT INTO table', [], '/path/to/tsv.gz');

File should be of gzipped TSV format.

Long query progress tracking

$progress_callback allows specifying callback function which will be called when query execution progress updates:

clickhousy::query('SELECT uniq(id) FROM huge_table', [], null, function($progress) {
  print_r($progress);
});

If the query is long enough, progress function will be called multiple times:

Array
(
    [read_rows] => 4716360              # -> currently read rows 
    [read_bytes] => 37730880
    [written_rows] => 0
    [written_bytes] => 0
    [total_rows_to_read] => 50000000    # -> total rows to read
)
...
Array
(
    [read_rows] => 47687640
    [read_bytes] => 381501120
    [written_rows] => 0
    [written_bytes] => 0
    [total_rows_to_read] => 50000000
)

It's easy to calculate and print query execution progress:

clickhousy::query('SELECT count(*) FROM large_table WHERE heavy_condition', [], null, function($progress) {
  echo round(100 * $progress['read_rows']/$progress['total_rows_to_read']) . '%' . "\n";
});

Which will result in:

7%
17%
28%
39%
50%
61%
72%
83%
94%

Query execution summary

Latest query summary is available through:

clickhousy::rows('SELECT * FROM numbers(100000)');
print_r(clickhousy::last_summary());

Which is an associative array of multiple metrics:

Array
(
    [read_rows] => 65505
    [read_bytes] => 524040
    [written_rows] => 0
    [written_bytes] => 0
    [total_rows_to_read] => 100000
)

Errors handling and debug

By default, clickhousy will return response with error key on error, but will not throw any exceptions:

$response = clickhousy::query('SELECT bad_query');  # sample query with error
print_r($response);

Which contains original Clickhouse error message:

Array
(
    [error] => Code: 47. DB::Exception: Missing columns: 'bad_query' while processing query: 'SELECT bad_query', required columns: 'bad_query'. (UNKNOWN_IDENTIFIER) (version 22.6.1.1696 (official build))

)

If you want exceptions functionality, you can extend clickhousy with your own class and override error() method:

class my_clickhousy_exception extends Exception {};
class my_clickhousy extends clickhousy {
  public static function error($msg, $request_info) {
    throw new my_clickhousy_exception($msg);
  }
}

Then use your own class to get exceptions working:

my_clickhousy::query('SELECT bad_query');   # bad query example
# PHP Fatal error:  Uncaught my_clickhousy_exception: Code: 47. DB::Exception: Missing columns: 'bad_query' ...

Debugging response

If you need to access raw response from Clickhouse, you can find it here:

clickhouse::last_response();  # raw text response from Clickhouse after latest query

Curl (which is used for HTTP communication) response info can be accessed via:

clickhouse::last_info();  # response from curl_getinfo() after latest query

Logging

There is also log() method which can be overrided to allow custom logging:

class my_clickhousy extends clickhousy {
  public static function log($raw_response, $curl_info) {
    error_log('Received response from Clickhouse: ' . $raw_response);
  }
}

Memory usage and performance

Based on performance testing, Clickhousy lib is times faster and hundreds of times less memory consuming than smi2 lib:

Smi2: 
mem:                565       
time:               10.3      

Clickhousy: 
mem:                0.8       688.8x  better
time:               2.3       4.4x  better
You might also like...
A PHP library for communicating with the Twilio REST API and generating TwiML.

twilio-php The default branch name for this repository has been changed to main as of 07/27/2020. Documentation The documentation for the Twilio API c

A PHP class for querying the Twitter API and rendering tweets as an HTML list

TweetPHP A PHP class for querying the Twitter API and rendering tweets as an HTML list. Features Works with Twitter API v1.1 Tweets are cached to avoi

PHP package providing easy and fast access to Twitter API V2.

Twitter API V2 is a PHP package that provides an easy and fast access to Twitter REST API for Version 2 endpoints.

This library allows you to quickly and easily use the Twilio SendGrid Web API v3 via PHP
This library allows you to quickly and easily use the Twilio SendGrid Web API v3 via PHP

This library allows you to quickly and easily use the Twilio SendGrid Web API v3 via PHP

Just a simple API PHP library with basic functions and config.

Installation Clone this Repository in your PHP Project git clone https://github.com/maximilianosinski/simple-api-php-library.git Change your Project n

Attempting to create an intelligent mock of the Google API PHP Client for unit and functional testing

google-api-php-client-mock A small scale intelligent mock of the Google API PHP Client for unit and functional testing. Overview This is intended to m

OpenAI API Client is a component-oriented, extensible client library for the OpenAI API. It's designed to be faster and more memory efficient than traditional PHP libraries.

OpenAI API Client in PHP (community-maintained) This library is a component-oriented, extensible client library for the OpenAI API. It's designed to b

⚡️ Web3 PHP is a supercharged PHP API client that allows you to interact with a generic Ethereum RPC.
⚡️ Web3 PHP is a supercharged PHP API client that allows you to interact with a generic Ethereum RPC.

Web3 PHP is a supercharged PHP API client that allows you to interact with a generic Ethereum RPC. This project is a work-in-progress. Code and docume

Lightweight PHP library for WhatsApp API to send the whatsapp messages in PHP provided by ultramsg.com

Ultramsg.com WhatsApp API PHP SDK Lightweight PHP library for WhatsApp API to send the whatsappp messages in PHP provided by Ultramsg.com Installation

Releases(1.0)
  • 1.0(Jul 19, 2022)

    High performance Clickhouse PHP library featuring:

    • Tiny memory footprint based on static class (times better than smi2 client)
    • High level methods to fetch rows, single row, array of scalar values or single value.
    • Parametric queries (native Clickhouse sql-injection protection).
    • Long queries progress update through callback function.
    • Large resultsets "step-by-step" reading/processing without memory overflow.
    • Large datasets inserts without memory overflow.
    • Custom logging and error handling.
    • HTTP native compression for traffic reduction.
    • Curl based (shell curl command required for large inserts).
    Source code(tar.gz)
    Source code(zip)
Owner
Denys Golotiuk (hacking)
Utilities and middle-ware for ultra-fast small-overhead engineering on top of Python/PHP and Mysql/Redis/Clickhouse
Denys Golotiuk (hacking)
Easy to install email tracker with gui and telegram api bot with date device & ip tracking,

mail-php-tracking-with-gui ?? Simple mail tracking program that uses php, html, telegram bot, and a gui The gui The gui lets you create specific links

null 7 Dec 20, 2022
Adds a specific header to every response to disable Google's usage of your site in it's FLoC tracking method.

Go Unfloc Yourself Description A bundle for Symfony 5 that adds a Permissions-Policy header in all the responses to prevent the use of new Google's "F

(infinite) loophp 3 Feb 25, 2022
PHP library to use IOTA REST API to help node management and tangle queries

iota.php About PHP library to use IOTA REST API to help node management and tangle queries. Please be aware that this library is in an early developme

IOTA Community 45 Dec 13, 2022
Toxiproxy PHP Client - Toxiproxy makes it easy and trivial to test network conditions, for example low-bandwidth and high-latency situations

Toxiproxy makes it easy and trivial to test network conditions, for example low-bandwidth and high-latency situations. toxiproxy-php-client includes everything needed to get started with configuring Toxiproxy upstream connection and listen endpoints.

Adrian Parker 29 Jun 24, 2022
Nexmo REST API client for PHP. API support for SMS, Voice, Text-to-Speech, Numbers, Verify (2FA) and more.

Client Library for PHP Support Notice This library and it's associated packages, nexmo/client and nexmo/client-core have transitioned into a "Maintena

Nexmo 75 Sep 23, 2022
TeleBot - Easy way to create Telegram-bots in PHP. Rich Laravel support out of the box.

TeleBot is a PHP library for telegram bots development. Rich Laravel support out of the box. Has an easy, clean, and extendable way to handle telegram Updates.

WeStacks 206 Jan 6, 2023
The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.

DeviceDetector Code Status Description The Universal Device Detection library that parses User Agents and detects devices (desktop, tablet, mobile, tv

Matomo Analytics 2.4k Jan 9, 2023
Laravel Package for 1APP. Learn how to integrate our APIs to build a web or mobile integration to send and accept payments for your application and businesses.

1APP Laravel Library Learn how to integrate our APIs to build a web or mobile integration to accept payments, make payment of Bills and as well custom

O'Bounce Technologies 4 Jul 25, 2022
A simple PHP GitHub API client, Object Oriented, tested and documented.

PHP GitHub API A simple Object Oriented wrapper for GitHub API, written with PHP. Uses GitHub API v3 & supports GitHub API v4. The object API (v3) is

KNP Labs 2k Jan 7, 2023