Base62 encoder and decoder for arbitrary data

Overview

Base62

This library implements base62 encoding. In addition to integers, it can encode and decode arbitrary data. This is useful, for example, when generating URL-safe random tokens for database identifiers.


Install

Install with Composer.

$ composer require tuupola/base62

This branch requires PHP 7.1 or later. The older 1.x branch also supports PHP 5.6 and 7.0.

$ composer require "tuupola/base62:^1.0"

Usage

This package has both pure PHP and GMP based encoders. By default, the encoder and decoder use GMP functions if the extension is installed. If GMP is not available, the pure PHP encoder is used instead.

$base62 = new Tuupola\Base62;

$encoded = $base62->encode(random_bytes(128));
$decoded = $base62->decode($encoded);
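
To see which encoder is likely to be picked on a given machine, you can check for the extension yourself. This is a minimal sketch that assumes detection via extension_loaded("gmp"); the library's own check may be implemented differently, and $backend is only an illustrative variable name.

```php
<?php
// Assumption: the GMP encoder is selected whenever the gmp extension is loaded.
$backend = extension_loaded("gmp") ? "GMP" : "pure PHP";

print "Base62 should use the {$backend} encoder on this machine." . PHP_EOL;
```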

If you are encoding to and from integers, use the explicit decodeInteger() and encodeInteger() methods.

$integer = $base62->encodeInteger(987654321); /* 14q60P */
print $base62->decodeInteger("14q60P"); /* 987654321 */

Note that encoding a numeric string and encoding the corresponding integer yield different results, because encode() treats its input as raw bytes.

$string = $base62->encode("987654321"); /* KHc6iHtXW3iD */
$integer = $base62->encodeInteger(987654321); /* 14q60P */
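
The integer result can be reproduced by hand with repeated division by 62. The following is only an illustrative sketch, not the library's implementation: the function name toBase62() is hypothetical, and it assumes the default GMP style character set and a non-negative integer.

```php
<?php
// Hypothetical helper for illustration; assumes the GMP style character set.
function toBase62(int $number): string
{
    $characters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    $encoded = "";
    do {
        // The remainder selects the next character, least significant first.
        $encoded = $characters[$number % 62] . $encoded;
        $number = intdiv($number, 62);
    } while ($number > 0);

    return $encoded;
}

print toBase62(987654321); /* 14q60P */
```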

Character sets

By default, Base62 uses the GMP style character set. A shortcut is provided for the inverted character set, which is also commonly used. You can also use any custom character set of 62 unique characters.

use Tuupola\Base62;

print Base62::GMP; /* 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz */
print Base62::INVERTED; /* 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ */

$default = new Base62(["characters" => Base62::GMP]);
$inverted = new Base62(["characters" => Base62::INVERTED]);
print $default->encode("Hello world!"); /* T8dgcjRGuYUueWht */
print $inverted->encode("Hello world!"); /* t8DGCJrgUyuUEwHT */
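
Because both character sets contain the same 62 characters in a different order, a string encoded with one can be translated to the other character by character. This is a minimal sketch using PHP's strtr(); the two strings are copied from the constants above so that the example runs without the library, and the variable names are illustrative.

```php
<?php
$gmp = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
$inverted = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Map each GMP style character to the character at the same position
// in the inverted character set.
$translated = strtr("T8dgcjRGuYUueWht", $gmp, $inverted);

print $translated; /* t8DGCJrgUyuUEwHT */
```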

Speed

Install GMP if you can. It is much faster than the pure PHP encoder. The benchmarks below are for encoding random_bytes(128) data. A BCMath encoder is also included, but it is mostly just a curiosity; it is too slow to be usable.

$ php --version
PHP 8.0.7 (cli) (built: Jun  4 2021 03:50:01) ( NTS )

$ make bench

+-----------------------+------------------+-----------+
| subject               | mean             | diff      |
+-----------------------+------------------+-----------+
| benchGmpDecoder       | 140,409.997ops/s | 1.10x     |
| benchGmpDecoderCustom | 154,607.297ops/s | 1.00x     |
| benchPhpDecoder       | 721.147ops/s     | 214.39x   |
| benchBcmathDecoder    | 72.191ops/s      | 2,141.64x |
+-----------------------+------------------+-----------+

+-----------------------+------------------+-----------+
| subject               | mean             | diff      |
+-----------------------+------------------+-----------+
| benchGmpEncoder       | 352,609.309ops/s | 1.00x     |
| benchGmpEncoderCustom | 350,140.056ops/s | 1.01x     |
| benchPhpEncoder       | 669.959ops/s     | 526.31x   |
| benchBcmathEncoder    | 72.956ops/s      | 4,833.21x |
+-----------------------+------------------+-----------+

Static Proxy

If you prefer static syntax, use the provided static proxy.

use Tuupola\Base62Proxy as Base62;

$encoded = Base62::encode(random_bytes(128));
$decoded = Base62::decode($encoded);

$encoded2 = Base62::encodeInteger(987654321);
$decoded2 = Base62::decodeInteger($encoded2);

Testing

You can run tests either manually or automatically on every code change. Automatic tests require entr to work.

$ make test
$ brew install entr
$ make watch

Contributing

Please see CONTRIBUTING for details.

Security

If you discover any security related issues, please email [email protected] instead of using the issue tracker.

License

The MIT License (MIT). Please see License File for more information.

Comments
  • Leading 0x00 stripped from binary data


    I ran into a missing leading \0 byte when running the tests. Is this normal?

    There was 1 failure:
    
    1) Tuupola\Base62\Base62Test::testShouldEncodeAndDecodeRandomBytes
    Failed asserting that two strings are equal.
    --- Expected
    +++ Actual
    @@ @@
    -Binary String: 0x00486eea2de87439fc081b892616a3b0f1f098df86e2cdd23e7d21f5f046a30a1a6662fff6c3c017b1d4853a1fdd7dc00975016d9c2801b9df659fadc6abe1109b1e1f3960367603e75bb9ddf9d8097af5948f74df585d05bbee61aff992f3d35577e31aafce7d4342d3a68da0d5ca8d46bde2f7e7f555cf6a1938c4f52bdd43
    +Binary String: 0x486eea2de87439fc081b892616a3b0f1f098df86e2cdd23e7d21f5f046a30a1a6662fff6c3c017b1d4853a1fdd7dc00975016d9c2801b9df659fadc6abe1109b1e1f3960367603e75bb9ddf9d8097af5948f74df585d05bbee61aff992f3d35577e31aafce7d4342d3a68da0d5ca8d46bde2f7e7f555cf6a1938c4f52bdd43
    
    /private/tmp/base62/tests/Base62Test.php:41
    
    FAILURES!
    
    bug 
    opened by ElfSundae 12
  • Can't successfully convert low-value UUIDs without blowing up.


    The Problem

    When I tried to use this with my phpexpertsinc/ConciseUUID project, the tests all failed on low value UUIDs.

    InvalidArgumentException: $bytes string should contain 16 characters.
    

    Here is a test case:

    use Ramsey\Uuid\Uuid;
    use Tuupola\Base62Proxy as Base62;
    
    $uuid = Uuid::fromBytes(Base62::decode($conciseUuid));
    

    Here's the test data:

    $badUuids = [
        '0023a441-a3a3-4d9e-bd65-de3381c3a226' => '00GHs6XflJ51yCvZ4TwH4g',
        '1ee9a026-48ef-4592-9d87-88ceea7bc35e' => '0wKXIE87UgfjIvSPLkAHao',
        '0e0aa2a8-1a10-45e4-a67a-c97b9c5a7d19' => '0QUkgNC86JAY1A8JhVZ7iT',
        '1ad0d525-97c9-4c08-ad56-59acd47e3f7c' => '0obEi3noEliUnbTQbhMrLo',
    ];
    

    The Solution

    I solved this using ext-gmp via:

        // 3. We pad zeros to the beginning, as the result returned by gmp_strval after base conversion
        // is not always 22 characters long.
        $uuid = str_pad($uuid, 22, '0', STR_PAD_LEFT);
    
    opened by hopeseekr 4
  • [Suggestion] Add support to encode hex directly


    $hex = '123456abcdef';
    Base62::encode($hex);  // "JngPBzse6O99qumM"
    Base62::encode(hex2bin($hex)); // "5gOMRIbf"
    

    I think it would be better to add encodeHex and decodeHex methods, so that hex can be encoded directly, like Base62::encodeHex($hex). Internally, encode could then call encodeHex for string data, like:

    public function encodeHex($data)
    {
        return ctype_xdigit($data) ? gmp_strval(gmp_init($data, 16), 62) : '';
    }
    
    public function encode($data, $integer = false)
    {
        if (is_integer($data) || true === $integer) {
            return gmp_strval(gmp_init($data, 10), 62);
        }
    
        return $this->encodeHex(bin2hex($data));
    }
    
    opened by ElfSundae 4
  • Test enhancement


    Changelog

    • Add the suggested extensions ext-gmp and ext-bcmath.
    • Use the namespaced, class-based PHPUnit classes.
    • Take into account that the PHP 7.2 tests output deprecation and warning messages when run with PHPUnit 4.8. See the Travis build log for details.
    opened by peter279k 2
  • Bugfix - Handle invalid characters when decoding


    When using this library in a project, I ran into an issue when I wrote a test that passed intentionally "bad data" to the decode method.

    Specifically, when passing the invalid data to the decode method of the GmpEncoder, which uses gmp_init(), the GMP extension was throwing an unchecked warning about the data not being a valid number within the specified base (62).

    I've recreated this error in the provided tests, along with a fix.

    ... Unfortunately, there's not a great way to check for this error at a higher level, due to GMP's usage of warnings and a false return value. Interestingly enough, I've actually run into this problem before when building my own library years ago, so I've actually reported this as a PHP bug, but it was closed with a "Won't Fix" status: https://bugs.php.net/bug.php?id=68002

    In any case, I've created this fix to handle the issue for both the GMP encoder's case and for the base encoder, as otherwise not handling this case could lead to an invalid decode result or "data-loss".

    opened by Rican7 2
  • [FIX] do not discard zero-byte prefixes


    Since the encoding algorithm is based on arithmetic division, the current algorithm is discarding 0x00 bytes at the beginning of the data stream.

    Examples:

    0x0001 -(encode)-> 0x31 -(decode)-> 0x01
    0x00000000 -(encode)-> 0x00 -(decode)-> 0x00
    0x303030 -(decode)-> 0x00

    The fix I came up with involves chopping these leading zero-bytes from the data stream and appending their encoded (or decoded) form after performing the division. There's also a new test case for this corner case.

    PS Fixes #4. Had not seen the issue.

    opened by 1ma 0
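
The zero-byte handling described in the "[FIX] do not discard zero-byte prefixes" comment above can be sketched in pure PHP. This is an illustration under stated assumptions, not the code from that pull request: the names convertBase(), encodeWithZeros() and decodeWithZeros() are hypothetical, the GMP style character set is assumed, input to the decoder is assumed to be valid, and each leading 0x00 byte is represented by one leading "0" character.

```php
<?php
const CHARACTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Convert an array of digits from one base to another by repeated division.
// Leading zero digits carry no value, so they are lost in the conversion.
function convertBase(array $digits, int $from, int $to): array
{
    $result = [];
    while (count($digits) > 0) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $accumulator = $digit + $remainder * $from;
            $next = intdiv($accumulator, $to);
            $remainder = $accumulator % $to;
            if (count($quotient) > 0 || $next > 0) {
                $quotient[] = $next;
            }
        }
        array_unshift($result, $remainder);
        $digits = $quotient;
    }

    return $result;
}

function encodeWithZeros(string $data): string
{
    // Count and remove the leading 0x00 bytes before the division.
    $zeros = strspn($data, "\x00");
    $rest = (string) substr($data, $zeros);
    $bytes = $rest === "" ? [] : array_values(unpack("C*", $rest));

    $encoded = "";
    foreach (convertBase($bytes, 256, 62) as $index) {
        $encoded .= CHARACTERS[$index];
    }

    // Put one "0" character back for every stripped zero byte.
    return str_repeat("0", $zeros) . $encoded;
}

function decodeWithZeros(string $data): string
{
    // Leading "0" characters stand for the stripped 0x00 bytes.
    $zeros = strspn($data, "0");
    $rest = (string) substr($data, $zeros);

    $indexes = [];
    foreach ($rest === "" ? [] : str_split($rest) as $character) {
        // Assumes every character belongs to the character set.
        $indexes[] = strpos(CHARACTERS, $character);
    }

    $decoded = "";
    foreach (convertBase($indexes, 62, 256) as $byte) {
        $decoded .= chr($byte);
    }

    return str_repeat("\x00", $zeros) . $decoded;
}

print encodeWithZeros("\x00\x01"); /* 01 */
```

With this approach "\x00\x01" encodes to "01" rather than "1", so decoding restores both bytes.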
Owner
Mika Tuupola