A PHP library for counting short DNA sequences for use in Bioinformatics

Related tags

Security Helix
Overview

Helix

A PHP library for counting short DNA sequences for use in Bioinformatics. Helix consists of tools for data extraction as well as an ultra-low memory hash table called DNA Hash specialized for counting DNA sequences. DNA Hash stores sequence counts by their up2bit encoding - a two-way hash that exploits the fact that each DNA base need only 2 bits to be fully encoded. Accordingly, DNA Hash uses less memory than a lookup table that stores raw gene sequences. In addition, DNA Hash's novel layered Bloom filter eliminates the need to explicitly store counts for sequences that have only been seen once.

  • Ultra-low memory footprint
  • Compatible with FASTA and FASTQ formats
  • Supports canonical sequence counting
  • Open-source and free to use commercially

Note: The maximum sequence length is platform dependent. On a 64-bit machine, the max length is 31. On a 32-bit machine, the max length is 15.

Note: Due to the probabilistic nature of the Bloom filter, DNA Hash may over count sequences at a bounded rate.

Installation

Install into your project using Composer:

$ composer require andrewdalpino/helix

Requirements

  • PHP 7.4 or above

Example

use Helix\DNAHash;
use Helix\Extractors\FASTA;
use Helix\Tokenizers\Canonical;
use Helix\Tokenizers\Kmer;

$extractor = new FASTA('example.fa');

$tokenizer = new Canonical(new Kmer(25));

$hashTable = new DNAHash(0.001);

foreach ($extractor as $sequence) {
    $tokens = $tokenizer->tokenize($sequence);

    foreach ($tokens as $token) {
        $hashTable->increment($token);
    }
}

$top10 = $hashTable->top(10);

print_r($top10);
Array
(
    [GCTATAAAAAGAAAATTTTGGAATA] => 19
    [ATTCCAAAATTTTCTTTTTATAGCC] => 19
    [TAAAAAGAAAATTTTGGAATAAAAA] => 18
    [ATAAAAAGAAAATTTTGGAATAAAA] => 18
    [TATAAAAAGAAAATTTTGGAATAAA] => 18
    [CTATAAAAAGAAAATTTTGGAATAA] => 18
    [AAATAATTTCAATTTTCTATCTCAA] => 17
    [AAAATAATTTCAATTTTCTATCTCA] => 17
    [CAAAATAATTTCAATTTTCTATCTC] => 17
    [AGATAGAAAATTGAAATTATTTTGA] => 17
)

Testing

To run the unit tests:

$ composer test

Static Analysis

To run static code analysis:

$ composer analyze

Benchmarks

To run the benchmarks:

$ composer benchmark

References

You might also like...
JSON Object Signing and Encryption library for PHP.

NAMSHI | JOSE Deprecation notice Hi there, as much as we'd like to be able to work on all of the OSS in the world, we don't actively use this library

A library for generating random numbers and strings

RandomLib A library for generating random numbers and strings of various strengths. This library is useful in security contexts. Install Via Composer

Fast, general Elliptic Curve Cryptography library. Supports curves used in Bitcoin, Ethereum and other cryptocurrencies (secp256k1, ed25519, ..)

Fast Elliptic Curve Cryptography in PHP Information This library is a PHP port of elliptic, a great JavaScript ECC library. Supported curve types: Sho

A multitool library offering access to recommended security related libraries, standardised implementations of security defences, and secure implementations of commonly performed tasks.

SecurityMultiTool A multitool library offering access to recommended security related libraries, standardised implementations of security defences, an

A cryptography API wrapping the Sodium library, providing a simple object interface for symmetrical and asymmetrical encryption, decryption, digital signing and message authentication.

PHP Encryption A cryptography API wrapping the Sodium library, providing a simple object interface for symmetrical and asymmetrical encryption, decryp

PHP 5.x support for random_bytes() and random_int()

random_compat PHP 5.x polyfill for random_bytes() and random_int() created and maintained by Paragon Initiative Enterprises. Although this library sho

Simple Encryption in PHP.

php-encryption composer require defuse/php-encryption This is a library for encrypting data with a key or password in PHP. It requires PHP 5.6 or new

Standards compliant HTML filter written in PHP

HTML Purifier HTML Purifier is an HTML filtering solution that uses a unique combination of robust whitelists and aggressive parsing to ensure that no

A database of PHP security advisories

PHP Security Advisories Database The PHP Security Advisories Database references known security vulnerabilities in various PHP projects and libraries.

Owner
Andrew DalPino
Software Engineer, AI/ML, Web3, Open Source
Andrew DalPino
PHPIDS (PHP-Intrusion Detection System) is a simple to use, well structured, fast and state-of-the-art security layer for your PHP based web application

PHPIDS PHPIDS (PHP-Intrusion Detection System) is a simple to use, well structured, fast and state-of-the-art security layer for your PHP based web ap

null 752 Jan 3, 2023
Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.

Themis provides strong, usable cryptography for busy people General purpose cryptographic library for storage and messaging for iOS (Swift, Obj-C), An

Cossack Labs 1.6k Jan 6, 2023
[OUTDATED] Two-factor authentication for Symfony applications 🔐 (bunde version ≤ 4). Please use version 5 from https://github.com/scheb/2fa.

scheb/two-factor-bundle ⚠ Outdated version. Please use versions ≥ 5 from scheb/2fa. This bundle provides two-factor authentication for your Symfony ap

Christian Scheb 389 Nov 15, 2022
Javascript code scanner to use with gettext/gettext

Javascript code scanner to use with gettext/gettext

Gettext 4 Feb 14, 2022
php-chmod is a PHP library for easily changing permissions recursively.

PHP chmod php-chmod is a PHP library for easily changing the permissions recursively. Versions & Dependencies Version PHP Documentation ^1.1 ^7.4 curr

Mathias Reker ⚡️ 5 Oct 7, 2022
PHP Secure Communications Library

phpseclib - PHP Secure Communications Library Supporting phpseclib Become a backer or sponsor on Patreon One-time donation via PayPal or crypto-curren

null 4.9k Jan 7, 2023
TCrypto is a simple and flexible PHP 5.3+ in-memory key-value storage library

About TCrypto is a simple and flexible PHP 5.3+ in-memory key-value storage library. By default, a cookie will be used as a storage backend. TCrypto h

timoh 57 Dec 2, 2022
PHPGGC is a library of PHP unserialize() payloads along with a tool to generate them, from command line or programmatically.

PHPGGC: PHP Generic Gadget Chains PHPGGC is a library of unserialize() payloads along with a tool to generate them, from command line or programmatica

Ambionics Security 2.5k Jan 4, 2023
A petite library of encryption functions for PHP

?? dcrypt A petite library of essential encryption functions for PHP 7.1+. For legacy PHP version support, look here. If you need a dcrypt inspired en

null 96 Oct 6, 2022
Sodium Compat is a pure PHP polyfill for the Sodium cryptography library (libsodium)

Sodium Compat is a pure PHP polyfill for the Sodium cryptography library (libsodium), a core extension in PHP 7.2.0+ and otherwise available in PECL.

Paragon Initiative Enterprises 817 Dec 26, 2022