A language detection library for PHP. Detects the language from a given text string.

Patrick Schur

Last update: Dec 28, 2022

Related tags

Strings nlp training language php natural-language-processing language-detection n-grams

Overview

language-detection

This library detects the language of a given text string. In the training phase, it parses training text in many different languages into sequences of N-grams and builds a database file in PHP. In the detection phase, it uses that database to determine the language of a given text. The library ships with text samples for training and detection in 110 languages.
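
To make the N-gram idea more concrete, here is a minimal, self-contained sketch of how a ranked list of character n-grams could be built from a text. It is not the library's actual code: the function name ngramProfile(), the "_" padding at word boundaries and the maximum n-gram length of 3 are assumptions made for illustration, and the text is assumed to be lowercased already.

```php
<?php
// Sketch only, not the library's implementation. ngramProfile() is a
// hypothetical helper; padding with "_" and $maxLen = 3 are assumptions.

/**
 * Builds a ranked list of character n-grams (length 1..$maxLen) for a text.
 * Assumes the text is UTF-8 and already lowercased by the caller.
 */
function ngramProfile(string $text, int $maxLen = 3, int $limit = 310): array
{
    $counts = [];
    // Split into words on anything that is not a Unicode letter.
    $words = preg_split('/[^\pL]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // Pad with "_" so word boundaries become part of the n-grams.
        $chars = preg_split('//u', '_' . $word . '_', -1, PREG_SPLIT_NO_EMPTY);
        $total = count($chars);

        for ($start = 0; $start < $total; $start++) {
            for ($len = 1; $len <= $maxLen && $start + $len <= $total; $len++) {
                $gram = implode('', array_slice($chars, $start, $len));
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }

    // Most frequent n-grams first; keep only the top $limit.
    arsort($counts);

    return array_slice(array_keys($counts), 0, $limit);
}

$profile = ngramProfile('mag het een onsje meer zijn');
```

A detector can then compare the profile of an unknown text with the stored profile of each language and rank the languages by how closely the two lists agree.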

Installation with Composer
How to upgrade from 3.y.z to 4.y.z?
Basic Usage
API
Method Chaining
Array Access
List of supported languages
Other languages
FAQ
Contributing
License

Installation with Composer

Note: This library requires the Multibyte String extension in order to work.

$ composer require patrickschur/language-detection

How to upgrade from `3.y.z` to `4.y.z`?

Important: Only for people who are using a custom directory with their own translation files.

Starting with version 4.y.z, the resource files use PHP instead of JSON as their format, for performance reasons. If you used 3.y.z before and want to move to 4.y.z, you have to convert your JSON files to PHP by generating the language profiles again. The JSON files are then no longer needed.

You can delete unnecessary JSON files under Linux with the following command.

rm resources/*/*.json
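
On systems without a POSIX shell, the same cleanup can be done from PHP. The following is a minimal sketch, assuming the resource files follow the resources/<language>/<file>.json layout implied by the command above. removeJsonProfiles() is a hypothetical helper and not part of the library.

```php
<?php
// Hypothetical helper, not part of the library: deletes <dir>/*/*.json
// and returns how many files were removed.
function removeJsonProfiles(string $resourcesDir): int
{
    $removed = 0;
    $pattern = rtrim($resourcesDir, '/\\') . '/*/*.json';

    // glob() can return false on error, so fall back to an empty list.
    foreach (glob($pattern) ?: [] as $file) {
        if (is_file($file) && unlink($file)) {
            $removed++;
        }
    }

    return $removed;
}

// Example call (the path is a placeholder):
// echo removeJsonProfiles(__DIR__ . '/resources'), " file(s) removed\n";
```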

Basic Usage

For reliable detection, the input text should be at least a few sentences long.

use LanguageDetection\Language;
 
$ld = new Language;
 
$ld->detect('Mag het een onsje meer zijn?')->close();

Result:

Array
(
    "nl" => 0.66193548387097,
    "af" => 0.51338709677419,
    "br" => 0.49634408602151,
    "nb" => 0.48849462365591,
    "nn" => 0.48741935483871,
    "fy" => 0.47822580645161,
    "dk" => 0.47172043010753,
    "sv" => 0.46408602150538,
    "bi" => 0.46021505376344,
    "de" => 0.45903225806452,
    [...]
)

API

`__construct(array $result = [], string $dirname = '')`

You can pass an array of languages to the constructor so that the sentence is compared only with the given languages. This can dramatically increase performance. The second parameter is optional and is the name of the directory where the translation files are located.

$ld = new Language(['de', 'en', 'nl']);
 
// Compares the sentence only with "de", "en" and "nl" language models.
$ld->detect('Das ist ein Test');

`whitelist(string ...$whitelist)`

Provide a whitelist. Only the given languages are kept in the result.

$ld->detect('Mag het een onsje meer zijn?')->whitelist('de', 'nn', 'nl', 'af')->close();

Result:

Array
(
    "nl" => 0.66193548387097,
    "af" => 0.51338709677419,
    "nn" => 0.48741935483871,
    "de" => 0.45903225806452
)

`blacklist(string ...$blacklist)`

Provide a blacklist. Removes the given languages from the result.

$ld->detect('Mag het een onsje meer zijn?')->blacklist('dk', 'nb', 'de')->close();

Result:

Array
(
    "nl" => 0.66193548387097,
    "af" => 0.51338709677419,
    "br" => 0.49634408602151,
    "nn" => 0.48741935483871,
    "fy" => 0.47822580645161,
    "sv" => 0.46408602150538,
    "bi" => 0.46021505376344,
    [...]
)

`bestResults()`

Returns the best results: the top-scoring language and any others whose score is very close to it.

$ld->detect('Mag het een onsje meer zijn?')->bestResults()->close();

Result:

Array
(
    "nl" => 0.66193548387097
)

`limit(int $offset, int $length = null)`

You can specify the number of records to return. For example the following code will return the top three entries.

$ld->detect('Mag het een onsje meer zijn?')->limit(0, 3)->close();

Result:

Array
(
    "nl" => 0.66193548387097,
    "af" => 0.51338709677419,
    "br" => 0.49634408602151
)

`close()`

Returns the result as an array.

$ld->detect('This is an example!')->close();

Result:

Array
(
    "en" => 0.5889400921659,
    "gd" => 0.55691244239631,
    "ga" => 0.55376344086022,
    "et" => 0.48294930875576,
    "af" => 0.48218125960061,
    [...]
)

`setTokenizer(TokenizerInterface $tokenizer)`

The library uses a tokenizer to split a sentence into words. You can define your own tokenizer, for example to deal with numbers.

$ld->setTokenizer(new class implements TokenizerInterface
{
    public function tokenize(string $str): array 
    {
        return preg_split('/[^a-z0-9]/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    }
});

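This tokenizer splits the text at every character that is not a lowercase letter from a to z or a digit from 0 to 9, so the returned tokens consist only of those characters. If the input has not been lowercased beforehand, uppercase letters act as separators too.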
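
The splitting rule can be tried out without the library. The sketch below is a standalone function (tokenizeAlnum() is a hypothetical name) that lowercases ASCII letters defensively before splitting. The same body could be used inside the tokenize() method of a custom tokenizer.

```php
<?php
// Standalone sketch; tokenizeAlnum() is a hypothetical name and not part
// of the library.
function tokenizeAlnum(string $str): array
{
    // Lowercase ASCII letters first so "USA" is kept instead of being
    // treated as three separator characters.
    $str = strtolower($str);

    // Split on runs of anything that is not a-z or 0-9; drop empty tokens.
    return preg_split('/[^a-z0-9]+/', $str, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(tokenizeAlnum('Route 66, USA!')); // tokens: route, 66, usa
```

Note that this pattern also discards every non-ASCII letter, so it only suits texts written in the basic Latin alphabet.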

`__toString()`

Returns the top entry of the result as a string. Note the echo at the beginning.

echo $ld->detect('Das ist ein Test.');

Result:

de

`jsonSerialize()`

Serializes the result to JSON.

$object = $ld->detect('Tere tulemast tagasi! Nägemist!');
 
echo json_encode($object, JSON_PRETTY_PRINT);

Result:

{
    "et": 0.5224748810153358,
    "ch": 0.45817028027498674,
    "bi": 0.4452670544685352,
    "fi": 0.440983606557377,
    "lt": 0.4382866208355367,
    [...]
}

Method chaining

You can also combine methods with each other. The following example removes all entries specified in the blacklist and returns only the top four entries.

$ld->detect('Mag het een onsje meer zijn?')->blacklist('af', 'dk', 'sv')->limit(0, 4)->close();

Result:

Array
(
    "nl" => 0.66193548387097,
    "br" => 0.49634408602151,
    "nb" => 0.48849462365591,
    "nn" => 0.48741935483871
)

ArrayAccess

You can also access the object directly as an array.

$object = $ld->detect('Das ist ein Test');
 
echo $object['de'];
echo $object['en'];
echo $object['xy']; // does not exist

Result:

0.6623339658444
0.56859582542694
NULL

Supported languages

The library currently supports 110 languages. For an overview of all supported languages, see resources/README.md in the repository.

Other languages

The library is trainable, which means you can change, remove and add your own language files. If your language is not supported, feel free to add your own language files. To do that, create a new directory in resources and add your training text to it.

Note: The training text should be a .txt file.

Example

|- resources
    |- ham
        |- ham.txt
    |- spam
        |- spam.txt

As you can see, the library can also be used to tell spam from ham.

If you store your translation files outside of resources, you have to specify the path.

$t->learn('YOUR_PATH_HERE');

Whenever you change one of the translation files you must first generate a language profile for it. This may take a few seconds.

use LanguageDetection\Trainer;
 
$t = new Trainer();
 
$t->learn();

Remove these few lines after execution. You can now classify texts by their language using your own training text.

FAQ

How can I improve the detection phase?

To improve the detection phase you have to use more n-grams, but be careful: this will slow down the script. I found that detection is much better when using around 9,000 n-grams (the default is 310). To do that, look at the code below:

$t = new Trainer();
 
$t->setMaxNgrams(9000);
 
$t->learn();

First you have to train the library. After that you can classify texts as before, but you must also specify how many n-grams you want to use.

$ld = new Language();
 
$ld->setMaxNgrams(9000);
  
// "grille pain" is French and means "toaster" in English
var_dump($ld->detect('grille pain')->bestResults());

Result:

class LanguageDetection\LanguageResult#5 (1) {
  private $result =>
  array(2) {
    'fr' =>
    double(0.91307037037037)
    'en' =>
    double(0.90623333333333)
  }
}

Is the detection process slower if language files are very big?

No, it is not. The trainer class only uses the best 310 n-grams of each language. As long as you don't change this number or add more language files, performance is not affected. Only creating the n-grams is slower, and that has to be done only once. The detection phase is only affected when you try to detect large chunks of text.

Summary: The training phase will be slower but the detection phase remains the same.

Contributing

Feel free to contribute. Any help is welcome.

License

This project is licensed under the terms of the MIT license.

Comments

Grille pain
Guten Tag,

I continue in english ;)

I compared all current language detectors based on sequence of N-grams and your solution is the best implementation.

However, I have to translate very short sequences of words. For example, the words 'Grille pain', which means toaster in English, return

array(6) { ["it"]=> float(0.54711111111111) ["fr"]=> float(0.54633333333333) ["en"]=> float(0.506) ["de"]=> float(0.49488888888889) ["nl"]=> float(0.49) ["es"]=> float(0.43466666666667) }

It's almost good! it wins on fr by a difference 0.000777.

So, I added words 'Grille pain' in fr file, set trainer and new result gives:

array(6) { ["fr"]=> float(0.54944444444444) ["it"]=> float(0.54711111111111) ["en"]=> float(0.506) ["de"]=> float(0.49488888888889) ["nl"]=> float(0.49) ["es"]=> float(0.43466666666667) }

It's good. fr wins on it by a difference of 0.002333.

So my questions:

1- Can I populate the language files so that I can be sure that the first occurrence wins by a very significant difference?

2- If yes to the previous question, is the detection process slower if language files are very big?

3- I can see that, in the fr file, you have put the French declaration of rights. I don't think that a 200-year-old text represents the current French language very well. Is there some data somewhere which may be more accurate?

Thanks for your great job.

Best Regards Michel
question
opened by michelollivier 7
Get language name by language code

I suggest an improvement. Return not only the language code, but also its name.

https://github.com/patrickschur/language-detection/blob/master/resources/README.md
feature

opened by 4n70w4 6
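
Until such a feature exists, the name lookup can be done in user code. The sketch below is only an illustration: LANGUAGE_NAMES and languageName() are hypothetical, and the table contains just a few hand-written entries rather than the full list from resources/README.md.

```php
<?php
// Hypothetical lookup table with a few entries only; a real table would
// have to cover every code listed in resources/README.md.
const LANGUAGE_NAMES = [
    'de' => 'German',
    'en' => 'English',
    'fr' => 'French',
    'nl' => 'Dutch',
];

// Returns the English name for a code, or null when the code is unknown.
function languageName(string $code): ?string
{
    return LANGUAGE_NAMES[$code] ?? null;
}

echo languageName('nl'), "\n"; // Dutch
```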
Negative language probability
I try to improve language detection and set separate folder with samples as mentioned

$t = new LanguageDetection\Trainer(); $t->setMaxNgrams(9000); $t->learn('/project/language/samples');

So it created JSON files in the language directories. But when I try to detect the language:

$ld = new LanguageDetection\Language([], '/project/language/samples'); $ld->detect('some text here')->close();

I got negative probability

[ "bg" => -0.63268817204301, "ru" => -1.183311827957, ]

So if used bestResults(), the wrong language code returns. Text in my case is russian.

Is it normal that negative probability is returned?
opened by whitelessk 6
Fix Unsupported operand types for custom directory

A custom directory will lead to the library not working because the .json file is requested in the $dirname instead of the .php file.

So until this is fixed as proposed, 4.0 will not work for anyone using a custom directory.

opened by iquito 5
Grille pain question 4
Hi,

Sorry, I forgot question 4:

'xzy' returns

array(6) { ["it"]=> float(0.092666666666667) ["en"]=> float(0.092) ["nl"]=> float(0.088) ["de"]=> float(0.084666666666667) ["es"]=> float(0.081666666666667) ["fr"]=> float(0.043333333333333) }

I can not launch translation with that.

-Could you briefly explain what these figures are and at which level one can say that they are reliable enough to return a detected language? You could add a bestResult() method 'winner by KO' (like Germany-Brazil 7-0 ;).

-Maybe more interesting for developers, would be to add a validate() method which returns false if we definitely can not detect. I will insert it in my validation process.

Thanks again. Michel
question
opened by michelollivier 5
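
The validate() idea from this question can be approximated in user code on top of the array returned by close(). The sketch below is an assumption-laden illustration, not a library feature: isConfident() is a hypothetical helper, and the thresholds 0.5 (minimum score) and 0.05 (minimum lead over the runner-up) are arbitrary values that would need tuning on real data.

```php
<?php
// Hypothetical helper, not part of the library. $scores is an array in the
// shape returned by close(): language code => score.
// The default thresholds are arbitrary and only serve as an example.
function isConfident(array $scores, float $minScore = 0.5, float $minMargin = 0.05): bool
{
    if ($scores === []) {
        return false;
    }

    arsort($scores);                  // make sure the best score comes first
    $values = array_values($scores);
    $best   = $values[0];
    $second = $values[1] ?? 0.0;      // a single candidate has no runner-up

    return $best >= $minScore && ($best - $second) >= $minMargin;
}

// Scores quoted in the two questions about 'Grille pain' and 'xzy':
var_dump(isConfident(['it' => 0.54711111111111, 'fr' => 0.54633333333333])); // bool(false)
var_dump(isConfident(['it' => 0.092666666666667, 'en' => 0.092]));            // bool(false)
```

With these thresholds, both short inputs are rejected: the first because the lead over the runner-up is too small, the second because the best score itself is too low.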

Deprecation notice with PHP 8.1

In PHP 8.1 a lot of interfaces got updated with new return types. That includes ArrayAccess. Therefore LanguageResult produces some little notices on PHP 8.1:

Deprecated: Return type of LanguageDetection\LanguageResult::offsetExists($offset) should either be compatible with ArrayAccess::offsetExists(mixed $offset): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /app/vendor/patrickschur/language-detection/src/LanguageDetection/LanguageResult.php on line 37
Deprecated: Return type of LanguageDetection\LanguageResult::offsetGet($offset) should either be compatible with ArrayAccess::offsetGet(mixed $offset): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /app/vendor/patrickschur/language-detection/src/LanguageDetection/LanguageResult.php on line 46
Deprecated: Return type of LanguageDetection\LanguageResult::offsetSet($offset, $value) should either be compatible with ArrayAccess::offsetSet(mixed $offset, mixed $value): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /app/vendor/patrickschur/language-detection/src/LanguageDetection/LanguageResult.php on line 56
Deprecated: Return type of LanguageDetection\LanguageResult::offsetUnset($offset) should either be compatible with ArrayAccess::offsetUnset(mixed $offset): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /app/vendor/patrickschur/language-detection/src/LanguageDetection/LanguageResult.php on line 68

version: v5.1.0

opened by Knochenmarc 4
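
For context, the notices disappear once the ArrayAccess methods declare return types that are compatible with the interface, or carry the #[\ReturnTypeWillChange] attribute. The class below is a self-contained illustration of both options. ScoreList is a hypothetical name, it is not the library's LanguageResult, and it does not claim to show how the library itself resolved the issue.

```php
<?php
// Illustrative class only (ScoreList is a hypothetical name); it is not the
// library's LanguageResult.
class ScoreList implements ArrayAccess
{
    private array $scores;

    public function __construct(array $scores)
    {
        $this->scores = $scores;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->scores[$offset]);
    }

    // The "mixed" return type needs PHP 8.0+, so code that must also run
    // on PHP 7.x can keep the untyped signature and add the attribute.
    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->scores[$offset] ?? null;
    }

    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->scores[] = $value;
        } else {
            $this->scores[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->scores[$offset]);
    }
}

$list = new ScoreList(['de' => 0.66, 'en' => 0.57]);
var_dump(isset($list['de']), $list['xy']); // bool(true) and NULL
```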

Language detection with php 5.6

Hey, I've been trying to do some basic operations (as described in readme file), but it simply doesn't work. I've been using PHP 5.6 for an old project and I removed those operators, which are newer (I've error reporting to E_ALL) and it doesn't have any errors, but it returns an empty array when calling this: $ld->detect('Mag het een onsje meer zijn?')->close();

Any ideas what I'm doing wrong or perhaps if there are any problems with older versions of PHP?

opened by SvetoslavStefanov 4
Increase max n-gram to 9000 by default

Since an n-gram of 9000 yields the best results, and by default if you want to train an additional language you would be interacting with the trainer, make the default 9000 and allow users to use a lower n-gram if they want.

As per the FAQ, the upfront cost exists in generating the resources, not using them.

opened by matthewnessworthy 4
Issue with detection of English text

Phrase : A beautiful villa in eastern Sweden --- Language Detected : sv
Phrase : Uma bela casa no leste da Suécia --- Language Detected : pt-BR
Phrase : Eine schöne Villa in Ost-Schweden --- Language Detected : de
Phrase : En vacker villa i östra sverige --- Language Detected : sv
Phrase : পূর্ব সুইডেনে একটি সুন্দর বাগানবাড়ি --- Language Detected : bn
Phrase : Une belle villa en Suède orientale --- Language Detected : it
Phrase : فيلا جميلة في شرق السويد --- Language Detected : ar
Phrase : En vakker villa i Øst-Sverige --- Language Detected : sv
Phrase : Una hermosa villa en el este de Suecia --- Language Detected : es
Phrase : En smuk villa i det østlige Sverige --- Language Detected : dk
Phrase : Kaunis huvila Itä Ruotsissa --- Language Detected : fi
Phrase : पूर्वी स्वीडन में एक खूबसूरत विला --- Language Detected : hi
Phrase : Una bella villa in Svezia orientale --- Language Detected : it
Phrase : 東部スウェーデンの美しいヴィラ --- Language Detected : ja
Phrase : O vilă frumoasă în estul Suediei --- Language Detected : ro
Phrase : Isang magandang villa sa silangang Sweden --- Language Detected : jv
Phrase : Красивая вилла в восточной части Швеции --- Language Detected : ru
Phrase : Красива вілла в східній частині Швеції --- Language Detected : uk

Above are the phrases and the languages detected. The library seems to be working fine, but it's not detecting the correct language for simple English text.

opened by solution7 4
__toString() must be of the type string, null returned

When there's no results, __toString() returns null which conflicts with the return type. Happens when the input string is empty (maybe some other cases)

opened by nachitox 3
Use count($samples) instead of $this->maxNgrams

Just an idea. Increasing the size of maxNgrams will change the result confidences even if there's no actual difference. Using count($samples), we would get the same result if maxNgrams is set to 10,000 or 100,000.

opened by oyejorge 3
Unable to detect Chinese if there is only 1 character
$languageDetection = new Language(); $languageDetection->detect('很')->close();

The actual result would be 0 for all languages, while expecting zh-Hant and zh-Hans to have non-zero results.
opened by jhkchan 0

Feature Request - Min language's values

Currently we can use the limit function to return a specific quantity of languages:

$ld->detect('Mag het een onsje meer zijn?')->blacklist('af', 'dk', 'sv')->limit(0, 4)->close();

Array
(
    "nl" => 0.66193548387097,
    "br" => 0.49634408602151,
    "nb" => 0.48849462365591,
    "nn" => 0.48741935483871
)

Would be nice to have a standalone function in the library to limit the results by its values.

$ld->detect('Mag het een onsje meer zijn?')->blacklist('af', 'dk', 'sv')->min(0.5)->close(); // or atLeast() instead of min()

Array
(
    "nl" => 0.66193548387097
)

// In case of a greater than number:

$ld->detect('Mag het een onsje meer zijn?')->blacklist('af', 'dk', 'sv')->min(1)->close(); // or atLeast() instead of min()

Array
(
)

opened by FabianoLothor 0
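
The requested filtering can already be done on the plain array returned by close(). This is a minimal sketch under that assumption. atLeast() is a hypothetical user-land helper, and neither min() nor atLeast() are existing library methods.

```php
<?php
// Hypothetical helper working on the plain array returned by close();
// min()/atLeast() are not existing library methods.
function atLeast(array $scores, float $min): array
{
    // array_filter() preserves the language-code keys.
    return array_filter($scores, static function ($score) use ($min) {
        return $score >= $min;
    });
}

// Scores copied from the example in this request.
$scores = [
    'nl' => 0.66193548387097,
    'br' => 0.49634408602151,
    'nb' => 0.48849462365591,
    'nn' => 0.48741935483871,
];

print_r(atLeast($scores, 0.5)); // keeps only "nl"
print_r(atLeast($scores, 1.0)); // empty array
```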

Create langLibrary from different directories
Hello,

Now, when we create a library, we can use the following ways:

1. new Language() - the following directory is used by default: __DIR__ . '/../../resources/*/*.json';

2. We use other .json files (taking the en language as an example):

2.1 Create a folder anywhere: $dirname=LanguageDetection/en/.
2.2 Put your own text file there: en.txt.
2.3 Train the library:

$t = new Trainer(); $t->learn($dirname);

2.4 Then use the newly created/updated .json file, LanguageDetection/en/en.json, with:

new Language([], $dirname)

So, if we want to use the default language file, we should:

1. Copy the existing en.txt to our newly created folder.
2. Add our text to the existing one.
3. Train the library.
4. Use the newly created/updated en.json.

Request:

To avoid copy-pasting, it would be good to be able to use the existing en.json together with the newly created one, something like: new Language([], $dirname, $useDefaultFile = true):

The 3rd parameter is false by default.

If dirname is defined and $useDefaultFile=true, use both paths together: the default one (__DIR__ . '/../../resources/*/*.json') and the new one, dirname.

help wanted feature
opened by IgorFrancais 3

Releases(v5.2.0)

v5.2.0(Mar 1, 2022)
Removed deprecation notices when using PHP 8.1 (#49)

Removed support for PHP 7.3 because it's no longer supported

v5.1.0(Mar 5, 2021)
Added Occitan language file (thanks to @Mejans)

v5.0.0(Dec 11, 2020)
This release will add support for PHP 8 and removes the support for PHP 7.2 (because it's end of life). Users of PHP 7.2 should upgrade as soon as possible, as they may be exposed to unpatched security vulnerabilities.

You can upgrade from version 4.y.z to 5.y.z without any problems if you are already using PHP 7.3 or higher.

Added

Support for PHP 8

Swahili and Oromo language files (thanks to @matthewnessworthy)

Removed

PHP 7.2 support as it is no longer supported

v4.0.1(Aug 12, 2020)
Fixed Unsupported operand types for custom directory (thanks to @iquito)

v4.0.0(Aug 9, 2020)
Optimized performance by using PHP instead of JSON as a database file (thanks to @Pierstoval)

Note: The format of the resource files changed. You have to upgrade your JSON files to PHP. Please take a look at the upgrade guide for more information.
v3.4.2(Nov 16, 2019)
Updated to PHPUnit 8

v3.4.1(Nov 16, 2019)
Optimized performance by using fully-qualified function calls

Updated to PHP 7.2

Removed echo from Trainer class

v3.4.0(Sep 19, 2018)

Changed Danish language code from dk to da.
v3.3.0(Jun 7, 2018)
Added Serbian Cyrillic and Serbian Latin

The method __toString now always returns a string

v3.2.0(Aug 19, 2017)
Added experimental support for custom directories. You can now specify your own path for translation files.

Added credits for the translation files.

Corrections for Amharic, Czech and Lao translation.

v3.1.1(Jul 27, 2017)

Added the latest commits to the release.
v3.1.0(Mar 14, 2017)
Added

Support for the languages Lingala, Sanskrit, Tonga and Urdu

v3.0(Feb 5, 2017)
Added

CHANGELOG.md

CONTRIBUTING.md

Tokenizer interface

Changed

Folder structure (renamed etc to resources)

Renamed all language files (added .txt extension)

Improved performance

Updated to PHPUnit 6

Updated README.md

Removed

Autoloader script

Language model _langs.json

v2.1.1(Feb 2, 2017)
Fixed typos for Lithuanian language sample

Fixed wrong ISO 639-1 codes

v2.1(Jan 29, 2017)

Fixed a bug that could produce slightly worse results.
v2.0(Jan 9, 2017)

Changed the interface and added an IteratorAggregate interface to iterate over the results from LanguageResult easily.
v1.2(Jan 4, 2017)

Added several new languages. It can now recognize 106 languages.
v1.1(Dec 31, 2016)

Added new languages
v1.0(Dec 25, 2016)


Owner

Patrick Schur

Professional DevOps Engineer
