A fast PHP slug generator and transliteration library that converts non-ascii characters for use in URLs.

Aband*nthecar

Last update: Dec 20, 2022

Related tags

Strings php unicode ascii seo urlify slugs transliteration blogging urls pretty-urls blogs slugify slug pretty-url

Overview

URLify for PHP

A fast PHP slug generator and transliteration library, started as a PHP port of URLify.js from the Django project.

Handles symbols from Latin languages, Arabic, Azerbaijani, Bulgarian, Burmese, Croatian, Czech, Danish, Esperanto, Estonian, Finnish, French, Switzerland (French), Austrian (French), Georgian, German, Switzerland (German), Austrian (German), Greek, Hindi, Kazakh, Latvian, Lithuanian, Norwegian, Persian, Polish, Romanian, Russian, Swedish, Serbian, Slovak, Turkish, Ukrainian and Vietnamese, and many others via ASCII::to_transliterate().

Symbols that it cannot transliterate can be omitted or replaced with a specified character.

Installation

Install the latest version with:

$ composer require jbroadway/urlify

Usage

First, include Composer's autoloader:

require_once 'vendor/autoload.php';

To generate slugs for URLs:

<?php

echo URLify::slug (' J\'étudie le français ');
// "jetudie-le-francais"

echo URLify::slug ('Lo siento, no hablo español.');
// "lo-siento-no-hablo-espanol"

To generate slugs for file names:

<?php

echo URLify::filter ('фото.jpg', 60, "", true);
// "foto.jpg"

To simply transliterate characters:

<?php

echo URLify::downcode ('J\'étudie le français');
// "J'etudie le francais"

echo URLify::downcode ('Lo siento, no hablo español.');
// "Lo siento, no hablo espanol."

/* Or use transliterate() alias: */

echo URLify::transliterate ('Lo siento, no hablo español.');
// "Lo siento, no hablo espanol."

To extend the character list:

<?php

URLify::add_chars ([
	'¿' => '?', '®' => '(r)', '¼' => '1/4',
	'½' => '1/2', '¾' => '3/4', '¶' => 'P'
]);

echo URLify::downcode ('¿ ® ¼ ½ ¾ ¶');
// "? (r) 1/4 1/2 3/4 P"

To extend the list of words to remove:

<?php

URLify::remove_words (['remove', 'these', 'too']);

To prioritize a certain language map:

<?php

echo URLify::filter ('Ägypten und Österreich besitzen wie üblich ein Übermaß an ähnlich öligen Attachés', 60, 'de');
// "aegypten-und-oesterreich-besitzen-wie-ueblich-ein-uebermass-aehnlich-oeligen-attaches"

echo URLify::filter ('Cağaloğlu, çalıştığı, müjde, lazım, mahkûm', 60, 'tr');
// "cagaloglu-calistigi-mujde-lazim-mahkum"

Please note that the "ü" is transliterated to "ue" in the first case, whereas it results in a simple "u" in the latter.
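
The effect of prioritizing one language map can be illustrated with a small standalone sketch. This is not URLify's own implementation: the two tiny maps and the helper name transliterate_with_priority() are assumptions made up for this example, and the real library ships far larger tables.

```php
<?php
// Illustrative maps only (assumption): a handful of German and Turkish characters.
$maps = [
    'de' => ['Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'],
    'tr' => ['ş' => 's', 'ı' => 'i', 'ç' => 'c', 'ü' => 'u', 'ö' => 'o', 'ğ' => 'g'],
];

/**
 * Hypothetical helper: transliterate $text, letting the map for
 * $language win whenever two maps define the same character.
 */
function transliterate_with_priority(string $text, array $maps, string $language): string
{
    // Start from the prioritized map, if there is one.
    $merged = isset($maps[$language]) ? $maps[$language] : [];

    // The array union operator (+) keeps the left-hand value when a key
    // exists on both sides, so earlier entries are never overwritten.
    foreach ($maps as $map) {
        $merged = $merged + $map;
    }

    // strtr() with an array replaces each key with its value in one pass.
    return strtr($text, $merged);
}

echo transliterate_with_priority('müjde', $maps, 'tr'), "\n";  // "mujde"
echo transliterate_with_priority('üblich', $maps, 'de'), "\n"; // "ueblich"
```

The same input character yields different output depending only on which map is merged first, which is the behaviour the third argument of URLify::filter() selects in the examples above.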

Comments

Not compatible with Laravel 9

Laravel 9 requires voku/portable-ascii:^2.0, while this repo requires voku/portable-ascii:^1.4, which causes a conflict when trying to run composer update.

opened by emedchill 7
Use classmap instead.

The verdict is out: our cool feature was just too cool =)

They suggest we use classmap instead.

Included the URLify.php as a classmap for autoloader instead of psr-0.

opened by nickl- 6
Fix URLify::init() when called with some language

If $language is not a key in the $maps array, $chars is not reset, and the regular expression grows longer every time init() is called.

opened by mlocati 4
PSR-0 compliance and other goodies
I was rather sceptical at first, not wanting to overcomplicate this simple class with namespaces and 30-levels-deep library/src/package folders, but I bit my lip and tried:

"autoload": { "psr-0": { "": "" } }

and it worked =) so "" is PSR-0 capable and we have all the nyummy goodness of autoloading.

Added an INSTALL file to explain installation.
Added bootstrap.php to strap the vendor/autoloader.
Added phpunit.xml.
Removed the require_once from the test.

In the test folder, just run phpunit with no arguments. If bootstrap can't strap, it will display the INSTALL file; otherwise the tests will run as if no one cares.

Enjoy!
opened by nickl- 4
Fix Issue #55

This change fixes a language exception that occurs when a character that is used as a regular expression delimiter ("/" by default) is included as a key in the array argument passed into the add_chars() method, and then downcode() is called.

Note that the existing tests do not cover this scenario.

opened by cbj4074 2
Missing A char
Hi, I found a strange bug, look at the below code (local ENV: php 5.6 on mac os, dev-prod ENV: php 5.6 on ubuntu 16):

var_dump(\URLify::filter('Text sample A')); // text-sample

var_dump(\URLify::filter('Text sample B')); // text-sample-b

var_dump(\URLify::filter('Text sample AA')); // text-sample-aa

Where is, in the first var_dump, the last "a" char?

Is this package still maintained?
opened by marlenesco 2
Make usage of remove_list optional

Hi,

currently, the removal of words can only be influenced by setting the public static $remove_list property (as seen in #35).

When using URLify in multiple places in a project, this has to be done multiple times, which seems error-prone. Also, using the remove_list feature in some calls to URLify::filter while disabling it in other calls isn't currently possible.

This pull request adds an additional parameter to the URLify::filter() method to toggle the usage of the remove list feature.
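
A minimal sketch of the idea, assuming a simplified ASCII-only filter: the function name sketch_slug(), its parameters, and the short word list are hypothetical and are not the library's code. It also shows why a trailing standalone "A" disappears when a remove list containing "a" is active.

```php
<?php
/**
 * Hypothetical sketch of a slug filter with an optional stop-word list.
 * $removeList and $useRemoveList are illustrative names (assumptions).
 */
function sketch_slug(string $text, array $removeList, bool $useRemoveList = true): string
{
    $text = strtolower($text);

    if ($useRemoveList && $removeList !== []) {
        // Quote each word so it is matched literally, then strip whole words only.
        $words = array_map(
            function ($word) {
                return preg_quote($word, '/');
            },
            $removeList
        );
        $text = preg_replace('/\b(?:' . implode('|', $words) . ')\b/', '', $text);
    }

    // Collapse every run of non-alphanumeric characters into one hyphen.
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);

    return trim($text, '-');
}

$removeList = ['a', 'an', 'the', 'of']; // illustrative list, not the library's

echo sketch_slug('Text sample A', $removeList), "\n";        // "text-sample"
echo sketch_slug('Text sample A', $removeList, false), "\n"; // "text-sample-a"
echo sketch_slug('Text sample AA', $removeList), "\n";       // "text-sample-aa"
```

Passing the toggle per call avoids mutating shared static state, so different call sites can opt in or out independently.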

Thx! :)

opened by mkraemer 2
Unable to urlify properly
Hi there,

I've been trying to urlify a very simple string, but the last part is being dropped. This is probably intended behaviour, but an option to avoid it would be useful.

My string is "Brazilian Série A" and I want it to become "brazilian-serie-a". It becomes "brazilian-serie" instead without the final "-a" part. Any way I can do this?

Below my code:

\URLify::filter('Brazilian Série A') // produces "brazilian-serie"

Tried also with:

\URLify::filter('Brazilian Série A', 120, 'en') // produces "brazilian-serie"
opened by fracasula 2
Added the possibility to prioritize urlify language maps

This is useful if languages have different rules for the same character, e.g. German: "ü" => "ue", Turkish: "ü" => "u".

opened by patrickheck 2
[+]: use "voku/portable-ascii"

reference: https://github.com/jbroadway/urlify/issues/51

Here I added "voku/portable-ascii" which is for example used in the "dev-master" version of laravel and it's also based on this project. :smile:

I know it's not that easy to pick a minimal PHP version to support, but most systems have already upgraded to > 7.0 (https://blog.packagist.com/php-versions-stats-2019-1-edition/), so maybe it's time to move forward?

| Linux Distro | Version | End of Life | Default PHP |
| -- | -- | -- | -- |
| Ubuntu | 14.04 (Trusty) | April 2019 (EOL) | 5.5.9 |
| Ubuntu | 16.04 (Xenial) | April 2024 | 7.0 |
| Ubuntu | 18.04 (Bionic) | April 2028 | 7.2 |
| Debian | 8 (Jessie) | June 30, 2020 | 5.6.29 |
| Debian | 9 (Stretch) | ~2022 | 7.0 |
| Fedora | 29 | October 30, 2018 | 7.2.10 |
| Fedora | 30 | April 30, 2019 | 7.3.4 |
| OpenSUSE | Leap 15.1 | November 22, 2020 | 7.2.5 |
| CentOS | 6 | November 30, 2020 | 5.3.3 |
| CentOS | 7 | June 30, 2024 | 5.4.16 |
| RHEL | 6 | November 30, 2020 | 5.3.3 |
| RHEL | 7 | June 30, 2024 | 5.4.16 |
| RHEL | 8 | May 2029 | 7.2 |
| OEL | 6 | March 2021 | 7.0 (min) |
| OEL | 7 | July 2024 | 7.0 (min) |

PS: in my fork I also added different "stop-words" (https://github.com/voku/stop-words/tree/master/src/voku/helper/stopwords) for different languages and some other specials like support for currencies (https://github.com/voku/urlify/blob/master/src/voku/helper/URLify.php#L484). I don't know if this is also interesting for you?

opened by voku 1
Passing certain characters to add_chars() method causes "preg_match_all(): Unknown modifier ']'"
Consider the following:

URLify::add_chars(['/' => '']);

This causes a language exception, preg_match_all(): Unknown modifier ']', because the / character is used as the regular expression delimiter within the URLify library.

The above example derives from a fairly common and reasonable use-case: I want to remove all illegal characters from a file name, and on UNIX and Windows, / is illegal.

To fix this, PHP's preg_quote() function must be called on the keys in the array argument passed to add_chars().
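
A minimal sketch of that fix, assuming the pattern is assembled from the map keys: the helper name build_char_pattern() is hypothetical, and it uses an alternation group rather than the library's actual pattern. The point is only that passing the delimiter as the second argument to preg_quote() escapes "/" as well.

```php
<?php
/**
 * Hypothetical helper: build a regex that matches any key of $map.
 * Quoting each key with the delimiter keeps '/' from ending the pattern early.
 */
function build_char_pattern(array $map): string
{
    $quoted = array_map(
        function ($char) {
            // The second argument tells preg_quote() to escape the delimiter too.
            return preg_quote((string) $char, '/');
        },
        array_keys($map)
    );

    return '/(?:' . implode('|', $quoted) . ')/u';
}

$map = ['/' => '', '®' => '(r)', ']' => ''];
$pattern = build_char_pattern($map);

// Replace every matched character with its mapped value.
$result = preg_replace_callback(
    $pattern,
    function (array $match) use ($map) {
        return $map[$match[0]];
    },
    'a/b ® c]'
);

echo $result, "\n"; // "ab (r) c"
```

Without the quoting step, the unescaped "/" would terminate the pattern and PCRE would try to read the remaining characters as modifiers, which is the source of the "Unknown modifier" error.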

I'll submit a PR shortly that seeks to fix the issue.
opened by cbj4074 1
1.2.4 changed transliteration behaviour

Upgrading from 1.2.3 to 1.2.4 broke our test suite, in particular some characters are transliterated differently, breaking assertions and semver.

E.g. we test that това е текст на бълрагски за тест becomes tova-e-tekst-na-blragski-za-test which is true in 1.2.3 and false in 1.2.4.

In 1.2.4 it instead transliterates to tova-e-tekst-na-bielragski-za-test.

| urlify version | in | out |
|----------------|-----------|------------|
| 1.2.3 | бълрагски | blragski |
| 1.2.4 | бълрагски | bielragski |

I'm sure the dependency has its reasons for doing this, but composer pulled in 1.2.4 automatically and broke our test suites. This should have been a 1.3.0 or a 2.0.0 release.

opened by tomjn 2
Why is $underscoreToSpace removed ?

Hi,

Why is $underscoreToSpace removed from the filter? It was pretty handy for turning underscores into hyphens if you wanted, or spaces of course.

I hope there is a good reason for it!

Thanks

opened by Yamakasi 4
Add new param $trim_under_score

It will fix the issue when trimming a text:

From : -test- to test

With this option the "-" won't be removed.

By default it works as usual.

opened by jmontoyaa 0

Support more characters by default

Had to add the following chars for our transliteration test to pass:

        URLify::add_chars(
            array(
                'Ÿ' => 'Y',
                'µ' => 'u',
                '¥' => 'Y',
                'Ĉ' => 'C',
                'ĉ' => 'c',
                'Ċ' => 'C',
                'ċ' => 'c',
                'Ĝ' => 'G',
                'ĝ' => 'g',
                'Ġ' => 'G',
                'ġ' => 'g',
                'Ĥ' => 'H',
                'ĥ' => 'h',
                'Ħ' => 'H',
                'ħ' => 'h',
                'Ĕ' => 'E',
                'ĕ' => 'e',
                'Ĭ' => 'I',
                'ĭ' => 'i',
                'Ĵ' => 'J',
                'ĵ' => 'j',
                'Ĺ' => 'L',
                'ĺ' => 'l',
                'Ľ' => 'L',
                'ľ' => 'l',
                'Ŀ' => 'L',
                'ŀ' => 'l',
                'ŉ' => 'n',
                'Ō' => 'O',
                'ō' => 'o',
                'Ŏ' => 'O',
                'ŏ' => 'o',
                'Ŕ' => 'R',
                'ŕ' => 'r',
                'Ŗ' => 'R',
                'ŗ' => 'r',
                'Ŝ' => 'S',
                'ŝ' => 's',
                'Ŧ' => 'T',
                'ŧ' => 't',
                'Ŭ' => 'U',
                'ŭ' => 'u',
                'Ŵ' => 'W',
                'ŵ' => 'w',
                'Ŷ' => 'Y',
                'ŷ' => 'y',
                'ſ' => 'i',
                'ƒ' => 'f',
                'O' => 'O',
                'o' => 'o',
                'U' => 'U',
                'u' => 'u',
                'Ǎ' => 'A',
                'ǎ' => 'a',
                'Ǐ' => 'I',
                'ǐ' => 'i',
                'Ǒ' => 'O',
                'ǒ' => 'o',
                'Ǔ' => 'U',
                'ǔ' => 'u',
                'Ǖ' => 'U',
                'ǖ' => 'u',
                'Ǘ' => 'U',
                'ǘ' => 'u',
                'Ǚ' => 'U',
                'ǚ' => 'u',
                'Ǜ' => 'U',
                'ǜ' => 'u',
                'Ǻ' => 'A',
                'ǻ' => 'a',
                'Ǿ' => 'O',
                'ǿ' => 'o',
                'Ǽ' => 'Ae',
                'ǽ' => 'ae',
                'Ĳ' => 'IJ',
                'ĳ' => 'ij',
                'J' => 'J',
                'ĸ' => 'k',
                'Ŋ' => 'N',
                'ŋ' => 'n',
                'Ẁ' => 'W',
                'ẁ' => 'w',
                'Ẃ' => 'W',
                'ẃ' => 'w',
                'Ẅ' => 'W',
                'ẅ' => 'w',
            )
        );

Unfortunately, since I do not know what language they belong to, I find it difficult to provide a PR when the code is structured based on language.

opened by motin 1

Releases(1.2.4-stable)

1.2.4-stable(Jun 15, 2022)
Updated voku/portable-ascii dependency to ^2.0 to fix Laravel 9 compatibility - thanks @madman-81!

1.2.3-stable(Jan 18, 2022)
Migrated CI from travis-ci to GitHub Actions

Updated test fixtures

Updated composer description and dependency versions

Added badges to readme

1.2.2-stable(Jun 14, 2020)
Added URLify::slug() method with simplified options as a wrapper around URLify::filter().

Readme updates

1.2.1-stable(Jun 2, 2020)
Fixed tests broken from changes in voku/portable-ascii

Strip additional dev files from releases - thanks @Tobion!

Now requires PHP 7.2+ to match PHPUnit

Fixed missing autoloader include in command line scripts

Readme updates

1.2.0-stable(Dec 13, 2019)
Using voku/portable-ascii performance optimized ascii string function library

Stop word support for multiple languages (disabled by default)

Currency symbol support

Support for more unicode characters

Removed support for PHP versions before 7.0

Thanks to @voku for the improvements!
1.1.3-stable(Jun 27, 2019)
Fixed issue with / character being added via add_chars()

Fixed Vietnamese language code

Fixed potential duplicate word issue

Removed HHVM from testing and added newer PHP versions to automated tests

Thanks to @pincombe, @scorp13, and @cbj4074 for the fixes!
1.1.2-stable(Dec 8, 2018)

Added Slovak characters, moved PHPUnit to dev dependencies, updated README.
1.1.1-stable(Aug 28, 2018)

Corrected license identifier in composer.json, minor install instruction edits.
1.1.0-stable(Jan 3, 2017)

Kazakh support added, PHPDoc params and returns added.
1.0.9-stable(Sep 14, 2016)

New Persian character set, more Arabic characters added.
1.0.8-stable(Jul 27, 2016)
This release adds two new options to filter():

$lower_case specifies whether you want to convert to lower case (the default), or preserve the existing case of the text.

$treat_underscore_as_space specifies whether you want to convert underscores to spaces (the default), or preserve underscores in the output.

Thanks @ywarnier and @jmontoyaa for these additions!
1.0.7-stable(Dec 7, 2015)

Added optional $use_remove_list parameter, added missing œ character.
1.0.6-stable(Oct 15, 2015)

Bulgarian characters added, new CLI scripts (downcode, filter, transliterate), fix for UTF-8 spaces. Thanks to @skyosev, @shefi, @rinogo, and @karptonite for these!
1.0.5-stable(May 29, 2015)

Bulgarian language added.
1.0.4-stable(Mar 9, 2015)

License fixed to match Django project.
1.0.3-stable(Mar 17, 2014)

Added characters for Arabic, Serbian and Azerbaijani.
1.0.2-stable(Feb 5, 2014)

Added missing Romanian diacritics.
1.0.1-stable(Oct 16, 2013)

Added a missing character to the map.

Owner

Aband*nthecar

Full-stack developer. One-man synthpop band. CTO/Co-Founder @ HeyAlfa + Flipside XR.

Converts a string to a slug. Includes integrations for Symfony, Silex, Laravel, Zend Framework 2, Twig, Nette and Latte.

cocur/slugify Converts a string into a slug. Developed by Florian Eckerstorfer in Vienna, Europe with the help of many great contributors. Features Re

2.8k Dec 22, 2022

A PHP class which allows the decoding and encoding of a wider variety of characters compared to the standard htmlentities and html_entity_decode functions.

The ability to encode and decode a certain set of characters called 'Html Entities' has existed since PHP4. Amongst the vast number of functions built into PHP, there are 4 nearly identical functions that are used to encode and decode html entities; despite their similarities, however, 2 of them do provide additional capabilities not available to the others.

2 Nov 12, 2022

only 5 characters to rce

phpfuck-6characters @Y4tacker Description: only 6 characters to rce ( ) ^ 9 . ; Useage php 6character-rce.php system(\"whoami\"); (((((99999999999999

12 Oct 4, 2022

PHP library to parse urls from string input

Url highlight - PHP library to parse URLs from string input. Works with complex URLs, edge cases and encoded input. Features: Replace URLs in string b

77 Sep 16, 2022

php-crossplane - Reliable and fast NGINX configuration file parser and builder

php-crossplane Reliable and fast NGINX configuration file parser and builder ℹ️ This is a PHP port of the Nginx Python crossplane package which can be

19 Jun 30, 2022

Library for free use Google Translator. With attempts connecting on failure and array support.

GoogleTranslateForFree Packagist: https://packagist.org/packages/dejurin/php-google-translate-for-free Library for free use Google Translator. With at

122 Dec 23, 2022

Generate Heroku-like random names to use in your php applications.

HaikunatorPHP Generate Heroku-like random names to use in your PHP applications. Installation composer require atrox/haikunator Usage Haikunator is p

99 Jul 19, 2022

A PHP string manipulation library with multibyte support. Compatible with PHP 5.4+, PHP 7+, and HHVM.

A PHP string manipulation library with multibyte support. Compatible with PHP 5.4+, PHP 7+, and HHVM. s('string')->toTitleCase()->ensureRight('y') ==

2.5k Dec 28, 2022

PHP library to detect and manipulate indentation of strings and files

indentation PHP library to detect and manipulate the indentation of files and strings Installation composer require --dev colinodell/indentation Usage

34 Nov 28, 2022

The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.

DeviceDetector Code Status Description The Universal Device Detection library that parses User Agents and detects devices (desktop, tablet, mobile, tv

2.4k Jan 5, 2023

ColorJizz is a PHP library for manipulating and converting colors.

A PHP library for generating universally unique identifiers (UUIDs).

A PHP string manipulation library with multibyte support

🉑 Portable UTF-8 library - performance optimized (unicode) string functions for php.

:accept: Stringy - A PHP string manipulation library with multibyte support, performance optimized

A language detection library for PHP. Detects the language from a given text string.

Text - Simple 1 Class Text Manipulation Library

The Hoa\Ustring library.

PCRE wrapping library that offers type-safe preg_* replacements.