PHP class Encoding, featuring the popular Encoding::toUTF8() function (formerly known as forceUTF8()), which fixes strings with mixed encodings.

Overview

forceutf8

Description

If you apply the PHP function utf8_encode() to a string that is already UTF8, it returns a garbled UTF8 string.
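To see why, here is a minimal sketch of what utf8_encode() does: it assumes every byte is ISO-8859-1 and re-encodes it as UTF-8. The helper name latin1ToUtf8() is made up for this example (utf8_encode() itself is deprecated as of PHP 8.2, so the sketch re-implements it in plain PHP). Feeding it a string that is already UTF-8 turns the two bytes of "é" into two separate characters.

```php
<?php
// latin1ToUtf8() is a hypothetical helper that mirrors what utf8_encode() does:
// every byte is read as an ISO-8859-1 character and re-encoded as UTF-8.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= $bytes[$i];                 // ASCII byte: unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))       // UTF-8 lead byte
                  . chr(0x80 | ($b & 0x3F));    // UTF-8 continuation byte
        }
    }
    return $out;
}

$utf8 = "Caf\xC3\xA9";                // "Café", already valid UTF-8
$garbled = latin1ToUtf8($utf8);      // both bytes of "é" get encoded again
echo $garbled, "\n";                 // prints "CafÃ©"
echo bin2hex($garbled), "\n";        // prints "436166c383c2a9"
```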

This class addresses this issue and provides a handy static function called \ForceUTF8\Encoding::toUTF8().

You don't need to know what the encoding of your strings is. It can be Latin1 (ISO 8859-1), Windows-1252 or UTF8, or the string can have a mix of them. \ForceUTF8\Encoding::toUTF8() will convert everything to UTF8.

Sometimes you have to deal with services that are unreliable in terms of encoding, possibly mixing UTF8 and Latin1 in the same string.
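The core idea behind repairing such a mixed string can be sketched as follows, assuming that every byte which is not part of a well-formed UTF-8 sequence is a stray Latin-1 character. The function names are hypothetical and this is not the library's code; in particular, the library also has to deal with Windows-1252 characters such as the Euro sign (see the v1.4 release notes), which this sketch ignores.

```php
<?php
// Hypothetical helper: expected length of a UTF-8 sequence from its lead byte,
// or 0 if the byte cannot start a well-formed sequence.
function utf8SeqLength(int $lead): int
{
    if ($lead < 0x80) return 1;
    if ($lead >= 0xC2 && $lead <= 0xDF) return 2;
    if ($lead >= 0xE0 && $lead <= 0xEF) return 3;
    if ($lead >= 0xF0 && $lead <= 0xF4) return 4;
    return 0;
}

// Sketch (assumption: stray non-UTF-8 bytes are Latin-1): keep well-formed
// UTF-8 sequences, re-encode every other byte as if it were ISO-8859-1.
function mixedToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $n = utf8SeqLength(ord($s[$i]));
        $seq = $n > 0 ? substr($s, $i, $n) : '';
        // preg_match() with the /u modifier returns false for malformed UTF-8
        if ($n > 0 && strlen($seq) === $n && preg_match('//u', $seq) === 1) {
            $out .= $seq;                       // well-formed sequence: keep it
            $i += $n;
        } else {
            $b = ord($s[$i]);                   // stray byte: treat as Latin-1
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
            $i++;
        }
    }
    return $out;
}

// First half is UTF-8, second half is Latin-1 ("\xE9" is é in ISO-8859-1)
$mixed = "Fédération / F\xE9d\xE9ration";
echo mixedToUtf8($mixed), "\n";          // prints "Fédération / Fédération"
```

Because valid UTF-8 passes through untouched, running the sketch twice gives the same result as running it once.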

Update:

I've included another function, \ForceUTF8\Encoding::fixUTF8(), which fixes double- (or multiple-) encoded UTF8 strings that look garbled.
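A double-encoded string can often be repaired by peeling off one encoding layer at a time. The sketch below is a heuristic under stated assumptions, not the library's algorithm: if a valid UTF-8 string contains only code points up to U+00FF, and writing those code points out as single bytes gives another valid UTF-8 string, it assumes that inner string is closer to the original and repeats. The function names are hypothetical. Text that legitimately contains sequences such as "Ã©" would be "repaired" too.

```php
<?php
// preg_match() with the /u modifier returns false for malformed UTF-8
function isUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical inverse of utf8_encode(): valid UTF-8 in, Latin-1 bytes out.
// Returns null if any code point is above U+00FF.
function utf8ToLatin1(string $s): ?string
{
    $out = '';
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if (strlen($ch) === 1) {
            $out .= $ch;                                           // ASCII
        } elseif (strlen($ch) === 2 && ord($ch[0]) <= 0xC3) {
            // lead bytes C2/C3 encode U+0080..U+00FF
            $out .= chr(((ord($ch[0]) & 0x1F) << 6) | (ord($ch[1]) & 0x3F));
        } else {
            return null;                                           // not Latin-1
        }
    }
    return $out;
}

// Peel off encoding layers while the result is still different, valid UTF-8.
function undoRepeatedEncoding(string $s): string
{
    while (isUtf8($s)) {
        $inner = utf8ToLatin1($s);
        if ($inner === null || $inner === $s || !isUtf8($inner)) {
            break;                                                 // nothing left to peel
        }
        $s = $inner;                                               // strictly shorter, so the loop ends
    }
    return $s;
}

$double = "F\xC3\x83\xC2\xA9d\xC3\x83\xC2\xA9ration";   // "FÃ©dÃ©ration"
echo undoRepeatedEncoding($double), "\n";                // prints "Fédération"
```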

Usage:

use \ForceUTF8\Encoding;

$utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);

$latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);

also:

$utf8_string = Encoding::fixUTF8($garbled_utf8_string);

Examples:

use \ForceUTF8\Encoding;

echo Encoding::fixUTF8("Fédération Camerounaise de Football\n");
echo Encoding::fixUTF8("Fédération Camerounaise de Football\n");
echo Encoding::fixUTF8("Fédération Camerounaise de Football\n");
echo Encoding::fixUTF8("Fédération Camerounaise de Football\n");

will output:

Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football

Options:

By default, Encoding::fixUTF8 will use the Encoding::WITHOUT_ICONV flag, signalling that iconv should not be used to fix garbled UTF8 strings.

This class also provides options for iconv processing, such as Encoding::ICONV_TRANSLIT and Encoding::ICONV_IGNORE, which enable the corresponding flags when the iconv function is used. The behaviour of these flags is documented in the PHP iconv documentation.
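As an illustration of how such flags could map onto iconv(), here is a sketch with hypothetical constant values and function names (the real Encoding constants may have different values). It relies only on the documented iconv() behaviour: appending //TRANSLIT to the target encoding requests transliteration, and appending //IGNORE discards characters that cannot be represented.

```php
<?php
// Hypothetical flag values for illustration only; the real constants are
// defined on \ForceUTF8\Encoding and their values may differ.
const WITHOUT_ICONV  = 0;
const ICONV_TRANSLIT = 1;
const ICONV_IGNORE   = 2;

// Builds the target-encoding argument for iconv() from a flag, or returns
// null when iconv() should not be used at all.
function iconvTarget(int $option, string $charset = 'Windows-1252'): ?string
{
    switch ($option) {
        case ICONV_TRANSLIT:
            return $charset . '//TRANSLIT'; // approximate characters the charset lacks
        case ICONV_IGNORE:
            return $charset . '//IGNORE';   // silently drop characters the charset lacks
        default:
            return null;                    // WITHOUT_ICONV: use a non-iconv code path
    }
}

// Hypothetical wrapper: returns null when iconv is disabled, missing, or fails,
// so the caller can fall back to its own conversion.
function toSingleByte(string $utf8, int $option): ?string
{
    $target = iconvTarget($option);
    if ($target === null || !function_exists('iconv')) {
        return null;
    }
    $result = iconv('UTF-8', $target, $utf8);
    return $result === false ? null : $result;
}
```

Transliteration output can differ between iconv implementations and locales, so results like the ones shown in these examples may not be identical on every system.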

Examples:

use \ForceUTF8\Encoding;

$str = "Fédération Camerounaise—de—Football\n"; // Uses U+2014, which is not in ISO8859-1 but exists in Win1252
echo Encoding::fixUTF8($str); // Will break U+2014
echo Encoding::fixUTF8($str, Encoding::ICONV_IGNORE); // Will preserve U+2014
echo Encoding::fixUTF8($str, Encoding::ICONV_TRANSLIT); // Will preserve U+2014

will output:

Fédération Camerounaise?de?Football
Fédération Camerounaise—de—Football
Fédération Camerounaise—de—Football

while:

use \ForceUTF8\Encoding;

$str = "čęėįšųūž"; // Uses several characters not present in ISO8859-1 / Win1252
echo Encoding::fixUTF8($str); // Will break invalid characters
echo Encoding::fixUTF8($str, Encoding::ICONV_IGNORE); // Will remove invalid characters, keep those present in Win1252
echo Encoding::fixUTF8($str, Encoding::ICONV_TRANSLIT); // Will transliterate invalid characters, keep those present in Win1252

will output:

????????
šž
ceeišuuž
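The outputs above follow from the two character sets involved: Windows-1252 assigns printable characters such as €, Œ, the em dash, š and ž to bytes 0x80-0x9F, where ISO-8859-1 has only control codes, while č, ę, ė, į, ų and ū have no byte in either set. The sketch below models the ICONV_IGNORE behaviour with a small hand-written subset of the Windows-1252 table. The table is deliberately partial, the function names are hypothetical, and transliteration (ICONV_TRANSLIT) is not modelled.

```php
<?php
// Partial, hand-written subset of Windows-1252 bytes 0x80-0x9F (assumption:
// only the characters used in this demo). In ISO-8859-1 these bytes are control codes.
const WIN1252_SUBSET = [
    "\u{20AC}" => "\x80", // euro sign
    "\u{0152}" => "\x8C", // latin capital ligature OE
    "\u{2014}" => "\x97", // em dash (U+2014)
    "\u{0161}" => "\x9A", // s with caron
    "\u{017E}" => "\x9E", // z with caron
];

// Code point of one UTF-8 character; 3- and 4-byte characters are all >= U+0800,
// so returning 0x800 for them is enough for the range checks below.
function codePoint(string $ch): int
{
    if (strlen($ch) === 1) {
        return ord($ch);
    }
    if (strlen($ch) === 2) {
        return ((ord($ch[0]) & 0x1F) << 6) | (ord($ch[1]) & 0x3F);
    }
    return 0x800;
}

// Valid UTF-8 in, Windows-1252 bytes out; characters missing from the partial
// table are dropped ($ignore = true) or replaced with '?'.
function utf8ToWin1252(string $s, bool $ignore): string
{
    $out = '';
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $cp = codePoint($ch);
        if ($cp < 0x80 || ($cp >= 0xA0 && $cp <= 0xFF)) {
            $out .= chr($cp);                   // same byte in ISO-8859-1 and Windows-1252
        } elseif (array_key_exists($ch, WIN1252_SUBSET)) {
            $out .= WIN1252_SUBSET[$ch];        // Windows-1252-only character
        } else {
            $out .= $ignore ? '' : '?';         // not in this partial table
        }
    }
    return $out;
}

// Windows-1252 bytes back to UTF-8, using the same partial table.
function win1252ToUtf8(string $bytes): string
{
    $back = array_flip(WIN1252_SUBSET);
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $b = ord($byte);
        if ($b < 0x80) {
            $out .= $byte;
        } elseif (isset($back[$byte])) {
            $out .= $back[$byte];
        } elseif ($b >= 0xA0) {
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        } else {
            $out .= '?';                        // byte not in the partial table
        }
    }
    return $out;
}

echo win1252ToUtf8(utf8ToWin1252("čęėįšųūž", true)), "\n";   // prints "šž"
echo win1252ToUtf8(utf8ToWin1252("Camerounaise\u{2014}de\u{2014}Football", true)), "\n"; // U+2014 dashes survive
```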

Install via composer:

Edit your composer.json file to include the following:

{
    "require": {
        "neitanod/forceutf8": "~2.0"
    }
}

Tips:

You can tip me with Bitcoin if you want. :)

1Awfu4TZpy99H7Pyzt1mooxU1aP2mJVdHP

Comments
  • fixUTF8 Problem with certain characters

    Hi, I have found that fixUTF8 has issues when the input string contains ligature characters such as Œ: this is converted to a ? sign, even though the input string does not need any fixing.

    Input: Café Nöel Œuf Aoüt

    Output: Café Nöel ?uf Août

    opened by cruvalcaba 13
  • Create a release

    It would be really good if you could create a release so we can target a specific version. I feel uncomfortable targeting 'master'.

    Thanks in advance!

    opened by parse 12
  • Could a new v1.5 release get tagged?

    There's a lot of fantastic work happening and I love seeing so much activity. But having a newer tag release would help me feel more comfortable with my build stability.

    opened by ecaron 10
  • Unable to convert string

    I am unable to convert the string, no matter which function I try to use.

    <?php
    
    require_once 'vendor/forceutf8/src/ForceUTF8/Encoding.php';
    
    $string = "“Grinvich”";
    
    echo "string to convert: {$string}<br>";
    
    $string1 = \ForceUTF8\Encoding::toUTF8($string);
    
    echo "<hr>toUTF8: {$string1}";
    
    $string2 = \ForceUTF8\Encoding::fixUTF8($string);
    
    echo "<hr>fixUTF8: {$string2}";
    
    $string3 = \ForceUTF8\Encoding::UTF8FixWin1252Chars($string);
    
    echo "<hr>UTF8FixWin1252Chars: {$string3}";
    
    $string4 = \ForceUTF8\Encoding::toWin1252($string);
    
    echo "<hr>toWin1252: {$string4}";
    
    
    
    opened by Exadra37 6
  • UTF symbols converted to question marks

    I tried to use the UTF fixer with Lithuanian characters such as čęėįšųūž, and they are replaced with question marks :(

    Example: ForceUTF8\Encoding::fixUTF8('Kėbulo numeris') Result: K?ebulo numeris

    The same happens with the other letters mentioned.

    opened by kgiedrius 5
  • Method to fix utf8 from any array

    Thank you very much for your class; it helped solve many problems in the project I am part of. I've added a method for fixing arrays, which is very useful before json_encode calls that fail with "Malformed UTF-8 characters, possibly incorrectly encoded". I hope it adds a lot to this class, which has helped me a lot.

    opened by raulinoneto 5
  • Can't convert string to UTF8

    Hello, I have an XML file with mixed character encodings. I used this class to convert it to UTF-8, but afterwards some characters are still broken.

    Here is my XML: https://drive.google.com/file/d/0Bwc9cF3Q50LTUmMwUTVyT1pVY28/view?usp=sharing

    opened by tthlaszlo 5
  • Added a new parameter in fixUTF8 method

    I am running into problems with this new version that uses iconv.

    I added a new parameter in the fixUTF8 method that allows the user to choose whether to use iconv with //TRANSLIT, //IGNORE or use the old method.

    opened by byjg 5
  • Didn't work with some languages, for example Czech

    Example text:

    Dle prohlášení Novavax bude tato továrna schopna produkovat až 1 miliardu vakcín ročně již od října. V srpnu spolu s velvyslancem USA navštívíl Andrej Babiš tuto továrnu na výrobu vakcín. Otázkou zatím zůstává, proč se toho Andrej Babiš zúčastnil, pravděpodobně (spekulace) má jeho Agrofert nějakou spojitost buď s dodávkami nebo subdodávkami.

    ...::fixUTF8($text)

    The output contains question marks for some characters; for example, ž is returned as ?.

    opened by lianglee 4
  • Problem with Ù character

    Hi, first of all, thanks for this library, it's excellent.
    I have a problem with this database field:

    Come stai? (PIÙ RISPOSTE POSSIBILI)

    Using this library, I get this:

    Come stai? (PIÃ? RISPOSTE POSSIBILI)

    The database collation is latin1_swedish_ci, while the table uses utf8_general_ci.

    opened by giacomok 4
  • Some strings that failed to be fixed

    I have some strings with broken encoding:

    $testStr1 = <<<TEXT
    China. In 1953, Max̥s parents decided
    by ÌÕnew-ageÌÒ meditative
    Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil
    TEXT;
    
    echo Encoding::UTF8FixWin1252Chars($testStr1), "\n\n", Encoding::fixUTF8($testStr1), "\n\n";
    

    None of them were fixed:

    China. In 1953, Max̥s parents decided
    by ÌÕnew-ageÌÒ meditative
    Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil
    
    China. In 1953, Max?s parents decided
    by ÌÕnew-ageÌÒ meditative
    Peter Max (American, b.1937) ÌÕFour KennedysÌÒ, screenprint in colors, 1989, signed in white pencil
    

    Any idea?

    opened by Kostanos 4
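The array helper proposed in "Method to fix utf8 from any array" can be sketched as a recursive walk that repairs every string value before json_encode() sees it. This is not the contributor's code: fixArray() and fixString() are hypothetical names, and fixString() is a simplified stand-in for Encoding::toUTF8() that keeps valid UTF-8 and otherwise treats the whole string as Latin-1. Array keys are not touched.

```php
<?php
// Hypothetical stand-in for a real fixer: valid UTF-8 is returned unchanged,
// anything else is assumed to be Latin-1 and re-encoded byte by byte.
function fixString(string $s): string
{
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    $out = '';
    foreach (str_split($s) as $byte) {
        $b = ord($byte);
        $out .= $b < 0x80 ? $byte : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Applies fixString() to every string value at any depth (keys are left as-is).
function fixArray(array $data): array
{
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = fixString($value);
        }
    });
    return $data;
}

$row = ['name' => "caf\xE9", 'tags' => ["na\xEFve", 'ok']]; // Latin-1 bytes mixed in
var_dump(json_encode($row));              // bool(false): malformed UTF-8
echo json_encode(fixArray($row)), "\n";   // prints {"name":"caf\u00e9","tags":["na\u00efve","ok"]}
```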
Releases (v2.0.2)
  • v2.0.2 (Oct 16, 2018)

  • v2.0.1 (May 22, 2017)

  • v1.4 (Sep 24, 2014)

    This version has no dependencies.

    It has some issues with Win1252 special characters such as the Euro sign. If you want a version of ForceUTF8 that handles the Euro sign and the other Win1252 special characters, try the 2.0 version (which depends on iconv).

Owner
Sebastián Grignoli