PHP class for parsing user agent strings (HTTP_USER_AGENT).

Overview

See changelog file for a list of changes.

See Performance and scaling for information about the performance and scaling of UserAgentInfo.

UserAgentInfo uses other projects to get the data it needs. See Relation to other projects for a list of those projects.

UserAgentInfo

PHP class for parsing user agent strings (HTTP_USER_AGENT). Includes mobile, bot and banned-bot checks, browser types and versions, and more. Based on browscap (via phpbrowscap), Mobile_Detect and ua-parser. Created for high-traffic websites and fast batch processing.

This class doesn't use any php.ini settings, so it can be deployed without changing server configuration.

please note:

It's a new project and there are still some major things to do (see the Todo list and the issues list).

Why another user agent detection class?

This project was created because I couldn't find a single script that would give me all the information I needed from HTTP_USER_AGENT strings.

I was using browscap to identify bots, ban users and get information about users' browsers and operating systems for internal monitoring, plus some random scripts to detect and redirect mobile devices. On top of that, I had to add my own user agents to browscap detection because the project half died (it looks like it will be back on track soon, though).

What's the aim of this project?

  • To retrieve all important information from the user agent - so you won't have to use more than one script to parse your user agents for different purposes.

  • To work fast on enterprise-level, high-traffic websites - to achieve that, all the required information is retrieved in one go and cached, rather than (as in some other projects) retrieved on demand. The class also leverages PHP opcode caches (bytecode caches).

  • To provide a single, up-to-date source of user agent information - I'm going to update this project as long as I need it, so it should stay current for quite some time. When you use UserAgentInfo you don't need to update any source files or use the php.ini directive for browscap. All the sources are integrated into the class.

Updates

Right now my goal is to update the project's source parsers once a week. Updating more often is bad because every update resets the whole cache, so it's not something you want to do every day. The updates could perhaps be done just twice a month, but that may be too long a wait for new user agents to be identified.
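Why does a source update reset the whole cache? One simple way to get that behaviour is to fold a fingerprint of the source data into every cache key, so entries built from old data are never matched again. The sketch below illustrates that idea under this assumption. buildSourceVersion() and buildCacheKey() are hypothetical helpers, not UserAgentInfo's real code. Using file sizes as the fingerprint mirrors the filesize()-based browscap version check mentioned in the project's issues.

```php
<?php
// Hypothetical sketch (not UserAgentInfo's code): version-aware cache keys.

/**
 * Fingerprints the source data by file size, e.g. ['browscap' => 1234567].
 * Any size change yields a different fingerprint.
 */
function buildSourceVersion(array $sourceFileSizes): string
{
    ksort($sourceFileSizes); // make the fingerprint independent of input order
    $parts = [];
    foreach ($sourceFileSizes as $name => $size) {
        $parts[] = $name . '=' . $size;
    }
    return md5(implode(';', $parts));
}

/** Combines the data version and the user agent string into one cache key. */
function buildCacheKey(string $sourceVersion, string $userAgent): string
{
    return 'uai:' . $sourceVersion . ':' . md5($userAgent);
}

$userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)';
$oldKey = buildCacheKey(buildSourceVersion(['browscap' => 1000, 'uaparser' => 200]), $userAgent);
$newKey = buildCacheKey(buildSourceVersion(['browscap' => 1024, 'uaparser' => 200]), $userAgent);

// After an update the key changes, so every previously cached entry is missed.
var_dump($oldKey === $newKey); // bool(false)
```

The trade-off is the one described above: correctness after an update is automatic, but the whole cache has to be refilled.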

Installing

  1. Download.
  2. Choose a cache backend: either use one of the existing cache classes (see the cache/ directory) or write your own class that implements UaiCacheInterface and put it in the cache/ directory.
  3. Change the UserAgentInfoConfig::CACHE_* settings to reflect your cache choice.
  4. Include the main class: require_once '/your_directory_structure/UserAgentInfo/UserAgentInfoPeer.class.php';
  5. Keep the classes up to date.
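Step 2 asks for a cache class. The real UaiCacheInterface is defined in the project's cache/ directory, and its method names may differ. For that reason the sketch below uses a hypothetical ExampleUaCache interface to show only the general shape: a get() that reports a miss and a set() that stores a serialized value. ArrayUaCache lives only as long as one PHP process, so it suits tests or a single bulk cron run rather than production traffic.

```php
<?php
// Hypothetical interface: the real UaiCacheInterface (see cache/) may differ.
interface ExampleUaCache
{
    /** Returns the cached value, or null on a cache miss. */
    public function get(string $key);

    /** Stores a value under the given key. */
    public function set(string $key, $value): void;
}

/** In-process cache; entries disappear when the PHP process ends. */
final class ArrayUaCache implements ExampleUaCache
{
    /** @var array<string, string> serialized values keyed by cache key */
    private $items = [];

    public function get(string $key)
    {
        if (!array_key_exists($key, $this->items)) {
            return null; // cache miss
        }
        return unserialize($this->items[$key]);
    }

    public function set(string $key, $value): void
    {
        // Keep a serialized copy, as an external store would, not a live object.
        $this->items[$key] = serialize($value);
    }
}

$cache = new ArrayUaCache();
var_dump($cache->get('uai:example'));           // NULL
$cache->set('uai:example', ['is_bot' => false]);
var_dump($cache->get('uai:example')['is_bot']); // bool(false)
```

A shared backend such as Memcached or APC would follow the same get/set pattern with the cache living outside the PHP process.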

Optionally: Remove get_browser() support from your server and turn off updates for browscap.ini files. You won't have to use browscap via get_browser() any more.

Note: For best performance, use a fast caching system and opcache (bytecode cache).

Usage

UserAgentInfo is a class containing information about a single user agent string. It should provide a means of identifying any data present in the user agent string that can be used for practical purposes. Retrieve it by writing:

UserAgentInfoPeer::getMy() - to get current user info

UserAgentInfoPeer::getOther($arbitrary_user_agent) - to get info about any user agent

You can call those methods as many times as you want in your code; there is no need to cache the retrieved object, because it's already cached by the main class.
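The per-request part of that caching can be pictured as simple memoization keyed by the user agent string. This is only an illustrative sketch. MemoizingPeer, its parse() stand-in and the $parseCount counter are hypothetical, and they say nothing about how UserAgentInfoPeer is actually built.

```php
<?php
// Hypothetical sketch of per-request memoization; not UserAgentInfoPeer's code.
final class MemoizingPeer
{
    /** @var array<string, object> parsed results keyed by user agent string */
    private static $memo = [];

    /** @var int how many times the (expensive) parser actually ran */
    public static $parseCount = 0;

    public static function getOther(string $userAgent): object
    {
        if (!isset(self::$memo[$userAgent])) {
            self::$parseCount++;
            self::$memo[$userAgent] = self::parse($userAgent);
        }
        return self::$memo[$userAgent]; // same object on every later call
    }

    /** Stand-in for the real, costly parsing and cache lookup. */
    private static function parse(string $userAgent): object
    {
        return (object) ['source' => $userAgent];
    }
}

$a = MemoizingPeer::getOther('Mozilla/5.0');
$b = MemoizingPeer::getOther('Mozilla/5.0');
var_dump($a === $b, MemoizingPeer::$parseCount); // prints bool(true) and int(1)
```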

The most important values you can get are:

  • ->getUserAgentString() - returns the source user agent string. Using it is NOT the same as using HTTP_USER_AGENT, because Mobile_Detect can take this value from other HTTP header fields.

  • ->isMobile(), ->isMobileTablet(), ->isMobileAndroid(), ->isMobileAppleIos() - allow full identification of the user's device for mobile redirects:

$ua = UserAgentInfoPeer::getMy(); // info about the current request's user agent
if ($ua->isMobileAndroid() && !$ua->isMobileTablet()) echo 'Android Phone';
elseif ($ua->isMobileAndroid() && $ua->isMobileTablet()) echo 'Android Tablet';
elseif ($ua->isMobileAppleIos() && !$ua->isMobileTablet()) echo 'iPhone';
elseif ($ua->isMobileAppleIos() && $ua->isMobileTablet()) echo 'iPad';
elseif ($ua->isMobile()) echo 'Meh, some other mobile device'; // mobile, but neither Android nor iOS

  • ->isBanned() - it's a bot you may want to look at very closely and probably ban; it may be an e-mail scraper, a malicious bot with a badly set user agent string, etc.

note:

->isBanned() used to be part of the browscap project, but it was removed. Right now the only source of ban information is a list of user agents I've added by hand. It's not a very long list, but I'm working on adding the original list from browscap to the project too.

important note:

When adding a bot to the ->isBanned() list we always verify what the bot is and use our best judgement on whether it should be universally banned. Having said that, the decisions are still arbitrary, so there may be situations in which you wouldn't agree that a certain bot should be banned. Keep that in mind when using ->isBanned(), and if you think a mistake was made, feel free to report it as an issue.

  • ->isBot() - a very useful check, both to save in your logs and to serve slightly different content, for example to disable dynamic image loading for spiders. Be careful: never hide or show any user-readable content only to bots, or you'll get banned from Google!

  • ->isIEVersion(...) - separate old Internet Explorer versions from other browsers, for example to show 'you are using an outdated browser' notice.

  • ->renderInfoBrowser(), ->renderInfoDevice(), ->renderInfoOs() - get human-readable information about the user's browser, device or operating system.

  • ->renderInfoAll() - get all the above values in one string; very useful to include when you show information about a given user for internal purposes, for example when users report bugs via forms on your website.
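The mobile checks listed above can be folded into one pure function, which makes redirect rules easy to unit-test without loading UserAgentInfo. classifyDevice() and its labels are hypothetical. In real code you would pass in $ua->isMobile(), $ua->isMobileAndroid(), $ua->isMobileAppleIos() and $ua->isMobileTablet().

```php
<?php
// Hypothetical helper: maps the four mobile checks to a single device label.
function classifyDevice(bool $isMobile, bool $isAndroid, bool $isAppleIos, bool $isTablet): string
{
    if ($isAndroid) {
        return $isTablet ? 'Android Tablet' : 'Android Phone';
    }
    if ($isAppleIos) {
        return $isTablet ? 'iPad' : 'iPhone';
    }
    // Mobile, but neither Android nor iOS; otherwise not a mobile device at all.
    return $isMobile ? 'Other mobile device' : 'Not mobile';
}

echo classifyDevice(true, true, false, true), "\n"; // Android Tablet
```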

Performance and scaling

All performance tests are done in the ua-speed-tests project - look there for details.

Testing the speed

The test was run in two modes on a set of example user agent strings (2,506 unique entries).

Each user agent was checked using UserAgentInfoPeer::getOther($user_agent_string).

The test was performed on an Ubuntu virtual machine running on a high-end host machine.

Bulk parsing

You check all the user agents in one script (usually a cron script). It performs as follows:

With an empty cache, the average retrieval time is 6.0 ms per entry, with 99% of requests done within 10.9 ms.

With a filled cache, the average retrieval time is 0.2 ms per entry, with 99% of requests done within 1.0 ms.
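Each result above is an average plus the time within which 99% of requests finished (the 99th percentile). The actual measurement code lives in ua-speed-tests. The sketch below shows one common way to produce such numbers, assuming each call is timed with microtime(true) and the nearest-rank percentile is used. averageMs(), percentileMs() and timeCallMs() are hypothetical helpers.

```php
<?php
// Hypothetical sketch, not the ua-speed-tests code.

/** Arithmetic mean of the samples (expects at least one sample). */
function averageMs(array $samples): float
{
    return array_sum($samples) / count($samples);
}

/** Nearest-rank percentile: smallest sample with at least $p% of samples <= it. */
function percentileMs(array $samples, float $p): float
{
    sort($samples);
    $rank = (int) ceil($p * count($samples) / 100);
    return (float) $samples[max($rank, 1) - 1];
}

/** Wall-clock duration of one call in milliseconds. */
function timeCallMs(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return (microtime(true) - $start) * 1000;
}

$samples = [];
foreach (['UA one', 'UA two', 'UA three'] as $userAgent) {
    // A real benchmark would call UserAgentInfoPeer::getOther($userAgent) here.
    $samples[] = timeCallMs(function () use ($userAgent) {
        return strtolower($userAgent);
    });
}
printf("avg %.3f ms, p99 %.3f ms\n", averageMs($samples), percentileMs($samples, 99));
```

Reporting a high percentile next to the average shows how bad the slow tail is, which an average alone hides.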

One per script

This is a typical case of checking user agent information on a website (usually on an Apache or nginx server). It performs as follows:

With an empty cache, the average retrieval time is 14.9 ms per entry, with 99% of requests done within 20.4 ms.

With a filled cache, the average retrieval time is 2.5 ms per entry, with 99% of requests done within 3.7 ms.

So it's slower than bulk parsing. However, you can get down to almost the same times if you use opcache (bytecode cache) on your server. It's something you should have installed on your production server anyway, as it speeds up PHP as a whole considerably.

Results with opcache on are:

With an empty cache, the average retrieval time is 7.3 ms per entry, with 99% of requests done within 13.0 ms.

With a filled cache, the average retrieval time is 0.4 ms per entry, with 99% of requests done within 1.2 ms.

Conclusion

As you can see, while retrieving information from the source parsers is quite fast, it's still much slower than getting cached data. So the logical way to go is to use the cache. If you also have opcache turned on (or are using bulk mode), you will be limited only by the speed and performance of your cache, with almost no overhead.

If you take that approach, you will be able to use all the information available from user agent strings at will, without worrying about performance problems. Even large bulk analyses of user agents won't be an issue (for example, you can perform cron checks on IP+browser pairs to look for bots).

Of course, one question remains: what's the cache hit ratio when you use a cache for user agent string detection?

Testing the cache hit ratio

My UserAgentInfo installation ran for about a week without any changes or cache resets on a set of websites with more than 1.5 million user visits per month. During that time:

  • There was an average of 2,478 script calls per minute (each script call uses UserAgentInfo), which gives a total of 24,978,240 calls.
  • I've accumulated 20,282 UserAgentInfo cached objects.
  • The total size of those objects when saved in cache is around 12 MB (around 620 bytes per object).

Counting one cache miss per cached object, that means fewer than 0.09% of calls (under one in 1,000) did not use the cache: 20,282 / 24,978,240 ≈ 0.081%. That is a great result. Moreover, the most popular user agent strings were cached right away.
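The numbers above can be checked with a few lines of arithmetic. The assumptions are labelled in the code: the week is exactly 7 days, each cached object corresponds to one cache miss, and "12 MB" is read as 12 MiB.

```php
<?php
// Reproduces the cache hit ratio arithmetic from the figures above.
$callsPerMinute = 2478;
$minutesPerWeek = 60 * 24 * 7;              // assumption: exactly 7 days (10,080 minutes)
$totalCalls     = $callsPerMinute * $minutesPerWeek;
$cachedObjects  = 20282;                    // assumption: one cache miss per cached object
$missRatio      = $cachedObjects / $totalCalls;

$cacheBytes     = 12 * 1024 * 1024;         // assumption: "around 12 MB" read as 12 MiB
$bytesPerObject = $cacheBytes / $cachedObjects;

printf("calls: %d\n", $totalCalls);                  // calls: 24978240
printf("miss ratio: %.3f%%\n", $missRatio * 100);    // miss ratio: 0.081%
printf("bytes per object: %.0f\n", $bytesPerObject); // bytes per object: 620
```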

As you can imagine, the number of unique user agents does not grow proportionally to website traffic. The number of popular browsers is quite limited, so the larger your website gets, the lower the chance of seeing a new user agent. This means that the more users you serve, the more difference using UserAgentInfo makes.

Conclusion

If you only want to check whether a browser is mobile, or do some other single simple check based on the user agent string, and you know what you're doing, there is no need to use any advanced scripts.

However, UserAgentInfo delivers very good average performance (limited by the performance of your cache system) while reliably providing as much information about the user as possible.

Switching to UserAgentInfo gives you many interesting opportunities you might not have thought about before.

An example:

By using UserAgentInfoPeer::getMy()->isBot() to completely disable sessions for all bots, you can speed up your website and save a huge number of disk operations. That's because bots (in general) do not use cookies, so PHP will, by default, create a new session for each bot request made to your website. It's entirely possible that more than 90% of your current sessions come from bot calls and will never be used.
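Here is a minimal sketch of that idea. It assumes the bot check is injected as a callable so the logic can be tested without the library. startSessionUnlessBot() is a hypothetical helper name. In production the callable would simply return UserAgentInfoPeer::getMy()->isBot().

```php
<?php
/**
 * Starts a PHP session only for non-bot requests.
 *
 * @param callable $isBot returns true when the current request comes from a bot
 * @return bool whether a session is active after the call
 */
function startSessionUnlessBot(callable $isBot): bool
{
    if ($isBot()) {
        // Bots usually ignore cookies, so each hit would create a new session
        // that is never reused; skipping it saves session storage writes.
        return false;
    }
    if (session_status() === PHP_SESSION_NONE) {
        session_start();
    }
    return session_status() === PHP_SESSION_ACTIVE;
}

// Production usage (requires UserAgentInfo to be loaded):
// startSessionUnlessBot(function () {
//     return UserAgentInfoPeer::getMy()->isBot();
// });
```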

Relation to other projects

UserAgentInfo relies on multiple other projects to get its user agent information. Thanks to that, its detection is better than that of projects relying on their own parser or on only a single external parser.

The used projects are:

  • browscap - http://tempdownloads.browserscap.com/ - browscap contains a huge database of incredibly detailed information about specific user agents, but it sucks with newer user agents and sucks even more at mobile detection.

  • phpbrowscap (bc) - https://github.com/GaretJax/phpbrowscap - phpbrowscap is used to deliver the results from browscap source files. (The phpbrowscap classes are not included; the parsing code is copied into the BrowscapWrapper class.)

  • Mobile_Detect (md) - https://github.com/serbanghita/Mobile-Detect - it detects mobile device types with very high precision.

  • ua-parser (uap) with data from BrowserScope - https://github.com/tobie/ua-parser - provides good generic information about all types of browsers, so it's an excellent addition for finding information about things browscap does not detect.

  • Some information is generated directly in UserAgentInfo. Currently those are two things:

    • Additional user agents identified in browscap format (see BrowscapWrapper class).
    • Browser and operating system architecture information (see self::parseArchitecture()).
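To give a rough idea of what architecture detection involves, here is a simplified sketch based on a few well-known user agent tokens (WOW64, Win64, x86_64, i686 and so on). It is an illustration resting on assumptions, not self::parseArchitecture(), whose rules and output values are not described here. detectArchitecture() and its 'x64'/'x86'/'unknown' labels are hypothetical.

```php
<?php
// Hypothetical, simplified sketch; not UserAgentInfo's self::parseArchitecture().

/**
 * Guesses OS and browser architecture from common user agent tokens.
 * Returns ['os' => ..., 'browser' => ...] with 'x64', 'x86' or 'unknown'.
 */
function detectArchitecture(string $userAgent): array
{
    // 32-bit browser on 64-bit Windows.
    if (stripos($userAgent, 'WOW64') !== false) {
        return ['os' => 'x64', 'browser' => 'x86'];
    }
    // 32-bit Firefox on 64-bit Linux reports "i686 on x86_64".
    if (stripos($userAgent, 'i686 on x86_64') !== false) {
        return ['os' => 'x64', 'browser' => 'x86'];
    }
    // 64-bit browser on a 64-bit OS, e.g. "Win64; x64" or "Linux x86_64".
    if (preg_match('/Win64|x86_64|amd64/i', $userAgent)) {
        return ['os' => 'x64', 'browser' => 'x64'];
    }
    // 32-bit x86 OS and browser (i386 to i686).
    if (preg_match('/i[3-6]86/i', $userAgent)) {
        return ['os' => 'x86', 'browser' => 'x86'];
    }
    return ['os' => 'unknown', 'browser' => 'unknown'];
}

print_r(detectArchitecture('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0'));
```

The order of the checks matters: the mixed 32-on-64 tokens must be tested before the plain 64-bit ones, or a string like "i686 on x86_64" would be reported as fully 64-bit.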

Todo list

  • update source parsers once a week
  • should I standardize OS name and move Windows version to ->version?
  • should device family be changed to device manufacturer and version to name (same as in full browscap)?
  • request to add version number to uaparser json file
  • see which PHP version is required to run the script. PHP 5.0 would be best; there is no need to push for 5.3. However, right now it may not be compatible with older PHP versions, as it was created on PHP 5.4.
  • Internet Explorer vs. Chrome Frame
  • Add batch retrieval from cache (batch save could also be implemented, but that seems kinda weird... although... O.o)

Comments
  • Can composer be used?

    Hey thanks for the email about this library.

    Currently I need a portable solution. That means not needing to use php.ini to point to browscap. Would it be possible to change browscap to something other than an ini file, or parse it independently of php.ini? This could allow other languages to get involved. Furthermore, are you going to add this to Composer?

    discussion to be tested 
    opened by CMCDragonkai 11
  • How to update

    There are notes in the readme mentioning weekly updates. It looks like this has not been updated in some time. Is there still activity on this?

    Thank you.

    user agents update 
    opened by kevinsteger 4
  • Project being maintained?

    I have been using this library at v1.3 for some time and I see it has not been updated in a long time. Is this project more or less dead? If so are there any forks still being updated or other projects being maintained that offer similar functionality?

    One major sticking point for me is this library does not support composer which it badly needs to do moving forward (see #4).

    opened by Austinb 2
  • Add a website to check user agent using UserAgentInfo.

    Several times now I needed to check one user agent using UserAgentInfo, and each time I had to modify my application to do that. Of course it's extremely simple, but still - it would be nicer if I had a website like user-agent-string.info that shows the UA results.

    will be added 
    opened by quentin389 2
  • Restore is_banned property for browscap source parser

    Browscap removed the ->is_banned property, which I don't agree with, and I'm actually using this property in my work. It's active for a few browsers I've added, but all the data researched by Gary when he had this property in his browscap project is gone from the files. I've obtained the latest browscap file version where this information is present, and I'm going to extract the information from there and put it into a nice array to be added to the rest of the browscap data when parsing. I've raised an issue with the browscap team asking them to restore the property.

    The list has to be verified before using.

    user agents update 
    opened by quentin389 2
  • Leverage opcache for uaparser.

    Test if uaparser leverages opcache when loading the source JSON file. I'm not sure how opcaches work - will they cache such a file? If not, then moving the source file to PHP will be a huuuuge performance boost for the non-bulk + opcache situation, which is the most common scenario on high-traffic production servers.

    If such switch should be made - create a commit to uaparser project.

    will be added performance 
    opened by quentin389 2
  • Migrate browscap from php.ini solution to 100% PHP solution

    This will accept a file from modified https://github.com/GaretJax/phpbrowscap and use its own parser to parse data in PHP (parser based on Garet Jax solution).

    will be added performance 
    opened by quentin389 2
  • Cache constantly regenerates because of different browscap versions...

    The caching mechanism has a built-in version check for all the source parser files. Browscap version is checked using filesize(ini_get('browscap')), as it is a good enough approximation of the file contents.

    The problem is that when you're running a multi server configuration and each server uses its own browscap file, not the one supplied with this class (available in imports/browcap.ini), the files may differ on each server. If that happens and you have a cache that is common for all the servers then the cached UserAgentInfo objects will constantly be invalidated and regenerated, slowing down the servers, as parsing ua strings is a costly operation.

    This is not a bug, the class actually behaves correctly, because the parsing results using different browscap version will differ. However, it is an issue, because it's not obvious that this can happen. The proper way to fix it, without changing anything in UserAgentInfo project is to force your admins to keep the browscap files updated on all the servers, or to point the servers to UserAgentInfo internal browscap.ini file.

    However, since that requires changes on the servers, it may not always be an option, it may take long time to change, and so on and so forth.

    This will stop being an issue when I stop relying on browscap php.ini configuration and the PHP module for browscap. Using this module is not a good idea for other reasons (such as: required server configuration and parsing speed), so this issue is another important reason to start using browscap fully in PHP, as it is shown in https://github.com/GaretJax/phpbrowscap/

    duplicate performance 
    opened by quentin389 2
  • Memcached/Memcache/APC support

    In regards to the https://github.com/quentin389/UserAgentInfo/issues/12 issue you've posted I've added Memcached/Memcache/APC caching mechanisms to stand behind UserAgentInfo.

    opened by ignasbernotas 1
  • Important user agent is not fully overridden.

    User agent Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt) is only partially overridden from the local array and shows isBanned and Crawler as true, but still identifies as Internet Explorer instead of E-mail address crawler. That's because the override array is treated as part of browscap. However, I should create an array in the main class which overrides EVERYTHING. That won't always be useful, but it will allow for correction of specific cases like the one above.

    bug user agents update 
    opened by quentin389 1
  • get_browser comparison at README

    The get_browser() function is mentioned, but there is no information on why to use a class instead of the built-in function. It would be useful to inform users about performance and about strategies, e.g. fast detection and logging with get_browser(), and later using UserAgentInfo to analyze the logs or build a detailed database.

    opened by ppKrauss 0
Owner
Mikolaj Misiurewicz