PHP package that provides functions for calculating mathematical statistics of numeric data

This package collects some useful statistical functions. It started when I was working with FIT files. A FIT file stores a lot of information about your sport activities, such as your heart rate, speed, cadence and power. I needed some statistical functions to better understand those numbers and my sport performance, so I collected functions such as mean, mode, median, range, quantiles, first quartile (or 25th percentile), third quartile (or 75th percentile), frequency tables (cumulative, relative), standard deviation (population and sample), variance (population and sample), and so on.

This package is inspired by the Python statistics module.

Installation

You can install the package via composer:

composer require hi-folks/statistics

Usage

Stat class

The Stat class provides static methods for calculating mathematical statistics of numeric data: averages and other typical values of a population or sample, measures of spread, and relations between two inputs:

  • mean(): arithmetic mean or "average" of data;
  • median(): median or "middle value" of data;
  • medianLow(): low median of data;
  • medianHigh(): high median of data;
  • mode(): single mode (most common value) of discrete or nominal data;
  • multimode(): list of modes (most common values) of discrete or nominal data;
  • quantiles(): cut points dividing the range of a probability distribution into continuous intervals with equal probabilities;
  • firstQuartile(): first quartile, the value below which 25 percent of the data falls;
  • thirdQuartile(): third quartile, the value below which 75 percent of the data falls;
  • pstdev(): population standard deviation;
  • stdev(): sample standard deviation;
  • pvariance(): variance for a population;
  • variance(): variance for a sample;
  • geometricMean(): geometric mean;
  • harmonicMean(): harmonic mean;
  • correlation(): Pearson’s correlation coefficient for two inputs;
  • covariance(): the sample covariance of two inputs;
  • linearRegression(): slope and intercept of a simple linear regression, estimated using ordinary least squares.

Stat::mean( array $data )

Return the sample arithmetic mean of the array $data. The arithmetic mean is the sum of the data divided by the number of data points. It is commonly called “the average”, although it is only one of many mathematical averages. It is a measure of the central location of the data.

use HiFolks\Statistics\Stat;
$mean = Stat::mean([1, 2, 3, 4, 4]);
// 2.8
$mean = Stat::mean([-1.0, 2.5, 3.25, 5.75]);
// 2.625
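
The definition above (sum of the data divided by the number of data points) can be sketched in plain PHP. The function name arithmeticMeanSketch is hypothetical and not part of the package; it only illustrates the formula.

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
function arithmeticMeanSketch(array $data): float
{
    if ($data === []) {
        throw new InvalidArgumentException('The data must not be empty.');
    }
    // Sum of the data divided by the number of data points.
    return array_sum($data) / count($data);
}

echo arithmeticMeanSketch([1, 2, 3, 4, 4]), PHP_EOL;         // 2.8
echo arithmeticMeanSketch([-1.0, 2.5, 3.25, 5.75]), PHP_EOL; // 2.625
```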

Stat::geometricMean( array $data, $round=null )

The geometric mean indicates the central tendency or typical value of the data using the product of the values (as opposed to the arithmetic mean which uses their sum).

use HiFolks\Statistics\Stat;
$mean = Stat::geometricMean([54, 24, 36], 1);
// 36.0
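
As a rough illustration of the definition, the geometric mean of n positive values is the n-th root of their product. The sketch below computes it through logarithms to avoid overflowing the product; the function name geometricMeanSketch is hypothetical, and this is not necessarily how the package computes it.

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
function geometricMeanSketch(array $data): float
{
    if ($data === []) {
        throw new InvalidArgumentException('The data must not be empty.');
    }
    $logSum = 0.0;
    foreach ($data as $value) {
        if ($value <= 0) {
            throw new InvalidArgumentException('All values must be positive.');
        }
        // log(a * b * c) = log(a) + log(b) + log(c)
        $logSum += log($value);
    }
    // exp(mean of logs) equals the n-th root of the product.
    return exp($logSum / count($data));
}

// 54 * 24 * 36 = 46656, and the cube root of 46656 is 36.
echo round(geometricMeanSketch([54, 24, 36]), 1), PHP_EOL;
```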

Stat::harmonicMean( array $data, $weights=null, $round=null )

The harmonic mean is the reciprocal of the arithmetic mean() of the reciprocals of the data. For example, the harmonic mean of three values a, b and c will be equivalent to 3/(1/a + 1/b + 1/c). If one of the values is zero, the result will be zero.

use HiFolks\Statistics\Stat;
$mean = Stat::harmonicMean([40, 60], null, 1);
// 48.0

You can also calculate a weighted harmonic mean. Suppose a car travels at 40 km/h for 5 km and, when traffic clears, speeds up to 60 km/h for the remaining 30 km of the journey. What is the average speed?

use HiFolks\Statistics\Stat;
Stat::harmonicMean([40, 60], [5, 30], 1);
// 56.0

where:

  • 40, 60: the elements;
  • 5, 30: the weights of the elements (the first weight applies to the first element, the second weight to the second element);
  • 1: the number of decimal places to round the result to.
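
To make the arithmetic behind the two results explicit, here is a minimal sketch of a weighted harmonic mean: the sum of the weights divided by the sum of weight/value. The function name harmonicMeanSketch is hypothetical and the code is an illustration of the formula, not the package's implementation.

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
function harmonicMeanSketch(array $data, ?array $weights = null): float
{
    if ($data === []) {
        throw new InvalidArgumentException('The data must not be empty.');
    }
    $data = array_values($data);
    // Without weights, every element counts once.
    $weights = array_values($weights ?? array_fill(0, count($data), 1));
    if (count($weights) !== count($data)) {
        throw new InvalidArgumentException('One weight per element is required.');
    }
    $weightSum = 0.0;
    $reciprocalSum = 0.0;
    foreach ($data as $i => $value) {
        if ($value == 0) {
            return 0.0; // a zero value makes the harmonic mean zero
        }
        $weightSum += $weights[$i];
        $reciprocalSum += $weights[$i] / $value;
    }
    return $weightSum / $reciprocalSum;
}

// Unweighted: 2 / (1/40 + 1/60) = 48
echo round(harmonicMeanSketch([40, 60]), 1), PHP_EOL;
// Weighted by distance: (5 + 30) / (5/40 + 30/60) = 35 / 0.625 = 56
echo round(harmonicMeanSketch([40, 60], [5, 30]), 1), PHP_EOL;
```

In the car example the weights are distances, so the result is total distance divided by total travel time, which is the average speed.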

Stat::median( array $data )

Return the median (middle value) of numeric data, using the common “mean of middle two” method.

use HiFolks\Statistics\Stat;
$median = Stat::median([1, 3, 5]);
// 3
$median = Stat::median([1, 3, 5, 7]);
// 4

Stat::medianLow( array $data )

Return the low median of numeric data. The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned.

use HiFolks\Statistics\Stat;
$median = Stat::medianLow([1, 3, 5]);
// 3
$median = Stat::medianLow([1, 3, 5, 7]);
// 3

Stat::medianHigh( array $data )

Return the high median of data. The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned.

use HiFolks\Statistics\Stat;
$median = Stat::medianHigh([1, 3, 5]);
// 3
$median = Stat::medianHigh([1, 3, 5, 7]);
// 5
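
The three median variants above differ only in how they treat an even number of data points. A minimal sketch, assuming a hypothetical helper mediansSketch that returns all three values at once (the package exposes them as separate methods):

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
function mediansSketch(array $data): array
{
    if ($data === []) {
        throw new InvalidArgumentException('The data must not be empty.');
    }
    sort($data);
    $n = count($data);
    $mid = intdiv($n, 2);
    if ($n % 2 === 1) {
        // Odd count: a single middle value serves all three variants.
        return ['low' => $data[$mid], 'median' => $data[$mid], 'high' => $data[$mid]];
    }
    // Even count: two middle values.
    $low = $data[$mid - 1];
    $high = $data[$mid];
    return ['low' => $low, 'median' => ($low + $high) / 2, 'high' => $high];
}

print_r(mediansSketch([1, 3, 5]));    // low 3, median 3, high 3
print_r(mediansSketch([1, 3, 5, 7])); // low 3, median 4, high 5
```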

Stat::quantiles( array $data, $n=4, $round=null )

Divide data into n continuous intervals with equal probability. Returns a list of n - 1 cut points separating the intervals. Set n to 4 for quartiles (the default). Set n to 10 for deciles. Set n to 100 for percentiles, which gives the 99 cut points that separate the data into 100 equal-sized groups.

use HiFolks\Statistics\Stat;
$quantiles = Stat::quantiles([98, 90, 70,18,92,92,55,83,45,95,88]);
// [ 55.0, 88.0, 92.0 ]
$quantiles = Stat::quantiles([105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110,100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129,106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86,111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95,103, 107, 101, 81, 109, 104], 10);
// [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0]
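
One way to compute such cut points is the "exclusive" method used by Python's statistics.quantiles, which interpolates linearly between the two nearest sorted data points. The sketch below assumes that method; it reproduces the quartiles of the first example, but the package may differ in details. The function name quantilesExclusiveSketch is hypothetical.

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
// Assumption: "exclusive" method, as in Python's statistics.quantiles.
function quantilesExclusiveSketch(array $data, int $n = 4): array
{
    if (count($data) < 2 || $n < 1) {
        throw new InvalidArgumentException('Need at least two data points and n >= 1.');
    }
    sort($data);
    $count = count($data);
    $m = $count + 1;
    $cuts = [];
    for ($i = 1; $i < $n; $i++) {
        // Position of the i-th cut point, clamped to valid neighbours.
        $j = max(1, min(intdiv($i * $m, $n), $count - 1));
        $delta = $i * $m - $j * $n;
        // Linear interpolation between the two neighbouring values.
        $cuts[] = ($data[$j - 1] * ($n - $delta) + $data[$j] * $delta) / $n;
    }
    return $cuts;
}

print_r(quantilesExclusiveSketch([98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88]));
// cut points: 55, 88, 92
```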

Stat::firstQuartile( array $data, $round=null )

The lower quartile, or first quartile (Q1), is the value under which 25% of data points are found when they are arranged in increasing order.

use HiFolks\Statistics\Stat;
$percentile = Stat::firstQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
// 55.0

Stat::thirdQuartile( array $data, $round=null )

The upper quartile, or third quartile (Q3), is the value under which 75% of data points are found when arranged in increasing order.

use HiFolks\Statistics\Stat;
$percentile = Stat::thirdQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
// 92.0

Stat::pstdev( array $data, $round=null )

Return the Population Standard Deviation, a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

use HiFolks\Statistics\Stat;
$stdev = Stat::pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]);
// 0.986893273527251
$stdev = Stat::pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75], 4);
// 0.9869

Stat::stdev( array $data, $round=null )

Return the Sample Standard Deviation, a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

use HiFolks\Statistics\Stat;
$stdev = Stat::stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]);
// 1.0810874155219827
$stdev = Stat::stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75], 4);
// 1.0811

Stat::variance( array $data )

Variance is a measure of dispersion of data points from the mean. Low variance indicates that data points are generally similar and do not vary widely from the mean. High variance indicates that data values have greater variability and are more widely dispersed from the mean.

To calculate the variance of a sample:

use HiFolks\Statistics\Stat;
$variance = Stat::variance([2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]);
// 1.3720238095238095

If you need the variance of the whole population and not just of a sample, use the pvariance() method:

use HiFolks\Statistics\Stat;
$variance = Stat::pvariance([0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]);
// 1.25
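
The population and sample variants of variance and standard deviation above differ only in the divisor: N for a population, N - 1 for a sample. The standard deviation is the square root of the corresponding variance. A minimal sketch with hypothetical function names (not the package's implementation):

```php
<?php
// Illustrative helpers, not part of hi-folks/statistics.

// Sum of squared deviations from the arithmetic mean.
function squaredDeviationsSketch(array $data): float
{
    $mean = array_sum($data) / count($data);
    $sum = 0.0;
    foreach ($data as $value) {
        $sum += ($value - $mean) ** 2;
    }
    return $sum;
}

// Population variance: divide by N.
function pvarianceSketch(array $data): float
{
    if (count($data) < 1) {
        throw new InvalidArgumentException('Need at least one data point.');
    }
    return squaredDeviationsSketch($data) / count($data);
}

// Sample variance: divide by N - 1.
function varianceSketch(array $data): float
{
    if (count($data) < 2) {
        throw new InvalidArgumentException('Need at least two data points.');
    }
    return squaredDeviationsSketch($data) / (count($data) - 1);
}

$values = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75];
echo round(sqrt(pvarianceSketch($values)), 4), PHP_EOL; // 0.9869
echo round(sqrt(varianceSketch($values)), 4), PHP_EOL;  // 1.0811
```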

Stat::covariance( array $x, array $y )

Return the sample covariance of two inputs $x and $y. Covariance is a measure of the joint variability of two inputs.

$covariance = Stat::covariance(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 1, 2, 3, 1, 2, 3]
);
// 0.75
$covariance = Stat::covariance(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [9, 8, 7, 6, 5, 4, 3, 2, 1]
);
// -7.5
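
For reference, the sample covariance is the sum of the products of the paired deviations from each mean, divided by N - 1. The following sketch uses a hypothetical function name and only illustrates that formula:

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
function covarianceSketch(array $x, array $y): float
{
    $n = count($x);
    if ($n !== count($y) || $n < 2) {
        throw new InvalidArgumentException('Inputs need the same length, at least 2.');
    }
    $x = array_values($x);
    $y = array_values($y);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sum = 0.0;
    foreach ($x as $i => $xi) {
        // Product of the paired deviations from each mean.
        $sum += ($xi - $meanX) * ($y[$i] - $meanY);
    }
    return $sum / ($n - 1);
}

echo covarianceSketch([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 1, 2, 3, 1, 2, 3]), PHP_EOL; // 0.75
echo covarianceSketch([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 6, 5, 4, 3, 2, 1]), PHP_EOL; // -7.5
```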

Stat::correlation( array $x, array $y )

Return Pearson’s correlation coefficient for two inputs. Pearson’s correlation coefficient r takes values between -1 and +1. It measures the strength and direction of the linear relationship: +1 means a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.

$correlation = Stat::correlation(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
);
// 1.0
$correlation = Stat::correlation(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [9, 8, 7, 6, 5, 4, 3, 2, 1]
);
// -1.0
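
Pearson's r can be written as the sum of the products of the paired deviations divided by the square root of the product of the two sums of squared deviations. A minimal sketch of that formula, with a hypothetical function name:

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
function correlationSketch(array $x, array $y): float
{
    $n = count($x);
    if ($n !== count($y) || $n < 2) {
        throw new InvalidArgumentException('Inputs need the same length, at least 2.');
    }
    $x = array_values($x);
    $y = array_values($y);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = $sxx = $syy = 0.0;
    foreach ($x as $i => $xi) {
        $dx = $xi - $meanX;
        $dy = $y[$i] - $meanY;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    if ($sxx == 0.0 || $syy == 0.0) {
        throw new InvalidArgumentException('At least one of the inputs is constant.');
    }
    return $sxy / sqrt($sxx * $syy);
}

echo correlationSketch([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9]), PHP_EOL; // 1
echo correlationSketch([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 6, 5, 4, 3, 2, 1]), PHP_EOL; // -1
```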

Stat::linearRegression( array $x, array $y )

Return the slope and intercept of a simple linear regression, estimated using ordinary least squares. Simple linear regression describes the relationship between an independent variable $x and a dependent variable $y in terms of a linear function.

use HiFolks\Statistics\Stat;
$years = [1971, 1975, 1979, 1982, 1983];
$films_total = [1, 2, 3, 4, 5];
list($slope, $intercept) = Stat::linearRegression(
    $years,
    $films_total
);
// $slope: 0.31
// $intercept: -610.18

What is the predicted total for 2022, according to the samples above?

round($slope * 2022 + $intercept);
// 17.0
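
The numbers above can be checked by hand with the ordinary least squares formulas: the slope is the sum of the products of the paired deviations divided by the sum of squared deviations of x, and the intercept is mean(y) - slope * mean(x). A sketch with a hypothetical function name, not the package's code:

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
function linearRegressionSketch(array $x, array $y): array
{
    $n = count($x);
    if ($n !== count($y) || $n < 2) {
        throw new InvalidArgumentException('Inputs need the same length, at least 2.');
    }
    $x = array_values($x);
    $y = array_values($y);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    if ($sxx == 0.0) {
        throw new InvalidArgumentException('The x input is constant.');
    }
    $slope = $sxy / $sxx;                    // 31 / 100 for the data below
    $intercept = $meanY - $slope * $meanX;   // 3 - 0.31 * 1978
    return [$slope, $intercept];
}

[$slope, $intercept] = linearRegressionSketch(
    [1971, 1975, 1979, 1982, 1983],
    [1, 2, 3, 4, 5]
);
echo round($slope, 2), PHP_EOL;                    // 0.31
echo round($intercept, 2), PHP_EOL;                // -610.18
echo round($slope * 2022 + $intercept), PHP_EOL;   // 17
```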

Freq class

With the Statistics package you can calculate a frequency table. A frequency table lists the frequency of various outcomes in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval.

Freq::frequencies( array $data )

use HiFolks\Statistics\Freq;

$fruits = ['🍈', '🍈', '🍈', '🍉','🍉','🍉','🍉','🍉','🍌'];
$freqTable = Freq::frequencies($fruits);
print_r($freqTable);

You can see the frequency table as an array:

Array
(
    [🍈] => 3
    [🍉] => 5
    [🍌] => 1
)

Freq::relativeFrequencies( array $data )

You can retrieve the frequency table in relative format (percentage):

$freqTable = Freq::relativeFrequencies($fruits, 2);
print_r($freqTable);

You can see the frequency table as an array with percentage of the occurrences:

Array
(
    [🍈] => 33.33
    [🍉] => 55.56
    [🍌] => 11.11
)
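
Both tables can be reproduced with PHP built-ins. The sketch below counts occurrences with array_count_values() and converts the counts to percentages; the function names are hypothetical, and sorting the table by key is an assumption based on the output shown in this document.

```php
<?php
// Illustrative helpers, not part of hi-folks/statistics.
function frequenciesSketch(array $data): array
{
    // array_count_values() accepts integer and string values only.
    $table = array_count_values($data);
    ksort($table); // assumption: the table is ordered by value
    return $table;
}

function relativeFrequenciesSketch(array $data, ?int $round = null): array
{
    $total = count($data);
    $table = [];
    foreach (frequenciesSketch($data) as $value => $count) {
        // Share of this value among all outcomes, as a percentage.
        $percent = $count * 100 / $total;
        $table[$value] = $round === null ? $percent : round($percent, $round);
    }
    return $table;
}

$fruits = ['🍈', '🍈', '🍈', '🍉', '🍉', '🍉', '🍉', '🍉', '🍌'];
print_r(frequenciesSketch($fruits));            // 🍈 => 3, 🍉 => 5, 🍌 => 1
print_r(relativeFrequenciesSketch($fruits, 2)); // 🍈 => 33.33, 🍉 => 55.56, 🍌 => 11.11
```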

Freq::frequencyTableBySize( array $data, $size )

If you want to create a frequency table based on classes (ranges of values), you can use frequencyTableBySize. The first parameter is the array, and the second one is the size of each class.

Calculate the frequency table with classes of size 4:

$data = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
    13,14,14,15,15,16,16,16,16,17,17,17,18,18, ];
$result = \HiFolks\Statistics\Freq::frequencyTableBySize($data, 4);
print_r($result);
/*
Array
(
    [1] => 5
    [5] => 8
    [9] => 11
    [13] => 9
    [17] => 5
)
 */
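
A rough sketch of how such a grouping can work, assuming integer data and classes that start at the minimum value (each key in the result is the lower bound of its class). The function name is hypothetical; the package may handle edge cases differently.

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
// Assumption: integer data, first class starts at the minimum value.
function frequencyTableBySizeSketch(array $data, int $size): array
{
    if ($data === [] || $size < 1) {
        throw new InvalidArgumentException('Need data and a class size >= 1.');
    }
    $min = min($data);
    $max = max($data);
    $table = [];
    // Create every class up front so empty classes show a count of 0.
    for ($lower = $min; $lower <= $max; $lower += $size) {
        $table[$lower] = 0;
    }
    foreach ($data as $value) {
        // Lower bound of the class that contains this value.
        $lower = $min + intdiv($value - $min, $size) * $size;
        $table[$lower]++;
    }
    return $table;
}

$sample = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
    13,14,14,15,15,16,16,16,16,17,17,17,18,18];
print_r(frequencyTableBySizeSketch($sample, 4));
// 1 => 5, 5 => 8, 9 => 11, 13 => 9, 17 => 5
```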

Freq::frequencyTable()

If you want to create a frequency table based on classes (ranges of values), you can use frequencyTable. The first parameter is the array, and the second one is the number of classes.

Calculate the frequency table with 5 classes:

$data = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
    13,14,14,15,15,16,16,16,16,17,17,17,18,18, ];
$result = \HiFolks\Statistics\Freq::frequencyTable($data, 5);
print_r($result);
/*
Array
(
    [1] => 5
    [5] => 8
    [9] => 11
    [13] => 9
    [17] => 5
)
 */
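
When only the number of classes is given, a class size has to be derived from the data. The sketch below assumes the size is the range of integer values (max - min + 1) divided by the number of classes, rounded up. This rule is an assumption that happens to reproduce the output above (18 values over 5 classes gives size 4); the package's actual rule may differ. The function name is hypothetical.

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
// Assumption: integer data, size = ceil((max - min + 1) / classes).
function frequencyTableByClassesSketch(array $data, int $classes): array
{
    if ($data === [] || $classes < 1) {
        throw new InvalidArgumentException('Need data and at least one class.');
    }
    $min = min($data);
    $max = max($data);
    $size = (int) ceil(($max - $min + 1) / $classes);
    $table = [];
    // One entry per class, keyed by the lower bound of the class.
    for ($i = 0; $i < $classes; $i++) {
        $table[$min + $i * $size] = 0;
    }
    foreach ($data as $value) {
        $table[$min + intdiv($value - $min, $size) * $size]++;
    }
    return $table;
}

$sample = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
    13,14,14,15,15,16,16,16,16,17,17,17,18,18];
print_r(frequencyTableByClassesSketch($sample, 5));
// 1 => 5, 5 => 8, 9 => 11, 13 => 9, 17 => 5
```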

Statistics class

$stat = HiFolks\Statistics\Statistics::make(
    [3,5,4,7,5,2]
);
echo $stat->valuesToString(5) . PHP_EOL;
// 2,3,4,5,5
echo "Mean              : " . $stat->mean() . PHP_EOL;
// Mean              : 4.3333333333333
echo "Count             : " . $stat->count() . PHP_EOL;
// Count             : 6
echo "Median            : " . $stat->median() . PHP_EOL;
// Median            : 4.5
echo "First Quartile  : " . $stat->firstQuartile() . PHP_EOL;
// First Quartile  : 2.5
echo "Third Quartile : " . $stat->thirdQuartile() . PHP_EOL;
// Third Quartile : 5
echo "Mode              : " . $stat->mode() . PHP_EOL;
// Mode              : 5

Calculate Frequency Table

The Statistics class has some methods for generating a frequency table:

  • frequencies(): a frequency is the number of times a value of the data occurs;
  • relativeFrequencies(): a relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes;
  • cumulativeFrequencies(): the running total of the frequencies up to and including each value;
  • cumulativeRelativeFrequencies(): the running total of the relative frequencies up to and including each value.

use HiFolks\Statistics\Statistics;

$s = Statistics::make(
    [98, 90, 70,18,92,92,55,83,45,95,88,76]
);
$a = $s->frequencies();
print_r($a);
/*
Array
(
    [18] => 1
    [45] => 1
    [55] => 1
    [70] => 1
    [76] => 1
    [83] => 1
    [88] => 1
    [90] => 1
    [92] => 2
    [95] => 1
    [98] => 1
)
 */

$a = $s->relativeFrequencies();
print_r($a);
/*
Array
(
    [18] => 8.3333333333333
    [45] => 8.3333333333333
    [55] => 8.3333333333333
    [70] => 8.3333333333333
    [76] => 8.3333333333333
    [83] => 8.3333333333333
    [88] => 8.3333333333333
    [90] => 8.3333333333333
    [92] => 16.666666666667
    [95] => 8.3333333333333
    [98] => 8.3333333333333
)
 */
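
The cumulative variants are not shown in the example above. As a sketch of the idea only, a cumulative table is a running total over a frequency table that is ordered by value. The function name cumulativeSketch is hypothetical, and the output format of the package's own methods may differ.

```php
<?php
// Illustrative helper, not part of hi-folks/statistics.
// Turns an ordered frequency table into a running total.
function cumulativeSketch(array $frequencies): array
{
    $running = 0;
    $table = [];
    foreach ($frequencies as $value => $count) {
        $running += $count;
        $table[$value] = $running;
    }
    return $table;
}

$scores = [98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88, 76];
$counts = array_count_values($scores);
ksort($counts);

// Cumulative frequencies: the last entry equals the number of data points.
$cumulative = cumulativeSketch($counts);
echo $cumulative[92], PHP_EOL; // 10
echo $cumulative[98], PHP_EOL; // 12

// Cumulative relative frequencies, as percentages of the total.
$total = count($scores);
$cumulativeRelative = array_map(
    fn ($c) => $c * 100 / $total,
    $cumulative
);
echo $cumulativeRelative[98], PHP_EOL; // 100
```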

Testing

composer run test           Runs the test script
composer run test-coverage  Runs the test-coverage script
composer run format         Runs the format script
composer run static-code    Runs the static-code script
composer run all-check      Runs the all-check script

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Security Vulnerabilities

Please review our security policy on how to report security vulnerabilities.

Credits

License

The MIT License (MIT). Please see License File for more information.

Releases

  • v0.2.1 (Feb 22, 2022)

  • v0.2.0 (Feb 21, 2022)

  • v0.1.7 (Feb 19, 2022)

  • v0.1.6 (Feb 17, 2022)

  • v0.1.5 (Feb 5, 2022)

  • v0.1.4 (Jan 30, 2022)

  • v0.1.3 (Jan 29, 2022)

  • v0.1.2 (Jan 28, 2022)

    • pstdev(): population standard deviation
    • stdev(): sample standard deviation
    • pvariance(): variance for a population
    • variance(): variance for a sample

  • v0.1.1 (Jan 27, 2022)

    • Create Freq class with static methods for managing frequency tables
    • Create Stat class with static methods for basic statistic functions like: mean, mode, median, multimode...
    • Refactor Statistics class in order to use the logic provided by the Freq and Stat classes
    • Create ArrUtil with some helpers/functions to manage arrays
    • Add CI/CD test for PHP 8.1

  • v0.1.0 (Jan 8, 2022)

    Initial release with:

    • getMean()
    • getCount()
    • getMedian()
    • getLowerPercentile()
    • getHigherPercentile()
    • getMode()
    • getFrequencies(): a frequency is the number of times a value of the data occurs;
    • getRelativeFrequencies(): a relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes;
    • getCumulativeFrequences(): the running total of the frequencies;
    • getCumulativeRelativeFrequencies(): the running total of the relative frequencies.