A PHP spreadsheet reader (Excel XLS and XLSX, OpenOffice ODS, and variously separated text files) with a singular goal of getting the data out, efficiently

Overview

spreadsheet-reader is a PHP spreadsheet reader that differs from others in that the main goal for it was efficient data extraction that could handle large (as in really large) files. So far it may not definitely be CPU, time or I/O-efficient but at least it won't run out of memory (except maybe for XLS files).

So far XLSX, ODS and text/CSV file parsing should be memory-efficient. XLS file parsing is done with php-excel-reader from http://code.google.com/p/php-excel-reader/ which, sadly, has memory issues with bigger spreadsheets, as it reads the data all at once and keeps it all in memory.

Requirements:

Usage:

All data is read from the file sequentially, with each row being returned as a numeric array. This is about the easiest way to read a file:

<?php
	// If you need to parse XLS files, include php-excel-reader
	require('php-excel-reader/excel_reader2.php');

	require('SpreadsheetReader.php');

	$Reader = new SpreadsheetReader('example.xlsx');
	foreach ($Reader as $Row)
	{
		print_r($Row);
	}
?>

However, now also multiple sheet reading is supported for file formats where it is possible. (In case of CSV, it is handled as if it only has one sheet.)

You can retrieve information about sheets contained in the file by calling the Sheets() method which returns an array with sheet indexes as keys and sheet names as values. Then you can change the sheet that's currently being read by passing that index to the ChangeSheet($Index) method.

Example:

<?php
	$Reader = new SpreadsheetReader('example.xlsx');
	$Sheets = $Reader -> Sheets();

	foreach ($Sheets as $Index => $Name)
	{
		echo 'Sheet #'.$Index.': '.$Name;

		$Reader -> ChangeSheet($Index);

		foreach ($Reader as $Row)
		{
			print_r($Row);
		}
	}
?>

If a sheet is changed to the same that is currently open, the position in the file still reverts to the beginning, so as to conform to the same behavior as when changed to a different sheet.

Testing

From the command line:

php test.php path-to-spreadsheet.xls

In the browser:

http://path-to-library/test.php?File=/path/to/spreadsheet.xls

Notes about library performance

  • CSV and text files are read strictly sequentially so performance should be O(n);
  • When parsing XLS files, all of the file content is read into memory so large XLS files can lead to "out of memory" errors;
  • XLSX files use so called "shared strings" internally to optimize for cases where the same string is repeated multiple times. Internally XLSX is an XML text that is parsed sequentially to extract data from it, however, in some cases these shared strings are a problem - sometimes Excel may put all, or nearly all of the strings from the spreadsheet in the shared string file (which is a separate XML text), and not necessarily in the same order. Worst case scenario is when it is in reverse order - for each string we need to parse the shared string XML from the beginning, if we want to avoid keeping the data in memory. To that end, the XLSX parser has a cache for shared strings that is used if the total shared string count is not too high. In case you get out of memory errors, you can try adjusting the SHARED_STRING_CACHE_LIMIT constant in SpreadsheetReader_XLSX to a lower one.

TODOs:

  • ODS date formats;

Licensing

All of the code in this library is licensed under the MIT license as included in the LICENSE file, however, for now the library relies on php-excel-reader library for XLS file parsing which is licensed under the PHP license.

Comments
  • Weird text appears instead of readable texts.

    Weird text appears instead of readable texts.

    Hi!

    I've recently came across this library due to my frustration in PHPExcel not being able to read large (wide) XLSX files. Thank you.

    I've made it ran after fixing ZipArchive::extractTo(): Permission denied error on my Mac (just pass SpreadsheetReader() a 2nd parameter containing an array that includes "TempDir")..

    But print_r($Row) shows this:

    <br />
    <b>Warning</b>:  pathinfo() expects parameter 1 to be string, array given in <b>/Applications/XAMPP/xamppfiles/htdocs/spreadsheet-reader-master/SpreadsheetReader.php</b> on line <b>60</b><br />
    Array
    (
        [0] => PKÆK1E_rels/.rels­’ÏJ1‡ï}Š{w¶Dd³½ˆÐ›H}€˜Ìþa7™0u}{ƒZ©¥I~óÍ7CšÝfõŠœGŠFoªZ+ŒŽü{£Ÿ÷ë½kWÍ#ÎVJ$cʪÔÄlô ’n²0Ø\QÂX^:â`¥¹‡dÝd{„m]_ÿdèöˆ©öÞhÞûV‡÷„—°©ëF‡wä^F9ÑâW¢-÷(F/3¼OÏDSU N»l/wù{N(Ö[±àˆq¸T³Œ˜¿u<¹‡r?ç„®þs9¸Fþ¼’MéËhÕÀÑ'h?PKfª‚·à
        [1] => PKÆK1EdocProps/app.xmlQKÃ0…ßý%øº&­iÌFÚ!ˆOÊ«ì­Ääv‹´IHâØþ½QÁíÙÇÃ9|÷œ+ÖÇy*¢q¶EUIPV9mì®E¯ý£"&iµœœ… ¢uw%žƒó’Xd‚-Ú§äWGµ‡YÆ2Û6
        [2] => £³LY†vãhÜ
        [3] => õ9ƒM¸&„a8&°ôÂÿÑ/quHÿ…j§¾ûÅ·þä3¯½KrêÍ]E–Ÿ¥¸ó~2J¦<¿{4ï6?<LË›’”ôúIªÍËvØr60Z\$†\øT¬–ºáçœÊzT54·œhMÊg\ÓŠ%("ðå-Ïì¾PK7tNù†PKÆK1EdocProps/core.xml’Qo‚0Çß÷)HßK)¨¸ða‹O[b"fËÞšrb3(¤­¢ß~•9ãÃÞzýÿïwwí%‹c]yÐF6*EԐJ4…TeŠ6ùÏ‘g,W¯):A‹ì)-†•nZÐV‚ñH&Úí¬m!Fì æÆwåÄm£kn]¨KÒrñÍK aÌH
    )
    

    here is the code that i've used:

    <?php
        require('SpreadsheetReader.php');
        $Reader = new SpreadsheetReader('gale.xlsx', array('TempDir' => 'xl'));
        $Sheets = $Reader -> Sheets();
            $Reader -> ChangeSheet(1);
            foreach ($Reader as $Row)
            {
                print_r($Row);
            }
    ?>
    

    I am only interested in the 2nd sheet. The XLSX used to have a password. Other sheets are password protected from being edited. Some cells on the 2nd sheet points to a reference column on another Excel file or points to a column on a protected sheet. (Sorry, I don't know how it's formally called. When I click on the cell, a drop down appears). The sheet has a lot of styling because it is a report to the managements.

    I tried issue #38 but it only displayed question marks (?????) on some parts.

    Thank you!

    opened by roiji 6
  • issue with PHP spreadsheet reader

    issue with PHP spreadsheet reader

    I am facing one issue with PHP spreadsheet reader https://github.com/nuovo/spreadsheet-reader , it is reading all formats file except .xlsx , in case of .xlsx script was not running

    require($docroot.'/inc/filereader/excel_reader2.php');
    require($docroot.'/inc/SpreadsheetReader.inc.php');             
    $Spreadsheet = new SpreadsheetReader("filepath");
    

    When i am executing php script at my localhost, SpreadsheetReader reading .xlsx file and returning desired result, but when i am trying to run same code on some server then it is only executing for(.csv,.xls,.txt) but not for .xlsx.

    I have already checked folder and file permissions for above issue at server.

    opened by ekansh1 6
  • Cannot read files from another directory

    Cannot read files from another directory

    Code: require DIR.'/spreadsheet-reader/SpreadsheetReader.php';

    $ProductsReader = new SpreadsheetReader('C:\UniServerZ\www\new_prj\uploads\771168_20151223_054601_S.xlsx');

    foreach ($ProductsReader as $Row) { print_r($Row); }

    Above is the code i use to read XLSX files from a subfolder uploads. The scripts throws the below errors. But the script works fine when file is in the same folder as the php script.

    Errors: Warning: Invalid argument supplied for foreach() in C:\UniServerZ\www\new_prj\sp readsheet-reader\SpreadsheetReader_XLSX.php on line 371

    Warning: XMLReader::open(): Empty string supplied as input in C:\UniServerZ\www
    new_prj\spreadsheet-reader\SpreadsheetReader_XLSX.php on line 959

    Warning: XMLReader::read(): Load Data before trying to read in C:\UniServerZ\www \new_prj\spreadsheet-reader\SpreadsheetReader_XLSX.php on line 995 Array ( )

    Warning: XMLReader::read(): Load Data before trying to read in C:\UniServerZ\www \new_prj\spreadsheet-reader\SpreadsheetReader_XLSX.php on line 995

    opened by sarveshshejwadkar 5
  • getting junk in cells and rows for xls file

    getting junk in cells and rows for xls file

    i created an xls file with two words: "hello" in one column and "me" in another and saved. i tried to use the component to analyase it and got junk in many cells. i attached a picture of the log file i write the cell content into.

    also weird is that i had only two words in two columns, and i get junk on about 30 or 40 cells.

    my code is:

    require('modules/InboundEmail/spreadsheet-reader-master/php-excel-reader/excel_reader2.php');

    require('modules/InboundEmail/spreadsheet-reader-master/SpreadsheetReader.php');

    $Reader = new SpreadsheetReader($filePathInUpload);

    foreach ($Reader as $Row) { foreach($Row as $cell) { $GLOBALS['log']->fatal("cell = " . $cell); } }

    anyone have an idea?

    junk

    opened by rodnikosh 5
  • Dates Issue

    Dates Issue

    I'm so glad I spent ages integrating this into a data import system only to find out it renders date fields so they're unusable.

    Might be something worth fixing for other people. I'm just getting 5 digit integers.

    opened by joebirkin 5
  • ChangeSheet doesn't work

    ChangeSheet doesn't work

    I have the following code from the example:

    foreach ($Sheets as $Index => $Name) { echo 'Sheet #'.$Index.': '.$Name; $Reader->ChangeSheet($Index); echo "
    change sheet"; foreach ($Reader as $rowData) { print_r($rowData); } }

    output: Sheet #0: cash Sheet #1 : others

    put the printout of (print_r) gives twice the rows of Sheet #1:others.... ?? Weird??

    opened by mn1aC 4
  • Problem with Macro modules

    Problem with Macro modules

    Hi there,

    I noticed that when a XLS spreadsheet contains a Macro module, the said module will be treated as a data sheet by the reader. I did a quick print_r of the Sheets() method and ended up with:

    0 => Macro1 1 => Data

    The problem is that Macro1 contains no data whatsoever, which generates an error in the ChangeSheet() method on this line:

    $this -> ColumnCount = $this -> Handle -> sheets[$this -> CurrentSheet]['numCols'];

    The reason is that, because this sheet is actually a Macro module, it is not actually part of the

    $this->Handle->sheets

    So to prevent the error, I had to change the condition of the if statement in the ChangeSheet() method to:

    if (isset($this -> Sheets[$Index]) && isset( $this -> Handle -> sheets[$Index] ) )

    And now things stopped crashing. Though I had to add a little extra logic in my own programs to make sure I was actually on the first datasheet before starting treatment.

    I hope this helps.

    Osu

    opened by osuwariboy 4
  • unable to fetch data xlsx

    unable to fetch data xlsx

    Hi I tried using your class, I am getting the rows and columns but. I am unable to get the inside the cell. instead null string is coming.. I tried saving the test xls file as xlsx and the output i got is something like

    Array ( [0] => [1] => [2] => [3] => ) Array ( [0] => [1] => [2] => [3] => ) Array ( [0] => [1] => [2] => [3] => ) Array ( [0] => [1] => [2] => [3] => )

    Please let me know what could be the problem.

    opened by sunandchakradhar 3
  • Perfomance increase for xlsx

    Perfomance increase for xlsx

    Dramatically increased perfomance by implementing shared strings index for XLSX.

    Uses not much memory, but speeds up a lot. Tested use 17mb for 80000 rows, full traversal took 23sec instead of almost infinite with 1 second by row

    opened by ramzes642 3
  • Fix built in date format

    Fix built in date format

    Excel short date format is actually represented by the full year m-d-yyyy. Updating the build in date format to use yyy allows the php function strtotime to calculate the correct unix timestamp.

    opened by fcha 2
  • set active sheet

    set active sheet

    Hello friends,

    if i want to set set active sheet then where i need to change as i have file with multiple sheets and i just want to fetch data from only one file. i was checking but unable to find the solution for xlsx,xls and for csv parse.

    opened by dhimancontact 2
  • Deprecated: Implicit conversion from float xx.xxxx to int loses precision in excel_reader2.php on line 922

    Deprecated: Implicit conversion from float xx.xxxx to int loses precision in excel_reader2.php on line 922

    OS: Windows 11 PHP Version 8.1.6

    Error Mesage: Deprecated: Implicit conversion from float 65.03846153846153 to int loses precision in C:\xampp\htdocs\sati\import\excel_reader2.php on line 922

    public function __construct($file='',$store_extended_info=true,$outputEncoding='') {

      $this->_ole = new OLERead();
      $this->setUTFEncoder('iconv');
      if ($outputEncoding != '') { 
      	$this->setOutputEncoding($outputEncoding);
      }
      for ($i=1; $i<245; $i++) {
      	$name = strtolower(( (($i-1)/26>=1)?chr(($i-1)/26+64):'') . chr(($i-1)%26+65));  //line 922
      	$this->colnames[$name] = $i;
      	$this->colindexes[$i] = $name;
      }
      $this->store_extended_info = $store_extended_info;
      if ($file!="") {
      	$this->read($file);
      }
    

    }

    Anyone have ideia to fix the problem?

    opened by rafaelspfonseca 1
  • Fatal error: Uncaught ValueError: XMLReader::open(): Argument #1 ($uri) cannot be empty

    Fatal error: Uncaught ValueError: XMLReader::open(): Argument #1 ($uri) cannot be empty

    When users download XLSX from Google Sheets in xl\workbook.xml we see next items: <sheet name="Sheet 1" sheetId="1" state="visible" r:id="rId2"/> When we open this file and save in Office Excel its convert to: <sheet name="Sheet 1" sheetId="1" r:id="rId1"/>

    opened by Krezalis 0
  • Fatal error: Uncaught ValueError: XMLReader::open(): Argument #1 ($uri) cannot be empty

    Fatal error: Uncaught ValueError: XMLReader::open(): Argument #1 ($uri) cannot be empty

    I tried to upload xlsx Fatal error: Uncaught ValueError: XMLReader::open(): Argument #1 ($uri) cannot be empty in C:\MAMP\htdocs\excel9\library\SpreadsheetReader_XLSX.php:959 Stack trace: #0 C:\MAMP\htdocs\excel9\library\SpreadsheetReader_XLSX.php(959): XMLReader->open('') #1 C:\MAMP\htdocs\excel9\library\SpreadsheetReader.php(238): SpreadsheetReader_XLSX->rewind() #2 C:\MAMP\htdocs\excel9\excelUpload.php(24): SpreadsheetReader->rewind() #3 {main} thrown in C:\MAMP\htdocs\excel9\library\SpreadsheetReader_XLSX.php on line 959

    Here is my full code: https://coding-zon.blogspot.com/2022/04/import-excel-file-into-mysql-in-php.html

    opened by BoltUIX 1
  •  Warning:

     Warning: "continue" targeting switch is equivalent to "break"

    hi

    error on php 7.4

    <html>
    <body>
    <!--StartFragment--><font size="1" style="color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">
    
    ( ! ) Warning: "continue" targeting switch is equivalent to "break". Did you mean to use "continue 2"? in D:\wamp64\www\1400\excel-jadid\spreadsheet-reader-master\SpreadsheetReader_XLSX.php on line 468
    --
    
    
    1 | 0.0005 | 366328 | {main}( ) | ...\test.php:0
    2 | 0.0588 | 374992 | SpreadsheetReader->__construct( ) | ...\test.php:14
    3 | 0.0590 | 375440 | SpreadsheetReader::Load( ) | ...\SpreadsheetReader.php:165
    
    </font><!--EndFragment-->
    </body>
    </html>
    
    opened by parsibox 0
Composer addon to efficiently get installed packages' version numbers

Package Versions composer/package-versions-deprecated is a fully-compatible fork of ocramius/package-versions which provides compatibility with Compos

Composer 1.4k Dec 27, 2022
It's goal is to create my own platform where I can play various board games I like.

Card games The goal of this project is to create a social platform where players can communicate, create lobbies and play various board games. Current

Ondřej Mastík 2 Oct 29, 2021
The goal of this course is to give you a brief introduction to GitHub.

?? The Basics of GitHub ?? Course overview and learning outcomes The goal of this course is to give you a brief introduction to GitHub. We’ll also pro

DAW 1 Dec 13, 2021
Initiated by me, enhanced by us, created for us. This is the fork (public) version separated from my private diary repository.

diary public repository Initiated by me, enhanced by us, created for us. This is the fork (public) version separated from my private diary repository.

Weicheng Ao 3 Jul 30, 2022
Perch Dashboard app for exporting content to (Kirby) text files and Kirby Blueprint files

toKirby Perch Dashboard app for exporting content to (Kirby) text files and Kirby Blueprint files. You can easily install and test it in a few steps.

R. Banus 4 Jan 15, 2022
Laminas\Text is a component to work on text strings

laminas-text This package is considered feature-complete, and is now in security-only maintenance mode, following a decision by the Technical Steering

Laminas Project 38 Dec 31, 2022
Zend\Text is a component to work on text strings from Zend Framework

zend-text Repository abandoned 2019-12-31 This repository has moved to laminas/laminas-text. Zend\Text is a component to work on text strings. It cont

Zend Framework 31 Jan 24, 2021
NFC Reader with the libnfc written in PHP

What is libnfc-for-php? This library is a toy for me. You can read NFC with libnfc written in PHP. Requirements PHP 7.4 or later PHP FFI libnfc Suppor

memory 20 Aug 26, 2022
NFC Reader written in PHP

What is nfc-for-php? This library is a toy for me. You can read NFC with libnfc written in PHP. Requirements PHP 7.4+ PHP FFI libnfc 1.8.0 (if you use

memory 20 Aug 26, 2022
Bencodex reader/writer for PHP

Bencodex reader/writer for PHP This package implements Bencodex serialization format which extends Bencoding. Complianet with Bencodex 1.2. php > echo

Hong Minhee (洪 民憙) 3 Oct 20, 2021
The new, most powerful Comic Reader ever created by the human race. Reworked by an Otaku.

FoOlSlideX The new, most powerful Comic Reader ever created by the human race. Reworked by an Otaku. Requirements PHP greater than 7.0 and everything

saintly2k 21 Dec 21, 2022
Melek Berita Backend is a service for crawling data from various websites and processing the data to be used for news data needs.

About Laravel Laravel is a web application framework with expressive, elegant syntax. We believe development must be an enjoyable and creative experie

Chacha Nurholis 2 Oct 9, 2022
Improve default Magento 2 Import / Export features - cron jobs, CSV , XML , JSON , Excel

Improve default Magento 2 Import / Export features - cron jobs, CSV , XML , JSON , Excel , mapping of any format, Google Sheet, data and price modification, improved speed and a lot more!

Firebear Studio 173 Dec 17, 2022
Automatically load the next page of products in Magento. Easy to install and configure, this module works 100% out of the box with vanilla Magento 1.9.x and earlier.

Automatically load the next page of products in Magento. Easy to install and configure, this module works 100% out of the box with vanilla Magento 1.9.x and earlier.

Strategery 123 Nov 20, 2021
DiscordLookup | Get more out of Discord with Discord Lookup! Snowflake Decoder, Guild List with Stats, Invite Info and more...

DiscordLookup Get more out of Discord with Discord Lookup! Snowflake Decoder, Guild List with Stats, Invite Info and more... Website Getting Help Tool

Felix 69 Dec 23, 2022
This script allows to bypass Oracle Cloud Infrastructure 'Out of host capacity' error immediately when additional OCI capacity will appear in your Home Region / Availability domain.

Resolving Oracle Cloud "Out of Capacity" issue and getting free VPS with 4 ARM cores / 24GB of memory Very neat and useful configuration was recently

Alexander Hitrov 323 Jan 6, 2023
UpDown is an uptime monitor, We send out notifications when something's wrong.

About UpDown Updown is an uptime monitor, We send out notifications when something's wrong. Be the first to know that your website is down! Reliable m

Hussam Adil 4 May 8, 2022
SlimJim was born out of a need for a simple auto update script which would update multiple development/test environments every time someone

SlimJim WHY? SlimJim was born out of a need for a simple auto update script which would update multiple development/test environments every time someo

Jesal Gadhia 100 Apr 22, 2022
Sandbox for figuring out why the Alpine + Turbo bridge is broken

Podcaster A Turn-Key Podcasting Starter Kit for Statamic 3 Features This kit is deceptively simple – it may look like a 3 page site but there's a whol

Jack McDade 1 Mar 25, 2022