Import data from and export data to a range of different file formats and media

Overview

Ddeboer Data Import library

Build Status Scrutinizer Quality Score Code Coverage Latest Stable Version

This library has been renamed to PortPHP and will be deprecated. Please use PortPHP instead.

Introduction

This PHP library offers a way to read data from, and write data to, a range of file formats and media. Additionally, it includes tools to manipulate your data.

Features

  • Read from and write to CSV files, Excel files, databases, and more.
  • Convert between charsets, dates, strings and objects on the fly.
  • Build reusable and extensible import workflows.
  • Decoupled components that you can use on their own, such as a CSV and Excel reader and writer.
  • Well-tested code.

Documentation

Installation

This library is available on Packagist. The recommended way to install it is through Composer:

$ composer require ddeboer/data-import:@stable

Then include Composer’s autoloader:

require_once 'vendor/autoload.php';

For integration with Symfony2 projects, the DdeboerDataImportBundle is available.

Usage

Broadly speaking, you can use this library in two ways:

The workflow

Each data import revolves around the workflow and takes place along the following lines:

  1. Construct a reader.
  2. Construct a workflow and pass the reader to it, optionally pass a logger as second argument. Add at least one writer to the workflow.
  3. Optionally, add filters, item converters and value converters to the workflow.
  4. Process the workflow. This will read the data from the reader, filter and convert the data, and write the output to each of the writers. The process method also returns a Result object which contains various information about the import.

In other words, the workflow acts as a mediator between a reader and one or more writers, filters and converters.

Optionally you can skip items on failure like this $workflow->setSkipItemOnFailure(true). Errors will be logged if you have passed a logger to the workflow constructor.

Schematically:

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Reader;
use Ddeboer\DataImport\Writer;
use Ddeboer\DataImport\Filter;

$reader = new Reader\...;
$workflow = new Workflow($reader, $logger);
$result = $workflow
    ->addWriter(new Writer\...())
    ->addWriter(new Writer\...())
    ->addFilter(new Filter\CallbackFilter(...))
    ->setSkipItemOnFailure(true)
    ->process()
;

The workflow Result

The Workflow Result object exposes various methods which you can use to decide what to do after an import. The result will be an instance of Ddeboer\DataImport\Result. It is automatically created and populated by the Workflow. It will be returned to you after calling the process() method on the Workflow

The Result provides the following methods:

//the name of the import - which is an optional 3rd parameter to
//the Workflow class. Returns null by default.
public function getName();

//DateTime instance created at the start of the import.
public function getStartTime();

//DateTime instance created at the end of the import.
public function getEndTime();

//DateInterval instance. Diff off the start + end times.
public function getElapsed();

//Count of exceptions which caught by the Workflow.
public function getErrorCount();

//Count of processed items minus the count of exceptions caught.
public function getSuccessCount();

//Count of items processed
//This will not include any filtered items or items which fail conversion.
public function getTotalProcessedCount();

//bool to indicate whether any exceptions were caught.
public function hasErrors();

//An array of exceptions caught by the Workflow.
public function getExceptions();

Example use cases:

  • You want to send an e-mail with the results of the import
  • You want to send a Text alert if a particular file failed
  • You want to move an import file to a failed directory if there were errors
  • You want to log how long imports are taking

Readers

Readers read data that will be imported by iterating over it. This library includes a handful of readers. Additionally, you can easily implement your own.

You can use readers on their own, or construct a workflow from them:

$workflow = new Workflow($reader);

ArrayReader

Reads arrays. Most useful for testing your workflow.

CsvReader

Reads CSV files, and is optimized to use as little memory as possible.

use Ddeboer\DataImport\Reader\CsvReader;

$file = new \SplFileObject('/path/to/csv_file.csv');
$reader = new CsvReader($file);

Optionally construct with different delimiter, enclosure and/or escape character:

$reader = new CsvReader($file, ';');

Then iterate over the CSV file:

foreach ($reader as $row) {
    // $row will be an array containing the comma-separated elements of the line:
    // array(
    //   0 => 'James',
    //   1 => 'Bond'
    //   etc...
    // )
}
Column headers

If one of your rows contains column headers, you can read them to make the rows associative arrays:

$reader->setHeaderRowNumber(0);

foreach ($reader as $row) {
    // $row will now be an associative array:
    // array(
    //   'firstName' => 'James',
    //   'lastName'  => 'Bond'
    //   etc...
    // )
}
Strict mode

The CSV reader operates in strict mode by default. If the reader encounters a row where the number of values differs from the number of column headers, an error is logged and the row is skipped. Retrieve the errors with getErrors().

To disable strict mode, set $reader->setStrict(false) after you instantiate the reader.

Disabling strict mode means:

  1. Any rows that contain fewer values than the column headers are simply padded with null values.
  2. Any additional values in a row that contain more values than the column headers are ignored.

Examples where this is useful:

  • Outlook 2010: which omits trailing blank values
  • Google Contacts: which exports more values than there are column headers
Duplicate headers

Sometimes a CSV file contains duplicate column headers, for instance:

id details details
1 bla more bla

By default, a DuplicateHeadersException will be thrown if you call setHeaderRowNumber(0) on this file. You can handle duplicate columns in one of three ways:

  • call setColumnHeaders(['id', 'details', 'details_2']) to specify your own headers
  • call setHeaderRowNumber with the CsvReader::DUPLICATE_HEADERS_INCREMENT flag to generate incremented headers; in this case: id, details and details1
  • call setHeaderRowNumber with the CsvReader::DUPLICATE_HEADERS_MERGE flag to merge duplicate values into arrays; in this case, the first row’s values will become: [ 'id' => 1, 'details' => [ 'bla', 'more bla' ] ].

DbalReader

Reads data through Doctrine’s DBAL. Your project should include Doctrine’s DBAL package:

$ composer require doctrine/dbal
use Ddeboer\DataImport\Reader\DbalReader;

$reader = new DbalReader(
    $connection, // Instance of \Doctrine\DBAL\Connection
    'SELECT u.id, u.username, g.name FROM `user` u INNER JOIN groups g ON u.group_id = g.id'
);

DoctrineReader

Reads data through the Doctrine ORM:

use Ddeboer\DataImport\Reader\DoctrineReader;

$reader = new DoctrineReader($entityManager, 'Your\Namespace\Entity\User');

ExcelReader

Acts as an adapter for the PHPExcel library. Make sure to include that library in your project:

$ composer require phpoffice/phpexcel

Then use the reader to open an Excel file:

use Ddeboer\DataImport\Reader\ExcelReader;

$file = new \SplFileObject('path/to/excel_file.xls');
$reader = new ExcelReader($file);

To set the row number that headers will be read from, pass a number as the second argument.

$reader = new ExcelReader($file, 2);

To read the specific sheet:

$reader = new ExcelReader($file, null, 3);

OneToManyReader

Allows for merging of two data sources (using existing readers), for example you have one CSV with orders and another with order items.

Imagine two CSV's like the following:

OrderId,Price
1,30
2,15
OrderId,Name
1,"Super Cool Item 1"
1,"Super Cool Item 2"
2,"Super Cool Item 3"

You want to associate the items to the order. Using the OneToMany reader we can nest these rows in the order using a key which you specify in the OneToManyReader.

The code would look something like:

$orderFile = new \SplFileObject("orders.csv");
$orderReader = new CsvReader($file, $orderFile);
$orderReader->setHeaderRowNumber(0);

$orderItemFile = new \SplFileObject("order_items.csv");
$orderItemReader = new CsvReader($file, $orderFile);
$orderItemReader->setHeaderRowNumber(0);

$oneToManyReader = new OneToManyReader($orderReader, $orderItemReader, 'items', 'OrderId', 'OrderId');

The third parameter is the key which the order item data will be nested under. This will be an array of order items. The fourth and fifth parameters are "primary" and "foreign" keys of the data. The OneToMany reader will try to match the data using these keys. Take for example the CSV's given above, you would expect that Order "1" has the first 2 Order Items associated to it due to their Order Id's also being "1".

Note: You can omit the last parameter, if both files have the same field. Eg if parameter 4 is 'OrderId' and you don't specify parameter 5, the reader will look for the foreign key using 'OrderId'

The resulting data will look like:

//Row 1
array(
    'OrderId' => 1,
    'Price' => 30,
    'items' => array(
        array(
            'OrderId' => 1,
            'Name' => 'Super Cool Item 1',
        ),
        array(
            'OrderId' => 1,
            'Name' => 'Super Cool Item 2',
        ),
    ),
);

//Row2
array(
    'OrderId' => 2,
    'Price' => 15,
    'items' => array(
        array(
            'OrderId' => 2,
            'Name' => 'Super Cool Item 1',
        ),
    )
);

Create a reader

You can create your own data reader by implementing the Reader Interface.

Writers

ArrayWriter

Resembles the ArrayReader. Probably most useful for testing your workflow.

CsvWriter

Writes CSV files:

use Ddeboer\DataImport\Writer\CsvWriter;

$writer = new CsvWriter();
$writer->setStream(fopen('output.csv', 'w'));

// Write column headers:
$writer->writeItem(array('first', 'last'));

$writer
    ->writeItem(array('James', 'Bond'))
    ->writeItem(array('Auric', 'Goldfinger'))
    ->finish();

DoctrineWriter

Writes data through Doctrine:

use Ddeboer\DataImport\Writer\DoctrineWriter;

$writer = new DoctrineWriter($entityManager, 'YourNamespace:Employee');
$writer
    ->prepare()
    ->writeItem(
        array(
            'first' => 'James',
            'last'  => 'Bond'
        )
    )
    ->finish();

By default, DoctrineWriter will truncate your data before running the workflow. Call disableTruncate() if you don't want this.

If you are not truncating data, DoctrineWriter will try to find an entity having it's primary key set to the value of the first column of the item. If it finds one, the entity will be updated, otherwise it's inserted. You can tell DoctrineWriter to lookup the entity using different columns of your item by passing a third parameter to it's constructor.

$writer = new DoctrineWriter($entityManager, 'YourNamespace:Employee', 'columnName');

or

$writer = new DoctrineWriter($entityManager, 'YourNamespace:Employee', array('column1', 'column2', 'column3'));

The DoctrineWriter will also search out associations automatically and link them by an entity reference. For example suppose you have a Product entity that you are importing and must be associated to a Category. If there is a field in the import file named 'Category' with an id, the writer will use metadata to get the association class and create a reference so that it can be associated properly. The DoctrineWriter will skip any association fields that are already objects in cases where a converter was used to retrieve the association.

PdoWriter

Use the PDO writer for importing data into a relational database (such as MySQL, SQLite or MS SQL) without using Doctrine.

use Ddeboer\DataImport\Writer\PdoWriter;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

$writer = new PdoWriter($pdo, 'my_table');

ExcelWriter

Writes data to an Excel file. It requires the PHPExcel package:

$ composer require phpoffice/phpexcel
use Ddeboer\DataImport\Writer\ExcelWriter;

$file = new \SplFileObject('data.xlsx', 'w');
$writer = new ExcelWriter($file);

$writer
    ->prepare()
    ->writeItem(array('first', 'last'))
    ->writeItem(array('first' => 'James', 'last' => 'Bond'))
    ->finish();

You can specify the name of the sheet to write to:

$writer = new ExcelWriter($file, 'My sheet');

You can open an already existing file and add a sheet to it:

$file = new \SplFileObject('data.xlsx', 'a');   // Open file with append mode
$writer = new ExcelWriter($file, 'New sheet');

If you wish to overwrite an existing sheet instead, specify the name of the existing sheet:

$writer = new ExcelWriter($file, 'Old sheet');

ConsoleTableWriter

This writer displays items as table on console output for debug purposes when you start the workflow from the command-line. It requires Symfony’s Console component 2.5 or higher:

$ composer require symfony/console ~2.5
use Ddeboer\DataImport\Reader;
use Ddeboer\DataImport\Writer\ConsoleTableWriter;
use Symfony\Component\Console\Output\ConsoleOutput;
use Symfony\Component\Console\Helper\Table;

$reader = new Reader\...;
$output = new ConsoleOutput(...);

$table = new Table($output);

// Make some manipulations, e.g. set table style
$table->setStyle('compact');

$workflow = new Workflow($reader);
$workflow->addWriter(new ConsoleTableWriter($output, $table));

ConsoleProgressWriter

This writer displays import progress when you start the workflow from the command-line. It requires Symfony’s Console component:

$ composer require symfony/console
use Ddeboer\DataImport\Writer\ConsoleProgressWriter;
use Symfony\Component\Console\Output\ConsoleOutput;

$output = new ConsoleOutput(...);
$progressWriter = new ConsoleProgressWriter($output, $reader);

// Most useful when added to a workflow
$workflow->addWriter($progressWriter);

There are various optional arguments you can pass to the ConsoleProgressWriter. These include the output format and the redraw frequency. You can read more about the options here.

You might want to set the redraw rate higher than the default as it can slow down the import/export process quite a bit as it will update the console text after every record has been processed by the Workflow.

$output = new ConsoleOutput(...);
$progressWriter = new ConsoleProgressWriter($output, $reader, 'debug', 100);

Above we set the output format to 'debug' and the redraw rate to 100. This will only re-draw the console progress text after every 100 records.

The debug format is default as it displays ETA's and Memory Usage. You can use a more simple formatter if you wish:

$output = new ConsoleOutput(...);
$progressWriter = new ConsoleProgressWriter($output, $reader, 'normal', 100);

CallbackWriter

Instead of implementing your own writer, you can use the quick solution the CallbackWriter offers:

use Ddeboer\DataImport\Writer\CallbackWriter;

$workflow->addWriter(new CallbackWriter(function ($row) use ($storage) {
    $storage->store($row);
}));

AbstractStreamWriter

Instead of implementing your own writer from scratch, you can use AbstractStreamWriter as a basis, implemented the writeItem method and you're done:

use Ddeboer\DataImport\Writer\AbstractStreamWriter;

class MyStreamWriter extends AbstractStreamWriter
{
    public function writeItem(array $item)
    {
        fputs($this->getStream(), implode(',', $item));
    }
}

$writer = new MyStreamWriter(fopen('php://temp', 'r+'));
$writer->setCloseStreamOnFinish(false);

$workflow->addWriter(new MyStreamWriter());
$workflow->process();

$stream = $writer->getStream();
rewind($stream);

echo stream_get_contents($stream);

StreamMergeWriter

Suppose you have 2 stream writers handling fields differently according to one of the fields. You should then use StreamMergeWriter to call the appropriate Writer for you.

The default field name is discr but could be changed with the setDiscriminantField() method.

use Ddeboer\DataImport\Writer\StreamMergeWriter;

$writer = new StreamMergeWriter();

$writer->addWriter('first writer', new MyStreamWriter());
$writer->addWriter('second writer', new MyStreamWriter());

Create a writer

Build your own writer by implementing the Writer Interface.

Filters

A filter decides whether data input is accepted into the import process.

CallbackFilter

The CallbackFilter wraps your callback function that determines whether data should be accepted. The data input is accepted only if the function returns true.

use Ddeboer\DataImport\Filter\CallbackFilter;

// Don’t import The Beatles
$filter = new CallbackFilter(function ($data) {
    return ('The Beatles' != $data['name']);
});

$workflow->addFilter($filter);

OffsetFilter

OffsetFilter allows you to

  • skip a certain amount of items from the beginning
  • process only specified amount of items (and skip the rest)

You can combine these two parameters to process a slice from the middle of the data, like rows 5-7 of a CSV file with ten rows.

OffsetFilter is configured by its constructor: new OffsetFilter($offset = 0, $limit = null). Note: $offset is a 0-based index.

use Ddeboer\DataImport\Filter\OffsetFilter;

// Default implementation is to start from the beginning without maximum count
$filter = new OffsetFilter(0, null);
$filter = new OffsetFilter(); // You can omit both parameters

// Start from the third item, process to the end
$filter = new OffsetFilter(2, null);
$filter = new OffsetFilter(2); // You can omit the second parameter

// Start from the first item, process max three items
$filter = new OffsetFilter(0, 3);

// Start from the third item, process max five items (items 3 - 7)
$filter = new OffsetFilter(2, 5);

DateTimeThresholdFilter

This filter is useful if you want to do incremental imports. Specify a threshold DateTime instance, a column name (defaults to updated_at), and a DateTimeValueConverter that will be used to convert values read from the filtered items. The item strictly older than the threshold will be discarded.

use Ddeboer\DataImport\Filter\DateTimeThresholdFilter;
use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

new DateTimeThresholdFilter(
    new DateTimeValueConverter(),
    new \DateTime('yesterday')
);

ValidatorFilter

It’s a common use case to validate the data before you save it to the database. This is exactly what the ValidatorFilter does. To use it, include Symfony’s Validator component in your project:

$ compose require symfony/validator

The ValidatorFilter works as follows:

use Ddeboer\DataImport\Filter\ValidatorFilter;

$filter = new ValidatorFilter($validator);
$filter->add('email', new Assert\Email());
$filter->add('sku', new Assert\NotBlank());

The default behaviour for the validator is to collect all violations and skip each invalid row. If you want to stop on the first failing row you can call ValidatorFilter::throwExceptions(), which throws a ValidationException containing the line number and the violation list.

Item converters

MappingItemConverter

Use the MappingItemConverter to add mappings to your workflow. Your keys from the input data will be renamed according to these mappings. Say you have input data:

$data = array(
    array(
        'foo' => 'bar',
        'baz' => array(
            'some' => 'value'
        )
    )
);

You can map the keys foo and baz in the following way:

use Ddeboer\DataImport\ItemConverter\MappingItemConverter;

$converter = new MappingItemConverter();
$converter
    ->addMapping('foo', 'fooloo')
    ->addMapping('baz', array('some' => 'else'));

$workflow->addItemConverter($converter)
    ->process();

Your output data will now be:

array(
    array(
        'fooloo' => 'bar',
        'baz'    => array(
            'else' => 'value'
        )
    )
);

NestedMappingItemConverter

Use the NestedMappingItemConverter to add mappings to your workflow if the input data contains nested arrays. Your keys from the input data will be renamed according to these mappings. Say you have input data:

$data = array(
    'foo'   => 'bar',
    'baz' => array(
        array(
            'another' => 'thing'
        ),
        array(
            'another' => 'thing2'
        ),
    )
);

You can map the keys another in the following way.

use Ddeboer\DataImport\ItemConverter\NestedMappingItemConverter;

$mappings = array(
    'foo'   => 'foobar',
    'baz' => array(
        'another' => 'different_thing'
    )
);

$converter = new NestedItemMappingConverter('baz');
$converter->addMapping($mappings);

$workflow->addItemConverter($converter)
    ->process();

Your output data will now be:

array(
    'foobar' => 'bar',
    'baz' => array(
        array(
            'different_thing' => 'thing'
        ),
        array(
            'different_thing' => 'thing2'
        ),
    )
);

Create an item converter

Implement ItemConverterInterface to create your own item converter:

use Ddeboer\DataImport\ItemConverter\ItemConverterInterface;

class MyItemConverter implements ItemConverterInterface
{
    public function convert($item)
    {
        // Do your conversion and return updated $item
        return $changedItem;
    }
}

CallbackItemConverter

Instead of implementing your own item converter, you can use a callback:

use Ddeboer\DataImport\ItemConverter\CallbackItemConverter;

// Use a fictional $translator service to translate each value
$converter = new CallbackItemConverter(function ($item) use ($translator) {
    foreach ($item as $key => $value) {
        $item[$key] = $translator->translate($value);
    }

    return $item;
});

Value converters

Value converters are used to convert specific fields (e.g., columns in database).

DateTimeValueConverter

There are two uses for the DateTimeValueConverter:

  1. Convert a date representation in a format you specify into a DateTime object.
  2. Convert a date representation in a format you specify into a different format.
Convert a date into a DateTime object.
use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter('d/m/Y H:i:s');
$workflow->addValueConverter('my_date_field', $converter);

If your date string is in a format specified at: http://www.php.net/manual/en/datetime.formats.date.php then you can omit the format parameter.

use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter();
$workflow->addValueConverter('my_date_field', $converter);
Convert a date string into a differently formatted date string.
use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter('d/m/Y H:i:s', 'd-M-Y');
$workflow->addValueConverter('my_date_field', $converter);

If your date is in a format specified at: http://www.php.net/manual/en/datetime.formats.date.php you can pass null as the first argument.

use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter(null, 'd-M-Y');
$workflow->addValueConverter('my_date_field', $converter);

DateTimeToStringValueConverter

The main use of DateTimeToStringValueConverter is to convert DateTime object into it's string representation in proper format. Default format is 'Y-m-d H:i:s';

use Ddeboer\DataImport\ValueConverter\DateTimeToStringValueConverter;

$converter = new DateTimeToStringValueConverter;
$converter->convert(\DateTime('2010-01-01 01:00:00'));  //will return string '2010-01-01 01:00:00'

ObjectConverter

Converts an object into a scalar value. To use this converter, you must include Symfony’s PropertyAccess component in your project:

$ composer require symfony/property-access
Using __toString()

If your object has a __toString() method, that value will be used:

use Ddeboer\DataImport\ValueConverter\ObjectConverter;

class SecretAgent
{
    public function __toString()
    {
        return '007';
    }
}

$converter = new ObjectConverter();
$string = $converter->convert(new SecretAgent());   // $string will be '007'
Using object accessors

If your object has no __toString() method, its accessors will be called instead:

class Villain
{
    public function getName()
    {
        return 'Bad Guy';
    }
}

class Organization
{
    public function getVillain()
    {
        return new Villain();
    }
}

use Ddeboer\DataImport\ValueConverter\ObjectConverter;

$converter = new ObjectConverter('villain.name');
$string = $converter->convert(new Organization());   // $string will be 'Bad Guy'

StringToObjectConverter

Looks up an object in the database based on a string value:

use Ddeboer\DataImport\ValueConverter\StringToObjectConverter;

$converter = new StringToObjectConverter($repository, 'name');
$workflow->addValueConverter('input_name', $converter);

CallbackValueConverter

Use this if you want to save the trouble of writing a dedicating class:

use Ddeboer\DataImport\ValueConverter\CallbackValueConverter;

$callable = function ($item) {
    return implode(',', $item);
};

$converter = new CallbackValueConverter($callable);
$output = $converter->convert(array('foo', 'bar')); // $output will be "foo,bar"

MappingValueConverter

Looks for a key in a hash you must provide in the constructor:

use Ddeboer\DataImport\ValueConverter\MappingValueConverter;

$converter = new MappingValueConverter(array(
    'source' => 'destination'
));

$converter->convert('source'); // destination
$converter->convert('unexpected value'); // throws an UnexpectedValueException

Examples

Import CSV file and write to database

This example shows how you can read data from a CSV file and write that to the database.

Assume we have the following CSV file:

event;beginDate;endDate
Christmas;20131225;20131226
New Year;20131231;20140101

And we want to write this data to a Doctrine entity:

namespace MyApp;

use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity
 */
class Event
{
    /**
     * @ORM\Column()
     */
    protected $event;

    /**
     * @ORM\Column(type="datetime")
     */
    protected $beginDate;

    /**
     * @ORM\Column(type="datetime")
     */
    protected $endDate;

    public function setEvent($event)
    {
        $this->event = $event;
    }

    public function setBeginDate($date)
    {
        $this->beginDate = $date;
    }

    public function setEndDate($date)
    {
        $this->endDate = $date;
    }

    // And some getters
}

Then you can import the CSV and save it as your entity in the following way.

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Reader\CsvReader;
use Ddeboer\DataImport\Writer\DoctrineWriter;
use Ddeboer\DataImport\ValueConverter\StringToDateTimeValueConverter;

// Create and configure the reader
$file = new \SplFileObject('input.csv');
$csvReader = new CsvReader($file);

// Tell the reader that the first row in the CSV file contains column headers
$csvReader->setHeaderRowNumber(0);

// Create the workflow from the reader
$workflow = new Workflow($csvReader);

// Create a writer: you need Doctrine’s EntityManager.
$doctrineWriter = new DoctrineWriter($entityManager, 'MyApp:Event');
$workflow->addWriter($doctrineWriter);

// Add a converter to the workflow that will convert `beginDate` and `endDate`
// to \DateTime objects
$dateTimeConverter = new StringToDateTimeValueConverter('Ymd');
$workflow
    ->addValueConverter('beginDate', $dateTimeConverter)
    ->addValueConverter('endDate', $dateTimeConverter);

// Process the workflow
$workflow->process();

Export to CSV file

This example shows how you can export data to a CSV file.

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Reader\ArrayReader;
use Ddeboer\DataImport\Writer\CsvWriter;
use Ddeboer\DataImport\ValueConverter\CallbackValueConverter;

// Your input data
$reader = new ArrayReader(array(
    array(
        'first',        // This is for the CSV header
        'last',
        array(
            'first' => 'james',
            'last'  => 'Bond'
        ),
        array(
            'first' => 'hugo',
            'last'  => 'Drax'
        )
    ))
);

// Create the workflow from the reader
$workflow = new Workflow($reader);

// Add the writer to the workflow
$file = new \SplFileObject('output.csv', 'w');
$writer = new CsvWriter($file);
$workflow->addWriter($writer);

// As you can see, the first names are not capitalized correctly. Let's fix
// that with a value converter:
$converter = new CallbackValueConverter(function ($input) {
    return ucfirst($input);
});
$workflow->addValueConverter('first', $converter);

// Process the workflow
$workflow->process();

This will write a CSV file output.csv where the first names are capitalized:

first;last
James;Bond
Hugo;Drax

ArrayValueConverterMap

The ArrayValueConverterMap is used to filter values of a multi-level array.

The converters defined in the list are applied on every data-item's value that match the defined array_keys.

//...
$data = array(
    'products' => array(
        0 => array(
            'name' => 'some name',
            'price' => '€12,16',
        ),
        1 => array(
            'name' => 'some name',
            'price' => '€12,16',
        )
    )
);

// ...
// create the workflow and reader etc.
// ...

$workflow->addValueConverter(new ArrayValueConverterMap(array(
    'name' => array(new CharsetValueConverter('UTF-8', 'UTF-16')), // encode to UTF-8
    'price' => array(new CallbackValueConverter(function ($input) {
        return str_replace('€', '', $input); // remove € char
    }),
)));

// ..
// after filtering data looks as follows
$data = array(
    'products' => array(
        0 => array(
            'name' => 'some name', // in UTF-8
            'price' => '12,16',
        ),
        1 => array(
            'name' => 'some name',
            'price' => '12,16',
        )
    )
);

Running the tests

Clone this repository:

$ git clone https://github.com/ddeboer/data-import.git
$ cd data-import

Install dev dependencies:

$ composer install --dev

And run PHPUnit:

$ phpunit

License

DataImport is released under the MIT license. See the LICENSE file for details.

Comments
  • Introduce steps

    Introduce steps

    Hey guys!

    The current library is really cool but i think it is not really scalebale. If somone needs to filter before all validators are called and an another need the oposite. This make it really hard to use this library.

    My idea was to introduce steps. Now the library has a StepInterface which can modifiy/validate/whatever the item. Now can each developer decide, how his workflow looks like. Just instantiate a StepInterface and add them to the Workflow object.

    I also added a PriorityStepInterface. This interface have a getPriority and with this method you can prioritize steps. Please note, that the developer can override the priority in the addStep method.

    I think this will make the library more generic and easier to use.

    Please not, this is a PoC. Which means the code may not work and isn't finished yet.

    Please provide Feedback!

    opened by Baachi 70
  • No XML Reader

    No XML Reader

    I'm migrating a website from symfony 1 to 2, changing from MySQL to PostgreSQL in the process, and I'm willing to use this library to achieve this. mysqldump offers the possibility to dump to xml, but not to CSV.

    I implemented an xml reader that using @hakre 's XMLReaderIterator library (which offers the possiblity to read the xml chunk by chunk instead of loading the whole file in memory).

    Here is my implementation :

    <?php
    
    namespace MyProject\MyBundle\DdeboerDataImport;
    
    use Ddeboer\DataImport\Reader\ReaderInterface;
    
    class XmlReader implements ReaderInterface
    {
        private $filename;
        private $reader;
        private $fields;
    
        public function __construct($filename, $delimiter, $fields)
        {
            $this->filename = $filename;
            $this->fields   = $fields;
            $this->reader   = new \XmlReader();
            if (!is_file($this->filename)) {
                throw new \RunTimeException(sprintf(
                    'File "%s" not found.',
                    $this->filename
                ));
            }
            $this->reader->open($this->filename);
            $this->iterator = new \XmlElementIterator($this->reader, $delimiter);
        }
    
        // Iterator imlementation
    
        public function current()
        {
            $row = array();
    
            if ($this->iterator->current()->getSimpleXmlElement() != null) {
                foreach ($this->iterator->current()->getSimpleXmlElement() as $element) {
                    $row[(string) $element['name']] = (string) $element;
                }
            }
    
            return $row;
        }
    
        public function rewind()
        {
            $this->iterator->rewind();
        }
    
        public function next()
        {
            return $this->iterator->next();
        }
    
        public function valid()
        {
            return $this->iterator->valid();
        }
    
        public function key()
        {
            return $this->iterator->key();
        }
    
    
        // Countable implementation
    
        public function count()
        {
            return iterator_count($this->iterator);
        }
    
        public function getFields()
        {
            return $this->fields;
        }
    }
    

    Would you like me to contribute it ?

    opened by greg0ire 47
  • Move concrete implementations to separate packages

    Move concrete implementations to separate packages

    This package contains some concrete implementations for readers and writers. These implementations should be moved into separate packages.

    New packages could be:

    • ddeboer/data-import-doctrine
    • ddeboer/data-import-phpexcel
    • ddeboer/data-import-pdo
    • etc.

    Reason for this is the package structure. Currently the dependencies of the concrete implementations are only required in dev and are just suggested otherwise.

    Benefits:

    • no overhead for developers that does not need some concrete implementations
    • less conflicts for composer update etc. if e.g. PDO etc. is not available
    • stricter dependency management possible
    • easy to provide own implementations for other systems like propel or other ORMs
    question 
    opened by markuspoerschke 23
  • Return Result Object

    Return Result Object

    I was toying with the idea of return a Result object from the Workflow which could provide more information on what happened in the import.

    My motivation for this was that after one file finished importing, If it had any errors I want to move it to a failed folder, for someone to check later, or moving to an archive folder if successful.

    What do you guys think to this idea? Is there anything else we could add to it? I was thinking actually putting the errors in the Result object, but this could cause memory issues if there are lots of errors.

    If this is something you guys would consider, I'll write some tests and documentation for it

    PS. Not tested at all - I wanted to get a reaction first.

    opened by AydinHassan 23
  • Adding capability to process items with relationed entities in the DoctrineWriter

    Adding capability to process items with relationed entities in the DoctrineWriter

    Hi, I needed to process a csv containing items that are related with other entities in a OneToMany association, so I added the option to add multiple indexes to an entity and process them before persist.

    enhancement 
    opened by javierseixas 18
  • Interfaces

    Interfaces

    • Removes interface suffixes
    • Moves interfaces to the package namespace root (eg. Ddeboer\DataImport\Reader is IMO better than Ddeboer\DataImport\Reader\Reader)

    Be advised: This relies on #206. That should be merged first.

    opened by sagikazarmark 17
  • add setter for parameters

    add setter for parameters

    When using DI to build the PdoReader, it is not always to know what the parameters will be at construction time. For instance, if one parameter is a CLI parameter. The setter makes adding parameter after construction possible.

    opened by greg0ire 16
  • [Feature Request]Manage rejects

    [Feature Request]Manage rejects

    I think there should be a way to do something each time one of the filters rejects an item (e.g. put it in a rejects file that would be in the same format as the input file (+ one column if csv for an optional rejection reason).

    Here is how it could look : https://github.com/greg0ire/data-import/compare/reject

    opened by greg0ire 16
  • Small improvements

    Small improvements

    This PR actually started to be a small fix in a few docblocks and ended up...... well, you can see.

    These are just some basic improvents including:

    • removing comments like "Constructor"
    • removing method comments like: Get/Set "the same name as the variable itself"
    • removing trailing spaces
    • adding empty line to end of file
    • adding inheritdoc tag where possible (Based on examination of the code {@inheritdoc} is used more generally than {@inheritDoc} so I updated everything to the first version.)
    • fixing some PSR coding standard things
    • using sprintf where possible
    • using [] instead of array (PHP 5.4 array notation)

    Long list, but I have some further things on my list, which needs your feedback (@ddeboer, @Baachi)

    Private or protected properties

    There is a big inconsistency in the package related to variable visibility. Some classes use private, some use protected (while there is no indication why that specific visibility should be used). Generally I use protected (to allow extension) while I usually use decoration for example instead of inheritance. This is actually a really trivial one, but it seemed to worth mentioning it.

    Validator or ValidatorInterface

    The ValidatorFilter accepts a ValidatorInterface whereas the ValidatorStep only accepts a Validator instance. Any particular reason for that?

    Introduce ReaderFactory interface

    It seems natural to have an interface for that type of class as well

    Add .editorconfig

    I am a big fan of editorconfig which is supported by all major IDEs. How about adding one?

    Use integer type hint instead of int

    This is something that I argued about a lot with a few "PSR, PSR, PSR" (aka. do everything exactly the same way, any different logic is unacceptable) guy. I think we are writting software, thus the documentation should reflect that. int is a PHP keyword, the type name is integer. It is really unfortunate that in PHP integer is just s shortcut, the type is represented by the word int. Still, I suggest changing all occurences of int to integer (IIRC some IDEs use integer by default).

    Here come some BC breaking suggestions. Since the repo is going to be separated into a new organization (and keep in mind the steps PR is already a BC breaking one) I don't this is a problem.

    Move AbstractWriter logic into a trait

    Inheritance does not make sense here IMO.

    Remove fluent interface where possible

    Not sure if you are going to like this one. I think that fluent interfaces make sense in a builder like environment. See this post by @Ocramius.

    Remove Interface suffix

    This is another common pattern in the PHP community which I really see no point and dislike it. I know it is an internal standard for PSR guys, but... See this post by @mathiasverraes. It contains everything I could say against it.

    Use array typehint in StepInterface

    I am not sure about this one, but shouldn't the process method accept only arrays?

    opened by sagikazarmark 15
  • One to many reader

    One to many reader

    We required the ability to add two readers into the same workflow. We have Order and OrderLineData in two CSVs. This reader allows you to this by matching the join fields of two readers.

    The test class shows perfectly what we're trying to achieve here.

    opened by adam-paterson 14
  • DBALReader runs out of memory

    DBALReader runs out of memory

    We are reading large dataset (100K rows) and using DBALReader passing a connection object and the raw SQL to be used for select.

    When we run this workflow.process command though the system runs out of memory. Any suggestions of how we can handle this situation? is there a setting we can use to do the processing in batch mode?

    opened by ravindar 14
Releases(0.20.0)
Owner
David de Boer
Open sourcerer
David de Boer
Import/Export configuration data in Magento 2 via CLI.

ConfigImportExport This module provides new CLI commands for Magento 2 to import/export data in/from core_config_data. This module is inspired by the

semaio 135 Dec 9, 2022
Import/Export configuration data in Magento 2 via CLI.

ConfigImportExport This module provides new CLI commands for Magento 2 to import/export data in/from core_config_data. This module is inspired by the

semaio 117 Mar 23, 2022
Q2A plugin that allows users to import and export Q2A configuration

Configuration Manager [by Gabriel Zanetti] Description Configuration Manager is a Question2Answer plugin that allows users to import and export Q2A co

Gabriel Zanetti 1 Nov 22, 2021
Magento-bulk - Bulk Import/Export helper scripts and CLI utilities for Magento Commerce

Magento Bulk Bulk operations for Magento. Configuration Copy config.php.sample to config.php and edit it. Product Attribute Management List All Attrib

Bippo Indonesia 23 Dec 20, 2022
Improve default Magento 2 Import / Export features - cron jobs, CSV , XML , JSON , Excel

Improve default Magento 2 Import / Export features - cron jobs, CSV , XML , JSON , Excel , mapping of any format, Google Sheet, data and price modification, improved speed and a lot more!

Firebear Studio 173 Dec 17, 2022
Magento 2 - Improved Import / Export extension

Improve default Magento 2 Import / Export features - cron jobs, CSV , XML , JSON , Excel , mapping of any format, Google Sheet, data and price modification, improved speed and a lot more!

Firebear Studio 173 Dec 17, 2022
Gettext is a PHP (^7.2) library to import/export/edit gettext from PO, MO, PHP, JS files, etc.

Gettext Note: this is the documentation of the new 5.x version. Go to 4.x branch if you're looking for the old 4.x version Created by Oscar Otero http

Gettext 651 Dec 29, 2022
Multi-language field export/import tool for ProcessWire

Language field export/import for ProcessWire Typically the way you translate page field values in ProcessWire is to edit a page, view the text in one

Ryan Cramer 3 Aug 19, 2022
:globe_with_meridians: List of all countries with names and ISO 3166-1 codes in all languages and data formats.

symfony upgrade fixer • twig gettext extractor • wisdom • centipede • permissions handler • extraload • gravatar • locurro • country list • transliter

Saša Stamenković 5k Dec 22, 2022
NamelessMC is a free, easy to use & powerful website software for your Minecraft server, which includes a large range of features.

NamelessMC is a free, easy to use & powerful website software for your Minecraft server, which includes a large range of features

NamelessMC 519 Dec 31, 2022
PHP Project - Export your profile to vCard file

KamVCard PHP v1.00 Easy to have your own vCard file, fill in the form and hit the "Export" button UPDATE : Now you can add your picture too ! Preview

Med Reda Kamal 3 Nov 6, 2022
World countries - available in multiple languages, in CSV, JSON, PHP, SQL and XML formats

Constantly updated lists of world countries and their associated alpha-2, alpha-3 and numeric country codes as defined by the ISO 3166 standard, available in CSV, JSON , PHP, SQL and XML formats, in multiple languages and with national flags included; also available are the ISO 3166-2 codes of provinces/ states associated with the countries

Stefan Gabos 1k Dec 29, 2022
A comprehensive library for generating differences between two strings in multiple formats (unified, side by side HTML etc). Based on the difflib implementation in Python

PHP Diff Class Introduction A comprehensive library for generating differences between two hashable objects (strings or arrays). Generated differences

Chris Boulton 708 Dec 25, 2022
Neo Integrator adalah alat bantu untuk import data ke Neo Feeder dengan cara copy paste saja. bersifat free dan opensource. Semoga Bermanfaat.

Neo-Integrator Penerus dari SimpleFeeder, yang berubah jadi Excel2Feeder dan bertransformasi menjadi Neo-integrator Last Update : 16-05-2022 Capture :

null 9 Nov 26, 2022
LaraNx Seo enables your Laravel app to store SEO and social media meta tag data in database instead of your code

LaraNx Seo enables your Laravel app to store SEO and social media meta tag data in database instead of your code. Moving marketing data out of your code base and into your database where it is easily modified.

srg 13 Dec 29, 2022
Provide CSV, JSON, XML and YAML files as an Import Source for the Icinga Director and optionally ship hand-crafted additional Icinga2 config files

Icinga Web 2 Fileshipper module The main purpose of this module is to extend Icinga Director using some of it's exported hooks. Based on them it offer

Icinga 25 Sep 18, 2022
Import your Foursquare/Swarm checkins to your Micropub-enabled website

Swarm Checkins Import Import your Foursquare/Swarm checkins to your Micropub-enabled website Installation You'll need PHP and Composer to install this

Aaron Parecki 20 Dec 24, 2022
This is a class that allows you to import posts from Wix into WordPress.

Wix Posts to WordPress This is a class that allows you to import posts from Wix into WordPress. Requirements Please make sure you have the following i

Basework 16 Sep 8, 2022