A simple library for dealing with docx word processed documents

Overview

WordCat

Limited manipulation of docx word processed documents

A simple PHP library for manipulating docx word processed documents. In particular, the library is designed to allow content from one document to be inserted into another; it also provides some features for searching and replacing content, and for inserting new text and images.

Dependencies

This library requires DOMDocument and SimpleXML.
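
If you are unsure whether both are available in your PHP build, a quick check along the following lines may help. This is only a minimal sketch and not part of WordCat; `missingDependencies` is a hypothetical helper name chosen for illustration.

```php
<?php
// Hypothetical helper (not part of WordCat): reports which of the required
// classes/functions are unavailable in the current PHP build.
function missingDependencies(): array
{
    $missing = [];
    // DOMDocument is provided by PHP's dom extension.
    if (!class_exists("DOMDocument")) {
        $missing[] = "DOMDocument (dom extension)";
    }
    // simplexml_load_string() is provided by the simplexml extension.
    if (!function_exists("simplexml_load_string")) {
        $missing[] = "SimpleXML (simplexml extension)";
    }
    return $missing;
}

$missing = missingDependencies();
echo count($missing) === 0
    ? "All required extensions are available.\n"
    : "Missing: " . implode(", ", $missing) . "\n";
```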

Features

This library was designed to solve some fairly specific issues I was having with (mentioning no names) another popular PHP library when creating new documents from a given template. As such, there are a lot of features that simply aren't implemented, as I have had no need for them. I have added a "limitations" section below to list particularly glaring features which you may want but which are missing, but if a feature you want isn't in the bullet list below, take it as read that the library won't do it for you!

  • Open a docx document
  • Extract and read XML files from the document
  • Write XML files to the document
  • Manipulate XML within the document's files
  • Save docx, either overwriting the original file or as a new document
  • Search for text, either as a binary/plain text search or regular expression
  • Replace text, either as a binary/plain text search or regular expression
  • Insert paragraphs containing plain text
  • Insert image files with a specified size (aspect ratio is preserved while constraining the image within the specified size)
  • Insert the contents of one docx into another
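
To illustrate what "aspect ratio is preserved while constraining the image within the specified size" means in practice, here is a small sketch of the arithmetic. It is not WordCat's own code: `fitWithin` is a hypothetical helper, and whether the library also scales small images up is not something this sketch can tell you.

```php
<?php
// Hypothetical helper (not WordCat's API): scale an image's dimensions so it
// fits inside a bounding box while keeping its aspect ratio.
function fitWithin(int $width, int $height, int $maxWidth, int $maxHeight): array
{
    if ($width <= 0 || $height <= 0) {
        throw new InvalidArgumentException("Image dimensions must be positive");
    }
    // The limiting axis is the one with the smaller scale factor.
    $scale = min($maxWidth / $width, $maxHeight / $height);
    return [
        "width"  => (int) round($width * $scale),
        "height" => (int) round($height * $scale),
    ];
}

// A 1600x900 image constrained to a 400x400 box:
$size = fitWithin(1600, 900, 400, 400);
echo "{$size['width']}x{$size['height']}\n"; // prints "400x225"
```

Using the smaller of the two scale factors guarantees that neither dimension exceeds the box. Note that this particular sketch will also enlarge images that are smaller than the box.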

Limitations

As this library was designed to solve some fairly specific issues, it will not do a lot of things you may require; I may or may not come to add features.

  • You cannot create a new document from scratch; you need to load an existing docx file to work on
  • The library is naive; most operations require some knowledge of the internal XML structure of docx files
  • The abstractions provided are designed to perform specific tasks (see tests/example.php to get a feel for this); to be more suitable for general use they will likely need breaking changes, so make sure you lock to a specific version if you're using this library now!
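
For a rough idea of the internal XML structure mentioned above: the main body of a docx lives in word/document.xml, where paragraphs (`w:p`) contain runs (`w:r`), which in turn contain text (`w:t`). The sketch below uses only PHP's DOM extension, not WordCat, and parses a hand-written, heavily simplified fragment rather than a real file.

```php
<?php
// A hand-written, heavily simplified fragment of word/document.xml.
// Real documents carry many more namespaces, properties and attributes.
$xml = <<<XML
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r><w:t>Hello </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Second paragraph</w:t></w:r>
    </w:p>
  </w:body>
</w:document>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace(
    "w",
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
);

// Collect the text of each paragraph (w:p) by joining its text nodes (w:t).
$paragraphs = [];
foreach ($xpath->query("//w:p") as $p) {
    $text = "";
    foreach ($xpath->query(".//w:t", $p) as $t) {
        $text .= $t->textContent;
    }
    $paragraphs[] = $text;
}

print_r($paragraphs);
```

Notice that "Hello world" is split across two runs because part of it is bold. Word processors split text like this freely, which is one reason some knowledge of the structure helps when a search does not match text that looks contiguous on the page.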

Installation

You can install this library using composer:

composer require stejaysulli/php-word-cat

It should be possible to use the library without composer, but this has not been tested and you will need to provide your own method to autoload the files in the src directory.
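
If you do go without composer, an autoloader along these lines might be a starting point. It is an untested sketch: it assumes the `WordCat\` namespace maps directly onto the src directory in PSR-4 style (check the library's composer.json for the real mapping), and `wordcatClassToPath` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: map a class name in the WordCat\ namespace to a file
// under a base directory, following the PSR-4 convention. The namespace
// prefix and the src/ layout are assumptions, not confirmed by the library.
function wordcatClassToPath(string $class, string $baseDir): ?string
{
    $prefix = "WordCat\\";
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null; // Not a WordCat class; leave it to other autoloaders.
    }
    $relative = substr($class, strlen($prefix));
    return rtrim($baseDir, "/") . "/" . str_replace("\\", "/", $relative) . ".php";
}

spl_autoload_register(function (string $class): void {
    // Adjust this path to wherever the library's src directory lives.
    $path = wordcatClassToPath($class, __DIR__ . "/php-word-cat/src");
    if ($path !== null && is_file($path)) {
        require $path;
    }
});

echo wordcatClassToPath("WordCat\\WordCat", "/opt/php-word-cat/src"), "\n";
// prints "/opt/php-word-cat/src/WordCat.php"
```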

Usage

A good example of all the basic features is available in tests/example.php; it is advisable to check that out for proper usage details. Just so you can get a feel for the kind of code you'll be writing with WordCat, though, here's a brief example:

<?php
use WordCat\WordCat;

$wordcat = WordCat::instance("example1.docx");

// Find XML elements containing the text "apple", "orange" or "pear"
$wordcat->findText("apple")
        ->andFindText("orange")
        ->andFindText("pear")
        ->forSearch(function($element) {
            // Print the text content of each element:
            echo "{$element->textContent}\n";
        });

// Get an array of all the matching XML elements (DOMNode objects):
$results = $wordcat->getSearch();

// Clear the search results:
$wordcat->clearSearch();

// Do the same, but use a regex this time, and make it case insensitive:
$wordcat->findRegex("/(apple|orange|pear)/i")
        ->forSearch(function($element) {
            echo "{$element->textContent}\n";
        });

// Replace all instances of the word "wordcat" with "WordCat":
$wordcat->clearSearch()->replaceText("wordcat", "WordCat");

// Use a regex to replace anything that's formatted like a Y-m-d date with
// today's date:
$wordcat->clearSearch()->replaceRegex("/[0-9]{4}-[0-9]{2}-[0-9]{2}/", date("Y-m-d"));

// Insert some paragraphs after each paragraph containing some text:
$wordcat->findText("inserted paragraph next")
    ->forSearch(
        function($insertionPoint) use($wordcat) {
            // Insert a simple paragraph containing one text run:
            $p1 = $wordcat->insertParagraph(
                "This is the first inserted paragraph",
                $insertionPoint
            );
        }
    );


// Get an element to insert the document after:
$search = $wordcat->findText("INSERT DOCUMENT HERE")->getSearch();

// Open a document to insert
$wordcatSource = WordCat::instance("example2.docx");

// Check we have at least one insertion point:
if(count($search) > 0) {
    // Use the first insertion point:
    $wordcat->insertDocument($wordcatSource, $search[0]);
    // Remove the insertion point element to get rid of the "INSERT DOCUMENT
    // HERE" text:
    $wordcat->removeNode($search[0]);
}

// Save a new docx file
$wordcat->saveAs("test-output.docx");

// Close the WordCat instance
$wordcat->close();
$wordcatSource->close();

Releases: v0.1.11
Owner: Stephen J Sullivan