A toolkit for using self-hosted Natural Language Processing with Elasticsearch and WordPress

Overview

Natural Language Processing Tools for WordPress

A toolkit for using self-hosted Natural Language Processing in WordPress

This plugin is a proof of concept and is not ready for production.

What does it do?

This plugin creates a custom ingest pipeline in your Elasticsearch cluster that processes the post_content of your posts using open source machine learning models created by the contributors to Apache OpenNLP. It also updates your index mapping to account for the newly extracted entities.
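
In Elasticsearch, a processor such as OpenNLP runs inside an ingest pipeline. As a rough illustration of what such a pipeline body could look like, the sketch below builds one in PHP and prints it as JSON. The `opennlp` processor and its `field` option come from the Elasticsearch OpenNLP Ingest Processor; the helper name `example_opennlp_pipeline_body` and the description text are hypothetical, and the body this plugin actually sends may differ.

```php
<?php
/**
 * Builds the body of an ingest pipeline that runs the OpenNLP processor
 * on a single field. Hypothetical helper, not part of this plugin.
 *
 * @param string $field Document field to analyze.
 *
 * @return array Pipeline body, ready to be JSON-encoded.
 */
function example_opennlp_pipeline_body( string $field ): array {
    return array(
        'description' => 'Extract named entities with OpenNLP (example)',
        'processors'  => array(
            array(
                'opennlp' => array(
                    'field' => $field,
                ),
            ),
        ),
    );
}

// Print the JSON that would be sent to Elasticsearch.
echo json_encode( example_opennlp_pipeline_body( 'post_content' ), JSON_PRETTY_PRINT ), PHP_EOL;
```

A body like this would be sent to the `PUT _ingest/pipeline/<name>` endpoint, with a pipeline name of your choosing.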

With the standard installation, it looks for names of people and places, as well as dates, in any processed text and saves them to the Elasticsearch document representing your post. If the optional filter in step 4 is used, the extracted entities are also saved back to the WordPress database as taxonomy terms and/or post meta.
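
To make the result more concrete, here is a minimal sketch of how an indexed document might look once entities have been extracted, and how they could be read in PHP. The `entities` key with `persons`, `locations` and `dates` follows the description in this README; the sample values and the helper `example_get_entities` are made up for illustration.

```php
<?php
// Hypothetical Elasticsearch document after ingestion; all values are made up.
$document = array(
    'post_title'   => 'Example post',
    'post_content' => 'Kobe Bryant visited Munich on Tuesday.',
    'entities'     => array(
        'persons'   => array( 'Kobe Bryant' ),
        'locations' => array( 'Munich' ),
        'dates'     => array( 'Tuesday' ),
    ),
);

/**
 * Returns the values extracted for one entity type.
 * Falls back to an empty array when the type is missing.
 */
function example_get_entities( array $document, string $entity ): array {
    return $document['entities'][ $entity ] ?? array();
}

// List every entity type with the values found for it.
foreach ( array( 'persons', 'locations', 'dates' ) as $entity ) {
    printf( "%s: %s\n", $entity, implode( ', ', example_get_entities( $document, $entity ) ) );
}
```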

What does it not do?

  • It does not train any Machine Learning models using your data.
  • It does not process your data in the webserver where WordPress is running.

This plugin is just a connector; Elasticsearch does all the heavy lifting.

Requirements:

Installation

  1. Follow the installation steps for the Elasticsearch OpenNLP Ingest Processor: install a processor version matching your Elasticsearch version, and don't forget to download the built-in models. Do this on the server running Elasticsearch, not on your webserver, e.g.:
$ bin/elasticsearch-plugin install https://github.com/spinscale/elasticsearch-ingest-opennlp/releases/download/7.13.3.1/ingest-opennlp-7.13.3.1.zip
$ bin/ingest-opennlp/download-models

Configure elasticsearch.yml to load the models:

ingest.opennlp.model.file.persons: en-ner-persons.bin
ingest.opennlp.model.file.dates: en-ner-dates.bin
ingest.opennlp.model.file.locations: en-ner-locations.bin
  2. Make sure ElasticPress is active and a post index has been created.
  3. Install and activate this plugin.
  4. Add the following code to your functions.php to map the entities extracted to any existing taxonomies (optional):
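// The callback is a plain function in functions.php, so it is registered by name.
add_filter( 'wnlptools_entity_copy_to', 'wnlptools_sync_to', 10, 2 );
/**
 * Example usage: maps any locations extracted using NLP to the Category taxonomy,
 * and any persons and dates to post meta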
 *
 * Extracted entities are saved in the `entities` key of the stored document in Elasticsearch
 * so `entities.locations` contains all locations found in the document. However, this content
 * only exists in Elasticsearch.
 *
 * With this method we are going to copy these locations to an existing taxonomy so they can be
 * saved back to WordPress as categories.
 *
 * @param string $to     current mapping, defaults to an empty string
 * @param string $entity the entity mapped to $to
 *
 * @return string
 */
function wnlptools_sync_to( string $to, string $entity ) {
    if ( 'locations' === $entity ) {
        return 'terms.category';
    }

    if ( 'persons' === $entity ) {
        return 'meta.persons';
    }

    if ( 'dates' === $entity ) {
        return 'meta.dates';
    }

    return $to;
}
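
The same mapping can also be written as a lookup table, which may be easier to extend as more entity types are added. This is only a sketch: the function name `my_theme_entity_target` is hypothetical, the hook name is taken from the example above, and the `function_exists` guard is there so the snippet can also be run outside WordPress. Register either this callback or the one above, not both.

```php
<?php
/**
 * Table-driven variant of the mapping callback (hypothetical name).
 *
 * @param string $to     Current mapping, may be an empty string.
 * @param string $entity Entity type being mapped.
 *
 * @return string Target such as 'terms.category' or 'meta.persons'.
 */
function my_theme_entity_target( string $to, string $entity ): string {
    $targets = array(
        'locations' => 'terms.category',
        'persons'   => 'meta.persons',
        'dates'     => 'meta.dates',
    );

    // Keep the incoming value for entity types that are not mapped here.
    return $targets[ $entity ] ?? $to;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wnlptools_entity_copy_to', 'my_theme_entity_target', 10, 2 );
}
```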

Current Limitations

  • The plugin only works with bulk indexing enabled.
  • The ingest pipeline is only applied to the cluster when the plugin is activated. If it gets removed, you need to deactivate and reactivate the plugin.
Owner
Ricardo Moraleida