A feed parser to normalise typical feed data into Atom-like constructs.

Overview

PHP5 Feed parser and normaliser

A feed parser to normalise typical feed data into Atom-like constructs. The parser supports, or is planned to support:

  • RSS 2.0 and its 0.9x predecessors
  • RSS 1.0 (RDF Site Summary)
  • Atom (not yet)

It also supports namespaces, through extensions. So far, it supports the following namespaces:

  • iTunes
  • Yahoo!'s Media RSS
  • Feedburner
  • Dublin Core metadata (partial)
  • Syndication (from the OCS format)

The parser is just a layer on top of PHP's built-in SAX XML parser that keeps track of the element stack in a generic way. It offloads all of the real data work to PHP classes, one class per namespace. That means the parser itself can be extended to support any feed-based namespace.
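
The layering described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the NamespaceHandler interface, the two handler classes, the parseFeed() function and the flat array layout are all hypothetical names chosen for the example. Only the xml_* functions come from PHP's XML Parser extension.

```php
<?php
// Hypothetical contract: one handler class per namespace URI.
interface NamespaceHandler
{
    /** Receives one closed element and records it in the feed structure. */
    public function element(string $name, string $text, array $attrs, array &$feed): void;
}

// Illustrative handler for Dublin Core elements such as dc:creator.
final class DublinCoreHandler implements NamespaceHandler
{
    public function element(string $name, string $text, array $attrs, array &$feed): void
    {
        $feed['dc'][$name][] = $text;
    }
}

// Illustrative handler for un-namespaced RSS 2.0 elements.
final class Rss20Handler implements NamespaceHandler
{
    public function element(string $name, string $text, array $attrs, array &$feed): void
    {
        if ($text !== '') { // skip pure containers such as <channel> and <item>
            $feed['rss20'][$name][] = $text;
        }
    }
}

/**
 * Generic SAX layer: tracks the element stack and dispatches by namespace.
 * $handlers is keyed by namespace URI; '' means "no namespace".
 */
function parseFeed(string $xml, array $handlers): array
{
    $feed = [];
    $stack = [];

    // Namespace-aware parser: element names arrive as "URI|localname".
    $parser = xml_parser_create_ns('UTF-8', '|');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$stack): void {
            $stack[] = ['name' => $name, 'attrs' => $attrs, 'text' => ''];
        },
        function ($p, string $name) use (&$stack, &$feed, $handlers): void {
            $el = array_pop($stack);
            // Un-namespaced names have no separator, so pad on the left.
            [$ns, $local] = array_pad(explode('|', $el['name'], 2), -2, '');
            if (isset($handlers[$ns])) {
                $handlers[$ns]->element($local, trim($el['text']), $el['attrs'], $feed);
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$stack): void {
            if ($stack !== []) {
                // Text belongs to the innermost open element.
                $stack[count($stack) - 1]['text'] .= $data;
            }
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $feed;
}
```

In this sketch, supporting another namespace only means registering another handler object under its namespace URI. The generic stack-tracking layer does not change.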

The end result is a feed data structure where the three basic information elements in a feed -- title, link and summary -- are normalised to Atom-like equivalents, but all of the data from the XML feed is available within the data structure.
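
As a rough illustration of that idea, the sketch below maps an already-parsed RSS 2.0 item onto Atom-like title, link and summary keys while keeping the untouched source data alongside. The normaliseEntry() function, the ATOM_LIKE_MAP constant and the 'raw' key are assumptions made for this example, not the parser's real structure.

```php
<?php
// Hypothetical source-field names for each normalised key, tried in order.
const ATOM_LIKE_MAP = [
    'title'   => ['title'],
    'link'    => ['link'],
    'summary' => ['description', 'summary'],
];

/**
 * Returns an Atom-like view of one entry plus the original data.
 */
function normaliseEntry(array $raw): array
{
    $entry = ['title' => null, 'link' => null, 'summary' => null];
    foreach (ATOM_LIKE_MAP as $target => $candidates) {
        foreach ($candidates as $field) {
            if (isset($raw[$field]) && $raw[$field] !== '') {
                $entry[$target] = $raw[$field];
                break; // first non-empty candidate wins
            }
        }
    }
    $entry['raw'] = $raw; // nothing from the original feed is discarded
    return $entry;
}
```

An application that only needs the three basic elements can read them directly, and one that needs, say, the original guid can still reach it through the raw data.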

When complete, the feed parser will also allow feed and entry normalisation, so the actual contents of the feed structure can be tailored or simplified using Normalisation PHP classes.

In addition, domain-specific processing can be added so that sites that do things differently or oddly can be brought into line. The end result is a feed data structure that is normalised to something saner and easier for an application to consume.

List of issues

  • atom:links structure needs to be normalised to remove duplicates (such as duplicates of enclosures thanks to multiple namespaces)
  • where rss20:author is an email with a bracketed name, create a regex that will split the two up and populate both the atom:author name and email.
  • how to deal with attributes that are namespaced (like flickr:profile on rss20:author).
  • how to deal with Dublin Core (or other external elements) when they are children of something other than feed and entry, without Dublin Core having to know about other possible levels. Can we create a 'current parent' object that this info can be attached to? Something like $this->currentParent->{$elData->nsName} = $elData->text
  • how to capture RSS2.0's isPermaLink attribute on the rss20:guid field.
  • media:content - when video links don't supply a valid mime-type, but return an attribute with a value of 'video' or 'audio', how to map that adequately into a valid atom:link type.
  • A flag/option that normalises times into a user-specified timezone. At the moment, any conversions are made to GMT, which is a decent start, I guess.
  • How to handle invalid RFC 822 date formats - do I write a custom method that gets called when the parsed date comes back as 1 Jan 1970?
  • dc:creator on the Flickr RDF feed returns a bracketed website URL and an unbracketed name. That can be translated into name and url of atom:author.
  • When the rss20:author contains two people, whether to convert that into two atom:authors, and whether to remove the 'By' prefix on some rss20:author fields.
  • Need to type-check the title/summary type fields and process the content if needed.
  • deal with xml:base and xml:lang attributes in the Atom feed - at any point.
  • How to deal with proper XHTML content in an Atom Feed?
  • atom:source is being ignored for the moment.
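
Three of the items above (splitting rss20:author, handling invalid RFC 822 dates with a target timezone, and removing duplicate links) could be approached along these lines. This is only a sketch under assumptions: the function names splitAuthor(), parseFeedDate() and dedupeLinks(), the 'rel'/'href' link keys, and the choice to return null for an unparseable date are hypothetical and not part of the parser.

```php
<?php
/**
 * Splits "jane@example.com (Jane Doe)" into name and email.
 * Falls back to email-only or name-only when there is no bracketed name.
 */
function splitAuthor(string $value): array
{
    if (preg_match('/^\s*(\S+@\S+)\s*\((.*)\)\s*$/', $value, $m) === 1) {
        return ['name' => trim($m[2]), 'email' => $m[1]];
    }
    $value = trim($value);
    if (filter_var($value, FILTER_VALIDATE_EMAIL) !== false) {
        return ['name' => null, 'email' => $value];
    }
    return ['name' => $value, 'email' => null];
}

/**
 * Parses an RFC 822 style date and converts it to the requested timezone.
 * Returns null on failure, so callers never silently get 1 Jan 1970.
 */
function parseFeedDate(string $value, string $timezone = 'UTC'): ?DateTimeImmutable
{
    $date = DateTimeImmutable::createFromFormat(DATE_RSS, trim($value));
    if ($date === false) {
        return null; // a custom fallback parser could be tried here
    }
    return $date->setTimezone(new DateTimeZone($timezone));
}

/**
 * Removes links that repeat the same rel/href pair, keeping the first one.
 */
function dedupeLinks(array $links): array
{
    $seen = [];
    $unique = [];
    foreach ($links as $link) {
        $key = ($link['rel'] ?? 'alternate') . '|' . ($link['href'] ?? '');
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $unique[] = $link;
        }
    }
    return $unique;
}
```

Returning null from parseFeedDate() leaves the decision about malformed dates to the caller, which is one possible answer to the question raised in the list. The 'By' prefix and multi-author cases are not covered here.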

Owner
Chr1s HuntΞr