Last update: Jul 27, 2022

EmailReplyParser

EmailReplyParser is a PHP library for parsing plain text email content, based on GitHub's email_reply_parser library written in Ruby.

Installation

The recommended way to install EmailReplyParser is through Composer:

composer require willdurand/email-reply-parser

Usage

Instantiate an EmailParser object and parse your email:



use EmailReplyParser\Parser\EmailParser;

$email = (new EmailParser())->parse($emailContent);

You get an Email object that contains a set of Fragment objects. The Email class exposes two methods:

  • getFragments(): returns all fragments;
  • getVisibleText(): returns a string which represents the content considered as "visible".

The Fragment represents a part of the full email content, and has the following API:



$fragment = current($email->getFragments());

$fragment->getContent();

$fragment->isSignature();

$fragment->isQuoted();

$fragment->isHidden();

$fragment->isEmpty();

Alternatively, you can rely on the EmailReplyParser to either parse an email or get its visible content in a single line of code:

$email = \EmailReplyParser\EmailReplyParser::read($emailContent);

$visibleText = \EmailReplyParser\EmailReplyParser::parseReply($emailContent);

Known Issues

Quoted Headers

Quoted headers aren't picked up if there's an extra line break:

On <date>, <author> wrote:

> blah

Also, they're not picked up if the email client breaks it up into multiple lines. GMail breaks up any lines over 80 characters for you.

On <date>, <author>
wrote:
> blah

The above On ....wrote: can be cleaned up with the following regex:

$fragment_without_date_author = preg_replace(
  '/\nOn(.*?)wrote:(.*?)$/si',
  "",
  $fragment->getContent()
);
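To see what this regex does, here is a minimal sketch that applies it to a made-up reply. The sample text and variable names are assumptions for illustration; `$content` stands in for `$fragment->getContent()`. Because of the `s` modifier and the trailing `(.*?)$`, everything from the `On` line to the end of the string is removed, not just the header line.

```php
<?php
// Made-up fragment content, standing in for $fragment->getContent().
$content = "Thanks, that works for me.\n"
    . "On Mon, Mar 14, 2011 at 6:16 PM, Bob\n"
    . "wrote:\n"
    . "> blah";

// Same regex as above: "s" lets "." match newlines, "i" ignores case.
$withoutHeader = preg_replace('/\nOn(.*?)wrote:(.*?)$/si', '', $content);

echo $withoutHeader, PHP_EOL; // Thanks, that works for me.
```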

Note, though, that we're searching for "on" and "wrote", so it won't work with other languages.

Possible solution: Remove "[email protected]" lines...

Weird Signatures

Lines starting with - or _ sometimes mark the beginning of signatures:

Hello

--
Rick

Not everyone follows this convention:

Hello

Mr Rick Olson
Galactic President Superstar Mc Awesomeville
GitHub

**********************DISCLAIMER***********************************
* Note: blah blah blah                                            *
**********************DISCLAIMER***********************************
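The delimiter convention can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `splitAtSignatureDelimiter` is hypothetical, and it assumes a delimiter is a line starting with `--` or `__` (two characters, to avoid catching list items). A signature without a delimiter, like the second example above, stays in the body.

```php
<?php
// Hypothetical helper: splits a message at the first line that starts
// with "--" or "__". Returns [body, signature].
function splitAtSignatureDelimiter(string $text): array
{
    $lines = preg_split('/\R/', $text);

    foreach ($lines as $index => $line) {
        if (preg_match('/^(--|__)/', $line) === 1) {
            return [
                rtrim(implode("\n", array_slice($lines, 0, $index))),
                implode("\n", array_slice($lines, $index)),
            ];
        }
    }

    // No delimiter found: the whole text is treated as body.
    return [$text, ''];
}

[$body, $signature] = splitAtSignatureDelimiter("Hello\n\n--\nRick");
// $body is "Hello", $signature is "--\nRick"

[$body2, $signature2] = splitAtSignatureDelimiter("Hello\n\nMr Rick Olson\nGitHub");
// $body2 is the unchanged input, $signature2 is ""
```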

Strange Quoting

Apparently, prefixing lines with > isn't universal either:

Hello

--
Rick

________________________________________
From: Bob [[email protected]]
Sent: Monday, March 14, 2011 6:16 PM
To: Rick
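One way to spot this kind of quoting is to look for a `From:` line immediately followed by a `Sent:` line. The sketch below is an assumption about how that could be done with plain PCRE; `findForwardedHeaderOffset` is a hypothetical name, the sample address is a placeholder, and none of this is part of the library's API.

```php
<?php
// Hypothetical helper: returns the byte offset of a "From:" line that is
// directly followed by a "Sent:" line, or null when there is none.
function findForwardedHeaderOffset(string $text): ?int
{
    // "m" makes ^ and $ match at line boundaries; \R matches any line break.
    $pattern = '/^From:.*\R^Sent:.*$/m';

    if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE) === 1) {
        return $matches[0][1];
    }

    return null;
}

// Sample modelled on the message above; the address is a placeholder.
$text = "Hello\n\n--\nRick\n\n"
    . "________________________________________\n"
    . "From: Bob [bob@example.com]\n"
    . "Sent: Monday, March 14, 2011 6:16 PM\n"
    . "To: Rick";

$offset = findForwardedHeaderOffset($text);

// Keep only what precedes the quoted header block, if one was found.
$reply = $offset === null ? $text : rtrim(substr($text, 0, $offset));
```

The remaining text still contains the underscore rule and the signature, so a real pipeline would need further cleanup after this step.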

Unit Tests

Set up the test suite using Composer:

$ composer install

Run it using PHPUnit:

$ ./vendor/bin/simple-phpunit

Contributing

See CONTRIBUTING file.

Credits

  • GitHub
  • William Durand

License

EmailReplyParser is released under the MIT License. See the bundled LICENSE file for details.

GitHub

https://github.com/willdurand/emailreplyparser
Comments
  • 1. Blockquote considered visible

    I receive quite a lot of emails with blockquoted text which is in fact the whole original email that's being replied to.

    Quickly looking at the source of this package, I don't think <blockquote> is being searched for. IMHO this should be a non-visible part.

    What do you think?

    Reviewed by philippejadin at 2021-05-04 07:32
  • 2. $fragment->getContent() removes newlines returning a modified fragment from the original text.

    Hey, I noticed that the parser "cleans" the text by removing newlines.

    This is problematic since it modifies the input.

    For example, if you want to print the output as HTML, nl2br will not work. This is a huge problem.

    Compare the fragment extracted:

    "2015-06-18 10:40 GMT+02:00 REDACTED : > Bounce! > > 2015-06-18 10:06 Cristobal Wetzig : >> > > >> > >> > Reply >> 2015-06-18 10:38 GMT+02:00 REDACTED : >> >> ok >> 2015-06-18 10:06 Centerpartiet :\n>>> Reply"
    

    From the original

    "Pong\r\n\r\n2015-06-18 10:40 GMT+02:00 REDACTED :\r\n> Bounce!\r\n>\r\n> 2015-06-18 10:06 Cristobal Wetzig :\r\n>> > >\r\n>> >\r\n>> > Reply\r\n>> 2015-06-18 10:38 GMT+02:00 REDACTED :\r\n>>\r\n>> ok\r\n>> 2015-06-18 10:06 REDACTED :\r\n>>> Reply"
    
    Reviewed by cristobal-wetzig at 2015-06-18 09:40
  • 3. Committing CRLF line separators to the Git repository

    Hi, I was attempting to commit the EmailReplyParser library in Git when I got the following error:

    You are about to commit CRLF line separators to the Git repository. It is recommended to set the core.autocrlf Git attribute to input and avoid line separator issues.

    I've done a bit of a Google and I can't find any conclusive advice.

    Based on the limited understanding of email that I have, formatting and line endings are crucial, so I'd rather not proceed without some guidance first.

    Reviewed by OctaneInteractive at 2015-12-06 14:27
  • 4. Failed in parsing email thread

    I just found a real-world scenario where this lib fails to preserve formatting. The problem occurs when the content is a whole email thread: all newlines are replaced with a single space. I added a failing test with an example in commit https://github.com/perajovic/EmailReplyParser/commit/79aebccd4dac497066b5a151dc19906b9b2025c6. Here is the diff from the test:

    1) EmailReplyParser\Tests\Parser\EmailParserTest::testEmailThreadPreservesNewLines
    Failed asserting that two strings are equal.
    --- Expected
    +++ Actual
    @@ @@
    -'On Nov 21, 2014, at 10:18, John Doe <[email protected]> wrote:
    -
    -> Ok. Thanks.
    ->
    -> On Nov 21, 2014, at 9:26, Jim Beam <[email protected]> wrote:
    ->
    ->>> On Nov 20, 2014, at 11:03 AM, John Doe <[email protected]> wrote:
    +'On Nov 21, 2014, at 10:18, John Doe <[email protected]> wrote:  > Ok. Thanks. > > On Nov 21, 2014, at 9:26, Jim Beam <[email protected]> wrote: > >>> On Nov 20, 2014, at 11:03 AM, John Doe <[email protected]> wrote:
     >>>
     >>> if you take a look at a short video from attachment, why full-typed filename does not stay in CMD+T pane?
     >>> When I type last character, it is not shown anymore.
     >>
     >> We think we’ve tracked down the cause of this issue, write back if you see the issue after the next update. (Which will be out shortly.)
     >>
     >> --
     >> Jim Beam – Acme Corp
     >>
     >'
    

    I'm aware that with this line we can normalize some poorly formatted emails, but here this is not the case.

    I hope I can fix this issue later today, but if someone has an idea how to do this right now, please write a comment.

    Reviewed by perajovic at 2014-11-24 11:23
  • 5. EmailReplyParser::parseReply issue

    Hello,

    I just hit a bug in EmailReplyParser::parseReply. In some specific cases, the reply is truncated in the middle.

    I reduced the case as much as I could; here it is:

    Bonjour ​Test​,
    
    ​Text paragraph 1​
    
    
    Le paragraphe qui pose problème.
    
    ​Text paragraph 2​
    
    
    - Le 2021-04-09 23:25, ​Test a écrit :
    

    The correct result should be :

    Bonjour ​Test​,
    
    ​Text paragraph 1​
    
    
    Le paragraphe qui pose problème.
    
    ​Text paragraph 2​
    

    The current result is :

    Bonjour ​Test​,
    
            ​Text paragraph 1​
    

    Context :

    • This is french. If I remove the first "Le" : problem solved. If I remove "écrit" : problem solved.
    • Le paragraphe qui pose problème means "The paragraph that causes the problem".
    • Le 2021-04-09 23:25, ​Test a écrit : means On 2021-04-09 23:25, ​Test wrote :

    So the issue is here :

    '/^\s*(Le(?:(?!^>*\s*Le\b|\bécrit:).){0,1000}écrit(\s|\xc2\xa0):)$/ms', // Le DATE, NAME <EMAIL> a écrit :
    

    If I change it to have the same format as the English one, the problem is half solved (the reply is no longer truncated).

    '/^\s*(Le(?:(?!^>*\s*Le\b|\bécrit:).){0,1000}écrit:)$/ms', // Le DATE, NAME <EMAIL> a écrit :
    

    But I think that (\s|\xc2\xa0) was here for some reason ? @Spone ?

    see also : https://github.com/willdurand/EmailReplyParser/commit/e888d795f277916ec62c2e812b9a10a9614682e8#diff-5e5f4da6cc02b271ae11752a0b5e45ecb842d913728f7b7d9756d88d9219a0f8 https://github.com/willdurand/EmailReplyParser/commit/b71d9983c07b0455cb5b9727b90642a32bc11ae0#diff-5e5f4da6cc02b271ae11752a0b5e45ecb842d913728f7b7d9756d88d9219a0f8

    Reviewed by Nuranto at 2021-04-10 15:03
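The over-wide match described in this report can be reproduced with plain `preg_match`, without the library. The sketch below uses the regex quoted in the issue and a hand-typed copy of the sample (any invisible characters in the original message are not reproduced, which is an assumption on my part). It only shows what the pattern captures; how the parser then uses that capture is not shown here.

```php
<?php
// Regex quoted in the issue above.
$pattern = '/^\s*(Le(?:(?!^>*\s*Le\b|\bécrit:).){0,1000}écrit(\s|\xc2\xa0):)$/ms';

// Hand-typed copy of the reduced sample.
$text = "Bonjour Test,\n\n"
    . "Text paragraph 1\n\n\n"
    . "Le paragraphe qui pose problème.\n\n"
    . "Text paragraph 2\n\n\n"
    . "- Le 2021-04-09 23:25, Test a écrit :";

$matched = preg_match($pattern, $text, $matches);

if ($matched === 1) {
    // Group 1 is the span the pattern takes for a "Le ... a écrit :" header.
    echo json_encode($matches[1], JSON_UNESCAPED_UNICODE), PHP_EOL;
}
```

The capture starts at "Le paragraphe" and runs across the following paragraph to "écrit :", because a paragraph that begins with "Le" satisfies the opening of the pattern and nothing in between stops it.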
  • 6. Update readme with latest Composer tips

    1. composer install --dev has been the default behavior for some time now, and the --dev flag is deprecated.
    2. Composer 1.0.0-alpha.9 comes with an important update to composer require. When you don't specify a version, Composer would automatically choose the latest stable version according to SemVer.
    Reviewed by hkdobrev at 2014-12-24 09:48
  • 7. Removing the need to write quote regex backwards for custom quote header

    With this pull request we no longer need to duplicate our custom quote regex by having to write it backward. Regular expressions are already annoying as they are. =x

    Let me know if you see any problem. =]

    Reviewed by creativej at 2013-05-16 00:45
  • 8. Fix regexp to handle datetime lines starting with a few characters before `On...`

    This fixes #77

    The rule is more permissive for prefixes, however more strict and secure as I disabled the s modifier which allowed multiline for some reason.

    Important Note : I only fixed FR. I prepared (commented) a failing test for EN, which failed. But if I do the same change for EN regexp, other tests are failing and I don't want to mess up things, so I'll let you have a look at this.

    Reviewed by Nuranto at 2021-04-14 13:33
  • 9. Additional signature and quote catches

    Catches signature and quotes for Windows 10 Mail.

    I've probably made this too generalized, but I needed to catch signature and quotes coming from Mail for Windows 10. They look like this...

    My reply here.
    
    Sent from Mail for Windows 10
    
    From: John Doe
    Sent: Tuesday, November 13, 2018 12:30 PM
    To: [email protected]
    Cc: [email protected]
    Subject: Subject Line Here
    
    The quoted message.
    

    Unfortunately, the existing regexes miss both the signature and the quote. The signature is missed because it lacks a "my" after "Sent from", so I just went ahead and removed the "my" requirement. The quote is missed because the "From:" line only includes a name and lacks an email address in brackets. I loosened it up to only bother checking for a line beginning with "From:".

    This is probably related to Issue #66. I haven't run the unit tests, and I'm pretty sure these fixes loosen things up too much, so I'm not exactly suggesting this be merged in as-is. Hoping someone better at regexes comes up with a more proper fix, honestly :)

    Reviewed by gazugafan at 2018-11-14 01:16
  • 10. Improve support

    Hello,

    I wrote an additional regex because one email was not parsed correctly. It is a Yahoo mail in French with a space before the quoted header and between the keyword (here De) and the colon ( : ).

    I added a test based on the received email that doesn't parse. I have of course anonymized the data in it.

    Thanks for your lib, it's very useful. I'll wait for your review if needed.

    Reviewed by jewome62 at 2016-12-15 15:05
  • 11. make SIG_REGEXP forward way

    more understandable if you need to debug something

    mentioned here: https://github.com/willdurand/EmailReplyParser/pull/42#issuecomment-184572081

    ps: tests pass!

    Reviewed by glensc at 2016-02-17 12:31
  • 12. Fragment isHidden and isSignature always true?

    I'm trying to send an email from my server and then get a reply. When I get the fragments of the reply email, isHidden and isSignature are always true.

    Does somebody have an idea what could be wrong? I assume something is wrong in my HTML code. How should I form my HTML content?

    Reviewed by mastermaeco1993 at 2021-10-03 13:19
  • 13. Add localization (and reduce regex duplication)

    Hello again,

    While working and searching in the Parser/EmailParser.php file, I thought about something:


    Instead of having things like

    private $quoteHeadersRegex = array(
        '/^.{0,5}(On(?:(?!\bOn\b|\bwrote(\s|\xc2\xa0)?:).){0,1000}wrote(\s|\xc2\xa0)?:)$/ms', // On DATE, NAME <EMAIL> wrote:
        '/^.{0,5}(Le\b(?:(?!\bLe\b|\bécrit(\s|\xc2\xa0)?:).){0,1000}écrit(\s|\xc2\xa0)?:)$/ms', // Le DATE, NAME <EMAIL> a écrit :
        '/^.{0,5}(El(?:(?!\bEl\b|\bescribió\s?:).){0,1000}escribió\s?:)$/ms', // El DATE, NAME <EMAIL> escribió:
        '/^.{0,5}(El(?:(?!\bEl\b|\bha escrit\s?:).){0,1000}ha escrit\s?:)$/ms', // El DATE, NAME <EMAIL> ha escrit:
        '/^.{0,5}(Il(?:(?!\bIl\b|\bscritto(\s|\xc2\xa0)?:).){0,1000}scritto(\s|\xc2\xa0)?:)$/ms', // Il DATE, NAME <EMAIL> ha scritto:
        [...]
        '/^\s*(From\s?:.+\s?(\[|<).+(\]|>))/mu', // "From: NAME <EMAIL>" OR "From : NAME <EMAIL>" OR "From : NAME<EMAIL>"(With support whitespace before start and before <)
        '/^\s*(发件人\s?:.+\s?(\[|<).+(\]|>))/mu', // "发件人: NAME <EMAIL>" OR "发件人 : NAME <EMAIL>" OR "发件人 : NAME<EMAIL>"(With support whitespace before start and before <)
        '/^\s*(De\s?:.+\s?(\[|<).+(\]|>))/mu', // "De: NAME <EMAIL>" OR "De : NAME <EMAIL>" OR "De : NAME<EMAIL>"  (With support whitespace before start and before <)
        '/^\s*(Van\s?:.+\s?(\[|<).+(\]|>))/mu', // "Van: NAME <EMAIL>" OR "Van : NAME <EMAIL>" OR "Van : NAME<EMAIL>"  (With support whitespace before start and before <)
        '/^\s*(Da\s?:.+\s?(\[|<).+(\]|>))/mu', // "Da: NAME <EMAIL>" OR "Da : NAME <EMAIL>" OR "Da : NAME<EMAIL>"  (With support whitespace before start and before <)
        [...]
    );
    

    couldn't we have only one variabilized line for each "type" of reply like that (of course it's only a draft):

    private $quoteHeadersRegex = array(
        '/^.{0,5}($on(?:(?!\b$on\b|\b$wrote(\s|\xc2\xa0)?:).){0,1000}$wrote(\s|\xc2\xa0)?:)$/ms', // On DATE, NAME <EMAIL> wrote:
        [...]
        '/^\s*($from\s?:.+\s?(\[|<).+(\]|>))/mu', // "From: NAME <EMAIL>" OR "From : NAME <EMAIL>" OR "From : NAME<EMAIL>"(With support whitespace before start and before <)
        [...]
    );
    

    Then we would run these Regex checks using a list of language files, so for example $wrote would be checked with "wrote", then "a écrit", then "escribió", ...


    Here are the advantages I see in that modification:

    • Adding a new language or variation is easier
    • You don't have to duplicate X times the same Regex, modifying one or two words each time
    • You're less likely to make a mistake in a Regex

    That was my two cents, thanks for reading 😉

    Reviewed by TBG-FR at 2021-08-04 09:01
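The proposal in this comment can be sketched as a template expanded from a per-language word list. Everything below is a hypothetical illustration of that idea, not code from the library: the `%on%` and `%wrote%` placeholders, the `$languages` table and both function names are assumptions. The template itself is the draft pattern from the comment, kept in byte mode (no `u` modifier) like the quoted originals.

```php
<?php
// Hypothetical vocabulary table; keys and structure are assumptions.
$languages = [
    'en' => ['on' => 'On', 'wrote' => 'wrote'],
    'fr' => ['on' => 'Le', 'wrote' => 'écrit'],
    'es' => ['on' => 'El', 'wrote' => 'escribió'],
];

// Draft pattern from the comment, with %on% and %wrote% as placeholders.
$template = '/^.{0,5}(%on%(?:(?!\b%on%\b|\b%wrote%(\s|\xc2\xa0)?:).){0,1000}%wrote%(\s|\xc2\xa0)?:)$/ms';

// Expands the template once per language.
function buildQuoteHeaderPatterns(string $template, array $languages): array
{
    $patterns = [];
    foreach ($languages as $code => $words) {
        $patterns[$code] = strtr($template, [
            '%on%' => preg_quote($words['on'], '/'),
            '%wrote%' => preg_quote($words['wrote'], '/'),
        ]);
    }

    return $patterns;
}

// Returns the code of the first language whose pattern matches, or null.
function detectQuoteHeaderLanguage(string $line, array $patterns): ?string
{
    foreach ($patterns as $code => $pattern) {
        if (preg_match($pattern, $line) === 1) {
            return $code;
        }
    }

    return null;
}

$patterns = buildQuoteHeaderPatterns($template, $languages);

var_dump(detectQuoteHeaderLanguage('On Nov 21, 2014, at 10:18, John Doe wrote:', $patterns));
var_dump(detectQuoteHeaderLanguage('Le 2021-04-09 23:25, Test a écrit :', $patterns));
```

With this layout, adding a language means adding one row to the table rather than copying a whole pattern, which is the advantage the comment argues for.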
  • 14. Handle O365 Outlook replies

    Hello there !

    While using a Mantis BT plugin which uses your awesome lib, I noticed that the <EMAIL> is no longer present and that From : is prefixed with **, which is why I modified the corresponding regex.


    Outlook O365 replies are formatted as follows:

    * * *
    
    **From :** Firstname LASTNAME 
    
    **Sent :** mardi 3 août 2021 18:41
    
    **To :** Firstname LASTNAME ; Firstname LASTNAME 
    
    **Object :** RE: Nouveau ticket par mail [0000051] 
    

    I still have some questions to improve this PR

    • Should I include * * * and the line-breaks to the regex (making them optional probably) ?
    • Should I check for ** after the From : ? (adding a second [\*]{0,2})
    • Should I replace [\*]{0,2} with [\*]* ? (So if Microsoft or another mailer decides to use more than two * it'll be handled)

    I haven't modified or run the tests yet.

    Thanks in advance for your reviews !

    Reviewed by TBG-FR at 2021-08-04 08:44
  • 15. Still seeing my signature in parsed emails

    I'm not sure what is classed as a 'signature' from the point of view of this module. I'm assuming, based on my testing, that it expects at least a single dash or underscore underneath the email content? My company emails do not include this, but have my name, job role and then some 'small print' about the confidentiality.

    Basically, I'm assuming that if there's no visible separator between email content and signature, the whole thing is classed as 'visible content'?

    Reviewed by mjemerson at 2021-06-23 11:36
  • 16. Add some variations to signature Regex

    @willdurand I would like to add "Sent via the", "Sent from the", and "Sent from" to the signature Regex. These are standard Android/Galaxy signatures.

    /(?:^\s*--|^\s*__|^-\w|^-- $)|(?:^Sent (from|via) (my|the )?(?:\s*\w+){1,4}$)|(?:^={30,}$)$/s

    Is this acceptable?

    Reviewed by mcki0127 at 2019-01-05 19:02
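A quick way to evaluate a proposal like this is to run the pattern against a few sample lines. The sketch below does that with the regex from the comment; the sample lines are made up for illustration and the helper name `looksLikeSignature` is hypothetical.

```php
<?php
// Pattern proposed in the comment above.
$pattern = '/(?:^\s*--|^\s*__|^-\w|^-- $)|(?:^Sent (from|via) (my|the )?(?:\s*\w+){1,4}$)|(?:^={30,}$)$/s';

// Hypothetical helper: true when a single line matches the pattern.
function looksLikeSignature(string $line, string $pattern): bool
{
    return preg_match($pattern, $line) === 1;
}

// Made-up sample lines; not taken from the library's fixtures.
$lines = [
    'Sent from my iPhone',
    'Sent via the Samsung Galaxy S7',
    'Sent from Mail for Windows 10',
    '-- ',
    'Thanks, see you tomorrow.',
];

foreach ($lines as $line) {
    printf(
        "%-34s %s\n",
        json_encode($line),
        looksLikeSignature($line, $pattern) ? 'signature' : 'not a signature'
    );
}
```

The first four lines match and the last one does not. Ordinary sentences that happen to start with "Sent from" or "Sent via" and stay within four words would match too, so the loosened pattern trades some precision for coverage.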
  • 17. Regex with newline?

    I have some emails that come in with a pattern like:

    From: NAME
    Sent: blah blah blah
    

    These don't match the regex /^\s*(From\s?:.+\s?(\[|<).+(\]|>))/ because they lack the <EMAIL> segment. So rather than change the existing I tried a regex like:

    /^\s*(From\s?:.+\nSent:\s?.+)/

    But this fails to match in isQuoteHeader() due to the way newlines are replaced at the beginning of the parse() method. So I changed it to:

    /^\s*(From\s?:.+(\n| )Sent:\s?.+)/

    This matches either a newline or its replacement, a space, and seems to work, but I end up with a single line and I'd rather not have the resulting text altered. Also, I'm not entirely sure that this wouldn't end up matching things incorrectly in the isQuoteHeader() method. Can you offer any advice here on how to match a multiline pattern? Is there a cleaner way to handle this?

    Thanks again for the excellent library, it's really coming in handy.

    Reviewed by billynoah at 2018-03-21 21:37