PlantUML parser for PHP

Overview

This package builds an AST of class definitions from PlantUML files. It works only with PHP.

Installation

Via Composer

composer require puml2php/puml-parser

Usage

Sample PlantUML source file:

@startuml
package Lexer {
    interface Tokenizeable
    package Lexer/Arrow {
        abstract class ArrowTokenizer implements Tokenizeable
        class LeftArrowTokenizer {
            + publicProperty : array
            # protectedProperty : string
            - privateProperty
        }
    }

    enum Enum {
      CASE1
      CASE2
      CASE3
    }

    LeftArrowTokenizer--|>ArrowTokenizer
    NoneDefinitionClass ..|> Tokenizeable
}
@enduml

The intended workflow is to convert the AST to DTOs first and then work with each class definition:

<?php

use PumlParser\Lexer\Lexer;
use PumlParser\Lexer\PumlTokenizer;
use PumlParser\Parser\Parser;

$lexer  = new Lexer(new PumlTokenizer());
$parser = new Parser($lexer);
$ast    = $parser->parse(__DIR__ . '/sample.puml');

foreach ($ast->toDtos() as $definition) {
    echo "----------\n";

    echo "name: " . $definition->getName() . "\n";
    echo "package: " . $definition->getPackage() . "\n";

    if ($definition->getType() === 'enum') {
        foreach ($definition->getCases() as $case) {
            echo "case: " . $case . "\n";
        }
    } else {
        foreach ($definition->getProperties() as $property) {
            echo "property name: " . $property->getName() . " , visibility:  " . $property->getVisibility() . "\n";
        }
    }
}

$ php sample.php
----------
name: Tokenizeable
package: Lexer
----------
name: ArrowTokenizer
package: Lexer\Arrow
----------
name: LeftArrowTokenizer
package: Lexer\Arrow
property name: publicProperty , visibility:  public
property name: protectedProperty , visibility:  protected
property name: privateProperty , visibility:  private
----------
name: Enum
package: Lexer
case: CASE1
case: CASE2
case: CASE3
----------
name: NoneDefinitionClass
package: Lexer

Parsing results are available in three formats: JSON, array, and DTO.

<?php

use PumlParser\Lexer\Lexer;
use PumlParser\Lexer\PumlTokenizer;
use PumlParser\Parser\Parser;

$lexer  = new Lexer(new PumlTokenizer());
$parser = new Parser($lexer);
$ast    = $parser->parse(__DIR__ . '/sample.puml');

Dump of $ast->toDtos():

array(4) {
  [0]=>
  object(PumlParser\Dto\Definition)#59 (6) {
    ["name":"PumlParser\Dto\Definition":private]=>
    string(12) "Tokenizeable"
    ["type":"PumlParser\Dto\Definition":private]=>
    string(9) "interface"
    ["package":"PumlParser\Dto\Definition":private]=>
    string(5) "Lexer"
    ["properties":"PumlParser\Dto\Definition":private]=>
    array(0) {
    }
    ["parents":"PumlParser\Dto\Definition":private]=>
    array(0) {
    }
    ["interfaces":"PumlParser\Dto\Definition":private]=>
    array(0) {
    }
  }
  [1]=>
  object(PumlParser\Dto\Definition)#62 (6) {
    ["name":"PumlParser\Dto\Definition":private]=>
    string(14) "ArrowTokenizer"
    ["type":"PumlParser\Dto\Definition":private]=>
    string(14) "abstract class"
    ["package":"PumlParser\Dto\Definition":private]=>
    string(11) "Lexer\Arrow"
    ["properties":"PumlParser\Dto\Definition":private]=>
    array(0) {
    }
    ["parents":"PumlParser\Dto\Definition":private]=>
    array(0) {
    }
    ["interfaces":"PumlParser\Dto\Definition":private]=>
    array(1) {
      [0]=>
      object(PumlParser\Dto\Definition)#46 (6) {
        ["name":"PumlParser\Dto\Definition":private]=>
        string(12) "Tokenizeable"
        ["type":"PumlParser\Dto\Definition":private]=>
        string(9) "interface"
        ["package":"PumlParser\Dto\Definition":private]=>
        string(5) "Lexer"
        ["properties":"PumlParser\Dto\Definition":private]=>
        array(0) {
        }
        ["parents":"PumlParser\Dto\Definition":private]=>
        array(0) {
        }
        ["interfaces":"PumlParser\Dto\Definition":private]=>
        array(0) {
        }
      }
    }
  }
  [2]=>
  object(PumlParser\Dto\Definition)#61 (6) {
    ["name":"PumlParser\Dto\Definition":private]=>
    string(18) "LeftArrowTokenizer"
    ["type":"PumlParser\Dto\Definition":private]=>
    string(5) "class"
    ["package":"PumlParser\Dto\Definition":private]=>
    string(11) "Lexer\Arrow"
    ["properties":"PumlParser\Dto\Definition":private]=>
    array(3) {
      [0]=>
      object(PumlParser\Dto\PropertyDefinition)#34 (2) {
        ["name":"PumlParser\Dto\PropertyDefinition":private]=>
        string(14) "publicProperty"
        ["visibility":"PumlParser\Dto\PropertyDefinition":private]=>
        string(6) "public"
      }
      [1]=>
      object(PumlParser\Dto\PropertyDefinition)#33 (2) {
        ["name":"PumlParser\Dto\PropertyDefinition":private]=>
        string(17) "protectedProperty"
        ["visibility":"PumlParser\Dto\PropertyDefinition":private]=>
        string(9) "protected"
      }
      [2]=>
      object(PumlParser\Dto\PropertyDefinition)#60 (2) {
        ["name":"PumlParser\Dto\PropertyDefinition":private]=>
        string(15) "privateProperty"
        ["visibility":"PumlParser\Dto\PropertyDefinition":private]=>
        string(7) "private"
      }
    }
    ["parents":"PumlParser\Dto\Definition":private]=>
    array(1) {
      [0]=>
      object(PumlParser\Dto\Definition)#26 (6) {
        ["name":"PumlParser\Dto\Definition":private]=>
        string(14) "ArrowTokenizer"
        ["type":"PumlParser\Dto\Definition":private]=>
        string(14) "abstract class"
        ["package":"PumlParser\Dto\Definition":private]=>
        string(11) "Lexer\Arrow"
        ["properties":"PumlParser\Dto\Definition":private]=>
        array(0) {
        }
        ["parents":"PumlParser\Dto\Definition":private]=>
        array(0) {
        }
        ["interfaces":"PumlParser\Dto\Definition":private]=>
        array(1) {
          [0]=>
          object(PumlParser\Dto\Definition)#57 (6) {
            ["name":"PumlParser\Dto\Definition":private]=>
            string(12) "Tokenizeable"
            ["type":"PumlParser\Dto\Definition":private]=>
            string(9) "interface"
            ["package":"PumlParser\Dto\Definition":private]=>
            string(5) "Lexer"
            ["properties":"PumlParser\Dto\Definition":private]=>
            array(0) {
            }
            ["parents":"PumlParser\Dto\Definition":private]=>
            array(0) {
            }
            ["interfaces":"PumlParser\Dto\Definition":private]=>
            array(0) {
            }
          }
        }
      }
    }
    ["interfaces":"PumlParser\Dto\Definition":private]=>
    array(0) {
    }
  }
  [3]=>
  object(PumlParser\Dto\Definition)#41 (6) {
    ["name":"PumlParser\Dto\Definition":private]=>
    string(19) "NoneDefinitionClass"
    ["type":"PumlParser\Dto\Definition":private]=>
    string(5) "class"
    ["package":"PumlParser\Dto\Definition":private]=>
    string(5) "Lexer"
    ["properties":"PumlParser\Dto\Definition":private]=>
    array(0) {
    }
    ["parents":"PumlParser\Dto\Definition":private]=>
    array(0) {
    }
    ["interfaces":"PumlParser\Dto\Definition":private]=>
    array(1) {
      [0]=>
      object(PumlParser\Dto\Definition)#56 (6) {
        ["name":"PumlParser\Dto\Definition":private]=>
        string(12) "Tokenizeable"
        ["type":"PumlParser\Dto\Definition":private]=>
        string(9) "interface"
        ["package":"PumlParser\Dto\Definition":private]=>
        string(5) "Lexer"
        ["properties":"PumlParser\Dto\Definition":private]=>
        array(0) {
        }
        ["parents":"PumlParser\Dto\Definition":private]=>
        array(0) {
        }
        ["interfaces":"PumlParser\Dto\Definition":private]=>
        array(0) {
        }
      }
    }
  }
}

Dump of $ast->toJson():

[
    {
        "interface": {
            "Name": "Tokenizeable",
            "Package": "Lexer",
            "Propaties": [],
            "Parents": [],
            "Interfaces": []
        }
    },
    {
        "abstract class": {
            "Name": "ArrowTokenizer",
            "Package": "Lexer/Arrow",
            "Propaties": [],
            "Parents": [],
            "Interfaces": [
                {
                    "interface": {
                        "Name": "Tokenizeable",
                        "Package": "Lexer",
                        "Propaties": [],
                        "Parents": [],
                        "Interfaces": []
                    }
                }
            ]
        }
    },
    {
        "class": {
            "Name": "LeftArrowTokenizer",
            "Package": "Lexer/Arrow",
            "Propaties": [
                {
                    "name": "publicProperty",
                    "visibility": "public"
                },
                {
                    "name": "protectedProperty",
                    "visibility": "protected"
                },
                {
                    "name": "privateProperty",
                    "visibility": "private"
                }
            ],
            "Parents": [
                {
                    "abstract class": {
                        "Name": "ArrowTokenizer",
                        "Package": "Lexer/Arrow",
                        "Propaties": [],
                        "Parents": [],
                        "Interfaces": [
                            {
                                "interface": {
                                    "Name": "Tokenizeable",
                                    "Package": "Lexer",
                                    "Propaties": [],
                                    "Parents": [],
                                    "Interfaces": []
                                }
                            }
                        ]
                    }
                }
            ],
            "Interfaces": []
        }
    },
    {
        "class": {
            "Name": "NoneDefinitionClass",
            "Package": "Lexer",
            "Propaties": [],
            "Parents": [],
            "Interfaces": [
                {
                    "interface": {
                        "Name": "Tokenizeable",
                        "Package": "Lexer",
                        "Propaties": [],
                        "Parents": [],
                        "Interfaces": []
                    }
                }
            ]
        }
    }
]
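
The JSON above nests each definition under a key that names its type. As a minimal sketch of consuming that shape, the following decodes a trimmed excerpt of the output and lists each definition. The helper name listDefinitions is hypothetical and not part of the library, and the embedded JSON is shortened from the dump above.

```php
<?php

// Trimmed excerpt of the toJson() output shown above.
// The "Propaties" key is spelled exactly as it appears in that output.
$json = <<<'JSON'
[
    {"interface": {"Name": "Tokenizeable", "Package": "Lexer", "Propaties": [], "Parents": [], "Interfaces": []}},
    {"class": {"Name": "NoneDefinitionClass", "Package": "Lexer", "Propaties": [], "Parents": [], "Interfaces": []}}
]
JSON;

/**
 * Hypothetical helper: returns one "type package/name" string per definition.
 */
function listDefinitions(string $json): array
{
    $lines = [];
    foreach (json_decode($json, true, 512, JSON_THROW_ON_ERROR) as $entry) {
        // Each entry has a single key (the type) wrapping the definition body.
        foreach ($entry as $type => $body) {
            $lines[] = sprintf('%s %s/%s', $type, $body['Package'], $body['Name']);
        }
    }

    return $lines;
}

echo implode("\n", listDefinitions($json)), "\n";
```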

Dump of $ast->toArray():

array(4) {
  [0]=>
  array(1) {
    ["interface"]=>
    array(5) {
      ["Name"]=>
      string(12) "Tokenizeable"
      ["Package"]=>
      string(5) "Lexer"
      ["Propaties"]=>
      array(0) {
      }
      ["Parents"]=>
      array(0) {
      }
      ["Interfaces"]=>
      array(0) {
      }
    }
  }
  [1]=>
  array(1) {
    ["abstract class"]=>
    array(5) {
      ["Name"]=>
      string(14) "ArrowTokenizer"
      ["Package"]=>
      string(11) "Lexer/Arrow"
      ["Propaties"]=>
      array(0) {
      }
      ["Parents"]=>
      array(0) {
      }
      ["Interfaces"]=>
      array(1) {
        [0]=>
        array(1) {
          ["interface"]=>
          array(5) {
            ["Name"]=>
            string(12) "Tokenizeable"
            ["Package"]=>
            string(5) "Lexer"
            ["Propaties"]=>
            array(0) {
            }
            ["Parents"]=>
            array(0) {
            }
            ["Interfaces"]=>
            array(0) {
            }
          }
        }
      }
    }
  }
  [2]=>
  array(1) {
    ["class"]=>
    array(5) {
      ["Name"]=>
      string(18) "LeftArrowTokenizer"
      ["Package"]=>
      string(11) "Lexer/Arrow"
      ["Propaties"]=>
      array(3) {
        [0]=>
        array(2) {
          ["name"]=>
          string(14) "publicProperty"
          ["visibility"]=>
          string(6) "public"
        }
        [1]=>
        array(2) {
          ["name"]=>
          string(17) "protectedProperty"
          ["visibility"]=>
          string(9) "protected"
        }
        [2]=>
        array(2) {
          ["name"]=>
          string(15) "privateProperty"
          ["visibility"]=>
          string(7) "private"
        }
      }
      ["Parents"]=>
      array(1) {
        [0]=>
        array(1) {
          ["abstract class"]=>
          array(5) {
            ["Name"]=>
            string(14) "ArrowTokenizer"
            ["Package"]=>
            string(11) "Lexer/Arrow"
            ["Propaties"]=>
            array(0) {
            }
            ["Parents"]=>
            array(0) {
            }
            ["Interfaces"]=>
            array(1) {
              [0]=>
              array(1) {
                ["interface"]=>
                array(5) {
                  ["Name"]=>
                  string(12) "Tokenizeable"
                  ["Package"]=>
                  string(5) "Lexer"
                  ["Propaties"]=>
                  array(0) {
                  }
                  ["Parents"]=>
                  array(0) {
                  }
                  ["Interfaces"]=>
                  array(0) {
                  }
                }
              }
            }
          }
        }
      }
      ["Interfaces"]=>
      array(0) {
      }
    }
  }
  [3]=>
  array(1) {
    ["class"]=>
    array(5) {
      ["Name"]=>
      string(19) "NoneDefinitionClass"
      ["Package"]=>
      string(5) "Lexer"
      ["Propaties"]=>
      array(0) {
      }
      ["Parents"]=>
      array(0) {
      }
      ["Interfaces"]=>
      array(1) {
        [0]=>
        array(1) {
          ["interface"]=>
          array(5) {
            ["Name"]=>
            string(12) "Tokenizeable"
            ["Package"]=>
            string(5) "Lexer"
            ["Propaties"]=>
            array(0) {
            }
            ["Parents"]=>
            array(0) {
            }
            ["Interfaces"]=>
            array(0) {
            }
          }
        }
      }
    }
  }
}
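
Because parents and interfaces are nested inside each entry, the array form can be walked recursively. The sketch below collects the names of all ancestors of one definition. It assumes the array shape shown in the dump above; ancestorNames is a hypothetical helper, not a library function, and the sample data is an excerpt with the properties left out.

```php
<?php

// Excerpt of the toArray() shape shown above for LeftArrowTokenizer
// (properties omitted for brevity).
$definition = [
    'class' => [
        'Name'       => 'LeftArrowTokenizer',
        'Package'    => 'Lexer/Arrow',
        'Propaties'  => [],
        'Parents'    => [
            ['abstract class' => [
                'Name'       => 'ArrowTokenizer',
                'Package'    => 'Lexer/Arrow',
                'Propaties'  => [],
                'Parents'    => [],
                'Interfaces' => [
                    ['interface' => [
                        'Name' => 'Tokenizeable', 'Package' => 'Lexer',
                        'Propaties' => [], 'Parents' => [], 'Interfaces' => [],
                    ]],
                ],
            ]],
        ],
        'Interfaces' => [],
    ],
];

/**
 * Hypothetical helper: names of all parents and interfaces, found recursively.
 */
function ancestorNames(array $definition): array
{
    $names = [];
    foreach ($definition as $body) { // single key: the definition type
        foreach (array_merge($body['Parents'], $body['Interfaces']) as $ancestor) {
            foreach ($ancestor as $ancestorBody) {
                $names[] = $ancestorBody['Name'];
            }
            // Descend into the ancestor's own parents and interfaces.
            $names = array_merge($names, ancestorNames($ancestor));
        }
    }

    return array_values(array_unique($names));
}

print_r(ancestorNames($definition));
```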

License

The MIT License (MIT). Please see LICENSE for more information.

Capture complete property line
I played around with the latest update and found an issue. This change alters behaviour, so it might be a breaking change, though it may be acceptable as a bug fix. You should decide whether it counts as a bug fix, but it lets me continue my work:

Again, my test payload is:

@startuml

hide empty methods

package Heptacom\HeptaConnect\Playground\Dataset {
    class Cap {
        + type : CapType
    }
}
@enduml

Before my change the property is interpreted as:

With my change it looks like this

It allows property lines to be captured completely when whitespace separates multiple entries. Without this change the property is not completely covered. I assume it may produce some unwanted results, because I am merging multiple values with a single space, which is not the exact whitespace that the tokenizer originally skipped. I also assume that, if you follow the PlantUML guides on changing the class body, the group separators could now become part of a property (see their example).
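
The trade-off described above, joining whitespace-separated parts with a single space, can be shown in isolation. This is only a sketch of the idea, not the code from the pull request; joinPropertyParts is a hypothetical name.

```php
<?php

/**
 * Hypothetical illustration: collapse every run of whitespace in a property
 * line to one space, so "type   :\tCapType" becomes "type : CapType".
 */
function joinPropertyParts(string $line): string
{
    // Split on any whitespace run and drop empty fragments.
    $parts = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);

    return implode(' ', $parts);
}

echo joinPropertyParts("type   :\tCapType"), "\n"; // type : CapType
```

The original spacing (three spaces and a tab in this example) cannot be recovered from the result, which is the loss mentioned above.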

Please give me feedback on how to proceed with this finding.
opened by JoshuaBehrens 4
Missing properties/fields

I would really like to write a code generator with this library, but your parser does not read the fields yet :/ With your package it would be possible to work with PlantUML files using only PHP. I am watching this repo, and maybe I can switch from XMI to this package :)

opened by JoshuaBehrens 3
Parsing of small plantuml fails on (presumably) missing visibility
I was looking into the parsing abilities of your changes in 2.1. It looks quite promising to me. It failed though to parse this file:

@startuml

hide empty methods

package Heptacom\HeptaConnect\Playground\Dataset {
    class Cap {
        + type : CapType
    }
}
@enduml

It fails with this stacktrace:

InvalidArgumentException:

vendor/puml2php/puml-parser/src/Lexer/Token/Tokens.php:36
vendor/puml2php/puml-parser/src/Lexer/Token/Tokens.php:50
vendor/puml2php/puml-parser/src/Parser/Parser.php:149
vendor/puml2php/puml-parser/src/Parser/Parser.php:73
vendor/puml2php/puml-parser/src/Parser/Parser.php:114
vendor/puml2php/puml-parser/src/Parser/Parser.php:68
vendor/puml2php/puml-parser/src/Parser/Parser.php:50

I can follow the stacktrace but the exception is not helpful to understand where the tokenizing begins to fail. So the best would be to solve these two steps:

Have an understandable error message. There are already exceptions that can tell exact positions in the UML code (I saw some related to #10). Maybe we can get this in there as well.

Understand why the above PlantUML code fails and fix either the incoming UML code or the tokenizer.
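
As a rough sketch of what a position-aware error message could look like, the following exception records the line number and the offending source line. PositionedParseException is a hypothetical class, not part of puml-parser, and the sample input is made up for illustration.

```php
<?php

/**
 * Hypothetical exception type (not part of puml-parser) that records where
 * in the PlantUML source a problem was detected.
 */
final class PositionedParseException extends InvalidArgumentException
{
    public static function at(string $source, int $lineNumber, string $reason): self
    {
        // Split on any newline style and pick the 1-based line, if present.
        $lines   = preg_split('/\R/', $source);
        $excerpt = trim($lines[$lineNumber - 1] ?? '');

        return new self(sprintf('%s at line %d: "%s"', $reason, $lineNumber, $excerpt));
    }
}

$source = "@startuml\nclass Demo {\n    ???\n}\n@enduml";

try {
    throw PositionedParseException::at($source, 3, 'Unexpected token');
} catch (PositionedParseException $e) {
    echo $e->getMessage(), "\n"; // Unexpected token at line 3: "???"
}
```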
opened by JoshuaBehrens 2

v3.2.1(Apr 30, 2022)
What's Changed

Return value of Tokens::next is set to self by @tasuku43 in https://github.com/tasuku43/puml-parser-php/pull/17

Full Changelog: https://github.com/tasuku43/puml-parser-php/compare/v3.2.0...v3.2.1
v3.2.0(Apr 28, 2022)
What's Changed

Added support for property types by @tasuku43 in https://github.com/tasuku43/puml-parser-php/pull/16

Full Changelog: https://github.com/tasuku43/puml-parser-php/compare/v3.1.3...v3.2.0
v3.1.3(Apr 27, 2022)
What's Changed

Capture complete property line by @JoshuaBehrens in https://github.com/tasuku43/puml-parser-php/pull/15

Full Changelog: https://github.com/tasuku43/puml-parser-php/compare/v3.1.2...v3.1.3
v3.1.2(Apr 24, 2022)
What's Changed

Fixed a problem that resulted in an infinite loop when a property type was specified by @tasuku43 in https://github.com/tasuku43/puml-parser-php/pull/14

Full Changelog: https://github.com/tasuku43/puml-parser-php/compare/v3.1.1...v3.1.2
v3.1.0(Dec 2, 2021)

v3.0.0(Dec 2, 2021)

v2.1.0(Nov 30, 2021)

v2.0.0(Nov 30, 2021)
