ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Overview

ANTLR v4

Java 7+ License

Build status

Github CI Build Status (MacOSX) AppVeyor CI Build Status (Windows) Circle CI Build Status (Linux) Travis-CI Build Status (Swift-Linux)

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface (or visitor) that makes it easy to respond to the recognition of phrases of interest.

Given day-job constraints, my time working on this project is limited so I'll have to focus first on fixing bugs rather than changing/improving the feature set. Likely I'll do it in bursts every few months. Please do not be offended if your bug or pull request does not yield a response! --parrt

Donate

Authors and major contributors

Useful information

You might also find the following pages useful, particularly if you want to mess around with the various target languages.

The Definitive ANTLR 4 Reference

Programmers run into parsing problems all the time. Whether it’s a data format like JSON, a network protocol like SMTP, a server configuration file for Apache, a PostScript/PDF file, or a simple spreadsheet macro language—ANTLR v4 and this book will demystify the process. ANTLR v4 has been rewritten from scratch to make it easier than ever to build parsers and the language applications built on top. This completely rewritten new edition of the bestselling Definitive ANTLR Reference shows you how to take advantage of these new features.

You can buy the book The Definitive ANTLR 4 Reference at amazon or an electronic version at the publisher's site.

You will find the Book source code useful.

Additional grammars

This repository is a collection of grammars without actions where the root directory name is the all-lowercase name of the language parsed by the grammar. For example, java, cpp, csharp, c, etc...

Comments
  • New extended Unicode escape \u{10ABCD} to support Unicode literals > U+FFFF

    New extended Unicode escape \u{10ABCD} to support Unicode literals > U+FFFF

    Fixes #276 .

    This used to be a WIP PR, but it's now ready for review.

    This PR introduces a new extended Unicode escape \u{10ABCD} in ANTLR4 grammars to support Unicode literal values > U+FFFF.

    The serialized ATN represents any atom or range with a Unicode value > U+FFFF as a set. Any such set is serialized in the ATN with 32-bit arguments.

    I bumped the UUID, since this changes the serialized ATN format.

    I included lots of tests and made sure everything is passing on Linux, Mac, and Windows.

    type:feature unicode 
    opened by bhamiltoncx 115
  • Terrible Golang performance

    Terrible Golang performance

    Stackoverflow: https://stackoverflow.com/questions/72266899/golang-performance-issues

    Google group: https://groups.google.com/g/antlr-discussion/c/OdhAIsy2GfI

    Example code: https://github.com/movelazar/perf-repro

    A simple rule such as:

    1 EQ 2 OR
    1 EQ 2 OR
    1 EQ 2 OR
    1 EQ 2 OR
    1 EQ 2
    

    takes exponentially longer to parse the more 1 EQ 2 OR clauses there are. This does not happen in python (by my testing) or CSharp, Dart, Java (by stackoverflow comment).

    On my machine, # of lines vs parse time:

    11: 0.5s
    12: 1.2s
    13: 3.2s
    14: 8.1s
    15: 21.9s
    16: 57.5s
    

    Given that Python doesn't face this issue I can't imagine I'm doing something terrible in my grammar.

    Issue goes away if I put parens on things but that's not a real solution.

    On 4.10.1, first noticed with 4.9.1.

    Any help is greatly appreciated. Surprised I can't find others with this issue.

    type:bug target:go comp:performance 
    opened by movelazar 101
  • splitting version numbers for targets

    splitting version numbers for targets

    Hiya: @pboyer, @mike-lischke, @janyou, @ewanmellor, @hanjoes, @ericvergnaud, @lingyv-li, @marcospassos

    Eric has raised the point that it would be nice to be able to make quick patches to the various runtimes; e.g., there is a stopping bug now in the JavaScript target. He proposes something along these lines:

    • any change in the tool or the runtime algorithm bumps the middle version #: 4.9 -> 4.10 -> 4.11
    • any bug fix in a runtime we bump the last digit of that runtime only: 4.9 -> 4.9.1 -> 4.9.2
    • if bumping the java runtime for bug fix we also bump the tool since it contains the runtime

    This is in optimal as people have criticized me in the past for bumping, say, 4.6 to 4.7 for some minor changes. It also has the problem that 4.9.x will not mean the same thing in two different targets possibly, as each target will now have their own version number.

    Rather than break up all of the targets into separate repositories or similar, can you guys think of a better solution? Any suggestions? The goal here is to allow more rapid target releases, and independent of me having to do a major release of the tool.

    type:question 
    opened by parrt 94
  • Improve memory usage and perf of CodePointCharStream: Use 8-bit, 16-bit, or 32-bit buffer

    Improve memory usage and perf of CodePointCharStream: Use 8-bit, 16-bit, or 32-bit buffer

    This greatly improves the memory usage and performance of CodePointCharStream by ensuring the internal storage uses either an 8-bit buffer (for Unicode code points <= U+00FF), 16-bit buffer (for Unicode code points <= U+FFFF), or a 32-bit buffer (Unicode code points > U+FFFF).

    I split out the internal storage into a class CodePointBuffer which has a CodePointBuffer.Builder class which has the logic to upgrade from 8-bit to 16-bit to 32-bit storage.

    I found the perf hotspot in CodePointCharStream on master was the virtual method calls from CharStream.LA(offset) into IntBuffer.

    Refactoring it into CodePointBuffer didn't help (in fact, it added another virtual method call).

    To fix the perf, I made CodePointCharStream an abstract class and made three concrete subclasses: CodePoint8BitCharStream, CodePoint16BitCharStream, and CodePoint32BitCharStream which directly access the array of underlying code points in the CodePointBuffer without virtual method calls.

    lexers target:java comp:performance 
    opened by bhamiltoncx 85
  • initial discussion to start integration of new targets

    initial discussion to start integration of new targets

    As promised, I am now ready to integrate the new ANTLR target languages you folks have been working on. This issue is meant to get everybody in sync, check status, and discuss the proper order of integration and resolve issues etc.

    There are two administrative details to get out of the way first:

    1. Please let me know if there is another github user that should be added to one of the categories. Or, of course, if you would like your user ID removed from this discussion.
    2. Nothing can be merged into antlr/antlr4 unless every single committer has added themselves to the contributors.txt file. It's onerous, particularly for simple commits, but it is requirement for anything merged into the master. Eclipse foundation lawyers tell me that we have one of the cleanest licenses out there and it contributes to ANTLR's widespread use because companies are not afraid to use the software. See the genesis of such heinous requirements in SCO v IBM. This means lead target authors have to go back through their committers list quickly and ask them to sign the contributors file with a new commit. Or, they can remove that commit and enter their own version of the functionality, being careful not to violate copyright on the previous.

    As we proceed, please keep in mind that I have a difficult role, balancing the needs of multiple targets and keeping discussions in the civil and practical zone. Decisions I make come from the perspective of over 25 years managing and leading this project. I look forward to incorporating your hard work into the main antlr repo.

    C++ current location

    • @mike-lischke
    • @DanMcLaughlin
    • @nburles
    • @davesisson

    Go current location, previous discussion

    • @pboyer

    Swift current location: unclear, previous discussion

    • @jeffreyguenther
    • @hanjoes
    • @janyou
    • @ewanmellor

    Likely interested/supporting humans (scraped from github issues):

    • @RYDB3RG
    • @wjkohnen
    • @willfaught
    • @parrt
    • @sharwell
    • @ericvergnaud
    type:improvement target:swift target:cpp target:go 
    opened by parrt 84
  • Add a new CharStream that converts the symbols to upper or lower case.

    Add a new CharStream that converts the symbols to upper or lower case.

    This is useful for many of the case insensitive grammars found at https://github.com/antlr/grammars-v4/ which assume the input would be all upper or lower case. Related discussion can be found at https://github.com/antlr/antlr4test-maven-plugin/issues/1

    It would be used like so:

    input, _ := antlr.NewFileStream("filename")
    
    in = antlr.NewCaseChangingStream(is, true) // true forces upper case symbols, false forces lower case.
    
    lexer := parser.NewAbnfLexer(in)
    

    While writing this, I found other people have written their own similar implementations (go, java). It makes sense to place this in the core, so everyone can use it.

    I would love for the grammar to have a option that says the lexer should upper/lower case all input, and then this code could be moved into the generated Lexer, and no user would need to explicitly use a CaseChangingStream (similar to what's discussed in #1002).

    lexers comp:runtime target:java target:javascript target:go 
    opened by bramp 69
  • Swift Target

    Swift Target

    I did a quick search and I didn't see anything written about this yet. What's the likelihood of a Swift target for ANTLR?

    There are C#, Javascript, and Python targets at the moment.

    What does it take to implement a target? Given that Swift is more Java-like, it seems like it should be possible. Maybe start with a code translator if there is one for (Java to Swift), and iterate towards a more idiomatic implementation.

    type:question 
    opened by jeffreyguenther 69
  • Clean up ATN serialization: rm UUID and shifting by value of 2

    Clean up ATN serialization: rm UUID and shifting by value of 2

    • I think we don't need the UUID in the serialization, since it has not changed in a decade. We can bump the version number and remove the UU ID
    • I did some tests and there seems to be no reason to shift the values in the serialized ATN by 2 for the purposes of improving the UTF-8 encoding for the Java target.

    If you guys agree, we can make this small change for cleanup purposes. I'm happy to do it if you guys don't want to. The second fix will require changes to each target but it's trivial to fix.

    atn-analysis type:cleanup 
    opened by parrt 68
  • Preparing for 4.9.3 release

    Preparing for 4.9.3 release

    It's that time of year again! @pboyer, @mike-lischke, @janyou, @ewanmellor, @hanjoes, @ericvergnaud, @lingyv-li, @marcospassos Shall we do a 4.9.3 release?

    I went through and marked all of the merged PRs and related issues with 4.9.3 and try to tag them according to their target. Would you guys like to go through the PRs to see if there's something that should be merged quickly?

    opened by parrt 68
  • [CSharp] #2021 fixes nuget packaging options to avoid missing dll exceptions

    [CSharp] #2021 fixes nuget packaging options to avoid missing dll exceptions

    @ericvergnaud Hi, I modified csproj options a bit, now I can get a working nuget package locally without the issue we described in #2021. I added .net 3.5 as a target to "main" csproj along with netstandard, since it's easier to keep track of requirements for both sets of api's when editing code and, ideally, both targets can be packed into a nuget package with a single command. Right now it's possible only on Windows via msbuild /t:pack or Visual Studio; unfortunately, due to https://github.com/Microsoft/msbuild/issues/1333, right now dotnet build pack does not work for .net 3.5 target the way it should, so I adjusted the existing script to create packages from .nuspec and different solutions for different targets.

    comp:build target:csharp 
    opened by listerenko 68
  • A few updates to the Unicode documentation.

    A few updates to the Unicode documentation.

    It should be made clear that the recommended use of CharStreams.fromPath() is a Java-only solution. The other targets just have their ANTLRInputStream class extended to support full Unicode.

    comp:doc 
    opened by mike-lischke 61
  • Assorted problems in calling the wrong wrapper for reachesIntoOuterContext

    Assorted problems in calling the wrong wrapper for reachesIntoOuterContext

    This is a serious bug in the CSharp runtime, ParserATNSimulator.cs.

    reachesIntoOuterContext is an integer that tracks the depth of how far we dip into the outer context. This field must be interpreted with the bit map mask SUPPRESS_PRECEDENCE_FILTER, and careful attention placed on whether to access the field raw or with the bit map mask.

    In Java, this code accesses the field raw.

    In CSharp, the field is accessed through a method that applies the bit map mask. This is a serious problem in that it causes ATN parser trace divergence.

    I did a cursory check on the other targets, and I am not confident the field access is done correctly across targets.

    The fix is the change ParserATNInterpreter.cs:

    diff --git a/runtime/CSharp/src/Atn/ParserATNSimulator.cs b/runtime/CSharp/src/Atn/ParserATNSimulator.cs
    index 4a7a6a3d5..8463bfcc3 100644
    --- a/runtime/CSharp/src/Atn/ParserATNSimulator.cs
    +++ b/runtime/CSharp/src/Atn/ParserATNSimulator.cs
    @@ -1579,7 +1579,7 @@ namespace Antlr4.Runtime.Atn
     						// This assignment also propagates the
     						// isPrecedenceFilterSuppressed() value to the new
     						// configuration.
    -						c.reachesIntoOuterContext = config.OuterContextDepth;
    +                                                c.reachesIntoOuterContext = config.reachesIntoOuterContext;
     						ClosureCheckingStopState(c, configSet, closureBusy, collectPredicates,
     												 fullCtx, depth - 1, treatEofAsEpsilon);
     					}
    

    (Note, the source code should really not be using tabs. Tab expansion is editor specific.)

    atn-analysis type:bug target:csharp 
    opened by kaby76 3
  • Python runtime test failure with 4.10 and later

    Python runtime test failure with 4.10 and later

    After upgrading the antlr4 python runtime past 4.9.1 (tested 4.10.1 and 4.11.1) in nixpkgs, we've been seeing its test suite fail with

    Traceback (most recent call last):
      File "/build/source/runtime/Python3/tests/ctest.py", line 10, in <module>
        from parser.cparser import CParser
      File "/build/source/runtime/Python3/tests/parser/cparser.py", line 631, in <module>
        class CParser ( Parser ):
      File "/build/source/runtime/Python3/tests/parser/cparser.py", line 635, in CParser
        atn = ATNDeserializer().deserialize(serializedATN())
      File "/nix/store/ch8i929c63av55h9nxkinifh61mazf1h-python3.10-antlr4-python3-runtime-4.11.1/lib/python3.10/site-packages/antlr4/atn/ATNDeserializer.py", line 28, in deserialize
        self.checkVersion()
      File "/nix/store/ch8i929c63av55h9nxkinifh61mazf1h-python3.10-antlr4-python3-runtime-4.11.1/lib/python3.10/site-packages/antlr4/atn/ATNDeserializer.py", line 50, in checkVersion
        raise Exception("Could not deserialize ATN with version " + str(version) + " (expected " + str(SERIALIZED_VERSION) + ").")
    Exception: Could not deserialize ATN with version  (expected 4).
    

    The version returned seems to be \x03, while the test suite expects it to be 4.

    While similar to #3997 and #3895 we are running the test suite, not our own code.

    We're running python ctest.py from the tests directory.

    opened by mweinelt 4
  • Suggestion for the Test Rig GUI tool

    Suggestion for the Test Rig GUI tool

    This is not a defect or reporting a problem. It is a suggestion that the GUI tool have a new button added between "OK" and "Export as PNG". That can be named "Reload".

    The idea being that I can edit/save a test source file and just click "reload" rather than doing what I do which is close the tool and go back to my command window and rerun the batch file that starts the GUI.

    Thoughts?

    opened by Korporal 0
  •  invalid syntax `self.from = None` in python3 generated parsers

    invalid syntax `self.from = None` in python3 generated parsers

    Please include the following information

    python3 antlr4 version is 4.9.3

    • smallest possible grammar and code that reproduces the behavior
    import sys
    import antlr4
    from PrestoSqlLexer import PrestoSqlLexer
    from PrestoSqlParser import PrestoSqlParser
    
    
    def main():
        input_stream = antlr4.CharStream('select 1')
        lexer = PrestoSqlLexer(input_stream)
        tokens = antlr4.CommonTokenStream(lexer)
        tokens.fill()
        print([token.text for token in tokens.tokens][:-1])
    
    
    if __name__ == '__main__':
        main()
    
    • description of the expected behavior and actual behavior Pointers to suspicious code regions are also very welcome.

    error:

    Traceback (most recent call last):
      File "/Users/clients/autocomplete/parseQuery.py", line 4, in <module>
        from PrestoSqlParser import PrestoSqlParser
      File "/Users/clients/autocomplete/PrestoSqlParser.py", line 1837
        self.from = None # QualifiedNameContext
             ^^^^
    SyntaxError: invalid syntax
    
    opened by chengchengpei 2
Releases(4.11.1)
Owner
Antlr Project
The Project organization for the ANTLR parser generator.
Antlr Project
An object oriented PHP driver for FFMpeg binary

php-ffmpeg An Object-Oriented library to convert video/audio files with FFmpeg / AVConv. Check another amazing repo: PHP FFMpeg extras, you will find

null 4.4k Jan 2, 2023
A full PHP implementation of Minecraft's Named Binary Tag (NBT) format.

php-nbt A full PHP implementation of Minecraft's Named Binary Tag (NBT) format. In contrast to other implementations, this library provides full suppo

Aternos 8 Oct 7, 2022
🛬🧾 A PHP8 TacView ACMI file format parser

A PHP8 TacView ACMI file format parser This package offers parsing support for TacView ACMI flight recordings version 2.1, it follows the standard des

Ignacio Muñoz Fernandez 1 Jan 18, 2022
High-quality cloud service for text translation. 100+ Languages

CloudAPI.Stream High-quality cloud service for text translation. 100+ Languages https://cloudapi.stream/ Class initialization $CAS = new CAS; Key inst

CloudAPI.Stream 1 May 8, 2022
PHP library that provides a filesystem abstraction layer − will be a feast for your files!

Gaufrette Gaufrette provides a filesystem abstraction layer. Why use Gaufrette? Imagine you have to manage a lot of medias in a PHP project. Lets see

KNP Labs 2.4k Jan 7, 2023
PHP runtime & extensions header files for PhpStorm

phpstorm-stubs STUBS are normal, syntactically correct PHP files that contain function & class signatures, constant definitions, etc. for all built-in

JetBrains 1.2k Dec 25, 2022
Tailwind plugin to generate purge-safe.txt files

Tailwind plugin to generate safelist.txt files With tailwind-safelist-generator, you can generate a safelist.txt file for your theme based on a set of

Spatie 89 Dec 23, 2022
This small PHP package assists in the loading and parsing of VTT files.

VTT Transcriptions This small PHP package assists in the loading and parsing of VTT files. Usage use Laracasts\Transcriptions\Transcription; $transcr

Laracasts 52 Nov 28, 2022
Associate files with Eloquent models

Associate files with Eloquent models This package can associate all sorts of files with Eloquent models. It provides a simple API to work with. To lea

Spatie 5.2k Jan 4, 2023
This package used to upload files using laravel-media-library before saving model.

Laravel Media Uploader This package used to upload files using laravel-media-library before saving model. In this package all uploaded media will be p

Ahmed Fathy 312 Dec 12, 2022
A python program to cut longer MP3 files (i.e. recordings of several songs) into the individual tracks.

I'm writing a python script to cut longer MP3 files (i.e. recordings of several songs) into the individual tracks called ReCut. So far there are two

Dönerspiess 1 Oct 27, 2021
A Flysystem proxy adapter that enables compression and encryption of files and streams on the fly

Slam / flysystem-compress-and-encrypt-proxy Compress and Encrypt files and streams before saving them to the final Flysystem destination. Installation

Filippo Tessarotto 27 Jun 4, 2022
FileGator - a free, open-source, self-hosted web application for managing files and folders.

FileGator - Powerful Multi-User File Manager FileGator is a free, open-source, self-hosted web application for managing files and folders. You can man

FileGator 1.3k Jan 3, 2023
FileNamingResolver - A lightweight library which helps to resolve a file/directory naming of uploaded files using various naming strategies

FileNamingResolver - A lightweight library which helps to resolve a file/directory naming of uploaded files using various naming strategies

Victor Bocharsky 111 May 19, 2022
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language

php-text-analysis PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP l

null 464 Dec 28, 2022
A pure PHP library for reading and writing word processing documents

Master: Develop: PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats. Th

PHPOffice 6.5k Jan 7, 2023
A language detection library for PHP. Detects the language from a given text string.

language-detection Build Status Code Coverage Version Total Downloads Minimum PHP Version License This library can detect the language of a given text

Patrick Schur 738 Dec 28, 2022