Compiler
A failed attempt to write a compiler for a new Java-like programming language.
For the rewrite that can compile to native code, see my Plinth repository.

This compiler is for a language of my own invention. It is designed to be
somewhat Java-like, but with features from many languages including C++, C#,
Python, Haskell and probably others.

It will have a lot of features that I would like Java to have, which are listed
in the "Language Features" section below. However, one of the main reasons for
this language is that it will compile to native code (LLVM bitcode is the
current plan), and thus be an alternative to C and C++ for desktop applications.

======================
  Compiler Structure
======================

The compiler is divided into several sections.

parser
  A basic LR parser library.
parser.lalr
  An LALR(1) parser generator, and a sample implementation of the parser
  library's interfaces.

compiler.language
  The main language package. It contains no source files yet, but will
  eventually contain the driver code.

compiler.language.parser
  The language's parser.
  The tokenizer is called LanguageTokenizer, and transforms the text it reads
  into the format expected by the LALR(1) parser.
  The set of LALR terminals and non-terminals is contained in the ParseType
  enum, and each non-terminal is generated by a Rule subclass in the 'rules'
  subpackage.
  Running the LALR(1) parser generator on the language's set of rules takes a
  while, so a generated parser is also included here, along with the generator
  for it, as it needs to be regenerated whenever the rules are changed.
compiler.language.parser.rules
  The rules for the language's non-terminals. This is split into 7 subpackages,
  as there are over 150 Rule subclasses altogether.

compiler.language.ast
  The Abstract Syntax Tree, which is generated by the language parser when
  parsing a source file.
  This package contains only the AST data structure. Each class has a
  constructor, getters, and a toString() method for pretty-printing.
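
  A minimal sketch of what such AST classes could look like, assuming the
  compiler itself is written in Java (which the package names and toString()
  suggest). The class names AstSketch, Expression, IntegerLiteral and Addition
  are hypothetical and are not taken from the real package.

```java
// Hypothetical AST classes; the names are illustrative, not the real package's.
final class AstSketch {
    abstract static class Expression {
    }

    static final class IntegerLiteral extends Expression {
        private final int value;

        IntegerLiteral(int value) {
            this.value = value;
        }

        int getValue() {
            return value;
        }

        @Override
        public String toString() {
            return Integer.toString(value);
        }
    }

    static final class Addition extends Expression {
        private final Expression left;
        private final Expression right;

        Addition(Expression left, Expression right) {
            this.left = left;
            this.right = right;
        }

        Expression getLeft() {
            return left;
        }

        Expression getRight() {
            return right;
        }

        // Pretty-prints the subtree by delegating to the children's toString().
        @Override
        public String toString() {
            return "(" + left + " + " + right + ")";
        }
    }

    public static void main(String[] args) {
        Expression sum = new Addition(new IntegerLiteral(1),
                new Addition(new IntegerLiteral(2), new IntegerLiteral(3)));
        System.out.println(sum); // prints (1 + (2 + 3))
    }
}
```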

compiler.language.translator.conceptual
  This converts the provided program from an AST to the conceptual hierarchy.
  This mainly involves resolving all of the names that the AST uses, and
  otherwise mapping from the AST to the conceptual hierarchy.
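
  Name resolution of this kind is commonly done with a chain of nested scopes.
  The sketch below illustrates that general technique in Java; it is not the
  translator's actual code, and the Scope class and the strings used to stand
  in for conceptual entities are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scope chain: each scope knows its own declarations and its parent.
final class Scope {
    private final Scope parent;
    private final Map<String, String> declarations = new HashMap<>();

    Scope(Scope parent) {
        this.parent = parent;
    }

    void declare(String name, String entity) {
        declarations.put(name, entity);
    }

    // Looks the name up in this scope first, then in each enclosing scope.
    // Returns null if the name is not declared anywhere in the chain.
    String resolve(String name) {
        for (Scope scope = this; scope != null; scope = scope.parent) {
            String entity = scope.declarations.get(name);
            if (entity != null) {
                return entity;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Scope classScope = new Scope(null);
        classScope.declare("count", "field Foo.count");
        Scope methodScope = new Scope(classScope);
        methodScope.declare("i", "local variable i");

        System.out.println(methodScope.resolve("i"));       // local variable i
        System.out.println(methodScope.resolve("count"));   // field Foo.count
        System.out.println(methodScope.resolve("missing")); // null
    }
}
```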

compiler.language.conceptual
  This is a data structure which represents the code more-or-less as the
  programmer meant it. Once a program is in this structure, many checks must
  be performed on it to ensure that it is valid before converting it to a more
  low-level structure.


=====================
  Language Features
=====================

Better Access Specifiers
  Access specifiers in Java can be slightly esoteric. Package access is the
  default for historical reasons, and while having protected include package
  access makes sense from a practical perspective, making package and protected
  separate things which can be combined makes more sense logically.
  Therefore, the access specifiers in this language are:
    * private
    * package
    * protected
    * package protected / protected package
    * public
  Also, the defaults are more sensible. Constructors, methods and properties
  are public by default, while fields are private.

Better binary compatibility (if possible)
  The aim is to allow virtual methods and fields to be added in newer versions
  of a program without breaking binary compatibility, using a new "since"
  modifier.
  To use the since modifier, a version number must be provided, like this:
    protected since(1.12.3) void foo() {}
  This has not yet been implemented, but should be possible to implement by
  sorting the members by their since specifier when writing the fields and
  virtual function tables into code.
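
  A rough sketch of the sorting idea, written in Java. It assumes that every
  member added in a new release carries a since version later than anything
  already shipped; under that assumption, existing slots keep their positions.
  The VtableLayout and Member names, and the tie-break by name, are
  hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class VtableLayout {
    // Hypothetical model of a virtual method tagged with since(...).
    static final class Member {
        final String name;
        final int[] since; // e.g. {1, 12, 3} for since(1.12.3)

        Member(String name, int... since) {
            this.name = name;
            this.since = since;
        }
    }

    // Compares version numbers component by component; missing components count as 0.
    static int compareVersions(int[] a, int[] b) {
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Orders slots by since version, then by name (an assumed tie-break).
    static List<String> layout(List<Member> members) {
        List<Member> sorted = new ArrayList<>(members);
        sorted.sort((x, y) -> {
            int byVersion = compareVersions(x.since, y.since);
            return byVersion != 0 ? byVersion : x.name.compareTo(y.name);
        });
        List<String> slots = new ArrayList<>();
        for (Member member : sorted) {
            slots.add(member.name);
        }
        return slots;
    }

    public static void main(String[] args) {
        List<Member> v1 = Arrays.asList(new Member("foo", 1, 0), new Member("bar", 1, 0));
        List<Member> v2 = Arrays.asList(new Member("alpha", 1, 12, 3),
                new Member("foo", 1, 0), new Member("bar", 1, 0));
        System.out.println(layout(v1)); // [bar, foo]
        System.out.println(layout(v2)); // [bar, foo, alpha]
    }
}
```

  In the second layout, bar and foo keep slots 0 and 1 even though alpha would
  sort first by name, which is the property binary compatibility depends on.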

Cleaned-up / different syntax
  Several pieces of syntax may be unfamiliar if you are coming from Java:
  * Casting is written with the target type in angle brackets:
    cast<Type> expression
  * Final does not apply to methods. Instead, 'sealed' is used as in C#, in
    order to separate the two different meanings of 'final' in Java.
  * if statements, while loops, for loops, etc. do not require parentheses
    around their conditions, but do require braces around their blocks.
  * switch statements break at the end of each case by default, and fallthrough
    is a keyword like break and continue, which will fall through to the next
    case. Also, a switch with no value to switch on is possible, in which case
    boolean conditions may be used in case statements. The first case whose
    condition is true is executed.
  * Assignment and inc/decrements do not return a value, and are statements.
  * &&= and ||= are included as operators (as well as ^^ and ^^= for xor).
  * Tabs are not allowed, as they almost always mess up indentation
    (see the Java source code using non-8-width tabs for an example)

Default arguments
  Default arguments are one of the very nice features of Python. In particular,
  keyword arguments are a much nicer way of handling them than the way they are
  handled in C++.
  To add default arguments to a method, write it as follows:
    void foo(int a, int @b = 3, float @x = 5.0 / 3, String @c = "hello") {}
  Then call the method with the default arguments in any order, and only
  specify the required ones:
    foo(3, @c = "world", @b = 90);
  This is not quite as powerful as Python's default arguments, for lack of a
  built-in map data structure, but is equally flexible, and only requires that
  the default expressions/values are known at compile time.
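
  One way a compiler could turn such a call into a plain positional call is to
  fill in every default argument that the caller left out. The Java sketch
  below illustrates that idea for the foo example; it is an assumption about a
  possible implementation, and DefaultArguments, defaultsOfFoo and resolve are
  hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DefaultArguments {
    // Declared defaults, in declaration order, for the example method:
    //   void foo(int a, int @b = 3, float @x = 5.0 / 3, String @c = "hello")
    static Map<String, Object> defaultsOfFoo() {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("b", 3);
        defaults.put("x", 5.0f / 3);
        defaults.put("c", "hello");
        return defaults;
    }

    // Builds the full argument list: the required arguments first, then each
    // default argument in declaration order, using the caller's value if given.
    static List<Object> resolve(List<Object> required,
                                Map<String, Object> defaults,
                                Map<String, Object> supplied) {
        for (String name : supplied.keySet()) {
            if (!defaults.containsKey(name)) {
                throw new IllegalArgumentException("unknown default argument @" + name);
            }
        }
        List<Object> result = new ArrayList<>(required);
        for (Map.Entry<String, Object> entry : defaults.entrySet()) {
            String name = entry.getKey();
            result.add(supplied.containsKey(name) ? supplied.get(name) : entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Models the call: foo(3, @c = "world", @b = 90);
        List<Object> required = new ArrayList<>();
        required.add(3);
        Map<String, Object> supplied = new LinkedHashMap<>();
        supplied.put("c", "world");
        supplied.put("b", 90);
        System.out.println(resolve(required, defaultsOfFoo(), supplied));
    }
}
```

  Because the defaults are known at compile time, this substitution can happen
  entirely at the call site, with no map needed at run time.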

First class functions
  First-class functions are a nice feature of many languages, and would be just
  as useful here. They can provide a much nicer syntax for event
  listeners, as well as a way of performing operations on a collection of
  objects. Combined with generics, they could be extremely useful.
  The syntax for a closure's type is:
    {int, String -> void}
    {void -> boolean}
  They can be assigned from other methods, and can be created inline like this:
    {int, String -> void} myFunction = closure(int a, String b -> void)
    {
      println("" + a + b);
    };
  Calling them is just like calling any other function:
    myFunction(1, "a");
  Unfortunately, there is no plan for first class functions to support default
  arguments.

Generics (with Run-Time Type Information)
  Generics are a very useful feature in the languages they are implemented in.
  This model of generics is based on Java's model, as it allows for flexible
  type systems without being overly powerful.
  One main difference between this and Java's generics is that it will support
  run-time type information, so that arrays of generic objects can be created
  at run time, among other benefits.
  The other difference is that generics here support non-pointer types such as
  primitive data types (int, float, etc.), tuples and closures, as well as
  pointer types.

Immutability
  This is very similar to const-correctness in C++. The only difference is that
  the associated keywords have a more definitive meaning and are not used to
  mean multiple things.
  The 'final' keyword means that a variable's value cannot be changed. For a
  pointer type, this means that it cannot be altered to point at another object.
  The 'immutable' keyword means that a method cannot change the state of the
  object, and can be used on a class to mean that it has a constant state.
  An immutable pointer to an object is denoted by a hash (#) before the type name:
    #Foo foo = new Foo();
  This immutable pointer to an object can then never be used to modify the
  object's state. Calling a non-immutable method on an immutable object pointer
  results in a compiler error.
  Sometimes, a programmer wishes to break immutability for a good reason, for
  example to cache some data. This is the reason for the 'mutable' keyword:
  any field marked as mutable can be modified even from an immutable method.
  In addition, the elements of an array can be marked as immutable:
    char#[] chars = new char[] {'a', 'b', 'c'};
    #Foo#[] array = new Foo[] {new Foo(), null};

Properties
  This is much like the properties system in C#. A property is defined as follows:
    property int foo = 5;                     // public getter and setter
    property String bar = "a" private setter; // public getter, private setter
    final property unsigned int baz = 2;      // no setter, as it is final
    property Asdf asdf getter { return new Asdf(); } private setter;
  As shown, the getter and setter keywords can be omitted if desired, as
  'public getter public setter' is assumed.
  To assign to or retrieve from a property, it is treated as a variable:
    foo = 4; bar = bar + "."; foo++;
  The main use of properties is that they have virtual getters and setters by
  default. Therefore, a subclass can override a superclass' implementation as
  follows:
    property int foo = 2 setter { foo = value + 3; } getter { return foo; };
  To make it impossible to override a property, the 'sealed' modifier is used
  as it is for methods:
    sealed property int zed = 26;
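
  For readers coming from Java, a property can be thought of as a backing field
  plus a virtual getter and setter. The sketch below shows roughly equivalent
  Java for the foo and zed examples. It is an analogy rather than the
  compiler's output; the class names are hypothetical, and giving the
  overriding property its own backing field is an assumption.

```java
final class PropertySketch {
    static class Base {
        // property int foo = 5;
        private int foo = 5;

        int getFoo() {
            return foo;
        }

        void setFoo(int value) {
            foo = value;
        }

        // sealed property int zed = 26;  (final accessors cannot be overridden)
        private int zed = 26;

        final int getZed() {
            return zed;
        }

        final void setZed(int value) {
            zed = value;
        }
    }

    static class Derived extends Base {
        // property int foo = 2 setter { foo = value + 3; } getter { return foo; };
        private int foo = 2;

        @Override
        int getFoo() {
            return foo;
        }

        @Override
        void setFoo(int value) {
            foo = value + 3;
        }
    }

    public static void main(String[] args) {
        Base object = new Derived();
        object.setFoo(4);                    // dispatches to Derived's setter
        System.out.println(object.getFoo()); // prints 7
        System.out.println(object.getZed()); // prints 26
    }
}
```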

Return type as a disambiguator
  The return type of a function serves to disambiguate between multiple
  methods with the same name. So the following is valid within a class:
    void foo() {}
    String foo() {}
  To call these methods, the following can be done:
    cast<void> foo();
    (cast<{void -> void}> foo)(); // cast to a closure and call
    String s = foo();
  This is useful where the same interface is implemented on a single class
  twice, each with a different generic type argument which specifies a method
  return type.
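
  A small Java sketch of how a compiler might pick between such methods, using
  the type the surrounding context expects. This is an illustration under
  assumptions, not the actual resolution algorithm: MethodTable is a
  hypothetical name, parameter lists are ignored, and the rule that a call with
  no expected type is rejected as ambiguous is assumed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class MethodTable {
    // method name -> (return type -> description of the declared method)
    private final Map<String, Map<String, String>> overloads = new HashMap<>();

    void declare(String returnType, String name) {
        Map<String, String> byReturnType =
                overloads.computeIfAbsent(name, key -> new HashMap<>());
        if (byReturnType.containsKey(returnType)) {
            throw new IllegalStateException("duplicate method: " + returnType + " " + name + "()");
        }
        byReturnType.put(returnType, returnType + " " + name + "()");
    }

    // expectedType is the type demanded by the context (an assignment target or
    // a cast), or null when the context gives no type.
    String resolve(String name, String expectedType) {
        Map<String, String> byReturnType =
                overloads.getOrDefault(name, Collections.<String, String>emptyMap());
        if (byReturnType.isEmpty()) {
            throw new IllegalArgumentException("no method named " + name);
        }
        if (expectedType != null) {
            String match = byReturnType.get(expectedType);
            if (match == null) {
                throw new IllegalArgumentException("no " + name + "() returning " + expectedType);
            }
            return match;
        }
        if (byReturnType.size() == 1) {
            return byReturnType.values().iterator().next();
        }
        throw new IllegalArgumentException("ambiguous call to " + name + "(): a cast is needed");
    }

    public static void main(String[] args) {
        MethodTable table = new MethodTable();
        table.declare("void", "foo");
        table.declare("String", "foo");
        System.out.println(table.resolve("foo", "String")); // String foo()
        System.out.println(table.resolve("foo", "void"));   // void foo()
    }
}
```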

Tuples
  The syntax for tuples is as follows. Any number of elements is supported.
    (Asdf, int, String) myTuple = null, -1, "z";
  Tuples are useful for several things, including declaring multiple variables:
    (int, int) x, y;
  returning multiple values from a function:
    x, y = foo();
  and swapping two variables (provided their types match):
    a, b = b, a;
  Also useful here is a way of throwing values away in an assignment:
    _, x, _ = bar(); // discarding the first and last results of bar()
  There is also a way of indexing an element of a tuple:
    (String, int) t = baz(); String s = t ! 0; int i = t ! 1;

Unsigned and Signed integer types
  Unsigned integer types can be very useful, in addition to signed ones.
  Having unsigned int as an option increases the flexibility of the language,
  and means that users do not have to work around the absence of this feature
  by using larger signed data types to store large unsigned values.
    unsigned int a = 2;
    signed int b = -1; // same as: int b = -1;
  The data types are all signed by default, as this causes fewer programming
  errors in loop counters. So the following will work even though an array's
  length is stored unsigned:
    for int i = array.length - 1; i >= 0; i-- { println(i); }
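
  The loop-counter problem can be demonstrated in Java, which has no unsigned
  int type but can compare ints as unsigned with Integer.compareUnsigned. In
  the sketch below, the LoopCounters class and the cap parameter are
  illustrative additions; the cap exists only to stop the loop that would
  otherwise never end.

```java
final class LoopCounters {
    // Signed counter: i >= 0 becomes false once i reaches -1, so the loop ends.
    static int signedIterations(int length) {
        int count = 0;
        for (int i = length - 1; i >= 0; i--) {
            count++;
        }
        return count;
    }

    // The same loop with i compared as an unsigned 32-bit value. An unsigned
    // value is never below zero, so the condition cannot fail on its own;
    // decrementing past 0 wraps around to the largest unsigned value instead.
    static int unsignedIterations(int length, int cap) {
        int count = 0;
        for (int i = length - 1; Integer.compareUnsigned(i, 0) >= 0 && count < cap; i--) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(signedIterations(3));         // prints 3
        System.out.println(unsignedIterations(3, 1000)); // prints 1000 (hit the cap)
    }
}
```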
