A failed attempt to write a compiler for a new Java-like programming language. For the rewrite that can compile to native code, see my Plinth repository.
This compiler is for a language of my own invention. It is designed to be
somewhat Java-like, but with features from many languages including C++, C#,
Python, Haskell and probably others.
It will have a lot of features that I would like Java to have, which are listed
in the "Language Features" section below. However, one of the main reasons for
this language is that it will compile to native code (LLVM bitcode is the
current plan), and thus be an alternative to C and C++ for desktop applications.
======================
Compiler Structure
======================
The compiler is divided into several sections.
parser
A basic LR parser library.
parser.lalr
An LALR(1) parser generator, and a sample implementation of the parser
library's interfaces.
compiler.language
The main language package. It has no source files yet, but will eventually
contain the driver code.
compiler.language.parser
The language's parser.
The tokenizer is called LanguageTokenizer, and transforms the text it reads
into the format expected by the LALR(1) parser.
The set of LALR terminals and non-terminals is contained in the ParseType
enum, and each non-terminal is generated by a Rule subclass in the 'rules'
subpackage.
Running the LALR(1) parser generator on the language's set of rules takes a
while, so a generated parser is also included here, along with the generator
for it, as it needs to be regenerated whenever the rules are changed.
compiler.language.parser.rules
The rules for the language's non-terminals. This is split into 7 subpackages,
as there are over 150 Rule subclasses altogether.
compiler.language.ast
The Abstract Syntax Tree, which is generated by the language parser when
parsing a source file.
This package contains only the AST data structure. Each class has a
constructor, getters, and a toString() method for pretty-printing.
compiler.language.translator.conceptual
This converts the provided program from an AST to the conceptual hierarchy.
This mainly involves resolving all of the names that the AST uses, and
otherwise mapping from the AST to the conceptual hierarchy.
compiler.language.conceptual
This is a data structure which represents the code more-or-less as the
programmer meant it. Once a program is in this structure, many checks must
be performed on it to ensure that it is valid before converting it to a more
low-level structure.
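As a rough illustration of the convention described for compiler.language.ast (a constructor, getters and a toString() method for pretty-printing), a node might look like the following Java sketch. The names Expression, IntegerLiteral and AdditionExpression are hypothetical and are not taken from the actual source.

```java
// Hypothetical AST classes; none of these names come from the real compiler.
class AstDemo {
    public static void main(String[] args) {
        Expression sum = new AdditionExpression(
            new IntegerLiteral(1),
            new AdditionExpression(new IntegerLiteral(2), new IntegerLiteral(3)));
        // toString() pretty-prints the whole subtree
        System.out.println(sum); // prints (1 + (2 + 3))
    }
}

interface Expression {}

final class IntegerLiteral implements Expression {
    private final long value;

    IntegerLiteral(long value) { this.value = value; }

    long getValue() { return value; }

    @Override
    public String toString() { return Long.toString(value); }
}

final class AdditionExpression implements Expression {
    private final Expression left;
    private final Expression right;

    AdditionExpression(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    Expression getLeft() { return left; }

    Expression getRight() { return right; }

    @Override
    public String toString() { return "(" + left + " + " + right + ")"; }
}
```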
=====================
Language Features
=====================
Better Access Specifiers
Access specifiers in Java can be slightly esoteric. Package access is the
default for historical reasons, and protected access also includes package
access. That makes sense from a practical perspective, but treating package
and protected as separate things which can be combined makes more sense
logically.
Therefore, the access specifiers in this language are:
* private
* package
* protected
* package protected / protected package
* public
Also, the defaults are more sensible. Constructors, methods and properties
are public by default, while fields are private.
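One way to read the specifier list above is as a set of independent grants, where 'private' is the empty set and 'package protected' is the union of two grants. The Java sketch below models that reading. The Grant enum and the canAccess method are hypothetical, and the rule that 'protected' alone excludes unrelated classes in the same package is an assumption drawn from the description, not from the compiler.

```java
import java.util.EnumSet;

class AccessDemo {
    // Each specifier is an independent grant; 'private' is the empty set.
    enum Grant { PACKAGE, PROTECTED, PUBLIC }

    // Decides whether a member with the given grants is visible to an accessor
    // that stands in the given relationship to the declaring class.
    static boolean canAccess(EnumSet<Grant> grants, boolean sameClass,
                             boolean samePackage, boolean isSubclass) {
        if (sameClass || grants.contains(Grant.PUBLIC)) {
            return true;
        }
        if (grants.contains(Grant.PACKAGE) && samePackage) {
            return true;
        }
        return grants.contains(Grant.PROTECTED) && isSubclass;
    }

    public static void main(String[] args) {
        EnumSet<Grant> protectedOnly = EnumSet.of(Grant.PROTECTED);
        EnumSet<Grant> packageProtected = EnumSet.of(Grant.PACKAGE, Grant.PROTECTED);
        // An unrelated class in the same package:
        System.out.println(canAccess(protectedOnly, false, true, false));    // prints false
        System.out.println(canAccess(packageProtected, false, true, false)); // prints true
    }
}
```

In Java itself, the first call would correspond to an access that is allowed, since Java's protected implies package access.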
Better binary compatibility (if possible)
This would allow virtual methods and fields to be added in newer versions of
a program without breaking binary compatibility, using a new "since" modifier.
To use the since modifier, a version number must be provided, like this:
    protected since(1.12.3) void foo() {}
This has not yet been implemented, but should be possible to implement by
sorting the members by their since specifier when writing the fields and
virtual function tables into code.
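The sorting idea can be sketched in Java as follows. This is only an illustration under stated assumptions: the Member class and tableOrder method are hypothetical, members without a since modifier are assumed to come first, and ties are assumed to keep their declaration order. Versions are compared component by component, so 1.2 sorts before 1.12.3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SinceLayoutDemo {
    // Hypothetical member description: a name plus the version given in
    // since(...); an empty version means the member has no since modifier.
    static final class Member {
        final String name;
        final int[] since;

        Member(String name, int... since) {
            this.name = name;
            this.since = since;
        }
    }

    // Returns member names in table order: oldest version first, with
    // declaration order preserved within a version (List.sort is stable).
    static List<String> tableOrder(List<Member> declared) {
        List<Member> sorted = new ArrayList<>(declared);
        sorted.sort((a, b) -> Arrays.compare(a.since, b.since));
        List<String> names = new ArrayList<>();
        for (Member m : sorted) {
            names.add(m.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Member> declared = List.of(
            new Member("foo", 1, 12, 3),
            new Member("bar"),
            new Member("baz", 1, 2),
            new Member("qux"));
        System.out.println(tableOrder(declared)); // prints [bar, qux, baz, foo]
    }
}
```

Because newer members always land after older ones, the slots of members that already existed in an earlier release keep their positions, which is the property binary compatibility needs.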
Cleaned-up / different syntax
Several pieces of syntax may be unfamiliar if you are coming from Java:
* Casting is: cast<type> expression()
* Final does not apply to methods. Instead, 'sealed' is used as in C#, in
  order to separate the two different meanings of 'final' in Java.
* if statements, while loops, for loops, etc. do not require brackets around
  their conditions, but do require braces around their blocks.
* switch statements break at the end of each case by default, and fallthrough
  is a keyword like break and continue, which will fall through to the next
  case. Also, an empty switch is possible, in which case boolean conditions
  may be used in case statements. The first case to fire is executed.
* Assignment and inc/decrements do not return a value, and are statements.
* &&= and ||= are included as operators (as well as ^^ and ^^= for xor).
* Tabs are not allowed, as they almost always mess up indentation
  (for an example, view the Java source code with a tab width other than 8).
Default arguments
Default arguments are one of the very nice features of Python. In particular,
keyword arguments are a much nicer way of handling them than the way they are
handled in C++.
To add default arguments to a method, write it as follows:
    void foo(int a, int @b = 3, float @x = 5.0 / 3, String @c = "hello") {}
Then call the method, passing the default arguments in any order and
specifying only the ones you need:
    foo(3, @c = "world", @b = 90);
This is not quite as powerful as Python's default arguments, for lack of a
built-in map data structure, but it is equally flexible, and only requires
that the default expressions/values are known at compile time.
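Since the defaults are known at compile time, a call such as foo(3, @c = "world", @b = 90) could in principle be expanded into a full positional argument list before code generation. The Java sketch below shows one way that expansion might work. The bind and fooDefaults methods are hypothetical helpers written for this illustration, not part of the compiler.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DefaultArgsDemo {
    // Defaults of the example method foo, in declaration order.
    static Map<String, Object> fooDefaults() {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("b", 3);
        defaults.put("x", 5.0 / 3);
        defaults.put("c", "hello");
        return defaults;
    }

    // Expands a call into one value per parameter: the required arguments
    // first, then each default parameter in declaration order, using the
    // supplied value if there is one and the declared default otherwise.
    static List<Object> bind(List<Object> required,
                             Map<String, Object> defaults,
                             Map<String, Object> supplied) {
        for (String name : supplied.keySet()) {
            if (!defaults.containsKey(name)) {
                throw new IllegalArgumentException("no default argument named @" + name);
            }
        }
        List<Object> args = new ArrayList<>(required);
        for (Map.Entry<String, Object> entry : defaults.entrySet()) {
            args.add(supplied.getOrDefault(entry.getKey(), entry.getValue()));
        }
        return args;
    }

    public static void main(String[] args) {
        // foo(3, @c = "world", @b = 90);
        List<Object> bound = bind(List.of(3), fooDefaults(), Map.of("c", "world", "b", 90));
        System.out.println(bound.get(1) + " " + bound.get(3)); // prints 90 world
    }
}
```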
First class functions
First-class functions are a nice feature of many languages, and they would
be just as useful here. They can provide a much nicer syntax for event
listeners, as well as a way of performing operations on a collection of
objects. Combined with generics, they could be extremely useful.
The syntax for a closure's type is:
    {int, String -> void}
    {void -> boolean}
They can be assigned from other methods, and can be created inline like this:
    {int, String -> void} myFunction = closure(int a, String b -> void)
    {
        println("" + a + b);
    };
Calling them is just like calling any other function:
    myFunction(1, "a");
Unfortunately, there is no plan for first class functions to support default
arguments.
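For comparison, here is roughly how the same example looks in Java, where a closure type has to be expressed through a named functional interface and invoked through that interface's method. Mapping {int, String -> void} to BiConsumer and {void -> boolean} to BooleanSupplier is only an analogy chosen for this sketch.

```java
import java.util.function.BiConsumer;
import java.util.function.BooleanSupplier;

class ClosureDemo {
    // Same body as the closure in the example above, returning the text
    // instead of printing it so that it can be inspected.
    static String describe(int a, String b) {
        return "" + a + b;
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();

        // Roughly {int, String -> void}
        BiConsumer<Integer, String> myFunction = (a, b) -> log.append(describe(a, b));
        // Roughly {void -> boolean}
        BooleanSupplier isEmpty = () -> log.length() == 0;

        // Java needs the interface method name; the language described
        // here would allow myFunction(1, "a") directly.
        myFunction.accept(1, "a");
        System.out.println(log + " " + isEmpty.getAsBoolean()); // prints 1a false
    }
}
```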
Generics (with Run-Time Type Information)
Generics are a very useful feature in the languages they are implemented in.
This model of generics is based on Java's model, as it allows for flexible
type systems without being overly powerful.
One main difference between this and Java's generics is that it will support
run-time type information, so that arrays of generic objects can be created
at run time, among other benefits.
The other difference is that generics here accept non-pointer types, such as
primitive data types (int, float, etc.), tuples and closures, as well as
pointer types.
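To show what run-time type information buys, the Java sketch below demonstrates the workaround Java needs today: because type arguments are erased, "new T[n]" does not compile, and the element type has to be passed in explicitly as a Class object. The newArray helper is hypothetical. With the generics described above, the type argument itself would be available at run time.

```java
import java.lang.reflect.Array;

class GenericArrayDemo {
    // Java cannot write "new T[length]", so the caller supplies the element
    // type and the array is created reflectively.
    @SuppressWarnings("unchecked")
    static <T> T[] newArray(Class<T> elementType, int length) {
        return (T[]) Array.newInstance(elementType, length);
    }

    public static void main(String[] args) {
        String[] names = newArray(String.class, 3);
        System.out.println(names.getClass().getComponentType().getSimpleName()
            + " x " + names.length); // prints String x 3
    }
}
```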
Immutability
This is very similar to const-correctness in C++. The only difference is that
each of the associated keywords has a single, well-defined meaning, rather
than being used to mean multiple things.
The 'final' keyword means that a variable's value cannot be changed. For a
pointer type, this means that it cannot be altered to point at another object.
The 'immutable' keyword means that a method cannot change the state of the
object, and can be used on a class to mean that it has a constant state.
An immutable pointer to an object is denoted by a hash (#) before the type name:
    #Foo foo = new Foo();
This immutable pointer to an object can then never be used to modify the
object's state. Calling a non-immutable method on an immutable object pointer
results in a compiler error.
Sometimes, a programmer wishes to break immutability for a good reason, for
example to cache some data. This is the reason for the 'mutable' keyword:
any field marked as mutable can be modified even from an immutable method.
In addition, the elements of an array can be marked as immutable:
    char#[] chars = new char[] {'a', 'b', 'c'};
    #Foo#[] array = new Foo[] {new Foo(), null};
Properties
This is much like the properties system in C#. A property is defined as follows:
    property int foo = 5;                      // public getter and setter
    property String bar = "a" private setter;  // public getter, private setter
    final property unsigned int baz = 2;       // no setter, as it is final
    property Asdf asdf getter { return new Asdf(); } private setter;
As shown, the getter and setter keywords can be omitted if desired, as
'public getter public setter' is assumed.
To assign to or retrieve from a property, it is treated as a variable:
    foo = 4;
    bar = bar + ".";
    baz++;
The main use of properties is that they have virtual getters and setters by
default. Therefore, a subclass can override a superclass' implementation as
follows:
    property int foo = 2 setter { foo = value + 3; } getter { return foo; };
To make it impossible to override a property, the 'sealed' modifier is used
as it is for methods:
    sealed property int zed = 26;
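For readers coming from Java, the overriding example above corresponds roughly to the hand-written accessors below. This is only an analogy: the class names and the assignThenRead helper are hypothetical, and the subclass delegates to super.setFoo where the property syntax assigns to foo directly.

```java
class PropertyDemo {
    // Equivalent of: property int foo = 5;
    static class Base {
        private int foo = 5;

        public int getFoo() { return foo; }

        public void setFoo(int value) { foo = value; }
    }

    // Equivalent of overriding the setter with: setter { foo = value + 3; }
    static class Derived extends Base {
        @Override
        public void setFoo(int value) { super.setFoo(value + 3); }
    }

    // "b.foo = value; return b.foo;" written out with explicit accessor calls
    static int assignThenRead(Base b, int value) {
        b.setFoo(value);
        return b.getFoo();
    }

    public static void main(String[] args) {
        System.out.println(assignThenRead(new Base(), 4));    // prints 4
        System.out.println(assignThenRead(new Derived(), 4)); // prints 7
    }
}
```

The accessors are virtual, so the caller does not need to know which implementation it is talking to, which is the behaviour the property syntax provides by default.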
Return type as a disambiguator
The return type of a function serves to disambiguate between multiple
methods with the same name. So the following is valid within a class:
    void foo() {}
    String foo() {}
To call these methods, the following can be done:
    cast<void> foo();
    (cast<{void -> void}> foo)(); // cast to a closure and call
    String s = foo();
This is useful where the same interface is implemented on a single class
twice, each with a different generic type argument which specifies a method
return type.
Tuples
The syntax for tuples is as follows. Any number of elements is supported.
    (Asdf, int, String) myTuple = null, -1, "z";
This is very useful for declaring multiple variables:
    (int, int) x, y;
Returning multiple values from a function:
    x, y = foo();
Swapping two variables (provided their types match):
    a, b = b, a;
As well as various other things.
Also useful here is a way of throwing values away in an assignment:
    _, x, _ = bar(); // discarding the first and last results of bar()
There is also a way of indexing an element of a tuple:
    (String, int) t = baz();
    String s = t ! 0;
    int i = t ! 1;
Unsigned and Signed integer types
Unsigned integer types can be very useful, in addition to signed ones.
Having unsigned int as an option increases the flexibility of the language,
and means that users do not have to work around the absence of this feature
by using larger signed data types to store large unsigned values.
    unsigned int a = 2;
    signed int b = -1; // same as: int b = -1;
The data types are all signed by default, as this causes fewer programming
errors in loop counters. So the following will work even though an array's
length is stored unsigned:
    for int i = array.length - 1; i >= 0; i-- {
        println(i);
    }
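The loop-counter error that signed defaults avoid can be demonstrated in Java, which has no unsigned int type but can compare 32-bit values as unsigned. In the sketch below, the method names and the artificial iteration cap are assumptions made for the demonstration. With an unsigned counter the condition i >= 0 never becomes false, so the countdown would run forever without the cap.

```java
class UnsignedLoopDemo {
    // The loop from the example above with a signed counter.
    static int signedIterations(int length) {
        int count = 0;
        for (int i = length - 1; i >= 0; i--) {
            count++;
        }
        return count;
    }

    // The same loop with the counter compared as a 32-bit unsigned value.
    // "i >= 0" can never be false, so only the artificial cap ends the loop.
    static int unsignedIterations(int length, int cap) {
        int count = 0;
        int i = length - 1;
        while (Integer.compareUnsigned(i, 0) >= 0 && count < cap) {
            count++;
            i--; // wraps from 0 to 4294967295 when read as unsigned
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(signedIterations(3));          // prints 3
        System.out.println(unsignedIterations(3, 1000));  // prints 1000
        System.out.println(Integer.toUnsignedString(-1)); // prints 4294967295
    }
}
```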
