NikolajDanger
2022-06-07 21:22:24 +02:00
commit 10aafda84c
10 changed files with 726 additions and 0 deletions

3
.vscode/settings.json vendored Normal file

@@ -0,0 +1,3 @@
{
"esbonio.sphinx.confDir": ""
}

272
README.md Normal file

@@ -0,0 +1,272 @@
# About
`CENTURION` is the programming language for the modern Roman.
# Documentation
## Example code
### Hello World
```
DESIGNA x UT "Hello World!"
DICE x
```
### Recursive Fibonacci number function
```
DEFINI fib x UT {
SI x EST NULLUS TUNC {
REDI NULLUS
} ALUID SI x EST I TUNC {
REDI I
} ALUID {
REDI (INVOCA fib (x-II) + INVOCA fib (x-I))
}
}
```
### Number guessing game
```
VOCA FORS
DESIGNA correct UT FORTIS_NUMERUS I C
DESIGNA guess UT NULLUS
DUM FALSITAS FACE {
DESIGNA guess UT AUDI_NUMERUS
SI guess MINUS correct TUNC {
DICE "Too low!"
} ALUID SI guess PLUS correct TUNC {
DICE "Too high!"
} ALUID {
ERUMPE
}
}
DICE "You guessed correctly!"
```
## Variables
Variables are set with the `DESIGNA` and `UT` keywords. Type is inferred.
```
DESIGNA x UT XXVI
```
Variable names can consist of lower-case letters, numbers, and `_`.
## Data types
### NULLUS
`NULLUS` is a special kind of data type in `CENTURION`, similar to the `null` value in many other languages. `NULLUS` can be 0 if evaluated as an int or float, or an empty string if evaluated as a string. `NULLUS` cannot be evaluated as a boolean.
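The coercion rules above can be sketched in Python. This is only an illustration: `NullusValue` is a hypothetical name, and the interpreter in this commit simply evaluates `NULLUS` to `0`.

```python
class NullusValue:
    """Hypothetical runtime value modelling the NULLUS coercion rules."""

    def __int__(self) -> int:
        # NULLUS is 0 when evaluated as an int.
        return 0

    def __float__(self) -> float:
        # NULLUS is 0 when evaluated as a float.
        return 0.0

    def __str__(self) -> str:
        # NULLUS is the empty string when evaluated as a string.
        return ""

    def __bool__(self) -> bool:
        # NULLUS cannot be evaluated as a boolean.
        raise TypeError("NULLUS cannot be evaluated as a boolean")
```

With this sketch, `int(NullusValue())` gives `0`, `str(NullusValue())` gives `""`, and `bool(NullusValue())` raises a `TypeError`.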
### Strings
Strings are written as text in quotes (`'` or `"`).
```
DESIGNA x UT "this is a string"
```
### Integers
Integers must be written in roman numerals using the following symbols:
|Symbol|Value|
|------|-----|
|`I`|1|
|`V`|5|
|`X`|10|
|`L`|50|
|`C`|100|
|`D`|500|
|`M`|1000|
Each symbol written by itself is equal to the value of the symbol. Different symbols written from largest to smallest are equal to the sum of the symbols. Two or three of the same symbol written consecutively are equal to the sum of those symbols (only true for `I`s, `X`s, `C`s or `M`s). A single `I` written before a `V` or `X` is equal to 1 less than the value of the second symbol. Similarly, an `X` written before an `L` or `C` is 10 less than the second symbol, and a `C` written before a `D` or `M` is 100 less than the second symbol.
Because of the restrictions of roman numerals, numbers above 3,999 are impossible to write in the base `CENTURION` syntax. If numbers of that size are required, see the `MAGNUM` module.
The number 0 can be expressed with the keyword `NULLUS`.
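The numeral rules above can be checked and evaluated with a short Python sketch. `roman_to_int` and `ROMAN_PATTERN` are hypothetical names, not part of this commit, which stores numerals as raw strings and does not convert them yet.

```python
import re

# Strict shape for 1..3999: thousands, hundreds, tens, units.
ROMAN_PATTERN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)
SYMBOL_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a strictly formed roman numeral (1..3999) to an int."""
    if not numeral or not ROMAN_PATTERN.fullmatch(numeral):
        raise ValueError(f"invalid roman numeral: {numeral!r}")
    total = 0
    # Pair every symbol with the one after it (a space marks the end).
    for symbol, following in zip(numeral, numeral[1:] + " "):
        value = SYMBOL_VALUES[symbol]
        # A smaller symbol directly before a larger one is subtracted.
        if following != " " and SYMBOL_VALUES[following] > value:
            total -= value
        else:
            total += value
    return total


print(roman_to_int("XXVI"))  # 26
```

Malformed numerals such as `IIII` or `VX` are rejected with a `ValueError` rather than silently summed.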
#### Negative numbers
Negative numbers can be expressed as `NULLUS` minus the value. For an explicit definition of negative numbers, see the `SUBNULLA` module.
### Floats
The base `CENTURION` syntax does not allow for floats. However, the `FRACTIO` module adds a syntax for fractions.
### Booleans
Booleans are denoted with the keywords `VERITAS` for true and `FALSITAS` for false.
### Arrays
Arrays are defined using square brackets (`[]`).
## Conditionals
### SI/TUNC
If-then statements are denoted with the keywords `SI` (if) and `TUNC` (then). Thus, the code
```
DESIGNA x UT VERITAS
SI x TUNC {
DICE I
REDI NULLUS
}
DICE NULLUS
> I
```
will print `I` (1), as the conditional evaluates `x` to be true and `REDI` ends the program before the final `DICE` is reached.
### Boolean expressions
In conditionals, `EST` tests for equality, while `MINUS` (<) and `PLUS` (>) test for less-than and greater-than.
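A minimal sketch of how these comparisons could be evaluated in Python. The token names are the ones `lexer.py` in this commit produces for the keywords; the `COMPARISONS` table and `compare` function are hypothetical, since `BinOp` has no `eval` method yet.

```python
import operator

# Map the keyword token names to Python comparison functions.
COMPARISONS = {
    "KEYWORD_EST": operator.eq,    # EST: equality
    "KEYWORD_MINUS": operator.lt,  # MINUS: less than
    "KEYWORD_PLUS": operator.gt,   # PLUS: greater than
}


def compare(op_name: str, left: int, right: int) -> bool:
    """Apply the comparison named by a keyword token to two values."""
    return COMPARISONS[op_name](left, right)


print(compare("KEYWORD_MINUS", 3, 7))  # True
```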
### ALUID
When using `SI`/`TUNC` statements, you can also use `ALUID` as an "else".
```
DESIGNA x UT VERITAS
SI x TUNC {
DICE I
} ALUID {
DICE NULLUS
}
> I
```
`SI` statements may follow immediately after `ALUID`.
```
DESIGNA x UT II
SI x EST I TUNC {
DICE I
} ALUID SI x EST II TUNC {
DICE II
} ALUID {
DICE III
}
> II
```
### Boolean operators
The keyword `ET` can be used as a boolean "and". The keyword `AUT` can be used as a boolean "or".
```
DESIGNA x UT VERITAS
DESIGNA y UT FALSITAS
SI x ET y TUNC {
DICE I
} ALUID SI x AUT y TUNC {
DICE II
} ALUID {
DICE III
}
> II
```
## Loops
### DONICUM loops
```
DESIGNA x UT NULLUS
DONICUM y UT NULLUS USQUE X FACE {
DESIGNA x UT x + y
}
DICE x
> XLV
```
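Judging from the expected output `XLV` (45), the upper bound after `USQUE` appears to be exclusive: the loop variable runs from 0 through 9. Under that assumption, the example corresponds to this Python:

```python
x = 0
# y takes the values 0, 1, ..., 9; the bound 10 itself is excluded.
for y in range(0, 10):
    x = x + y
print(x)  # 45
```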
### DUM loops
```
DESIGNA x UT NULLUS
DUM x PLUS X FACE {
DESIGNA x UT x+I
}
DICE x
> XI
```
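Note that the example ends with `XI` (11), so `DUM` behaves like an "until" loop: the body repeats for as long as the condition is false. A rough Python equivalent, assuming that reading:

```python
x = 0
# Repeat until `x > 10` becomes true.
while not x > 10:
    x = x + 1
print(x)  # 11
```

This is also why `DUM FALSITAS FACE { ... }` in the number guessing game loops until `ERUMPE` is reached.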
### PER loops
```
DESIGNA x UT [I, II, III, IV, V]
PER y IN x FACE {
DICE y
}
> I
> II
> III
> IV
> V
```
## Functions
Functions are defined with the `DEFINI` and `UT` keywords. The `REDI` keyword is used to return. `REDI` must have exactly one parameter. `REDI` can also be used to end the program, if used outside of a function.
Calling a function is done with the `INVOCA` keyword.
```
DEFINI square x UT {
REDI (x*x)
}
DICE (INVOCA square XI)
> CXXI
```
## Built-ins
### DICE
### AUDI
### AUDI_NUMERUS
### ERUMPE
### LONGITUDO
## Modules
Modules are additions to the base `CENTURION` syntax. They add or change certain features. Modules are included in your code by having
```VOCA %MODULE NAME%```
at the beginning of your source file.
Unlike many other programming languages with modules, the modules in `CENTURION` are not libraries that can be "imported" from other scripts written in the language. They are features of the compiler, disabled by default.
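One way such compiler-level switches could be enforced is sketched below. `require_module` is a hypothetical helper; the only thing taken from this commit is that `Program.eval` collects the enabled module names into a plain list.

```python
def require_module(modules: list[str], name: str, feature: str) -> None:
    """Raise if `feature` is used without its module being enabled."""
    if name not in modules:
        raise SyntaxError(f"{feature} requires `VOCA {name}`")


enabled = ["FORS"]  # as if the source file started with `VOCA FORS`
require_module(enabled, "FORS", "FORTIS_NUMERUS")  # passes silently
```

Calling `require_module(enabled, "MAGNUM", "_")` with the same list would raise a `SyntaxError`, since `MAGNUM` was not enabled.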
### FORS
```VOCA FORS```
The `FORS` module allows you to add randomness to your `CENTURION` program. It adds 2 new built-in functions: `FORTIS_NUMERUS int int` and `FORTIS_ELECTIONIS ['a]`.
`FORTIS_NUMERUS int int` picks a random int in the (inclusive) range of the two given ints.
`FORTIS_ELECTIONIS ['a]` picks a random element from the given array. `FORTIS_ELECTIONIS array` is identical to ```array[FORTIS_NUMERUS NULLUS ((LONGITUDO array)-I)]```.
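The two built-ins map closely onto Python's `random` module. The function names below are hypothetical stand-ins; this commit only lexes the built-in names and does not implement them yet.

```python
import random


def fortis_numerus(low: int, high: int) -> int:
    """Pick a random int in the inclusive range [low, high]."""
    return random.randint(low, high)


def fortis_electionis(array: list):
    """Pick a random element, using the identity given above."""
    return array[fortis_numerus(0, len(array) - 1)]


print(fortis_electionis(["I", "II", "III"]))
```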
### FRACTIO
```VOCA FRACTIO```
The `FRACTIO` module adds floats, in the form of base 12 fractions.
In the `FRACTIO` module, `.` represents 1/12, `:` represents 1/6 and `S` represents 1/2. The symbols must be written from highest to lowest. So 3/4 would be written as "`S:.`".
Fractions can be written as an extension of integers. So 3.5 would be "`IIIS`".
The symbol `|` can be used to denote that the following fraction symbols are 1 "level down" in base 12. So after the first `|`, the fraction symbols denote 144ths instead of 12ths. So 7 and 100/144 would be "`VIIS:|::`", as "7 + 100/144" is also "7 + 8/12 + 4/144".
A single "set" of fraction symbols can only represent up to 11/12, as 12/12 can be written as 1.
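The fraction rules can be sketched with Python's `fractions` module. `parse_fraction` is a hypothetical helper that only handles the fractional symbols (everything after the integer part); nothing like it exists in this commit yet.

```python
from fractions import Fraction

# Value of each fraction symbol in twelfths.
TWELFTHS = {"S": 6, ":": 2, ".": 1}


def parse_fraction(symbols: str) -> Fraction:
    """Convert FRACTIO symbols such as "S:|::" to an exact fraction."""
    total = Fraction(0)
    # Each `|` moves one level down: 12ths, then 144ths, and so on.
    for level, group in enumerate(symbols.split("|"), start=1):
        values = [TWELFTHS[symbol] for symbol in group]
        if values != sorted(values, reverse=True):
            raise ValueError("symbols must be written from highest to lowest")
        if sum(values) > 11:
            raise ValueError("a single set can only represent up to 11/12")
        total += Fraction(sum(values), 12 ** level)
    return total


print(parse_fraction("S:."))  # 3/4
```

For the two-level example, `parse_fraction("S:|::")` evaluates to 8/12 + 4/144, which `Fraction` reduces to 25/36 (the same value as 100/144).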
### MAGNUM
```VOCA MAGNUM```
`MAGNUM` adds the ability to write integers larger than `MMMCMXCIX` (3,999) in your code, by adding the thousands operator, "`_`".
When `_` is added _after_ a numeric symbol, the symbol becomes 1,000 times larger. The operator can be added to the same symbol multiple times. So "`V_`" is 5,000, and "`V__`" is 5,000,000. The strict rules for integers still apply, so 4,999 cannot be written as "`IV_`", but must instead be written as "`MV_CMXCIX`".
All integer symbols except `I` may be given a `_`.
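A sketch of how the thousands operator could be evaluated in Python. `magnum_to_int` is a hypothetical helper; it applies the `_` multiplier and the subtractive rule, but does not enforce every strict ordering rule for numerals.

```python
import re

SYMBOL_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def magnum_to_int(numeral: str) -> int:
    """Convert a numeral that may use the `_` thousands operator."""
    if not re.fullmatch(r"(?:[IVXLCDM]_*)+", numeral):
        raise ValueError(f"invalid numeral: {numeral!r}")
    values = []
    for symbol, underscores in re.findall(r"([IVXLCDM])(_*)", numeral):
        if symbol == "I" and underscores:
            raise ValueError("`I` may not be given a `_`")
        # Every `_` makes the symbol 1000 times larger.
        values.append(SYMBOL_VALUES[symbol] * 1000 ** len(underscores))
    total = 0
    for value, following in zip(values, values[1:] + [0]):
        # A smaller value directly before a larger one is subtracted.
        total += -value if following > value else value
    return total


print(magnum_to_int("MV_CMXCIX"))  # 4999
```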
### SUBNULLA
```VOCA SUBNULLA```
The `SUBNULLA` module adds the ability to write negative numbers as `-II` instead of `NULLUS-II`.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

181
ast_nodes.py Normal file

@@ -0,0 +1,181 @@
from rply.token import BaseBox


def rep_join(l):
    # Join the reprs of the items, indenting any nested lines.
    format_string = ',\n'.join(
        [repr(i) if not isinstance(i, str) else i for i in l]
    ).replace('\n', '\n ')

    if format_string != "":
        format_string = f"\n {format_string}\n"

    return format_string


class ExpressionStatement(BaseBox):
    def __init__(self, expression) -> None:
        self.expression = expression

    def __repr__(self) -> str:
        return self.expression.__repr__()

    def eval(self, vtable, ftable, modules):
        self.expression.eval(vtable, ftable, modules)
        return vtable, ftable


class String(BaseBox):
    def __init__(self, value) -> None:
        self.value = value

    def __repr__(self):
        return f"String({self.value})"


class Numeral(BaseBox):
    def __init__(self, value) -> None:
        self.value = value

    def __repr__(self):
        return f"Numeral({self.value})"


class Bool(BaseBox):
    def __init__(self, value) -> None:
        self.value = value

    def __repr__(self):
        return f"Bool({self.value})"


class ModuleCall(BaseBox):
    def __init__(self, module_name) -> None:
        self.module_name = module_name

    def __repr__(self) -> str:
        return f"{self.module_name}"


class ID(BaseBox):
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"ID({self.name})"


class Designa(BaseBox):
    def __init__(self, variable: ID, value) -> None:
        self.id = variable
        self.value = value

    def __repr__(self) -> str:
        id_string = repr(self.id).replace('\n', '\n ')
        value_string = repr(self.value).replace('\n', '\n ')
        return f"Designa(\n {id_string},\n {value_string}\n)"

    def eval(self, vtable, ftable, modules):
        # Bind the evaluated value to the variable name.
        vtable[self.id.name] = self.value.eval(vtable, ftable, modules)
        return vtable, ftable


class Defini(BaseBox):
    def __init__(self, name, parameters, statements) -> None:
        self.name = name
        self.parameters = parameters
        self.statements = statements

    def __repr__(self) -> str:
        parameter_string = f"parameters([{rep_join(self.parameters)}])"
        statements_string = f"statements([{rep_join(self.statements)}])"
        def_string = rep_join(
            [f"{repr(self.name)}", parameter_string, statements_string]
        )
        return f"Defini({def_string})"


class Redi(BaseBox):
    def __init__(self, values) -> None:
        self.values = values

    def __repr__(self) -> str:
        values_string = f"[{rep_join(self.values)}]"
        return f"Redi({values_string})"


class Nullus(BaseBox):
    def __repr__(self) -> str:
        return "Nullus()"

    def eval(self, *_):
        return 0


class BinOp(BaseBox):
    def __init__(self, left, right, op) -> None:
        self.left = left
        self.right = right
        self.op = op

    def __repr__(self) -> str:
        binop_string = rep_join([self.left, self.right, self.op])
        return f"BinOp({binop_string})"


class SiStatement(BaseBox):
    def __init__(self, test, statements, else_part) -> None:
        self.test = test
        self.statements = statements
        self.else_part = else_part

    def __repr__(self) -> str:
        test = repr(self.test)
        statements = f"statements([{rep_join(self.statements)}])"
        # The parser passes None when there is no ALUID branch.
        else_part = f"statements([{rep_join(self.else_part or [])}])"
        si_string = rep_join([test, statements, else_part])
        return f"Si({si_string})"


class DumStatement(BaseBox):
    def __init__(self, test, statements) -> None:
        self.test = test
        self.statements = statements

    def __repr__(self) -> str:
        test = repr(self.test)
        statements = f"statements([{rep_join(self.statements)}])"
        dum_string = rep_join([test, statements])
        return f"Dum({dum_string})"

    def eval(self, vtable, ftable, modules):
        # DUM repeats its body until the test becomes true.
        while not self.test.eval(vtable, ftable, modules):
            for statement in self.statements:
                vtable, ftable = statement.eval(vtable, ftable, modules)
        return vtable, ftable


class Invoca(BaseBox):
    def __init__(self, name, parameters) -> None:
        self.name = name
        self.parameters = parameters

    def __repr__(self) -> str:
        parameters_string = f"parameters([{rep_join(self.parameters)}])"
        invoca_string = rep_join([self.name, parameters_string])
        return f"Invoca({invoca_string})"


class BuiltIn(BaseBox):
    def __init__(self, builtin, parameters) -> None:
        self.builtin = builtin
        self.parameters = parameters

    def __repr__(self) -> str:
        parameter_string = f"parameters([{rep_join(self.parameters)}])"
        builtin_string = rep_join([self.builtin, parameter_string])
        return f"Builtin({builtin_string})"

    def eval(self, vtable, ftable, _):
        # Placeholder: built-ins are not implemented yet.
        return None


class Program(BaseBox):
    def __init__(self, module_calls: list[ModuleCall], statements) -> None:
        self.modules = module_calls
        self.statements = statements

    def __repr__(self) -> str:
        modules_string = f"modules([{rep_join(self.modules)}])"
        statements_string = f"statements([{rep_join(self.statements)}])"
        return f"{modules_string},\n{statements_string}"

    def eval(self):
        # Variable table, function table and the names of enabled modules.
        vtable = {}
        ftable = {}
        modules = [module.module_name for module in self.modules]

        for statement in self.statements:
            vtable, ftable = statement.eval(vtable, ftable, modules)

86
lexer.py Normal file

@@ -0,0 +1,86 @@
from rply import LexerGenerator

keyword_tokens = [("KEYWORD_"+i, i) for i in [
    "ALUID",
    "DEFINI",
    "DESIGNA",
    "DONICUM",
    "DUM",
    "EST",
    "FACE",
    "FALSITAS",
    "INVOCA",
    "MINUS",
    "NULLUS",
    "PER",
    "PLUS",
    "REDI",
    "SI",
    "TUNC",
    "USQUE",
    "UT",
    "VERITAS",
    "VOCA"
]]

# Longer names come before their prefixes (AUDI_NUMERUS before AUDI).
builtin_tokens = [("BUILTIN", i) for i in [
    "AUDI_NUMERUS",
    "AUDI",
    "DICE",
    "ERUMPE",
    "FORTIS_NUMERUS",
    "FORTIS_ELECTIONIS",
    "LONGITUDO"
]]

data_tokens = [
    ("DATA_STRING", r"\".*?\""),
    ("DATA_NUMERAL", r"[IVXLCDM]+")
]

module_tokens = [("MODULE", i) for i in [
    "FORS",
    "FRACTIO",
    "MAGNUM",
    "SUBNULLA"
]]

symbol_tokens = [
    ("SYMBOL_LPARENS", r"\("),
    ("SYMBOL_RPARENS", r"\)"),
    ("SYMBOL_LBRACKET", r"\["),
    ("SYMBOL_RBRACKET", r"\]"),
    ("SYMBOL_LCURL", r"\{"),
    ("SYMBOL_RCURL", r"\}"),
    ("SYMBOL_PLUS", r"\+"),
    ("SYMBOL_MINUS", r"\-"),
    ("SYMBOL_TIMES", r"\*"),
    ("SYMBOL_DIVIDE", r"\/")
]

whitespace_tokens = [
    ("NEWLINE", r"\n+")
]

# Rules are tried in order, so keywords take priority over numerals and IDs.
all_tokens = (
    keyword_tokens +
    builtin_tokens +
    module_tokens +
    symbol_tokens +
    data_tokens +
    whitespace_tokens +
    [("ID", r"([a-z]|_)+")]
)


class Lexer():
    def __init__(self):
        self.lexer = LexerGenerator()

    def _add_tokens(self):
        for token in all_tokens:
            self.lexer.add(*token)
        # Spaces separate tokens but carry no meaning themselves.
        self.lexer.ignore(r" +")

    def get_lexer(self):
        self._add_tokens()
        return self.lexer.build()

31
main.py Normal file

@@ -0,0 +1,31 @@
from lexer import Lexer
from parser import Parser

text_input = """
VOCA FORS
DESIGNA correct UT FORTIS_NUMERUS I C
DESIGNA guess UT NULLUS
DUM FALSITAS FACE {
    DESIGNA guess UT AUDI_NUMERUS
    SI guess MINUS correct TUNC {
        DICE "Too low!"
    } ALUID SI guess PLUS correct TUNC {
        DICE "Too high!"
    } ALUID {
        ERUMPE
    }
}
DICE "You guessed correctly!"
"""

lexer = Lexer().get_lexer()

# Register the grammar productions, then build the parser.
pg = Parser()
pg.parse()
parser = pg.get_parser()

tokens = lexer.lex(text_input)

# Print the AST of the example program.
print(parser.parse(tokens))

153
parser.py Normal file

@@ -0,0 +1,153 @@
from rply import ParserGenerator

from lexer import all_tokens
import ast_nodes

# Token names may repeat (BUILTIN, MODULE), so deduplicate them.
ALL_TOKENS = list(set([i[0] for i in all_tokens]))


class Parser():
    def __init__(self):
        self.pg = ParserGenerator(ALL_TOKENS)

    def parse(self):
        @self.pg.production('program : opt_newline module_calls statements')
        def program(tokens):
            return ast_nodes.Program(tokens[-2], tokens[-1])

        @self.pg.production('opt_newline : ')
        @self.pg.production('opt_newline : NEWLINE')
        def opt_newline(_):
            return None

        @self.pg.production('module_calls : ')
        @self.pg.production('module_calls : module_call NEWLINE module_calls')
        def module_calls(calls):
            if len(calls) == 0:
                return []
            else:
                return [calls[0]] + calls[2]

        @self.pg.production('module_call : KEYWORD_VOCA MODULE')
        def module_call(tokens):
            return ast_nodes.ModuleCall(tokens[1].value)

        @self.pg.production('statements : ')
        @self.pg.production('statements : statement NEWLINE statements')
        def statements(calls):
            if len(calls) == 0:
                return []
            else:
                return [calls[0]] + calls[2]

        @self.pg.production('statement : KEYWORD_DESIGNA id KEYWORD_UT expression')
        def statement_designa(tokens):
            return ast_nodes.Designa(tokens[1], tokens[3])

        @self.pg.production('statement : expression')
        def statement_expression(tokens):
            return ast_nodes.ExpressionStatement(tokens[0])

        @self.pg.production('expressions : ')
        @self.pg.production('expressions : expression expressions')
        def expressions(calls):
            if len(calls) == 0:
                return []
            else:
                return [calls[0]] + calls[1]

        @self.pg.production('ids : ')
        @self.pg.production('ids : id ids')
        def ids(calls):
            if len(calls) == 0:
                return []
            else:
                return [calls[0]] + calls[1]

        @self.pg.production('expression : id')
        def expression_id(tokens):
            return tokens[0]

        @self.pg.production('statement : KEYWORD_DEFINI id ids KEYWORD_UT SYMBOL_LCURL opt_newline statements opt_newline SYMBOL_RCURL')
        def defini(tokens):
            return ast_nodes.Defini(tokens[1], tokens[2], tokens[6])

        @self.pg.production('statement : KEYWORD_REDI expressions')
        def redi(tokens):
            return ast_nodes.Redi(tokens[1])

        @self.pg.production('expression : DATA_STRING')
        def expression_string(tokens):
            return ast_nodes.String(tokens[0].value)

        @self.pg.production('expression : DATA_NUMERAL')
        def expression_numeral(tokens):
            return ast_nodes.Numeral(tokens[0].value)

        @self.pg.production('expression : KEYWORD_FALSITAS')
        @self.pg.production('expression : KEYWORD_VERITAS')
        def expression_bool(tokens):
            return ast_nodes.Bool(tokens[0].name == "KEYWORD_VERITAS")

        @self.pg.production('expression : KEYWORD_NULLUS')
        def expression_nullus(_):
            return ast_nodes.Nullus()

        @self.pg.production('expression : expression SYMBOL_MINUS expression')
        @self.pg.production('expression : expression SYMBOL_PLUS expression')
        @self.pg.production('expression : expression KEYWORD_EST expression')
        @self.pg.production('expression : expression KEYWORD_MINUS expression')
        @self.pg.production('expression : expression KEYWORD_PLUS expression')
        def binop(tokens):
            # The operator is stored by token name, e.g. "KEYWORD_EST".
            return ast_nodes.BinOp(tokens[0], tokens[2], tokens[1].name)

        @self.pg.production('expression : BUILTIN expressions')
        def expression_builtin(tokens):
            return ast_nodes.BuiltIn(tokens[0].value, tokens[1])

        @self.pg.production("id : ID")
        def id_expression(tokens):
            return ast_nodes.ID(tokens[0].value)

        @self.pg.production('expression : KEYWORD_INVOCA id expressions')
        def invoca(tokens):
            return ast_nodes.Invoca(tokens[1], tokens[2])

        @self.pg.production('statement : si_statement')
        def si_statement(tokens):
            return tokens[0]

        @self.pg.production('statement : dum_statement')
        def dum_statement(tokens):
            return tokens[0]

        @self.pg.production('si_statement : KEYWORD_SI expression KEYWORD_TUNC SYMBOL_LCURL opt_newline statements opt_newline SYMBOL_RCURL opt_newline aluid_statement')
        def si(tokens):
            return ast_nodes.SiStatement(tokens[1], tokens[5], tokens[9])

        @self.pg.production('dum_statement : KEYWORD_DUM expression KEYWORD_FACE SYMBOL_LCURL opt_newline statements opt_newline SYMBOL_RCURL')
        def dum(tokens):
            return ast_nodes.DumStatement(tokens[1], tokens[5])

        @self.pg.production('aluid_statement : ')
        def aluid_empty(_):
            return None

        @self.pg.production('aluid_statement : KEYWORD_ALUID si_statement')
        def aluid_si(tokens):
            # An "ALUID SI" chain becomes a one-statement else branch.
            return [tokens[1]]

        @self.pg.production('aluid_statement : KEYWORD_ALUID SYMBOL_LCURL opt_newline statements opt_newline SYMBOL_RCURL aluid_statement')
        def aluid(tokens):
            return tokens[3]

        @self.pg.production('expression : SYMBOL_LPARENS expression SYMBOL_RPARENS')
        def parens(tokens):
            return tokens[1]

        @self.pg.error
        def error_handle(token):
            raise ValueError(token)

    def get_parser(self):
        return self.pg.build()