commit 10aafda84c92a8c44ab28a8a33fd439a53504087
Author: NikolajDanger
Date:   Tue Jun 7 21:22:24 2022 +0200

    :sparkles:

diff --git a/.vscode/settings.json b/.vscode/settings.json
new file mode 100644
index 0000000..a7d0fc7
--- /dev/null
+++ b/.vscode/settings.json
@@ -0,0 +1,3 @@
+{
+    "esbonio.sphinx.confDir": ""
+}
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6ab536a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,272 @@
+# About
+`CENTURION` is the programming language for the modern Roman.
+
+# Documentation
+
+## Example code
+### Hello World
+```
+DESIGNA x UT "Hello World!"
+DICE x
+```
+
+### Recursive Fibonacci number function
+
+```
+DEFINI fib x UT {
+    SI x EST NULLUS TUNC {
+        REDI NULLUS
+    } ALUID SI x EST I TUNC {
+        REDI I
+    } ALUID {
+        REDI (INVOCA fib (x-II) + INVOCA fib (x-I))
+    }
+}
+```
+
+### Number guessing game
+
+```
+VOCA FORS
+
+DESIGNA correct UT FORTIS_NUMERUS I C
+DESIGNA guess UT NULLUS
+
+DUM FALSITAS FACE {
+    DESIGNA guess UT AUDI_NUMERUS
+    SI guess MINUS correct TUNC {
+        DICE "Too low!"
+    } ALUID SI guess PLUS correct TUNC {
+        DICE "Too high!"
+    } ALUID {
+        ERUMPE
+    }
+}
+
+DICE "You guessed correctly!"
+```
+
+## Variables
+Variables are set with the `DESIGNA` and `UT` keywords. Type is inferred.
+
+```
+DESIGNA x UT XXVI
+```
+
+Variable names can consist of lower-case letters, numbers, and `_`.
+
+## Data types
+### NULLUS
+`NULLUS` is a special kind of data type in `CENTURION`, similar to the `null` value in many other languages. `NULLUS` evaluates to 0 when used as an int or float, and to an empty string when used as a string. `NULLUS` cannot be evaluated as a boolean.
+
+### Strings
+
+Strings are written as text in quotes (`'` or `"`).
+
+```
+DESIGNA x UT "this is a string"
+```
+
+### Integers
+Integers must be written in Roman numerals using the following symbols:
+
+|Symbol|Value|
+|------|-----|
+|`I`|1|
+|`V`|5|
+|`X`|10|
+|`L`|50|
+|`C`|100|
+|`D`|500|
+|`M`|1000|
+
+Each symbol written by itself is equal to the value of the symbol. Different symbols written from largest to smallest are equal to the sum of the symbols. Two to three of the same symbol written consecutively are equal to the sum of those symbols (only true for `I`s, `X`s, `C`s or `M`s). A single `I` written before a `V` or `X` is equal to 1 less than the value of the second symbol. Similarly, an `X` written before an `L` or `C` is 10 less than the second symbol, and a `C` written before a `D` or `M` is 100 less than the second symbol.
+
+Because of the restrictions of Roman numerals, numbers above 3,999 are impossible to write in the base `CENTURION` syntax. If numbers of that size are required, see the `MAGNUM` module.
+
+The number 0 can be expressed with the keyword `NULLUS`.
+
+#### Negative numbers
+Negative numbers can be expressed as `NULLUS` minus the value. For an explicit definition of negative numbers, see the `SUBNULLA` module.
+
+### Floats
+The base `CENTURION` syntax does not allow for floats. However, the `FRACTIO` module adds a syntax for fractions.
+
+### Booleans
+Booleans are denoted with the keywords `VERITAS` for true and `FALSITAS` for false.
+
+### Arrays
+Arrays are defined using square brackets (`[]`).
+
+## Conditionals
+### SI/TUNC
+If-then statements are denoted with the keywords `SI` (if) and `TUNC` (then). Thus, the code
+
+```
+DESIGNA x UT VERITAS
+SI x TUNC {
+    DICE I
+    REDI NULLUS
+}
+
+DICE NULLUS
+
+> I
+```
+
+will print `I` (1), as the conditional evaluates `x` to be true.
+
+### Boolean expressions
+In conditionals, `EST` functions as an equality evaluation, and `MINUS` (<) and `PLUS` (>) function as inequality evaluations.
+
+### ALUID
+
+When using `SI`/`TUNC` statements, you can also use `ALUID` as an "else".
+
+```
+DESIGNA x UT VERITAS
+SI x TUNC {
+    DICE I
+} ALUID {
+    DICE NULLUS
+}
+
+> I
+```
+
+`SI` statements may follow immediately after `ALUID`.
+
+```
+DESIGNA x UT II
+SI x EST I TUNC
+    DICE I
+ALUID SI x EST II TUNC
+    DICE II
+ALUID
+    DICE III
+
+> II
+```
+
+### Boolean operators
+
+The keyword `ET` can be used as a boolean "and". The keyword `AUT` can be used as a boolean "or".
+
+```
+DESIGNA x UT VERITAS
+DESIGNA y UT FALSITAS
+SI x ET y TUNC {
+    DICE I
+} ALUID SI x AUT y TUNC {
+    DICE II
+} ALUID {
+    DICE III
+}
+
+> II
+```
+
+## Loops
+### DONICUM loops
+
+```
+DESIGNA x UT NULLUS
+DONICUM y UT NULLUS USQUE X FACE {
+    DESIGNA x UT x + y
+}
+DICE x
+
+> XLV
+```
+
+### DUM loops
+```
+DESIGNA x UT NULLUS
+DUM x PLUS X FACE {
+    DESIGNA x UT x+I
+}
+DICE x
+
+> XI
+```
+
+### PER loops
+```
+DESIGNA x UT [I, II, III, IV, V]
+PER y IN x FACE {
+    DICE y
+}
+
+> I
+> II
+> III
+> IV
+> V
+```
+
+## Functions
+Functions are defined with the `DEFINI` and `UT` keywords. The `REDI` keyword is used to return. `REDI` must have exactly one parameter. `REDI` can also be used to end the program, if used outside of a function.
+
+Calling a function is done with the `INVOCA` keyword.
+
+```
+DEFINI square x UT {
+    REDI (x*x)
+}
+
+DICE (INVOCA square XI)
+
+> CXXI
+```
+
+## Built-ins
+### DICE
+### AUDI
+### AUDI_NUMERUS
+### ERUMPE
+### LONGITUDO
+
+## Modules
+Modules are additions to the base `CENTURION` syntax. They add or change certain features. Modules are included in your code by having
+
+```VOCA %MODULE NAME%```
+
+at the beginning of your source file.
+
+Unlike many other programming languages with modules, the modules in `CENTURION` are not libraries that can be "imported" from other scripts written in the language. They are features of the compiler, disabled by default.
+
+### FORS
+```VOCA FORS```
+
+The `FORS` module allows you to add randomness to your `CENTURION` program. It adds 2 new built-in functions: `FORTIS_NUMERUS int int` and `FORTIS_ELECTIONIS ['a]`.
+
+`FORTIS_NUMERUS int int` picks a random int in the (inclusive) range of the two given ints.
+
+`FORTIS_ELECTIONIS ['a]` picks a random element from the given array. `FORTIS_ELECTIONIS array` is identical to ```array[FORTIS_NUMERUS NULLUS ((LONGITUDO array)-I)]```.
+
+### FRACTIO
+```VOCA FRACTIO```
+
+The `FRACTIO` module adds floats, in the form of base 12 fractions.
+
+In the `FRACTIO` module, `.` represents 1/12, `:` represents 1/6 and `S` represents 1/2. The symbols must be written from highest to lowest. So 3/4 would be written as "`S:.`".
+
+Fractions can be written as an extension of integers. So 3.5 would be "`IIIS`".
+
+The symbol `|` can be used to denote that the following fraction symbols are 1 "level down" in base 12. So after the first `|`, the fraction symbols denote 144ths instead of 12ths. So 7 and 100/144 would be "`VIIS:|::`", as "7 + 100/144" is also "7+8/12+4/144".
+
+A single "set" of fraction symbols can only represent up to 11/12, as 12/12 can be written as 1.
+
+### MAGNUM
+```VOCA MAGNUM```
+
+`MAGNUM` adds the ability to write integers larger than `MMMCMXCIX` (3,999) in your code, by adding the thousands operator, "`_`".
+
+When `_` is added _after_ a numeric symbol, the symbol becomes 1,000 times larger. The operator can be added to the same symbol multiple times. So "`V_`" is 5,000, and "`V__`" is 5,000,000. The strict rules for integers still apply, so 4,999 cannot be written as "`IV_`", but must instead be written as "`MV_CMXCIX`".
+
+All integer symbols except `I` may be given a `_`.
+
+### SUBNULLA
+```VOCA SUBNULLA```
+
+The `SUBNULLA` module adds the ability to write negative numbers as `-II` instead of `NULLUS-II`.
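Note on the README above: the numeral rules in its Integers section can be sketched in Python roughly as follows. This is only an illustration under stated assumptions. The names `roman_to_int`, `SYMBOL_VALUES` and `STRICT_NUMERAL` are hypothetical and do not appear in this commit, where `Numeral` simply stores the raw token text. The sketch covers the base syntax only (1 to 3,999), without the `MAGNUM` or `FRACTIO` extensions.

```python
import re

# Hypothetical helper, not part of the commit: a strict reader for the base
# CENTURION numerals (I up to MMMCMXCIX), without any module extensions.
SYMBOL_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Thousands, hundreds, tens, ones. Each group only admits the forms the
# README describes (at most three repeats, one subtractive prefix).
STRICT_NUMERAL = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def roman_to_int(numeral: str) -> int:
    """Convert a strict Roman numeral to an int, raising ValueError otherwise."""
    if not numeral or not STRICT_NUMERAL.fullmatch(numeral):
        raise ValueError(f"not a valid numeral: {numeral!r}")
    total = 0
    # Pair every symbol with the one that follows it (a space marks the end).
    for symbol, following in zip(numeral, numeral[1:] + " "):
        value = SYMBOL_VALUES[symbol]
        if following != " " and value < SYMBOL_VALUES[following]:
            # A smaller symbol directly before a larger one is subtracted.
            total -= value
        else:
            total += value
    return total
```

Validating against the pattern first keeps the summing loop simple: by the time it runs, forms such as `IIII` or `VX` have already been rejected.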
diff --git a/__pycache__/ast.cpython-310.pyc b/__pycache__/ast.cpython-310.pyc
new file mode 100644
index 0000000..dc40acc
Binary files /dev/null and b/__pycache__/ast.cpython-310.pyc differ
diff --git a/__pycache__/ast_nodes.cpython-310.pyc b/__pycache__/ast_nodes.cpython-310.pyc
new file mode 100644
index 0000000..52ca644
Binary files /dev/null and b/__pycache__/ast_nodes.cpython-310.pyc differ
diff --git a/__pycache__/lexer.cpython-310.pyc b/__pycache__/lexer.cpython-310.pyc
new file mode 100644
index 0000000..dad3329
Binary files /dev/null and b/__pycache__/lexer.cpython-310.pyc differ
diff --git a/__pycache__/parser.cpython-310.pyc b/__pycache__/parser.cpython-310.pyc
new file mode 100644
index 0000000..6f5c41d
Binary files /dev/null and b/__pycache__/parser.cpython-310.pyc differ
diff --git a/ast_nodes.py b/ast_nodes.py
new file mode 100644
index 0000000..1326c1d
--- /dev/null
+++ b/ast_nodes.py
@@ -0,0 +1,181 @@
+from rply.token import BaseBox
+
+def rep_join(l):
+    format_string = ',\n'.join(
+        [repr(i) if not isinstance(i, str) else i for i in l]
+    ).replace('\n', '\n ')
+
+    if format_string != "":
+        format_string = f"\n {format_string}\n"
+
+    return format_string
+
+class ExpressionStatement(BaseBox):
+    def __init__(self, expression) -> None:
+        self.expression = expression
+
+    def __repr__(self) -> str:
+        return self.expression.__repr__()
+
+    def eval(self, vtable, ftable, modules):
+        self.expression.eval(vtable, ftable, modules)
+        return vtable, ftable
+
+class String(BaseBox):
+    def __init__(self, value) -> None:
+        self.value = value
+
+    def __repr__(self):
+        return f"String({self.value})"
+
+class Numeral(BaseBox):
+    def __init__(self, value) -> None:
+        self.value = value
+
+    def __repr__(self):
+        return f"Numeral({self.value})"
+
+class Bool(BaseBox):
+    def __init__(self, value) -> None:
+        self.value = value
+
+    def __repr__(self):
+        return f"Bool({self.value})"
+
+class ModuleCall(BaseBox):
+    def __init__(self, module_name) -> None:
+        self.module_name = module_name
+
+    def __repr__(self) -> str:
+        return f"{self.module_name}"
+
+class ID(BaseBox):
+    def __init__(self, name: str) -> None:
+        self.name = name
+
+    def __repr__(self) -> str:
+        return f"ID({self.name})"
+
+class Designa(BaseBox):
+    def __init__(self, variable: ID, value) -> None:
+        self.id = variable
+        self.value = value
+
+    def __repr__(self) -> str:
+        id_string = repr(self.id).replace('\n', '\n ')
+        value_string = repr(self.value).replace('\n', '\n ')
+        return f"Designa(\n {id_string},\n {value_string}\n)"
+
+    def eval(self, vtable, ftable, modules):
+        vtable[self.id.name] = self.value.eval(vtable, ftable, modules)
+        return vtable, ftable
+
+class Defini(BaseBox):
+    def __init__(self, name, parameters, statements) -> None:
+        self.name = name
+        self.parameters = parameters
+        self.statements = statements
+
+    def __repr__(self) -> str:
+        parameter_string = f"parameters([{rep_join(self.parameters)}])"
+        statements_string = f"statements([{rep_join(self.statements)}])"
+        def_string = rep_join(
+            [f"{repr(self.name)}", parameter_string, statements_string]
+        )
+        return f"Defini({def_string})"
+
+class Redi(BaseBox):
+    def __init__(self, values) -> None:
+        self.values = values
+
+    def __repr__(self) -> str:
+        values_string = f"[{rep_join(self.values)}]"
+        return f"Redi({values_string})"
+
+class Nullus(BaseBox):
+    def __repr__(self) -> str:
+        return "Nullus()"
+
+    def eval(self, *_):
+        return 0
+
+class BinOp(BaseBox):
+    def __init__(self, left, right, op) -> None:
+        self.left = left
+        self.right = right
+        self.op = op
+
+    def __repr__(self) -> str:
+        binop_string = rep_join([self.left, self.right, self.op])
+        return f"BinOp({binop_string})"
+
+class SiStatement(BaseBox):
+    def __init__(self, test, statements, else_part) -> None:
+        self.test = test
+        self.statements = statements
+        self.else_part = else_part
+
+    def __repr__(self) -> str:
+        test = repr(self.test)
+        statements = f"statements([{rep_join(self.statements)}])"
+        else_part = f"statements([{rep_join(self.else_part)}])"
+        si_string = rep_join([test, statements, else_part])
+        return f"Si({si_string})"
+
+class DumStatement(BaseBox):
+    def __init__(self, test, statements) -> None:
+        self.test = test
+        self.statements = statements
+
+    def __repr__(self) -> str:
+        test = repr(self.test)
+        statements = f"statements([{rep_join(self.statements)}])"
+        dum_string = rep_join([test, statements])
+        return f"Dum({dum_string})"
+
+    def eval(self, vtable, ftable, modules):
+        while not self.test.eval(vtable, ftable, modules):
+            pass
+
+        return vtable, ftable
+
+class Invoca(BaseBox):
+    def __init__(self, name, parameters) -> None:
+        self.name = name
+        self.parameters = parameters
+
+    def __repr__(self) -> str:
+        parameters_string = f"parameters([{rep_join(self.parameters)}])"
+        invoca_string = rep_join([self.name, parameters_string])
+        return f"Invoca({invoca_string})"
+
+class BuiltIn(BaseBox):
+    def __init__(self, builtin, parameters) -> None:
+        self.builtin = builtin
+        self.parameters = parameters
+
+    def __repr__(self) -> str:
+        parameter_string = f"parameters([{rep_join(self.parameters)}])"
+        builtin_string = rep_join([self.builtin, parameter_string])
+        return f"Builtin({builtin_string})"
+
+    def eval(self, vtable, ftable, _):
+        return None
+
+class Program(BaseBox):
+    def __init__(self, module_calls: list[ModuleCall], statements) -> None:
+        self.modules = module_calls
+        self.statements = statements
+
+    def __repr__(self) -> str:
+        modules_string = f"modules([{rep_join(self.modules)}])"
+        statements_string = f"statements([{rep_join(self.statements)}])"
+        return f"{modules_string},\n{statements_string}"
+
+    def eval(self):
+        vtable = {}
+        ftable = {}
+        modules = [module.module_name for module in self.modules]
+
+        for statement in self.statements:
+            vtable, ftable = statement.eval(vtable, ftable, modules)
diff --git a/lexer.py b/lexer.py
new file mode 100644
index 0000000..9d11dbb
--- /dev/null
+++ b/lexer.py
@@ -0,0 +1,86 @@
+from rply import LexerGenerator
+
+keyword_tokens = [("KEYWORD_"+i, i) for i in [
+    "ALUID",
+    "DEFINI",
+    "DESIGNA",
+    "DONICUM",
+    "DUM",
+    "EST",
+    "FACE",
+    "FALSITAS",
+    "INVOCA",
+    "MINUS",
+    "NULLUS",
+    "PER",
+    "PLUS",
+    "REDI",
+    "SI",
+    "TUNC",
+    "USQUE",
+    "UT",
+    "VERITAS",
+    "VOCA"
+]]
+
+builtin_tokens = [("BUILTIN", i) for i in [
+    "AUDI_NUMERUS",
+    "AUDI",
+    "DICE",
+    "ERUMPE",
+    "FORTIS_NUMERUS",
+    "FORTIS_ELECTIONIS",
+    "LONGITUDO"
+]]
+
+data_tokens = [
+    ("DATA_STRING", r"\".*?\""),
+    ("DATA_NUMERAL", r"[IVXLCDM]+")
+]
+
+module_tokens = [("MODULE", i) for i in [
+    "FORS",
+    "FRACTIO",
+    "MAGNUM",
+    "SUBNULLA"
+]]
+
+symbol_tokens = [
+    ("SYMBOL_LPARENS", r"\("),
+    ("SYMBOL_RPARENS", r"\)"),
+    ("SYMBOL_LBRACKET", r"\["),
+    ("SYMBOL_RBRACKET", r"\]"),
+    ("SYMBOL_LCURL", r"\{"),
+    ("SYMBOL_RCURL", r"\}"),
+    ("SYMBOL_PLUS", r"\+"),
+    ("SYMBOL_MINUS", r"\-"),
+    ("SYMBOL_TIMES", r"\*"),
+    ("SYMBOL_DIVIDE", r"\/")
+]
+
+whitespace_tokens = [
+    ("NEWLINE", r"\n+")
+]
+
+all_tokens = (
+    keyword_tokens +
+    builtin_tokens +
+    module_tokens +
+    symbol_tokens +
+    data_tokens +
+    whitespace_tokens +
+    [("ID", r"([a-z]|_)+")]
+)
+
+class Lexer():
+    def __init__(self):
+        self.lexer = LexerGenerator()
+
+    def _add_tokens(self):
+        for token in all_tokens:
+            self.lexer.add(*token)
+        self.lexer.ignore(r" +")
+
+    def get_lexer(self):
+        self._add_tokens()
+        return self.lexer.build()
diff --git a/main.py b/main.py
new file mode 100644
index 0000000..70fb4fb
--- /dev/null
+++ b/main.py
@@ -0,0 +1,31 @@
+from lexer import Lexer
+from parser import Parser
+
+text_input = """
+VOCA FORS
+
+DESIGNA correct UT FORTIS_NUMERUS I C
+DESIGNA guess UT NULLUS
+
+DUM FALSITAS FACE {
+    DESIGNA guess UT AUDI_NUMERUS
+    SI guess MINUS correct TUNC {
+        DICE "Too low!"
+    } ALUID SI guess PLUS correct TUNC {
+        DICE "Too high!"
+    } ALUID {
+        ERUMPE
+    }
+}
+
+DICE "You guessed correctly!"
+"""
+
+lexer = Lexer().get_lexer()
+pg = Parser()
+pg.parse()
+parser = pg.get_parser()
+
+tokens = lexer.lex(text_input)
+
+print(parser.parse(tokens))
diff --git a/parser.py b/parser.py
new file mode 100644
index 0000000..92953f9
--- /dev/null
+++ b/parser.py
@@ -0,0 +1,153 @@
+from rply import ParserGenerator
+
+from lexer import all_tokens
+import ast_nodes
+
+ALL_TOKENS = list(set([i[0] for i in all_tokens]))
+
+class Parser():
+    def __init__(self):
+        self.pg = ParserGenerator(ALL_TOKENS)
+
+    def parse(self):
+        @self.pg.production('program : opt_newline module_calls statements')
+        def program(tokens):
+            return ast_nodes.Program(tokens[-2], tokens[-1])
+
+        @self.pg.production('opt_newline : ')
+        @self.pg.production('opt_newline : NEWLINE')
+        def opt_newline(_):
+            return None
+
+        @self.pg.production('module_calls : ')
+        @self.pg.production('module_calls : module_call NEWLINE module_calls')
+        def module_calls(calls):
+            if len(calls) == 0:
+                return []
+            else:
+                return [calls[0]] + calls[2]
+
+        @self.pg.production('module_call : KEYWORD_VOCA MODULE')
+        def module_call(tokens):
+            return ast_nodes.ModuleCall(tokens[1].value)
+
+        @self.pg.production('statements : ')
+        @self.pg.production('statements : statement NEWLINE statements')
+        def statements(calls):
+            if len(calls) == 0:
+                return []
+            else:
+                return [calls[0]] + calls[2]
+
+        @self.pg.production('statement : KEYWORD_DESIGNA id KEYWORD_UT expression')
+        def statement_designa(tokens):
+            return ast_nodes.Designa(tokens[1], tokens[3])
+
+        @self.pg.production('statement : expression')
+        def statement_expression(tokens):
+            return ast_nodes.ExpressionStatement(tokens[0])
+
+        @self.pg.production('expressions : ')
+        @self.pg.production('expressions : expression expressions')
+        def expressions(calls):
+            if len(calls) == 0:
+                return []
+            else:
+                return [calls[0]] + calls[1]
+
+        @self.pg.production('ids : ')
+        @self.pg.production('ids : id ids')
+        def ids(calls):
+            if len(calls) == 0:
+                return []
+            else:
+                return [calls[0]] + calls[1]
+
+        @self.pg.production('expression : id')
+        def expression_id(tokens):
+            return tokens[0]
+
+        @self.pg.production('statement : KEYWORD_DEFINI id ids KEYWORD_UT SYMBOL_LCURL opt_newline statements opt_newline SYMBOL_RCURL')
+        def defini(tokens):
+            return ast_nodes.Defini(tokens[1], tokens[2], tokens[6])
+
+        @self.pg.production('statement : KEYWORD_REDI expressions')
+        def redi(tokens):
+            return ast_nodes.Redi(tokens[1])
+
+        @self.pg.production('expression : DATA_STRING')
+        def expression_string(tokens):
+            return ast_nodes.String(tokens[0].value)
+
+        @self.pg.production('expression : DATA_NUMERAL')
+        def expression_numeral(tokens):
+            return ast_nodes.Numeral(tokens[0].value)
+
+        @self.pg.production('expression : KEYWORD_FALSITAS')
+        @self.pg.production('expression : KEYWORD_VERITAS')
+        def expression_bool(tokens):
+            return ast_nodes.Bool(tokens[0].name == "KEYWORD_VERITAS")
+
+        @self.pg.production('expression : KEYWORD_NULLUS')
+        def expression_nullus(_):
+            return ast_nodes.Nullus()
+
+        @self.pg.production('expression : expression SYMBOL_MINUS expression')
+        @self.pg.production('expression : expression SYMBOL_PLUS expression')
+        @self.pg.production('expression : expression KEYWORD_EST expression')
+        @self.pg.production('expression : expression KEYWORD_MINUS expression')
+        @self.pg.production('expression : expression KEYWORD_PLUS expression')
+        def binop(tokens):
+            return ast_nodes.BinOp(tokens[0], tokens[2], tokens[1].name)
+
+        @self.pg.production('expression : BUILTIN expressions')
+        def expression_builtin(tokens):
+            return ast_nodes.BuiltIn(tokens[0].value, tokens[1])
+
+        @self.pg.production("id : ID")
+        def id_expression(tokens):
+            return ast_nodes.ID(tokens[0].value)
+
+        @self.pg.production('expression : KEYWORD_INVOCA id expressions')
+        def invoca(tokens):
+            return ast_nodes.Invoca(tokens[1], tokens[2])
+
+        @self.pg.production('statement : si_statement')
+        def si_statement(tokens):
+            return tokens[0]
+
+        @self.pg.production('statement : dum_statement')
+        def dum_statement(tokens):
+            return tokens[0]
+
+        @self.pg.production('si_statement : KEYWORD_SI expression KEYWORD_TUNC SYMBOL_LCURL opt_newline statements opt_newline SYMBOL_RCURL opt_newline aluid_statement')
+        def si(tokens):
+            return ast_nodes.SiStatement(tokens[1], tokens[5], tokens[9])
+
+        @self.pg.production('dum_statement : KEYWORD_DUM expression KEYWORD_FACE SYMBOL_LCURL opt_newline statements opt_newline SYMBOL_RCURL')
+        def dum(tokens):
+            return ast_nodes.DumStatement(tokens[1], tokens[5])
+
+        @self.pg.production('aluid_statement : ')
+        def aluid_empty(_):
+            return None
+
+        @self.pg.production('aluid_statement : KEYWORD_ALUID si_statement')
+        def aluid_si(tokens):
+            return [tokens[1]]
+
+        @self.pg.production('aluid_statement : KEYWORD_ALUID SYMBOL_LCURL opt_newline statements opt_newline SYMBOL_RCURL aluid_statement')
+        def aluid(tokens):
+            return tokens[3]
+
+        @self.pg.production('expression : SYMBOL_LPARENS expression SYMBOL_RPARENS')
+        def parens(tokens):
+            return tokens[1]
+
+        @self.pg.error
+        def error_handle(token):
+            raise ValueError(token)
+
+
+    def get_parser(self):
+        return self.pg.build()
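
Note on `parser.py` above: the `binop` production hands `BinOp` the token name of the operator (for example `KEYWORD_EST` or `SYMBOL_PLUS`), but `BinOp` has no `eval` in this commit. One possible way to evaluate such a node, shown purely as a sketch, is a dispatch table keyed on those token names. The names `BINARY_OPERATORS` and `apply_binop` are hypothetical. The mapping of `EST`, `MINUS` and `PLUS` to `==`, `<` and `>` follows the README's "Boolean expressions" section, and the operands are assumed to be already-evaluated Python values.

```python
import operator

# Hypothetical dispatch table, not part of the commit. The keys are the rply
# token names that parser.py passes to BinOp as `op`.
BINARY_OPERATORS = {
    "SYMBOL_PLUS": operator.add,    # x + y
    "SYMBOL_MINUS": operator.sub,   # x - y
    "KEYWORD_EST": operator.eq,     # x EST y   (equality)
    "KEYWORD_MINUS": operator.lt,   # x MINUS y (less than)
    "KEYWORD_PLUS": operator.gt,    # x PLUS y  (greater than)
}


def apply_binop(op_name, left, right):
    """Apply the operator named by a token name to two evaluated operands."""
    if op_name not in BINARY_OPERATORS:
        raise ValueError(f"unsupported operator: {op_name}")
    return BINARY_OPERATORS[op_name](left, right)
```

A table like this keeps the operator set in one place, so adding `SYMBOL_TIMES` or `SYMBOL_DIVIDE` later would be a one-line change rather than another branch.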