Luma 0.1.0
A low-level compiled alternative to C, C++, and more!
lexer.h File Reference

Tokenizer (lexer) definitions and API for lexical analysis of source code.

#include "../c_libs/memory/memory.h"
#include <stdint.h>

Classes

struct  Lexer
 Lexer state object for scanning source code.
 
struct  Token
 Represents a single token extracted by the lexer.
 
struct  SymbolEntry
 Maps symbol text to token type for quick lookup.
 
struct  KeywordEntry
 Maps keyword text to token type for quick lookup.
 

Enumerations

enum  LumaTokenType {
  TOK_EOF , TOK_ERROR , TOK_IDENTIFIER , TOK_KEYWORD ,
  TOK_NUMBER , TOK_NUM_FLOAT , TOK_STRING , TOK_CHAR_LITERAL ,
  TOK_INT , TOK_DOUBLE , TOK_UINT , TOK_FLOAT ,
  TOK_BOOL , TOK_STRINGT , TOK_VOID , TOK_CHAR ,
  TOK_IF , TOK_ELIF , TOK_ELSE , TOK_LOOP ,
  TOK_RETURN , TOK_BREAK , TOK_CONTINUE , TOK_STRUCT ,
  TOK_ENUM , TOK_MOD , TOK_IMPORT , TOK_TRUE ,
  TOK_FALSE , TOK_PUBLIC , TOK_PRIVATE , TOK_VAR ,
  TOK_CONST , TOK_FN , TOK_PRINT , TOK_PRINTLN ,
  TOK_INPUT , TOK_ALLOC , TOK_FREE , TOK_CAST ,
  TOK_SIZE_OF , TOK_AS , TOK_DEFER , TOK_IN ,
  TOK_SWITCH , TOK_SYSTEM , TOK_IMPL , TOK_SYSCALL ,
  TOK_STATIC , TOK_USE , TOK_OS , TOK_LINK ,
  TOK_RETURNES_OWNERSHIP , TOK_TAKES_OWNERSHIP , TOK_DLL_IMPORT , TOK_LIB_IMPORT ,
  TOK_SYMBOL , TOK_LPAREN , TOK_RPAREN , TOK_LBRACE ,
  TOK_RBRACE , TOK_LBRACKET , TOK_RBRACKET , TOK_SEMICOLON ,
  TOK_COMMA , TOK_DOT , TOK_AT , TOK_EQUAL ,
  TOK_PLUS , TOK_MINUS , TOK_STAR , TOK_SLASH ,
  TOK_LT , TOK_GT , TOK_LE , TOK_GE ,
  TOK_EQEQ , TOK_NEQ , TOK_AMP , TOK_PIPE ,
  TOK_CARET , TOK_TILDE , TOK_AND , TOK_OR ,
  TOK_RESOLVE , TOK_COLON , TOK_BANG , TOK_QUESTION ,
  TOK_PLUSPLUS , TOK_MINUSMINUS , TOK_SHIFT_LEFT , TOK_SHIFT_RIGHT ,
  TOK_RANGE , TOK_RIGHT_ARROW , TOK_LEFT_ARROW , TOK_ELLIPSIS ,
  TOK_MODL , TOK_WHITESPACE , TOK_COMMENT , TOK_DOC_COMMENT ,
  TOK_MODULE_DOC , TOK_DOCUMENT
}
 Enumeration of all possible token types recognized by the lexer.
 

Functions

void report_lexer_error (Lexer *lx, const char *error_type, const char *file, const char *msg, const char *line_text, int line, int col, int tk_length)
 Reports a lexer error by adding an error to the global error list.
 
const char * get_line_text_from_source (const char *source, int target_line)
 Retrieves the text of a specific line from the source code.
 
void init_lexer (Lexer *lexer, const char *source, ArenaAllocator *arena)
 Initializes the lexer with source code and memory arena.
 
Token next_token (Lexer *lexer)
 Returns the next token parsed from the source code.
 

Detailed Description

Tokenizer (lexer) definitions and API for lexical analysis of source code.

Provides token types, lexer state, token structure, and functions to initialize a lexer, retrieve tokens, and report lexer errors.

Enumeration Type Documentation

◆ LumaTokenType

Enumeration of all possible token types recognized by the lexer.

Enumerator
TOK_EOF 

End of file/input

TOK_ERROR 

Error token

TOK_IDENTIFIER 

Identifier (variable/function names)

TOK_KEYWORD 

Reserved keyword

TOK_NUMBER 

Numeric literal

TOK_NUM_FLOAT 

Floating point numeric literal

TOK_STRING 

String literal

TOK_CHAR_LITERAL 

Character literal

TOK_INT 

int

TOK_DOUBLE 

double

TOK_UINT 

unsigned int

TOK_FLOAT 

float

TOK_BOOL 

bool

TOK_STRINGT 

str (string type)

TOK_VOID 

void

TOK_CHAR 

char

TOK_IF 

if keyword

TOK_ELIF 

elif keyword

TOK_ELSE 

else keyword

TOK_LOOP 

loop keyword

TOK_RETURN 

return keyword

TOK_BREAK 

break keyword

TOK_CONTINUE 

continue keyword

TOK_STRUCT 

struct keyword

TOK_ENUM 

enum keyword

TOK_MOD 

mod keyword

TOK_IMPORT 

import keyword

TOK_TRUE 

true keyword

TOK_FALSE 

false keyword

TOK_PUBLIC 

pub keyword

TOK_PRIVATE 

private keyword

TOK_VAR 

let keyword

TOK_CONST 

const keyword

TOK_FN 

fn keyword

TOK_PRINT 

output keyword

TOK_PRINTLN 

println keyword

TOK_INPUT 

input keyword

TOK_ALLOC 

alloc(size_t size)

TOK_FREE 

free(void *ptr, size_t size)

TOK_CAST 

cast<Type>(value to cast)

TOK_SIZE_OF 

size_of<TYPE>

TOK_AS 

as keyword (for use in modules)

TOK_DEFER 

defer keyword

TOK_IN 

in keyword

TOK_SWITCH 

switch keyword

TOK_SYSTEM 

system keyword

TOK_IMPL 

implement keyword

TOK_SYSCALL 

TOK_STATIC 

TOK_USE 

@use

TOK_OS 

@os

TOK_LINK 

TOK_RETURNES_OWNERSHIP 

#returns_ownership

TOK_TAKES_OWNERSHIP 

#takes_ownership

TOK_DLL_IMPORT 

#dll_import

TOK_LIB_IMPORT 

#lib_import

TOK_SYMBOL 

Fallback symbol

TOK_LPAREN 

(

TOK_RPAREN 

)

TOK_LBRACE 

{

TOK_RBRACE 

}

TOK_LBRACKET 

[

TOK_RBRACKET 

]

TOK_SEMICOLON 

;

TOK_COMMA 

,

TOK_DOT 

.

TOK_AT 

@

TOK_EQUAL 

=

TOK_PLUS 

+

TOK_MINUS 

-

TOK_STAR 

*

TOK_SLASH 

/

TOK_LT 

<

TOK_GT 

>

TOK_LE 

<=

TOK_GE 

>=

TOK_EQEQ 

==

TOK_NEQ 

!=

TOK_AMP 

&

TOK_PIPE 

|

TOK_CARET 

^

TOK_TILDE 

~

TOK_AND 

&&

TOK_OR 

||

TOK_RESOLVE 

::

TOK_COLON 

:

TOK_BANG 

!

TOK_QUESTION 

?

TOK_PLUSPLUS 

++

TOK_MINUSMINUS 

--

TOK_SHIFT_LEFT 

<<

TOK_SHIFT_RIGHT 

>>

TOK_RANGE 

..

TOK_RIGHT_ARROW 

->

TOK_LEFT_ARROW 

<-

TOK_ELLIPSIS 

...

TOK_MODL 

%

TOK_WHITESPACE 

whitespace

TOK_COMMENT 

comment

TOK_DOC_COMMENT 

/// documentation comment

TOK_MODULE_DOC 

//! module documentation comment

TOK_DOCUMENT 
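
The keyword enumerators above each pair a fixed spelling (if, elif, let, fn, pub, ...) with a token type, which is the kind of mapping KeywordEntry is described as holding. A minimal sketch of a table-driven lookup follows. The SketchKeywordEntry layout, its field names, the SketchKwType enum and sketch_lookup_keyword are all hypothetical; the real KeywordEntry fields are not shown on this page, and the real lexer may use a different search strategy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a few LumaTokenType values. */
typedef enum {
    SK_IDENTIFIER, SK_IF, SK_ELIF, SK_VAR, SK_FN, SK_PUBLIC
} SketchKwType;

/* Hypothetical layout; the real KeywordEntry fields are not documented here. */
typedef struct {
    const char *text;    /* keyword spelling */
    SketchKwType type;   /* token type produced for that spelling */
} SketchKeywordEntry;

static const SketchKeywordEntry sketch_keywords[] = {
    { "if", SK_IF },   { "elif", SK_ELIF }, { "let", SK_VAR },
    { "fn", SK_FN },   { "pub", SK_PUBLIC },
};

/* Classifies the `length` characters starting at `text`: returns the keyword's
 * token type on an exact match, otherwise falls back to SK_IDENTIFIER. */
SketchKwType sketch_lookup_keyword(const char *text, size_t length)
{
    assert(text != NULL);
    size_t n = sizeof sketch_keywords / sizeof sketch_keywords[0];
    for (size_t i = 0; i < n; i++) {
        const char *kw = sketch_keywords[i].text;
        /* Compare lengths first so "let" does not match a prefix of "letter". */
        if (strlen(kw) == length && memcmp(kw, text, length) == 0)
            return sketch_keywords[i].type;
    }
    return SK_IDENTIFIER;
}
```

The lookup takes a pointer and a length rather than a NUL-terminated string, since a scanner usually classifies a slice of the source buffer in place.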

Function Documentation

◆ get_line_text_from_source()

const char * get_line_text_from_source (const char *source, int target_line)

Retrieves the text of a specific line from the source code.

Parameters
source       Full source code string
target_line  Line number to extract (1-based)
Returns
Pointer to a static buffer containing the line text
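
To make the documented contract concrete (1-based line number in, pointer to a static buffer out), here is a minimal sketch of one way such a function could be written. It is not Luma's implementation: the name sketch_line_text, the 1024-byte buffer size, and the choice to return an empty string for out-of-range lines are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in, NOT Luma's implementation: copies line `target_line`
 * (1-based) of `source` into a static buffer and returns that buffer. */
const char *sketch_line_text(const char *source, int target_line)
{
    static char buffer[1024];   /* assumed size; overwritten on every call */
    assert(source != NULL);

    buffer[0] = '\0';
    if (target_line < 1)
        return buffer;          /* assumed behaviour: empty string */

    /* Skip past target_line - 1 newline characters. */
    const char *start = source;
    for (int line = 1; line < target_line; line++) {
        const char *nl = strchr(start, '\n');
        if (nl == NULL)
            return buffer;      /* source has fewer lines than requested */
        start = nl + 1;
    }

    /* Copy up to the next newline or end of input, truncating if needed. */
    size_t len = strcspn(start, "\n");
    if (len >= sizeof buffer)
        len = sizeof buffer - 1;
    memcpy(buffer, start, len);
    buffer[len] = '\0';
    return buffer;
}
```

Because the result lives in a static buffer, a second call overwrites the first result; a caller that needs to keep two lines at once has to copy the text out before calling again.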

◆ init_lexer()

void init_lexer (Lexer *lexer, const char *source, ArenaAllocator *arena)

Initializes the lexer with source code and memory arena.

Parameters
lexer   Pointer to the Lexer struct to initialize
source  Source code string
arena   Arena allocator used to allocate tokens and strings

◆ next_token()

Token next_token (Lexer *lexer)

Returns the next token parsed from the source code.

Parameters
lexer  Pointer to an initialized Lexer
Returns
The next Token found in the input stream
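
A caller would typically initialize the lexer once and then pull tokens until the end-of-input token (TOK_EOF in LumaTokenType) comes back. The sketch below shows only that calling pattern. Since the fields of Lexer, Token and ArenaAllocator are not shown on this page, it uses hypothetical stand-ins (the Sketch* types, SK_* values and sketch_* functions) and a toy scanner that recognizes just identifiers and digit runs; none of this is Luma's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins; Luma's real Lexer and Token fields are not shown here. */
typedef enum { SK_EOF, SK_ERROR, SK_IDENTIFIER, SK_NUMBER } SketchTokenType;

typedef struct {
    SketchTokenType type;
    const char *start;   /* first character of the token inside the source */
    size_t length;       /* number of characters in the token */
} SketchToken;

typedef struct {
    const char *cur;     /* next unread character */
} SketchLexer;

void sketch_init_lexer(SketchLexer *lx, const char *source)
{
    assert(lx != NULL && source != NULL);
    lx->cur = source;
}

/* Toy scanner: skips whitespace, then reads an identifier, a run of digits,
 * or a one-character error token. Returns SK_EOF at the end of the input. */
SketchToken sketch_next_token(SketchLexer *lx)
{
    while (isspace((unsigned char)*lx->cur))
        lx->cur++;

    SketchToken tok = { SK_EOF, lx->cur, 0 };
    if (*lx->cur == '\0')
        return tok;

    unsigned char c = (unsigned char)*lx->cur;
    if (isalpha(c) || c == '_') {
        tok.type = SK_IDENTIFIER;
        while (isalnum((unsigned char)*lx->cur) || *lx->cur == '_')
            lx->cur++;
    } else if (isdigit(c)) {
        tok.type = SK_NUMBER;
        while (isdigit((unsigned char)*lx->cur))
            lx->cur++;
    } else {
        tok.type = SK_ERROR;
        lx->cur++;
    }
    tok.length = (size_t)(lx->cur - tok.start);
    return tok;
}

/* The driving loop: pull tokens until the end-of-input token appears. */
size_t sketch_count_tokens(const char *source)
{
    SketchLexer lx;
    sketch_init_lexer(&lx, source);

    size_t count = 0;
    for (SketchToken t = sketch_next_token(&lx); t.type != SK_EOF;
         t = sketch_next_token(&lx))
        count++;
    return count;
}
```

A parser front end would replace the counting body of the loop with whatever consumes each token; the loop shape stays the same.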

◆ report_lexer_error()

void report_lexer_error (Lexer *lx, const char *error_type, const char *file, const char *msg, const char *line_text, int line, int col, int tk_length)

Reports a lexer error by adding an error to the global error list.

Parameters
lx          Pointer to the Lexer instance
error_type  String describing the type of error
file        File path of the source file
msg         Error message string
line_text   Source code line text where the error occurred
line        Line number of the error
col         Column number of the error
tk_length   Length of the token causing the error
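
The line_text, col and tk_length parameters carry enough information for a diagnostic to underline the offending token. The sketch below shows one way a reporter could turn them into a caret marker line. It is an illustration only: sketch_format_underline and sketch_print_error are hypothetical helpers, col is assumed to be 1-based, and the real function is documented as appending to a global error list rather than printing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of lexer.h. Writes col - 1 spaces followed by
 * tk_length carets into `out`, truncating to fit. Assumes `col` is 1-based.
 * Returns the number of characters written (excluding the terminator). */
int sketch_format_underline(char *out, size_t out_size, int col, int tk_length)
{
    assert(out != NULL && out_size > 0);
    if (col < 1)
        col = 1;
    if (tk_length < 1)
        tk_length = 1;          /* always mark at least one column */

    size_t pos = 0;
    for (int i = 1; i < col && pos + 1 < out_size; i++)
        out[pos++] = ' ';
    for (int i = 0; i < tk_length && pos + 1 < out_size; i++)
        out[pos++] = '^';
    out[pos] = '\0';
    return (int)pos;
}

/* Hypothetical printer that combines the documented parameters into a
 * "file:line:col: message" header, the source line, and the marker line. */
void sketch_print_error(const char *file, const char *msg,
                        const char *line_text, int line, int col, int tk_length)
{
    char marks[256];
    sketch_format_underline(marks, sizeof marks, col, tk_length);
    fprintf(stderr, "%s:%d:%d: %s\n  %s\n  %s\n",
            file, line, col, msg, line_text, marks);
}
```

Keeping the marker formatting separate from the printing makes the column arithmetic easy to check on its own.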