Lexer
Lexical tokenization is the conversion of a text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a "lexer" program. In the case of a natural language, those categories include nouns, verbs, adjectives, punctuation, etc. In the case of a programming language, the categories include identifiers, operators, grouping symbols, data types, and language keywords. Lexical tokenization is related to the type of tokenization used in large language models (LLMs), but with two differences. First, lexical tokenization is usually based on a lexical grammar, whereas LLM tokenizers are usually probability-based. Second, LLM tokenizers perform a second step that converts the tokens into numerical values.
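As a minimal sketch of the idea (not any particular lexer generator), the token categories of a lexical grammar can be expressed as regular expressions, and a lexer then scans the input and emits (category, lexeme) pairs. The category names and patterns below are illustrative assumptions, not part of any standard:

```python
import re

# Each token category is defined by a regular expression,
# mirroring a simple lexical grammar.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),          # integer literals
    ("IDENTIFIER", r"[A-Za-z_]\w*"), # names
    ("OPERATOR",   r"[+\-*/=]"),     # arithmetic and assignment
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("SKIP",       r"\s+"),          # whitespace, discarded below
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (category, lexeme) pairs for the input text."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":
            yield (kind, match.group())

tokens = list(tokenize("total = price * 2"))
# [('IDENTIFIER', 'total'), ('OPERATOR', '='),
#  ('IDENTIFIER', 'price'), ('OPERATOR', '*'), ('NUMBER', '2')]
```

Unlike an LLM tokenizer, this lexer is fully rule-based: every token is matched by an explicit pattern, and the output stays as symbolic categories rather than being mapped to numerical values.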