Hold up. you can't just have the train take the parenthesis off screen and hand wave away what happens to those cars. What happens when your forced to keep the garbage of a computation because you can't delete anything?
How would one typically implement this tokenization? Pre-pass on the input? My initial thought was to push an operand-terminator token when encountering an operator, but it was unclear to me whether it should be pushed to the stack or the output.
Parsing is usually implemented in two steps: first there's lexing where the input is turned into a sequence of tokens, then there's proper parsing where the sequence of tokens is turned into an abstract syntax tree. This shunting yard algorithm is part of parsing. Lexing happens before that.
For example, this input:
13- 2 *5 + 4
would be lexed into this sequence of tokens:
"13" "-" "2" "*" "5" "+" "4"
You could use the shunting yard algorithm to put these in a convenient postfix order:
"13" "2" "5" "*" "-" "4" "+"
It's really easy to then turn it into an abstract syntax tree (AST). Go through each token in order: if it's a number you make it into a node and push it onto a stack; if it's a binary operator you pop two nodes off the stack and make a node out of those and the binary operator. If you write down this AST as an s-expression you get:
While the adjacent comment provides a proper answer to the broader question I thought I'd answer you more directly. A function that consumes one or more characters and then outputs a token will suffice. You turn the character stream into a token stream (or an altered character stream) on the fly. See lisp reader macros for example.
I'm not familiar with the algorithm and I don't see much explanation on the site (at least on mobile) but AFAICT this is turning parenthesized infix expressions into reverse Polish notation. In other words, it takes human-readable mathematical formulas and converts them into something a simple stack-based machine can compute in one forward pass.
It also appears that separate digits aren't interpreted as decimal numerals (i.e. (1)(3) is the sequence 1,3 and not 13) which can look a bit misleading.
Yeah, it's a parsing algorithm for parsing infix (etc.) expressions. If you've seen a parsing library refer to "precedence climbing" or "operator precedence parsing", it's doing this (or something very similar).
If you want to enter the number `13`, that should be one token, but there's no way to make a `13` token in this UI. You need to stick to single digits for this site to work correctly.
> It also appears that separate digits aren't interpreted as decimal numerals (i.e. (1)(3) is the sequence 1,3 and not 13) which can look a bit misleading.
I didn't find it misleading; it says it operates on tokens, not digits.
This is great! As a kid I used to love HO scale model trains. There is an old movie of me on my birthday looking at my train set with eyes as big as plates. Fast forward, and I love networking and programming. I recognized that networking is just trains on Ethernet or Wifi, but now realize even the programming is just the same. Still making things go places and go around after all this time.
Go to the national train museum in Scranton, PA. I got to see (through critical thinking, not necessarily an outright display) how the modern career of software engineering has its roots in the telegraph lines, built along the rail lines, which eventually became the phone lines they used when timetables broke down. Follow that through to Bell Labs and C, etc....
Note the shunting yard algorithm is an iterative (as opposed to recursive) version of Pratt parsing (and also precedence climbing which is virtually identical). However as normally stated it does not do proper error checking - it will accept totally invalid input.
IIAOPSW | 18 hours ago
(https://en.wikipedia.org/wiki/Reversible_computing)
dcrazy | 18 hours ago
justinpombrio | 17 hours ago
dcrazy | 16 hours ago
justinpombrio | 13 hours ago
For example, this input:
would be lexed into this sequence of tokens: You could use the shunting yard algorithm to put these in a convenient postfix order: It's really easy to then turn it into an abstract syntax tree (AST). Go through each token in order: if it's a number you make it into a node and push it onto a stack; if it's a binary operator you pop two nodes off the stack and make a node out of those and the binary operator. If you write down this AST as an s-expression you get:fc417fc802 | 12 hours ago
alter_igel | 17 hours ago
It also appears that separate digits aren't interpreted as decimal numerals (i.e. (1)(3) is the sequence 1,3 and not 13) which can look a bit misleading.
justinpombrio | 17 hours ago
If you want to enter the number `13`, that should be one token, but there's no way to make a `13` token in this UI. You need to stick to single digits for this site to work correctly.
lelanthran | 15 hours ago
I didn't find it misleading; it says it operates on tokens, not digits.
abd-nh | 17 hours ago
pimlottc | 16 hours ago
dcrazy | 16 hours ago
rembicilious | 16 hours ago
talkingtab | 14 hours ago
xerox13ster | 11 hours ago
timhh | 14 hours ago
That isn't a fundamental limit of the algorithm though; you can easily add error checking, as I did here: https://github.com/Timmmm/expr/blob/1b0aef8f91460974d526b5ba...
I'm not really sure why this is omitted.