phreda4 has been doing cool stuff with ColorForth-likes for ages and for some reason barely gets any attention for it. Always brings a smile to my face to see it submitted here
Inspired by this article, I tried to read some tutorials on Forth. My question is whether concatenative languages are AI-coding friendly. Apart from the training data availability, the question is also whether LLMs can correctly understand long flows of concatenated operations. Any ideas?
Then you get to definitions like ": open ( string -- handle 1 | 0 ) ... ;", whose stack comment describes returning the algebraic type Maybe Handle, unboxed on the stack. Algebraic types are fun: they can easily represent Peano arithmetic and get us into the realm of Gödel's incompleteness theorem very quickly.
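A minimal sketch of that convention, in Python rather than Forth (the `open_word` name and the list-as-stack are my own toy illustration, not real Forth): success pushes a handle under a true flag, failure pushes only a false flag, and the caller branches on the flag exactly like IF would.

```python
# Toy model of the ( string -- handle 1 | 0 ) convention on a list-as-stack.
def open_word(stack, known={"data.txt": 42}):
    name = stack.pop()
    if name in known:
        stack.append(known[name])  # the handle, under the flag
        stack.append(1)            # success flag on top
    else:
        stack.append(0)            # failure flag only

stack = ["data.txt"]
open_word(stack)
if stack.pop():          # consume the flag, like IF in Forth
    handle = stack.pop() # the handle is only present on success
```

The "Maybe" is thus unboxed: the two cases leave differently shaped stacks, which is precisely what makes typing it interesting.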
Or you can try to deduce a signature for the EXEC EXEC sequence. EXEC's stack effect can be described as ( \alpha (\alpha -- \beta) -- \beta ), where each Greek letter is a placeholder for a stack segment of arbitrary length. Notice that this type comment has nested brackets and does not adhere to the Forth stack-effect comment convention.
When I thought about this more than fifteen years ago, I got at least two equally valid types for EXEC EXEC: one where the xt at the top of the stack consumes all its input and leaves no output, ( \alpha (\alpha -- \gamma) \beta (\beta -- ) -- \gamma ), and one where the first EXEC produces something for the second to execute upon, ( \alpha \beta (\beta -- \gamma (\alpha \gamma -- \theta)) -- \theta ).
One can argue that the second type of EXEC EXEC subsumes the first one, if the Greek-letter-named stack parts are allowed to be empty.
Still, it shows that typing Forth, at the very least, needs unification at the level of Peano arithmetic, with deduction over stack segments ranging from length zero to unbounded length.
So, in my opinion, for an LLM to dependably combine typed Forth/concatenative definitions, it needs to call an external tool like Prolog to properly deduce the type(s) of a sequence of Forth (or concatenative-language) definitions.
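As a toy illustration of what such a deduction tool does in the easy, fixed-arity case (no Greek-letter stack segments; `unify` and `compose` are my own sketch, not a real type checker), here is stack-effect composition with one-level variable binding. Lowercase names are type variables, capitalized names are concrete types:

```python
def unify(a, b, subst):
    """Bind type variables (lowercase) to types; None on a concrete clash."""
    a, b = subst.get(a, a), subst.get(b, b)
    if a == b:
        return subst
    if a.islower():
        return {**subst, a: b}
    if b.islower():
        return {**subst, b: a}
    return None  # two different concrete types: the words don't compose

def compose(eff1, eff2):
    """Compose ( in1 -- out1 ) followed by ( in2 -- out2 )."""
    (in1, out1), (in2, out2) = eff1, eff2
    subst = {}
    # The second word consumes the first word's outputs from the top down.
    for x, y in zip(reversed(out1), reversed(in2)):
        subst = unify(x, y, subst)
        assert subst is not None, "stack effect mismatch"
    # in2 deeper than out1: the surplus comes from under the first word.
    extra = in2[:len(in2) - len(out1)] if len(in2) > len(out1) else []
    # out1 deeper than in2: the surplus stays beneath the second word's outputs.
    left = out1[:len(out1) - len(in2)] if len(out1) > len(in2) else []
    walk = lambda t: subst.get(t, t)
    return ([walk(t) for t in extra + in1], [walk(t) for t in left + out2])

# DUP ( a -- a a ) followed by + ( Int Int -- Int ) gives ( Int -- Int ).
squared = compose((["a"], ["a", "a"]), (["Int", "Int"], ["Int"]))
```

The whole difficulty the comment describes is exactly what this sketch omits: once the Greek-letter segments of unknown length appear, unification must reason over sequences, not single cells, and typically yields several incomparable solutions.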
And here we enter a realm interesting in itself. Here it is: https://github.com/stassa/louise
This is a Prolog system that learns programs in polynomial time. For one example, it can one-shot learn a grammar without being "trained" on millions of samples.
So, should one use an LLM that either needs paid access or is just slow to run, or freely go where "old school" systems like Eurisko [1] and Cyc went?
Eurisko demonstrated superhuman abilities in 1982-83. It also demonstrated knowledge transfer at the time, when rules from VLSI place-and-route algorithms were used to design the winning Traveller TCS fleet.
[1] https://en.wikipedia.org/wiki/Eurisko
They can produce idioms that resemble the flow of Forth code, but when asked to produce a working algorithm they get lost very quickly, because maintaining context requires a combination of reading "backwards" (push order) and forwards (execution order). At any time a real Forth program may inject a word into the stack flow that completely alters the meaning of the following words, so reading and debugging Forth are nearly the same thing: you have to walk through the execution step by step, unless you've intentionally made patterns that decouple context. And when you do, you've also entered into developing syntax, and the LLM won't have training data on that.
I suggest using Rosetta Code as a learning resource for Forth idioms.
Thanks for your reply.
In fact, I've grown tired of programming by myself — I do 95% of my coding with Claude Code. But the remaining 5% of bugs can't be solved by the AI agent, which forces me to step in myself. In those cases, I'm thrown into a codebase I've never touched before, and code readability becomes key. That's what drew me to this article and to Forth.
I'll look into Rosetta Code.
Not relevant for modern LLMs, but concatenative, stack-based languages are very good at genetic programming applications. Concatenation tends to yield more viable programs when things are being mutated over time.
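A small, hypothetical sketch of why (the word set and `mutate` are invented for illustration): a token-level mutation of a concatenative program is always syntactically valid, since there is no tree structure to unbalance; the only failure mode left is a runtime stack underflow.

```python
import random

WORDS = ["dup", "+", "*", "1", "2", "3"]

def run(tokens):
    """Execute an RPN token list; None signals stack underflow."""
    stack = []
    for tok in tokens:
        try:
            if tok == "dup":
                stack.append(stack[-1])
            elif tok in "+*":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if tok == "+" else a * b)
            else:
                stack.append(int(tok))
        except (IndexError, ValueError):
            return None  # underflow: the only way a mutant can "not parse"
    return stack

def mutate(tokens, rng):
    """Point mutation: replace one token with a random word."""
    i = rng.randrange(len(tokens))
    return tokens[:i] + [rng.choice(WORDS)] + tokens[i + 1:]

rng = random.Random(0)
parent = ["2", "3", "+", "dup", "*"]  # (2+3) squared
viable = sum(run(mutate(parent, rng)) is not None for _ in range(100))
```

Contrast this with mutating a tree-structured language, where a random token swap almost always breaks the grammar before fitness can even be measured.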
Don't use AI; it writes Forth like it writes C. It has gotten better at following the Standard, in Gforth style, but it is awful at the spirit of Forth: factoring programs into a vocabulary of tiny, reusable pieces.
I posted a Forth programming challenge and was very disappointed to get two AI answers and one human one. I think the humans sussed out the solution and described an algorithm to Opus, but the AI strategy produced one large page-filling word.
A top-level word filling one page, doing everything there except for some subroutines mimicking the C standard library.
In Forth, that chunk ought to be many smaller words. Heck, even in C. (At least it fit in a page.)
Perhaps you just haven't used the correct AI yet? Perhaps none of us have in that Forth doesn't have much of a large dataset to train from?
Can you link to the programming challenge? It would be interesting to see if recursive language models that use double-blind latent space might work better.
Shame it didn't keep the coolest part of colorForth - the colors!
You change the meaning of words by changing their colors (is it a 'runtime' function, a macro, or a number? green, cyan, or yellow), and when you input the colors you also let the editor, in a sense, pre-compile the code, so the interpreter becomes insanely fast.
The colors are in reality a byte prefix that acts as an index into a jump table so hardly any interpreting needs to happen, almost like a half-jit'ed language.
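A rough sketch of that dispatch idea in Python (the tag names, cell layout, and table are my assumptions for illustration, not colorForth's actual format): once each cell carries a tag byte, "interpreting" is a single table lookup per cell.

```python
LIT, CALL, MACRO = 0, 1, 2   # tag bytes standing in for the colors

def interpret(cells, words, stack):
    table = {
        LIT:   lambda v: stack.append(v),
        CALL:  lambda v: words[v](stack),
        MACRO: lambda v: words[v](stack),  # in colorForth this runs at compile time
    }
    for tag, value in cells:
        table[tag](value)  # no text parsing, no dictionary search at run time
    return stack

words = {"square": lambda s: s.append(s.pop() ** 2)}
interpret([(LIT, 7), (CALL, "square")], words, [])
```

All the string-to-word resolution happened when the source was entered in the editor, which is why the run-time loop can stay this thin.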
It also uses a weird encoding for text instead of ASCII: a variable-length Shannon-style encoding that makes the most frequent English characters take fewer bits, from 4 to 7 bits.
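The idea can be illustrated with a toy prefix-free code (this table is invented for illustration and is not colorForth's real character code): frequent letters get 4-bit codes, rarer ones 5 or 7 bits, and prefix-freeness lets the decoder read the stream without separators.

```python
CODE = {" ": "0000", "e": "0001", "t": "0010", "a": "0011",
        "o": "01000", "n": "01001", "s": "01010", "h": "01011",
        "r": "1100000", "d": "1100001", "l": "1100010", "u": "1100011"}

def encode(text):
    return "".join(CODE[c] for c in text)

def decode(bits):
    # Greedy decode: prefix-freeness guarantees the first match is right.
    rev, out, cur = {v: k for k, v in CODE.items()}, [], ""
    for b in bits:
        cur += b
        if cur in rev:
            out.append(rev[cur])
            cur = ""
    return "".join(out)
```

With character cells this small, source text packs far tighter than ASCII, at the cost of supporting only the chosen alphabet.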
This is imo the real spirit of Forth - simplify, simplify, simplify, make it an exact custom fit for your needs, screw standards.
not the author but afaiu r3 uses the "color" concept:
tokens are tagged by type via 8 bits (number literal, string, word call, word address, base word, …)
and the interpreter dispatches using these bits
it just doesn't use the colors visually in the editor and uses prefixes instead (" for string, : for code definition, ' for address of a word, …) which also means the representation in the editor matches that of the r3 source in files.
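A sketch of that prefix-to-tag step (the tag set and `tag` helper are my own illustration, not r3's actual implementation): the leading character decides the token's type byte once, at read time, so the interpreter can dispatch on the tag instead of re-parsing text.

```python
NUMBER, STRING, DEFINE, ADDRESS, CALL = range(5)

def tag(token):
    """Map an r3-style prefixed token to a (type, payload) pair."""
    if token[0] == '"':
        return (STRING, token[1:])
    if token[0] == ":":
        return (DEFINE, token[1:])
    if token[0] == "'":
        return (ADDRESS, token[1:])
    if token.lstrip("-").isdigit():
        return (NUMBER, int(token))
    return (CALL, token)  # a bare name is a word call

tokens = [tag(t) for t in [":sq", "'sq", "42", '"hi', "dup"]]
```

The effect is the same as the colors: the type decision is made exactly once, and the tagged form is what the dispatch loop consumes.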
lioeters | 10 hours ago
https://www.sciencedirect.com/science/article/pii/S157106610...
vanderZwan | 6 hours ago
Well, being terse as heck is the point of Forth so of course the dataset isn't large /j.
More seriously, I think the bigger issue is that Forth isn't exactly uniform. It is so moldable that everyone has their own style.
dharmatech | 17 hours ago
https://youtu.be/giLsd-bik6A?si=Cfh5eeWZ2re7ji4C
nickcw | 12 hours ago
That is like making a Lisp without macros - it takes away a lot of the fun.
I suspect the reason is that it compiles the code in one step, whereas immediate words need to run at compile time.