Have you tried GPT-OSS-120b MXFP4 with reasoning effort set to high? Out of all models I can run within 96GB, it seems to consistently give better results. What exact llama model (+ quant I suppose) is it that you've had better results against, and what did you compare it against, the 120b or 20b variant?
How are you running this? I've had issues with Opencode formulating bad messages when the model runs on llama.cpp. Jinja threw a bunch of errors and GPT-OSS couldn't make tool calls. There's an issue for this on Opencode's repo but seems like it's been waiting or a couple of weeks.
> What exact llama model (+ quant I suppose) is it that you've had better results against
Not llama, but Qwen3-coder-next is on top of my list right now. Q8_K_XL. It's incredible (not just for coding).
Again, you're not specifying what GPT-OSS you're talking about, there are two versions, 20b and 120b. Not to mention if you have a consumer GPU, you're most likely running it with additional quantization too, but you're not saying what version.
> Jinja threw a bunch of errors and GPT-OSS couldn't make tool calls.
This was an issue for a week or two when GPT-OSS initially launched, as none of the inference engines had properly implemented support for it, especially around tool calling. I'm running GPT-OSS-120b MXFP4 with LM Studio and directly with llama.cpp, the recent versions handle it well and I have no errors.
However, when I've tried either 120b or 20b with additional quantization (not the "native" MXFP4 ones), I've seen that they're having troubles with the tool syntax too.
> Not llama
What does your original comment mean then? You said llama was "strictly" better than GPT-OSS, which specific model variant are you talking about or you miswrote somehow?
Looks like a TOS violation to me to scrape google directly like that. While the concept of giving a text only model 'pseudo vision' is clever, I think the solution in its current form is a bit fragile. The SerpAPI, Google Custom Search API, etc. exist for a reason; For anything beyond personal tinkering, this is a liability.
> Looks like a TOS violation to me to scrape google directly like that
If something was built by violating TOS' and you use that to do more TOS violations against the ones who initially did the TOS violations to build the thing, do they cancel out each other?
Not about GPT-OSS specifically, but say you used Gemma for the same purpose instead for this hypothetical.
Coolest thing about it is, its 1 pip install to give your local model the ability to see, do Google Searches and use News, Shopping, Scholar, Maps, Finance, Weather, Flights, Hotels, Translate, Images, Trends etc
I don't get this. Isn't this the same as saying "I taught my 5 year old to calculate integrals, by typing them into Wolfram Alpha"...so the actual relevant cognitive task (integrals in my example, "seeing" in yours) is outsources to an external API.
Why do I need gpt-oss-120B at all in this scenario? Couldn't I just directly call e.g. gemini-3-pro api from the python script?
'Calculating' an integral, is usually done by applying a series of sort of abstract mathematical tricks. There is usually no deeper meaning applied to the solving. If you have profound intuition you can guess the solution to an integral, by 'inspection'.
What part here is the knowing or understanding? Does solving an integral symbolically provide more knowledge than numerically or otherwise?
Understanding the underlying functions themselves and the areas they sweep; has substitution or by-parts, actually provided you with this?
Parent says “I taught my 5yo how to” — this means their 5yo learned a process.
OP says “I taught LLM how to see” and this should mean the LLM (which is capable of being taught/learning) internalized how to. It did not, it was given a tool that does seeing and tells it what things are.
People are very interested in getting good local LLMs with vision integrated, and so they want to read about it. Next to nobody would click on the honest “I enabled an LLM to use a Google service to identify objects in images”, which is what OP actually did.
I can second this...Been trying to get local LLMs to play through Pokemon Emerald (with virtually 0 success).
I'm under the impression I'm being hampered by a separation of 'brain' and 'eyes', as I have yet to find a reasoning + vision local model that fits on my Mac, and played with two instances of qwen (vision and reasoning) to try to solve, but no real breakthroughs yet. The requirements I've given myself are fully local models, and no reading data from the ROM that the human player cannot be aware of.
I was hoping OP was able to retro-fit vision onto blind models, not just offload it to a cloud model. It's still an interesting write-up, but I for sure got click-baited
[I teach Math in the first year of the university in Argentina. We have a few Calculus courses, with different levels according to the degree.]
In 1D, substitution by linear functions like "t=3x+1" is very insightful. It's a pity that sometimes we don't have time to analyze it more deeply. Other substitutions may be insightful or not. Some tricks like "t=sin(x)" has a nice geometrical interpretation, but it's never explained, we don't teach it anyway now.
Integration by parts is not very insightful until you get to the 3rd or 4th year and learn Solovev spaces or advanced Electrodynamics. I'd like to drop it, but other courses require it and I'd be fired.
In some cases, parity and other symmetries are interesting, but those tricks are mostly teach in Physics than in Math.
Also, in the second year we get 2D or 3D integrals, that have a lot of interesting variable changes. Also, things like the Gauss theorem and it's relation with conservation laws.
Confused as to why you wouldn’t integrate a local vlm if you want a local llm as the backbone. Plenty of 8b - 30b vlms out there that are visually competent.
TZubiri | 15 hours ago
embedding-shape | 14 hours ago
magic_hamster | 13 hours ago
> What exact llama model (+ quant I suppose) is it that you've had better results against
Not llama, but Qwen3-coder-next is on top of my list right now. Q8_K_XL. It's incredible (not just for coding).
embedding-shape | 13 hours ago
> Jinja threw a bunch of errors and GPT-OSS couldn't make tool calls.
This was an issue for a week or two when GPT-OSS initially launched, as none of the inference engines had properly implemented support for it, especially around tool calling. I'm running GPT-OSS-120b MXFP4 with LM Studio and directly with llama.cpp, the recent versions handle it well and I have no errors.
However, when I've tried either 120b or 20b with additional quantization (not the "native" MXFP4 ones), I've seen that they're having troubles with the tool syntax too.
> Not llama
What does your original comment mean then? You said llama was "strictly" better than GPT-OSS, which specific model variant are you talking about or you miswrote somehow?
[OP] vkaufmann | 10 hours ago
embedding-shape | 8 hours ago
[OP] vkaufmann | 7 hours ago
N_Lens | 15 hours ago
speedgoose | 15 hours ago
peddling-brink | 15 hours ago
mt42or | 14 hours ago
interloxia | 13 hours ago
https://news.ycombinator.com/item?id=46329109
embedding-shape | 14 hours ago
If something was built by violating TOS' and you use that to do more TOS violations against the ones who initially did the TOS violations to build the thing, do they cancel out each other?
Not about GPT-OSS specifically, but say you used Gemma for the same purpose instead for this hypothetical.
cheschire | 12 hours ago
https://en.wikipedia.org/wiki/Clean_hands
svnt | 8 hours ago
If you do it and give it away they will come for you.
[OP] vkaufmann | 10 hours ago
[OP] vkaufmann | 10 hours ago
Easiest and fastest way and the impact is massive
tanduv | 15 hours ago
magic_hamster | 13 hours ago
But wasn't it Google Lens that actually identified them?
l1am0 | 12 hours ago
Why do I need gpt-oss-120B at all in this scenario? Couldn't I just directly call e.g. gemini-3-pro api from the python script?
reedf1 | 12 hours ago
What part here is the knowing or understanding? Does solving an integral symbolically provide more knowledge than numerically or otherwise?
Understanding the underlying functions themselves and the areas they sweep; has substitution or by-parts, actually provided you with this?
svnt | 8 hours ago
OP says “I taught LLM how to see” and this should mean the LLM (which is capable of being taught/learning) internalized how to. It did not, it was given a tool that does seeing and tells it what things are.
People are very interested in getting good local LLMs with vision integrated, and so they want to read about it. Next to nobody would click on the honest “I enabled an LLM to use a Google service to identify objects in images”, which is what OP actually did.
dpoloncsak | 8 hours ago
I'm under the impression I'm being hampered by a separation of 'brain' and 'eyes', as I have yet to find a reasoning + vision local model that fits on my Mac, and played with two instances of qwen (vision and reasoning) to try to solve, but no real breakthroughs yet. The requirements I've given myself are fully local models, and no reading data from the ROM that the human player cannot be aware of.
I was hoping OP was able to retro-fit vision onto blind models, not just offload it to a cloud model. It's still an interesting write-up, but I for sure got click-baited
gus_massa | 7 hours ago
In 1D, substitution by linear functions like "t=3x+1" is very insightful. It's a pity that sometimes we don't have time to analyze it more deeply. Other substitutions may be insightful or not. Some tricks like "t=sin(x)" has a nice geometrical interpretation, but it's never explained, we don't teach it anyway now.
Integration by parts is not very insightful until you get to the 3rd or 4th year and learn Solovev spaces or advanced Electrodynamics. I'd like to drop it, but other courses require it and I'd be fired.
In some cases, parity and other symmetries are interesting, but those tricks are mostly teach in Physics than in Math.
Also, in the second year we get 2D or 3D integrals, that have a lot of interesting variable changes. Also, things like the Gauss theorem and it's relation with conservation laws.
leumon | 11 hours ago
[OP] vkaufmann | 10 hours ago
leumon | 9 hours ago
[OP] vkaufmann | 5 hours ago
vessenes | 8 hours ago
[OP] vkaufmann | 7 hours ago
villgax | 8 hours ago