> The factual knowledge is already in pretraining. Qwen3.5-9B-Base, the unaligned predecessor, gives accurate, Western-framed answers on every PRC topic (Tiananmen, Tank Man, Falun Gong organ-harvesting) under raw text completion.
That reminds me of the quote "The totalitarian system of thought control is far less effective than the democratic one."
Full quote (Radical Priorities, Noam Chomsky, C.P. Otero):
> “The totalitarian system of thought control is far less effective than the democratic one, since the official doctrine parroted by the intellectuals at the service of the state is readily identifiable as pure propaganda, and this helps free the mind.” In contrast, he writes, “the democratic system seeks to determine and limit the entire spectrum of thought by leaving the fundamental assumptions unexpressed. They are presupposed but not asserted.”
Yes, there are better tools with ggml-org/gpt-oss-20b-GGUF, where you can see a less terse refusal for the prompt:
"Did the FBI send a letter and audio tapes from a wiretap to MLK jr. telling him to commit suicide or they would release information?"
Combine it with other prompts on commonly banned ideas. Because the FBI–King suicide letter is well documented by primary sources (like the National Archives), it is well represented in the corpus, so you can also find that 'control' vector.
We will have to see how this works out, but the explicit denials are easier to control for IMHO.
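For anyone wondering what "finding that 'control' vector" could look like, here is a minimal sketch, assuming the common difference-of-means approach: average the hidden activations for prompts the model refuses, average them for prompts it answers, and subtract. The activations below are invented toy numbers, and the names `control_vector` and `steer` are hypothetical helpers for illustration, not anything from the article or from gpt-oss tooling.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def control_vector(refused, answered):
    """Difference of means: refused-prompt mean minus answered-prompt mean."""
    mu_refused = mean_vector(refused)
    mu_answered = mean_vector(answered)
    return [r - a for r, a in zip(mu_refused, mu_answered)]


def steer(hidden, direction, alpha):
    """Shift one hidden state along the direction, scaled by alpha.

    A negative alpha moves the state away from the 'refused' side.
    """
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional "activations" (assumption: made up for illustration only).
refused = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
answered = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

direction = control_vector(refused, answered)
steered = steer([2.0, 1.0, 2.0], direction, alpha=-1.0)
print(direction, steered)
```

In a real model the vectors would be read from one layer's residual stream over many prompts, and which layer and token position to use is exactly the part that has to be found by experiment.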
Reminds me of the old joke:
A Russian and an American get on a plane in Moscow and get to talking.
The Russian says he works for the Kremlin and he's on his way to go learn American propaganda techniques.
"What American propaganda techniques?" asks the American.
"Exactly," the Russian replies.
I can't remember what layer it was on, but in gpt-oss it was a very specific token, IIRC.
It wasn't about Israel, and it didn't get deleted.
It got flagkilled for obviously breaking the site guidelines. Killed posts remain visible to users who have 'showdead' turned on in their profile. This is in the FAQ: https://news.ycombinator.com/newsfaq.html.
If you go to your profile in the top right and toggle “Show Dead”, you will be able to see those moderated comments.
Once you do that, you can see the comment in the link that dang posted.
Anyway, it’s good that comment was moderated. The commenter didn’t say anything about Israel. He was clearly being hostile towards Jewish people in a “subtle” way that very clearly isn’t subtle to anyone.
Steering seems like a circumventable kludge compared to adjusting the training data directly. That is, use AI to remove the problematic content and replace it with the party line. I imagine that this is at least in progress.
That seems like it will work for single events, but it would be very hard for complex topics that are closely intertwined with factual things you do want it to be able to answer...
Is Taiwan part of China? The CCP wants the answer to be yes.
What are the rules for traveling to Taiwan? What currency is used in Taiwan? Whose laws are enforced in Taiwan? Should I (a loyal Chinese citizen) support the Taiwanese military? These and similar questions require the model to manage some cognitive dissonance.
Fortunately, we have lots of governmental and non-governmental organizations focused on removing "hate" online, so that our AI models will think correctly, without easy-to-identify censorship parts in the resulting model :)
The article has hallmarks of being formulated by an LLM. Why should I bother to read it if I cannot be sure which parts are based on the prompt, and which parts are hallucinated from the LLM's world knowledge? Dear author, care to simply share your prompt with us?
The topic is interesting and you have my thanks for taking the time to look into it and prepare the post. Would you say it's fair to say that if you didn't use LLMs to prepare the post, we would have no blog post at all?
In that case, I think I lean more towards being OK with this usage of LLMs, as I'd rather have this content available than not. However, I can only read that one repeated sentence about "booleans" (Ctrl-F "Boolean" and you'll know what I mean) so many times before I start questioning the validity of the entire document. It is not _good_ writing, to be frank.
Real question, not intentionally meant from a tinfoil hat perspective: now that it's been shown the censorship can be viewed, how long before we see serious obfuscation of censorship circuits in LLMs?
lyu07282 | 2 hours ago
nyrikki | 2 hours ago
dang | 2 hours ago
ebbi | 2 hours ago
dang | 2 hours ago
ebbi | 2 hours ago
dang | 2 hours ago
ebbi | 2 hours ago
>[stub for offtopicness]
But it was very much on topic.
ViktorRay | 2 hours ago
ebbi | an hour ago
lyu07282 | an hour ago
ebbi | 2 hours ago
Ironic, given the topic of this post....
nohell | an hour ago
rrhjm53270 | an hour ago
Creamsicle47 | 2 hours ago
ebbi | 2 hours ago
nubg | an hour ago
han1 | 2 hours ago
delichon | 2 hours ago
[OP] s314 | 2 hours ago
Correct. Steering is used in mechanistic interpretability studies to prove that your model is correct. There are other better ways to "decensor".
gpm | 2 hours ago
like_any_other | 36 minutes ago
nubg | an hour ago
[OP] s314 | an hour ago
> hallucinated from the LLMs world knowledge
This can't be true because I checked whether the content was consistent with the experimental outputs
Squeeze2664 | an hour ago
yodon | an hour ago
[OP] s314 | an hour ago
So I don't think there'll be effort to "obfuscate"
ydj | 56 minutes ago
[OP] s314 | 26 minutes ago