Explore >> Select a destination


You are here: unstableontology.com

joecarlsmith.com (5.5 parsecs away)
An overview of my take on AI welfare as of May 2025, from a talk I gave at Anthropic.

www.lesswrong.com (4.8 parsecs away)
Comment by Ethan Perez: Evan and others on my team are working on non-mechanistic-interpretability directions primarily motivated by inner alignment:
1. Developing model organisms for deceptive inner alignment, which we may use to study the risk factors for deceptive alignment.
2. Conditioning predictive models as an alternative to training agents. Predictive models may pose fewer inner alignment risks, for reasons discussed here.
3. Studying the extent to which models exhibit likely prerequisites to deceptive inner alignment, such as situational awareness (a very preliminary exploration is in Sec. 5 of our paper on model-written evaluations).
4. Investigating the extent to which externalized reasoning (e.g. chain of thought) is a way to gain transparency int...

yoshuabengio.org (6.3 parsecs away)
I have been hearing many arguments from different people regarding catastrophic AI risks. I wanted to clarify these arguments, first for myself, because I would really like to be convinced that we need not worry. Reflecting on these arguments, some of the main points in favor of taking this risk seriously can be summarized as follows:
(1) many experts agree that superhuman capabilities could arise in just a few years (but it could also be decades);
(2) digital technologies have advantages over biological machines;
(3) we should take even a small probability of catastrophic outcomes of superdangerous AI seriously, because of the possibly large magnitude of the impact;
(4) more powerful AI systems can be catastrophically dangerous even if they do not surpass huma...

www.livescience.com (18.7 parsecs away)
A recent update caused ChatGPT to turn into a sycophant: the chatbot excessively complimented and flattered its users with reassurances, even when they said they had harmed animals or stopped taking their medication. OpenAI has now reversed the changes.