Explore >> Select a destination


You are here: unstableontology.com

joecarlsmith.com (5.5 parsecs away)
An overview of my take on AI welfare as of May 2025, from a talk I gave at Anthropic.

www.lesswrong.com (4.8 parsecs away)
Comment by Ethan Perez: Evan and others on my team are working on non-mechanistic-interpretability directions primarily motivated by inner alignment:
1. Developing model organisms for deceptive inner alignment, which we may use to study the risk factors for deceptive alignment.
2. Conditioning predictive models as an alternative to training agents. Predictive models may pose fewer inner alignment risks, for reasons discussed here.
3. Studying the extent to which models exhibit likely prerequisites to deceptive inner alignment, such as situational awareness (a very preliminary exploration is in Sec. 5 of our paper on model-written evaluations).
4. Investigating the extent to which externalized reasoning (e.g. chain of thought) is a way to gain transparency int...

yoshuabengio.org (6.3 parsecs away)
I have been hearing many arguments from different people regarding catastrophic AI risks. I wanted to clarify these arguments, first for myself, because I would really like to be convinced that we need not worry. Reflecting on these arguments, some of the main points in favor of taking this risk seriously can be summarized as follows:
(1) many experts agree that superhuman capabilities could arise in just a few years (but it could also be decades);
(2) digital technologies have advantages over biological machines;
(3) we should take even a small probability of catastrophic outcomes of superdangerous AI seriously, because of the possibly large magnitude of the impact;
(4) more powerful AI systems can be catastrophically dangerous even if they do not surpass huma...

www.livescience.com (18.7 parsecs away)
A recent update caused ChatGPT to turn into a sycophant: the chatbot excessively complimented and flattered its users with reassurances, even when they said they had harmed animals or stopped taking their medication. OpenAI has now reversed the changes.