Explore >> Select a destination


You are here: www.swyx.io

blog.ouseful.info
2.9 parsecs away
I finally succumbed and had a look at Google's proposed window.ai browser JavaScript object in Chrome. To get started (the restarts are superstitious behaviour...): download and install Chrome Canary (you can run this alongside any other Chrome instance; you do not need to log in or register to try out any of the following), enable...
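
As context for that excerpt: once Canary is installed and the relevant chrome://flags entries are enabled, the object can be poked at from the DevTools console. A minimal TypeScript sketch, assuming the early canCreateTextSession/createTextSession surface that post-era Canary builds exposed (the experimental API has been renamed several times since, so treat the names as a snapshot, not a stable contract):

```ts
// Sketch of the early window.ai Prompt API shape in Chrome Canary.
// window.ai is untyped, so we cast; every call here is experimental.
const ai = (window as any).ai;

async function main() {
  // Returns "readily", "after-download", or "no" depending on whether
  // the on-device model (Gemini Nano) is available.
  const availability = await ai.canCreateTextSession();
  if (availability === "no") throw new Error("on-device model unavailable");

  const session = await ai.createTextSession();
  const reply = await session.prompt("Summarise this page in one sentence.");
  console.log(reply);
}

main();
```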

isthisit.nz
3.8 parsecs away
August 2024 Update: Now a solved problem. Use Structured Outputs. Large language models (LLMs) return unstructured output. When we prompt them, they respond with one large string. This is fine for applications such as ChatGPT, but where we want the LLM to return structured data, such as lists or key-value pairs, a parseable response is needed. In Building A ChatGPT-enhanced Python REPL I used a technique to prompt the LLM to return output in a text format I could parse.
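
The "solved problem" the update refers to is OpenAI's Structured Outputs, where a JSON Schema constrains decoding so the reply parses into a typed object instead of one large string. A minimal sketch using the openai npm package's zod helper; the model name and schema here are illustrative, not taken from the post:

```ts
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Illustrative schema: the model's reply must conform to this shape.
const TodoList = z.object({
  items: z.array(z.object({ task: z.string(), done: z.boolean() })),
});

async function main() {
  const completion = await client.beta.chat.completions.parse({
    model: "gpt-4o-2024-08-06", // any Structured Outputs-capable model
    messages: [
      { role: "user", content: "Extract my todos: buy milk (done), file taxes." },
    ],
    response_format: zodResponseFormat(TodoList, "todo_list"),
  });
  // A typed object, not a string to parse by hand.
  console.log(completion.choices[0].message.parsed);
}

main();
```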

josephm.dev
3.0 parsecs away
Get the OpenAI API to return a JSON object.
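
One common way to do this, predating Structured Outputs, is JSON mode, which guarantees syntactically valid JSON without enforcing a schema; whether the post uses this mechanism or plain prompting is an assumption here. A sketch with the openai npm package (note JSON mode requires the word "JSON" to appear in the messages):

```ts
import OpenAI from "openai";

const client = new OpenAI();

async function main() {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model choice
    // Constrains the output to valid JSON; the shape is still up to the prompt.
    response_format: { type: "json_object" },
    messages: [
      {
        role: "user",
        content: "Return a JSON object with keys name and age for Ada Lovelace.",
      },
    ],
  });
  console.log(JSON.parse(completion.choices[0].message.content!));
}

main();
```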

qwenlm.github.io
23.2 parsecs away
Reinforcement Learning (RL) has emerged as a pivotal paradigm for scaling language models and enhancing their deep reasoning and problem-solving capabilities. To scale RL, the foremost prerequisite is maintaining stable and robust training dynamics. However, we observe that existing RL algorithms (such as GRPO) exhibit severe instability during long training runs, leading to irreversible model collapse and hindering further performance improvements with increased compute. To enable successful RL scaling, we propose the Group Sequence Policy Optimization (GSPO) algorithm.
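
For context beyond the excerpt (the following draws on the GSPO paper itself, not the text above, so take the notation as a paraphrase): GSPO's central change from GRPO is to compute importance ratios and clipping at the sequence level, length-normalized over each sampled response, rather than per token. Roughly, with group-normalized advantages $\hat{A}_i$ over $G$ sampled responses:

```latex
% Length-normalized sequence-level importance ratio for response y_i to query x
% (paraphrased from the GSPO paper; assumed, not stated in the excerpt):
s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\text{old}}}(y_i \mid x)} \right)^{1/|y_i|}

% Clipped objective over a group of G responses, mirroring PPO/GRPO but
% applying the min/clip to whole sequences instead of individual tokens:
J_{\text{GSPO}}(\theta) =
  \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G}
    \min\!\left( s_i(\theta)\, \hat{A}_i,\;
                 \operatorname{clip}\!\left(s_i(\theta),\, 1-\varepsilon,\, 1+\varepsilon\right) \hat{A}_i \right) \right]
```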