<aside> <img src="/icons/calendar-week_lightgray.svg" alt="/icons/calendar-week_lightgray.svg" width="40px" />
</aside>
<aside> <img src="/icons/bullseye_gray.svg" alt="/icons/bullseye_gray.svg" width="40px" />
</aside>
<aside> <img src="/icons/book_gray.svg" alt="/icons/book_gray.svg" width="40px" />
Toward a Science of AI Agent Reliability
…
</aside>
<aside> <img src="/icons/book_gray.svg" alt="/icons/book_gray.svg" width="40px" />
https://www.alignmentforum.org/
AI Consciousness
https://ai-frontiers.org/articles/the-evidence-for-ai-consciousness-today
Papers
https://arxiv.org/abs/2304.03279
https://arxiv.org/html/2402.06782
…
</aside>
<aside> <img src="/icons/preview_gray.svg" alt="/icons/preview_gray.svg" width="40px" />
</aside>
<aside> <img src="/icons/new-badge_gray.svg" alt="/icons/new-badge_gray.svg" width="40px" />
These are some of the leading tools for researching the AI literature.
There is a Framework for AI Consciousness here. I think it would be very interesting to review. Kaj also wrote about the topic on LessWrong.
I attended a review of the paper Toward a Science of AI Agent Reliability, hosted by BlueDot Impact. My notes are here. There is a fairly large capability-reliability gap in AI agents. Even though LLMs are getting more capable (they can solve more tasks), their reliability (whether they complete a task successfully every time) seems to be lagging. Unless reliability catches up, we will either have to build on the assumption that humans stay in the loop more or less permanently, or we will end up with increasingly capable but chaotic agents, which is potentially the most dangerous scenario. In that scenario, the ways agents fail at a task may be very difficult to predict in advance. This unreliability would also compound in multi-agent systems, since every additional agent is another point of failure.
</aside>
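A minimal Python sketch of the two ideas in the reliability notes above. It assumes capability is measured as pass@k (the chance that at least one of k attempts succeeds, using the unbiased estimator common in code-generation benchmarks) and reliability as pass^k (the chance that all k attempts succeed, as used in τ-bench). These are illustrative choices, not necessarily the metrics the paper uses. The helper names and the 7-out-of-10 trial counts are hypothetical, and `pipeline_success` assumes agents fail independently, which real multi-agent systems may not satisfy.

```python
from math import comb, prod
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k attempts succeeds) from c successes in n trials.

    Uses the unbiased estimator 1 - C(n - c, k) / C(n, k). Assumes 0 < k <= n.
    """
    if n - c < k:
        # Fewer than k failures were observed, so any k-sample contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k attempts succeed) from c successes in n trials: C(c, k) / C(n, k).

    Assumes 0 < k <= n. math.comb returns 0 when k > c, giving 0.0 in that case.
    """
    return comb(c, k) / comb(n, k)


def pipeline_success(step_rates: Sequence[float]) -> float:
    """End-to-end success of a chain in which every agent must succeed.

    ASSUMPTION (hypothetical model): agent failures are independent,
    so per-agent success rates simply multiply.
    """
    return prod(step_rates)


if __name__ == "__main__":
    n, c = 10, 7  # hypothetical: one task attempted 10 times, 7 successes
    for k in (1, 5):
        print(k, round(pass_at_k(n, c, k), 3), round(pass_hat_k(n, c, k), 3))
    # 1 0.7 0.7
    # 5 1.0 0.083
    print(round(pipeline_success([0.9] * 5), 3))  # 0.59
```

As k grows, pass@k rises toward 1 while pass^k falls, which is one way to picture a widening capability-reliability gap. The pipeline line shows how a 90% per-agent success rate drops to roughly 59% across five chained agents, under the independence assumption.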
Research on Multi-Agent Systems
https://docs.google.com/spreadsheets/d/1oOdrQ80jDK-aGn-EVdDt3dg65GhmzrvBWzJ6MUZB8n4/edit?usp=sharing
200 Concrete Problems in Interpretability Spreadsheet
Past BlueDot projects
https://fuzzyhead.substack.com/p/reproducing-metrs-re-bench-reward
https://blog.bluedot.org/p/shutdown-resistance-revisited-replicating
https://blog.bluedot.org/p/learnings-from-building-my-first
https://blog.bluedot.org/p/causal-probes
Interpretability Research