<aside> <img src="/icons/calendar-week_lightgray.svg" alt="Calendar icon" width="40px" />

Research Agenda


  1. Tools for AI Research </aside>

<aside> <img src="/icons/bullseye_gray.svg" alt="Bullseye icon" width="40px" />

Goals this Week


  1. Create questions and survey for the alignment measurement problem paper
  2. Apply for AI Research positions
  3. Research multi-layered AI guardrails

</aside>

<aside> <img src="/icons/book_gray.svg" alt="Book icon" width="40px" />

Pages


AI Research Tools

Agentic AI System Design

Toward a Science of AI Agent Reliability

</aside>

<aside> <img src="/icons/book_gray.svg" alt="Book icon" width="40px" />

Reading Stack


https://www.aisafetybook.com/

https://www.alignmentforum.org/

AI Consciousness

https://ai-frontiers.org/articles/the-evidence-for-ai-consciousness-today

https://www.alignmentforum.org/posts/hopeRDfyAgQc4Ez2g/how-i-stopped-being-sure-llms-are-just-making-up-their

Papers

https://arxiv.org/abs/2304.03279

https://arxiv.org/html/2402.06782

</aside>

<aside> <img src="/icons/preview_gray.svg" alt="Preview icon" width="40px" />

Watch


</aside>

<aside> <img src="/icons/new-badge_gray.svg" alt="New badge icon" width="40px" />

Takeaways


These are some of the leading tools for researching the AI literature.

There is a framework for AI consciousness here; I think it would be very interesting to review. Kaj has also written about the topic on LessWrong.

I attended a review of the paper Toward a Science of AI Agent Reliability hosted by BlueDot Impact. My notes are here. There is a sizable capability-reliability gap for AI agents: even though LLMs are getting more capable (they can solve more kinds of tasks), reliability (whether they can complete a given task every time) seems to be lagging. Unless reliability catches up, we will either need to build under the assumption of relatively permanent human-in-the-loop oversight, or we will end up with increasingly capable but chaotic agents, which is the most dangerous scenario: the ways agents fail at a task may be very difficult to predict in advance. This unreliability would also compound in multi-agent systems.
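The compounding point can be illustrated with a toy calculation (a sketch I added, assuming each step fails independently, which is a simplification):

```python
# Toy illustration of compounding unreliability in multi-agent pipelines.
# Assumption: each agent (or step) succeeds independently with probability p,
# so end-to-end reliability of an n-step pipeline is p ** n.

def pipeline_reliability(p: float, n: int) -> float:
    """End-to-end success probability for n independent steps."""
    return p ** n

# A single agent that is 95% reliable looks strong in isolation,
# but chaining ten such agents drops end-to-end reliability below 60%.
print(round(pipeline_reliability(0.95, 1), 3))   # 0.95
print(round(pipeline_reliability(0.95, 10), 3))  # 0.599
```

Real agent failures are of course correlated, so this is only a lower-bound intuition for why per-task reliability matters more than headline capability as systems grow.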

</aside>

Log

Research on Multi-Agents

https://www.alignmentforum.org/posts/hHnpn3mEPbJMFLj4g/quick-thoughts-on-the-implications-of-multi-agent-views-of

https://docs.google.com/spreadsheets/d/1oOdrQ80jDK-aGn-EVdDt3dg65GhmzrvBWzJ6MUZB8n4/edit?usp=sharing

200 Concrete Problems In Interpretability Spreadsheet

Past BlueDot projects

https://fuzzyhead.substack.com/p/reproducing-metrs-re-bench-reward

https://blog.bluedot.org/p/shutdown-resistance-revisited-replicating

https://blog.bluedot.org/p/learnings-from-building-my-first

https://blog.bluedot.org/p/causal-probes

Interpretability Research

Inspect Evals Technical Contribution Guide