Core System Information
Executive Summary: A multi-agent system that integrates literature search agents with data analysis agents to automate the intellectual steps of scientific discovery, from hypothesis generation through experimental data analysis and iterative refinement
Key Goal of the System: To automate the complete intellectual workflow of scientific discovery, generating novel therapeutic candidates, proposing experiments, analyzing results, and refining hypotheses in iterative cycles
System Architecture: Multi-agent architecture with specialized agents coordinated through structured workflows for literature synthesis and experimental data analysis
Base Model(s): OpenAI o4-mini (for synthesis and hypothesis generation), Anthropic Claude 3.7 Sonnet (for LLM judging), PaperQA2 (for literature agents)
General Tools Used: Scientific literature databases, clinical trial reports, Open Targets Platform, web search
Domain Specific Tools: PaperQA2 (literature search), Jupyter notebooks (data analysis), flow cytometry analysis tools, RNA-seq analysis pipelines
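The following is a minimal, hypothetical configuration sketch (in Python) showing one way the base models and tools listed above could be mapped onto stages of the discovery workflow. The stage names, the StageConfig structure, and the choice of model for the data-analysis stage are illustrative assumptions, not Robin's actual code.

```python
# Hypothetical stage-to-resource mapping (illustrative only; not Robin's code).
from dataclasses import dataclass, field


@dataclass
class StageConfig:
    """Model and tool assignment for one stage of the discovery workflow."""
    model: str
    tools: list[str] = field(default_factory=list)


DISCOVERY_STAGES = {
    "literature_search": StageConfig(
        model="paperqa2",  # per the Base Model(s) field above
        tools=["scientific_literature", "clinical_trial_reports", "web_search"],
    ),
    "hypothesis_generation": StageConfig(
        model="o4-mini",  # per the Base Model(s) field above
        tools=["open_targets_platform"],
    ),
    "data_analysis": StageConfig(
        model="o4-mini",  # assumption: the analysis-stage model is not stated above
        tools=["jupyter_notebook", "flow_cytometry_analysis", "rna_seq_pipeline"],
    ),
    "hypothesis_judging": StageConfig(
        model="claude-3.7-sonnet",  # per the Base Model(s) field above
        tools=[],
    ),
}
```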
Agent Composition
Number of Agents: 3 specialized agents plus a coordinating system
Agent Types: Crow (concise literature search), Falcon (deep literature review), Finch (scientific data analysis)
Agent Hierarchy: Coordinated workflow in which the Robin system orchestrates agent deployment based on discovery stage (see the orchestration sketch below)
Communication Protocol: Structured handoffs between agents with shared context through literature reports and experimental results
Memory Architecture: Persistent storage of literature reviews, hypothesis rankings, experimental results, and analysis trajectories
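The sketch below illustrates, under assumed names (Handoff, Memory, run_discovery_cycle), how the Crow, Falcon, and Finch agents could be chained with structured handoffs and a persistent store of intermediate artifacts; Robin's actual orchestration (see github.com/Future-House/robin) may differ.

```python
# Minimal orchestration sketch (hypothetical names; not Robin's actual API).
# Shows structured handoffs between Crow, Falcon, and Finch with a shared
# persistent memory of literature reports, rankings, and analyses.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Handoff:
    """Structured context passed from one agent to the next."""
    source_agent: str
    artifact_type: str   # e.g. "literature_report", "hypothesis_ranking"
    content: str


@dataclass
class Memory:
    """Persistent store of intermediate artifacts across the discovery cycle."""
    artifacts: list[Handoff] = field(default_factory=list)

    def add(self, handoff: Handoff) -> None:
        self.artifacts.append(handoff)

    def context_for(self, artifact_type: str) -> str:
        return "\n".join(a.content for a in self.artifacts
                         if a.artifact_type == artifact_type)


def run_discovery_cycle(query: str,
                        crow: Callable[[str], str],
                        falcon: Callable[[str], str],
                        finch: Callable[[str, str], str],
                        memory: Memory) -> str:
    """One hypothesis-generation / data-analysis cycle with staged handoffs."""
    # Stage 1: concise literature search (Crow).
    memory.add(Handoff("crow", "literature_report", crow(query)))
    # Stage 2: deep literature review grounded in Crow's report (Falcon).
    review = falcon(memory.context_for("literature_report"))
    memory.add(Handoff("falcon", "literature_review", review))
    # Stage 3: experimental data analysis informed by the review (Finch).
    analysis = finch(review, "path/to/experimental_data.csv")
    memory.add(Handoff("finch", "data_analysis", analysis))
    return analysis
```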
Human Interaction
Interaction Model: Scientist-in-the-loop paradigm where humans execute physical experiments while AI handles intellectual synthesis
Expertise Required: Domain expertise needed to execute laboratory protocols and validate AI-generated hypotheses
Feedback Mechanisms: Human experimental validation provides feedback for iterative hypothesis refinement
Output Formats: Ranked therapeutic candidate lists, detailed literature evaluations, experimental analysis reports, follow-up experimental proposals
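As an illustration of the ranked-candidate output format, the hypothetical schema below shows the kind of fields such a list might carry for human review; the CandidateRanking structure and its field names are assumptions, with the example entry drawn from the dry AMD results described later in this document.

```python
# Hypothetical output schema (illustrative only): one way a ranked therapeutic
# candidate list with supporting evidence could be structured for a scientist.
from dataclasses import dataclass


@dataclass
class CandidateRanking:
    rank: int              # 1 = top-ranked candidate
    compound: str          # e.g. "ripasudil"
    mechanism: str         # e.g. "ROCK inhibitor"
    rationale: str         # literature-derived justification
    proposed_assay: str    # follow-up experiment suggested to the scientist


# Example entry of the kind a discovery cycle might emit for human review.
example = CandidateRanking(
    rank=1,
    compound="ripasudil",
    mechanism="ROCK inhibitor",
    rationale="Literature evidence suggesting enhanced RPE phagocytosis.",
    proposed_assay="RPE phagocytosis assay read out by flow cytometry",
)
```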
Development Information
Developer: FutureHouse (San Francisco, USA)
Version/Date: May 2025 (paper published May 19, 2025)
Licensing: Code and trajectories available at github.com/Future-House/robin
Support Status: Active research system with ongoing development
Performance Characteristics
Response Time: Varies with analysis complexity; literature synthesis and candidate generation typically complete within hours to days
Computational Requirements: Standard computational resources for LLM inference plus specialized bioinformatics computing for data analysis
Scaling Properties: Employs consensus-driven analysis of experimental data, running 10 parallel trajectories per analysis (a minimal consensus sketch follows this subsection)
Benchmark Results: LLM judge achieved 7.25/10 concordance with human expert preferences; 88% intra-rater consistency vs 61% for human experts
Real-world Validation: Successfully validated in dry AMD drug discovery with identification and experimental confirmation of ripasudil efficacy
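The consensus step mentioned under Scaling Properties could, for example, be implemented as a majority vote over independent analysis runs. The sketch below assumes a hypothetical analyze_once callable and simple vote counting; the system's actual aggregation method is not specified here.

```python
# Minimal consensus sketch (assumed mechanism; hypothetical function names).
# Runs N independent analysis trajectories and keeps the most common conclusion.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

N_TRAJECTORIES = 10  # number of parallel trajectories cited above


def consensus_analysis(analyze_once: Callable[[str], str],
                       dataset_path: str,
                       n: int = N_TRAJECTORIES) -> str:
    """Run n independent analyses of the same data and majority-vote the result."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        conclusions = list(pool.map(analyze_once, [dataset_path] * n))
    # Consensus = the conclusion reached by the largest number of trajectories.
    top_conclusion, _votes = Counter(conclusions).most_common(1)[0]
    return top_conclusion
```

Majority voting is only one possible reconciliation strategy; an LLM judge comparing trajectories, as used elsewhere in the system for hypothesis evaluation, would be another.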
Biological Domain Specifics
Literature Coverage: Access to scientific literature, clinical trial reports, and the Open Targets Platform; approximately 400-500 papers analyzed per discovery cycle
Validation Status: Full wet-lab validation in retinal pigment epithelium (RPE) phagocytosis assays with flow cytometry and RNA-seq analysis
Target Identification Accuracy: Successfully identified ROCK inhibitors as a therapeutic class and ripasudil as a superior candidate with a 7.5-fold efficacy improvement
Hypothesis Novelty Rate: Generated 30 distinct therapeutic candidates per disease; demonstrated novelty in proposing ROCK inhibitors for dry AMD (first such proposal)
Domain Expertise Breadth: Demonstrated across 11 different diseases including dry AMD, polycystic ovary syndrome, celiac disease, Charcot-Marie-Tooth disease, and others
Limitations and Safeguards
Known Limitations: Cannot generate detailed, executable laboratory protocols; requires domain-expert prompt engineering for data analysis; hypothesis generation is limited to evidence available in the published literature
Safety Mechanisms: Human oversight required for experimental execution; iterative validation through wet-lab experiments
Edge Cases: Performance may degrade in domains with limited published literature or highly specialized experimental techniques
Ethical Considerations: Maintains human control over experimental execution and therapeutic development decisions; designed for augmentation rather than replacement of human scientists