Experimentein.ai: A smarter way to explore biological experiments
An evidence-first AI platform that extracts, indexes, and searches protein experiments from scientific papers using a multi-stage parsing pipeline, vector retrieval, and agent tooling.
Built during Vectors in Orbit 2026 (GDC SupCom - FST, in partnership with Qdrant) by Team Babyneers (5 members). I contributed across full-stack delivery and AI systems architecture.
experimentein.ai addresses a practical research bottleneck: experiment-level information is buried in dense PDFs and is difficult to search at scale. Instead of treating documents as plain chunks, we modeled experiments as first-class entities while preserving provenance links back to source evidence.
What We Built
An end-to-end system built from four interconnected services:
- Extraction Pipeline v2 (Python)
- Next.js web application (experimentein.ai)
- MCP server for tool-based retrieval
- Scraper/uploader service for assets
Extraction Pipeline v2
The pipeline converts PDFs into structured and searchable experiment records in three stages:
- `pdf_to_infra`: ingests papers through GROBID (TEI/XML) and Docling, uploads assets, and initializes paper metadata.
- `structure_to_blocks`: normalizes structural blocks, generates section summaries, computes embeddings, and writes vectors to Qdrant.
- `blocks_to_items`: runs retrieval-first candidate generation, then deterministic merge logic to surface clean experiment entities.
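The three stages above can be sketched as a minimal orchestration skeleton. This is an illustrative sketch only: the record fields and stub logic are assumptions, and the real stages call GROBID, Docling, and Qdrant rather than the in-memory stand-ins shown here.

```python
from dataclasses import dataclass, field

# Illustrative record type; the real pipeline's schemas are richer.
@dataclass
class Paper:
    pdf_path: str
    metadata: dict = field(default_factory=dict)
    blocks: list = field(default_factory=list)
    experiments: list = field(default_factory=list)

def pdf_to_infra(paper: Paper) -> Paper:
    # Stage 1: parse the PDF (GROBID/Docling in the real system),
    # upload assets, and initialize paper metadata. Stubbed here.
    paper.metadata = {"title": paper.pdf_path, "sections": ["Methods", "Results"]}
    return paper

def structure_to_blocks(paper: Paper) -> Paper:
    # Stage 2: normalize structural blocks and attach embeddings
    # (the real system writes these vectors to Qdrant).
    paper.blocks = [
        {"section": s, "text": f"{s} text", "embedding": [0.0, 0.0]}
        for s in paper.metadata["sections"]
    ]
    return paper

def blocks_to_items(paper: Paper) -> Paper:
    # Stage 3: retrieval-first candidate generation plus deterministic
    # merging into experiment entities, keeping provenance to blocks.
    paper.experiments = [
        {"claim": b["text"], "evidence_block": i}
        for i, b in enumerate(paper.blocks)
        if b["section"] == "Methods"
    ]
    return paper

def run_pipeline(pdf_path: str) -> Paper:
    return blocks_to_items(structure_to_blocks(pdf_to_infra(Paper(pdf_path))))

paper = run_pipeline("example.pdf")
print(len(paper.experiments))  # 1 experiment candidate from the Methods block
```

Each stage takes and returns the same record, so stages can be tested in isolation and swapped independently.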
Web App and Agent Layer
The platform provides evidence-first search across papers, sections, blocks, and extracted experiment items. It includes:
- experiment viewer and side-by-side comparison
- credit accounting and usage history
- a LangGraph agent that can reason over indexed evidence
The in-app agent is connected to a custom MCP server exposing Astra DB and Qdrant tools, giving the AI layer native, composable retrieval capabilities.
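The tool layer can be illustrated with a plain-Python stand-in for MCP-style tool registration and dispatch. The tool names, signatures, and canned results below are hypothetical, not the project's actual MCP server API; they only show the shape of name-based tool resolution that the agent relies on.

```python
from typing import Callable

# Registry mapping tool names to callables, as an MCP server exposes them.
TOOLS: dict[str, Callable[..., dict]] = {}

def tool(name: str):
    """Register a function as a named tool (hypothetical decorator)."""
    def register(fn: Callable[..., dict]) -> Callable[..., dict]:
        TOOLS[name] = fn
        return fn
    return register

@tool("qdrant_search")
def qdrant_search(query: str, top_k: int = 3) -> dict:
    # In the real server this would run a vector search against Qdrant;
    # here we return canned hits to show the contract.
    return {"hits": [{"block_id": i, "score": 1.0 - 0.1 * i} for i in range(top_k)]}

@tool("astra_get_paper")
def astra_get_paper(paper_id: str) -> dict:
    # In the real server this would fetch paper metadata from Astra DB.
    return {"paper_id": paper_id, "title": "stub"}

def dispatch(name: str, **kwargs) -> dict:
    # The agent resolves a tool by name and invokes it with arguments.
    return TOOLS[name](**kwargs)

result = dispatch("qdrant_search", query="kinase assay", top_k=2)
print(len(result["hits"]))  # 2
```

Keeping retrieval behind named, typed tools is what makes the AI layer composable: the agent never embeds database specifics in its prompts.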
Architecture Decisions
Key choices that made the system robust:
- experiment-centric data model instead of chunk-only retrieval
- retrieval-first extraction before LLM structuring to reduce hallucinations and control cost
- hybrid ingestion (GROBID + Docling) for better metadata/layout coverage
- MCP-based tooling rather than brittle prompt-only data access
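The retrieval-first decision can be sketched concretely: rank blocks by similarity to a query vector, then merge near-duplicate candidates deterministically before any LLM structuring call. The vectors, threshold, and merge key below are toy assumptions; the real system uses Qdrant and learned embeddings.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_candidates(blocks, query_vec, threshold=0.9):
    # Keep only blocks similar enough to the query to be experiment evidence.
    return [b for b in blocks if cosine(b["vec"], query_vec) >= threshold]

def merge_candidates(candidates):
    # Deterministic merge: group by a normalized text key, union evidence ids.
    merged = {}
    for c in candidates:
        key = c["text"].strip().lower()
        merged.setdefault(key, {"text": c["text"], "evidence": []})
        merged[key]["evidence"].append(c["block_id"])
    return list(merged.values())

blocks = [
    {"block_id": 1, "text": "Kinase assay at 37C", "vec": [1.0, 0.0]},
    {"block_id": 2, "text": "kinase assay at 37c ", "vec": [0.95, 0.05]},
    {"block_id": 3, "text": "Funding statement", "vec": [0.0, 1.0]},
]
candidates = retrieve_candidates(blocks, query_vec=[1.0, 0.0])
experiments = merge_candidates(candidates)
print(len(experiments))  # 1 merged experiment backed by two evidence blocks
```

Because only merged, high-similarity candidates ever reach the LLM, the model structures less text (lower cost) and sees only grounded evidence (fewer hallucinations).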
Reflection
The biggest win was making each service boundary explicit and testable. Under hackathon pressure, this let the team parallelize effectively while keeping data contracts clear between ingestion, indexing, retrieval, and agent interaction.
Built in 2026 by Babyneers: Mohamed Amin Abassi, Fatma Ben Lakdhar, Rima Ardhaoui, Amina Bayoudh, and Mohamed Yassine Hemissi.