Experimentein.ai: A smarter way to explore biological experiments
An evidence-first AI platform that extracts, indexes, and searches protein experiments from scientific papers using a multi-stage parsing pipeline, vector retrieval, and agent tooling.
Built during Vectors in Orbit 2026 (GDC SupCom - FST, in partnership with Qdrant) by Team Babyneers (5 members). I contributed across full-stack delivery and AI systems architecture.
experimentein.ai addresses a practical research bottleneck: experiment-level information is buried in dense PDFs and is difficult to search at scale. Instead of treating documents as plain chunks, we modeled experiments as first-class entities while preserving provenance links back to source evidence.
What We Built
An end-to-end system built from four interconnected services:
- Extraction Pipeline v2 (Python)
- Next.js web application (experimentein.ai)
- MCP server for tool-based retrieval
- Scraper/uploader service for assets
Extraction Pipeline v2
The pipeline converts PDFs into structured and searchable experiment records in three stages:
- `pdf_to_infra`: ingests papers through GROBID (TEI/XML) and Docling, uploads assets, and initializes paper metadata.
- `structure_to_blocks`: normalizes structural blocks, generates section summaries, computes embeddings, and writes vectors to Qdrant.
- `blocks_to_items`: runs retrieval-first candidate generation, then deterministic merge logic to surface clean experiment entities.
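The three stages above can be sketched as a minimal orchestration skeleton. This is an illustrative sketch only: the record fields and stub logic are assumptions, and the real stages call GROBID, Docling, and Qdrant rather than the in-memory stand-ins shown here.

```python
from dataclasses import dataclass, field

# Illustrative record type; the real pipeline's schemas are richer.
@dataclass
class Paper:
    pdf_path: str
    metadata: dict = field(default_factory=dict)
    blocks: list = field(default_factory=list)
    experiments: list = field(default_factory=list)

def pdf_to_infra(paper: Paper) -> Paper:
    # Stage 1: parse the PDF (GROBID/Docling in the real system),
    # upload assets, and initialize paper metadata. Stubbed here.
    paper.metadata = {"title": paper.pdf_path, "sections": ["Methods", "Results"]}
    return paper

def structure_to_blocks(paper: Paper) -> Paper:
    # Stage 2: normalize structural blocks and attach embeddings
    # (the real system writes these vectors to Qdrant).
    paper.blocks = [
        {"section": s, "text": f"{s} text", "embedding": [0.0, 0.0]}
        for s in paper.metadata["sections"]
    ]
    return paper

def blocks_to_items(paper: Paper) -> Paper:
    # Stage 3: retrieval-first candidate generation plus deterministic
    # merging into experiment entities, keeping provenance to blocks.
    paper.experiments = [
        {"claim": b["text"], "evidence_block": i}
        for i, b in enumerate(paper.blocks)
        if b["section"] == "Methods"
    ]
    return paper

def run_pipeline(pdf_path: str) -> Paper:
    return blocks_to_items(structure_to_blocks(pdf_to_infra(Paper(pdf_path))))

paper = run_pipeline("example.pdf")
print(len(paper.experiments))  # 1 experiment candidate from the Methods block
```

Each stage takes and returns the same record, so stages can be tested in isolation and swapped independently.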
Web App and Agent Layer
The platform provides evidence-first search across papers, sections, blocks, and extracted experiment items. It includes:
- experiment viewer and side-by-side comparison
- credit accounting and usage history
- a LangGraph agent that can reason over indexed evidence
The in-app agent is connected to a custom MCP server exposing Astra DB and Qdrant tools, giving the AI layer native, composable retrieval capabilities.
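The tool layer can be illustrated with a plain-Python stand-in for MCP-style tool registration and dispatch. The tool names, signatures, and canned results below are hypothetical, not the project's actual MCP server API; they only show the shape of name-based tool resolution that the agent relies on.

```python
from typing import Callable

# Registry mapping tool names to callables, as an MCP server exposes them.
TOOLS: dict[str, Callable[..., dict]] = {}

def tool(name: str):
    """Register a function as a named tool (hypothetical decorator)."""
    def register(fn: Callable[..., dict]) -> Callable[..., dict]:
        TOOLS[name] = fn
        return fn
    return register

@tool("qdrant_search")
def qdrant_search(query: str, top_k: int = 3) -> dict:
    # In the real server this would run a vector search against Qdrant;
    # here we return canned hits to show the contract.
    return {"hits": [{"block_id": i, "score": 1.0 - 0.1 * i} for i in range(top_k)]}

@tool("astra_get_paper")
def astra_get_paper(paper_id: str) -> dict:
    # In the real server this would fetch paper metadata from Astra DB.
    return {"paper_id": paper_id, "title": "stub"}

def dispatch(name: str, **kwargs) -> dict:
    # The agent resolves a tool by name and invokes it with arguments.
    return TOOLS[name](**kwargs)

result = dispatch("qdrant_search", query="kinase assay", top_k=2)
print(len(result["hits"]))  # 2
```

Keeping retrieval behind named, typed tools is what makes the AI layer composable: the agent never embeds database specifics in its prompts.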
Architecture Decisions
Key choices that made the system robust:
- experiment-centric data model instead of chunk-only retrieval
- retrieval-first extraction before LLM structuring to reduce hallucinations and control cost
- hybrid ingestion (GROBID + Docling) for better metadata/layout coverage
- MCP-based tooling rather than brittle prompt-only data access
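The retrieval-first decision can be sketched concretely: rank blocks by similarity to a query vector, then merge near-duplicate candidates deterministically before any LLM structuring call. The vectors, threshold, and merge key below are toy assumptions; the real system uses Qdrant and learned embeddings.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_candidates(blocks, query_vec, threshold=0.9):
    # Keep only blocks similar enough to the query to be experiment evidence.
    return [b for b in blocks if cosine(b["vec"], query_vec) >= threshold]

def merge_candidates(candidates):
    # Deterministic merge: group by a normalized text key, union evidence ids.
    merged = {}
    for c in candidates:
        key = c["text"].strip().lower()
        merged.setdefault(key, {"text": c["text"], "evidence": []})
        merged[key]["evidence"].append(c["block_id"])
    return list(merged.values())

blocks = [
    {"block_id": 1, "text": "Kinase assay at 37C", "vec": [1.0, 0.0]},
    {"block_id": 2, "text": "kinase assay at 37c ", "vec": [0.95, 0.05]},
    {"block_id": 3, "text": "Funding statement", "vec": [0.0, 1.0]},
]
candidates = retrieve_candidates(blocks, query_vec=[1.0, 0.0])
experiments = merge_candidates(candidates)
print(len(experiments))  # 1 merged experiment backed by two evidence blocks
```

Because only merged, high-similarity candidates ever reach the LLM, the model structures less text (lower cost) and sees only grounded evidence (fewer hallucinations).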
Reflection
The biggest win was making each service boundary explicit and testable. Under hackathon pressure, this let the team parallelize effectively while keeping data contracts clear between ingestion, indexing, retrieval, and agent interaction.
Built in 2026 by Babyneers: Mohamed Amin Abassi, Fatma Ben Lakdhar, Rima Ardhaoui, Amina Bayoudh, and Mohamed Yassine Hemissi.