# MMMC

**Repository Path**: hf-datasets/MMMC

## Basic Information

- **Project Name**: MMMC
- **Description**: Mirror of https://huggingface.co/datasets/YanzheChen/MMMC
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-12
- **Last Updated**: 2025-10-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

---
configs:
- config_name: default
  data_files:
  - split: train
    path: "metadata.jsonl"
---

# MMMC: Massive Multi-discipline Multimodal Coding Benchmark for Educational Video Generation

## Dataset Summary

The **MMMC (Massive Multi-discipline Multimodal Coding)** benchmark is a curated dataset for **Code2Video research**, focusing on the automatic generation of professional, discipline-specific educational videos. Unlike pixel-only video datasets, MMMC provides **structured metadata** that links lecture content with executable code, visual references, and topic-level annotations, enabling reproducible and interpretable video synthesis.

This dataset serves as the foundation for evaluating **Code2Video**, our code-centric agent framework, and is intended to support the broader community in exploring generative agents, multimodal learning, and education-oriented AI.

---

## Dataset Structure

### Data Files

- **metadata.jsonl**: Main metadata file containing structured information for each video instance.

Each entry in `metadata.jsonl` includes:

- **id**: Unique identifier for the video slice.
- **category**: High-level subject category (e.g., Mathematics, Physics, Computer Science).
- **video**: File path to the corresponding educational video slice.
- **main_topics**: List of teaching topics.
- **num_slices**: Number of video segments the lecture is divided into.
- **reference_image**: Key reference image (optional) related to the topic.
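The records can be consumed with the standard Hugging Face `datasets` library. The sketch below is illustrative only: it assumes the Hub ID `YanzheChen/MMMC` from the mirror description above and the default config declared in the YAML header; field names follow the schema listed above.

```python
# Minimal loading sketch (assumes the Hugging Face `datasets` library and the
# Hub ID YanzheChen/MMMC; field names follow the schema described above).
from datasets import load_dataset

# Load the train split of the default config declared in the YAML header.
ds = load_dataset("YanzheChen/MMMC", split="train")
# Equivalent local load, reading metadata.jsonl directly from a checkout:
# ds = load_dataset("json", data_files="metadata.jsonl", split="train")

record = ds[0]
print(record["id"], record["category"])       # slice identifier and subject category
print(record["main_topics"])                  # list of teaching topics
print(record["video"], record["num_slices"])  # video path and number of segments
```

Depending on how the mirror is hosted, the `video` and `reference_image` paths may need to be resolved relative to the repository root.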
---

## Intended Uses

- **Benchmarking educational video generation** from structured code.
- **Studying agent-based pipelines** for converting knowledge outlines → storyboard → executable code → videos.
- **Exploring interpretability and controllability** in multimodal generative systems.

---

## Data Collection

We construct **MMMC**, a benchmark for code-driven educational video generation, under two criteria:

1. **Educational relevance** — each learning topic is an established concept worth teaching.
2. **Executable grounding** — each concept aligns with a high-quality Manim reference, ensuring practical realizability.

We source data from the [3Blue1Brown (3B1B) YouTube corpus](https://www.3blue1brown.com/#lessons), renowned for its instructional impact and expert Manim craftsmanship. These professional videos serve as a natural **upper bound** for quality, guiding the design of evaluation metrics and providing a rigorous reference point.

After filtering out non-instructional items (e.g., Q&A), we curate **117 long-form videos** spanning **13 subject areas**, including *calculus*, *geometry*, *probability*, and *neural networks*. To enrich supervision, we segment the videos at author-provided timestamps into **339 semantically coherent sub-clips**, yielding **456 units in total**. An LLM extracts concise learning topics (avg. 6.3 words) from titles, descriptions, and metadata, producing a clean mapping from videos to educationally grounded units.

On average, a full-length video lasts **1014 seconds (~16.9 minutes)**, while a segmented clip spans **201 seconds (~3.35 minutes)**, balancing long-horizon reasoning with fine-grained supervision.

---

## Evaluation

MMMC supports multi-dimensional evaluation:

- **VLM-as-a-Judge aesthetics**: scoring visual appeal and clarity.
- **Code efficiency**: measuring execution stability and runtime.
- **TeachQuiz**: a novel end-to-end metric testing how well a VLM, after unlearning, can recover knowledge by watching the generated videos.

---

## Ethics and Human Subjects

- Human evaluation was conducted under the principles of **voluntary participation** and **minimal workload**.
- To prevent participant fatigue, the number of assigned videos was capped at **20**, a limit unanimously agreed upon by all participants.
- Privacy was safeguarded by anonymizing responses, and all data collection complied with research ethics standards.

---

## Acknowledgements

We thank all volunteer participants and the open-source education community for providing high-quality feedback and annotations.

Special thanks to **[3Blue1Brown (3B1B)](https://www.3blue1brown.com/#lessons)** for making available a comprehensive corpus of professional Manim-based lessons, which not only serves as an invaluable **upper bound** in our benchmark but also inspires the design of evaluation metrics and educational applications.

---