# MMMC

**Repository Path**: hf-datasets/MMMC

## Basic Information

- **Project Name**: MMMC
- **Description**: Mirror of https://huggingface.co/datasets/YanzheChen/MMMC
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-12
- **Last Updated**: 2025-10-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

---
configs:
- config_name: default
  data_files:
  - split: train
    path: "metadata.jsonl"
---

# MMMC: Massive Multi-discipline Multimodal Coding Benchmark for Educational Video Generation

## Dataset Summary

The **MMMC (Massive Multi-discipline Multimodal Coding)** benchmark is a curated dataset for **Code2Video research**, focusing on the automatic generation of professional, discipline-specific educational videos. Unlike pixel-only video datasets, MMMC provides **structured metadata** that links lecture content with executable code, visual references, and topic-level annotations, enabling reproducible and interpretable video synthesis.

This dataset serves as the foundation for evaluating **Code2Video**, our code-centric agent framework, and is intended to support the broader community in exploring generative agents, multimodal learning, and education-oriented AI.

---

## Dataset Structure

### Data Files

- **metadata.jsonl**: Main metadata file containing structured information for each video instance.

Each entry in `metadata.jsonl` includes:

- **id**: Unique identifier for the video slice.
- **category**: High-level subject category (e.g., Mathematics, Physics, Computer Science).
- **video**: File path to the corresponding educational video slice.
- **main_topics**: List of teaching topics.
- **num_slices**: Number of video segments the lecture is divided into.
- **reference_image**: Key reference image (optional) related to the topic.
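The records can be consumed with the standard Hugging Face `datasets` library. The sketch below is illustrative only: it assumes the Hub ID `YanzheChen/MMMC` from the mirror description above and the default config declared in the YAML header; field names follow the schema listed above.

```python
# Minimal loading sketch (assumes the Hugging Face `datasets` library and the
# Hub ID YanzheChen/MMMC; field names follow the schema described above).
from datasets import load_dataset

# Load the train split of the default config declared in the YAML header.
ds = load_dataset("YanzheChen/MMMC", split="train")
# Equivalent local load, reading metadata.jsonl directly from a checkout:
# ds = load_dataset("json", data_files="metadata.jsonl", split="train")

record = ds[0]
print(record["id"], record["category"])       # slice identifier and subject category
print(record["main_topics"])                  # list of teaching topics
print(record["video"], record["num_slices"])  # video path and number of segments
```

Depending on how the mirror is hosted, the `video` and `reference_image` paths may need to be resolved relative to the repository root.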
---

## Intended Uses

- **Benchmarking educational video generation** from structured code.
- **Studying agent-based pipelines** for converting knowledge outlines → storyboard → executable code → videos.
- **Exploring interpretability and controllability** in multimodal generative systems.

---

## Data Collection

We construct **MMMC**, a benchmark for code-driven educational video generation, under two criteria:

1. **Educational relevance** — each learning topic is an established concept worth teaching.
2. **Executable grounding** — each concept aligns with a high-quality Manim reference, ensuring practical realizability.

We source data from the [3Blue1Brown (3B1B) YouTube corpus](https://www.3blue1brown.com/#lessons), renowned for its instructional impact and expert Manim craftsmanship. These professional videos serve as a natural **upper bound** for quality, guiding the design of evaluation metrics and providing a rigorous reference point.

After filtering out non-instructional items (e.g., Q&A), we curate **117 long-form videos** spanning **13 subject areas**, including *calculus*, *geometry*, *probability*, and *neural networks*. To enrich supervision, we segment the videos at author-provided timestamps into **339 semantically coherent sub-clips**, yielding **456 units in total**. An LLM extracts concise learning topics (avg. 6.3 words) from titles, descriptions, and metadata, producing a clean mapping from videos to educationally grounded units.

On average, a full-length video lasts **1014 seconds (~16.9 minutes)**, while a segmented clip spans **201 seconds (~3.35 minutes)**, balancing long-horizon reasoning with fine-grained supervision.

---

## Evaluation

MMMC supports multi-dimensional evaluation:

- **VLM-as-a-Judge aesthetics**: scoring visual appeal and clarity.
- **Code efficiency**: measuring execution stability and runtime.
- **TeachQuiz**: a novel end-to-end metric testing how well a VLM, after unlearning, can recover knowledge by watching the generated videos.

---

## Ethics and Human Subjects

- Human evaluation was conducted under the principles of **voluntary participation** and **minimal workload**.
- To prevent participant fatigue, the number of assigned videos was capped at **20**, a limit unanimously agreed upon by all participants.
- Privacy was safeguarded by anonymizing responses, and all data collection complied with research ethics standards.

---

## Acknowledgements

We thank all volunteer participants and the open-source education community for providing high-quality feedback and annotations.

Special thanks to **[3Blue1Brown (3B1B)](https://www.3blue1brown.com/#lessons)** for making available a comprehensive corpus of professional Manim-based lessons, which not only serves as an invaluable **upper bound** in our benchmark but also inspires the design of evaluation metrics and educational applications.

---