We are the DreamX team at Amap (Alibaba), driving cutting-edge research and production AI systems across large language models, reinforcement learning, agent systems, multimodal understanding, generative AI (image/video), world models, autonomous driving, and intelligent mobility. With 6,000+ GitHub stars across 30+ open-source research projects, our work has been published at top-tier venues including ICLR, CVPR, ACL, AAAI, SIGGRAPH, ICCV, EMNLP, and ACM MM.
We are always looking for talented interns and full-time researchers with strong coding skills and research experience. Please email us at cxxgtxy@gmail.com if you are interested.
🔥 News
2026.05.12 🎉 Training LLM Agents via Agent-Data Mutual Evolution is accepted by ACL 2026.
2026.05.12 🎉 Reinforced Parallel Map-Augmented Agent for Geolocalization is accepted by ACL 2026 Findings.
2026.05.11 💻 We released the 5B-Cam model and inference code -- A General-Purpose Interactive World Model.
2026.04.22 💻 We open-sourced Elucidating the SNR-t Bias of Diffusion Probabilistic Models (CVPR 2026).
2026.04.22 💻 We open-sourced Extending One-Step Image Generation from Class Labels to Text (CVPR 2026).
2026.04.10 💻 We open-sourced Let Skills Evolve Collectively with Agentic Evolver.
2026.04.10 💻 We open-sourced A General-Purpose Interactive World Model.
2026.04.01 🎉 Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation is accepted by SIGGRAPH 2026.
2026.03.23 💻 We open-sourced A Comprehensive Benchmark for Evaluating Interactive Response Capabilities of World Models.
2026.03.20 💻 We open-sourced Incentivizing Reasoning and Self-Reflection for VLA in Autonomous Driving.
2026.03.18 💻 We open-sourced Reinforcing Open-Vocabulary Action Recognition with Tools.
2026.03.11 💻 We open-sourced Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing.
2026.03.01 🎉 Beyond Generation: Advancing Image Editing Priors for Depth and Normal Estimation is accepted by CVPR 2026.
2026.02.28 🎉 Frequency-Aware Sparse Attention is accepted by ICLR 2026.
2026.02.27 🎉 Towards Close-up High-resolution Video-based Virtual Try-on is accepted by Findings of CVPR 2026.
2026.02.06 💻 We open-sourced A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios.
2026.02.06 🎉 Reinforcing Open-Vocabulary Action Recognition with Tools is accepted by ICLR 2026.
2026.02.06 🎉 Incentivizing Reasoning and Self-Reflection for VLA in Autonomous Driving is accepted by ICLR 2026.
2026.02.06 🎉 Benchmarking Spatial Intelligence of Text-to-Image Models is accepted by ICLR 2026.
2026.02.06 🎉 Tree Search for LLM Agent Reinforcement Learning is accepted by ICLR 2026.
2026.02.06 🎉 Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models is accepted by ICLR 2026.
2026.02.05 🎉 Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation is accepted by ICLR 2026.
2026.02.04 💻 We open-sourced A GUI World Model via Renderable Code Generation.
2026.02.04 🎉 A Simple and Strong Reinforcement Learning Baseline for Model Reasoning is accepted by ICLR 2026.
2026.02.04 🎉 A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models is accepted by ICLR 2026.
2026.02.04 🎉 Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training is accepted by ICLR 2026.
2026.02.04 🎉 Unified and Spatially-Controllable Visual Effects Generation is accepted by AAAI 2026.
2026.02.04 🎉 Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints is accepted by AAAI 2026.
2026.02.02 🎉 A Benchmark for Perception-Aligned Video Motion Generation is accepted by ICCV 2025.
2026.01.31 🎉 Urban Socio-Semantic Segmentation with Vision-Language Reasoning is accepted by ICLR 2026.
2026.01.07 💻 We open-sourced Reinforced Parallel Map-Augmented Agent for Geolocalization.
2025.10.22 💻 We open-sourced Boosting MLLMs' Video Understanding via Counterfactual Video Generation.
2025.06.20 💻 We open-sourced A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing.
2025.05.21 💻 We open-sourced Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
2025.04.07 💻 We open-sourced Realistic Image Quality and Aesthetic Scoring with Multimodal LLM.
A framework enabling LLM agent skills to evolve collectively from real interactions, with automatic deduplication, improvement, and verification across sessions, agents, and devices.
Adopts tree-search rollouts in place of independent chain-based rollouts for LLM agent RL, achieving superior performance with only a quarter of the rollout budget.
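The budget saving comes from sharing prefixes across rollouts rather than regenerating them per chain. A toy token-cost accounting (our own idealized sketch, not the paper's cost model; function names and the single-branch-point assumption are illustrative) shows where the roughly 4x reduction can come from:

```python
def chain_rollout_cost(prefix_len, branch_len, num_rollouts):
    """Independent chain rollouts: every rollout regenerates the
    shared prefix from scratch before its own continuation."""
    return num_rollouts * (prefix_len + branch_len)

def tree_rollout_cost(prefix_len, branch_len, num_rollouts):
    """Tree-search rollouts (idealized, one branch point): the shared
    prefix is generated once and branches fan out from it."""
    return prefix_len + num_rollouts * branch_len
```

With a long shared prefix of 90 tokens, 10-token continuations, and 8 rollouts, chains cost 800 generated tokens versus 170 for the tree, illustrating how tree rollouts can match a chain baseline on a fraction of the budget.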
A minimalist RL approach (Group Policy Gradient) that directly optimizes the original RL objective, eliminating critic/reference models and KL constraints while outperforming GRPO.
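The "minimalist" part is that the advantage is just the reward minus the group mean, so no critic, reference model, or KL term is needed. A minimal sketch of that idea (our own REINFORCE-style toy, not the released implementation; the function name and per-rollout scalar log-probs are illustrative):

```python
import statistics

def group_policy_gradient_loss(logprobs, rewards):
    """Toy group-relative policy-gradient surrogate: baseline each
    rollout's reward by the group mean. No learned critic, no frozen
    reference model, no KL penalty."""
    baseline = statistics.mean(rewards)
    advantages = [r - baseline for r in rewards]
    # Maximize E[A * log pi(y|x)] -> minimize its negation, averaged over the group.
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(rewards)
```

Rollouts rewarded above the group mean get their log-probabilities pushed up, below-mean rollouts pushed down, which is the whole optimization signal.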
Proposes difficulty-aware GRPO and multi-aspect question reformulation to boost math reasoning by targeting harder questions from both algorithmic and data perspectives.
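One way to make GRPO difficulty-aware is to scale the group-normalized advantages by how rarely the policy currently solves the question. The sketch below is our guess at that mechanism, not the paper's exact formula; the weight `1 - pass_rate` assumes binary 0/1 rewards and is purely illustrative:

```python
def difficulty_weighted_advantages(rewards):
    """Hypothetical difficulty-aware GRPO step: compute standard
    group-normalized advantages, then scale by w = 1 - pass_rate so
    hard questions (low pass rate) carry a larger gradient signal."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard: all-equal rewards give zero std
    weight = 1.0 - mean      # difficulty proxy under binary rewards
    return [weight * (r - mean) / std for r in rewards]
```

Easy questions (pass rate near 1) are down-weighted toward zero, concentrating updates on the harder questions the algorithmic side targets.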
A framework for training LLM agents via agent-data mutual evolution, using RL with failure-signal-driven task synthesis under changing training distributions.
A novel text editing framework for multi-line scene text in complex visual scenarios, with Condition Injection LoRA module and regional text perceptual loss.
An RL-based single-pass 3D scene editing framework using VGGT as geometry-aware reward model and GRPO to anchor 2D editing priors onto the 3D consistency manifold.
Leverages stochastic block-dropping to construct sub-networks for training-free guidance, surpassing CFG on text-to-image and text-to-video generation.
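The guidance signal here comes from the model itself: randomly skipping blocks yields a weaker sub-network whose prediction plays the role CFG gives to the unconditional branch. A minimal sketch under assumed interfaces (callable blocks, scalar activations, and the `subnetwork_guidance` name are all illustrative, not the released API):

```python
import random

def subnetwork_guidance(x, blocks, drop_prob=0.5, scale=2.0, rng=None):
    """Training-free guidance via stochastic block-dropping: run the
    full network, run a sub-network with blocks randomly skipped, then
    extrapolate from the weak prediction toward the full one (CFG-style)."""
    rng = rng or random.Random(0)

    def run(active_blocks):
        h = x
        for blk in active_blocks:
            h = blk(h)
        return h

    full = run(blocks)                                      # strong prediction
    sub = [b for b in blocks if rng.random() >= drop_prob]  # stochastic sub-network
    weak = run(sub)                                         # weak prediction
    return weak + scale * (full - weak)
```

Unlike CFG, no separately trained unconditional model is needed; the sub-network is constructed on the fly at sampling time.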
Elucidating the SNR-t bias of diffusion probabilistic models and proposing a differential correction method to improve generation quality across various diffusion models.
Unified self-supervised pretraining via masked latent modeling in VAE space, significantly improving diffusion model convergence and generation quality.
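The pretraining objective masks a subset of VAE latent tokens and asks the model to reconstruct them. A minimal sketch of the corruption step only (our own illustration of masked latent modeling in general; the function name, `None` mask token, and flat token list are assumptions, not the released pipeline):

```python
import random

def mask_latents(latents, mask_ratio=0.75, rng=None):
    """Replace a random subset of latent tokens with a mask token.
    The pretraining target is to reconstruct the masked positions."""
    rng = rng or random.Random(0)
    n = len(latents)
    num_masked = int(n * mask_ratio)
    masked_idx = set(rng.sample(range(n), num_masked))
    MASK = None  # placeholder mask token for this sketch
    corrupted = [MASK if i in masked_idx else z for i, z in enumerate(latents)]
    return corrupted, sorted(masked_idx)
```

Because the loss lives in the compact VAE latent space rather than pixel space, the same pretraining can warm-start the diffusion backbone and speed convergence.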
A cascaded expert framework that explicitly decouples motion generation from appearance synthesis for high-quality music-driven dance video generation, with the 70K-clip MA-Data dataset.
Combines contextual sub-motion decomposition with tool-augmented reinforcement learning for open-vocabulary action recognition using GRPO with hierarchical rewards.
A prompt-guided adaptive test-time search strategy that dynamically adjusts search space and reward for imaginative video generation with long-distance semantic dependencies.
Introduces DualityForge, a controllable diffusion framework generating counterfactual videos for contrastive training, reducing MLLM video hallucinations by 24%.
A VLM-based GUI world model that predicts dynamic transitions via renderable code generation, boosting Gemini-2.5-Flash by +9.5% on AndroidWorld navigation.
A general-purpose world model for interactive world simulation, generating diverse, high-fidelity worlds that users can explore, control, and transform with event prompts.
A vision-language-action model using rule-based RL to elicit reasoning and self-reflection for autonomous driving trajectory prediction with physics-grounded rewards.
A vision-language reasoning framework for urban socio-semantic segmentation that simulates human annotation via cross-modal recognition and multi-stage RL-based reasoning.
A 14,715-image UGC dataset with 10 fine-grained attributes for realistic image quality and aesthetic scoring; achieves SOTA on 5 public IQA/IAA benchmarks using next-token prediction.