Category: unified multimodal | 庞贝堡垒

Pompeii BL

Paper Reading: Bagel (Emerging Properties in Unified Multimodal Pretraining)

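Background: Multimodal understanding and generation is an emerging research area, and many research projects have already shown the potential of jointly optimizing for generation and understanding benchmarks. However, most existing models are trained only on image-text paired data from standard image generation and understanding tasks. As a result, academic models and proprietary systems (such as GPT-4o and Gemin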

2025-07-16 unified multimodal, paper reading

Bagel, paper reading, multimodal pretraining