Add optional MLA support, make dtype consistent inside the model, and clean up code (add mla, fix model dtype, improve codes) by Zephor5 · Pull Request #240 · jingyaogong/minimind

Zephor5 · 2025-02-28T16:10:38Z

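A few updates and optimizations targeting pretraining (pretrain):

Remove the redundant __pycache__ files
Add an optional tokenizer based on single characters (Chinese characters)
Add support for DeepSeek's MLA (Multi-head Latent Attention), optional and controlled by a parameter
pretrain dataset: move data preparation into batch processing and support both a single data file and multiple data files (there is also an extra mapped reading mechanism for training data, currently unused and unverified, that avoids loading all training data at once)
Update the data precision inside the model so that it stays consistent with the parameter dtype
Also includes commented-out GPU memory debugging code in pretrain
The learning-rate update in pretrain now happens when the accumulation step count is reached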
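To illustrate the last item, here is a minimal sketch of updating the learning rate once per optimizer update instead of once per micro-batch. It is not the code from this PR: the helper names `cosine_lr` and `lr_trace` are hypothetical, and the cosine schedule is an assumption chosen only to keep the example short.

```python
import math

def cosine_lr(update_idx, total_updates, base_lr):
    """Hypothetical cosine decay from base_lr towards 0."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * update_idx / total_updates))

def lr_trace(num_micro_batches, accumulation_steps, base_lr):
    """Return the learning rate applied at each optimizer update."""
    total_updates = num_micro_batches // accumulation_steps
    trace = []
    for micro_step in range(1, num_micro_batches + 1):
        # In a real loop, loss.backward() would run here on every micro-batch.
        if micro_step % accumulation_steps == 0:
            # The schedule is indexed by optimizer updates, not micro-batches.
            update_idx = micro_step // accumulation_steps - 1
            trace.append(cosine_lr(update_idx, total_updates, base_lr))
            # optimizer.step() and optimizer.zero_grad() would run here.
    return trace

trace = lr_trace(num_micro_batches=8, accumulation_steps=4, base_lr=1e-3)
```

With 8 micro-batches and 4 accumulation steps, the schedule is evaluated twice, once for each actual parameter update.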

TODO:
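Apart from pretraining and eval_model, the other training and API code has not been specifically adapted. The original logic is unaffected when the default parameters are used; enabling MLA there may require corresponding changes.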
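The description mentions a mapped reading mechanism that avoids loading all training data at once. One possible shape for such a reader is sketched below, assuming JSONL input files; the class name `LazyJsonl` is hypothetical and this is not the implementation in this PR. It records the byte offset of every line once, then reads single records on demand.

```python
import json
import os
import tempfile

class LazyJsonl:
    """Index line offsets once, then read individual records on demand."""

    def __init__(self, paths):
        if isinstance(paths, str):
            paths = [paths]  # accept a single file or a list of files
        self.index = []  # (path, byte offset) for every non-empty line
        for path in paths:
            with open(path, "rb") as f:
                while True:
                    offset = f.tell()
                    line = f.readline()
                    if not line:
                        break
                    if line.strip():
                        self.index.append((path, offset))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        path, offset = self.index[i]
        with open(path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline().decode("utf-8"))

# Usage with two small temporary files (illustrative data only).
tmp_dir = tempfile.mkdtemp()
path_a = os.path.join(tmp_dir, "a.jsonl")
path_b = os.path.join(tmp_dir, "b.jsonl")
with open(path_a, "w", encoding="utf-8") as f:
    f.write('{"text": "first"}\n{"text": "second"}\n')
with open(path_b, "w", encoding="utf-8") as f:
    f.write('{"text": "third"}\n')

dataset = LazyJsonl([path_a, path_b])
```

Only the offset index lives in memory, so the cost grows with the number of records rather than with their total size. Reopening the file on every access keeps the sketch simple; a real dataset would probably keep file handles open.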

fangzhangmnm · 2025-03-09T22:42:28Z

Have you tried training with it? I wrote my own MLA and the loss converged very poorly.

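Disclaimer:
This was trained with my own code on my own dataset, not with the code and data in this repo.
Edit:
I later found it was a silly mistake that caused the attention layer to be short-circuited. After the fix, the loss curve is about the same as without MLA, although training is extremely slow (2x time) because there is no fast attention implementation for MLA.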

Zephor5 · 2025-03-11T02:03:24Z

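@fangzhangmnm With the same parameters, memory consumption is slightly higher because there are more intermediate parameters; the loss curves are not very different. My feeling is that, in a small model, MLA carries less positional information than GQA, so on small models it may perform somewhat worse than GQA.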
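As background for the MLA versus GQA comparison, the arithmetic below counts what each attention variant would cache per token per layer at inference time. This is separate from the training memory discussed above. It follows the MLA design described by DeepSeek, where one shared latent vector plus a small decoupled RoPE key is cached and only that RoPE part of the key carries position, which is one possible reading of the "less positional information" remark. The function name and all dimensions are hypothetical and are not taken from this PR.

```python
def kv_cache_per_token(n_heads, head_dim, n_kv_heads, kv_latent_dim, rope_dim):
    """Cached elements per token per layer for three attention variants."""
    return {
        "mha": 2 * n_heads * head_dim,     # full K and V for every head
        "gqa": 2 * n_kv_heads * head_dim,  # K and V shared within each group
        "mla": kv_latent_dim + rope_dim,   # shared latent + decoupled RoPE key
    }

# Illustrative small-model dimensions (assumed, not from the PR).
sizes = kv_cache_per_token(
    n_heads=8, head_dim=64, n_kv_heads=2, kv_latent_dim=128, rope_dim=32
)
# sizes == {"mha": 1024, "gqa": 256, "mla": 160}
```

In this example each GQA key rotates all 64 head dimensions with RoPE, while the MLA key has only 32 rotated dimensions.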

LeoWootsi · 2025-03-28T14:21:35Z

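> Have you tried training with it? I wrote my own MLA and the loss converged very poorly.

Is this the loss from training on the pretrain_hq.jsonl that is published in the repository?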

fangzhangmnm · 2025-03-28T18:22:49Z

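> > Have you tried training with it? I wrote my own MLA and the loss converged very poorly.
>
> Is this the loss from training on the pretrain_hq.jsonl that is published in the repository?

This was trained with my own code on my own dataset, not with the code and data in this repo.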

add mla, fix model dtype, improve codes

dc0708a

Zephor5 mentioned this pull request Feb 28, 2025

Gradient accumulation issue #231

Closed

Zephor5 force-pushed the master branch from e00280a to c564e5d Compare February 28, 2025 16:58

fix load model and lr update

1c884f4

Zephor5 force-pushed the master branch from c564e5d to 1c884f4 Compare March 2, 2025 14:32

Nijikadesu mentioned this pull request Apr 18, 2025

MLA implementation Nijikadesu/breakdown-minimind#4

Closed

Zephor5 wants to merge 2 commits into jingyaogong:master from Zephor5:master

3 participants
