ViTPose++: Vision Transformer for Generic Body Pose Estimation
Paper in Chinese -- ViTPose++: Vision Transformer for Generic Body Pose Estimation
2611 words | 13 minutes
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper in Chinese -- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
2575 words | 13 minutes
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper in Chinese -- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
2791 words | 14 minutes
Mixtral 8x7B: A High Quality Sparse Mixture of Experts
Paper in Chinese -- Mixtral 8x7B: A High Quality Sparse Mixture of Experts
2884 words | 14 minutes
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Paper in Chinese -- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
2227 words | 11 minutes
Game review: What Remains of Edith Finch
2025-07-06
During the recent Steam Summer Sale, @RichardLuo from the Arch group on Telegram recommended this game to me: What Remains of Edith Finch (released in Chinese as 《艾迪芬奇的記憶》 and 《伊迪·芬奇的回憶豪宅》).
1348 words | 7 minutes
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper in Chinese -- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
3392 words | 17 minutes
Recommended papers on Sparse MoE (Sparse Mixture-of-Experts)
Since I currently need to research sparse MoE, I compiled this list of papers.
3815 words | 19 minutes