According to a report by QbitAI, researchers from Microsoft Research Asia (MSRA) have proposed a new large-model architecture, Retentive Network (RetNet), in the paper "Retentive Network: A Successor to Transformer for Large Language Models"; it is positioned as a successor to the Transformer for large language models. Experimental results show that on language modeling tasks RetNet achieves perplexity comparable to the Transformer while delivering 8.4× faster inference and 70% lower memory usage, and it scales well. Moreover, once the model size exceeds a certain scale, RetNet outperforms the Transformer.
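The inference-speed and memory claims come from retention's recurrent formulation, which updates a fixed-size state per token instead of attending over the whole sequence. Below is a minimal single-head sketch of that recurrent form (state update S_n = γ·S_{n-1} + k_n v_n^T, readout o_n = q_n S_n), with a fixed decay γ = 0.9 chosen here purely for illustration; the actual paper uses per-head decay values plus additional components such as rotary-style position encoding and group normalization, which are omitted.

```python
import torch

def recurrent_retention(q, k, v, gamma=0.9):
    """Sketch of retention in recurrent form for one head.

    q, k, v: (seq_len, d) tensors.
    Each step costs O(d^2) regardless of sequence length,
    which is the source of the constant-memory inference claim.
    """
    seq_len, d = q.shape
    state = torch.zeros(d, d)  # recurrent state S
    outputs = []
    for n in range(seq_len):
        # decay the old state and add the current key/value outer product
        state = gamma * state + torch.outer(k[n], v[n])
        # read out the state with the current query
        outputs.append(q[n] @ state)
    return torch.stack(outputs)

# Toy usage: 8 tokens, model width 16.
q, k, v = (torch.randn(8, 16) for _ in range(3))
print(recurrent_retention(q, k, v).shape)  # torch.Size([8, 16])
```

During training the same computation can be expressed in a parallel, attention-like form, which is how the paper reconciles Transformer-level training throughput with recurrent-style inference.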
