According to a report by IT House on January 15, Google Research recently used its own BIG-Bench benchmark to build a dataset called "BIG-Bench Mistake", and used it to run a series of evaluations of the "error probability" and "error-correction ability" of popular language models on the market.

Google researchers said that, because no existing dataset could evaluate the "error probability" and "self-correction ability" of large language models, they created the dedicated "BIG-Bench Mistake" benchmark dataset for this purpose.

The researchers reportedly first had the PaLM language model run five tasks from the BIG-Bench benchmark, then modified the generated "chain-of-thought" traces by inserting a logical error, and finally fed the altered traces back to the model, asking it to identify where the chain of thought went wrong.

Google researchers claim that the BIG-Bench Mistake dataset helps improve models' self-correction ability, and that models fine-tuned on the relevant test tasks "generally perform better than large models with zero-shot prompts."
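To make the described procedure concrete, here is a minimal Python sketch of that mistake-location evaluation. The record layout ("question", "steps", "mistake_index") and the generic query_model callable are illustrative assumptions standing in for the actual BIG-Bench Mistake format and a real model API, not Google's pipeline.

```python
# A minimal sketch of the mistake-location evaluation described above.
# The dataset record layout and the query_model callable are illustrative
# assumptions, not the actual BIG-Bench Mistake pipeline.

def find_mistake_step(query_model, question, cot_steps):
    """Ask a model to locate the first erroneous step in a
    chain-of-thought trace, or to report that the trace is correct."""
    numbered = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(cot_steps))
    prompt = (
        f"Question: {question}\n"
        f"Here is a step-by-step solution:\n{numbered}\n"
        "If the solution contains a logical mistake, answer with the number "
        "of the first incorrect step; otherwise answer 'correct'."
    )
    return query_model(prompt).strip().lower()


def mistake_location_accuracy(query_model, dataset):
    """Score the model over records of the (assumed) form
    {"question": str, "steps": [str, ...], "mistake_index": int | None},
    where mistake_index is the 0-based position of the injected error,
    or None if the trace was left unmodified."""
    correct = 0
    for record in dataset:
        answer = find_mistake_step(query_model, record["question"], record["steps"])
        expected = (
            "correct"
            if record["mistake_index"] is None
            else str(record["mistake_index"] + 1)  # prompt numbers steps from 1
        )
        if answer == expected:
            correct += 1
    return correct / len(dataset)
```

With a real model client plugged in as query_model, the returned fraction roughly corresponds to the kind of mistake-location accuracy the researchers measured across fine-tuned and zero-shot-prompted models.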