According to a report by IT House on September 9, NVIDIA recently announced TensorRT-LLM, a deeply optimized open-source library that accelerates the inference performance of large language models on AI GPUs such as Hopper. NVIDIA is working with the open-source community to implement AI kernels using cutting-edge techniques such as SmoothQuant, FlashAttention, and fMHA (fused multi-head attention) to optimize its GPUs, accelerating models including GPT-3 (175B), Llama, Falcon (180B), and Bloom.

A highlight of TensorRT-LLM is a scheduling scheme called in-flight batching, which allows work to enter and exit the GPU independently of other tasks. The same GPU can thus dynamically serve multiple smaller queries while a large compute-intensive request is still in progress, improving utilization and roughly doubling the throughput of the H100.

In performance tests, NVIDIA used the A100 as the baseline and compared it against the H100, both with and without TensorRT-LLM enabled. In GPT-J 6B inference, the H100 delivered 4 times the inference performance of the A100, while the H100 with TensorRT-LLM enabled delivered 8 times the A100's performance.
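To make the in-flight batching idea concrete, here is a toy scheduler sketch. This is not NVIDIA's implementation (TensorRT-LLM's batch manager is far more sophisticated); it only illustrates the scheduling principle described above: a sequence that finishes decoding frees its batch slot immediately, and a queued request joins on the very next step, instead of the whole batch waiting for its slowest member. All names (`run_inflight_batching`, the request tuples) are hypothetical.

```python
from collections import deque

def run_inflight_batching(requests, max_batch=4):
    """Toy simulation of in-flight (continuous) batching.

    requests: list of (request_id, num_decode_steps) pairs.
    Returns (completion_order, total_steps).
    """
    queue = deque(requests)
    active = {}        # request_id -> remaining decode steps
    finished = []
    steps = 0
    while queue or active:
        # Admit queued requests into any free batch slots (in-flight insertion).
        while queue and len(active) < max_batch:
            rid, n = queue.popleft()
            active[rid] = n
        # One decode step for every active sequence.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                # Sequence done: its slot is freed for the next step,
                # without waiting for the rest of the batch.
                del active[rid]
                finished.append(rid)
        steps += 1
    return finished, steps

order, steps = run_inflight_batching(
    [("a", 2), ("b", 5), ("c", 1), ("d", 3)], max_batch=2)
print(order, steps)  # → ['a', 'c', 'b', 'd'] 6
```

With static batching the same workload would take 8 steps (batch {a, b} runs 5 steps, then batch {c, d} runs 3), since short requests are held hostage by the longest sequence in their batch; the in-flight scheduler finishes in 6. This slot-recycling effect is what lets one GPU serve small queries while a large request is still decoding.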