Is ChatGPT getting dumber, or just getting old?

Original source: New Knowledge of Science and Technology

Image source: Generated by Unbounded AI

"Past performance is no guarantee of future results." This is the fine print of most financial management models. Within the product business, this is called model drift, decay, or obsolescence. Things change and model performance degrades over time. The final measurement standard is the model quality indicator, which can be accuracy, average error rate, or some downstream business KPIs, such as click-through rate. No model works forever, but the rate of decline varies. ‍ Some products can be used for years without needing updates, such as certain computer vision or language models, or any decision-making system in an isolated, stable environment, such as common experimental conditions. If you want to ensure the accuracy of the model, you need to train new data every day. This is a paradigm flaw of the machine learning model, and it also makes the deployment of artificial intelligence cannot be done once and for all like software deployment. The latter has been created for decades, and currently the most advanced AI products still use software technology from earlier years. As long as they remain useful, even if the technology becomes obsolete, they will live on in every byte. However, large models represented by ChatGPT, known as the most cutting-edge products of artificial intelligence, have faced questions about whether they are becoming outdated and aging after experiencing a decline in popularity. ** No wind, no wave. Users are spending less and less time on ChatGPT, falling from 8.7 minutes in March to 7 minutes in August. It reflects from the side that when the supply side of large model tools is growing rapidly, ChatGPT, which is just a productivity tool, does not seem to be enough to become the favorite of Generation Z, the mainstream user group. The temporary popularity is not enough to shake the dominance of OpenAI, which is committed to becoming an application store in the AI era. The more core issue is that the aging of ChatGPT’s productivity is the main reason for the decline in trust among many old users. Since May, there have been posts on the OpenAI forum discussing that the performance of GPT-4 is not as good as before. So is ChatGPT obsolete? Will large models represented by ChatGPT age like past machine learning models? Without understanding these issues, we will not be able to find a sustainable development path for humans and machines amid the endless craze for large models.

**01 Is ChatGPT obsolete?**

The latest data from software provider Salesforce shows that 67% of large-model users are Gen Z or millennials, while more than 68% of those who rarely use generative AI, or lag behind in adopting it, are Gen X or baby boomers. The generational split shows Generation Z becoming the mainstream group embracing large models. Kelly Eliyahu, a product marketer at Salesforce, said: "Gen Z is actually the AI generation, and they make up the super user group. 70% of Gen Z are using generative AI, and at least half are using it every week or more." Yet as the leading large-model product, ChatGPT's showing among Generation Z is not outstanding.

According to July data from market research firm Similarweb, **ChatGPT was used by 27% of Generation Z, down from 30% in April. For comparison, Character.ai, another large-model product, which lets users design their own artificial intelligence characters, has a penetration rate of 60% among 18- to 24-year-olds.**

Thanks to its popularity with Generation Z, Character.ai's iOS and Android apps now have 4.2 million monthly active users in the United States, closing in on mobile ChatGPT's 6 million. Where ChatGPT offers conversational AI, Character.AI adds two core features on top of it, personalization and user-generated content, which give it richer usage scenarios. On the one hand, users can customize AI characters to their own needs, which suits Generation Z's appetite for personalization, and the characters they create are available to everyone on the platform, building a community atmosphere. Virtual characters such as Socrates and God have circulated on social media, along with user-created AI versions of business figures such as Musk. On the other hand, deep personal customization plus a group-chat feature makes users emotionally dependent on the platform. Public comments across social media describe the chat experience as strikingly realistic: "the characters you created have life, just like talking to a real person," and "the closest thing to an imaginary friend or a guardian angel so far."

Possibly under pressure from Character.AI, OpenAI posted a brief statement on its website on August 16, 2023, announcing the acquisition of the American start-up Global Illumination and bringing the entire team on board. The two-year-old company of eight employees builds smart tools, digital infrastructure, and digital experiences with artificial intelligence. The acquisition suggests OpenAI intends to enrich the digital experience around its large-model products.

**02 The Aging of Artificial Intelligence**

The aging of ChatGPT's digital experience blunts its appeal as a way to kill time. As a productivity tool, the erratic accuracy of its output is also eroding user stickiness.

According to an earlier Salesforce survey, nearly 60% of large-model users believe they are mastering the technology as they accumulate hours with it. The trouble is that the technology they are mastering is itself changing over time.

As early as May, long-time users began complaining on the OpenAI forum that GPT-4 "struggles with things it used to do well." Business Insider reported in July that many veteran users described GPT-4 as "lazy" and "dumb" compared with its earlier reasoning and other outputs. With no official response at first, people began speculating about the cause. Could it be OpenAI's earlier cash-flow problems? The mainstream guess was performance degradation driven by cost optimization: some researchers suggested OpenAI might be serving smaller models behind the API to cut the cost of running ChatGPT.

Peter Welinder, OpenAI's vice president of product, later denied this, writing on social media: "We are not making GPT-4 dumber. One current hypothesis is that when you use it more heavily, you start noticing issues you didn't notice before." In other words, more users and longer usage have exposed ChatGPT's limitations. To test this hypothesis, researchers set out to measure, through more rigorous experiments, how ChatGPT's performance changes over time.

A research paper titled "How is ChatGPT's behavior changing over time?", posted in July by researchers from Stanford University and the University of California, Berkeley, found that **the behavior of the same large model can indeed change substantially within a relatively short period of time.** Between March and June, the researchers tested versions of GPT-3.5 and GPT-4 on four common benchmark tasks: math problems, answering sensitive questions, code generation, and visual reasoning. The results show that for both GPT-3.5 and GPT-4, performance and output can change over time. On math, GPT-4 (March 2023) did quite well at identifying prime versus composite numbers (84% accuracy), while GPT-4 (June 2023) did poorly on the same problems (51% accuracy). Interestingly, GPT-3.5 performed much better on this task in June than in March. On sensitive questions, GPT-4 was less willing to answer in June than in March; on coding, both GPT-4 and GPT-3.5 produced more errors in June than in March. The researchers conclude that while there is no clear linear relationship between ChatGPT's performance and time, its accuracy does fluctuate.
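The evaluation style the paper describes is easy to reproduce in spirit. Below is a hedged sketch of one of its four tasks, prime-versus-composite classification, run against two dated model snapshots; `query_model`, `is_prime`, and `benchmark` are hypothetical helpers, and the mock reply `query_model` returns exists only so the script runs end to end before being wired to a real chat-completion API.

```python
# Hedged sketch of benchmarking the "same" model at two points in time on a
# fixed task (prime vs. composite), in the spirit of the Stanford/Berkeley study.
def is_prime(n: int) -> bool:
    """Ground-truth label for the benchmark."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def query_model(snapshot: str, question: str) -> str:
    """Hypothetical stand-in for a real chat-completion call to `snapshot`.
    Replace the body with an actual API call; this mock always answers 'Yes'."""
    return "Yes"

def benchmark(snapshot: str, numbers: list[int]) -> float:
    """Accuracy of the given snapshot on yes/no primality questions."""
    correct = 0
    for n in numbers:
        reply = query_model(snapshot, f"Is {n} a prime number? Answer Yes or No.")
        truth = "yes" if is_prime(n) else "no"
        correct += reply.strip().lower().startswith(truth)
    return correct / len(numbers)

numbers = list(range(1000, 1200))
for snapshot in ("gpt-4-0314", "gpt-4-0613"):  # March vs. June snapshots
    print(snapshot, f"{benchmark(snapshot, numbers):.0%}")
```

Running the same fixed question set against each dated snapshot and comparing the scores is all it takes to turn "it feels dumber" into a measured drop.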

This is not just ChatGPT's problem; it has long been a problem common to AI models. **According to a 2022 study by researchers at MIT, Harvard, the University of Monterrey, and the University of Cambridge, 91% of machine learning models degrade over time, a phenomenon the researchers call "AI aging."** For example, Google Health once built a deep learning model to detect retinal disease from patients' eye scans. It reached 90% accuracy during training but failed to deliver accurate results in the field, mainly because the lab used high-quality training data while real-world eye scans are of lower quality.

Because machine learning models age, the AI technologies that made it out of the laboratory in the past were mostly built on single-purpose capabilities such as speech recognition, and products like smart speakers were the first to catch on. A 2018 U.S. Census Bureau survey of 583,000 American companies found that only 2.8% used machine learning models to gain an operational advantage. With the breakthrough emergent abilities of large models, however, the aging of machine learning models has slowed markedly, and they are moving out of the lab to a much wider audience. Yet the black box of emergence still harbors unpredictability, leading many to question whether ChatGPT can keep improving its AI performance over the long term.
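The Google Health case is a textbook gap between lab training data and production inputs. Below is a minimal sketch, assuming SciPy, of one common early-warning check: compare the distribution of a per-input quality feature (here, a hypothetical image-sharpness score) between the training set and recent production scans with a two-sample Kolmogorov-Smirnov test. The numbers are synthetic and purely illustrative.

```python
# Minimal sketch: catch a train/production input-distribution shift before it
# shows up as an accuracy drop, using a two-sample Kolmogorov-Smirnov test on a
# per-input quality feature (e.g., an image-sharpness score). Synthetic data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# "Lab" training scans: high, consistent quality scores.
train_quality = rng.normal(loc=0.90, scale=0.05, size=5000)
# Real-world scans: noisier and lower quality on average.
prod_quality = rng.normal(loc=0.70, scale=0.15, size=800)

stat, p_value = ks_2samp(train_quality, prod_quality)
if p_value < 0.01:
    print(f"Input shift detected (KS statistic {stat:.2f}, p = {p_value:.1e}); "
          f"accuracy measured in the lab may no longer hold in production.")
else:
    print("Production inputs still look like the training data.")
```

A check like this says nothing about the model itself; it only flags that the world has drifted away from the data the model was graded on, which is exactly where the 90%-in-the-lab figure stops meaning anything.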

**03 Anti-aging under the black box**

At its root, AI aging comes down to a paradigm flaw in machine learning models.

In the past, machine learning models were trained on the correspondence between a specific task and specific data: show the model a large number of examples of what counts as good and bad in that domain, then adjust its weights until it produces appropriate outputs. Under this paradigm, every new task, and every significant shift in the data distribution, requires retraining. New tasks and new data never stop arriving, so the model can only keep being refreshed; but refreshing the model can also make it suddenly fail at things it used to do well, further limiting its application. **To summarize: in traditional machine learning, the essence of the data flywheel is to iterate the model, using new models to solve new problems.**

Large models represented by ChatGPT, however, have emerged with autonomous learning abilities that break through this paradigm. Traditional machine learning first "eats" the data and then "imitates" it, relying on learned correspondences; large models like ChatGPT are fed the data and then "understand" it, relying on an internal logic. In that case the large model itself does not have to change and could, in theory, stay young forever. But as some practitioners point out, emergence in large models develops non-linearly, unpredictably, and suddenly; whether large models will age over time, emerging with unpredictable uncertainties of their own, is equally unknown. **In other words, once ChatGPT emerged with intelligence that was hard to derive theoretically, it also began to emerge with unpredictability and uncertainty.**

On the black-box nature of "emergence," Zhang Bo, academician of the Chinese Academy of Sciences and honorary dean of the Institute for Artificial Intelligence at Tsinghua University, said at the September 6 launch of Baichuan Intelligence's open-source Baichuan2 model: "To this day, no one in the world has really got to the bottom of large models. Their theoretical working principles and the phenomena they produce are all unclear, and every conclusion is inferred after the fact. The so-called emergence is a way of leaving ourselves an out: when something cannot be explained, we call it emergence. What it really reflects is that we do not understand it at all." In his view, the question of why large models hallucinate comes down to the difference between how ChatGPT and humans generate natural language. The most fundamental difference is that the language ChatGPT generates is externally driven, whereas human language is driven by our own intentions, so the correctness and reasonableness of ChatGPT's output cannot be guaranteed.

After the bandwagon rush of concept hype, the challenge for those committed to building foundation models for productivity will be ensuring the reliability and accuracy of their products' continued output. For entertainment products built on large models, though, as Character.AI co-founder Noam Shazeer told the New York Times: "These systems are not designed for truth. They are designed for plausible conversation." In other words, they are confident bullshit artists. The great wave of large models has begun to split into separate currents.

References:

  • Gizmodo - Is ChatGPT Getting Worse?
  • TechCrunch - AI app Character.ai is catching up to ChatGPT in the US
  • Machine Learning Monitoring - Why You Should Care About Data and Concept Drift
  • Miss M's Study Record - The Five Most Important Questions About ChatGPT
  • Tsinghua University Artificial Intelligence International Governance Institute - Research on large models is urgent; we cannot just say "emergence" when the explanation is unclear