"Wenxin Yiyan seems to have been launched in a hurry. I don't think it is meant to make money at all; it is meant to catch the ChatGPT boom. Industry large models are what can really generate commercial value." Shortly after Baidu released Wenxin Yiyan, a former Baidu employee told Titanium Media, "Last year, before OpenAI became so popular, Mr. Wang (Baidu CTO Wang Haifeng) led a team that built 10 large models, including industry large models. They drew little attention outside the industry at the time, but looking at Baidu's layout now, the industry large models were a forward-looking bet, made earlier than OpenAI and Microsoft."
Today, after the hustle and bustle around general-purpose large models, industry models are gradually gaining traction, which confirms a reality: foundation models such as ChatGPT mostly earn buzz rather than revenue, and their main role is to educate the market and shape perceptions. For artificial intelligence to be deployed in practice and earn money now, it still depends on industry large models.
Even in overseas markets, ChatGPT's momentum as a consumer-facing (C-end) product has gradually weakened. According to SimilarWeb data, growth in ChatGPT's visits was astonishing early on: month-on-month growth was 131.6% in January, 62.5% in February, and 55.8% in March. It slowed markedly in April, to 12.6%, and by May it had fallen to 2.8%. Month-on-month growth in June is expected to turn negative.
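To see what the SimilarWeb month-on-month figures above imply in aggregate, they can be compounded into a single traffic index. The following is a minimal sketch, not SimilarWeb's own method: it assumes the month before January is the baseline (index 1.0), and the names `MONTHLY_GROWTH_PCT` and `cumulative_index` are made up for illustration.

```python
# Month-on-month growth rates (percent) quoted above from SimilarWeb, January to May.
MONTHLY_GROWTH_PCT = {
    "Jan": 131.6,
    "Feb": 62.5,
    "Mar": 55.8,
    "Apr": 12.6,
    "May": 2.8,
}


def cumulative_index(growth_pct):
    """Compound monthly rates into an index; the month before the first entry is 1.0."""
    index = 1.0
    series = {}
    for month, pct in growth_pct.items():
        index *= 1.0 + pct / 100.0  # apply this month's growth on top of last month's level
        series[month] = index
    return series


if __name__ == "__main__":
    for month, value in cumulative_index(MONTHLY_GROWTH_PCT).items():
        print(f"{month}: {value:.2f}x the baseline")
```

Under these assumptions, visits in May work out to roughly 6.8 times the baseline, with most of that multiple already reached by March, which is consistent with the slowdown described above.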
"I believe many of us have tried ChatGPT, and I believe many people put it aside after trying it, because for now it is largely disconnected from our work. But I still hope that everyone will not 'get up early only to arrive late at the market,' because this is a paradigm revolution that will bring disruptive change," Wei Qing, Chief Technology Officer (CTO) of Microsoft (China), said previously.
Business-facing (B-end) solutions built on ChatGPT or other large models are a good way to bridge the gap between large models and real-world scenarios.
Internationally, major companies such as Microsoft and Amazon have begun to seek commercialization paths through enterprise-level services and to explore multiple industries. Domestically, Baidu, Alibaba, Tencent, and Huawei are all stepping up investment in industry large models. Many industry leaders and start-ups around the world are also exploring the prospects of industry large models. Recently, the Beijing Municipal Science and Technology Commission and the Zhongguancun Management Committee released the first batch of 10 application cases of industry large models in Beijing. In addition, the value of mergers and acquisitions along related technology routes has reached new highs.
But the industry large model track is far from crowded. As the technology iterates rapidly, all walks of life are rebuilding their technical knowledge and reshaping their business models, and everything has just begun.
Upgrade: Thousand Models War
If the foundation models are fighting a "hundred-model war," the industry large models are fighting a "thousand-model war." Just as a trunk grows branches, each foundation-model vendor can incubate several industry large models.
"Although everyone has high expectations for general-purpose large models, they are not necessarily the optimal solution for industry scenarios," Tang Daosheng, Senior Executive Vice President of Tencent Group and CEO of its Cloud and Smart Industries Group, said on June 19 at the Tencent Cloud industry large model conference.
Although the Hunyuan assistant has not yet been released to the public, Tencent took the lead in releasing industry large models. Relying on the Tencent Cloud TI platform, it has built a curated selection of industry large models, offering customers one-stop MaaS (Model-as-a-Service) services and helping enterprise customers build their own dedicated large models and smart applications. According to Tencent, official information about its consumer-facing (C-end) general model will be released later.
These moves can be read this way: regardless of how well or how far the Hunyuan foundation model has progressed, releasing industry large models first is a necessary step for Tencent to protect its reputation and win customers at a time when they urgently need such models.
Earlier, Tian Qi, chief scientist for artificial intelligence at Huawei Cloud, said that Huawei divides large models into three levels: L0, L1, and L2. L0 is what everyone calls the general-purpose foundation model, like GPT-3. An L1 industry large model is obtained by adding industry data to the L0 foundation model and training on the mixture.
L1 is then adapted to specific sub-scenarios across thousands of downstream industries, yielding L2 task models for those scenarios. To cut production costs and improve efficiency as quickly as possible, two questions become very important: how to produce L2 models quickly from an L1 industry model, and how to deploy L2 models on the device, edge, and cloud sides.
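The three-level split described above can be pictured as a small tree in which each model records the model it was derived from. This is only an illustrative sketch of the L0/L1/L2 relationship, not Huawei's implementation; the class `Model`, its methods, and the example names ("general-base", "finance", "loan-qa") are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Level names follow the L0 / L1 / L2 split described above.
NEXT_LEVEL = {"L0": "L1", "L1": "L2"}


@dataclass(eq=False)
class Model:
    name: str
    level: str = "L0"
    parent: Optional["Model"] = field(default=None, repr=False)
    children: List["Model"] = field(default_factory=list, repr=False)

    def derive(self, name: str) -> "Model":
        """Create the next-level model (L0 -> L1 industry, L1 -> L2 task)."""
        if self.level not in NEXT_LEVEL:
            raise ValueError(f"{self.level} models are leaves and cannot be derived from")
        child = Model(name=name, level=NEXT_LEVEL[self.level], parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return the chain of model names from the L0 root down to this model."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))


if __name__ == "__main__":
    base = Model("general-base")         # hypothetical L0 foundation model
    finance = base.derive("finance")     # hypothetical L1 industry model
    loan_qa = finance.derive("loan-qa")  # hypothetical L2 scenario task model
    print(loan_qa.level, loan_qa.lineage())
```

The point of the structure is that one L0 root can fan out into several L1 models, and each L1 into many L2 models, which is why producing and deploying L2 models quickly matters.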
The agenda of the upcoming Huawei Developer Conference in July shows that Huawei Cloud will present a series of talks and releases on how the Pangu model is refined from a foundation model into industry models.
At this year's Alibaba Cloud Summit, Alibaba Cloud CTO Zhou Jingren also said, "Today, not every company needs to start training from scratch, and not everyone needs to build all kinds of corpora from scratch or invest large amounts of computing resources to carry out a series of model customizations from the ground up. We hope that, based on today's Tongyi Qianwen model and combined with each enterprise's scenarios, knowledge system, and special industry needs, an enterprise-specific model can be produced for each company."
Microsoft is also building its own industry offerings. In April, the international version of Microsoft Azure OpenAI Service released in China its first three sets of Azure global innovation industry scenarios, covering retail e-commerce, manufacturing, and digital-native companies. Aimed at local enterprises going overseas, they integrate five large-model services (GPT-3, GPT-4, Codex, DALL-E, and enterprise-grade ChatGPT) to help Chinese enterprise customers accelerate their expansion into global markets.
The "thousand-model war" is about to break out, but it is still too early for the real shakeout, the stage of "great waves washing away the sand." On the whole, large models are still at a relatively early stage of development. Although industry large models are being launched in quick succession, this track clearly still has plenty of room.
Take large models for the financial industry as an example. The industry is divided into fields such as securities, insurance, banking, and new finance, and the downstream tasks in each field break down into dozens or even hundreds of sub-tasks.
"The more important moment will come when, on top of the foundation model, mechanisms and structures such as SFT can adapt efficiently to downstream tasks, and when the downstream tasks of financial or other industry models reach scale effects." In the view of Chen Haiqing, head of the Innovation Business Center at Alibaba DAMO Academy, continued training of industry and scenario large models on general unstructured data is only the beginning.
Sensible and realistic choice
If an enterprise wants to build a foundation model with hundreds of billions of parameters, it needs a single cluster with more than 10,000 accelerator cards. The challenge is not only obtaining the GPU cards but also keeping the GPU cluster's resources well utilized, which most companies cannot do.
Industry large models are clearly easier to realize, and they also have broader application prospects.
"Large models can empower thousands of industries, but you must understand the scenarios of those industries well. You can't expect that once a model with hundreds of billions or trillions of parameters has been trained, enterprise users can simply use it with ease," said Zhou Ming, founder of Lanzhou Technology. "Going from a general model to an industry model means doing the last mile for the user's scenario."
After assessing the investment a foundation model requires and weighing the pros and cons, enterprise customers quickly turned to industry large models, and vendors devoted more energy to them.
Tang Daosheng said frankly that today's general-purpose large models are generally trained on broad public literature and web information. Information on the Internet may contain errors, rumors, and biases, and much professional knowledge and industry data is under-represented. As a result, the models are not specialized or precise enough for a given industry, and the data is too noisy.
However, in many industry scenarios, users have high requirements for the professional services that enterprises provide, and their tolerance for errors is low. Once a company provides wrong information, it may face huge legal liability or a public relations crisis. Therefore, the large models that enterprises use must be controllable, traceable, and correctable, and they must be tested repeatedly and thoroughly before launch.
"We believe customers need targeted industry large models, combined with training or fine-tuning on the company's own data, to create intelligent services that are highly practical. What companies need is to truly solve a specific problem in an actual scenario, rather than solve 70%-80% of the problem across 100 scenarios," Tang Daosheng said.
Zhu Yong, vice president of Baidu Smart Cloud, also said, "Looking at the situation at home and abroad, there are not that many general-purpose large models, and some vendors on the market are actually building relatively small models. Domain models, by contrast, are especially important. A general model has only general knowledge, while a domain model can be aligned with the task expectations of a specific industry or domain and solve actual business problems. This process is very important, yet the cost and resources it requires are far lower than building an underlying general model from scratch."
He also predicted that there may be only a few foundation models (underlying general models) in the future, but that, combined with professional data and industry know-how, many different kinds of domain models will grow on top of them. These domain models will flourish and in turn support a thriving layer of domain applications above them.
Take "State Grid-Baidu Wenxin," the energy-industry large model created by Baidu Smart Cloud and State Grid, as an example. Working with State Grid experts, Baidu Smart Cloud introduced into the general large model the sample data and domain-specific knowledge that State Grid had accumulated in its power business. During training, the two sides combined their experience in pre-training algorithms with their experience in power-sector business and algorithms, designing tasks such as power-domain entity discrimination and power-domain document discrimination as pre-training tasks. This lets the Wenxin large model learn power-sector expertise in depth, so that it can solve practical business problems in the energy field and reduce costs while increasing efficiency.
Zhu Yong compared the difference between a general model and a domain model to people: a general model is like a broadly knowledgeable university graduate, who may know some medicine but cannot diagnose patients and is not a professional doctor. A domain model studies medicine in depth on top of strong general ability and becomes a professional doctor who can contribute value in the medical field.
Going from a broadly knowledgeable general model to a professional medical model costs far fewer resources than building a general large model from scratch. But the process does depend on professional data, and the model must be driven by tasks from the professional domain to elicit such abilities.
How to build an industry model
Large models are a new thing in themselves, and they have changed the previous software development paradigm. Vendors need a new toolchain and platform to help customers refine industry large models earlier and faster.
With the arrival of the large-model era, the efficiency of the last mile will improve greatly. Zhou Ming said a new software development paradigm is taking shape: enterprises provide many functional engines, and users employ them as assistants to improve efficiency. On this basis, new applications become easy to build.
Take the Wenxin Qianfan large model platform as an example. It is a one-stop platform for enterprise developers to develop large models and operate services. It provides not only the underlying model (ERNIE-Bot) and third-party open-source large models, but also a range of AI development tools and a complete development environment, so that customers can easily use large models and develop applications on them.
Across data management, automated SFT (supervised fine-tuning) of models, and cloud deployment of inference services, vendors aim to offer one-stop large-model customization. The capabilities of different vendors' large-model platforms are broadly similar; the differences lie in ease of use, quality of results, and the software and hardware they support.
"Building a large model is indeed not cheap, but in the end only two factors decide whether a large-model service can be widely adopted. The first is how well the model performs; if it performs poorly, nothing else matters. The second is cost," said Xin Zhou, general manager of Baidu Smart Cloud AI and Big Data Platform.
In terms of performance, an industry model has to rest on a general model. It is like general education: without a good general model, there is little point in talking about results in a specific industry. BloombergGPT, jointly launched by Bloomberg and Johns Hopkins, is an example. In its training data, general-purpose data accounts for about half and public financial-industry data for about half, while Bloomberg's own data accounts for 0.6%.
"For any model to reach a good level of intelligence or basic capability, you must first train a foundation model with a reasonably large number of parameters, and then integrate industry-specific data into that foundation model to make an industry model," Xin Zhou said.
Baidu's approach is to launch a "big guy" (Wenxin Yiyan) and a very complete tool platform (Wenxin Qianfan), and then provide differentiated model services according to customers' actual needs, helping them make the most cost-effective choice. Baidu believes that price will not become a bottleneck for companies embracing large models.
Beyond model-calling and training costs, Baidu is also helping companies cut costs further. For companies that focus only on a relatively narrow field, Baidu also offers a version with fewer parameters, so that the cost of using or training the model drops sharply while its performance is preserved.
In fact, there is no universal standard for the cost of building a large industry model.
First, different foundation models come in different parameter sizes, and the software and hardware investment has to change with the model's parameter count and capabilities. If the model has tens of billions of parameters, even an A100 card can run it and get downstream tasks started.
Most of the application demand concentrated today falls into this category, such as intelligent question answering, intelligent writing, and intelligent creation in knowledge management, as well as pan-Internet marketing scenarios and code generation.
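The claim that a model with tens of billions of parameters can run on an A100 card can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below is an assumption-laden estimate, not a sizing tool. It counts weights only and ignores activations, KV cache, and optimizer state, and the helper names are hypothetical. The A100 ships in 40 GB and 80 GB variants; 80 GB is used as the default here.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="fp16"):
    """Memory for the weights alone, in GB (1 GB taken as 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_card(num_params, dtype="fp16", card_memory_gb=80.0):
    """True if the weights alone fit in the card's memory (a necessary, not sufficient, check)."""
    return weight_memory_gb(num_params, dtype) <= card_memory_gb


if __name__ == "__main__":
    for billions in (10, 30, 100):
        params = billions * 1e9
        print(
            f"{billions}B params @ fp16: {weight_memory_gb(params):.0f} GB, "
            f"fits on one 80 GB card: {fits_on_card(params)}"
        )
```

Under these assumptions, models at the lower end of "tens of billions" fit on one card at 16-bit precision, while a model with hundreds of billions of parameters does not, which matches the contrast the article draws with foundation-model training.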
Second, cost depends on the amount of data and the direction of the application. Large models worldwide are currently priced per 1,000 tokens. If an enterprise's downstream tasks are very simple and can be handled with only tens of thousands of tokens, the cost is very low and very few GPU cards are needed. The data needed to build an industry large model, however, is usually measured in gigabytes or even terabytes, so its offline training cost will be very high.
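The per-1,000-token pricing described above makes the cost gap easy to illustrate. The following is a minimal sketch with a made-up price: `PRICE_PER_1K` is a hypothetical figure in arbitrary currency units, not any vendor's actual rate, and the billion-token corpus is an assumed size chosen only for contrast. Offline training is not necessarily billed per token, so the second number is an order-of-magnitude illustration only.

```python
def usage_cost(num_tokens, price_per_1k_tokens):
    """Cost of processing num_tokens when billing is per 1,000 tokens."""
    if num_tokens < 0 or price_per_1k_tokens < 0:
        raise ValueError("token count and price must be non-negative")
    return num_tokens / 1000 * price_per_1k_tokens


# Hypothetical price, in arbitrary currency units per 1,000 tokens.
PRICE_PER_1K = 0.012

if __name__ == "__main__":
    small_task = usage_cost(50_000, PRICE_PER_1K)           # tens of thousands of tokens
    large_corpus = usage_cost(1_000_000_000, PRICE_PER_1K)  # assumed billion-token corpus
    print(f"small task: {small_task:.2f}, large corpus: {large_corpus:.2f}")
```

Because cost scales linearly with token count, a corpus four or five orders of magnitude larger than a simple task costs four or five orders of magnitude more at the same unit price.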
Who is running the race?
Players are flocking to the large-model track. This time they include not only first-tier Internet companies but also more industry leaders and start-ups.
Which industries will break through first? The industries where cooperation cases are located may offer a clue. As shown in the table at the beginning of the article, fields such as finance, healthcare, education, and autonomous driving appear frequently.
For example, when Alibaba Cloud released the Tongyi large model in April, it announced that it had begun joint exploration with a number of companies. The first batch of partners includes OPPO Andes Smart Cloud, Geely Automobile, Zhiji Automobile, Chery New Energy, Momo Zhixing, Swire Coca-Cola, Bosideng, and Palm Technology. According to reports, the financial industry, the retail industry, and some large consumer-facing scenarios and industries have accumulated large amounts of public and scenario data, which makes it easier to build enterprise- or industry-specific models.
According to public information, the number of Baidu Wenxin's large-scale industry models has reached 11, covering energy and electricity, finance, aerospace, media, film and television, automobiles, urban management, gas, insurance, electronics manufacturing and social sciences.
The first batch of ten industry large model application cases in Beijing, released on June 27, covers energy and electricity, medical health, finance, autonomous driving, construction, scientific research, daily life, and question answering. Reportedly, from June 27 to July 30, the Beijing Science and Technology Commission and the Zhongguancun Management Committee will also solicit more than 80 industry large model application case projects from innovators across the city, focusing on key areas such as urban governance, medical health, scientific research, smart finance, smart life, and smart cities.
But more customers are facing a new round of knowledge accumulation and learning.
"When we talk with customers, we find that many of them don't know much about industry models, yet they take the initiative to ask for Baidu's industry models," said Li Jingqiu, deputy general manager of Baidu Smart Cloud AI Platform. At that point, the team analyzes the company's actual products and needs: what capabilities the industry model should have, which systems or applications it will be used in, who will use those applications, and what results are hoped for. Only after asking these questions does it become clear whether the customer really needs a large model fine-tuned with SFT on the Wenxin Qianfan toolchain or an industry pre-trained model. The latter takes at least several months, or even as long as a year, to build and deploy, covering everything from technical work such as data processing and resource allocation at the computing layer to long-running training on common industry data.
From the hustle and bustle around foundation models to the rise of industry large models, a real business transformation is set to accelerate in the second half of 2023.
It is also interesting to compare the paths taken in large models by domestic vendors such as Baidu and by OpenAI and Microsoft. When ChatGPT became a global phenomenon, some asked why China could not produce a ChatGPT. The technical environment was of course a factor, but in the end many people settled on a superficial consensus: "China's AI leans more toward business applications and commercialization." To put it bluntly, China's AI has less patience and wants to make money.
On the other hand, the market is the biggest driving force for technological development, and different grasps of timing and rhythm have produced different results. Take industry large models as an example. Microsoft was either waiting for the technology to mature further or felt the time had not yet come, and so moved a step later. Domestic vendors moved quickly from foundation models to industry large models, which may give them more lasting vitality.
As the saying goes, what is lost at sunrise can be regained at sunset. Judged by results, it is no bad thing that domestic industry large models are running fast.
Industry model, open the book!
Source: Titanium Media, Author: Zhang Shuai
"Wenxinyiyan seems to be launched in a hurry. I think this thing is not for making money at all, but to catch up with the ChatGPT boom. The industry's large model is what can really generate commercial value." Shortly after the release of Baidu Wenxinyiyan , A former Baidu employee told Titanium Media, "When OpenAI was not so popular last year, Mr. Wang (Baidu CTO Wang Haifeng) led a team to build 10 large-scale models, including large-scale industry models. At that time, there was not much attention outside the industry, but If you look at Baidu's layout now, the big industry model is actually a forward-looking layout, earlier than OpenAI and Microsoft."
Today, after the hustle and bustle of general-purpose large-scale models, industry models are gradually gaining traction, which also confirms this reality: Basic large-scale models such as ChatGPT earn "crying", which largely plays a role in educating the market and shaping cognition , Artificial intelligence is really going to be implemented and earn the current money, but also depends on the industry's large model.
Even in overseas markets, ChatGPT, as part of the attributes of C-end products, has gradually weakened. According to SimilarWeb data, the growth rate of ChatGPT’s visits in the early stage was astonishing. The month-on-month growth rate was 131.6% in January and 62.5% in February. It was 55.8% in March, and it slowed down significantly in April, with a month-on-month growth rate of 12.6%. By May, this figure had changed to 2.8%, and it is expected that the month-on-month growth rate in June may be negative.
"I believe that many of us have tried ChatGPT, and I believe that many people have put it aside after trying it, because it is basically separated from our work at present, so we put it down after using it. But I I still hope that everyone will not "get up early and catch the late episode", because this is a paradigm revolution that will bring about subversive changes." Microsoft (China) Chief Technology Officer (CTO) Wei Qing said previously.
The B-side solution based on ChatGPT or large models is a good way to solve the separation between large models and scenes.
Internationally, major companies such as Microsoft and Amazon have also begun to seek commercialization paths from enterprise-level services, and have begun to explore multiple industries; domestically, Baidu, Alibaba, Tencent, and Huawei are all speeding up investment in large-scale industry models. In addition, many industry leaders and start-up companies around the world are also exploring the prospect of large-scale industry models. Recently, the Beijing Municipal Science and Technology Commission and the Zhongguancun Management Committee also released the first batch of 10 application cases of large-scale artificial intelligence industry models in Beijing. In addition, the amount of mergers and acquisitions of related technology routes has also reached new highs...
Upgrade: Thousand Models War
If the basic model is a "hundred-model war", the industry's large-scale model is a "thousand-model war". Just like the trunk grows branches, each basic large-scale model manufacturer can incubate several industry large-scale models. unanimous.
"Although everyone has high expectations for the general-purpose large-scale model, it is not necessarily the optimal solution to meet the needs of industry scenarios." On June 19, at the Tencent Cloud Industry Large-scale Model Conference, Senior Executive Vice President of Tencent Group, Cloud and Tang Daosheng, CEO of Smart Industry Business Group, said.
In the case that Hunyuan Assistant did not release it to the public, Tencent took the lead in releasing large-scale industry models. Relying on the Tencent Cloud TI platform to build a selection of large-scale industry models, it provides customers with one-stop MaaS services and helps corporate customers build exclusive large-scale models and Smart application. It is learned from Tencent that Tencent will release official information about the general model of the C-end in the future.
This series of measures may be understood as, regardless of the effect and progress of the Hunyuan basic large-scale model, the priority release of the industry large-scale model is a necessary move for Tencent to ensure its own reputation and seize market customers when customers are in urgent need.
Earlier, Tian Qi, the chief scientist in the field of artificial intelligence at Huawei Cloud, mentioned that Huawei divides the large model into three levels, L0, L1, L2, and L0 is what everyone calls the basic general model, like GPT-3, in the basic model L0 On the basis of , plus industry data, the industry large model obtained by mixed training is L1.
Then, L1 is deployed for specific subdivision scenarios of thousands of industries downstream, and the task model L2 of the subdivision scenarios is obtained. In order to reduce production costs and improve efficiency as soon as possible, how to quickly produce L2 models from the large industry model L1, and Deploying the L2 model to the device side, edge side, and cloud side is a very important issue.
It can be seen on the agenda of the upcoming Huawei Developer Conference in July that Huawei Cloud will conduct a series of interpretations and releases on how the Pangu model was refined from a basic model to an industry model.
At this year's Alibaba Cloud Summit, Alibaba Cloud CTO Zhou Jingren also said, "Today not all companies need to start training from scratch, nor do you need everyone to start from scratch to create a variety of corpus, including a large number of computing power resources, to grow from scratch. A series of customization of the model, we hope that based on the Tongyi Qianwen model today, combined with the enterprise's scenario, enterprise knowledge system, and enterprise's special needs in the industry, each enterprise-specific model will be generated."
Microsoft is also making its own industry model. In April, in China, the international version of Microsoft Azure OpenAI Service released the first three sets of Azure global innovation industry scenarios for retail e-commerce, manufacturing and digital native fields, integrating GPT-3 and GPT-4 for local enterprise users going overseas. , Codex, DALL-E, and enterprise-level ChatGPT, five large-scale model services, to help Chinese overseas enterprise customers accelerate their expansion into the global market.
The "thousand-model war" is about to break out, but it is still too early to really enter the stage of big waves washing the sand. On the whole, large-scale models are still in a relatively early stage of development. Although large-scale models in the industry are concentrated, there is obviously more room for this track .
Taking the large model of the financial industry as an example, it is divided into different fields such as securities companies, insurance, banks, and new finance. The downstream tasks of each field are divided into dozens or hundreds of sub-tasks.
"The more important moment is when based on the basic model, SFT and other mechanisms and structures can be efficiently adapted to downstream tasks, and when the downstream tasks of the financial industry or other industry models have a scale effect." In Alibaba According to Chen Haiqing, head of the Moyuan Innovation Business Center, it is only the beginning of the industry's large models and scenarios for continuous training through some universal unstructured data.
Sensible and realistic choice
If an enterprise wants to create a basic large-scale model with hundreds of billions of parameters, it needs a computing power of more than 10,000 cards in a single-machine cluster, not only a GPU card, but also the utilization of GPU cluster resources, which most companies cannot do.
The large industry model is obviously easier to realize, and it also has a broader application prospect.
"Large models can empower thousands of industries, but you must have a good understanding of the scenarios of thousands of industries, and you can't expect to train hundreds of billions or trillions of large models, which can be easily used by enterprise users," said Zhou Ming, founder of Lanzhou Technology. "From the general model to the industry model, it is necessary to do the last mile for the user's scenario."
After assessing the investment required for the basic large-scale model and weighing the pros and cons and gains and losses, enterprise customers quickly turned to the large-scale industry model, and manufacturers devoted more energy to it.
Tang Daosheng said frankly that the current general-purpose large-scale models are generally trained based on extensive public literature and network information. The information on the Internet may contain errors, rumors, and biases. Many professional knowledge and industry data are insufficiently accumulated, resulting in the model's industry-specific The accuracy and accuracy are not enough, and the data "noise" is too large.
However, in many industrial scenarios, users have high requirements for professional services provided by enterprises, and their fault tolerance is low. Once a company provides wrong information, it may cause huge legal liability or public relations crisis. Therefore, the large-scale models used by enterprises must be controllable, traceable, and correctable, and must be tested repeatedly and fully before they can be launched.
"We believe that customers need more industry-specific industry models, coupled with the company's own data for training or fine-tuning, in order to create highly practical intelligent services. What companies need is to truly solve the problem in actual scenarios. Solve a certain problem instead of solving 70%-80% of the problem in 100 scenes." Tang Daosheng said.
Zhu Yong, vice president of Baidu Smart Cloud, also said, "From the situation at home and abroad, we can see that there are not so many general-purpose models. Some manufacturers on the market actually make relatively small models. On the contrary, domain models are special Important, because the general model only has the ability of general knowledge, the domain model can be aligned with the task expectations of specific industries and domains, and solve the actual problems of the business. This process is very important, but the cost and resources required for this process are far less than starting from scratch Do the underlying general model."
He also predicted that there may be only a few basic models (underlying general models) in the future, but that many different types of domain models will grow on top of them once they are combined with professional data and industry know-how. These domain models will flourish and, in turn, support a thriving layer of domain applications above them.
Take "State Grid-Baidu Wenxin", the energy industry large model created by Baidu Smart Cloud and State Grid, as an example. Working with State Grid experts, Baidu Smart Cloud introduced into the general large model the sample data and unique knowledge that State Grid has accumulated in the power business. During training, the two sides combined their experience in pre-training algorithms with their experience in power-sector business and algorithms, and designed pre-training tasks such as power-domain entity discrimination and power-domain document discrimination. This lets the Wenxin large model learn professional power knowledge in depth, so that it can solve practical business problems in the energy field and help reduce costs and increase efficiency.
Zhu Yong compared the general model to a university graduate with a wide range of knowledge. He may know some medicine, but he cannot diagnose patients and is not a professional doctor. The domain model studies medicine in depth on top of that strong general ability and becomes a professional doctor who can contribute value in the medical field.
Going from a general model with broad knowledge to a professional medical model costs far fewer resources than building a general large model from scratch. It does, however, require professional data, and it must be driven by tasks in the professional field to elicit such abilities.
How to build an industry model
The large model itself is a new thing, which has changed the previous software development paradigm. Manufacturers need a new tool chain and platform to help customers polish the industry large model earlier and faster.
With the advent of the large model era, the efficiency of the last mile will be greatly improved. Zhou Ming said that a new generation of software development paradigm is taking shape: enterprises provide many functional engines, and users now have assistants that improve their efficiency. On this basis, it becomes easy to construct a new application.
Take the Wenxin Qianfan large model platform as an example. It is a one-stop large model development and service operation platform for enterprise developers. It provides not only the underlying model (ERNIE-Bot) and third-party open source large models, but also various AI development tools and a complete development environment, making it easy for customers to use and develop large model applications.
From data management and automated model SFT to cloud deployment of inference services, manufacturers hope to offer one-stop large model customization. The capabilities of different manufacturers' large model building platforms are broadly similar; the differences lie in ease of use, quality of results, and the software and hardware supported.
"Making a large model is indeed not cheap, but in the end only two things determine whether a large model service can be promoted. The first is the model's effect: if the effect is poor, nothing else needs to be said. The second is the cost," said Xin Zhou, general manager of Baidu Smart Cloud AI and Big Data Platform.
In terms of effect, the industry model has to rely on the general model. It is much like general education: without a good general model, there is no point in talking about how well it works in a specific industry. BloombergGPT, jointly launched by Bloomberg and Johns Hopkins, is an example. In its data distribution, general basic model data accounts for about half and public data from the financial industry accounts for about half, with Bloomberg's own data accounting for 0.6%.
"For any model to reach a good level of intelligence or basic capability, you must first train a basic model with a reasonably large number of parameters, and then integrate industry professional data into that basic model to make an industry model," Xin Zhou said.
Baidu's idea is to launch a "big guy" (Wenxin Yiyan) and a very complete tool platform (Wenxin Qianfan), and then provide differentiated model services according to the actual needs of customers to help customers make the most cost-effective choice. They believe that price will not become a bottleneck for companies to embrace large models.
Beyond model calling costs and training costs, Baidu is also helping companies reduce costs further. For companies that focus only on a relatively narrow field, Baidu offers a version with fewer parameters, so that the cost of using or training the model drops dramatically while the model's effect is preserved.
In fact, there is no universal standard for the cost of building a large industry model.
First, different basic large models have different parameter scales, and the investment in software and hardware must change with the basic parameters and capabilities of the model. If the model has tens of billions of parameters, a single A100 card can run it and launch downstream tasks.
Most current application demand is concentrated in this category, such as intelligent question answering, intelligent writing, and intelligent creation in knowledge management, as well as pan-Internet marketing scenarios and code generation.
Second, the cost is related to the amount of data and the direction of the application. Large models worldwide are currently priced per 1,000 tokens. If an enterprise's downstream tasks are very simple and can be done with only tens of thousands of tokens, the cost is very low and very few GPU cards are required. The amount of data required to build an industry large model, however, is usually measured in gigabytes or even terabytes, so its offline training cost will be very high.
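The per-1,000-token billing described above can be sketched in a few lines. This is only an illustration: the price used is a hypothetical placeholder, not any vendor's actual rate, and the bytes-per-token ratio is a rough assumption used solely to compare the scale of a small task with that of a gigabyte-sized corpus, not to model real training cost.

```python
def estimate_token_cost(num_tokens: int, price_per_1k_tokens: float) -> float:
    """Pro-rated cost of num_tokens when the billing unit is 1,000 tokens."""
    if num_tokens < 0 or price_per_1k_tokens < 0:
        raise ValueError("token count and price must be non-negative")
    return num_tokens / 1000 * price_per_1k_tokens


def approx_tokens_from_bytes(num_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Very rough token count for raw text.

    bytes_per_token is an ASSUMPTION for illustration only; the real ratio
    depends on the tokenizer and on the language of the text.
    """
    if bytes_per_token <= 0:
        raise ValueError("bytes_per_token must be positive")
    return int(num_bytes / bytes_per_token)


# Hypothetical price per 1,000 tokens, chosen only for the example.
HYPOTHETICAL_PRICE_PER_1K = 0.01

small_task_tokens = 50_000                          # "tens of thousands of tokens"
corpus_tokens = approx_tokens_from_bytes(10**9)     # about 1 GB of text

small_task_cost = estimate_token_cost(small_task_tokens, HYPOTHETICAL_PRICE_PER_1K)
scale_ratio = corpus_tokens / small_task_tokens     # how much larger the corpus is
```

Under these assumptions, a one-gigabyte corpus holds thousands of times more tokens than the simple task, which is the gap the paragraph above points to when it contrasts cheap downstream use with expensive offline training.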
**Who is running the race?**
Players flocked to the large-scale model track. This time, not only the first-tier Internet companies, but also more industry leaders and start-up companies joined.
Which industries can break through first? The industries in which cooperation cases have appeared may offer a clue. As shown in the table at the beginning of the article, finance, medical care, education, autonomous driving, and other fields appear frequently.
For example, when Alibaba Cloud released the Tongyi large model in April, it announced that it has launched cooperative explorations with a number of companies. The first batch of cooperative companies include OPPO Andes Smart Cloud, Geely Automobile, Zhiji Automobile, Chery New Energy, Momo Zhixing, Swire Coca-Cola, Bosideng, Palm Technology, etc. According to reports, the financial industry, retail industry, and some large-scale consumer-oriented scenarios and industries have accumulated a lot of public data and scenario data, which is convenient for building enterprise or industry-specific models.
According to public information, the number of Baidu Wenxin's large-scale industry models has reached 11, covering energy and electricity, finance, aerospace, media, film and television, automobiles, urban management, gas, insurance, electronics manufacturing and social sciences.
The first batch of ten large model application cases for the artificial intelligence industry in Beijing, released on June 27, involve energy and electricity, medical health, finance, autonomous driving, construction, scientific research, daily life, and question answering. It is reported that from June 27 to July 30, the Beijing Science and Technology Commission and the Zhongguancun Management Committee will also collect more than 80 industry large model application case projects from innovation entities in the city, focusing on key areas such as urban governance, medical health, scientific research, smart finance, smart life, and smart cities.
"When we communicated with customers, we found that many customers don't know much about industry models, but they will take the initiative to ask for Baidu's industry models," said Li Jingqiu, deputy general manager of Baidu Smart Cloud AI Platform. At that point, she said, the team analyzes the customer's needs in light of how the enterprise will actually use the product: what capabilities the industry model should have, which systems or applications it will be used in, who will use those applications, and what results are expected. Only after asking these questions does it become clear whether the customer needs a large model fine-tuned (SFT) with the Wenxin Qianfan tool chain or a model pre-trained for the industry. The latter takes at least several months, or even as long as a year, to build and deploy, covering everything from technical issues such as data processing and resource allocation at the computing power layer to long-term training on data common to the industry.
From the hustle and bustle of basic large models to the rise of industry large models, a real business transformation will accelerate as we enter the second half of 2023.
It is also interesting to compare the paths of domestic manufacturers such as Baidu with those of OpenAI and Microsoft in the field of large models. When ChatGPT became a global phenomenon, some voices asked why China could not produce a ChatGPT. There are, of course, reasons rooted in the technical environment, but in the end many people share a rough consensus: "China's AI is more inclined to business applications and commercialization capabilities." To put it bluntly, China's AI has less patience and wants to make money.
But on the other hand, the market is the biggest driving force for technological development, and differences in timing and rhythm have produced different results. Take the industry large model as an example. Microsoft is either waiting for the technology to mature further or feels that the time has not yet come, and so it is a step behind. Domestic manufacturers, by moving quickly from the basic large model to the industry large model, have lasting vitality.
What is lost in one place is gained in another. In terms of results, it is no bad thing that domestic industry large models are moving fast.