GPT-5 is not far away! OpenAI launches the web crawler GPTBot, which automatically scrapes data and can be selectively blocked
Edit: Peach is so sleepy
Source: Xinzhiyuan
Introduction: OpenAI has just launched GPTBot, a web crawler that can automatically scrape data from across the Internet. The collected data will be used to train AI models such as GPT-4 and GPT-5!
Some time ago, a controversy erupted over the scraping of platform user data, and Reddit users argued heatedly about it.
Today, OpenAI launched GPTBot, a web crawler that automatically scrapes website data.
**How to use?**
OpenAI says in its published documentation that the crawler filters out sources that require paid access, and also removes personally identifiable information (PII) and text that violates its policies.
The data collected by GPTBot is intended for training models such as GPT-4 or GPT-5, which OpenAI says can improve the accuracy and capabilities of future AI systems.
The crawler can be identified by the following user agent:
User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +
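A minimal sketch of how a site might recognize the crawler in its own request handling, assuming a simple case-insensitive search for the `GPTBot` token in the User-Agent header. The function name `is_gptbot` is hypothetical, and the sample string is a shortened form of the one quoted above, because the quoted string is truncated. User-Agent headers can be spoofed, so a match is only a hint and not proof of origin.

```python
def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token."""
    return "gptbot" in user_agent.lower()


# Shortened sample: the full string quoted in the article is truncated,
# so the trailing part is deliberately left out here.
sample = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0)"

print(is_gptbot(sample))                                       # True
print(is_gptbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```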
Blocking GPTBot
Conversely, you can block GPTBot from accessing a website by adding it to the site's robots.txt.
This means that website owners must actively opt out if they want to keep OpenAI from accessing their sites and using their data for training.
User-agent: GPTBot
Disallow: /
Custom GPTBot Access
You can also allow GPTBot to access only parts of a website with the following rules.
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
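Before publishing rules like these, it can help to check how a standard parser reads them. The sketch below uses Python's built-in `urllib.robotparser` on both rule sets from this article. It shows how that parser interprets the rules, which is not necessarily identical to how GPTBot itself behaves. The helper name `allowed` and the `example.com` URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rule set that blocks GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rule set that opens one directory and closes another.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

base = "https://example.com"  # hypothetical site

print(allowed(BLOCK_ALL, "GPTBot", base + "/any/page.html"))        # False
print(allowed(BLOCK_ALL, "Googlebot", base + "/any/page.html"))     # True
print(allowed(PARTIAL, "GPTBot", base + "/directory-1/page.html"))  # True
print(allowed(PARTIAL, "GPTBot", base + "/directory-2/page.html"))  # False
```

Note that the rules only name GPTBot, so other crawlers are unaffected, and paths not covered by any rule remain accessible by default.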
IP egress ranges
Requests from OpenAI's crawler to websites are made from a block of IP addresses documented on the OpenAI website.
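Because a user-agent string can be forged, a site may also want to check whether a request's source address falls inside the published block. Here is a minimal sketch using Python's standard `ipaddress` module. The range shown is a placeholder from the reserved documentation block, not OpenAI's actual range, and the names `HYPOTHETICAL_GPTBOT_RANGES` and `ip_in_ranges` are assumptions. The real values would have to be taken from OpenAI's website.

```python
import ipaddress
from typing import Iterable

# Placeholder only: 192.0.2.0/24 is reserved for documentation.
# It is NOT OpenAI's real range; substitute the published values.
HYPOTHETICAL_GPTBOT_RANGES = ["192.0.2.0/24"]


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if ip belongs to any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


print(ip_in_ranges("192.0.2.10", HYPOTHETICAL_GPTBOT_RANGES))    # True
print(ip_in_ranges("198.51.100.7", HYPOTHETICAL_GPTBOT_RANGES))  # False
```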
Heated discussion online
OpenAI's move has sparked discussion among netizens about the ethics of using web crawlers to train AI models.
“OpenAI doesn't even cite its sources in moderation. It is making derivative works without citing them, thus obscuring the fact that they are derivative.”