The original "web 1.0" was a place to serve static pages built by companies. Along came forums and social media, and we suddenly had a "web 2.0" in which users created and added content. Tim Berners-Lee, inventor of the World Wide Web, envisioned what became one definition of web 3.0: a "Semantic Web" based on data that not only humans but also machines could process. If web 1.0 created an encyclopedia, then web 2.0 was Wikipedia, and web 3.0 would turn everything on the web into a massive database. How could web 3.0 be securely accomplished and managed? In a word, artificial intelligence (AI).
Why a machine-readable web matters
AI consumes and processes data, and the promise of web 3.0 is to make all of the web into data compatible with AI technologies. That would provide a massive AI training set, much of which is currently inaccessible because it exists as "unstructured data." The result could be a step change in AI capability. Imagine a Google, Siri or Alexa search that could draw on all the data on the internet. Today, if you ask Alexa a question, it might respond with "According to Wikipedia..." and read a web 2.0 article. In the future, it could understand the meaning of everything online and provide a detailed answer.
Broadening web 3.0
As the internet has evolved, one trend in its development has been to "decentralize" the web. Web 1.0 served up content controlled by companies, and web 2.0 is made up of company-controlled platforms hosting user-created content. Why shouldn't web 3.0 provide a new platform where content can be added without a company controlling it? At the same time, blockchain emerged as a way for anyone to post a transaction that is validated and accepted by the consensus of a community rather than by a platform owner. Those uncomfortable with web 2.0 platform owners' control over content began to envision user content hosted on distributed, decentralized platforms.
Is that a redefinition of web 3.0? Not entirely. Tim Berners-Lee described a web with inherent meaning, a vision focused on how data is consumed. The newer definition of a decentralized web focuses on how data gets added. There is no conceptual reason why both can't be right at the same time. Web 3.0 can be a platform where anyone can add content without the control of centralized gatekeepers, and where that content carries meaning that both people and machines can interpret.
Cyber risks of web 3.0
While the vision for web 3.0 offers many opportunities for growth and development, it also raises security concerns. A poorly defined web 3.0 could pose cybersecurity risks for several reasons:
- Information quality: Web 1.0 relied on publishers' reputations for accuracy. Web 2.0 lowered the bar for data quality, making mis- and disinformation on the web more effective. Will the consensus to accept machine-managed data in web 3.0 include accuracy checks? Who gets to make that decision, what are their qualifications, and what motivates them to be fact-based instead of promoting an agenda?
- Data manipulation: Intentional manipulation of data used to train AI is a huge cybersecurity concern. People can create bad data to manufacture the results they want, potentially turning AI into the world's biggest disinformation system. When Microsoft decided to train its chatbot "Tay" by letting it learn from Twitter, people intentionally sent malicious tweets that trained the machine to be racist. Imagine what a nation-state could do to disrupt things by feeding AI misleading data or changing the meaning of words. How will cybersecurity professionals find, block and remove data that is designed to deceive?
- Web 3.0 availability: If our systems depend on data, what happens when that data is unavailable? The web today is full of broken links. Machines will either need to make local copies of everything on the internet or retrieve information on demand, as in web 2.0. This could increase dependency on the availability of systems that IT teams do not control.
- Data confidentiality: Data breaches compromise confidential information constantly. On top of that threat, content can be accidentally released or placed in an insecure location. When machines scan that data and add it to their knowledge base, they increase the likelihood that private data will not just be found, but used. Cybersecurity leaders need to bolster their defenses to prepare for a system with the potential to spread confidential information faster than ever before.
More cybersecurity concerns will likely arise as web 3.0 takes shape. Even so, it makes sense to consider solutions for privacy and security from the start. A future web without gatekeepers, holding content meaningful to both people and AI, sounds like a dream come true. Security should be built in from the ground up to keep that dream from becoming a nightmare.