TiltSoft Blog - About Things

Generative Sewer

ChatGPT is everywhere. Not a day goes by without being bashed over the head with it. But...

AI and generative AI
First, don't confuse the two. AI has been around for decades; I even used it in the 90s to automatically classify seismic events. But generative AI is all the hype today.
Classic AI is automated, self-improving learning within known processes.
It's very helpful for handling millions of calculations in, say, pattern recognition: faces, postures, cars, obstacles, shapes, cancer cells, and so on. It works on increasingly accurate probabilities, and those are based on a coherent, targeted dataset.

On the other hand, GPT, a machine-learning system, is not supervised learning built on curated, accurate data sources.
Manual training is time-consuming but on target, with a focused dataset.
Replacing that manual selection with sheer quantity means replacing coherence with crowd noise. GPT crawls the web for everything and provides anything; without targeted, focused datasets, it's mostly garbage.

ChatGPT is based on this idea, but it populates its system with the entire internet's content, far too vast to be curated by humans. So the value lies in the data source. What is it, truly? Garbage.

Copyright Infringement
By crawling the entire internet, ChatGPT doesn't care about copyright; it swallows everything. If your website is on its path, even if you add notices like "this content cannot be used by machine learning or generative AI", it doesn't care, and your work will be used against your will.
Not to mention newspaper content that is not public. Not to mention authors whose work is purely and simply stolen to feed the machine-learning models.

The big sewer
When the internet started, it was an amazing communication tool and an even more wonderful knowledge-sharing agora. People asked professional questions, got answers, shared information. Over a few decades it unfortunately drifted into a big, stinky sewer, where most humans have no clue about anything but talk a lot about everything, feeding GPT with that garbage.

Promote garbage
GPT reproduces, and builds itself up on, that emptiness. Just look at the results you get when you really want something accurate. If you are writing a PhD thesis in a specific domain, it is helpless in unknown domains, uncharted territories. PhD candidates won't find answers because their research domains are still unexplored, hence not fed into GPT. It will not have a valid and reliable dataset, at least not until it steals your PhD work without your consent.

GPT leans towards general talk (bad), not specific professional discussion. It is fed from this sewer and just statistically reproduces and amplifies more sewer content, nothing else.

Self-maintained sewer?
What will be interesting in the near future is when GPT recursively becomes even more putrid by re-ingesting its own feces and enters an infinite (deadly?) spiral...
How will it be able to separate the wheat from the chaff?
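This recursive loop can be sketched in a few lines. This is my own toy construction, not anything resembling GPT's real training pipeline: a "model" that just learns token frequencies and greedily emits the most frequent token, then gets retrained on its own output.

```python
from collections import Counter

def train(corpus):
    # the toy "model" is nothing but empirical token frequencies
    return Counter(corpus)

def generate(model, n):
    # greedy decoding: always emit the single most probable token
    most_likely = model.most_common(1)[0][0]
    return [most_likely] * n

# a corpus with a slight majority of noise over fact
corpus = ["fact", "fact", "fact", "noise", "noise", "noise", "noise"]
for generation in range(3):
    model = train(corpus)
    corpus = generate(model, len(corpus))  # re-ingest its own output
    print(generation, Counter(corpus))
```

The slight majority of "noise" wins immediately, and every later generation sees nothing but "noise". Real model collapse is subtler than this caricature, but the direction is the same: feeding a model its own output narrows the distribution instead of enriching it.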

GPT can be refined, and it's a promising tool, but it is still based on poor-quality, stolen data and fake news.
GPT just guesses; it doesn't invent anything. It's probability, it's plausibility, nothing else.
Accurate probabilities are based on quality data.
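The "plausibility, not truth" point can be illustrated with a toy bigram generator, assuming nothing about GPT's actual architecture: it greedily picks the statistically most likely next word, so the output reads fluently while the model has no notion of whether it is true.

```python
from collections import defaultdict, Counter

def train_bigrams(sentences):
    # count which word follows which across the corpus
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            model[current][following] += 1
    return model

def generate(model, start, max_words):
    # greedy decoding: always pick the most frequent successor
    out = [start]
    for _ in range(max_words):
        successors = model[out[-1]]
        if not successors:
            break
        out.append(successors.most_common(1)[0][0])
    return " ".join(out)

corpus = [
    "the volcano erupted last night",
    "the volcano is dormant",
    "the volcano is closely monitored",
]
print(generate(train_bigrams(corpus), "the", 5))
```

A sentence like "the volcano is dormant" can come out even though the corpus also says it erupted last night: the model only tracks which words tend to follow which, not what is actually the case.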
The nuance that makes us human is non-existent in GPT. Just try translating some sentences in Google Translate or DeepL and you will see that the context is barely understood and the results can be really funny. Will it improve over time? Certainly. Will it be perfect? Probably not, as, again, its data source is unreliable.

Certified data and regulation
GPT could only work with certified data. But who would do that certification job? Where do we draw the line between certified and merely plausible? Maybe GPT should crawl only certified, verified information websites.
But who decides what is valid or not, without leaning towards censorship, influence, or crowd manipulation? This is the real challenge for generative artificial intelligence. Imagine what authoritarian regimes are doing with such a tool!

Let's use a dirty word: regulation.
Should we regulate GPT? Yes, just like we regulate traffic. Just like we regulate sewers.
In Roman times, sewers were not regulated and epidemics were common. Do we want to replicate such an environment? The epidemics here would be populations without wisdom.
We're on dangerous ground.

Illustration from Vecteezy

