Saturday, December 17, 2022

My Dinner with ChatGPT

I have nothing original to say about OpenAI's ChatGPT.  I'll say it anyway.

Calculating and Reasoning

ChatGPT will readily tell you that it is not a calculator and you shouldn't trust it for calculations.  What's interesting about my experience is that the problems were not with the accuracy of the arithmetic, but with the choice of what to calculate.  For a language model, it gets the actual numbers right remarkably often (your mileage, or other units, may vary).

A Classic "Back of the Envelope" Question

I asked ChatGPT how many ping pong balls would fit in a 747.  Its first response was to chide me for asking such a thing and to advise me that it would be a bad idea to fill a 747 with ping pong balls.  I explained that it was a thought experiment, and it allowed me to proceed.

It began with an estimate of the volume of a 747 as about 15,000 cubic meters.  I didn't find an actual number in a quick search online, but based on the 747 dimensions I did find, that seems to be in the right ballpark.

Next it told me that a ping pong ball has a diameter of 2.7 centimeters.  This is way off.  A ping pong ball's diameter is actually 4 centimeters (but it seems it used to be 3.8 centimeters).  How did it come up with that number?  Here's a scary fact: a ping pong ball's mass is 2.7 grams.  Oh, boy.

The bot proceeded to correctly calculate the volume of a 2.7 cm sphere in cubic meters and divide 15,000 by that.  So far, so good, but it left out an important consideration.

I prompted it, pointing out that spherical ping pong balls can't be packed with perfect density.  It agreed, described two different ways of packing spheres (cubic and hexagonal), correctly reported the density for hexagonal packing as about 0.74, and proceeded to redo its calculation by multiplying 15,000 cubic meters by 0.74.  In doing so, it dropped the division by the volume of a ball, in effect giving every ping pong ball a volume of one cubic meter, and it reported the new final number without remarking on the huge difference between its original estimate and the revised one.

Interesting.
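
For the curious, here is a rough sketch of the arithmetic in Python.  The 15,000 cubic meters, the 2.7 cm and 4 cm diameters, and the 0.74 packing density are the figures quoted above; the function and variable names are my own, and the last line is only my reading of what the bot's revised calculation amounted to.

```python
import math

PLANE_VOLUME_M3 = 15_000   # ChatGPT's estimate for a 747, as quoted above
PACKING_DENSITY = 0.74     # close-packed spheres, as quoted above


def sphere_volume_m3(diameter_cm):
    """Volume of a sphere in cubic meters, given its diameter in centimeters."""
    radius_m = diameter_cm / 100 / 2
    return 4 / 3 * math.pi * radius_m ** 3


def balls_that_fit(diameter_cm, packing_density=1.0):
    """Usable volume divided by the volume of one ball."""
    usable_m3 = PLANE_VOLUME_M3 * packing_density
    return usable_m3 / sphere_volume_m3(diameter_cm)


naive = balls_that_fit(2.7)                          # the bot's first answer
packed = balls_that_fit(2.7, PACKING_DENSITY)        # what the revision should have been
packed_real = balls_that_fit(4.0, PACKING_DENSITY)   # same, with a 4 cm ball
bot_revised = PLANE_VOLUME_M3 * PACKING_DENSITY      # volume only, never divided by a ball

for label, value in [
    ("2.7 cm, no packing", naive),
    ("2.7 cm, packed", packed),
    ("4 cm, packed", packed_real),
    ("bot's revision", bot_revised),
]:
    print(f"{label:>20}: {value:,.0f}")
```

The first three numbers land somewhere between hundreds of millions and a bit over a billion.  The last one is about eleven thousand, a difference that ought to make anyone pause.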

Confidence and Servility

You can tell that ChatGPT was trained on the Internet, because it never expresses any uncertainty or doubt.  Yet it has also been trained to accept correction with extreme humility.  The experience is a little bizarre.

Dressage

I had a brief chat about the sport of dressage.  In the course of the chat, ChatGPT told me that the USDF (US Dressage Federation) was the National Governing Body for dressage in the US.  I honestly can't recall whether that was completely spontaneous or a logical response to something I said, but I definitely did not ask that specifically.

I said I thought that the USEF (US Equestrian Federation) was the NGB for dressage.  The bot immediately agreed, apologizing for its mistake.  This is disconcerting.  It just feels unnatural for a conversational partner to assert a fact with complete confidence and then turn around and admit it was entirely mistaken with total equanimity.  "Bygones."

SQL "QUALIFY" and "HAVING"

Bear with me here; I won't make you learn SQL.  I asked ChatGPT about "QUALIFY" clauses in SQL, because I only learned about them relatively recently and I'm still trying to get the hang of how to use them and when they are useful.  When I asked the bot for an example, it gave me one that could have been done using the more familiar HAVING clause.  So, I said "show me an example of using QUALIFY where HAVING would not work".  And it did.

The example it gave did use "GROUP BY", though, which is used with "HAVING", so I said, "Are there cases where I would use QUALIFY and not have any GROUP BY?"  It proceeded to show me a query written with just QUALIFY, and another version written with GROUP BY and HAVING.  Then it delivered the fateful line:

Both of these queries would produce the same result, but the first query uses the QUALIFY clause without a GROUP BY clause, whereas the second query uses the GROUP BY and HAVING clauses.

But that wasn't true.  I asked: 

Are these two queries equivalent? Can't the first one have multiple rows with the same salesperson and product?

And, without hesitation, ChatGPT said: 

The two queries are not equivalent because they produce different results.

It went on to explain exactly why the two queries produce different results.  And that's where I'm stuck: ChatGPT will happily explain to you exactly how its earlier, confidently presented answer was wrong, yet it isn't able to figure out that the answer is wrong before you prompt it.
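
If you do want to see the difference, here is a small sketch using Python's built-in sqlite3 module.  These are not the queries ChatGPT gave me; the sales table, its columns, the data, and the threshold of 100 are all made up for illustration.  SQLite has no QUALIFY clause, so the first query imitates one by filtering on a window function in a subquery, which is my assumption about what a QUALIFY version would boil down to.  It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("alice", "widget", 60),   # hypothetical rows
        ("alice", "widget", 70),
        ("bob", "gadget", 50),
    ],
)

# QUALIFY-style: keep every individual sale row whose
# salesperson/product total exceeds 100. No GROUP BY, so rows are not collapsed.
qualify_style = conn.execute(
    """
    SELECT salesperson, product, amount
    FROM (
        SELECT salesperson, product, amount,
               SUM(amount) OVER (PARTITION BY salesperson, product) AS total
        FROM sales
    )
    WHERE total > 100
    ORDER BY amount
    """
).fetchall()

# GROUP BY / HAVING: one row per salesperson/product whose total exceeds 100.
having_style = conn.execute(
    """
    SELECT salesperson, product, SUM(amount) AS total
    FROM sales
    GROUP BY salesperson, product
    HAVING SUM(amount) > 100
    """
).fetchall()

print(qualify_style)  # two rows for alice's widgets, one per sale
print(having_style)   # a single summary row for alice's widgets
```

The window-function version returns one row per sale, so the same salesperson and product can appear more than once, while the GROUP BY version collapses them into a single row.  That is the kind of difference my follow-up question was getting at.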

ChatGPT Is Not Intelligent

People much smarter than I am keep pointing out that ChatGPT is manipulating words, not concepts.  We linguistically-oriented humans think being good with language is a distinguishing feature of intelligence.  That's probably because we've no previous experience with something so good with words and bad with ideas.

Just above I said the bot isn't able to figure out on its own that the answer is wrong, and one reason, I think, is that ChatGPT lacks any sort of "executive function" that introspects and asks, "Is what I'm saying right, or does it just sound right?"  And that—not the threat to professional writers, student essays, or junior programmers—is the truly alarming problem the technology poses in its current form.