
Which Skills Should Humans Learn in an Age of ‘AI’?

In my previous article, I looked at the new challenge that faces all who teach online: how do we stop students from using AI to cheat on assessments?

Well, the short answer is: we can’t. Not entirely. AI is now good enough at answering questions to pass some quite tough exams – for example, the exams required to become a licensed doctor. On many questions of fact, the AI could be generating the entire answer and the student would not be tested at all.

In such cases, we would really be testing students on how good they were at using AI.  This is not a facetious idea. As AI is such a wonderful research assistant, perhaps we should be training students to use it – wisely.

Learning & Writing with AI

We know that AIs don’t always give correct answers because the data used to train them is not always correct. So students using this technology need to check the answers. Also, I’m beginning to hear that Google is finding and eliminating AI-generated content from search results. If Google can do that, then plagiarism-checking tools will soon do that too (damn that AI).

So students will need to check their AI’s output, perhaps paraphrasing content and changing its style to suit. Ironically there’s an AI tool for that too! They may also need to add some personal touches. Google prioritizes E-E-A-T: experience, expertise, authoritativeness, and trustworthiness. Students probably need to do the same.

That said, AI really is a wonderful research assistant. Suppose you feed it your exam question – “Write me an essay about Napoleon” – and you add “citing sources used”. If your chosen AI does so, you might get a reasonable essay, with citations so that you can fact-check and correct it. Doing so will give you a better essay, which you can then make your own. Result: a good essay!

(Please note that GPT-4 will not write you a whole essay; it will only provide the structure and sources.)
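To make that concrete, here is a minimal sketch of how such a prompt might be sent programmatically, using the OpenAI Python library. (The model name, the prompt wording, and the ask() helper are my own illustrative assumptions, not something prescribed in this article.)

    # A minimal sketch using the OpenAI Python client (pip install openai).
    # The prompt wording and model choice here are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    def ask(prompt: str) -> str:
        """Send a single prompt to the model and return its reply."""
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    essay = ask("Write me an essay about Napoleon, citing sources used.")
    # You may get a full draft or just a structure plus sources; either
    # way, fact-check the citations before making the result your own.
    print(essay)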

Enter Napoleon

Now, you still have to do some work. But without the AI, it would have taken you many hours to discover all those things about Napoleon. (Remember: we don’t know what we don’t know.) You could submit a good essay much more quickly than without your AI research assistant. Or …

… you could use the time saved to take it to the next level. Suppose you discover that there are two different schools of thought about Napoleon (quite likely for any major subject). You could now instruct the AI to write the same essay twice – once from each point of view. You can then compare and contrast the results and make your own assessment.
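Reusing the hypothetical ask() helper from the earlier sketch, that compare-and-contrast exercise might look like this (the two viewpoint labels are placeholders of my own):

    # Illustrative only: the viewpoint labels are placeholders, not a
    # claim about the actual schools of thought on Napoleon.
    viewpoints = ["an admirer of Napoleon", "a critic of Napoleon"]
    essays = {
        view: ask(f"Write me an essay about Napoleon from the point of "
                  f"view of {view}, citing sources used.")
        for view in viewpoints
    }
    # The learning happens in what you do next: compare and contrast
    # the two essays yourself and make your own assessment.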

You now have a great essay! Perhaps more importantly, you’ve taken your learning – about Napoleon and about historical analysis – to another level. You used the AI to do the drudgery so that you could focus on the clever stuff. Now you have – rapidly – learned some high-level, transferable skills that you can apply to any historical analysis.

Okay, I’m a safety engineer, so I’m not likely to be answering exam questions about Napoleon. I might conceivably be asked to discuss the approaches of, say, Jens Rasmussen versus Erik Hollnagel. Personally, I’d rather not, but understanding different theories on risk and accident causation is relevant to my profession.

Whatever you are doing, there’s probably an AI for it; in fact, there’s a site listing over 3,000 AI tools that do all sorts of things. However, this isn’t an article on how to do things with AI, so…

Back to the Challenge

The challenge facing online educators is to assess students in a way that tests the student, not the AI. Online education is a multi-billion-dollar business, and AI could undermine the credibility of most qualifications, so this is a critical issue.

I think it’s fair to say that we won’t all go back to physically sitting exams in a room with strict security (although I did just that to get my CISSP certification). The costs are too great, and we need remote assessment techniques.

This means that universities and other education or training providers will look for assessment strategies that AIs struggle with. So, if we want top marks, we will need to be good at the things that AIs don’t do well.

Are there any things that AI can’t do (yet)? If so, what are they?

We Reflect on ‘AI’

We have to remind ourselves that ‘AI’ is not really intelligent. A lot of what is sold as ‘AI’ is just using statistics to analyze lots of data. I’ve worked with a statistician, and I was amazed at what she could deduce from a data set. Even human behavior is amenable to statistical analysis. We all like to think that we’re original and unique, but we’re mostly not. Sorry.

The next level up from statistics is Machine Learning (ML). This phrase represents what’s going on much better than ‘AI’ does.

Machine Learning

ML is much more powerful than statistics because it uses a variety of algorithms. These can be much more complex than generic statistical equations. Specific algorithms are developed to solve specific classes of problems.

Nevertheless, all ML works by training algorithms on a data set. Humans review the results and tweak the algorithms or the data set, or both, to produce better results. Or perhaps we give the machine a goal and it tweaks itself to get there better and/or faster.
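As a toy illustration of that train-and-tweak loop (my own sketch, not a description of any particular product), here is a linear model being fitted to made-up data by gradient descent. The ‘goal’ is to reduce the prediction error, and the machine ‘tweaks itself’ a little on every pass:

    # Toy ML training loop: fit y = w*x + b to synthetic data by
    # repeatedly nudging the parameters to reduce the mean squared error.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=100)  # noisy data

    w, b = 0.0, 0.0   # start from a deliberately bad guess
    lr = 0.01         # learning rate: how big each tweak is

    for step in range(1000):
        error = (w * x + b) - y
        # Gradients of the mean squared error with respect to w and b
        w -= lr * 2.0 * np.mean(error * x)
        b -= lr * 2.0 * np.mean(error)

    print(f"learned w={w:.2f}, b={b:.2f}")  # should end up near 3 and 2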

ML is so effective because decades of research by the best human minds have gone into developing it. An awful lot of human ingenuity is encoded in those algorithms.

ML itself, though, works by brute force. Computers are very fast, and they can process vast amounts of data. That data is now easily accessible on the internet, which contains a significant proportion of the vast treasure store of human knowledge. ML isn’t intelligent; it just appears to be, because it has been trained by vast repetition. It impersonates human intelligence by copying – by rote learning.

It’s been said that to really be intelligent, AI must be able to create something truly original. That article refers to an AI playing the ancient Asian board game ‘Go’. The AI beat a world champion using a revolutionary strategy that no human is ever taught. However, even with this example, I note that Go is a 2D board game where all the counters are identical in character. Surely this is a problem that is inherently amenable to being solved by a computer?

But so what?

Well, if we humans want to stay relevant, then we need to do things that machines can’t. If we understand what they can and can’t do, and get better at the latter, then we add value.

We Reflect to be Different from Machines

In my previous article, I mentioned that GPT-4 struggles to reflect on learning. If we go online and look up the word ‘reflect’, we get:

  • embody or represent (something) in a faithful or appropriate way.
  • think deeply or carefully about.
  • (of an action or situation) bring credit or discredit to the relevant parties.

Google Search

We have three meanings here, as follows:

  • To represent – to portray, describe, or paraphrase, but not copy – something faithfully or appropriately. We are not simply repeating details, but capturing the essence of something.
  • To think deeply and carefully – not quickly or superficially.
  • To make a value judgment about something, its validity, morality, or desirability.

At this point, my fellow engineers, as well as scientists and mathematicians, might be wondering what this has got to do with them. After all, 2+2=4, and what is there to reflect on? This ‘reflection’ sounds like something that arts and humanities folk do. OK, perhaps psychologists and business studies too. But us?

I think we do. In terms that might appeal to engineers, scientists, and mathematicians, let’s call it the difference between ‘verification’ and ‘validation’.

Verification versus Validation

Verification asks: “Did we build the thing right?” We can answer that question by testing it, inspecting it, or analyzing it: does it do what it’s supposed to? If we can’t fully verify the product, perhaps we need some process evidence as well. Did we develop it using a sound process? Does it comply with or conform to applicable standards?

Verification may be complex, but it’s mechanistic. In verification, “right” means correct – and only that.

Validation asks: “Did we build the right thing?” In this case, “right” means a whole lot more than just correct.

It means complete: did we do the whole job? Did we meet the overall need, and not just the written specification? It means comprehensible: does it make sense in context? Is it usable by those who need to use it? Is it appreciated by those who paid for it, or wanted by those who might pay for it?

It may also mean other things. Does it help? Is it ethical? Sustainable? Valuable to a person, group, or society as a whole?

A thing can be successfully verified yet fail validation, in one or more ways. Becoming skilled at reflecting on the wider implications of what we do can help us all, no matter what our field of endeavor.
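Here is a toy example of that gap, of my own devising. Python’s built-in round() rounds halves to the nearest even number. If the written spec just said “round totals to the nearest dollar”, the code below verifies – its tests pass – yet it may still fail validation if the customers expected 2.50 to round up to 3:

    # Verification: the code meets its written spec ("round totals to
    # the nearest dollar"), and these tests pass. Python's round()
    # rounds halves to the nearest even number.
    def round_total(amount: float) -> int:
        """Round an invoice total to the nearest dollar, per the spec."""
        return round(amount)

    assert round_total(2.4) == 2
    assert round_total(2.6) == 3
    assert round_total(2.5) == 2  # half-to-even: verified against the spec...

    # ...but if the people who paid for the system expected 2.50 to
    # become 3, we built the thing right without building the right
    # thing: it passes verification yet fails validation.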

We Curate, not Just Collect as Machines Do

One of my hobbies is writing fiction – badly. Again and again, I read that to get better, I must read better. I must read a lot, but quantity alone isn’t enough; I must read the best quality I can get – the best, most successful authors. Writers should not read only within their chosen genre, either; they must get out of their comfort zone and read all sorts.

Similarly, I’ve heard it said that ‘the best bands have the best record collections’. The best is not the biggest, but the broadest collection of good-quality music. The aim is not just to collect, but to curate.

This makes sense as we seek to differentiate ourselves from competing machines. Earlier versions of ChatGPT (and other ‘AI’s) were trained on millions or even billions of web pages. We can’t compete with machines on quantity. Referring back to my previous article, I note that GPT-4 is “safer and more aligned” (good validation words) because it was trained on a human-curated data set.

Mere repetition is not going to help us. We need to reflect on a broad range of the best-quality material we can find, looking deeper and slower, asking those ‘validation’ questions. Skills like comprehension, summarising, and producing a précis of others’ work are valuable (b*gger me, my English Literature teacher was right all along). Drawing what I see, not what I think I see (thanks are also due to my Art teacher). Learning from disciplines other than the ones we practice.

Being a well-rounded person, I guess.

What do you think?


How Should We Learn in an Age of ‘AI’?

‘How Should We Learn in an Age of ‘AI’?’ is the first in a series of articles addressing this topical subject.

Introduction

I’ve created and taught courses on technical subjects for about 20 years now.  I started when I inherited a half-finished course on software supportability in 2001. The Royal Air Force relied on software in all its combat aircraft but knew precious little about software, and less about how to support it.  We needed that course.

After I left the Air Force, I joined a firm called QinetiQ. I discovered that we had a contract to teach safety to all UK Ministry of Defence staff who required it; the classroom was just down the road from our office. I joined the instructing team.

With that experience, I created and taught bespoke safety courses for the Typhoon, Harrier and Raytheon Sentinel platforms.  I also helped create a safety course for the UK Military Aviation Authority.  Since moving to Australia, I have created and sold courses commercially, teaching home workers online for the first time.

It’s still difficult to access system safety training in Australia, and that’s why I started the Safety Artisan. In my business, I teach only online.

The Problem

Recently I’ve been in discussions with colleagues in industry and academia about improving system safety education in Australia.  Because of the COVID-19 pandemic, learning has gone through a revolution.  We are now learning online much more than we ever did; in fact, it’s the ‘New Normal’.

Now another revolution has occurred: generative Artificial Intelligence (AI).

“Generative AI is a set of algorithms, capable of generating seemingly new, realistic content—such as text, images, or audio—from the training data. The most powerful generative AI algorithms are built on top of foundation models that are trained on a vast quantity of unlabeled data in a self-supervised way to identify underlying patterns for a wide range of tasks.”

© 2023 Boston Consulting Group, https://www.bcg.com/x/artificial-intelligence/generative-ai

This presents a challenge to anyone designing an online course that leads to a certification or award. How do we assess students online, when we know that they can use an AI to help them answer the questions?

In some circumstances, the AI could be generating the entire answer and the student would not be tested at all.  What we would really be testing them on is how good they were at using the AI.  (I’m not being facetious. As AI is such a wonderful research assistant, perhaps we should be training students to use it – wisely.)

Enter GPT-4

OpenAI, the creators of GPT-4, make some big claims for their product.

“GPT-4 is more creative and collaborative than ever before. It can generate, edit, and iterate with users on creative and technical writing tasks, such as composing songs, writing screenplays, or learning a user’s writing style.”

OpenAI, https://openai.com/product/gpt-4

“GPT-4 can accept images as inputs and generate captions, classifications, and analyses.”

ibid

“GPT-4 is capable of handling over 25,000 words of text, allowing for use cases like long form content creation, extended conversations, and document search and analysis.”

ibid

But perhaps most significant of all is GPT-4’s claimed ‘safety’:

“We spent 6 months making GPT-4 safer and more aligned. GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.”

ibid

In other words, GPT-4:

  • Is less likely to regurgitate nasty sludge from the bottom of the web; and
  • Is more likely* to not make stuff up.

*Notice that they said “more likely” – this is not certain or assured.  (More on this in a later article.)

This is because the creators were more selective about the data they used to train the model.  Presumably, this implies that previous efforts just used any old rubbish scraped off the web, but nobody is admitting to that!

The Beginning of an Answer…

One of the academics I’ve met (sorry, but I can’t give them credit, yet) has studied this problem.  They’ve come up with some interesting answers.

In their experiments with GPT-4, they found that it was very good at the things you would expect it to be good at. It was great at answering questions by gathering and collating facts and presenting written answers.

But it wasn’t good at everything. It was not good at reflecting on learning, for example: it could not reflect on the learning that the student had experienced. Similarly, it could not extrapolate what the student had been taught and apply it to new scenarios or contexts.

Therefore, the way to assess whether students really know their stuff is to get them to do these things. Most assessment marks can still come from straightforward questions, which an AI could help to answer. But a few marks – maybe only 20% – should require the student to reflect on what they have learnt and to extrapolate it to a new situation, which they must come up with themselves. This part of the assessment would separate the also-rans from the stars.

…And a Lot More Questions

Now, there are obvious, mechanistic reasons why the AI could not perform these tasks. It had not been exposed to the student’s learning and therefore could not process it. Even more difficult would be to take a student’s life and work experience – also unknown to the AI – and use that to extrapolate from the taught content.

(Okay, so there are possible countermeasures to these mechanistic problems.  The next stage is that the AI is exposed to all the online learning alongside the student.  The student also uploads their resume and as much detail as they can about their work to teach the AI.  But this would be a lot of work for the student, just to get those last 20% of the marks. That would probably negate the advantage of using an AI.)

However, the fact is that GPT-4 and its brethren struggle to do certain things. Humans are great at recognising patterns and making associations, even when they are not logical (e.g. ‘whales’ and ‘Wales’). We also have imagination and emotion. And we can process problems at multiple levels of cognition, coming up with multiple responses that we can then choose from. We also have personal experience and individuality. We are truly creative – original. Most AI still struggles to do these things, or even to pretend to.

So, if we want to truly test the human learner, we have to assess things that an AI can’t do well.  This will drive the assessment strategies of all educators who want to teach online and award qualifications.  

And, guess what? This is where the $$$ are, so it will happen. Before COVID-19, education was a massive export earner: “Australia’s education exports totalled $40bn in 2019,” according to Strategy, Policy, and Research in Education (SPRE).

This raises the question:

What Else Can Humans do that AI Can’t (Yet)?

Why? Because if these are the skills on which we will be assessed, then we need to focus on being good at them. They will get us the best marks, so we can compete for the best jobs and wages.  These skills might also protect us from being made redundant (from those well-paid jobs) by some pesky AI!

This is what I’m going to explore in subsequent articles.