DXtera Institute and Riiid Labs, alongside over 50 other international organizations, have recently launched the EdSAFE AI Alliance. The planning for this has been in the works for a while, and we think it’s useful to reflect on some of what this means, and on the limitations and challenges that exist around this endeavor.
In EdSAFE AI the acronym S.A.F.E. stands for Safety, Accountability, Fairness, and Efficacy. Determining whether an AI algorithm designed to enhance human education is safe (will not harm or hinder a learner) and fair (ethical, equitable and unbiased) is a noble pursuit, and one for which we would hope standards and tests could be established. But it might not be that simple.
In their paper “Superintelligence Cannot be Contained: Lessons from Computability Theory”, Manuel Alfonseca and his coauthors argue that the problem of determining whether a sufficiently intelligent machine algorithm will or will not harm humans is “undecidable.” In other words, it is impossible to design another algorithm or technique that can definitively tell whether such an AI is harboring harmful code.
Their work builds on the same kinds of proofs that Alonzo Church and Alan Turing employed in 1936 and 1937 to show that there is no way of determining whether an arbitrary computer program, given arbitrary input data, will eventually stop processing or run forever (the “halting problem”). The halting problem is computationally undecidable, and Turing’s and Church’s work has led generations of researchers, like Alfonseca and his colleagues, to study decidability around many aspects of computational systems.
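For readers who want to see the shape of that argument, here is a minimal sketch in Python of the classic diagonalization at the heart of Turing’s proof. The `halts` function below is a hypothetical oracle, not code that exists or could exist; the point of the sketch is that merely assuming it exists leads to a contradiction.

```python
# A hypothetical oracle that supposedly decides the halting problem:
# given a program and an input, it returns True if the program would
# eventually stop, False if it would run forever. Turing's proof shows
# no such general-purpose function can exist.
def halts(program, data):
    ...  # assumed, purely for the sake of argument, to always answer correctly

# A "spoiler" program built on top of the oracle.
def paradox(program):
    if halts(program, program):
        while True:      # the oracle says it halts, so run forever instead
            pass
    else:
        return           # the oracle says it runs forever, so stop immediately

# Now ask: does paradox(paradox) halt?
# If halts(paradox, paradox) is True, paradox loops forever -- contradiction.
# If it is False, paradox returns immediately -- contradiction again.
# Either way the oracle is wrong, so no correct, fully general halts() exists.
```

Roughly speaking, Alfonseca et al. apply the same kind of construction to a hypothetical “harm-checking” procedure: any algorithm claiming to decide whether an arbitrary, sufficiently powerful program will harm humans runs into the same self-referential trap.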
Re-stating the Alfonseca et al. proof in the terminology of EdSAFE AI: the problem of determining whether a sufficiently intelligent AI algorithm designed to enhance human learning is “safe” and “fair” is equally undecidable.
This should come as no surprise at a fundamental level, particularly as AI efforts aim to close the gap between machine intelligence and human intelligence. We all know at heart, and probably from experience, that whether or not another intelligent human intends to do us harm is also relatively difficult to determine. In the domain of education we know that standards and tests created to identify whether a flesh-and-blood teacher is effective and unbiased regularly fail to identify bad actors, exposing learners to unexpected harm through the actions, or inaction, of other humans.
What, then, does this mean for AI in education? Does this mean we should abandon all hope of defining standards and certifications for educational AI? Of course not, just as we have not abandoned the evaluation of human educators: there is still much we can, in fact, do.
In their “Response to ‘Superintelligence cannot be contained: Lessons from Computability Theory’”, Jaime Sevilla and John Burden argue that while the proof presented by Alfonseca et al. is sound, it must be remembered that it applies to a special class of superintelligent programs: those that exhibit general intelligence and are able to arbitrarily access any and all other algorithms available in their world.
In reality, software engineers are typically dealing with specific machine intelligence built for a specific purpose, and there are indeed safeguards that can be applied to more bounded problem domains and systems, even within the realm of what we consider to be “AI.”
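To make that concrete, here is one minimal sketch of such a bounded safeguard: auditing a model’s recommendations over a finite evaluation set for a simple demographic-parity gap. The names, data, and the 0.1 tolerance below are illustrative assumptions of our own, not any EdSAFE standard; the point is that over a finite dataset this kind of check is entirely decidable.

```python
from collections import defaultdict

def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

# Example audit: flag the model if its recommendations (e.g. "place learner
# on the advanced track") differ too much across learner groups on this
# held-out evaluation set. All values here are made up for illustration.
predictions = [1, 0, 1, 1, 0, 1, 0, 0]
groups      = ["A", "A", "A", "A", "B", "B", "B", "B"]

gap = demographic_parity_gap(predictions, groups)
if gap > 0.1:  # illustrative tolerance, not a standard
    print(f"Fairness audit failed: demographic parity gap = {gap:.2f}")
else:
    print(f"Fairness audit passed: demographic parity gap = {gap:.2f}")
```

Nothing about a check like this contradicts the undecidability results above: it says nothing about every possible input or every possible future behavior, only about a finite dataset and a specific, measurable property. That is precisely the kind of bounded claim that standards and certifications can be built around.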
As an EdSAFE AI community, we must be aware of such research and conversations and what we can learn from them. We need to understand the theoretical limitations of what we set out to achieve, and understand the real boundaries of our domain of practice. We are entering into an exciting, challenging and critical endeavor, and there is much to learn and do.
If you’re interested in learning more about the work that DXtera and its community are doing, join us! We value and welcome input and expertise from around the world.