The Value Alignment Problem’s Problem

10 January 2017, 1409 EST

I recently attended a workshop and conference on beneficial artificial intelligence (AI), where one of the overriding concerns was how to design beneficial AI. To do this, the AI needs to be aligned with human values, a challenge known, following Stuart Russell, as the “Value Alignment Problem” (hereafter, The Problem). It is a “problem” because, given the way a value function has to be specified to a machine, however one builds an AI, it may maximize one value to the detriment of other socially useful, or even noninstrumental, values.

As Russell explains:



The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

  1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
  2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.  This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world’s information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
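Russell's point about unconstrained variables can be seen in a toy example. The sketch below is a minimal illustration under assumptions of my own, not Russell's formulation. Three hypothetical integer variables share a fixed budget, and the specified objective depends only on x0, so k = 1 and n = 3. The variable x1 stands in for something we care about but left out of the objective. The names `BUDGET`, `task_only` and `task_and_others` are invented for illustration.

```python
from itertools import product

BUDGET = 10                 # hypothetical shared resource (e.g. hours, fuel, money)
LEVELS = range(BUDGET + 1)  # each variable takes an integer value 0..BUDGET


def feasible(point):
    # All variables draw on the same budget, which couples them together.
    return sum(point) <= BUDGET


def maximize(objective, n):
    """Brute-force search over every feasible integer point in n dimensions."""
    candidates = (p for p in product(LEVELS, repeat=n) if feasible(p))
    return max(candidates, key=objective)


def task_only(point):
    # Misspecified objective: only task progress (x0) counts.
    # x1 (something we care about) and x2 (irrelevant) are ignored.
    x0, _, _ = point
    return x0


def task_and_others(point):
    # Hypothetical repair: also value x1, with no extra credit past 2 units.
    # The weight 3 and the cap of 2 are arbitrary choices for illustration.
    x0, x1, _ = point
    return x0 + 3 * min(x1, 2)


print(maximize(task_only, 3))        # (10, 0, 0): x1 is driven to its extreme
print(maximize(task_and_others, 3))  # (8, 2, 0): some budget is left for x1
```

Because the budget couples the variables, maximizing `task_only` spends everything on x0 and drives x1 to its extreme value of 0. Once x1 is written into the objective, the optimum leaves part of the budget for it. The hard part of The Problem is that, outside a toy, we rarely know which variables we forgot or how to weigh them.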


In essence, The Problem is identical to one we see in moral philosophy in relation to utilitarianism. If one attempts to maximize one’s utility, specified as x, one may end up with morally repugnant conclusions, such as violating the rights of others. For instance, if you told your AI assistant to go to the pharmacy to pick up your medication, you would not want it to violate the rights of others in doing so. You would want it to obey traffic laws, pay for the medicine rather than steal it, stand in line and wait its turn, and not simply complete its task in the “most efficient” way possible. Efficiency, or in some parlance “optimization,” is defined relative to task completion and to solving problems in computational (polynomial) time, and it may well conflict with other people’s interests or rights.
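One way to make the pharmacy example concrete is to contrast two approaches. The first optimizes efficiency alone. The second treats others' rights as side constraints that filter the options before any optimizing happens. The sketch below is a toy. It assumes a hypothetical `Plan` record whose `violates_rights` flag is simply given. In practice, computing that flag requires exactly the understanding of rights and permissions the AI would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    minutes: int           # time to finish the errand; lower is "more efficient"
    violates_rights: bool  # hypothetical flag: breaks a law or overrides another's claim


# Hypothetical candidate plans for the pharmacy errand.
PLANS = [
    Plan("run red lights, cut the line, take the medicine without paying", 6, True),
    Plan("drive legally, cut the line, pay", 14, True),
    Plan("drive legally, wait in line, pay", 22, False),
]


def most_efficient(plans):
    # Pure optimization: minimize time and nothing else.
    return min(plans, key=lambda p: p.minutes)


def most_efficient_permissible(plans):
    # Rights as side constraints: discard impermissible plans first, then optimize.
    permissible = [p for p in plans if not p.violates_rights]
    if not permissible:
        return None  # decline the task rather than violate someone's rights
    return min(permissible, key=lambda p: p.minutes)


print(most_efficient(PLANS).name)              # the fastest plan steals the medicine
print(most_efficient_permissible(PLANS).name)  # "drive legally, wait in line, pay"
```

Filtering before optimizing means no amount of saved time can buy a rights violation. Returning `None` is one crude way of letting the assistant say "no."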

However, when discussions turn to how to actually create a value-aligned AI, there is a problem with The Problem. No one appears to be clear on the terms. Without understanding the nuances of “values,” and of what is “beneficial” to or for humans, one cannot actually design systems that are useful for humans. I suggest, therefore, that we put some basic terms on the table to help the AI community think about The Problem.

Objective vs. Subjective Values

In conversations with various people in the AI community, when I press them on what they really mean by “value-aligned AI,” the response is typically “the user’s values” (as opposed to Russell’s reference to the human race above). In essence, what is meant by “value” is closer to what political scientists or economists would call a “preference.” It is not something objectively valuable but something valuable from a particular person’s subjective point of view.

However, there is a fundamental difference between objective and subjective values. An objective value would hold regardless of the preferences of any particular agent. In moral philosophy, if we were moral realists, we could identify a variety of such values, or principles. We could identify rights, duties, permissions, excuses, and justifications in relation to particular objective values (such as life, freedom, privacy, etc.). Of course, if we were not moral realists, but were instead relativists or collectivists, we could say that only AIs designed for particular people or communities could ever be so aligned. Even then, however, I’d suggest there is some sort of regress from user to community ad infinitum. Why is this important, and why should the social sciences and the humanities be involved in this debate?

For the very simple reason that we know a lot about objective and subjective values, and the AI community may in fact need our help. Otherwise, it will end up building AIs that are very good at identifying what a particular user wants, or thinks they want, but not at linking the AI’s actions to objectively valuable reasons and actions. (For a different approach, in which the AI learns about the user rather than taking on values of its own, see Russell’s work on cooperative inverse reinforcement learning.)

An example might help those of us not steeped in the particulars. Let’s say that you download a new AI application that is a decision aid, personal assistant, and more – call it the “BFF app.” It is designed to be your best friend, schedule things for you, find helpful tips for you, and so on. However, if it is built to maximize your values, it may take on your values as its own, to the detriment of others. Let’s say it thinks you want more free time because you are stressed out, so it clears all your meetings, even though you need to attend them. Or perhaps you ask it for directions to a new place to meet a friend for lunch, but it reasons that you don’t need any more friends and takes you to a remote location instead. These would be small but irritating outcomes. It would be far worse if you actually had misanthropic preferences and it learned how to maximize them. Indeed, one might claim that the AI needs to know when to say “no” to bad preferences, and also needs some grasp of how to balance competing values.

Thus, if one really wants a value-aligned AI, the AI needs to understand the moral universe: how rights, obligations and permissions work, and when the user is asking it to do something contrary to the rights of others. But to grant this is to claim that The Problem is not merely about getting an AI to do what the user wants, but about what is good for “people” writ large. That is, it is about solving for the values of humanity.

What is more, we ought to question the notion that as long as preference maximization is possible, it is beneficial. Intelligence is not merely preference maximization and optimization. Whose preferences are included, maximized and made visible, and whose are excluded, minimized and hidden, is a matter of power relations. Deciding it is a moral and political choice. Some readers may think this is a hypothetical, futurist vision that need not be addressed now by fields like political science, sociology, economics, philosophy and law, rather than only by computer science and robotics. If so, we need look no further than the present debate on algorithmic bias in criminal sentencing, the marketplace, housing applications, job interviews, and much more. The issue will be even more pressing for future systems that are based not on algorithms but on “policies” and learning.