Dirk Knemeyer

The Digital Life #260: AI Plays Poker

This week on The Digital Life, our special guest is Noam Brown, a PhD student in Computer Science at Carnegie Mellon University, who with his advisor, Professor Sandholm, created Libratus, an AI which decisively defeated four of the world’s best human poker professionals in a Man vs. Machine competition. The breakthrough was published in Science, received widespread mainstream news coverage, and continues to be cited as one of the milestone achievements of AI. Join us as we discuss poker, the application of AI to imperfect information games, and the possibilities for this kind of artificial intelligence to be used in negotiation and other real world scenarios.

 

Resources:
Noam Brown

Superhuman AI for heads-up no-limit poker: Libratus beats top professionals

How computers were finally able to best poker pros
https://www.washingtonpost.com/national/health-science/how-computers-were-finally-able-to-best-poker-pros/2017/02/03/3d1fd8c8-e7fa-11e6-b82f-687d6e6a3e7c_story.html

Inside Libratus, the Poker AI That Out-Bluffed the Best Humans

Jon: Welcome to episode 260 of the Digital Life, a show about our insights into the future of design and technology. I’m your host, Jon Follett, and with me is founder and cohost Dirk Knemeyer.

Dirk: Greetings, listeners.

Jon: This week, our special guest is Noam Brown, a PhD student in Computer Science at Carnegie Mellon University, who with his advisor, Professor Sandholm, created Libratus, an AI which decisively defeated four of the world’s best human poker professionals in a man versus machine competition. The breakthrough was published in Science, has received widespread mainstream news coverage, and continues to be cited as one of the milestone achievements of AI. Noam, welcome to the show.

Noam: Thank you for having me.

Jon: So, to give the listener some background on this milestone poker-beating AI, Libratus, that you’ve created: we have Deep Blue beating humans at chess, and AlphaGo, of course, in the past couple years, similarly having success against humans playing Go. But the achievement of Libratus is significant because defeating human players at poker is a much more complex problem, because it’s an imperfect information game. Noam, could you explain the difference between perfect information games and imperfect information games, and why the latter is such a challenge for AI?

Noam: Yeah. Like you said, AIs have traditionally been very successful in dealing with perfect information games like chess or Go, where both players know exactly what’s going on at all times. In a game of Go, or a game of chess, all the information that you need to make a decision is available to you. But in a game like poker, there’s hidden information. You don’t know what cards your opponent is holding, and so you always have to act with uncertainty about their strategy, or about what situation you’re in.

Noam: And this is particularly challenging for AI. It makes things way more difficult. It makes it way harder to compute a strategy. And so, for a long time, researchers in AI just sort of ignored the issue, and they focused on these perfect information games like chess and go, and just sort of pretended that problems like poker didn’t exist, really.

Noam: Which is really unsatisfying, and so there were some of us, myself included, who thought this was a problem we should address, because the truth is, most real world situations involve hidden information. You can make an AI that plays chess, but it’s not going to be that useful in the real world if there’s hidden information involved.

Noam: And we really took a very different approach compared to the AIs used for perfect information games. We did something very different, and it was ultimately successful. And I think that, particularly after AlphaGo and the success that AI has had recently with perfect information games, the fact that those techniques could not be used in imperfect information games, hidden information games like poker, really highlighted the importance of our research, and of our accomplishment in beating humans at poker.

Jon: Yeah, so, you spoke a little bit there about how these imperfect information games are really more representative of the reality we face in situations like negotiation or war, or anything where you don’t know what the other party is going to do next. I know Libratus was not just conceived as a poker-playing AI. I mean, that was the beginning of it. What are the other applications of Libratus over time, and are you working on any of those? How do you see your research and the achievements here, how do you see building on those, and what’s next?

Noam: Yeah, as researchers, our focus is not to develop an AI for poker. Our focus is to develop an AI that can handle hidden information, and we used poker as a test bed, as a way to benchmark our progress, but it’s not the goal. We view beating humans at poker as a milestone showing how far we’ve come in being able to handle hidden information, not the achievement that we were aiming for in the first place.

Noam: So, we don’t use techniques that are specific to poker. We don’t, for example, tell the bots to bluff at a certain percentage, or say that with these cards you should raise, and with these cards, you should fold. We’re developing a way for the AI to determine strategy on its own, so if it were faced with a different imperfect information game … And, by the way, I’m using “game” very loosely here to mean any sort of strategic interaction. I’m using game in the sense of game theory, not in the sense of a “let’s play Monopoly” sort of game.

Jon: Right.

Noam: So, if it’s given any sort of imperfect information game, any sort of imperfect information strategic interaction, it can figure out on its own what the optimal strategy is, without that technique depending on the game being poker. And the way that it does this is through self-play. It’s actually very similar to how humans learn to play a game. You learn from experience.

Noam: The AI starts by knowing nothing about the game, and it plays totally randomly, and it plays itself. It plays a copy of itself in that game for trillions of iterations. Trillions of hands of poker, for example. And as it plays, it learns from experience that if it was in a particular situation and it raised and lost money, well, could it have gotten more money if it had folded instead? So, after the hand is over, it will review its decisions, and at each decision point it will ask, what would have happened if I had taken some other action instead? If I had raised instead of called, would I have gotten more money? And if the answer is yes, then it will have regret on that action. It will regret not having taken that action, and if it finds itself in that decision point later on, or a similar decision point, then it will take the action that it regrets with higher probability.

Noam: And, in truth, this is actually very similar to how humans learn. If you’ve ever played poker with some humans, then you’ll know it’s very common for a person to ask, “What would have happened if I had raised instead? Would you have called me?” And that’s exactly what the bot is doing. It’s asking that hypothetical question of, what would have happened if I had done this other thing instead? And it’s able to get an answer because it’s playing against a copy of itself, and so it can ask itself that question, and it can give itself an answer.
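For readers who want to see the mechanics, here is a minimal sketch of regret matching in self-play, the "what would have happened if I had acted differently?" update Noam describes, applied to rock, paper, scissors rather than poker. This is an illustrative toy, not Libratus’s code; Libratus used far more sophisticated variants of counterfactual regret minimization built on the same idea.

```python
import random

# A toy regret-matching learner for rock, paper, scissors: two copies of
# the same learner play each other, track regret for the actions they
# didn't take, and their average strategies converge to the equilibrium.
ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    # +1 for a win, 0 for a tie, -1 for a loss
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def current_strategy(regrets):
    # Regret matching: play each action in proportion to its positive regret
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(ACTIONS)] * len(ACTIONS)
    return [p / total for p in positive]

def train(iterations=200_000):
    regrets = [[0.0] * 3, [0.0] * 3]   # one regret table per player
    strategy_sum = [0.0] * 3           # running average for player 0
    for _ in range(iterations):
        strats = [current_strategy(r) for r in regrets]
        picks = [random.choices(range(3), weights=s)[0] for s in strats]
        for p in (0, 1):
            me, opp = picks[p], picks[1 - p]
            got = payoff(ACTIONS[me], ACTIONS[opp])
            # "What would have happened if I had taken some other action?"
            for alt in range(3):
                regrets[p][alt] += payoff(ACTIONS[alt], ACTIONS[opp]) - got
        for i in range(3):
            strategy_sum[i] += strats[0][i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

print(train())  # approaches [1/3, 1/3, 1/3], the equilibrium mix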

Dirk: When it’s in the process of playing humans, is it making a book on each human? Is it saying, “I’m playing player one,” and putting a profile together on player one, or is it completely independent of the specific opponents?

Noam: That’s a good question. Yeah, these algorithms are trying to find what’s called a Nash equilibrium. It’s trying to find a perfect strategy. Now, a Nash equilibrium is proven to exist in any game, and in particular in two-player, zero-sum games, if you’re playing according to the Nash equilibrium strategy, then you are guaranteed not to lose in expectation, no matter what your opponent does. This is what the AI is trying to find. It’s not trying to adapt to its opponent. It’s trying to find this Nash equilibrium strategy and play according to it, because it knows that if it’s playing this Nash equilibrium, then no matter what its opponent does, it’s not going to lose.

Noam: I think the idea that a Nash equilibrium exists, the idea that this perfect strategy exists in poker, is surprising to a lot of people. But if you think about it for a bit, you can see this in smaller games. For example, in rock, paper, scissors, we all know what the Nash equilibrium strategy is. It’s to throw rock, paper, and scissors with one-third probability each. And if you were to just play that strategy, then no matter what your opponent does, you’re not going to lose in expectation.

Noam: Now, in the case of rock, paper, scissors, you’re not going to win in expectation. You’re just going to tie in expectation. But in a complicated game like poker, if you are able to play the Nash equilibrium strategy, then it’s likely your opponent will make mistakes, and by playing the Nash equilibrium strategy, you will, in practice, win, because you’re playing the perfect strategy.
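As a quick illustration of that guarantee, the short computation below, purely illustrative, checks that the uniform one-third strategy in rock, paper, scissors neither wins nor loses in expectation against any opponent mix:

```python
# Rock, paper, scissors payoff matrix for the row player
# (rows and columns ordered rock, paper, scissors)
PAYOFF = [
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
]

def expected_value(my_mix, opp_mix):
    # Expected payoff of one mixed strategy against another
    return sum(p * q * PAYOFF[i][j]
               for i, p in enumerate(my_mix)
               for j, q in enumerate(opp_mix))

uniform = [1 / 3, 1 / 3, 1 / 3]
for opp in ([1, 0, 0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]):
    # Prints ~0.0 each time (up to floating-point noise): the uniform
    # strategy cannot lose in expectation against any opponent mix
    print(expected_value(uniform, opp))
```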

Noam: So, we’re not trying to adapt to the opponent. In fact, during the competition, we never looked at the cards the opponent had, for example. We never cared which player we were playing against. We were always playing the same exact strategy, no matter who the opponent was.

Dirk: That’s really interesting.

Noam: Yeah.

Jon: I noticed, in the video about the competition, that Libratus was actually learning as the games progressed, and patching holes in its strategy as the competition continued. Whenever the human players found an opportunity to exploit, Libratus would come back the next day and make sure that opportunity was no longer there. This continuous learning, is that related to … I understand there are actually three different AIs working together as Libratus in total. Is that part of the mechanism of continuous learning, or could you explain more about how those three AIs enhance each other?

Noam: Yeah. I wouldn’t say there’s three AIs. I would say there’s three components to one AI.

Jon: Okay.

Noam: Now, the first component is what I just described. It’s trying to estimate this Nash equilibrium. Now, we’re not finding that perfect Nash equilibrium through this self-play component, but we’re getting a rough idea of what the Nash equilibrium would be, and we’re doing that offline before the competition ever begins, and so we come into the competition with a strategy that the AI thinks is pretty strong.

Noam: But it’s not perfect, and so what it actually does is, during the competition, if it finds itself in a particular situation, when it’s actually playing against a human in a particular hand of poker, and it’s on the third betting round, for example, it will compute in real time a closer approximation of the Nash equilibrium for the situation that it’s in at that moment. So, it will take 20 seconds to figure out, let me find a much better strategy for this particular situation, but one that fits within this overarching blueprint strategy that I’ve computed for the entire game as a whole.

Noam: That’s the second component, and I think that was actually the big breakthrough with Libratus. Nobody had really found an effective way of doing real-time equilibrium computation in imperfect information games before. But of course, with chess AIs and Go AIs, thinking in real time is a big part of those AIs, and it’s kind of surprising, in retrospect, that people didn’t focus on this earlier for imperfect information games.
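A drastically simplified sketch of that two-level idea follows, under strong assumptions: the "game" here is just a pair of small zero-sum matrix games with made-up payoffs, the blueprint is a cheap low-iteration solve of every situation, and "real-time solving" means re-solving only the situation actually reached with many more iterations. Libratus’s real nested subgame solving adds safety guarantees that this toy omits.

```python
# A toy two-level solver: a coarse offline "blueprint" plus finer
# real-time re-solving of the one subgame actually reached.
# Schematic illustration only, not Libratus's actual algorithm.

def regret_matching_solve(payoff, iterations):
    """Approximate equilibrium of a zero-sum matrix game (row player's mix)."""
    rows, cols = len(payoff), len(payoff[0])
    r_regret, c_regret = [0.0] * rows, [0.0] * cols
    r_avg = [0.0] * rows

    def mix(regrets):
        # Regret matching: play in proportion to positive regret
        pos = [max(x, 0.0) for x in regrets]
        t = sum(pos)
        return [p / t for p in pos] if t > 0 else [1.0 / len(regrets)] * len(regrets)

    for _ in range(iterations):
        r, c = mix(r_regret), mix(c_regret)
        # Expected payoff of each pure action against the opponent's mix
        r_vals = [sum(payoff[i][j] * c[j] for j in range(cols)) for i in range(rows)]
        c_vals = [sum(-payoff[i][j] * r[i] for i in range(rows)) for j in range(cols)]
        r_ev = sum(r[i] * r_vals[i] for i in range(rows))
        c_ev = sum(c[j] * c_vals[j] for j in range(cols))
        for i in range(rows):
            r_regret[i] += r_vals[i] - r_ev
        for j in range(cols):
            c_regret[j] += c_vals[j] - c_ev
        for i in range(rows):
            r_avg[i] += r[i]
    total = sum(r_avg)
    return [x / total for x in r_avg]

# Two situations we might reach during play (hypothetical payoffs)
subgames = {
    "flop_A": [[1, -2], [-1, 3]],
    "flop_B": [[0, 2], [1, -1]],
}

# Offline: a cheap, coarse blueprint strategy for every situation
blueprint = {name: regret_matching_solve(m, iterations=100)
             for name, m in subgames.items()}

# Online: the hand actually reaches "flop_A", so spend real compute
# refining just that situation to a much closer equilibrium
refined = regret_matching_solve(subgames["flop_A"], iterations=100_000)

print("blueprint:", [round(p, 3) for p in blueprint["flop_A"]])
print("refined:  ", [round(p, 3) for p in refined])
```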

Noam: Now, the third component, which is what you described, is this idea that it was, I would say, in some sense, learning from the opponents. Now, I want to make it clear, it was not adapting to the opponents. It was not trying to exploit the opponents in any way. What happened is, because the AI is not perfect, it’s not computing a perfect Nash equilibrium, there are some parts of the game tree, some different situations, where it’s playing suboptimally. And that’s a problem, because if it’s playing suboptimally, then there are opportunities for the humans to exploit it in those situations.

Noam: And that’s what the humans were constantly trying to do. Every single day, they were trying to find out, where are the weaknesses where we can take advantage of this AI? So, at the end of each day, the AI would review the situations it was finding itself in most frequently. It was trying to find the situations where the humans were, not necessarily successfully, but at least trying to exploit it. And it would come up with a much better strategy, a much closer approximation of the Nash equilibrium, for those particular situations that the humans were focusing on. And then the next day it would have a much better strategy in those situations, so it would be far less exploitable at those points.

Noam: And so, this led to a sort of cat and mouse game, because every day, the humans would try to find a weakness and take advantage of it, and at the end of each day, the AI would fix those weaknesses in preparation for the next day. So, as the competition progressed, these holes in the AI got smaller and smaller over time, and the humans had less and less opportunity to exploit it.

Jon: Yeah, that’s completely fascinating. I could see the, I don’t know if I’d characterize it as frustration, but just the reaction of the players to those changes over the course of the video. Very interesting, indeed. So, what was unexpected or surprising about how Libratus played the human players? I know there’s been descriptions of it that said, “This AI does not play anything like a human opponent.” What makes up those unexpected elements of play for Libratus?

Noam: Well, yeah. One of the really cool things about Libratus is that because it was trained from self-play, from playing against a copy of itself starting from scratch, without knowing anything about the game, it never looked at human data, and it came up with a very different strategy compared to how humans play. And so, when it started the match, the humans said it was like playing an alien. It was like playing somebody who had learned how to play poker on Mars.

Noam: Some of the cool things that it did … I think the best example was that it was using bet sizes that were very different from human convention. In the human poker world, you typically bet a fraction of the pot, and that fraction is about 0.5 times the pot to one times the pot. So, if there is $300 in the pot, you might bet $150, or you might bet $300, and maybe in some really rare circumstances you might bet $500 at most.

Noam: But the AI didn’t feel a need to constrain its bets to those amounts, and it would sometimes bet three times the pot, five times the pot, sometimes even 20 times the pot. It would have no problem putting $20,000 into a $200 pot. And this was a big shock to the humans. It was very frustrating for them in particular, because they could be in a situation where they have a really, really strong hand, maybe the second-best hand that’s possible, and then suddenly the bot bets $20,000 into a $500 pot, and the bot’s basically saying, I’m either bluffing or I have the best hand.

Noam: And so, this human who’s sitting there with the second-best hand now has to think for a really long time: is he really going to lay down the second-best hand that’s possible just because the AI is saying it has something better? And you could see the humans sometimes taking five or 10 minutes to make a decision. It was very frustrating for them. It was very satisfying for me, but I could understand their consternation in those moments. So, that was one difference.

Noam: I think another big difference was the bluffing. It learned to bluff, of course, because you have to bluff to play poker well. But the situations that it chose to bluff in were pretty different from what humans would do. And this was something that, I’m not a very advanced player, but I have a pretty decent idea of how to play the game. Actually, when we were developing the AI, I would look at the hands and try to get a sense of how well it was doing, and I would see these really weird situations where it would bluff with hands that didn’t make sense to me.

Noam: I actually called up my friend who’s a better poker player, he plays professionally, and I said, “Is this a smart move? Can you tell me if this is something that a human would do?” And he looked at the hand and said, “You have a bug in your program. There’s no way that it can be bluffing in that situation.” So, I looked at the code, and the code looked totally fine.

Noam: So, I called up an even better poker player, one of the best in the world, and asked him, “In your opinion, is this a smart thing to do?” And he said, “Okay, this is not something that I would do, but let me take some time to think about it.” And he came back the next day, and he said, “I thought about it, I crunched the numbers, and this is not something that any human would ever do, but it’s actually a brilliant move.” And he said, “This bot is thinking two moves ahead of a human.”

Jon: Wow.

Noam: Yeah. That’s the kind of behavior we saw with this AI. It’s just really light years ahead of anything a human would do. And in fact, the humans that it played against told us that they were going to take some of these strategies and start using them in their own play, particularly these big overbets, betting huge amounts relative to the pot in some situations. They said that’s something they’re going to do in the future when they play against humans.

Jon: That’s fascinating. I wanted to ask, you have some background creating, or researching, strategies around algorithmic trading for financial markets, and that’s some of the work you did prior to creating Libratus. How did that research and strategic work affect your approach to creating this AI, or did it affect your approach?

Noam: Well, I think when it comes to computer science, math is math, and the math that’s used in financial markets and the math that’s used in developing artificial intelligence are related, even though they might not seem that closely related at first glance. So, I wouldn’t say that my experience in the financial markets directly influenced the development of the AI, but I would say that my background learning the mathematics, and learning the sort of strategic reasoning that’s involved in financial markets does [inaudible 00:18:56] carry over to developing AI. And in particular, the truth is that financial markets are an imperfect information game, just like poker.

Jon: Right.

Noam: And so, having an understanding of what that means, dealing with uncertainty, understanding that other market participants are going to react to your behavior, that’s something that you also see in poker.

Jon: So, conversely, do you think you’ll be applying any of the lessons learned through the creation of Libratus back to financial markets at some point? Is that an area that your research team is looking at, or that you’ve been considering as a future project?

Noam: Like I said, the things that we’re developing are not specific to poker, and they’re applicable to any imperfect information game. Now, financial markets are one example of that. Financial markets are an imperfect information game. So are negotiations. So are auctions. So are military situations. So, we see a lot of potential for applying these techniques to other domains in the future.

Noam: Now, we’re not specifically looking at financial markets right now, but I think that down the road, that is potentially something that we will look into, and I think that it’s going to have a huge impact. Everybody would love to see an AI … Well, I mean, some people. People would love to own an AI that can trade on the financial markets and do that well. So, there’s a lot of interest in this, but I think that there are steps that need to be taken to get to that point. I wouldn’t say it’s going to happen within three years, but maybe 10 years from now, we will see this being used in financial markets.

Dirk: What does your roadmap more specifically look like? Because you talked about poker as just a stop along the way, and you just talked sort of broadly about all these huge domains that could be covered, but what actually are you working on, in the short term and in the more medium term? What are you trying to do? I’d love to understand it better.

Noam: Yeah, there are some challenges in extending this to things like negotiations and financial markets, and I see two main issues. One is that poker is a zero-sum game: any money that you’re winning, you are taking from somebody else. Those games have nice properties that make it easier to compute an equilibrium solution.

Noam: But the real world isn’t necessarily like that. Maybe military situations are like that, but if you’re dealing with a negotiation, for example, you have win-win outcomes. It’s not zero sum. And that’s important to understand. It’s important to understand that you and your adversary can both win in this game, and the techniques that are needed are a little bit different from the zero-sum setting. So, figuring out how to adapt the techniques that we have to this general-sum setting is one of the things that we’re looking at in the short term, and I think that’s the smaller obstacle to overcome.
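As a toy illustration of that distinction, the sketch below contrasts a zero-sum payoff table with a general-sum one; all the payoff numbers are made up purely for illustration. In the zero-sum game every outcome’s payoffs sum to zero, while the negotiation game has a win-win outcome that the zero-sum property rules out.

```python
# Two tiny payoff tables, with made-up numbers purely for illustration.
# Each outcome maps to (player A payoff, player B payoff).

# Zero-sum: whatever one player wins, the other loses
zero_sum_game = {
    ("raise", "call"):  (100, -100),
    ("raise", "fold"):  (50, -50),
    ("check", "check"): (0, 0),
}

# General-sum negotiation: agreement can leave both sides better off
negotiation_game = {
    ("concede", "concede"): (3, 3),   # win-win: a deal gets done
    ("concede", "hold"):    (1, 4),
    ("hold", "concede"):    (4, 1),
    ("hold", "hold"):       (0, 0),   # no deal: nobody gains
}

def is_zero_sum(game):
    # In a zero-sum game, the payoffs of every outcome sum to zero
    return all(a + b == 0 for a, b in game.values())

print(is_zero_sum(zero_sum_game))     # True
print(is_zero_sum(negotiation_game))  # False: win-win outcomes exist
```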

Noam: The bigger obstacle is that when you’re moving from a game to the real world, your strategies and your payoffs are not well-defined. In poker, it’s very clear what actions you can take in any given situation, and it’s very clear what the payoffs are for those actions. You win a certain amount of money at the end of the hand. But if you move to a negotiation, for example, your actions are not as clearly defined. You can negotiate over all sorts of things, and that may not be well-defined at the start. And the payoffs are also not as well-defined. How do you value certain outcomes?

Noam: In financial markets, for example, you may even think that it’s clearly defined, that it’s just dollar value, but that’s not necessarily the case, actually. It could be risk, it could be … Maybe you value short-term liquidity, you value selling an asset in the short term in order to have cash on hand. There’s all sorts of things that make this situation more complicated.

Noam: So, I would say that if you are able to define the model, then you can use these techniques in the short term, I would say within the next few years. But the fact that you can’t easily define the model in a lot of situations, that’s going to be an obstacle for AI, and that’s something a lot of people in AI are working on right now. You’ve seen a lot of success with AIs for games, and you’ve seen less success in the real world, and the main reason is that it’s not really clear how to construct the model in the real world.

Dirk: I mean, that’s the key to human negotiation, right? It’s not like a poker game, where there’s a specific pot of money, and I’m trying to get as much, and you’re trying to get as much. The reality is, you are motivated by things that sometimes I don’t care at all about, and so it’s a question of understanding what your motivation is in order to get more of what I want out of it, by … There’s a negative connotation to this word, which I don’t intend, but for me to exploit knowledge of what it is you want to get more of what it is I want. So, is a big part of … I mean, I was going to say, is a big part of where you’re taking this for the AI to figure out what the human wants? But based on how you solved the poker problem, which ignores what the human wants, maybe that’s a foolish suggestion. I don’t know.

Noam: I would say that you shouldn’t be too concerned about trying to exploit your opponent. If you’re a big company like Facebook or Amazon, for example, and you’re trying to develop a negotiation AI, you’re not trying to exploit your user base. If you tried to do that, people would be very upset.

Dirk: Right.

Noam: The point isn’t to exploit. The point is to be unexploitable. And that’s really what our research is about, is how do we make an AI that cannot be exploited? And I think that that is the more useful and the more promising line of research, compared to figuring out how to take advantage of the opponent. So, I think that that is the more promising path.

Dirk: Got it.

Jon: So, as AI continues to become more advanced, and clearly these systems start to permeate the real world a little bit more, and AI takes over certain tasks, what tasks do you think are sort of best suited for AI, and what tasks, conversely, would be better suited for humans to be participating in, at least in the short to mid term?

Noam: That’s a good question. I think we’ve seen AI developing in a lot of domains in recent years, and the pace of progress has really been astonishing in the past five years or so. And I think there are legitimate concerns that AI is going to be replacing a number of jobs in the near future. In particular, driving, for example. Truck driving is a huge industry, a huge occupation. Taxi driving. That’s all going to go away, and I think the consensus is pretty clear that that is the first major job to be eliminated, which I don’t think people are too upset about, honestly. Nobody really wants to be a taxi driver, I’d imagine.

Noam: And now, in the longer term, I think it’s also important to understand that there are limitations to what AI can do, and I think those limitations are being overcome every year to some extent, but there are some that are just so far off that it’s hard to imagine AI, for example, writing a novel. I don’t think that AIs will ever, or at least not in my lifetime, perhaps, have the understanding or the creativity to be able to write a prize-winning novel.

Dirk: That’s an interesting comment, though, Noam, because there actually are AIs that are currently competitive in novel-writing competitions, in Japan specifically, in fact.

Noam: I would be skeptical about that … I would also say that when you see these articles about AIs doing all these incredible things, be a little skeptical. I have not heard about this, and I would be very surprised if there were actually an AI that could write a competent novel. I’ve seen AIs writing really short passages that, if you don’t look too closely, sometimes make sense. But it’s really just imitating what a human would do. When you’re dealing with a short passage, it’s much easier to sort of fake it. You can’t get an AI to write Harry Potter, and that’s not going to happen any time soon.

Noam: So, I think that things that rely on verbal skills, on understanding of the world, that is going to take a long time to replace. And I think also things that involve human interaction, for example, day care. You’re not going to see an AI taking care of kids any time soon, and I don’t think that’s going to change.

Dirk: And what is it that makes solving those problems in AI so much harder than the problems you’re solving?

Noam: That’s a good question. It’s kind of hard to quantify, and I think that this is something that people always get wrong. Everybody always has this notion of what is hard for an AI and what is easy for a human, and that always changes. For example, people thought years ago, if you could make an AI that could play chess, that is really an accomplishment, and it’s a sign that AIs will be stronger than humans at everything, right? Playing chess is really the epitome of human intelligence. And we see that you can make an AI that can play chess, and it can’t do a lot of other things.

Noam: And people might have thought that playing poker, being able to bluff, that is something that is uniquely human, a sign of human intuition and being able to read your opponent, and that if an AI were ever able to do that, then it would really be the sign of the robot apocalypse, the AIs doing everything. And now we’ve seen that: an AI that can bluff. Certainly, that means AIs can do a lot of things now, but they can’t do everything.

Noam: So, I’m saying now that an AI can’t write a prize-winning novel, and if it ever did that, I would be very terrified. But that makes me think back to those people 50 years ago who said, if an AI can ever play chess, then it’s the end of the world. I could be wrong, and it could be the case that an AI could write a prize-winning novel, and it won’t mean anything. It’s kind of hard to quantify. It’s kind of hard to say what it is that makes certain tasks harder and certain tasks easier. And anything I say, I could be wrong.

Jon: Noam, thanks so much for joining us today, and we really appreciate your time on the show.

Noam: Well, thank you for having me.

Jon: Listeners, remember that while you’re listening to the show, you can follow along with the things that we’re mentioning here in real time. Just head over to the DigitaLife.com. That’s just one L in the DigitaLife. And go to the page for this episode. We’ve included links to pretty much everything mentioned by everybody, so it’s a rich information resource to take advantage of while you’re listening, or afterward, if you’re trying to remember something that you liked.

Jon: You can find the Digital Life on iTunes, SoundCloud, Stitcher, Player FM, and Google Play, and if you’d like to follow us outside of the show, you can follow me on Twitter @JonFollett. That’s J-O-N F-O-L-L-E-T-T. And, of course, the whole show is brought to you by GoInvo, a studio designing the future of health care and emerging technologies, which you can check out at GoInvo.com. That’s G-O-I-N-V-O.com. Dirk?

Dirk: You can follow me on Twitter @DKnemeyer. That’s @D-K-N-E-M-E-Y-E-R, and thanks so much for listening. Noam, how about you?

Noam: I’m also on Twitter. My Twitter handle is @PolyNoamial, with a misspelling, with my name in it. So, it’s P-O-L-Y-N-O-A-M-I-A-L. And you can also find me online. My name is Noam Brown, N-O-A-M Brown. I have a website, and I have YouTube videos on that website that cover a lot of this material in more detail.

Jon: Terrific. So, that’s it for episode 260 of the DigitaLife. For Dirk Knemeyer, I’m Jon Follett, and we’ll see you next time.

 

 
