Pacman Lab 2
I. Objectives
1. Understand the adversarial process of zero-sum games through the experiment, and learn to design a simple evaluation function.
2. Master the minimax algorithm and alpha-beta pruning.
II. Principles
1. The minimax algorithm
In a zero-sum game, each player chooses, among the available moves, the one that maximizes its own advantage (equivalently, minimizes the opponent's advantage) N steps ahead. The decision process of both sides can be viewed as a decision tree. If every node in some layer of the tree is a state in which our side moves next, our side will certainly choose the path that maximizes its own payoff; such a layer is called a MAX layer. If every node in a layer is a state in which the opponent moves next, the opponent will certainly choose the path that minimizes our payoff; such a layer is called a MIN layer. A minimax decision tree therefore contains max nodes (nodes in MAX layers), min nodes (nodes in MIN layers) and terminal nodes (states in which the game has ended, or states reached after N steps). The expected payoff of a node is called its minimax value. For a terminal node, the minimax value is simply the evaluation of the position. For a max node, the action is chosen by our side, so its value is the largest minimax value among its children. For a min node, its value is the smallest minimax value among its children. The minimax algorithm can be described as follows:
Build the decision tree, then:
1. Apply the evaluation function to the leaf nodes;
2. Compute the minimax value of every node bottom-up;
3. At the root, choose the branch with the largest minimax value as the action strategy.
The minimax value of a single node is computed as follows:
1. If the node is a terminal node, apply the evaluation function to it;
2. If the node is a max node, take the largest of its children's values as its value;
3. If the node is a min node, take the smallest of its children's values as its value.
Its pseudocode is as follows:
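A minimal, runnable Python sketch of this recursion is given below; the list-based tree representation and helper logic are purely illustrative and are not taken from the project code.
# A node is either a number (its evaluation) or a list of child nodes.
def minimax(node, is_max):
    if not isinstance(node, list):
        # Terminal node: apply the evaluation function directly.
        return node
    values = [minimax(child, not is_max) for child in node]
    # A max node takes the largest child value, a min node the smallest.
    return max(values) if is_max else min(values)

# Example: a MAX root with two MIN children; the result is max(min(3, 5), min(2, 9)) = 3.
print(minimax([[3, 5], [2, 9]], True))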
2. Alpha-beta pruning
Alpha-beta pruning speeds up minimax search by maintaining two bounds while the tree is explored: alpha, the best value the MAX player can already guarantee, and beta, the best value the MIN player can already guarantee. As soon as alpha >= beta at a node, the node's remaining children cannot influence the final decision and are skipped. A simple example illustrates the process: suppose the root is a MAX node with two MIN children, and the leaves of the left MIN child evaluate to 3 and 5, so that child's value is 3 and the root can already guarantee alpha = 3. When the first leaf of the right MIN child evaluates to 2, that child's value can be at most 2 < 3, so its remaining leaves are pruned without being evaluated.
Its pseudocode is as follows:
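As above, a minimal runnable Python sketch of the pruned search is given below; the list-based tree representation is illustrative and not the project code.
# A node is either a number (its evaluation) or a list of child nodes.
def alphabeta(node, is_max, alpha=float('-inf'), beta=float('inf')):
    if not isinstance(node, list):
        return node
    if is_max:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                # value >= alpha >= beta: the MIN ancestor will never choose this branch.
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                # value <= beta <= alpha: the MAX ancestor will never choose this branch.
                break
        return value

# On the tree from the example above, the leaf 9 is never visited: after the left
# subtree, alpha = 3, and the right MIN node's first leaf makes beta = 2 <= alpha.
print(alphabeta([[3, 5], [2, 9]], True))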
III. Experiment Content
1. Implement evaluationFunction, MinimaxAgent and AlphaBetaAgent in the multiagent folder.
After the code is written, run python2 .\autograder.py in a terminal from the multiagent directory to check whether the implementation passes.
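If the standard options of the course autograder are available (this assumes the usual Berkeley Pacman autograder interface), a single question can also be graded on its own, for example:
python2 autograder.py -q q2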
IV. Results
The concrete implementations of the three functions (classes) of the game-playing algorithm are given below.
1. evaluationFunction
def evaluationFunction(self, currentGameState, action):
    """
    Design a better evaluation function here.
    The evaluation function takes in the current and proposed successor
    GameStates (pacman.py) and returns a number, where higher numbers are better.
    The code below extracts some useful information from the state, like the
    remaining food (newFood) and Pacman position after moving (newPos).
    newScaredTimes holds the number of moves that each ghost will remain
    scared because of Pacman having eaten a power pellet.
    Print out these variables to see what you're getting, then combine them
    to create a masterful evaluation function.
    """
    # Useful information you can extract from a GameState (pacman.py)
    successorGameState = currentGameState.generatePacmanSuccessor(action)
    newPos = successorGameState.getPacmanPosition()
    newFood = successorGameState.getFood()
    newGhostStates = successorGameState.getGhostStates()
    newScaredTimes = [ghostState.scaredTimer for ghostState in newGhostStates]
    # many ghosts
    "*** YOUR CODE HERE ***"
    GhostPos = successorGameState.getGhostPositions()
    x_pacman, y_pacman = newPos
    # Manhattan distance from the new Pacman position to the nearest ghost.
    failedDist = min([(abs(each[0] - x_pacman) + abs(each[1] - y_pacman)) for each in GhostPos])
    if failedDist != 0 and failedDist < 4:
        # Penalize states in which a ghost is close; the closer, the larger the penalty.
        ghostScore = -11 / failedDist
    else:
        ghostScore = 0
    nearestFood = float('inf')
    width = newFood.width
    height = newFood.height
    if failedDist >= 2:
        # Breadth-first search outward from the new position to find the closest food.
        dx = [1, 0, -1, 0]
        dy = [0, 1, 0, -1]
        List = []
        d = {}
        List.append(newPos)
        d.update({(x_pacman, y_pacman): 1})
        while List:
            tempPos = List[0]
            List.pop(0)
            temp_x, temp_y = tempPos
            if newFood[temp_x][temp_y]:
                nearestFood = min(nearestFood, (abs(temp_x - x_pacman) + abs(temp_y - y_pacman)))
                break
            for i in range(len(dx)):
                x = temp_x + dx[i]
                y = temp_y + dy[i]
                if 0 <= x < width and 0 <= y < height:
                    tempPos = (x, y)
                    if tempPos not in d:
                        d[tempPos] = 1
                        List.append(tempPos)
    if nearestFood != float('inf'):
        # Reward states that are close to food.
        foodScore = 10 / nearestFood
    else:
        foodScore = 0
    return successorGameState.getScore() + foodScore + ghostScore
2. MinimaxAgent
class MinimaxAgent(MultiAgentSearchAgent):
    """
    Your minimax agent (question 2)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action from the current gameState using self.depth
        and self.evaluationFunction.
        Here are some method calls that might be useful when implementing minimax.
        gameState.getLegalActions(agentIndex):
            Returns a list of legal actions for an agent
            agentIndex=0 means Pacman, ghosts are >= 1
        gameState.generateSuccessor(agentIndex, action):
            Returns the successor game state after an agent takes an action
        gameState.getNumAgents():
            Returns the total number of agents in the game
        gameState.isWin():
            Returns whether or not the game state is a winning state
        gameState.isLose():
            Returns whether or not the game state is a losing state
        """
        "*** YOUR CODE HERE ***"
        def gameOver(gameState):
            return gameState.isWin() or gameState.isLose()

        def min_value(gameState, depth, ghost):
            # MIN layer: the ghosts move in index order; the last ghost hands the turn back to Pacman.
            value = float('inf')
            if gameOver(gameState):
                return self.evaluationFunction(gameState)
            for action in gameState.getLegalActions(ghost):
                if ghost == gameState.getNumAgents() - 1:
                    value = min(value, max_value(gameState.generateSuccessor(ghost, action), depth))
                else:
                    value = min(value, min_value(gameState.generateSuccessor(ghost, action), depth, ghost + 1))
            return value

        def max_value(gameState, depth):
            # MAX layer: Pacman moves; one search ply is counted each time Pacman is to move.
            value = float('-inf')
            depth = depth + 1
            if depth == self.depth or gameOver(gameState):
                return self.evaluationFunction(gameState)
            for action in gameState.getLegalActions(0):
                value = max(value, min_value(gameState.generateSuccessor(0, action), depth, 1))
            return value

        # Root MAX layer: pick the Pacman action whose MIN-layer value is largest.
        nextAction = gameState.getLegalActions(0)
        Max = float('-inf')
        Result = None
        for action in nextAction:
            if (action != "stop"):
                depth = 0
                value = min_value(gameState.generateSuccessor(0, action), depth, 1)
                if (value > Max):
                    Max = value
                    Result = action
        return Result
        util.raiseNotDefined()
3. AlphaBetaAgent
class AlphaBetaAgent(MultiAgentSearchAgent):
    """
    Your minimax agent with alpha-beta pruning (question 3)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action using self.depth and self.evaluationFunction
        """
        "*** YOUR CODE HERE ***"
        ghostIndex = [i for i in range(1, gameState.getNumAgents())]

        def gameOver(state, depth):
            return state.isWin() or state.isLose() or depth == self.depth

        def min_value(state, depth, ghost, alpha, beta):
            if gameOver(state, depth):
                return self.evaluationFunction(state)
            value = float('inf')
            for action in state.getLegalActions(ghost):
                if ghost == ghostIndex[-1]:
                    value = min(value, max_value(state.generateSuccessor(ghost, action), depth + 1, alpha, beta))
                else:
                    value = min(value, min_value(state.generateSuccessor(ghost, action), depth, ghost + 1, alpha, beta))
                if value < alpha:
                    # Already worse than what MAX can guarantee elsewhere: prune.
                    return value
                beta = min(beta, value)
            return value

        def max_value(state, depth, alpha, beta):
            if gameOver(state, depth):
                return self.evaluationFunction(state)
            value = float('-inf')
            for action in state.getLegalActions(0):
                if action == 'stop':
                    continue
                value = max(value, min_value(state.generateSuccessor(0, action), depth, 1, alpha, beta))
                if value > beta:
                    # Already better than what MIN will allow elsewhere: prune.
                    return value
                alpha = max(value, alpha)
            return value

        def function(state):
            # Root MAX layer: keep the best action while tightening alpha.
            value = float('-inf')
            actions = None
            alpha = float('-inf')
            beta = float('inf')
            for action in state.getLegalActions(0):
                if action == 'stop':
                    continue
                tmpValue = min_value(state.generateSuccessor(0, action), 0, 1, alpha, beta)
                if value < tmpValue:
                    value = tmpValue
                    actions = action
                alpha = max(value, alpha)
            return actions

        return function(gameState)
        util.raiseNotDefined()
The testing process and results are as follows.
V. Summary
This was my first encounter with game-playing algorithms in the AI course. Although adversarial play seems like a subtle process, computer scientists have abstracted it into one that can be described by mathematical formulas, and this kind of abstraction is an extremely important skill. The Pacman experiment deepened my understanding of the principles and the workings of alpha-beta pruning.