000 01281nam a2200205 4500
008 220505b ||||| |||| 00| 0 eng d
020 _a9781886529397 (hb.)
082 _a519.703
_bBER
100 _aBertsekas, Dimitri P.
_9809
245 _aReinforcement learning and optimal control
260 _aMassachusetts
_bAthena Scientific
_c2019
300 _axiv, 373p.
500 _ahttp://www.athenasc.com/rlbook_athena.html
520 _aThis book considers large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP), but their exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively known by several essentially equivalent names: reinforcement learning, approximate dynamic programming, and neuro-dynamic programming. They have been at the forefront of research for the last 25 years, and they underlie, among others, the recent impressive successes of self-learning in the context of games such as chess and Go.
650 _aMathematics
_9435
650 _aMathematical optimization
_9688
650 _aDynamic programming
_9640
650 _aReinforcement learning
_9859
942 _cBK
999 _c7874
_d7874