China - The unchanged culture of China allowed there to be a sense of patriotism, and unified the country. Wanted to be superior to the other countries, and to do this they remained self-sufficient. They also wanted to focus on their own security, since they were being attacked by the Manchus. Very worried about invasion. To achieve this, China built a wall.
Japan - Jesuits were trying to convert the Japanese to Christianity and create a Catholic empire. In 1615 the Japanese drove the Jesuits from Japan and began a closed (locked) country policy to stay away from foreign influence and culture. Japan remained closed for over 200 years, and the policy gave the Japanese 250 years of peace. To achieve this, they closed all of their ports except one and traded only with the Chinese and Dutch. Japan is an island nation.
In Japan, they remained isolated to save their culture and keep out foreign influence. In China, they remained isolated to feel superior and show off their wealth.
Although isolation means no war, no disease, and a self-sufficient economy, the risk of having no allies is too great. An isolated country would also be cut off from new inventions, ideas, medicine, crops, and discoveries.
Terms in this set (20)
Exploration
The thing that separates RL from other machine learning topics
The more data we have, the more confident we are in that data.
True. Confidence-based exploration
To obtain a lot of reward, an RL agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before.
The agent has to exploit what it has already experienced, but it also has to EXPLORE in order to make better action selections in the future
True
On a stochastic task, each action must be tried many times in order to gain a reliable estimate of its expected reward
True
Transitions: bandits vs. deterministic MDPs vs. stochastic MDPs
bandits - no state transitions at all, but they do have stochasticity
deterministic MDPs - state transitions, but no stochasticity at all
stochastic MDPs - both state transitions and stochasticity; combining the ideas from the two cases above gives us a way of solving general MDPs
K-Armed Bandits
- k different arms, each like a slot machine
- we don't know the payout of each arm
Pulling the arm with minimum confidence gets you better estimates, while pulling the maximum-likelihood arm gets you more reward
True
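The two selection rules on this card can be sketched in a few lines. This is only an illustration, assuming each arm's statistics are stored as a pull count and an empirical mean reward, and treating "minimum confidence" as "fewest pulls so far". The function names and the example numbers are hypothetical.

```python
def pick_max_likelihood(counts, means):
    """Exploit: choose the arm whose empirical mean reward is highest."""
    return max(range(len(means)), key=lambda arm: means[arm])


def pick_min_confidence(counts, means):
    """Explore: choose the arm we have the least data about (fewest pulls)."""
    return min(range(len(counts)), key=lambda arm: counts[arm])


# Hypothetical statistics for three arms (made-up numbers for illustration).
counts = [50, 5, 20]      # how many times each arm has been pulled
means = [0.7, 0.6, 0.4]   # empirical mean reward of each arm

exploit_arm = pick_max_likelihood(counts, means)   # arm with the best estimate
explore_arm = pick_min_confidence(counts, means)   # arm with the least data
print(exploit_arm, explore_arm)
```

With these numbers the two rules disagree, which is the exploration-exploitation tension: one arm looks best so far, but another has been sampled too rarely for its estimate to be trusted.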
Metrics for Bandits
1. Maximize expected reward over a finite horizon
2. Identify a near-optimal arm with high probability
3. Nearly maximize reward with high probability
These can be combined:
Find best arm --> Few mistakes
Few mistakes --> Do well
Do well --> Find best arm
If we have an algorithm that gets within epsilon of optimal per time step, then we can use it to find a near-best arm.
Hoeffding
- tells us how many samples we need to accurately learn the value of an arm
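As a rough sketch of what the bound gives in practice, assume rewards lie in [0, 1]. Hoeffding's inequality then says the empirical mean of n samples is within epsilon of the true mean with probability at least 1 - delta once n >= ln(2/delta) / (2 * epsilon^2). The function names below are hypothetical; only the standard library is used.

```python
import math


def hoeffding_samples(epsilon, delta):
    """Samples needed so the empirical mean is within epsilon of the true
    mean with probability at least 1 - delta (rewards assumed in [0, 1])."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def hoeffding_width(n, delta):
    """Half-width of the (1 - delta) confidence interval after n samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


n = hoeffding_samples(epsilon=0.1, delta=0.05)
print(n, hoeffding_width(n, delta=0.05))
```

The sample count grows with 1/epsilon^2 but only logarithmically with 1/delta, so demanding tighter accuracy is much more expensive than demanding higher confidence.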
Exploring deterministic MDPs
- explore randomly
- trap states
- mistake bound: the number of epsilon-suboptimal actions needs to be bounded
RMAX algorithm
1. Keep track of the MDP learned so far
2. Any unknown state-action pair is assumed to give reward Rmax
3. Solve MDP
4. Take action from optimal policy
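The four steps above can be sketched for a small deterministic MDP. This is a minimal illustration under stated assumptions, not a reference implementation: states and actions are integers, planning is discounted value iteration, and an unknown pair is valued as if it paid Rmax forever. The names `plan`, `rmax_run`, and the toy `chain_step` environment are all hypothetical.

```python
def plan(known, n_states, n_actions, rmax, gamma, sweeps=200):
    """Steps 2-3: solve the learned MDP, valuing unknown pairs optimistically."""
    optimistic = rmax / (1.0 - gamma)  # value of receiving Rmax forever
    values = [0.0] * n_states

    def q(state, action):
        if (state, action) in known:
            next_state, reward = known[(state, action)]
            return reward + gamma * values[next_state]
        return optimistic

    for _ in range(sweeps):
        values = [max(q(s, a) for a in range(n_actions)) for s in range(n_states)]
    # Greedy policy with respect to the optimistic model.
    return [max(range(n_actions), key=lambda a: q(s, a)) for s in range(n_states)]


def rmax_run(step, n_states, n_actions, rmax, start, steps, gamma=0.5):
    """Run the loop: plan, act, and record each observed transition."""
    known = {}  # step 1: the learned model, (state, action) -> (next_state, reward)
    state, total = start, 0.0
    for _ in range(steps):
        policy = plan(known, n_states, n_actions, rmax, gamma)
        action = policy[state]  # step 4: act according to the optimistic policy
        next_state, reward = step(state, action)
        known[(state, action)] = (next_state, reward)  # deterministic: one visit suffices
        total += reward
        state = next_state
    return known, total


def chain_step(state, action):
    """Toy 3-state chain: action 1 moves right, action 0 stays put.
    Staying in the last state pays reward 1; everything else pays 0."""
    if action == 1:
        return min(state + 1, 2), 0.0
    return state, 1.0 if state == 2 else 0.0


known, total = rmax_run(chain_step, n_states=3, n_actions=2, rmax=1.0, start=0, steps=20)
print(sorted(known), total)
```

Because unknown pairs look as good as the best possible outcome, the planner is drawn toward them until they are learned. After that, the agent settles on the rewarding state without any explicit exploration schedule.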
What is the RMAX analysis?
1. Once all edges are known, no more mistakes are made: the algorithm solves the MDP, so every action is optimal
2. The agent may stop visiting unknown states
- if we loop without learning anything, then no mistakes are being made
3. The number of transitions needed to reach an unknown state-action pair is bounded by the number of states, n
- the number of times we might discover a state-action pair is n*k
Lower bound for the algorithm is O(n^2 k)
True
General Stochastic MDPs
- we want an efficient algorithm:
- stochastic: use the Hoeffding bound until estimates are accurate
- sequential: unknown state-action pairs are assumed optimistic
Explore or exploit lemma
- if all transitions are either accurately estimated or unknown, then the optimal policy for the learned model is either near optimal, or it reaches an unknown state quickly
Summary
Bandits
Bandits are all about stochasticity and randomness
- decision making with randomness
- we can use the Hoeffding bound and the union bound to convince ourselves that we have a sufficiently accurate estimate to get near-optimal reward in stochastic decision problems
- we have a stochastic world and we want to learn how it works so that we can get near-optimal reward
- lets us deal with stochastic decision making
The Hoeffding bound tells us how certain we really are, so that we know when we are certain enough
True
RMax works with deterministic MDPs
- optimism in the face of uncertainty
- causes us to explore at a distance by planning ahead to reach new states where new information could be gained
- lets us deal with sequential decision making
Combining the two: use the bandit idea to estimate the noisy parameters (transition probabilities), and use the RMax idea to make sure we have visited things often enough to get accurate estimates
True
bandits - helped with estimating the transition probabilities and knowing when to believe them
- stochastic + sequential
KWIK learning
- the way we distinguish known from unknown in bandits can be generalized to KWIK (Knows What It Knows) learning
- learn transition probabilities using methods that know what they know
- if a learner can distinguish between known and unknown, it can associate optimism with the unknown, and this gives guarantees on how efficiently near-optimal behavior can be learned
- KWIK is a learning framework that tries to go beyond tabular MDPs by generalizing transition probabilities across different parts of the MDP
Rmax: setting the c parameter
Rmax in practice is a very effective algorithm: it makes good use of the data
- c too small: might not learn near-optimal behavior
- c too big: the learner has to revisit state-action pairs many, many times