Python Program

30.0 EUR peopleperhour Technology & Programming Overseas
373 days ago

Description

You are required to implement the following environment and algorithms in Python:
Environment:
* k stochastic bandits (arms).
* Each bandit i has a reward that is uniformly distributed in [a_i, b_i].
* a_i and b_i should be chosen randomly in [0,1] and must be different for each arm i.
* For example, if a_i = 0.3 and b_i = 0.8 for arm i, then its reward at a given time can take any value in [0.3, 0.8] with equal probability ("uniform"), and its expected reward is mu_i = (a_i + b_i)/2 = 0.55.
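A minimal sketch of such an environment, using only the standard library. The class name `UniformBandit`, the `seed` parameter, and the redraw loop that enforces distinct intervals are assumptions of this sketch, not part of the task statement.

```python
import random


class UniformBandit:
    """k-armed stochastic bandit; arm i pays rewards drawn from Uniform[a_i, b_i]."""

    def __init__(self, k, seed=None):
        self.k = k
        self.rng = random.Random(seed)  # seed is only for reproducibility
        self.intervals = []
        while len(self.intervals) < k:
            # Two independent draws from [0, 1); sorting gives a <= b.
            a, b = sorted((self.rng.random(), self.rng.random()))
            # Redraw degenerate or repeated intervals so every arm differs.
            if a < b and (a, b) not in self.intervals:
                self.intervals.append((a, b))
        # The mean of Uniform[a, b] is the midpoint (a + b) / 2.
        self.means = [(a + b) / 2 for a, b in self.intervals]

    def pull(self, arm):
        """Return one reward sample from the chosen arm."""
        a, b = self.intervals[arm]
        return self.rng.uniform(a, b)


env = UniformBandit(k=10, seed=0)
reward = env.pull(0)          # one sample from arm 0
best_mean = max(env.means)    # mu*, needed later to compute regret
```

Keeping `means` on the environment makes it possible to compute the pseudo-regret exactly, without estimating the best arm from samples.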
Algorithms:
* ε-Greedy: assume ε_t is reduced over time according to the theorem in the slides.
* Upper Confidence Bound (UCB) algorithm.
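The two algorithms could be sketched as below. Several things here are assumptions, because the slides are not part of this posting: the schedule ε_t = min(1, (k ln t / t)^(1/3)) is one common textbook choice and may differ from the theorem in the slides; the UCB variant shown is UCB1 with index mean_i + sqrt(2 ln t / n_i); and both functions start by playing every arm once. The `pull(arm)` callable and the three fixed intervals in the usage lines are hypothetical stand-ins for the environment.

```python
import math
import random


def epsilon_greedy(pull, k, T, rng=None):
    """Decaying epsilon-greedy; returns the list of arms played in rounds 1..T."""
    rng = rng or random.Random()
    counts, sums, chosen = [0] * k, [0.0] * k, []
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1  # initial pass: play every arm once (choice of this sketch)
        else:
            # ASSUMED schedule; replace with the one from the slides.
            eps = min(1.0, (k * math.log(t) / t) ** (1.0 / 3.0))
            if rng.random() < eps:
                arm = rng.randrange(k)  # explore: uniformly random arm
            else:
                # exploit: arm with the highest empirical mean
                arm = max(range(k), key=lambda i: sums[i] / counts[i])
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        chosen.append(arm)
    return chosen


def ucb1(pull, k, T):
    """UCB1 (assumed variant); returns the list of arms played in rounds 1..T."""
    counts, sums, chosen = [0] * k, [0.0] * k, []
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1  # play every arm once so all counts are positive
        else:
            # empirical mean plus confidence radius sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        chosen.append(arm)
    return chosen


# Usage sketch with three hypothetical arms (fixed intervals, not random ones).
rng = random.Random(0)
intervals = [(0.0, 0.2), (0.1, 0.3), (0.8, 1.0)]


def pull(arm):
    a, b = intervals[arm]
    return rng.uniform(a, b)


arms_eps = epsilon_greedy(pull, k=3, T=1000, rng=rng)
arms_ucb = ucb1(pull, k=3, T=1000)
```

Returning the sequence of chosen arms, rather than the rewards, keeps the algorithms independent of how regret is measured afterwards.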
Measurement Tasks:
1. Produce plots that prove or disprove the respective sublinear regret rates for each scheme.
2. Compare the convergence/learning speed of the two algorithms for T = 1000, k = 10.
3. Repeat (2) for another two scenarios with different T, k values and comment on the differences and similarities.
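One possible way to approach the measurements is to compute the cumulative pseudo-regret R(t) = sum over s <= t of (mu* - mu of the arm played at s), average it over several independent runs, and plot both R(t) and R(t)/t. Regret is sublinear exactly when R(t)/t tends to 0. The helper names below are hypothetical, `plot_regret` assumes matplotlib is installed, and the demo uses a uniformly random policy with made-up means purely as a linear-regret baseline.

```python
import random


def cumulative_regret(means, chosen):
    """Pseudo-regret curve R(1..T) for a sequence of chosen arm indices."""
    best = max(means)
    total, curve = 0.0, []
    for arm in chosen:
        total += best - means[arm]  # expected loss of this round
        curve.append(total)
    return curve


def average_curves(curves):
    """Pointwise average of equally long regret curves (one per run)."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]


def plot_regret(curves_by_label, title):
    """Plot R(t) and R(t)/t for each labelled curve; assumes matplotlib."""
    import matplotlib.pyplot as plt

    fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
    for label, curve in curves_by_label.items():
        steps = list(range(1, len(curve) + 1))
        left.plot(steps, curve, label=label)
        right.plot(steps, [r / t for t, r in zip(steps, curve)], label=label)
    left.set(xlabel="round t", ylabel="cumulative regret R(t)", title=title)
    right.set(xlabel="round t", ylabel="R(t) / t", title="average regret per round")
    left.legend()
    right.legend()
    fig.tight_layout()
    return fig


# Baseline demo with hypothetical means: a uniformly random policy has
# linear regret, so R(t)/t stays near the mean gap instead of shrinking.
rng = random.Random(0)
means = [0.25, 0.55, 0.70]
T, runs = 1000, 20
curves = [
    cumulative_regret(means, [rng.randrange(len(means)) for _ in range(T)])
    for _ in range(runs)
]
avg_random = average_curves(curves)
```

Passing the averaged curves of ε-greedy and UCB to `plot_regret({"eps-greedy": ..., "UCB": ...}, "T = 1000, k = 10")` would give one figure per scenario; a curve whose R(t)/t panel flattens at a positive value, like the random baseline, indicates linear regret.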
Hand in:
* A Python notebook with the code. It must execute correctly and produce the respective plots above, and it MUST be commented in detail.
* A short report (1-2 pages max) with the measurement plots and brief comments on each. All plots should include axis titles, legends, etc., to be readable.
