Python Program

30.0 EUR

30.0 EUR | peopleperhour | Technology & Programming | Overseas
73 days ago

Details

You are required to implement the following environment and algorithms in Python:
Environment:
* k stochastic bandits.
* Each bandit i has a reward that is uniformly distributed in [a_i, b_i].
* a_i and b_i should be chosen randomly in [0, 1] and must be different for each arm i.
* Example: if a_i = 0.3 and b_i = 0.8 for arm i, then its reward at a given time can take any value in [0.3, 0.8] with equal probability ("uniform"), and its expected reward is mu_i = (a_i + b_i) / 2 = 0.55.
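The environment described above could be sketched roughly as follows. This is only a minimal illustration, not a reference solution: the class name `UniformBandit` and its attributes are hypothetical, and it assumes that drawing the bounds independently from a continuous distribution is enough to make them differ across arms (they are distinct with probability 1).

```python
import numpy as np


class UniformBandit:
    """k-armed stochastic bandit; arm i pays a Uniform[a_i, b_i] reward.

    Hypothetical helper for illustration; names are not from the task text.
    """

    def __init__(self, k, rng=None):
        self.k = k
        self.rng = rng if rng is not None else np.random.default_rng()
        # Draw two points in [0, 1] per arm and sort them so that a_i <= b_i.
        # Assumption: continuous draws differ between arms almost surely.
        bounds = np.sort(self.rng.uniform(0.0, 1.0, size=(k, 2)), axis=1)
        self.a = bounds[:, 0]
        self.b = bounds[:, 1]
        # Expected reward of a uniform distribution is the interval midpoint.
        self.means = (self.a + self.b) / 2.0

    def pull(self, arm):
        """Return one random reward from the chosen arm."""
        return self.rng.uniform(self.a[arm], self.b[arm])
```

Passing a seeded generator, for example `UniformBandit(10, np.random.default_rng(0))`, keeps the arm intervals reproducible between runs, which helps when comparing the two algorithms on the same instance.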
Algorithms:
* ε-Greedy: assume ε_t is reduced over time according to the theorem in the slides.
* Upper Confidence Bound (UCB) algorithm.
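A hedged sketch of the two algorithms is given below. The slides and their theorem are not available here, so the decay schedule ε_t = min(1, (k ln t / t)^(1/3)) is an assumption (a common textbook choice) and should be replaced by whatever the slides state. The UCB variant shown is the standard UCB1 index, mean + sqrt(2 ln t / n_i), which is also an assumption about the intended version. The function names and the `pull(arm)` callback are hypothetical.

```python
import math

import numpy as np


def epsilon_greedy(pull, k, T, rng):
    """Run decaying epsilon-greedy for T rounds; return the chosen arms."""
    counts = np.zeros(k, dtype=int)
    means = np.zeros(k)
    chosen = np.empty(T, dtype=int)
    for t in range(1, T + 1):
        # Assumed schedule (not taken from the slides): (k ln t / t)^(1/3).
        eps = 1.0 if t == 1 else min(1.0, (k * math.log(t) / t) ** (1.0 / 3.0))
        if rng.random() < eps:
            arm = int(rng.integers(k))      # explore: uniformly random arm
        else:
            arm = int(np.argmax(means))     # exploit: best empirical mean
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        chosen[t - 1] = arm
    return chosen


def ucb1(pull, k, T):
    """Run UCB1 for T rounds; return the chosen arms."""
    counts = np.zeros(k, dtype=int)
    means = np.zeros(k)
    chosen = np.empty(T, dtype=int)
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1                     # play every arm once first
        else:
            bonus = np.sqrt(2.0 * math.log(t) / counts)
            arm = int(np.argmax(means + bonus))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        chosen[t - 1] = arm
    return chosen
```

Both functions return only the sequence of chosen arms, so regret can be computed afterwards from the true means without touching the algorithm code.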
Measurement Tasks:
1. Produce plots that prove or disprove the respective sublinear regret rates for each scheme.
2. Compare the convergence/learning speed of the two algorithms for T = 1000, k = 10.
3. Repeat (2) for another two scenarios with different T, k values and comment on the differences and similarities.
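One possible way to approach the measurement tasks is sketched below, assuming regret is measured as cumulative pseudo-regret (the sum of gaps between the best arm's mean and the chosen arm's mean). Sublinear regret would then show up as the average regret R(t)/t tending toward zero. The helper names `cumulative_regret` and `plot_regret` are hypothetical, and matplotlib is imported inside the plotting function only so that the regret computation can be used without it.

```python
import numpy as np


def cumulative_regret(chosen, means):
    """Cumulative pseudo-regret R(t) = sum over s <= t of (mu* - mu_chosen_s)."""
    means = np.asarray(means, dtype=float)
    gaps = means.max() - means[np.asarray(chosen, dtype=int)]
    return np.cumsum(gaps)


def plot_regret(curves, title):
    """Plot R(t) and R(t)/t for each labelled regret curve.

    `curves` maps a legend label (e.g. "UCB") to a 1-D array of R(t).
    """
    import matplotlib.pyplot as plt  # local import: only needed for plotting

    fig, (ax_total, ax_avg) = plt.subplots(1, 2, figsize=(10, 4))
    for label, regret in curves.items():
        t = np.arange(1, len(regret) + 1)
        ax_total.plot(t, regret, label=label)
        ax_avg.plot(t, regret / t, label=label)
    ax_total.set_xlabel("round t")
    ax_total.set_ylabel("cumulative regret R(t)")
    ax_avg.set_xlabel("round t")
    ax_avg.set_ylabel("average regret R(t) / t")
    for ax in (ax_total, ax_avg):
        ax.legend()
    fig.suptitle(title)
    return fig
```

A single run is noisy, so averaging the regret curves over several independent runs before plotting is likely to make the comparison between the two algorithms easier to read.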
Hand in:
* Python notebook of the code. It must execute correctly and produce the respective plots above, and it MUST be commented in detail.
* A short report (1-2 pages max) with measurement plots and brief comments for each. All plots should include axis titles, legends, etc., to be readable.
