Prompts matching the #q-learning tag
Implement reinforcement learning algorithms for decision-making, game playing, and optimization problems.

RL fundamentals:
1. Markov Decision Process: states, actions, rewards, transition probabilities, discount factor γ (typically 0.9-0.99).
2. Value functions: state-value V(s), action-value Q(s,a), Bellman equations, optimal policies.
3. Exploration vs. exploitation: epsilon-greedy (e.g. ε=0.1), UCB, Thompson sampling.

Q-Learning implementation:
1. Q-table updates: Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') - Q(s,a)].
2. Learning rate: α decayed from 0.1 toward 0.01 on a schedule, with convergence monitoring.
3. Experience replay: store transitions, sample them in batches, stabilize learning.

Deep Q-Networks (DQN):
1. Neural network approximation: approximate the Q-function with a network, stabilize training with a target network.
2. Double DQN: reduce overestimation bias by separating action selection from action evaluation.
3. Dueling DQN: separate value and advantage streams for better value estimates.

Policy gradient methods:
1. REINFORCE: policy gradient theorem, Monte Carlo return estimates, baseline subtraction.
2. Actor-Critic: policy (actor) and value function (critic), advantage estimation, A2C/A3C.
3. Proximal Policy Optimization (PPO): clipped objective that keeps policy updates stable, approximating a trust region.

Advanced algorithms:
1. Trust Region Policy Optimization (TRPO): constrained policy updates, KL divergence limits.
2. Soft Actor-Critic (SAC): off-policy, entropy maximization, continuous action spaces.

Environment design: OpenAI Gym integration, custom environments, reward shaping, curriculum learning, and multi-agent scenarios for modeling complex interactions.
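
As an illustration of the Q-table update and epsilon-greedy exploration named in the prompt, here is a minimal tabular Q-learning sketch. It is not tied to any particular library: the five-state chain environment, the `step`, `epsilon_greedy`, and `train` helpers, and the chosen hyperparameters (α=0.1, γ=0.95, ε=0.1, 500 episodes) are all assumptions made for this example, and it deliberately leaves out learning-rate decay and experience replay.

```python
import random

# Hypothetical toy environment: a chain of states 0..4, where state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain dynamics; reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning: Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a))."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target r + gamma * max_a' Q(s', a'); no bootstrap at terminal states.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action for each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In this toy chain the learned greedy policy should pick action 1 (right) in every non-terminal state. The same loop structure carries over to a Gym-style environment by swapping the `step` helper for the environment's own step call, though state and action encodings would need adapting.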