Motivated by the online ad auction problem faced by advertisers, we consider the general problem of simultaneous learning and decision-making in a stochastic game with a large population. We formulate games of this type, with unknown rewards and dynamics, as a generalized mean-field game (GMFG) that incorporates action distributions. We first analyze the existence and uniqueness of the GMFG solution, and show that naively combining Q-learning with the three-step fixed-point approach of classical mean-field games (MFGs) yields unstable algorithms. We then propose an approximating Q-learning algorithm and establish its convergence and complexity results. Numerical results show superior computational efficiency. This is based on joint work with Xin Guo (UC Berkeley), Anran Hu (UC Berkeley) and Junzi Zhang (Stanford).