Listening to music helps relieve stress and has become a major form of entertainment for the general public. The development of the Internet has made music more accessible, but it has also made the problem of “information overload” increasingly serious. Although many Internet companies have launched music recommendation systems to address this problem, existing systems cannot guarantee a good user experience, so there is still strong demand for accurate music recommendation. To address information overload while maintaining a good user experience, this paper presents a reward value algorithm based on state transition. Specifically, a user preference model is first built; then, music popularity and user conformity are defined from user data; finally, a reward function is defined in terms of user preference, music popularity, and state transition probability. The proposed algorithm screens and classifies the songs in the music library on a per-user basis. During data processing, the Davies-Bouldin index is used to discretize vocal characteristics; during model training, an algorithm based on list distance minimization is used to select parameters. Experiments conducted on the Million Song Dataset show that music popularity influences the recommendation quality of the algorithm and that the proposed algorithm improves recommendation performance, which demonstrates its effectiveness.
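As an illustration of how a Davies-Bouldin criterion can drive discretization, the following is a minimal sketch and not the procedure used in this paper. It assumes a single scalar feature, clusters it with a plain one-dimensional k-means, and keeps the number of bins whose Davies-Bouldin value is lowest. The function names (`kmeans_1d`, `davies_bouldin`, `choose_bins`, `discretize`), the initialisation scheme, and the toy data are all hypothetical.

```python
from statistics import mean


def assign(values, centers):
    """Group each value with its nearest center (ties go to the lower index)."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars; returns only the non-empty clusters."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialisation: k evenly spaced order statistics.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = assign(values, centers)
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return [c for c in assign(values, centers) if c]


def davies_bouldin(clusters):
    """Average, over clusters, of the worst (S_i + S_j) / |c_i - c_j| ratio."""
    if len(clusters) < 2:
        return float("inf")  # the index is undefined for a single cluster
    centers = [mean(c) for c in clusters]
    # S_i: mean absolute distance of the members of cluster i to its center.
    scatter = [mean([abs(v - m) for v in c]) for c, m in zip(clusters, centers)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / abs(centers[i] - centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return mean(worst)


def choose_bins(values, candidate_ks):
    """Pick the candidate k with the lowest Davies-Bouldin value."""
    best = min(candidate_ks, key=lambda k: davies_bouldin(kmeans_1d(values, k)))
    centers = sorted(mean(c) for c in kmeans_1d(values, best))
    return best, centers


def discretize(value, centers):
    """Map a raw feature value to the index of its nearest bin center."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


# Toy feature values forming three well-separated groups (made-up data).
feature = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9, 10.0, 10.2, 9.9]
best_k, centers = choose_bins(feature, range(2, 6))
```

On this toy input the sketch selects three bins, and `discretize(5.05, centers)` returns the index of the middle bin. A lower Davies-Bouldin value indicates more compact, better separated clusters, which is why the minimum is taken.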