Routing in wireless sensor networks is modeled as a sequential decision-making problem with incomplete information, and an energy-balanced routing algorithm based on reinforcement learning prediction, named EBRRLP, is proposed to avoid uneven energy consumption across the network. In EBRRLP, each transmitting node uses reinforcement learning to predict the behavior of candidate forwarding nodes and, following an ε-greedy strategy, selects the node with the best predicted value to relay its data. A principal-agent mechanism is then applied to suppress selfish behavior by forwarding nodes while maintaining the maximum utility of each node. Simulation results show that EBRRLP predicts forwarding behavior more accurately, achieves higher throughput, saves energy, and balances energy consumption, outperforming the existing algorithms used for comparison.
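The relay selection step can be illustrated with a minimal sketch of ε-greedy choice over predicted values. This is not the EBRRLP implementation: the function names `select_relay` and `update_prediction`, the dictionary of per-neighbor predictions, the learning rate, and the generic incremental update are all assumptions made for illustration, and the principal-agent mechanism is not modeled.

```python
import random


def select_relay(predicted_values, epsilon, rng=random):
    """Pick a forwarding node with an epsilon-greedy rule.

    predicted_values: dict mapping a neighbor id to its predicted value
    (a hypothetical representation, not taken from the paper).
    """
    if not predicted_values:
        raise ValueError("no candidate forwarding nodes")
    if rng.random() < epsilon:
        # Explore: pick any candidate uniformly at random.
        return rng.choice(sorted(predicted_values))
    # Exploit: pick the candidate with the best predicted value.
    return max(predicted_values, key=predicted_values.get)


def update_prediction(old_value, observed_reward, learning_rate=0.1):
    """Move a prediction toward the observed reward.

    A generic incremental update, assumed here for illustration only.
    """
    return old_value + learning_rate * (observed_reward - old_value)


if __name__ == "__main__":
    neighbors = {"n1": 0.2, "n2": 0.9, "n3": 0.5}
    relay = select_relay(neighbors, epsilon=0.1)
    # Pretend the chosen relay forwarded the packet successfully (reward 1.0).
    neighbors[relay] = update_prediction(neighbors[relay], observed_reward=1.0)
    print(relay, neighbors)
```

With a small ε the transmitting node mostly relays through the neighbor it currently rates highest, while the occasional random choice keeps its predictions about the other neighbors from going stale.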