Effective representation of video is a key difficulty in human action recognition. In this paper, we propose an improved string-of-feature-graphs method to describe a video, which combines a submodular optimization method and a graph matching technique within a dynamic programming framework. First, spatio-temporal feature points in a video are obtained with a spatio-temporal interest point detector, and a submodular objective that respects temporal order is used to divide the video into many small time intervals. The video is then represented as a string of graphs, each constructed from the feature points falling within the corresponding time interval. Finally, the similarity between a pair of videos is measured by applying Reweighted Random Walks for Graph Matching (RRWM) to corresponding graphs and Dynamic Time Warping (DTW) to the two strings of graphs. We compare our approach against other methods on two public datasets (KTH and UT-Interaction), and the results demonstrate that the algorithm is effective and feasible.
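To give a rough picture of the final matching step, the sketch below is a minimal Python illustration (not the authors' implementation) of aligning two strings of feature graphs with DTW. The `graph_similarity` function is a hypothetical placeholder standing in for an RRWM-based matching score, and the assumption that each graph is summarized by a fixed-length descriptor is made only so the example is self-contained and runnable.

```python
import numpy as np


def graph_similarity(g1, g2):
    """Placeholder for an RRWM-based graph matching score in [0, 1].
    For illustration we assume each graph is summarized by a feature
    vector and use cosine similarity; the paper's method would instead
    score a node correspondence produced by RRWM."""
    num = float(np.dot(g1, g2))
    den = np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12
    return num / den


def dtw_video_distance(string_a, string_b):
    """Align two strings of feature graphs with dynamic time warping.
    The local cost between two graphs is 1 - similarity, so a smaller
    returned value means the two videos are more alike."""
    n, m = len(string_a), len(string_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - graph_similarity(string_a[i - 1], string_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # skip a graph in video A
                                 D[i, j - 1],      # skip a graph in video B
                                 D[i - 1, j - 1])  # match the two graphs
    return D[n, m]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy "videos": strings of 6 and 8 graph descriptors of dimension 16.
    video_a = [rng.random(16) for _ in range(6)]
    video_b = [rng.random(16) for _ in range(8)]
    print("DTW distance:", dtw_video_distance(video_a, video_b))
```

In the full method, the pairwise cost matrix would be filled with RRWM matching scores between the interval graphs rather than the cosine stand-in used here; the DTW recursion itself is unchanged.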