To address the incompleteness of any single data source for facial expression recognition, we concatenated facial images and facial landmark points as the network input. Because hand-crafted features in conventional pattern recognition methods are complex to design, we adopted a neural network to extract features automatically. The method takes the concatenated images and landmark points as its input and is built on a neural network framework. A sparse autoencoder was used to pre-train the network and to make its hidden representation sparse. In addition, structured regularization was added to constrain the connections between the different input types and the neurons in the hidden layer. The experimental results showed that concatenating images and landmark points represents facial expressions more completely. The sparse autoencoder and the structured regularization helped the network extract key features effectively and automatically learn how important each type of input data is for facial expression recognition.
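As a rough illustration of how these pieces could fit together, the sketch below computes one possible pre-training objective: a sparse autoencoder loss (reconstruction error plus a KL-divergence sparsity penalty on mean hidden activations) with an added group-lasso term, used here as an assumed stand-in for the structured regularization. The group term sums, for every hidden neuron, the L2 norm of its weights to each input group (image pixels versus landmark coordinates), so whole groups of connections can be driven toward zero. The function names, input sizes, hyperparameters (`rho`, `beta`, `lam`) and the group-lasso form are all hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_sparsity(rho, rho_hat):
    # KL divergence between the target activation rho and each hidden unit's
    # mean activation rho_hat, summed over hidden units.
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def group_penalty(W1, groups):
    # Hypothetical structured regularizer (group lasso): for each input group
    # and each hidden neuron, take the L2 norm of that neuron's weights to the
    # group, then sum everything. W1 has shape (n_hidden, n_inputs).
    return sum(np.sum(np.linalg.norm(W1[:, idx], axis=1)) for idx in groups)

def sae_loss(X, W1, b1, W2, b2, groups, rho=0.05, beta=3.0, lam=1e-3):
    # X: (n_samples, n_inputs), inputs scaled to [0, 1].
    H = sigmoid(X @ W1.T + b1)          # hidden activations
    X_hat = sigmoid(H @ W2.T + b2)      # reconstruction of the input
    recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
    rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)
    return recon + beta * kl_sparsity(rho, rho_hat) + lam * group_penalty(W1, groups)

# Toy example: 8x8 image pixels concatenated with 10 (x, y) landmarks,
# all normalized to [0, 1]; sizes are illustrative only.
rng = np.random.default_rng(0)
n_pix, n_lm, n_hidden = 64, 20, 16
X = np.hstack([rng.random((32, n_pix)), rng.random((32, n_lm))])
groups = [np.arange(n_pix), np.arange(n_pix, n_pix + n_lm)]
W1 = rng.normal(scale=0.1, size=(n_hidden, n_pix + n_lm))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_pix + n_lm, n_hidden))
b2 = np.zeros(n_pix + n_lm)
loss = sae_loss(X, W1, b1, W2, b2, groups)
```

Minimizing this objective with any gradient-based optimizer would pre-train the encoder. Because the group norms are computed per input type, comparing the norms of the image and landmark weight blocks after training is one way to read off how much each data source contributes.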