We propose a spatio-temporal embedding framework for better modeling image sequences, known as dynamic textures, that exhibit certain stationarity in the appearance and the dynamics. Our method can characterize more types of data than those by the existing techniques established on linear dynamical systems (LDS). The algorithm finds a low-dimensional representation for the original data without discarding the critical time-dependent information. It then directly deals with the nonlinear dynamics through an integration of Gaussian process with time delay embedding. In addition, the possible nonlinearity in the appearance space is also appropriately addressed by our use of multiple local analyzers. Consequently, the framework is flexible enough for handling various applications, including prediction and smoothing. Results on prediction and smoothing are provided to demonstrate the advantages of our approach.