With the large-scale deployment of surveillance cameras along city streets and in public open areas, streaming and archived surveillance videos have become useful resources for tracking suspicious persons or vehicles. Unlike video surveillance of small indoor areas, where objects can easily be tracked across cameras through overlapping fields of view, cameras deployed over a wide area are not dense enough for seamless tracking across multiple cameras. In this setting, a human operator can specify an image containing the desired target as the query, and the system returns a ranked set of images from other cameras according to a similarity measure. The similarity measure is based on object appearance as well as on spatio-temporal constraints obtained from the provided geographic information and the time stamps encoded on the images in the archive. In particular, the retrieved images are clustered and represented as paths, where each path contains consistent video frames across multiple cameras.
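As an illustration of how appearance and spatio-temporal constraints might be combined into one ranking score, the following minimal Python sketch may help. It is not the method of this work: the `Detection` record, the cosine similarity on feature vectors, the hard speed threshold `max_speed`, and the planar camera coordinates are all assumptions introduced here for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    # All fields are hypothetical; the source does not specify a data model.
    camera_xy: tuple   # camera position in metres on an assumed planar map
    timestamp: float   # capture time in seconds
    feature: list      # appearance descriptor (e.g. a colour histogram)


def cosine_similarity(a, b):
    """Appearance similarity between two descriptors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def travel_plausibility(query, candidate, max_speed):
    """Return 1.0 if the target could have moved between the two cameras
    in the elapsed time without exceeding max_speed (m/s), else 0.0."""
    distance = math.dist(query.camera_xy, candidate.camera_xy)
    elapsed = abs(candidate.timestamp - query.timestamp)
    if elapsed == 0:
        return 1.0 if distance == 0 else 0.0
    return 1.0 if distance / elapsed <= max_speed else 0.0


def rank(query, candidates, max_speed=15.0):
    """Score each candidate as appearance similarity times plausibility
    and return (score, detection) pairs, best first."""
    scored = [
        (cosine_similarity(query.feature, c.feature)
         * travel_plausibility(query, c, max_speed), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


query = Detection((0.0, 0.0), 0.0, [1.0, 0.0])
near_match = Detection((100.0, 0.0), 20.0, [1.0, 0.0])    # 5 m/s, same look
too_far = Detection((1000.0, 0.0), 10.0, [1.0, 0.0])      # 100 m/s, infeasible
different = Detection((50.0, 0.0), 10.0, [0.0, 1.0])      # feasible, other look
ranked = rank(query, [too_far, different, near_match])
```

In this sketch a candidate that looks identical but would require an infeasible travel speed is pushed to the bottom of the ranking, which is the intuition behind using geographic information and time stamps alongside appearance. A softer plausibility function could replace the hard threshold.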