Estimating the similarity between two strings is an important and practical problem in many fields, such as computational biology, sortedness measuring, rank aggregation and music theory. Given a string α = α1 α2 … αn, an adjacent swap σ(i, i计1) on α is an operation that transforms α to into another string α' = α1 α2 … αi-1 αi+1 αi αi+2 … αn. Given two strings of size n over an alphabet Σ, the swap distance problem is to find the minimum number of adjacent swaps needed to transform one given string into the other. The focus of this thesis is the swap distance problem. Assume that |Σ| <= n. Chitturi et al. showed that the swap distance problem can be solved by using an algorithm for counting the inversions of a permutation. Dietz had a data structure that can be used to do the counting in O(n lg n / lg lg n) time. Therefore, the swap distance problem can be solved in the same time. For small alphabet Σ, Chitturi et al. proposed an O(n|Σ|)-time and O(n|Σ|)-space algorithm. In this thesis, we give an improved algorithm that requires O(n|Σ|) time and O(n) space. Our algorithm is more space-efficient than Chitturi et al.'s. The sorting by swaps problem is to find the minimum number of adjacent swaps needed to sort a given string, which is a special case of the swap distance problem. We also study the sorting by swaps problem and propose an O(n + n lg |Σ| / lg lg n)-time and O(n)-space algorithm. The proposed algorithm is more efficient than directly applying the previous two best solutions of the swap distance problem to this special case. Especially, when |Σ| = O((lg n)^c) for some constant c, our algorithm runs in linear time and space. It is easy to extend our algorithms to solve the signed version of corresponding problems.
在許多領域中,估計兩字串之間的相似程度是個非常重要且實用的問題。例如:計算生物學、排序程度測量、排名聚合演算法及音樂理論。本論文研究其中一種測量方式,稱為「交換距離問題」(swap distance problems),定義如下。給定一個字串 α = α1 α2 … αn,相鄰交換 σ(i, i + 1) (adjacent swap) 會將 αi 和 αi+1的順序交換,也就是將 α 變成另一個字串 α' = α1 α2 … αi-1 αi+1 αi αi+2 … αn。交換距離問題的目標是計算至少需要幾個相鄰交換,始能將一給定的字串轉變成另一個給定的字串。假設字母集的大小 |Σ| <= n,Chitturi 等人發現此問題可以藉由計算相對應排列 (permutation) 中的反轉 (inversion) 數目來解決。Dietz 提供一個資料結構能夠在 O(n lg n / lg lg n) 時間內計算出排列中的反轉數目,因此交換距離問題也能在同樣的時間內求解。在字母集很小的情況下,Chitturi 等人提出一個 O(n|Σ|) 時間及 O(n|Σ|) 空間的演算法。在本論文中,我們提出一個 O(n|Σ|) 時間及 O(n) 空間的改進演算法,此演算法使用較少的空間。「交換排序問題」 (sorting by swaps problems) 是交換距離問題的一種特例,其目標是計算至少需要幾個相鄰交換,始能排序一個給定的字串。對於交換排序問題,我們提出一個 O(n + n lg |Σ| / lg lg n) 時間及 O(n) 空間的演算法。在這個特例下,此演算法比直接套用交換距離問題演算法更有效率,特別當字母集的大小 |Σ| = O((lg n)^c) 時,此演算法只需要 O(n) 時間及空間。藉由簡單的修改,我們所提出的演算法也能解決在有號 (signed) 情況下對應的問題。