透過您的圖書館登入
IP:13.58.150.59
  • 學位論文

部分酶切問題之研究

A Study On the Partial Digest Problem

指導教授 : 王炳豐

摘要


這篇論文主要探討「部分酶切問題 (partial digest problem)」,此問題是用來解決一個著名「限制位點分析 (restriction sites analysis)」問題的數學模型。目前此問題尚未被證明是否為 NP-complete 也沒有人提出一個可以在多項式時間內解決的演算法。Skiena et al. 提出一個時間複雜度為 O(t n lg n) 的回溯演算法,其中 n 為限制位點的個數,t >= n為整個回溯過程中所產生的樹節點個數。在實際的實驗中顯示,他們的演算法通常只需要 O(n^2 lg n) 的時間,但是在分析上最差的情況下會需要 O(n lg n 2^n) 的時間。Zhang 提出一個 Skiena et al. 的演算法執行時間為指數 (exponential) 的例子。Abbas and Bahig 提出一個可以避免重複展開相同子樹的方法來加速 Skiena et al. 的演算法,其時間複雜度為 O(t' n^2),其中t' <= t。 這篇論文提出 4 個演算法,分別稱為 Algorithm 1、2、3 和 4。Algorithm 1 將 Skiena et al. 的最差時間複雜度由 O(n lg n 2^n) 改進成 O(n 2^n)。實驗證實此演算法在 Zhang 的例子上會得到很好的加速。和 Skiena et al. 的演算法相同,Algorithm 2 的時間也是 expected O(n^2 lg n),但是我們透過實驗證實 Algorithm 2 的係數比較小,少於 1/2。Algorithm 3 是 Abbas and Bahig 演算法的改進版,時間由 O(t' n^2) 改進為 O(t' n lg n);除此之外,也將空間由 O(n^2 2^(n/2)) 大幅改進為 O(n^2)。Algorithm 4 結合了 Algorithm 1 和 3,在 Zhang 的例子上可以將Algorithm 1 加速 2^(2n/5) 倍並且將Algorithm 3 加速 lg n 的。這篇論文最後提出一個簡單的技巧稱為 longest-first checking,可以稍微加速所有由 Skiena 想法為基底的演算法,包含以上提及的所有演算法。論文中同時提供理論分析並透過實驗來驗證這個技巧。

並列摘要


The focus of this thesis is the partial digest problem (PDP), which is the mathematical model of a well-known method for restriction sites analysis. So far neither a proof of NP-completeness nor a polynomial time algorithm is known for PDP. A famous solution for PDP is a backtracking algorithm given by Skiena et al., which requires O(t n lg n) time, where n is the number of restriction sites and t >= n is the number of visited nodes of the solution tree. Skiena et al.'s algorithm requires O(n^2 lg n) time in practice, but may require O(n lg n 2^n) time in the worst case. Zhang had an example for which this algorithm takes exponential time. Abbas and Bahig speeded up Skiena et al.'s algorithm by avoiding exploring identical subtrees, which results in an O(t' n^2)-time algorithm, where t' <= t. This thesis proposes four algorithms for PDP, denoted by Algorithms 1, 2, 3, and 4, respectively. Algorithm 1 improves Skiena et al.'s upper bound from O(n lg n 2^n) to O(n 2^n). Experimental results showed that our algorithm is more efficient on Zhang's example. As compared with Skiena et al.'s algorithm, Algorithm 2 has the same expected time O(n^2 lg n). However, Algorithm 2 has a smaller constant factor. Empirically, Algorithm 2 is at least two times faster. Algorithm 3 is an improved version of Abbas and Bahig's algorithm. The time complexity is reduced from O(t' n^2) to O(t' n lg n); in addition, the worst case space is significantly reduced from O(n^2 2^(n/2)) to O(n^2). Algorithm 4 is a combination of Algorithms 1 and 3. For Zhang's example, Algorithm 4 improves Algorithms 1 and 3, respectively, by a factor of 2^(2n/5) and lg n. This thesis also presents a simple and interesting technique, called the longest-first checking, which can slightly speed up all algorithms designed based upon Skiena et al.'s idea, including all the algorithms mentioned above. A theoretical analysis that supports this technique is given. In addition, its effectiveness is demonstrated by experiments.

參考文獻


1. M. M. Abbas, H. M. Bahig, "A fast exact algorithm for the partial digest problem," BMC Bioinformatics, vol. 17, suppl. 19, pp. 139–148, 2016.
2. Z. Abrams, H. L. Chen, "The simplified partial digest problem: hardness and a probabilistic analysis," in Proceedings of RECOMB Satellite Meeting on DNA Sequencing Technologies and Computation, 2004.
3. H. Ahrabian, M. Ganjtabesh, A. Nowzari-Dalini, Z. Razaghi-Moghadam-Kashani, "Genetic algorithm solution for partial digest problem," International Journal of Bioinformatics Research and Applications, vol. 9, no. 6, pp. 584–594, 2013.
4. H. M. Bahig, M. M. Abbas, M.M. Mohie-Eldin, "Parallelizing partial digest problem on multicore system," Lecture Notes in Computer Science, vol. 10209, pp. 95–104, 2017.
5. H. M. Bahig, M. M. Abbas, "Speeding up the partial digest algorithm," Journal of Informatics and Mathematical Sciences, vol. 10, no. 1–2, pp. 217–225, 2018.

延伸閱讀


國際替代計量