透過您的圖書館登入
IP:3.144.17.45
  • 學位論文

最小列分解問題之演算法研究

Algorithms for the Minimum Split-Row Problem

指導教授 : 王炳豐

摘要


著眼於癌症基因學上的應用,Hajirasouliha 與 Raphael [WABI 2014] 提出了「最小列分解問題」。此問題的輸入為一個 m 列 n 行的二元矩陣 M。「列分解」是一種作用在 M 上的操作,其定義如下:取 k > 1 個列 r^{(1)}, r^{(2)}, ..., r^{(k)} 來取代 M 中原有的的任一列 r,其中 r^{(1)}, r^{(2)}, ..., r^{(k)} 的按位元或 (bitwise OR) 之結果須等於 r。列分解操作的花費等於 M 在操作後增加的列之數量,即 k − 1。此問題的目標是找出花費最少的一系列列分解操作,使得這些操作依序作用在 M 上後所得到的矩陣是一棵完美演化樹的矩陣表示法。在近來的一篇研究論文中,Hujdurović et al. [TALG 2018] 證明最小列分解問題是個 APX-hard 問題,並提出有效率的精準演算法與近似演算法,該論文最終建議以參數化研究做為最小列分解問題未來的研究方向。令 ε(M) 表示將二元矩陣 M 轉換為一棵完美演化樹之矩陣表示法所需要的最小花費。本論文提出一個在 O*(2^{min(n, 2ε(M))}) 時間內解決最小列分解問題的精準演算法。在參數化複雜度的術語中,此成果代表最小列分解問題由 ε(M) 作為參數時,是一個「固定參數可處理」問題。此外,在最差情況下,此演算法花費至多 O*(2^n) 的時間,顯著地改進了現有最佳演算法之時間複雜度 O*(n^n)。由應用面來看,列分解操作之目的為將 m 個混合的腫瘤樣本分解為 m + ε(M) 個細胞子群體,其中樣本混合之肇因來自於現有 DNA 定序技術之限制。在此觀點下,本論文的參數化成果代表著當樣本的混合情形不嚴重時,最小列分解問題可以有效率地被解決。經過適當的延伸後,本論文的演算法可被修改為枚舉最小列分解問題的所有最佳解,此延伸演算法之預處理時間為 O*(3^{min(n, 2ε(M))}),且任兩輸出之間的時間延遲與後輸出者之大小呈線性關係。 Hujdurović et al. 的演算法經過修改後可以解決最小列分解問題的一個變化問題,稱作「最小相異列分解問題」。本論文對於最小列分解的精確演算法與枚舉演算法經過延伸後也可解決這一變化問題。此外,本論文對於最小列分解問題和最小相異列分解問題的演算法可被延伸為解決此兩問題帶有以下額外限制的版本之演算法:只有輸入指定的一個列的子集合可以被分解。現有所有針對最小列分解問題之演算法及本論文所提出之演算法皆需要預先計算一張有向圖──稱作包含關係圖──來表示輸入矩陣。過往的論文皆使用一個簡單的 O(mn^2) 演算法建構此關係圖,本論文提出了一個較有效率的建構方法法,其時間複雜度為 max{O(m^0.373 n^2), O(mn^1.373)}。

並列摘要


Motivated by an application in cancer genomics, Hajirasouliha and Raphael [WABI 2014] proposed the minimum split-row problem (MSRP). In this problem, an m × n binary matrix M is given. A split-row operation on M is defined as replacing a row r by k > 1 rows r^{(1)}, r^{(2)}, ..., r^{(k)} whose bitwise OR is equal to r. The cost of the operation is the number of ad-ditional rows induced, that is, k − 1. The goal is to find a sequence of split-row operations that transforms M into a matrix corresponding to a perfect phylogeny and the total cost is minimized. Recently, Hujdurović et al. [TALG 2018] proved the APX-hardness of MSRP and presented efficient exact and approximation algorithms. The parameterized study of MSRP was left as a direction for future work. Let ε(M) denote the minimum cost of trans-forming a binary matrix M into a matrix corresponding to a perfect phylogeny. This thesis gives an O*(2^{min(n, 2ε(M))})-time exact algorithm for MSRP. This result implies that MSRP is fixed-parameter tractable when parameterized by ε(M). In addition, in the worst case, the new algorithm requires O*(2^n) time, significantly improving the previous upper bound of O*(n^n). For the application in cancer genomics, split-row operations are performed in order to decompose m mixed tumor samples into m + ε(M) tumor cell subpopulations. In this perspec-tive, the fixed-parameter tractability of MSRP indicates that when the amount of mixing, which results from the technical limitation of current sequencing technologies, in the samples is small, MSRP can be solved efficiently. The new algorithm can be extended to enumerate all optimal solutions with the following bounds: the preprocessing time is O*(3^{min(n, 2ε(M))}) and the delay between two consecutive outputs is linear in the size of the next output. Hujdurović et al.'s exact algorithm can be modified to solve a variant of MSRP, called the minimum distinct conflict-free row split problem (MDCRSP). The new exact and enumeration algorithms for MSRP can be modified to solve this variant as well. In addition, the new MSRP and MDCRSP algorithms can be extended to solve MSRP and MDCRSP with the following additional constraint: only the rows in a given subset are allowed to be split. All of the previous algorithms for MSRP and the algorithms presented in this thesis precompute a directed graph, called the containment digraph, to represent the input matrix. A naive construction of the graph requires O(mn^2) time. This thesis gives a more efficient construction, which requires max{O(m^0.373 n^2), O(mn^1.373)} time.

參考文獻


[1] A. V. Aho, M. R. Garey, and J. D. Ullman, "The Transitive Reduction of a Directed Graph," SIAM Journal on Computing, vol. 1, no. 2, pp. 131−137, 1972.
[2] A. Bashashati, G. Ha, A. Tone, J. Ding, L. M. Prentice, A. Roth, J. Rosner, K. Shumansky, S. Kalloger, J. Senz, W. Yang, M. McConechy, N. Melnyk, M. Anglesio, M. T. Y. Luk, K. Tse, T. Zeng, R. Moore, Y. Zhao, M. A. Marra, B. Gilks, S. Yip, D. G. Huntsman, J. N. McAlpine, and S. P. Shah, "Distinct evolutionary trajectories of primary high-grade serous ovarian can-cers revealed through spatial mutational profiling," The Journal of Pathology, vol. 231, no. 1, pp. 21−34, 2013.
[3] A. Björklund, T. Husfeldt, P. Kaski, and M. Koivisto, "Fourier Meets MöBius: Fast Subset Convolution," In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC '07), pp. 67–74, 2007.
[4] P. J. Campbell, E. D. Pleasance, P. J. Stephens, E. Dicks, R. Rance, I. Goodhead, G. A. Fol-lows, A. R. Green, P. A. Futreal, and M. R. Stratton, "Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing," in Proceedings of the National Academy of Sci-ences, vol. 105, no. 35, pp. 13081−13086, 2008.
[5] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms (3rd ed.), The MIT Press, 2009.

延伸閱讀