Estimating the Number of Servers with a Capture-Recapture Model Based on Maximum Likelihood Estimation

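The quality and smoothness of a video service depend on the scale and provisioning of its content delivery network (CDN). As demand for video services has grown in recent years, the number of servers in CDNs has increased markedly to meet client needs. Because Twitch is so widely used for video, we consider long-term, continuous study of its CDN architecture an important topic. Motivated by the similarity between the addition and retirement of servers in a CDN and the births and deaths in an animal population, we apply the capture-recapture method used to estimate wildlife populations, sampling a small amount of network traffic at a time to estimate the total number of servers in the CDN. In our earlier AINTEC paper (2021), the Cormack-Jolly-Seber (CJS) model estimated the total number of servers with relatively good accuracy. However, the probabilistic assumptions of the traditional CJS model remain restrictive. Because its estimates need many sampling intervals to converge, the model is limited to offline estimation. In addition, it assumes that every individual has the same capture and survival probabilities, which does not match how servers in Twitch's CDN actually serve traffic. We therefore introduce a CJS capture-recapture model that accounts for heterogeneity and is based on maximum likelihood estimation (MLE) to address both issues. This model can assign each server its own parameters, and it uses MLE to estimate all parameters of the CJS probability model jointly. Giving every server its own parameters, however, makes the probability model overly complex, so we cluster servers by service pattern and let the servers in each cluster share parameters, which reduces the number of model parameters. Testing on a data set collected in May 2021, we find that the MLE-based CJS model does perform better in online estimation, whereas heterogeneity and server clustering did not effectively improve estimation accuracy in our experiments; we analyze the reasons in detail by examining the estimates for each cluster.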

Keywords

Twitch; capture-recapture; server population estimation; maximum likelihood estimation; heterogeneity

Parallel Abstract

The quality and continuity of video services such as Twitch depend on the scale and well-being of their content distribution networks (CDNs). Due to the growing demand for video services, the number of servers in these CDNs has increased rapidly to deliver video to clients. Given the widespread use of Twitch, we consider continuous surveying of its CDN an important subject of study. Inspired by Capture-Mark-Recapture (CMR), a methodology widely used to estimate animal populations, we developed a system that continuously observes the size of Twitch's CDN (i.e., the number of servers) with lightweight probing. In our previous work published at AINTEC, the Cormack-Jolly-Seber (CJS) model estimated the CDN size at each sample time with relatively low error. Nevertheless, the assumptions of the traditional CJS model are still restrictive. Because of its long convergence period, the model can only estimate the server population offline. In addition, it assumes that all servers share the same capture and survival rates, which does not match the server behavior in Twitch's CDN. Therefore, we introduce a maximum-likelihood-estimation-based (MLE-based) CJS model with heterogeneity to address these two issues. It not only allows different parameters for each server but also estimates all parameters of the CJS probability model jointly. Because the resulting MLE model is too complicated, we apply server clustering to reduce the parameter space. Using a data set collected in May 2021, we find that the MLE-based CJS model indeed performs better in online estimation. Heterogeneity and server clustering, on the other hand, do not improve the estimation accuracy. We examine the estimation results in each group to identify the detailed reasons for this.
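As a rough illustration of what MLE-based CJS estimation involves, the following Python sketch fits a homogeneous CJS model, with a single survival probability `phi` and a single capture probability `p`, to binary capture histories (one per server, 1 meaning the server was observed at that sampling occasion). It is not the thesis's implementation: the constant-parameter simplification, the grid search, the function names, and the `n_t / p` population estimate are all assumptions made for illustration. A heterogeneous variant would instead give each server, or each cluster of servers, its own parameters.

```python
import math
from collections import Counter
from itertools import product


def chi(last, n_occasions, phi, p):
    """Probability that a server seen at occasion `last` is never seen again."""
    c = 1.0  # after the final occasion there is nothing left to observe
    for _ in range(n_occasions - 1 - last):
        # Either the server is retired, or it survives, is missed,
        # and is never seen on any later occasion.
        c = (1.0 - phi) + phi * (1.0 - p) * c
    return c


def log_likelihood(counts, phi, p):
    """CJS log-likelihood conditional on first capture.

    `counts` maps each capture history (a tuple of 0/1 containing
    at least one 1) to the number of servers with that history.
    """
    total = 0.0
    for h, count in counts.items():
        first = h.index(1)
        last = len(h) - 1 - h[::-1].index(1)
        ll = 0.0
        # Between first and last sighting the server is known to be alive.
        for t in range(first + 1, last + 1):
            ll += math.log(phi) + math.log(p if h[t] else 1.0 - p)
        ll += math.log(chi(last, len(h), phi, p))
        total += count * ll
    return total


def fit_cjs(histories, step=0.02):
    """Grid-search MLE for (phi, p); a numerical optimizer is the usual choice."""
    counts = Counter(map(tuple, histories))
    grid = [i * step for i in range(1, round(1 / step))]
    return max(product(grid, grid),
               key=lambda pair: log_likelihood(counts, *pair))


def estimate_population(histories, p_hat):
    """Assumed estimator: servers seen per occasion divided by capture probability."""
    n_occasions = len(histories[0])
    return [sum(h[t] for h in histories) / p_hat for t in range(n_occasions)]
```

Calling `fit_cjs(histories)` returns a `(phi, p)` pair on the 0.02 grid, and `estimate_population(histories, p)` then scales the per-occasion counts of observed servers by the estimated capture probability. The grid search is used here only to keep the sketch dependency-free.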

Parallel Keywords

Twitch; Capture-Mark-Recapture; Server Population Estimation; Maximum Likelihood Estimation; Heterogeneity
