Multivariate time series (MTS) data arise in many domains, such as energy monitoring, the environment, and healthcare. Many deep-learning-based methods attempt to learn effective representations of MTS data. However, these works commonly take all variables at the same timestamp as a single model input, which leads the model to emphasize the temporal relation. This study focuses on electronic health record (EHR) data, which contain a large proportion of missing values because of irregular sampling and asynchronous measurement. Such irregular MTS data pose additional challenges for effective representation learning. To address these challenges, we propose "SCAlable Numerical Embedding" (SCANE). SCANE is based on the concept of "value as a token": it embeds each value independently as a vector that is fed to the model. With SCANE, the feature extractor can learn not only the temporal relation but also the feature-wise relation between variables. We further integrate SCANE with the Transformer encoder to form TranSCANE. With the encoder's masking mechanism and SCANE, TranSCANE can avoid attending to missing values. That is, TranSCANE is an imputation-free model for fragmentary MTS data. Moreover, we propose a revised rollout attention tailored to TranSCANE, which improves the interpretability of our model. Experimental results show that TranSCANE performs best on three different EHR datasets. It has the potential to learn more feature-wise relations between variables, and because it is imputation-free, it is robust to the choice of imputation method. We therefore believe that TranSCANE is a powerful representation learning model for irregular MTS data.
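The "value as a token" idea and the masking of missing values can be illustrated with a small sketch. It is not the implementation from this work: the abstract does not give the embedding formula, so the code assumes that each observed value scales a per-variable vector, and it uses a single-head dot-product attention in place of a full Transformer encoder. The function names, the random initialization, and the toy data are all hypothetical.

```python
import math
import random


def make_variable_embeddings(num_variables, dim, seed=0):
    """Hypothetical stand-in for learnable parameters: one vector per variable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_variables)]


def embed_values(series, var_embeddings):
    """Turn every (time step, variable) cell into its own token.

    series: list of time steps, each a list with a float or None (missing).
    Returns (tokens, mask); mask[i] is True where the cell was observed.
    ASSUMPTION: an observed value scales its variable's vector.
    """
    dim = len(var_embeddings[0])
    tokens, mask = [], []
    for row in series:
        for j, value in enumerate(row):
            observed = value is not None and not math.isnan(value)
            mask.append(observed)
            if observed:
                tokens.append([value * w for w in var_embeddings[j]])
            else:
                # Placeholder token; it is never attended to because of the mask.
                tokens.append([0.0] * dim)
    return tokens, mask


def masked_attention_weights(tokens, mask):
    """Single-head dot-product attention weights with missing keys excluded."""
    if not any(mask):
        raise ValueError("at least one observed value is required")
    scale = math.sqrt(len(tokens[0]))
    weights = []
    for query in tokens:
        scores = [
            sum(a * b for a, b in zip(query, key)) / scale if keep else -math.inf
            for key, keep in zip(tokens, mask)
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy record with two variables and three time steps; None marks a missing value.
# Real inputs would normally be normalized first.
series = [[0.4, None], [None, 1.2], [0.9, -0.3]]
embeddings = make_variable_embeddings(num_variables=2, dim=4)
tokens, mask = embed_values(series, embeddings)
weights = masked_attention_weights(tokens, mask)
```

In this sketch the missing cells receive exactly zero attention weight from every query, so no imputed value is ever needed. Because each cell is a separate token, attention can relate values across variables as well as across time.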