Unlike traditional relational databases, which rely on JOIN operations to query across tables, NoSQL (Not Only SQL) databases offer schema-free data storage and a sharding mechanism for partitioning data, which makes them well suited to handling big data. Sharding splits a large data set into smaller chunks according to the configured shard key, which speeds up queries. Even so, to our knowledge, there is not yet a mature and systematic approach to managing (including retrieval and visualization) the big data derived from the National Health Insurance Research Database. This study therefore stores the big health insurance data in a document-oriented form, with one consolidated record per patient, and examines the field attributes provided by the health insurance database. We identify 12 fields that are important in diagnosis and treatment, select them as shard keys for sharding, and execute targeted queries to improve retrieval efficiency. We then test query-time performance on an example query; the experimental results show that the proposed data processing method substantially reduces query time on the big health insurance data.
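
The idea of sharding on a shard key and answering a targeted query from a single shard can be illustrated with a minimal sketch. This is a toy in-memory model written for illustration, not the system used in this study and not the implementation of any particular database. The field names (`patient_id`, `diagnosis_code`), the sample documents, the number of shards, and the CRC32-based placement rule are all assumptions.

```python
import zlib

def shard_for(key_value, num_shards):
    """Map a shard-key value to a shard index with a deterministic hash (assumed rule)."""
    return zlib.crc32(key_value.encode("utf-8")) % num_shards

def build_shards(documents, shard_key, num_shards):
    """Distribute documents across shards according to the value of shard_key."""
    shards = [[] for _ in range(num_shards)]
    for doc in documents:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards

def targeted_query(shards, shard_key, value):
    """The query names the shard key, so only the one owning shard is scanned."""
    shard = shards[shard_for(value, len(shards))]
    matches = [doc for doc in shard if doc[shard_key] == value]
    return matches, len(shard)

def broadcast_query(shards, field, value):
    """The query cannot be routed by shard key, so every shard is scanned."""
    matches, scanned = [], 0
    for shard in shards:
        scanned += len(shard)
        matches.extend(doc for doc in shard if doc[field] == value)
    return matches, scanned

# Hypothetical per-patient documents; the codes are placeholders, not real data.
documents = [
    {"patient_id": f"P{i:03d}", "diagnosis_code": code}
    for i, code in enumerate(["250", "401", "250", "486", "401", "250"])
]

shards = build_shards(documents, "diagnosis_code", num_shards=3)
hits, scanned = targeted_query(shards, "diagnosis_code", "250")
all_hits, all_scanned = broadcast_query(shards, "diagnosis_code", "250")

print("targeted: ", len(hits), "matches,", scanned, "documents scanned")
print("broadcast:", len(all_hits), "matches,", all_scanned, "documents scanned")
```

Both queries return the same documents, but the targeted query inspects only the shard that owns the requested key value, whereas the broadcast query inspects every shard. The saving grows with the number of shards and with how evenly the shard key spreads the data.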