Reuse of Quantitative Data in the Social Sciences: The Case of TSSCI Journal Articles, 2011 to 2015

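The data-sharing movement has grown rapidly in recent years, with government agencies, academic institutions, and journal publishers requiring or encouraging scholars to share their data. Preparing data for sharing, however, is a laborious process that costs data sharers considerable time and effort, so empirical studies have begun to observe how data are reused in academia in order to assess the benefits of data sharing. This study examines the state of data reuse in Taiwan's social sciences and the behavioral characteristics of social scientists who reuse data. First, content analysis was applied to journal articles published from 2011 to 2015 in the sociology, political science, education, economics, and psychology disciplines as classified by TSSCI. The items analyzed were (1) characteristics of reuse papers, including the proportion of reuse papers, publication year, number of datasets used, and how the data were reported; and (2) characteristics of the reused data, including dataset subject, variable subject, data collector/compiler, data type, and the year gap between the data and the paper. Second, semi-structured in-depth interviews were conducted with 14 authors of reuse papers identified through the content analysis. The interviews and their analysis covered the motivation for data reuse, the channels through which data were discovered, how data were evaluated, and how data were processed after being obtained. The results show 511 reuse papers, accounting for 17.33% of empirical papers. In terms of the number of datasets used, most reuse papers used only one dataset; economics differed in that nearly half of its papers used two or more datasets. In terms of data reporting, all papers described the data in the main text, but the share of papers reporting the data in the abstract, acknowledgements, tables, or reference list was below one half in each case. In terms of identifying information (the URL/DOI, title, collector/compiler, and year of the data), most reuse papers reported it in full, with the exception of the URL/DOI. For the characteristics of the reused data, 875 datasets were analyzed. About half of the reused datasets had subjects related to politics, education, or economics. Apart from variables tied to the research topic, the variables commonly used in sociology, political science, and education mostly concerned human social characteristics (for example, age, occupation, salary, and marital status); by comparison, those commonly used in economics mostly concerned the development of countries or of institutions and organizations. In terms of collectors/compilers, the data came mainly from government agencies (46.86%), followed by academic institutions (23.77%) and private-sector organizations (21.14%); less than 6% came from individual studies. In terms of data type, administrative (operational) data predominated (55.54%), followed by series survey data (34.63%), while data from one-off studies accounted for only 8.34%. In terms of data age, the gap between the year of the data and the year of the paper was larger in sociology and economics and smaller in political science and education. Regarding data reuse behavior, the study found that (1) the motivations for reuse were being unable to collect the needed data oneself, the credibility of the data, extending peers' research, exploring potential research topics, and the influence of disciplinary culture; (2) the channels for learning of data included scholarly literature, peers and advisors, websites of government and academic institutions, promotion by academic societies and survey organizations, and printed statistical publications; (3) interviewees evaluated the usability of the questionnaire content, whether the analysis results would be worth publishing, the quality of the data collection process, sample representativeness, the age of the data, and the ease of obtaining the data; and (4) before analysis, they ran descriptive statistics, performed basic data processing, and merged datasets or supplemented missing data.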

Keywords

Social Sciences; Quantitative Data; Data Reuse; Data Citation; Data Seeking Behavior

Parallel Abstract

With the increasing calls for data sharing, governments, academic institutions, and journal publishers have mandated or encouraged scholars to share their research data. Data sharing is a complicated issue involving technical and social rearrangement, and there are calls for empirical examination of research data reuse to evaluate the outcomes and benefits of data sharing. This study examined the state of data reuse in Taiwan's social sciences journals as well as the data reuse behavior of social sciences scholars. It employed content analysis to examine journal articles indexed in TSSCI. Five TSSCI domains were chosen: sociology, political sciences, education, economics, and psychology. Journal articles published from 2011 to 2015 formed the sample. The analysis focused on (1) the characteristics of the data reuse papers, i.e., the proportion of data reuse papers, publishing year, number of datasets used in each reuse paper, and how each paper reported its data; and (2) the characteristics of the reused data, i.e., the subject distribution of datasets, subject distribution of variables, origin of data, types of data, and year gap between reuse papers and the data used. The study also used semi-structured in-depth interviews to examine the data reuse behavior of 14 social scientists. The interviews focused on (1) the reasons for reusing data; (2) channels for data seeking and discovery; (3) principles governing assessment of the data found; and (4) the preparatory treatment of the data prior to reuse. The study found 511 reuse papers published in the period (17.33% of all empirical papers), most of which used one dataset. Almost half of the economics papers used two or more datasets, which distinguishes economics from the other social sciences domains. Fewer than half of the papers reported data in abstracts, tables, acknowledgements, or references. However, most reuse papers provided sufficient identification information for the data, e.g., titles, collectors, and year of data. The study also identified 875 reused datasets. About half of the datasets were in economics, political sciences, and education. As to the variables used in the reuse papers, sociology, political sciences, and education papers tended to use variables related to social characteristics, e.g., race, salary, and gender. In contrast, economics papers tended to use macro-level variables relating to national or institutional phenomena. Of the reused datasets, 46.86% originated from governments, followed by academic institutions (23.77%) and corporations (21.14%). Less than 6% of the datasets came from previous individual research. More than half of the datasets (55.54%) were business-transaction data, followed by series surveys (34.63%) and one-time studies (8.34%). The year gap between reuse papers and datasets was relatively long in economics and sociology, but shorter in political sciences and education. The interviews revealed that scholars were motivated to reuse data mainly by the difficulty of collecting data on their own, the credibility of existing data, the opportunity to extend existing research and explore potential research questions, and the influence of their disciplines. Scholars found data through journal articles, colleagues and advisors, websites of government agencies and academic institutions, promotion by academic institutes, and hard-copy statistical publications. Before using data, researchers assessed its usability and quality, including the collection process, representativeness of samples, timeliness, and accessibility. Before reanalysis, researchers might also inspect the descriptive statistics of the datasets and carry out the necessary data cleaning and re-processing.
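The percentages reported for the 875 reused datasets can be turned back into approximate counts as a quick consistency check. The sketch below is illustrative only: the abstract reports shares, not counts, so the whole-number counts it derives are inferred by rounding and are an assumption, and the names `reported_shares`, `implied_count`, and `round_trips` are hypothetical helpers introduced here.

```python
# Illustrative sanity check, not part of the study itself.
# The abstract gives percentages of the 875 reused datasets; the counts
# computed below are inferred by rounding and are NOT reported figures.
TOTAL_DATASETS = 875

# Shares (in percent) as stated in the abstract.
reported_shares = {
    "governments": 46.86,
    "academic institutions": 23.77,
    "corporations": 21.14,
    "business-transaction data": 55.54,
    "series surveys": 34.63,
    "one-time studies": 8.34,
}


def implied_count(share_percent, total=TOTAL_DATASETS):
    """Return the whole-number count closest to share_percent of total."""
    return round(total * share_percent / 100)


# Inferred counts for every reported share.
implied_counts = {
    label: implied_count(share) for label, share in reported_shares.items()
}


def round_trips(label):
    """Check that the inferred count reproduces the reported percentage."""
    share = 100 * implied_counts[label] / TOTAL_DATASETS
    return round(share, 2) == reported_shares[label]


if __name__ == "__main__":
    for label, count in implied_counts.items():
        status = "consistent" if round_trips(label) else "check"
        print(f"{label}: about {count} of {TOTAL_DATASETS} ({status})")
```

If each inferred count reproduces its reported percentage to two decimal places, the shares are at least consistent with a common denominator of 875. The same approach does not pin down the number of empirical papers behind the 17.33% figure, because more than one whole-number total rounds to that percentage.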

Parallel Keywords

Social Sciences; Quantitative Data; Data Reuse; Data Citation; Data Seeking Behavior

