
Author: 陳建豪
Chen, Jian-Hao
Thesis title: 使用人工智慧晶片實作之自動樂譜辨識與打擊樂演奏系統
Robotic Percussion System Incorporating an Automatic Sheet Music Recognition System Using Artificial Intelligence Chip
Advisor: 王偉彥
Wang, Wei-Yen
Oral defense committee: 翁慶昌
Wong, Ching-chang
盧明智
Lu, Ming-Chih
呂成凱
Lu, Cheng-Kai
許陳鑑
Hsu, Chen-Chien
王偉彥
Wang, Wei-Yen
Oral defense date: 2022/08/17
Degree: Master
Department: Department of Electrical Engineering
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: Chinese
Number of pages: 55
Keywords (Chinese): 樂譜辨識, 深度學習, Delta 機械手臂, 人工智慧晶片
Keywords (English): music score recognition, deep learning, delta robot, artificial intelligence chip
Research method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202201582
Thesis type: Academic thesis
  • Neural network research on high-resolution optical image recognition has matured in recent years. However, large Convolutional Neural Network (CNN) architectures often carry a very high computational cost, so maintaining acceptable accuracy while reducing the computational burden is a direction worth studying. This thesis therefore uses an artificial intelligence chip specialized for computer vision tasks in place of a large object detection CNN to detect note positions, and a self-designed lightweight CNN to recognize the scale (pitch) information of each detected note. Splitting the complex task between two lightweight CNNs yields an optical music score recognition (OMR) system. The thesis also presents a control program that integrates the OMR system with Delta robot control. The OMR system detects and recognizes the notes on paper sheet music captured by a camera and transfers the results to the control program over a Universal Asynchronous Receiver/Transmitter (UART). Once the playing order has been determined from the recognition results, the program drives the Delta robot to play a metallophone automatically. Finally, the OMR system is tested on paper sheet music to verify its recognition accuracy.
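    The flow from recognition results to playing order described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the record gives neither the UART message format nor the ordering rule, so the `x,y,pitch` text-line format, the `Note` type, the `parse_line` and `playing_order` helpers, the pitch labels, and the `row_tolerance` heuristic are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    x: int      # horizontal centre of the detected note in pixels (assumed)
    y: int      # vertical centre of the detected note in pixels (assumed)
    pitch: str  # label from the pitch-recognition CNN (assumed label set)


def parse_line(line: str) -> Note:
    """Parse one hypothetical UART text line of the form 'x,y,pitch'."""
    x_str, y_str, pitch = line.strip().split(",")
    return Note(int(x_str), int(y_str), pitch)


def playing_order(notes: List[Note], row_tolerance: int = 40) -> List[Note]:
    """Order notes top-to-bottom by staff row, then left-to-right in each row.

    A note joins the current row when its y differs from the first note of
    that row by at most `row_tolerance` pixels (an assumed heuristic).
    """
    rows: List[List[Note]] = []
    for note in sorted(notes, key=lambda n: n.y):
        if rows and abs(note.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(note)
        else:
            rows.append([note])
    ordered: List[Note] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda n: n.x))
    return ordered


if __name__ == "__main__":
    # Stand-in for lines received over the serial link (made-up values).
    received = ["300,52,E4", "100,48,C4", "120,210,G4", "200,60,D4"]
    notes = [parse_line(line) for line in received]
    print([n.pitch for n in playing_order(notes)])
```

    In a real setup the lines would arrive over a serial port rather than from a list, and each ordered note would be mapped to a target position for the Delta robot; both steps are outside what this record specifies.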

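    Chapter 1 Introduction 1
      1.1 Research Motivation and Background 1
      1.2 Literature Review 2
        1.2.1 Artificial Neural Networks 2
        1.2.2 Convolutional Neural Networks 4
      1.3 Thesis Organization 7
    Chapter 2 Software/Hardware Architecture and Design 8
      2.1 System Architecture 8
      2.2 Hardware Platform Overview 10
        2.2.1 Mipy Deep Learning AI Development Board [20] 10
        2.2.2 Automatic Performance Mechanism 11
      2.3 Software Integration Design 15
        2.3.1 Main Control Program 15
        2.3.2 Neural Network Program 18
    Chapter 3 CNN-Based Recognition Algorithms 19
      3.1 Note Recognition CNN 19
        3.1.1 Training Toolkit for the Mipy Deep Learning AI Development Board 19
        3.1.2 Model Training Procedure 21
        3.1.3 Principle of Bounding-Box Generation on the Mipy Deep Learning AI Development Board 29
      3.2 Scale Recognition CNN 35
        3.2.1 Effect of Convolution Kernel Size on Computational Performance 36
        3.2.2 Activation Function Replacement 37
        3.2.3 Training Method 39
    Chapter 4 Experimental Results and Analysis 41
      4.1 Note Detection and Recognition Results on the Mipy Deep Learning AI Development Board 41
      4.2 CNN-Based Scale Recognition Results 44
        4.2.1 Network Architecture Ablation Study 44
        4.2.2 Scale Recognition Accuracy 46
      4.3 Automated Robot-Arm Performance Experiment Based on Sheet Music Recognition 49
    Chapter 5 Conclusion and Future Work 52
      5.1 Conclusion 52
      5.2 Future Work 52
    References 53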

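    [1] Boston Dynamics. "Spot Arm - Mobile Manipulation." https://www.bostondynamics.com/products/spot/arm (accessed Aug. 30, 2022).
    [2] 蔡自偉, "印刷樂譜辨識系統" [Printed sheet music recognition system], M.S. thesis, Dept. of Computer Science and Engineering, National Sun Yat-sen University, 2004.
    [3] 黃朝慶, "自動樂譜辨識與打擊樂機器人系統" [Automatic sheet music recognition and percussion robot system], M.S. thesis, Dept. of Electrical Engineering, National Taiwan Normal University, 2020.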
    [4] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, Dec. 1943, doi: 10.1007/BF02478259.
    [5] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, Oct. 1986, doi: 10.1038/323533a0.
    [6] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, Jul. 2006, doi: 10.1162/neco.2006.18.7.1527.
    [7] M. Copeland. "The Difference Between AI, Machine Learning, and Deep Learning?" https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/ (accessed Aug. 30, 2022).
    [8] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biol. Cybernetics, vol. 36, no. 4, pp. 193-202, Apr. 1980, doi: 10.1007/BF00344251.
    [9] J. Schmidhuber, "Deep Learning in Neural Networks: An Overview," Neural Networks, vol. 61, pp. 85-117, Jan. 2015, doi: 10.1016/j.neunet.2014.09.003.
    [10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998, doi: 10.1109/5.726791.
    [11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in NIPS Conf., Lake Tahoe, Nevada, USA, Dec. 2012, vol. 25: Curran Associates, Inc., pp. 1097-1105.
    [12] O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," Int J Comput Vis, vol. 115, no. 3, pp. 211-252, Dec. 2015, doi: 10.1007/s11263-015-0816-y.
    [13] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in International Conference on Learning Representations (ICLR), San Diego, CA, USA, Apr. 2015, pp. 1-14.
    [14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
    [15] C. Szegedy et al., "Going deeper with convolutions," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 1-9, doi: 10.1109/CVPR.2015.7298594.
    [16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, Jun. 2016, pp. 779-788, doi: 10.1109/CVPR.2016.91.
    [17] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, Jul. 2017, pp. 6517-6525, doi: 10.1109/CVPR.2017.690.
    [18] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018. [Online]. Available: https://arxiv.org/abs/1804.02767.
    [19] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4510-4520.
    [20] 視芯有限公司. "視芯AVSdsp | AI Chip and Application Module Development - Mipy AI Simple Application Development System." https://sites.google.com/avsdsp.com/avsdsp/module/mipy-system (accessed Aug. 30, 2022).
    [21] 視芯有限公司. "視芯AVSdsp | AI Chip and Application Module Development - Fourth-Generation AI Chip AVS05P." https://sites.google.com/avsdsp.com/avsdsp/chip/avs05p (accessed Aug. 30, 2022).
    [22] 視芯有限公司. "視芯AVSdsp | AI Chip and Application Module Development - Fifth-Generation AI Chip AI860." https://sites.google.com/avsdsp.com/avsdsp/chip/ai860 (accessed Aug. 30, 2022).
    [23] 採智科技股份有限公司. "MX-28 Series Omnidirectional Smart Motor List." https://idminer.com.tw/product/mx-28-%e7%b3%bb%e5%88%97%e5%85%a8%e5%90%91%e6%99%ba%e8%83%bd%e9%a6%ac%e9%81%94%e6%b8%85%e5%96%ae/ (accessed Aug. 29, 2022).
    [24] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, Jun. 2017, doi: 10.1109/TPAMI.2016.2577031.
    [25] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, vol. 30, no. 1, p. 3.
    [26] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient BackProp," in Neural Networks: Tricks of the Trade: Second Edition, G. Montavon, G. B. Orr, and K.-R. Müller, Eds. (Lecture Notes in Computer Science). Berlin, Heidelberg: Springer, 2012, pp. 9-48.

    Electronic full text: public release deferred until 2027/09/10.