
微調大型語言模型的業配文貼文生成之應用:基於英文與繁體中文數據集的洞悉

Exploring Fine-Tuned LLMs for Effective Sponsored Post Generation: Insights from English and Traditional Chinese Data

Advisor: 莊裕澤

Abstract


This thesis proposes a methodology for generating sponsored post content by fine-tuning large language models (LLMs). Using an open dataset of English Instagram posts, we fine-tuned the Llama3-8b model with Quantized Low-Rank Adaptation (QLoRA). Our approach provides the LLM with various input features, including the Key Opinion Leader (KOL, i.e. influencer) biography, post hashtags, tagged usernames, and image descriptions generated by BLIP2, in order to assess how these features affect the quality of the generated content. In addition, in collaboration with iKala, we collected a Traditional Chinese Instagram sponsored-post dataset; although it contains only text-based post information, without images or KOL biographies, we fine-tuned the Llama3-8b and yi-6b models on it to evaluate their performance in generating high-value sponsored post captions.

Our fine-tuning pipeline combines several techniques. We used QLoRA, which quantizes the frozen base weights to reduce memory usage and enables faster fine-tuning with fewer resources; we fine-tuned the models more efficiently with Flash Attention through the xformers and Tri Dao implementations, and used a causal mask to speed up training. We further optimized the Cross Entropy Loss with mixed-precision training, performing computation in half precision and weight updates in full precision to reduce memory consumption while preserving accuracy.

For evaluation we used five metrics: CLIPScore, which measures the alignment between an image and its text; LLM-Eval, a GPT-4o-based metric that assesses content quality; the proportion of correctly generated hashtags and tagged usernames; Cosine Similarity, computed with a pre-trained sentence transformer to measure semantic alignment between texts; and a comprehensive human evaluation. GPT-3.5 and GPT-4o served as baselines in both the English and Chinese tasks.

Our results show that the fine-tuned Llama3-8b and yi-6b models, despite having only a few billion parameters, are highly competitive in generating sponsored post content compared with much larger models such as GPT-4o, reportedly on the order of one trillion parameters. This was accomplished on a single consumer-grade GPU, highlighting the efficiency of these models in both fine-tuning and inference. We also found that varying the inputs during fine-tuning, such as including or excluding image descriptions and KOL biographies, significantly affects the quality and consistency of the generated captions. This study provides a feasible and resource-efficient methodology for generating high-quality sponsored post content and establishes a benchmark for future research in this area.
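As a concrete illustration of the QLoRA setup described above, the following is a minimal sketch assuming the Hugging Face transformers, peft, and bitsandbytes stack; the model name, LoRA rank, and target modules are illustrative assumptions, not the thesis's actual configuration. The frozen Llama3-8b weights are loaded in 4-bit NF4 precision and only small low-rank adapter matrices are trained.

```python
# Sketch of QLoRA fine-tuning: 4-bit quantized frozen base weights + trainable low-rank adapters.
# Hyperparameters below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # half-precision compute
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",           # assumed checkpoint name
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,                  # assumed adapter hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
```

Training would then proceed with a standard causal language modeling loop in bf16/fp16, which is consistent with the mixed-precision cross-entropy optimization described above.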

English Abstract


This thesis presents a novel approach to generating sponsored post captions by fine-tuning large language models (LLMs). Leveraging an open dataset of English Instagram posts labeled as sponsored content, we fine-tuned the Llama3-8b model using Quantized Low-Rank Adaptation (QLoRA). Our methodology involved providing the LLM with various input features, including Key Opinion Leader (KOL) biographies, post hashtags, tagged usernames, and image descriptions generated by BLIP2, to assess their impact on the quality and effectiveness of the generated captions. Additionally, in collaboration with iKala, we collected a Traditional Chinese (zh-tw) sponsored-post dataset containing only text-based post information, without image descriptions or KOL biographies. We fine-tuned both the Llama3-8b and yi-6b models on this dataset to evaluate their performance in generating engaging sponsored post captions.

Our fine-tuning strategy combined several techniques. We employed QLoRA to quantize the frozen base weights, achieving a smaller memory footprint and enabling faster fine-tuning with fewer resources. We integrated Flash Attention via xformers and Tri Dao's implementation to optimize the transformer models, and used a causal mask to speed up training. Additionally, we optimized the Cross Entropy Loss through mixed-precision training, which reduces memory consumption while preserving accuracy by performing computation in half precision and weight updates in full precision.

The generated captions were evaluated with five metrics: CLIPScore, to measure the alignment between images and text; LLM-Eval, a GPT-4o-based metric assessing content quality; the proportion of correctly generated hashtags and tagged usernames; cosine similarity, computed with a pre-trained sentence transformer to measure semantic alignment; and a comprehensive manual evaluation involving human raters. For baseline comparisons, we employed the GPT-3.5 and GPT-4o models in both the English and Chinese settings.

Our findings demonstrate that the fine-tuned Llama3-8b and yi-6b models, despite their relatively small size of a few billion parameters, achieve results competitive with significantly larger models such as GPT-4o, reportedly on the order of one trillion parameters. This was accomplished using a single consumer-grade GPU, highlighting these models' efficiency in both fine-tuning and batched inference. We also found that varying the input content during fine-tuning, such as including or excluding image descriptions and KOL biographies, significantly affects the quality and consistency of the generated captions. This research contributes a viable and resource-efficient pipeline for generating high-quality sponsored post captions and establishes a benchmark for future studies in this domain.
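The sketch below illustrates, under assumed model checkpoints and placeholder inputs, two of the automatic metrics mentioned above: cosine similarity between a generated and a reference caption computed with a pre-trained sentence transformer, and a CLIPScore-style image-text alignment score (2.5 * max(cos, 0), following the original CLIPScore formulation). The checkpoint names and file paths are assumptions, not necessarily those used in the thesis.

```python
# Illustrative sketch of two automatic evaluation metrics; checkpoints and inputs are placeholders.
import torch
from PIL import Image
from sentence_transformers import SentenceTransformer, util
from transformers import CLIPModel, CLIPProcessor

# --- Cosine similarity between generated and reference captions (semantic alignment) ---
st_model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed checkpoint
gen_emb = st_model.encode("generated caption ...", convert_to_tensor=True)
ref_emb = st_model.encode("reference caption ...", convert_to_tensor=True)
cos_sim = util.cos_sim(gen_emb, ref_emb).item()

# --- CLIPScore-style alignment between the post image and the generated caption ---
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = processor(
    text=["generated caption ..."],
    images=Image.open("post.jpg"),   # placeholder image path
    return_tensors="pt", padding=True, truncation=True,
)
with torch.no_grad():
    out = clip(**inputs)
image_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
clip_score = 2.5 * max((image_emb @ text_emb.T).item(), 0.0)  # CLIPScore rescaling

print(f"cosine similarity: {cos_sim:.3f}, CLIPScore: {clip_score:.3f}")
```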

