Dynamic Pricing under Limited Inventory

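In this thesis, we study the dynamic pricing problem in two scenarios of interest. We mainly design algorithms for the online learning setting and consider the case of limited inventory: the seller's goal is to sell a finite number of items within a finite time horizon while maximizing cumulative revenue. To capture how two factors of interest affect this problem, we construct two theoretical models. In the first model, the seller has some knowledge of each buyer: before posting a price to a buyer, the seller first observes some information about that buyer. Such situations are common in environments such as online shopping. We make no probabilistic assumption on buyer types (the information the seller observes), and together with the limited inventory, this makes evaluating online dynamic pricing algorithms difficult. We propose a criterion for evaluating online dynamic pricing algorithms, design an algorithm with respect to this criterion, and provide a theoretical guarantee on its expected revenue. In the second model, we assume that each buyer may stay in the market for some time and, even after seeing an acceptable price, may strategically wait for a lower one. For this model, we propose a new selling mechanism and design an online dynamic pricing algorithm based on it. Likewise, we provide a theoretical guarantee on the algorithm's expected revenue.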
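To make the first model concrete, the following is a minimal Python sketch of its selling protocol, not the thesis's algorithm. It assumes that a buyer purchases one unit exactly when her valuation is at least the posted price, and that the seller sees only the buyer's context and whether a sale occurred. `Buyer`, `run_posted_price`, and the policy signature are hypothetical names chosen for illustration; the adversarial choice of contexts is represented simply by passing an arbitrary sequence of buyers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Context = Tuple[float, ...]
Record = Tuple[Context, float, bool]  # (context, posted price, sold?)


@dataclass
class Buyer:
    context: Context  # information the seller observes before pricing
    valuation: float  # private value; never shown to the seller


# Hypothetical policy signature: (context, remaining inventory, round, history) -> price.
PricePolicy = Callable[[Context, int, int, List[Record]], float]


def run_posted_price(buyers: Sequence[Buyer], inventory: int, policy: PricePolicy) -> float:
    """Simulate one finite horizon of posted-price selling and return total revenue."""
    revenue = 0.0
    history: List[Record] = []
    for t, buyer in enumerate(buyers):
        if inventory == 0:
            break  # stock is exhausted; later buyers cannot be served
        price = policy(buyer.context, inventory, t, list(history))
        sold = buyer.valuation >= price  # assumed rule: accept any price at or below value
        if sold:
            inventory -= 1
            revenue += price
        history.append((buyer.context, price, sold))  # seller learns only sold / not sold
    return revenue


if __name__ == "__main__":
    buyers = [Buyer((1.0,), 6.0), Buyer((0.5,), 4.0), Buyer((2.0,), 7.0), Buyer((1.5,), 9.0)]
    print(run_posted_price(buyers, 2, lambda c, inv, t, h: 5.0))         # 10.0
    print(run_posted_price(buyers, 2, lambda c, inv, t, h: 5.0 * c[0]))  # 7.5
```

A learning algorithm for this model would replace the fixed placeholder policies with one that adapts its prices using `history`; the evaluation criterion proposed in the thesis is not reproduced here.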

Keywords

Dynamic Pricing; Revenue Management; Online Learning; Multi-Armed Bandit; Contextual Bandit; Game Theory; Mechanism Design

English Abstract

This thesis studies the well-known dynamic pricing problem in two scenarios and presents corresponding learning algorithms. Unlike previous work, we mainly focus on the setting in which the seller is initially given a finite inventory and wants to sell it within a finite period of time. We build two theoretical models, each capturing a different factor that affects this problem. In the first model, the seller observes a context vector for each consumer before deciding the price posted to her, and the consumers' contexts are chosen adversarially. The seller's objective is to maximize revenue; however, in the adversarial setting with limited inventory, it is not obvious how to evaluate an algorithm against this objective. We introduce a criterion for evaluating the performance of a learning algorithm, and then design an algorithm with a guarantee on its expected revenue under this criterion. In the second model, consumers may stay in the market for a period of time, and they may strategically wait for a lower price in order to maximize their utility. For this model, we introduce a new selling mechanism with desirable properties and, based on it, design a learning algorithm with a guarantee on its expected revenue.
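The strategic waiting in the second model can be illustrated with a short Python sketch of a single buyer's decision. This is a hypothetical buyer model, not the mechanism proposed in the thesis: it assumes the buyer knows the posted price for every period of her stay, buys at most one unit, and picks the period with the largest surplus (valuation minus price), breaking ties toward the earliest period. A myopic buyer, who buys at the first acceptable price, is included for contrast; both function names are illustrative.

```python
from typing import Optional, Sequence


def myopic_purchase_time(valuation: float, prices: Sequence[float],
                         arrival: int, departure: int) -> Optional[int]:
    """First period in [arrival, departure] whose price the buyer can accept, or None."""
    for t in range(arrival, departure + 1):
        if valuation >= prices[t]:
            return t
    return None


def strategic_purchase_time(valuation: float, prices: Sequence[float],
                            arrival: int, departure: int) -> Optional[int]:
    """Period in [arrival, departure] maximizing surplus valuation - price, or None.

    Hypothetical assumption: the buyer knows every price during her stay.
    """
    best_t: Optional[int] = None
    best_surplus = float("-inf")
    for t in range(arrival, departure + 1):
        surplus = valuation - prices[t]
        if surplus > best_surplus:  # strict comparison keeps the earliest tied period
            best_t, best_surplus = t, surplus
    # Buy only if the best achievable surplus is nonnegative.
    return best_t if best_surplus >= 0 else None


if __name__ == "__main__":
    prices = [10.0, 8.0, 6.0, 9.0]
    print(myopic_purchase_time(9.0, prices, 0, 3))     # 1
    print(strategic_purchase_time(9.0, prices, 0, 3))  # 2
```

With prices 10, 8, 6, 9 over periods 0 to 3 and a valuation of 9, the myopic buyer purchases in period 1, whereas the strategic buyer rejects that acceptable price and waits until period 2.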

English Keywords

Dynamic Pricing; Revenue Management; Online Learning; Multi-Armed Bandit; Contextual Bandit; Game Theory; Mechanism Design
