How profitable to sell a series of goods to the same buyer?

Recently, at the conference WWW'2017 (26th International World Wide Web Conference), Alexey Drutsa has presented the paper "Horizon-Independent Optimal Pricing in Repeated Auctions with Truthful and Strategic Buyers". We will briefly explain the core of this study.
The studied subject. The author of this work studies machine learning algorithms that optimize revenue in repeated posted-price auctions in which the seller (who is also the organizer of the auctions) offers same goods (one per round) to a single buyer. The buyer holds a fixed private valuation for a good, i.e., the valuation is unknown to the seller and is equal for goods offered in all rounds. The seller is able to set a price for the goods in each round, based only on the buyer decisions (to accept or to reject a price) made at previous rounds. It is assumed that the seller announces his pricing algorithm in advance, and, hence, the buyer may act strategically against this pricing (to optimize the sequence of his actions in order to maximize his cumulative surplus over all rounds).
The studied problem has a high practical value since posted-price auctions are a special case of second-price auctions with a reserve price that involve only one buyer. Such auctions are widely applied to sell online ad inventory on web sites and, thus, their study is an important component of revenue lift earned from advertising, e.g., in Yandex Advertising Network.
The main research goal: Which pricing algorithm  should be chosen by the seller to attain his cumulative regret as minimal as possible in the worst case of the buyer valuation? The optimality of the algorithm in this case is treated in terms of asymptotic tight bounds that grow as the time horizon T of the game (number of rounds, i.e., interactions between the buyer and the seller) goes to infinity: the slower the growth, the faster (on average) the seller is able to set prices as close as possible to the actucal buyer valuation.
Prior state of the art. Until recently, on the one hand, the best known algorithm in this setting was the algorithm PFS that has a strategic regret upper bound in O(logT * loglogT). On the other hand, it is known that the regret of any pricing algorithm is lower bounded by C*loglogT for some positive constant C. Hence, the algorithm PFS has a logarithmic gap between its best known lower and upper bounds for the strategic regret and is thus non-optimal in the considered setting. Another disadvantage of existing algorithms is their dependence on the time horizon T, that is, for their effective application, the seller has to know in advance the number of times he will interact with the buyer (otherwise, an applied algorithm will incur a linear regret). However, this requirement to an algorithm diverges from a common practice, when the seller does not know in advance how many times it will interact with the buyer.
Main contributions. In the paper presented at the conference WWW'2017, the algorithm PRRFES (Penalized Reject-Revising Fast Exploiting Search) was proposed, which
  1. has a tight strategic regret bound in O(loglogT):
    • this result closes the logarithmic gap between the previously best known upper and lower bounds on strategic regret;
  2. is independent of the time horizon:
    • the seller should not thus know in advance the number of times he will interact with the buyer in order to efficiently apply the algorithm.