Conrad Bailey

Greedy Florist

This problem comes from HackerRank.

You can see my full solution in the GitHub repo.

Problem Summary

This problem is poorly written, so I advise you to refer to my summary. A group of k people want to work together to buy flowers f1, f2...fn, which have base prices p1, p2...pn. The florist works with a single customer at a time, and each time a customer buys a flower a multiplier associated with that customer is increased by one. So if customer1 buys flowers f1, f2, and f3 then customer1's total cost will be p1 + 2*p2 + 3*p3.

Furthermore if customer2 buys flowers f4 and f5, then the total price for the group is p1 + 2*p2 + 3*p3 + p4 + 2*p5.

Your objective is to determine the minimum cost for the group of customers to buy all of the flowers.

Naive Approach

After working through some small examples you might sense that it's best for the group to take turns buying the most expensive flower, until they've bought them all.

This is a greedy algorithm; one where optimization occurs by way of a single decision policy without looking ahead. A good programmer is wary of greedy algorithms because they are rarely optimal and often a distraction from the dynamic programming solution that's actually required.

But in this case, try as you might, you cannot devise a case where this greedy policy is suboptimal. But before you can totally trust it, you must prove it.

Proving the Greedy Algorithm

Our approach here is simple: state what we know as facts, then transform what we think we know into those facts. We should end up with a clear confirmation or contradiction of the facts.

Earlier we described our approach as the group "taking turns". We will call one set of turns a round, so in round1 all purchases occur with multiplier 1, in round5 all purchases occur with multiplier 5, and generally in roundM all purchases occur with multiplier M.

We want to prove that buying the maximum-price flower in an early round always yields a cost smaller than buying the maximum-price flower in a later round.

Assume p1 is the maximum flower price and p2 is sub-maximum, i.e. p1 > p2. Assume r1 is the early round multiplier and r2 is the later round multiplier, i.e. r1 < r2. We want to show that p1*r1 + p2*r2 < p1*r2 + p2*r1, so we simplify

Original:             p1*r1 + p2*r2 < p1*r2 + p2*r1
Subtract p1*r2:       p1*r1 + p2*r2 - p1*r2 < p2*r1
Subtract p2*r2:       p1*r1 - p1*r2 < p2*r1 - p2*r2
Factor out p1 and p2: p1(r1 - r2) < p2(r1 - r2)
Divide by (r1 - r2):  p1 < p2

But wait, that contradicts our assumption p1 > p2. What has gone wrong?

If you're comfortable with proofs, then you already know my mistake. I had to go through the stages with real numbers before I noticed it, and I felt dumb for it. Remember when we stated r1 < r2? Well that implies r1 - r2 < 0, and whenever you multiply or divide by a negative on an inequality, you must flip the sign. Therefore the correct last step in the proof is

Divide by (r1 - r2): p1 > p2

This confirms our intuitions and we can confidently implement the algorithm.

Implementation

In deciding how to store flower costs we need random insertion, maximum access, and maximum deletion: this sounds just like a max-heap!

To determine the price multiplier for the ith flower, we need only calculate floor(i / k) + 1. This is a clever translation of what 1-based round the purchase belongs to.

Besides I/O the meat of the thing is only a couple lines

long long cost{0};
int count{0};
while (!costs.empty()) {
  cost += costs.top() * ((count / k) + 1);
  costs.pop();
  ++count;
}

Final Complexity

Given n flowers, each max-heap insertion is O(log(n)), so ingestion is O(nlog(n)) (i.e. we had to compare-sort the costs).

Purchasing a flower requires one max-access (O(1)), one max-deletion (O(log(n)) thanks to reheapification), and some constant arithmetic. Therefore purchasing all n flowers is O(nlog(n)).

We could use a Fibonacci heap which has constant insertion cost, making ingestion a O(n) operation, but we can not alleviate the O(nlog(n)) bottleneck in the purchasing loop with heap variants.