(i) Where do the heuristic values for shortest distance come from? There is mention of "straight roads" but I don't understand this. I know "heuristic" is "informed guess" but why not values 1237, 978, 516, ... etc. or any other arbitrary, descending values for the heuristic distance "still to go" ?
When you decide to use A* to solve a problem, you need to design an appropriate heuristic. Designing a heuristic is a creative act, so one can't really give advice on how to do it. Ideally, though, the heuristic should give a good estimate of the true cost.
The purpose of the heuristic is to guide the search and a search that receives accurate guidance will terminate faster than one that receives poor guidance. There is, however, a trade-off. If your heuristic is perfect, then it will guide the search so well that the optimal route is the first one it examines. But, of course, if you could compute this perfect heuristic, you wouldn't need to be using search in the first place! Having a better heuristic means you spend less time searching, but you need to balance that against the fact that you'll probably spend more time computing this better heuristic.
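To make the "better guidance means less searching" point concrete, here is a minimal sketch, not a definitive implementation. It assumes an open 11 x 11 grid with unit step costs, and the names `count_expansions`, `zero` and `manhattan` are made up for illustration. It runs A* twice, once with a heuristic that always returns 0 (which makes A* behave like uniform-cost search) and once with Manhattan distance, and counts how many nodes each run expands.

```python
import heapq


def count_expansions(h, start, goal, size=11):
    """A* on an open size x size grid with unit step costs (illustrative setup).

    Returns (cost of the route found, number of nodes expanded).
    """
    frontier = [(h(start, goal), 0, start)]  # entries are (f, g, node)
    best_g = {start: 0}
    expansions = 0
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if g > best_g[node]:
            continue  # stale entry: a cheaper route to this node was found later
        expansions += 1
        if node == goal:
            return g, expansions
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size:
                new_g = g + 1
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    heapq.heappush(frontier, (new_g + h(nxt, goal), new_g, nxt))
    return None


def zero(node, goal):
    # No guidance at all: every node looks equally promising.
    return 0


def manhattan(node, goal):
    # Never overestimates on a grid where each step costs 1.
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])


blind = count_expansions(zero, (5, 5), (5, 9))
guided = count_expansions(manhattan, (5, 5), (5, 9))
print("zero heuristic:     ", blind)
print("Manhattan heuristic:", guided)
```

Both runs find a route of the same cost, but the guided run expands only the nodes on the straight line to the goal, while the blind run expands a whole diamond of nodes around the start. Manhattan distance is cheap to compute, so here the better heuristic costs almost nothing; a more elaborate heuristic would shift that balance.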
(ii) What is the significance and meaning of "the heuristic must never make an over-estimate"? How do we know that, and why is it important?
You know it because you figured it out and you designed the heuristic to achieve that. The classic example is using straight-line distance as a heuristic for navigation. You know that the straight-line distance is the shortest possible distance between two points, so it cannot overestimate the distance you'd have to travel on the road network.
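As a rough illustration of that argument, the sketch below uses an invented map (the town names, coordinates and road lengths are all hypothetical) in which every road is at least as long as the straight line between its endpoints. It compares the straight-line estimate to the goal with the true shortest road distance, computed here with Dijkstra's algorithm purely as a check.

```python
import heapq
import math

# Hypothetical towns with (x, y) coordinates in km.
COORDS = {"A": (0, 0), "B": (3, 4), "C": (6, 0), "G": (9, 4)}

# Hypothetical two-way roads; each is at least as long as the
# straight line between the towns it joins.
ROADS = {
    "A": {"B": 6.0, "C": 7.0},
    "B": {"A": 6.0, "G": 7.5},
    "C": {"A": 7.0, "G": 5.5},
    "G": {"B": 7.5, "C": 5.5},
}


def straight_line(a, b):
    """Heuristic: Euclidean distance between two towns."""
    return math.dist(COORDS[a], COORDS[b])


def road_distances(goal):
    """True shortest road distance from every town to goal (Dijkstra)."""
    dist = {goal: 0.0}
    queue = [(0.0, goal)]
    while queue:
        d, town = heapq.heappop(queue)
        if d > dist[town]:
            continue  # stale queue entry
        for neighbour, length in ROADS[town].items():
            if d + length < dist.get(neighbour, float("inf")):
                dist[neighbour] = d + length
                heapq.heappush(queue, (d + length, neighbour))
    return dist


true_cost = road_distances("G")
for town in COORDS:
    estimate = straight_line(town, "G")
    print(f"{town}: estimate {estimate:.2f} km, by road {true_cost[town]:.2f} km")
```

For every town, the estimate is no larger than the road distance, which is exactly what "never overestimates" means.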
I wrote an answer a while ago about why it's important that you don't overestimate. Essentially, if your heuristic says "every route via X is long", the algorithm will first look at routes via other places, and potentially find one. When it finds an answer, it will stop. But it never considered routes via X so, if it turns out that the shortest route was via X, you missed it because your heuristic overestimated.
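Here is a small sketch of that failure, under made-up assumptions: the four-node graph, the heuristic tables and the function name `a_star` are invented for illustration. The shortest route goes via X, and the only difference between the two runs is that the second heuristic wildly overestimates the remaining distance from X.

```python
import heapq

# Hypothetical one-way road network: the route via X costs 2, via Y costs 4.
GRAPH = {
    "S": {"X": 1, "Y": 2},
    "X": {"G": 1},
    "Y": {"G": 2},
    "G": {},
}

# Never overestimates the remaining distance to G.
ADMISSIBLE = {"S": 2, "X": 1, "Y": 2, "G": 0}

# Claims "every route via X is long": the true remaining cost from X is 1.
OVERESTIMATE = {"S": 2, "X": 10, "Y": 2, "G": 0}


def a_star(graph, h, start, goal):
    """Plain A*: stops as soon as the goal is taken off the frontier."""
    frontier = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry
        for nxt, cost in graph[node].items():
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h[nxt], new_g, nxt, path + [nxt]))
    return None


print(a_star(GRAPH, ADMISSIBLE, "S", "G"))    # (2, ['S', 'X', 'G'])
print(a_star(GRAPH, OVERESTIMATE, "S", "G"))  # (4, ['S', 'Y', 'G'])
```

With the overestimate, X sits in the frontier with an estimated total of 11, so the search reaches G via Y at cost 4 and stops without ever expanding X.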
Surely all the intermediate distances before the completion of the A* algorithm are "over-estimates".
No. With a consistent heuristic (straight-line distance is one), A* guarantees that, any time you expand a node in the search, you've found the shortest possible path to that node. So, every time you expand a node, its estimated total is the optimal cost of reaching that node plus an underestimate of the remaining distance to the goal: the estimate for every intermediate state that gets expanded is an underestimate of the best route through it, or at most exactly right. There might be over-estimates in the frontier (i.e., suboptimal routes to intermediate nodes) but they'll never get expanded.
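The following sketch shows this on a small invented graph (the node names, edge costs, heuristic values and the function `a_star_trace` are all hypothetical, and the heuristic was chosen to be consistent). It records the estimated total f = g + h of every node at the moment it is expanded, and separately collects whatever is left in the frontier when the search stops.

```python
import heapq

# Hypothetical one-way graph; the optimal route S -> A -> B -> G costs 5.
GRAPH = {
    "S": {"A": 1, "B": 4},
    "A": {"B": 1, "G": 10},
    "B": {"G": 3},
    "G": {},
}

# A consistent heuristic for this graph (equal to the true remaining cost here).
H = {"S": 5, "A": 4, "B": 3, "G": 0}


def a_star_trace(graph, h, start, goal):
    """A* that also reports what was expanded and what was left unexpanded."""
    frontier = [(h[start], 0, start)]  # entries are (f, g, node)
    best_g = {start: 0}
    expanded = []
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if g > best_g[node]:
            continue  # stale entry: a cheaper route to this node is known
        expanded.append((node, g, f))
        if node == goal:
            return g, expanded, sorted(frontier)
        for nxt, cost in graph[node].items():
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h[nxt], new_g, nxt))
    return None


cost, expanded, leftover = a_star_trace(GRAPH, H, "S", "G")
print("optimal cost:", cost)
print("expanded (node, g, f):", expanded)
print("left in frontier (f, g, node):", leftover)
```

Every expanded node has an f-value of 5, which does not exceed the optimal cost. The two over-estimates, B reached directly from S (f = 7) and G reached directly from A (f = 11), stay in the frontier and are never expanded.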