As promised, I have written a very long article about how I optimised the shadow renderer in my game "Dark".

(source: andrewrussell.net)
My article goes into a lot of detail, reaching a whopping 2500 words, plus illustrations, so I won't repost it here. But I will give a quick overview:
There were two major things I had to optimise for Dark. The first was fill rate: each light requires drawing over large areas of the screen. The second was the number of batches I was sending to the GPU, because every light must redraw the level geometry in order to cast shadows against it.
The major techniques I used were scissor rectangles (for fill rate) and culling (for batch count), although merging static geometry would have been a better fix for the latter. More interesting, however, are the tricky ways I made use of these techniques to squeeze out even more performance and get more lights into my levels.
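To make the two basic techniques concrete, here is a minimal sketch in Python. It is not the code from Dark; it is only an illustration under some assumptions: lights are circles in screen-space pixels, shadow casters are axis-aligned rectangles in the same space, and the names `Rect`, `light_scissor_rect` and `cull_casters` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def area(self) -> int:
        # Number of pixels covered, i.e. the fill-rate cost of drawing here.
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)

    def intersects(self, other: "Rect") -> bool:
        return (self.left < other.right and other.left < self.right and
                self.top < other.bottom and other.top < self.bottom)


def light_scissor_rect(cx, cy, radius, screen_w, screen_h):
    """Bounding rectangle of a circular light, clamped to the screen.

    Returns None when the light is entirely off-screen, so the caller
    can skip that light altogether.
    """
    left = max(0, math.floor(cx - radius))
    top = max(0, math.floor(cy - radius))
    right = min(screen_w, math.ceil(cx + radius))
    bottom = min(screen_h, math.ceil(cy + radius))
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


def cull_casters(casters, light_rect):
    """Keep only the casters that overlap the light's bounds.

    Assumption: a caster outside the light's reach cannot shadow anything
    the light illuminates, so it does not need its own draw call.
    """
    return [c for c in casters if c.intersects(light_rect)]
```

Setting the returned rectangle as the scissor rectangle before drawing a light limits the pixels touched to `rect.area` rather than the whole screen, and drawing only the output of `cull_casters` cuts the number of batches per light.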
But the most vitally important thing I describe in my article is the measuring I did: assessing the performance impact of particular things, determining what measurement constituted the "limit" for each thing, and then figuring out how to bring it in under that limit.
(Speaking of performance limits, I have an answer over here that lists many of the limits you will come up against when doing graphics programming and game development.)
I really hope that helps :)