I can suggest two possible methods.
#SAT
This is closely related to #SAT, and one approach might be to try applying an off-the-shelf #SAT (or approximate-#SAT) solver.
Let $t_k$ denote the number of $W$-sets that contain $S_k$ but not any of $S_1,\dots,S_{k-1}$, i.e., $t_k$ is the cardinality of the set
$$\{S \subseteq \mathcal{U} : |S|=W, S_k \subseteq S, S_1 \not\subseteq S, \dots, S_{k-1} \not\subseteq S\}.$$
It is easy to see that you want to compute $t_1+t_2+\dots+t_M$, so if we can figure out how to compute $t_k$, we are done.
How shall we compute $t_k$? Well, any set $S$ that meets the conditions must cover $S_k$, so all of the elements of $S_k$ are in $S$, and the only question is the remaining elements. Set $\mathcal{U}' = \mathcal{U} \setminus S_k$, $S'_i = S_i \setminus S_k$, and $W' = W-|S_k|$. Then $t_k$ is the number of $W'$-sets (over universe $\mathcal{U}'$) that don't cover any of $S'_1,\dots,S'_{k-1}$, i.e., the cardinality of the set
$$\{S' \subseteq \mathcal{U}' : |S'|=W', S'_1 \not\subseteq S', \dots, S'_{k-1} \not\subseteq S'\}.$$
As you stare at this, the condition on $S'$ turns out to be expressible as a boolean formula in CNF form. Let $x_1,\dots,x_{N'}$ (where $N'=|\mathcal{U}'|$) be boolean variables; the idea is that $x_i$ is true iff $i \in S$. Then the condition $S'_1 \not\subseteq S', \dots, S'_{k-1} \not\subseteq S'$ is equivalent to a CNF formula on $x_1,\dots,x_{N'}$ with $k-1$ clauses and $N'$ variables; each constraint $S'_i \not\subseteq S'$ induces a clause $\lor_{j \in S'_i} \neg x_j$. Then, $t_k$ is the number of satisfying assignments of weight $k$ for this formula. You can easily conjoin clauses to this formula that require the assignment to have weight $k$.
In this way, we have reduced the problem of computing $t_k$ to the problem of counting the number of satisfying assignments of a boolean formula in CNF form. This is exactly the #SAT problem. The problem is known to be hard, but there are off-the-shelf solvers you could try. There are also solvers that yield an approximation to the number, which are worth trying as well. See, e.g., Is there a sometimes-efficient algorithm to solve #SAT? and https://cstheory.stackexchange.com/q/18135/5038.
I don't know whether this will be fast enough for your needs, but it is a possible approach you could consider.
Inclusion-Exclusion
If $I \subseteq \{1,2,\dots,M\}$ is a set of indices, let $u_I$ denote number of $W$-sets that contain $S_i$ for each $i \in I$, i.e., the cardinality of the set
$$\{S \subseteq \mathcal{U} : |S|=W, \forall i \in I . S_i \subseteq S\}.$$
You can express the number you want using the inclusion-exclusion formula. In particular, you want to compute
$$u = \sum_I (-1)^{|I|} \cdot u_I$$
where $I$ ranges over all non-empty subsets of $\{1,2,\dots,M\}$.
Heuristically, we can expect that the larger $I$ is, the smaller $u_I$ will be, so we can approximate the above sum as follows:
$$u \approx \sum_{|I| \le 8} (-1)^{|I|} \cdot u_I.$$
Note that you can compute $u_I$ easily for any given $I$; in particular,
$$u_I = {N-|S_I| \choose W-|S_I|}$$
where we define $S_I = \cup_{i \in U} S_i$. Now since ${50 \choose 8} \approx 2^{29}$, for your particular parameters it is easy enough to enumerate all index sets $I$ such that $|I|\le 8$ and compute $u_I$ for each, then use the approximate inclusion-exclusion formula above. You can adjust the parameter 8 to trade off computation time vs quality of the approximation.
The disadvantage is that there is no a priori guarantee on how good the approximation will be -- but this might be useful nonetheless.
It might be possible to refine and improve this approach further. While it's true that on average we expect larger sets $I$ correspond to smaller values of $u_I$, this isn't necessarily true. Therefore, it's possible that you might get a better approximation by picking a threshold $\tau$ and then using
$$u \approx \sum_{u_I \ge \tau} (-1)^{|I|} \cdot u_I.$$
This requires you to iterate through sets $I$ in order of decreasing $u_I$, and then stop once you reach the threshold $\tau$. How do you do this? The key insight is that as you add elements to $I$, the value of $u_I$ gets monotonically smaller. Therefore, you can use breadth-first search to explore which elements to add to the set. You start out with all sets $I$ of size 1 and push them onto a priority queue. The elements of the priority queue are ordered by the value of $u_I$. In each step, you pop from the priority queue a set $I$ with the largest value of $u_I$ out of all on the queue; and assuming you haven't processed $I$ before, you add $(-1)^{|I|} \cdot u_I$ to your running total and then push the sets $I \cup \{j\}$ onto the queue for each $j \notin I$.
I suggest trying both variants on inclusion-exclusion to see which gives a better approximation.