I understand your questions as follows:
Is it valid to have an all-zeros output for a certain input?
Yes, it's possible in some cases. For example:
In the tutorial "jigsaw-toxic-comment-classification-challenge", the data comes from Wikipedia comments. Toxic behavior is generally rare in people's comments; it is very unusual for someone to post "bad" comments on an informative source like Wikipedia, so many examples end up with all labels set to zero.
In single-label classification, such as predicting whether a person has a rare disease, the dataset would contain labels that are mostly zero (no disease), i.e. heavily skewed towards zero.
This happens in datasets where the positive class to be predicted is very rare.
I take your second question to be whether "accuracy" is a good way of evaluating a model for this type of problem.
You are right: in such a case, even a trivial program/model that outputs zero for every input would achieve an accuracy of more than 90%, so accuracy is not a good metric to evaluate the model on here.
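As a minimal sketch of this (the labels below are made up, with 95% negatives, similar to the imbalance described above):

```python
# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts zero
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95 -- looks great, yet the model has learned nothing
```

The baseline scores 95% accuracy while never detecting a single positive, which is exactly why accuracy is misleading on skewed data.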
You should look into the metrics f1_score, recall, and precision, which are better suited to this type of problem.
Basically, we are interested in "out of those predicted positive, how many are really positive" (precision) and "out of those that should have been predicted positive, how many are actually predicted positive" (recall).
If my definitions seem confusing, please go through the link below:
f1_score/recall/precision
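To make the definitions concrete, here is a small pure-Python sketch of these metrics (the helper function and the label lists are my own illustration, not from any library):

```python
def precision_recall_f1(y_true, y_pred):
    # Count true positives, false positives, and false negatives
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: of the predicted positives, how many are really positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the actual positives, how many were predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [0] * 95 + [1] * 5

# The all-zeros baseline scores 0 on every metric, despite 95% accuracy
print(precision_recall_f1(y_true, [0] * 100))  # (0.0, 0.0, 0.0)

# A hypothetical model that catches 3 of the 5 positives, with 1 false positive
y_pred = [0] * 94 + [1] * 4 + [0] * 2
p, r, f = precision_recall_f1(y_true, y_pred)
print(p, r)  # 0.75 0.6
```

Note how these metrics immediately expose the all-zeros baseline, which accuracy rewards. In practice you would use `sklearn.metrics.precision_score`, `recall_score`, and `f1_score` rather than writing this by hand.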
Hope this helps.