
I have user reports about an accident, and I want to know how to make sure that the number of reports is enough to treat the accident as real and not spam.

My idea is to require a minimum number of reports within a specific time interval. For example, 4 reports in 20 minutes would be enough to believe that the accident really happened.
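The rule described above ("at least N reports inside some window of T minutes") can be checked with a sliding window. This is a minimal sketch, assuming report timestamps are plain numbers in minutes; the function name `is_confirmed` and its defaults (4 reports, 20 minutes) are illustrative, not part of any existing system.

```python
from collections import deque


def is_confirmed(report_times, min_reports=4, window_minutes=20):
    """Return True if at least `min_reports` reports fall inside some window
    of `window_minutes` minutes. Timestamps are assumed to be in minutes."""
    window = deque()
    for t in sorted(report_times):
        window.append(t)
        # Drop reports that are more than `window_minutes` older than the newest one.
        while t - window[0] > window_minutes:
            window.popleft()
        if len(window) >= min_reports:
            return True
    return False


print(is_confirmed([0, 5, 12, 19]))   # four reports within 19 minutes
print(is_confirmed([0, 15, 30, 45]))  # never more than two inside 20 minutes
```

The deque keeps only the reports that are still inside the current window, so each report is added and removed at most once.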

My question is: how can I choose the minimum number of reports and the time interval? Is there another logic for making that decision? I would appreciate your answers.

Sean Owen
honeyyy
    Do you have data with true accident/spam labels? You would need some basis to decide how good a specific threshold is. Do you have other data points relating to the accident - descriptions, etc.? – raghu Apr 30 '18 at 17:51

1 Answer


You don't need a prediction model for this, except perhaps if you had data about the users themselves. Without anything else, you just need labeled data: historical reports for which you know whether the accident was real or not.

Once you have labeled data, you can follow a process like the one below, although the details still depend heavily on the kind of data you have.

Iterate over your labeled dataset and, for every combination of time window (5, 10, 15, 20, 25, 30, ... minutes) and minimum number of reporting users (1, 2, 3, 4, 5, 6, 7, ...), calculate how accurately that rule identifies real accidents.
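The iteration described above can be sketched as a simple grid evaluation. This assumes each labeled incident is stored as `(reports, is_real)`, where `reports` is a list of `(minute, user_id)` pairs; that data layout and the helper names `rule_fires` and `accuracy_grid` are hypothetical. The rule counts distinct users, so one user reporting twice is not counted twice.

```python
from collections import Counter, deque


def rule_fires(reports, min_users, window_minutes):
    """True if at least `min_users` distinct users reported within some window
    of `window_minutes` minutes. `reports` is a list of (minute, user_id)."""
    window = deque()
    users = Counter()
    for t, user in sorted(reports, key=lambda r: r[0]):
        window.append((t, user))
        users[user] += 1
        # Evict reports that fell out of the window ending at time t.
        while t - window[0][0] > window_minutes:
            _, old_user = window.popleft()
            users[old_user] -= 1
            if users[old_user] == 0:
                del users[old_user]
        if len(users) >= min_users:
            return True
    return False


def accuracy_grid(incidents, windows, user_counts):
    """incidents: list of (reports, is_real).
    Returns {(window_minutes, min_users): accuracy}."""
    grid = {}
    for w in windows:
        for k in user_counts:
            correct = sum(rule_fires(reports, k, w) == is_real
                          for reports, is_real in incidents)
            grid[(w, k)] = correct / len(incidents)
    return grid


# Tiny made-up example: one real accident and two spam cases.
incidents = [
    ([(0, "a"), (3, "b"), (7, "c")], True),
    ([(0, "x"), (40, "x")], False),
    ([(0, "p"), (25, "q")], False),
]
grid = accuracy_grid(incidents, windows=[5, 10, 30], user_counts=[1, 2, 3])
for key in sorted(grid):
    print(key, round(grid[key], 2))
```

Plain accuracy is used here because the answer mentions it; with very few real accidents in the data, precision and recall per cell may be more informative.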

This gives you a 2D matrix of accuracies. I guess acting fast on an accident is important in your case, so set an acceptable accuracy and choose the combination with the smallest interval that reaches it.
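Picking "the smallest interval above an acceptable accuracy" from such a matrix could look like the sketch below. It assumes the matrix is a dict keyed by `(window_minutes, min_users)`; the function name `pick_fastest` is hypothetical, and breaking ties within the same window by higher accuracy is an added assumption, not something the answer specifies. The numbers in the example are made up.

```python
def pick_fastest(grid, min_accuracy):
    """Return the (window_minutes, min_users) pair with the smallest window whose
    accuracy is at least `min_accuracy`. Ties on the window are broken by higher
    accuracy, then by fewer users. Returns None if nothing qualifies."""
    candidates = [(w, -acc, k) for (w, k), acc in grid.items() if acc >= min_accuracy]
    if not candidates:
        return None
    w, _, k = min(candidates)
    return w, k


# Made-up accuracies, only to show the selection logic.
grid = {(5, 2): 0.82, (10, 3): 0.91, (10, 4): 0.95, (20, 4): 0.97}
print(pick_fastest(grid, 0.90))  # smallest window reaching 90%
print(pick_fastest(grid, 0.99))  # no combination is good enough
```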

Tasos
    How is that not a predictive model? You're constructing a classifier subject to two features and picking a decision threshold relative to a cost function (risk aversion). – David Marx Mar 30 '18 at 18:14
    @DavidMarx this is simply manual machine learning. – JahKnows Oct 27 '18 at 03:31
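The comment's framing (picking the decision threshold relative to a cost function) can be made concrete by replacing "highest accuracy" with "lowest expected cost". This is a minimal sketch, assuming you have counted false positives (spam treated as an accident) and false negatives (missed real accidents) for each `(window_minutes, min_users)` cell; the function name `pick_min_cost`, the cost weights and the optional per-minute delay cost are all illustrative assumptions.

```python
def pick_min_cost(confusion, cost_fp, cost_fn, cost_per_minute=0.0):
    """confusion: {(window_minutes, min_users): (false_positives, false_negatives)}.
    Returns the pair with the lowest total cost. `cost_per_minute` optionally
    penalizes longer windows, since they delay the confirmation."""
    def total_cost(item):
        (window, _), (fp, fn) = item
        return cost_fp * fp + cost_fn * fn + cost_per_minute * window

    best_key, _ = min(confusion.items(), key=total_cost)
    return best_key


# Made-up counts, only to show how the weights change the choice.
confusion = {(10, 2): (8, 1), (10, 4): (2, 5), (20, 4): (1, 2)}
print(pick_min_cost(confusion, cost_fp=1, cost_fn=10))  # missing accidents is expensive
print(pick_min_cost(confusion, cost_fp=1, cost_fn=1))   # both errors cost the same
```

A large `cost_fn` relative to `cost_fp` expresses risk aversion toward missing real accidents, which pushes the choice toward rules that fire more easily.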