I've read the following article about how to treat outliers in a dataset: http://napitupulu-jon.appspot.com/posts/outliers-ud120.html
Basically, the author removes the points whose y values differ most from the model's predictions:
    def outlierCleaner(predictions, ages, net_worths):
        """
        Clean away the 10% of points that have the largest
        residual errors (difference between the prediction
        and the actual net worth).

        Return a list of tuples named cleaned_data where
        each tuple is of the form (age, net_worth, error).
        """
        # Squared residual per point; inputs are expected to be (n, 1) arrays
        errors = (net_worths - predictions) ** 2
        cleaned_data = zip(ages, net_worths, errors)
        # Sort by error, largest first, so the worst 10% sit at the front
        cleaned_data = sorted(cleaned_data, key=lambda x: x[2][0], reverse=True)
        limit = int(len(net_worths) * 0.1)
        # Skip the worst 10% and keep the remaining 90%
        return cleaned_data[limit:]
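To make the expected shapes concrete, here is a small self-contained sketch of how I understand the technique is used. The synthetic data, the seed, the `drop_fraction` parameter, and the compact re-implementation `outlier_cleaner_sketch` are my own assumptions for illustration and are not taken from the article; unlike the function above, it returns the kept points in ascending order of error.

```python
import numpy as np

def outlier_cleaner_sketch(predictions, ages, net_worths, drop_fraction=0.1):
    """Hypothetical variant: drop the points with the largest squared residuals."""
    errors = ((net_worths - predictions) ** 2).ravel()
    order = np.argsort(errors)  # indices sorted by error, smallest first
    n_keep = len(errors) - int(len(errors) * drop_fraction)
    keep = order[:n_keep]
    return [(ages[i, 0], net_worths[i, 0], errors[i]) for i in keep]

# Made-up data: a linear trend plus noise, stored as (n, 1) arrays
rng = np.random.default_rng(0)
ages = rng.uniform(20, 65, size=(100, 1))
net_worths = 6.25 * ages + rng.normal(0, 10, size=(100, 1))
net_worths[:5] += 500  # inject five obvious outliers

# Stand-in for the predictions of a fitted regression
predictions = 6.25 * ages

cleaned = outlier_cleaner_sketch(predictions, ages, net_worths)
print(len(cleaned))  # 90 of the 100 points are kept
```

Note that the points are ranked purely by residual size, so their original row order plays no role in which ones get dropped.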
But how can I apply this technique to a time series dataset, where the rows are correlated with each other?