I'm new to data science and I'm trying to build a simple linear regression with a single feature X (to which I add a log(X) feature before adding polynomial features) on a messy dataset, using Python and the data science stack that comes with it (NumPy, pandas, scikit-learn, ...).
Here is a snippet of my regression code using scikit-learn:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Lasso

def add_log(x):
    # Append log(x) as a second feature column
    return np.concatenate((x, np.log(x)), axis=1)

# Fetch the training set
_X = np.array(X).reshape(-1, 1)  # X = [1, 26, 45, ..., 100, ..., 8000]
_Y = np.array(Y).reshape(-1, 1)  # Y = [1206.78, 412.4, 20.8, ..., 1.34, ..., 0.034]
Y_train = _Y
X_train = add_log(_X) if use_log else _X

# Create the pipeline: scale, expand to degree-6 polynomial terms, then L1-penalized fit
steps = [
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(6)),
    ('model', Lasso(alpha=alpha, fit_intercept=True))
]
pipeline = Pipeline(steps)
pipeline.fit(X_train, Y_train)
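After fitting, I inspect the predicted curve by evaluating the pipeline on a dense grid (a minimal sketch; x_grid is just an evaluation range I picked to cover my data):

x_grid = np.linspace(1, 80_000, 1000).reshape(-1, 1)
y_grid = pipeline.predict(add_log(x_grid) if use_log else x_grid)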
My feature X ranges from 1 to ~80,000, and Y ranges from 0 to ~2M.
One thing I know about the curve I should obtain: it should always decrease, so its derivative should always be negative.
I made a little sketch to explain what I expect vs. what I have:
Therefore I would like to reward predictions whose derivative is always negative, even if my data suggest otherwise.
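Concretely, the kind of penalty I have in mind would look something like this (a hypothetical sketch; monotonicity_violation is a name I made up, not an existing scikit-learn function):

def monotonicity_violation(model, x_grid):
    # Fraction of grid steps where the predicted curve goes up;
    # 0.0 would mean the prediction always decreases, which is what I want
    y_pred = model.predict(x_grid).ravel()
    return np.mean(np.diff(y_pred) > 0)

I could use something like this to compare candidate values of alpha, but I don't see how to push the constraint into the fit itself.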
Is there a way to do that with scikit-learn?
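For instance, I know scikit-learn has IsotonicRegression, which can enforce a decreasing fit, but it gives a piecewise-linear curve rather than the smooth one I'm after (a minimal sketch, assuming _X, _Y and x_grid from above):

from sklearn.isotonic import IsotonicRegression

# A decreasing isotonic fit guarantees a non-increasing curve, but not smoothness
iso = IsotonicRegression(increasing=False, out_of_bounds='clip')
iso.fit(_X.ravel(), _Y.ravel())
y_mono = iso.predict(x_grid.ravel())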
Or maybe I'm proposing a bad solution to my problem and there is another way to obtain what I want?
Thank you