The sklearn implementation of Lasso that can force non-negative weights (as in this answer) is based on a coordinate descent algorithm. You can reimplement it yourself using, for example, a coordinate-wise Newton method; the update it performs is sketched below, followed by the code. For simplicity, I did not include an intercept in the model.
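In formulas (this only restates what the code below does), each coordinate is updated by a one-dimensional Newton step on the least-squares objective $\frac{1}{2}\|X\beta - y\|^2$ and then projected onto the non-negative orthant:

$$\beta_i \leftarrow \max\!\left(0,\; \beta_i - \frac{g_i}{H_{ii}}\right), \qquad g = X^\top(X\beta - y), \quad H = X^\top X.$$

The code below applies this update cyclically over the coordinates until the coefficients stop changing: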
import numpy as np
# generate the data
np.random.seed(1)
n = 1000
X = np.random.normal(size=(n, 5))
y = np.dot(X, [0,0,1,2,3]) + np.random.normal(size=n, scale=3)
# initial solution (with some negative weights):
beta = np.dot(np.linalg.inv(np.dot(X.transpose(),X)), np.dot(X.transpose(), y))
print(beta)
# clip the solution from below with zero
prev_beta = beta.copy()
beta = np.maximum(beta, 0)
# improve the solution by restricted coordinate-wise Newton descent
hessian = np.dot(X.transpose(), X)
while not (prev_beta == beta).all():
    prev_beta = beta.copy()
    for i in range(len(beta)):
        # full gradient of 0.5 * ||X beta - y||^2 at the current beta
        grad = np.dot(np.dot(X, beta) - y, X)
        # one-dimensional Newton step for coordinate i, projected onto [0, +inf)
        beta[i] = np.maximum(0, beta[i] - grad[i] / hessian[i, i])
print(beta)
This code will output the initial and the final betas:
[-0.01404546 -0.02633036 1.06028543 1.99696564 2.93511618]
[ 0. 0. 1.05919989 1.99673774 2.93442334]
You can see that the optimal non-negative beta differs from the OLS beta not only in the first two coefficients (which were negative and are now zero): the remaining coefficients were also adjusted slightly.
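As a sanity check (not part of the original answer), you can compare the result with SciPy's built-in non-negative least squares solver, which solves the same problem directly; this assumes SciPy is available:

# optional cross-check with scipy.optimize.nnls
from scipy.optimize import nnls
beta_nnls, residual_norm = nnls(X, y)
print(beta_nnls)  # should closely match the beta found above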