we develop a ML model and deploy but who takes care of model monitoring and re-training. Unless we do it, ur project is no more than a college assignment.
There are many techniques to incorporate concept drift/model drift.
Active approach- where retraining happens only when drift is detected in data. Passive approach- when continuous retraining happens assuming drift is always present in data. many approaches also detect warning zone along with 'if there is drift or not.'
data:image/s3,"s3://crabby-images/e8242/e8242007b5c23c776fc06e9bd0ec11e6fe836de0" alt=""
Drift Detection Method based on Hoeffding’s bounds with moving weighted average-test
# create a function to generate data
def gen_data(mean, sd ,sample_size, number_cols):
'''
sd= standard deviation
number_cols= number of. cols in output data
'''
df= np.random.normal(3, 2.5, size=(sample_size, number_cols))
return(df)
# creating n distributions
df= np.random.normal(3, 0, size=(30, 3))
for i in range(30):
df2= np.random.normal(3, i+1, size=(30, 3))
df= np.vstack([df, df2])
# Plotting data
import matplotlib.pyplot as plt
plt.plot(df[:,0])
plt.show() # plotting for 1 col only
data:image/s3,"s3://crabby-images/b1f83/b1f836aac3c8351bd0c789810e39bf1cdac74d86" alt=""
# test by Hoeffding bound method of drify detection
from skmultiflow.drift_detection.hddm_w import HDDM_W
hddm_w = HDDM_W()
for i in range(len(df)):
hddm_w.add_element(df[i,0])
if hddm_w.detected_warning_zone():
print('Warning zone has been detected in data: ' + str(df[i]) + ' - of index: ' + str(i))
if hddm_w.detected_change():
print('Change has been detected in data: ' + str(df[i]) + ' - of index: ' + str(i))
# output of drift detection-
data:image/s3,"s3://crabby-images/170b6/170b631b5161978916e2d5e4f0488ad013e8da6a" alt=""
fitting Adaptive Random Forest for forecasting.
# Imports
from skmultiflow.data import RegressionGenerator
from skmultiflow.meta import AdaptiveRandomForestRegressor
import numpy as np
# fitting the model for 200 data points
# Setup the Adaptive Random Forest regressor
arf_reg = AdaptiveRandomForestRegressor(random_state=123456)
# Run test-then-train loop for max_samples and while there is data
y_pred=[]
for i in range(200):
X=df[i,0:2].reshape(1,2)
y=df[i,2].reshape(1)
y_pred.append(arf_reg.predict(X)[0])
arf_reg.partial_fit(X, y)
# checking the model performance
y_true = list(df[:200,2].reshape(200,))
# Display results
print('Adaptive Random Forest regressor example')
print('Mean absolute error: {}'.format(np.mean(np.abs(np.array(y_true) - np.array(y_pred)))))
print('Mean absolute error: {}'.format(np.mean(np.abs(y_true - y_pred))))
data:image/s3,"s3://crabby-images/34c41/34c41972253d7dd2fbc0e9aadb6ce09911f839f4" alt=""
# Visualise the performance
y_true= np.array(y_true).reshape(200,)
data_for_plot= np.abs(y_true - y_pred)
plt.plot(data_for_plot)
plt.title('performace on continious model retraining')
plt.ylabel('error-prediction')
plt.show()
data:image/s3,"s3://crabby-images/69c9f/69c9fe127b22faed06321a9a15c66ef63370ca5d" alt=""
references-
file:///Users/apple/Downloads/smartcities-04-00021-v3.pdf
Comentarios