What is MLflow?


MLflow is an open-source platform designed for managing machine learning experiments systematically. It provides four core components.

1) MLflow Tracking

Records and visualizes the hyperparameters, metrics, model weights, images, and other artifacts produced during model training.

2) MLflow Projects

Packages code together with its environment in a standard layout, which makes runs easier to reproduce.

3) MLflow Models

Saves trained models in a variety of formats so they can be loaded later and deployed with little effort.

4) MLflow Model Registry

Manages model versions and tracks the stages a model passes through before and after deployment.

With these components, MLflow automates much of the bookkeeping around experiments and makes it easy to compare runs systematically.

Like wandb, MLflow is primarily a tool for tracking experiments.

MLflow example code

import mlflow
import mlflow.sklearn
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Load the data and hold out 20% for testing
data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Select (or create) the MLflow experiment that the runs will be grouped under
mlflow.set_experiment("RandomForest Experiment")

# Hyperparameter grid: every combination becomes one run
n_estimators_list = [10, 50, 100, 200]
max_depth_list = [3, 5, 10, None]

for n_estimators in n_estimators_list:
    for max_depth in max_depth_list:
        with mlflow.start_run():
            # Train the model
            model = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
            model.fit(X_train, y_train)

            # Predict and evaluate on the test split
            y_pred = model.predict(X_test)
            mse = mean_squared_error(y_test, y_pred)
            r2 = r2_score(y_test, y_pred)

            # Log parameters and metrics to MLflow
            mlflow.log_param("n_estimators", n_estimators)
            mlflow.log_param("max_depth", max_depth)
            mlflow.log_metric("mse", mse)
            mlflow.log_metric("r2_score", r2)

            # Save the trained model as a run artifact
            mlflow.sklearn.log_model(model, "random_forest_model")

            # Create the actual-vs-predicted scatter plot and save it to disk
            plt.figure(figsize=(6, 4))
            plt.scatter(y_test, y_pred, alpha=0.6, color="blue")
            plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--r', linewidth=2)
            plt.xlabel("Actual")
            plt.ylabel("Predicted")
            plt.title(f"Prediction Scatter Plot (n={n_estimators}, depth={max_depth})")
            plt.savefig("scatter_plot.png")
            plt.close()

            # Attach the plot to the run (the local file is overwritten on each iteration)
            mlflow.log_artifact("scatter_plot.png")

            print(f"Logged: n_estimators={n_estimators}, max_depth={max_depth}, MSE={mse:.4f}, R2={r2:.4f}")

After running the code above, you can inspect the results in MLflow. Launch the UI from the same directory in which the script was run:

mlflow ui

Then open http://127.0.0.1:5000 in a browser to see the hyperparameters, evaluation metrics, and plot (the scatter plot) for each run.

(Screenshot: the MLflow UI as it appears after connecting.)

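MLflow is a great help in managing machine learning experiments efficiently. During hyperparameter tuning in particular, it makes the performance of each run easy to compare, and the saved models can be carried through to deployment.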

Going forward, using MLflow should noticeably improve the reproducibility and efficiency of machine learning projects.

That's all.

Thank you.
