Machine Learning Operations Overview, Definition and Architecture

JUST WRITE

천재보단범재 2023. 2. 20. 20:42

1. Why this paper?

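I started my career as a developer of a data governance management solution.

Working with data every day, I naturally became deeply interested in it, set my sights on becoming a data engineer, and eventually changed jobs.

MLOps, which I first encountered after the move, felt like a huge and unfamiliar mountain.

This paper, which my team lead introduced to me, was a great help.

In this post I want to go back over it and organize what I know about MLOps.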

2. Summary

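The paper covers MLOps across the board, from its definition to its architecture.

The definition of MLOps
The roles of the people involved in MLOps
MLOps Architecture
An introduction to MLOps-related tools

It is a paper that lets you survey MLOps from A to Z.

It also includes expert interviews and a review of related tools.

There is too much to cover everything, so I will summarize the key parts.

If you are interested in MLOps, I strongly recommend reading it.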

3. Detail

3-1. DevOps

The paper introduces MLOps as being built on DevOps.

DevOps is one of the paradigms development organizations use to solve their problems.

It is a methodology for narrowing the gap between development and operations and maximizing collaboration and communication.

It calls for the following.

CI/CD (Continuous Integration, Continuous Delivery, and Continuous Deployment)
Continuous testing, monitoring, logging, and feedback
Guaranteed product quality

DevOps-related products include the following.

Collaboration and Knowledge Sharing -> Slack, Trello
Source Code Management -> GitHub, GitLab
Build Process -> Maven
Continuous Integration -> Jenkins, GitLab CI
Deployment Automation -> Docker, Kubernetes
Monitoring and Logging -> Prometheus, Logstash

Building on these advances in DevOps, the same practices are now being applied to automating ML (Machine Learning).

3-2. Components

The paper describes MLOps through the following nine items, which it calls principles.

CI/CD Automation
- Continuous Integration, Continuous Delivery, Continuous Deployment
- build -> test -> delivery -> deploy
- Faster feedback and release cycles improve productivity
Workflow Orchestration
- Coordinates the tasks of the ML workflow pipeline
- DAG -> runs tasks in order, taking their relationships and dependencies into account
Reproducibility
- Reproduce an ML experiment
Versioning
- Versioning of data, model, and code
Collaboration
- Collaboration on data, model, and code
- Reduces domain silos between different roles
Continuous ML training & evaluation
- ML model retraining -> based on new feature data
- Achieved through monitoring and an automated workflow pipeline
ML Metadata tracking/logging
- Metadata must be tracked for each training job
- e.g. training time, parameters, resulting metrics, data, code
Continuous monitoring
- Monitoring of model and model serving performance
Feedback loops
1. A loop within the development or engineering process (e.g. feature engineering) to ensure quality
2. A loop from monitoring
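To make the Workflow Orchestration item above more concrete, here is a minimal sketch of ordering tasks by their dependencies in a DAG. It uses only Python's standard `graphlib` module; the task names are hypothetical and are not taken from the paper, and real orchestrators such as Airflow add scheduling, retries, and much more on top of this idea.

```python
from graphlib import TopologicalSorter

# Hypothetical task names for illustration; each task maps to the set of
# tasks that must finish before it can start.
dependencies = {
    "extract_features": set(),
    "validate_data": {"extract_features"},
    "train_model": {"validate_data"},
    "evaluate_model": {"train_model", "validate_data"},
    "register_model": {"evaluate_model"},
}


def execution_order(deps):
    """Return one valid run order for the tasks in a dependency DAG."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Every task appears after all of the tasks it depends on, which is the property an orchestrator relies on when it decides what to run next.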

The paper also lists tools for each component, as follows.

Component	Role	Tools
CI/CD Component	build, test, delivery, deploy	Jenkins, GitHub Actions
Source Code Repository	code storing and versioning	Bitbucket, GitLab, GitHub, Gitea
Workflow Orchestration Component	workflow orchestration	Airflow, Kubeflow Pipelines, Luigi, AWS SageMaker Pipelines, Azure Pipelines
Feature Store System	central storage of used features	Google Feast, Amazon AWS Feature Store, Tecton, Hopsworks
Model Training Infrastructure	provides computing resources	Kubernetes, Red Hat OpenShift
Model Registry	central storage of the trained ML models and their metadata	MLflow, Azure Model Registry, Neptune, simple storage (Azure Storage, Google Cloud Storage, AWS S3)
ML Metadata Stores	metadata store for tracking various kinds of metadata	Kubeflow Pipelines, AWS SageMaker Pipelines, Azure ML, IBM Watson Studio, MLflow
Model Serving Component	prediction service (e.g. REST API)	KServing (Kubeflow), TensorFlow Serving, Seldon.io Serving, Azure ML REST API, AWS SageMaker Endpoints, IBM Watson Studio, Google Vertex AI
Monitoring Component	continuous monitoring of the model serving performance	Prometheus, Grafana, ELK Stack, TensorBoard, Kubeflow, MLflow, AWS SageMaker Model Monitor

3-3. Role

To understand MLOps well, you need to know which stakeholders are involved.

This is because MLOps touches many different fields.

The paper summarizes the role each stakeholder plays as follows.

Business Stakeholder
- Defines the business goal to be achieved with ML.
- Product Owner, Project Manager
Solution Architect
- Designs the architecture and decides which technologies the solution will use
Data Scientist
- Translates the business problem into an ML problem
- Model engineering (algorithm selection, hyperparameter tuning)
Data Engineer
- Designs and operates the data pipeline and the feature engineering pipeline
- Ingests data into the feature store
Software Engineer
- Applies the solution to the ML problem to the product
DevOps Engineer
- Sets up appropriate CI/CD
ML Engineer
- Operates the ML workflow pipeline and monitors the ML infrastructure
- A cross-domain role combining several others (Data Scientist, Data Engineer, DevOps Engineer)

3-4. Architecture and Workflow

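The paper's architecture figure shows the MLOps workflow at a glance.

Let's walk through it step by step, starting from the top of the figure.

3-4-1. Starting an MLOps Project

The Business Stakeholder identifies a potential business problem that ML can solve.

The Data Scientist derives an ML problem from that business problem.

Then, together with the Data Engineer, the Data Scientist works out what data the ML problem requires.

They also discuss which data sources and what data preprocessing are needed.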

3-4-2. Feature Engineering Pipeline

Training a model requires features, not raw data.

A Feature Engineering Pipeline is built to extract the required features.

First, rules are defined for how to preprocess the data (transformation, cleaning) and which features to extract.

Data is pulled from the data source, preprocessed, and then run through feature engineering.

The preprocessing and feature engineering rules are refined through continuous feedback.

The extracted features are stored in the feature store.
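The steps above (clean, extract features, store) can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the sample rows, the cleaning rule, the feature rule, and the use of a dict in place of a real feature store such as Feast.

```python
from statistics import mean

# Hypothetical raw sensor readings; None marks a missing value.
raw_rows = [
    {"device": "a", "reading": 1.0},
    {"device": "a", "reading": None},
    {"device": "a", "reading": 3.0},
    {"device": "b", "reading": 10.0},
]


def clean(rows):
    """Cleaning rule (assumed): drop rows whose reading is missing."""
    return [row for row in rows if row["reading"] is not None]


def extract_features(rows):
    """Feature rule (assumed): per-device mean and count of readings."""
    by_device = {}
    for row in rows:
        by_device.setdefault(row["device"], []).append(row["reading"])
    return {
        device: {"mean_reading": mean(values), "n_readings": len(values)}
        for device, values in by_device.items()
    }


# A plain dict stands in for the feature store in this sketch.
feature_store = {}
feature_store.update(extract_features(clean(raw_rows)))
print(feature_store)
```

Keeping the cleaning rule and the feature rule as separate functions makes it easier to change one of them when feedback from later stages arrives.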

3-4-3. Experiments

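The Data Scientist plays the key role in this flow.

The Data Scientist analyzes the data in the feature store and, when necessary, the incoming raw data as well.

After the analysis, any additional data needed is requested through the feedback loop.

The data to be used for training is validated and then split into train and test sets.

The Data Scientist evaluates candidate algorithms to find the best one and improves model performance through hyperparameter tuning.

After analyzing the metrics, the model trained with the best algorithm and hyperparameters is saved.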
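As a rough sketch of the experiment loop described above (validate, split, tune, pick the best), the example below uses only the standard library. The toy data, the validation rule, and the tiny k-nearest-neighbours model are all assumptions for illustration. It scores each setting directly on the held-out split to stay short, whereas a real experiment would usually tune on a separate validation split or with cross-validation.

```python
import random


def validate(rows):
    """Validation rule (assumed): keep rows with a numeric x and a 0/1 label."""
    return [
        (x, y) for x, y in rows
        if isinstance(x, (int, float)) and y in (0, 1)
    ]


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle deterministically and split into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Toy data (made up): label is 1 when x >= 10; one malformed row is included.
rows = [(x, int(x >= 10)) for x in range(20)] + [("bad", 1)]
train, test = train_test_split(validate(rows))

# Grid search over the hyperparameter k, keeping the best-scoring setting.
results = {k: accuracy(train, test, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
print(best_k, results[best_k])
```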

This ML workflow needs support from a DevOps Engineer or an ML Engineer.

3-4-4. Automated ML workflow pipeline

The DevOps Engineer or ML Engineer builds and operates the automated ML workflow pipeline.

They manage the infrastructure needed for model training (e.g. hardware or a Kubernetes environment).

The ML workflow pipeline needs the following components.

artifact store
isolated environment
workflow orchestration

The pipeline starts by automatically pulling versioned features from the feature store.

The subsequent data validation and train/test split are also automated.

The model is trained on the newly pulled data (features), building on the previously trained model.

Algorithm analysis and hyperparameter tuning are also carried out before training.

The pipeline is set up so that training repeats automatically until the best-performing model is obtained.

The data produced during training (parameters, metrics, etc.) is stored in the ML metadata store.

This information is called model lineage, and it also includes the features and training code used for training.
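A minimal sketch of what one such lineage record could look like is shown below. The field names, the in-memory list used as the metadata store, and all values are hypothetical; tools such as MLflow provide their own tracking APIs for this.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field


def fingerprint(payload: bytes) -> str:
    """Short content hash used to pin the exact data or code that was used."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class TrainingRun:
    """One record of a (hypothetical) ML metadata store."""
    run_id: str
    params: dict
    metrics: dict
    feature_version: str
    code_hash: str
    started_at: float = field(default_factory=time.time)


# An in-memory list stands in for the metadata store.
metadata_store = []


def log_run(run: TrainingRun) -> None:
    """Append the run to the store as a plain dict."""
    metadata_store.append(asdict(run))


# All values below are made up for illustration.
log_run(TrainingRun(
    run_id="run-001",
    params={"k": 3},
    metrics={"accuracy": 0.9},
    feature_version="v1",
    code_hash=fingerprint(b"def train(): ..."),
))
print(json.dumps(metadata_store[0], indent=2))
```

Storing the feature version and a hash of the training code next to the parameters and metrics is what lets a model be traced back to what produced it.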

Once the best model has been trained, the model serving build and test are run before it is applied to the product.

A model serving application is usually packaged as a container or exposed as a REST API.
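To illustrate the REST API style of serving, here is a small sketch built on Python's standard `http.server`. The `/predict` path, the `features` JSON key, and the stand-in model are assumptions for illustration only; a production service would more likely use one of the serving tools listed earlier.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in model (assumed): flags inputs whose mean is at least 0.5."""
    return int(sum(features) / len(features) >= 0.5)


def handle_predict(body: bytes):
    """Turn a JSON request body into (status code, JSON response body)."""
    try:
        features = json.loads(body)["features"]
        result = {"prediction": predict(features)}
        return 200, json.dumps(result).encode()
    except (ValueError, KeyError, TypeError, ZeroDivisionError):
        return 400, json.dumps({"error": "bad request"}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the hypothetical /predict endpoint is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the server; call this explicitly, it blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping the request logic in `handle_predict` separate from the HTTP handler means it can be tested during the serving build without starting a server.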

3-4-5. Monitoring

The model serving performance and the infrastructure are monitored.

Monitoring runs in real time so that predictions can be improved quickly.

The monitoring component configures the trigger (scheduler) for the feature pipeline and the ML pipeline.

If drift occurs, the automated ML workflow pipeline is triggered so that the model can be retrained.
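The drift-then-retrain trigger can be sketched as follows. The drift test here is a simple mean-shift check picked for illustration, not the method from the paper, and the threshold, sample values, and `retrain` callback are all assumptions.

```python
from statistics import mean, stdev


def has_drifted(reference, live, z_threshold=3.0):
    """Flag drift when the live mean is far from the reference mean.

    The distance is measured in standard errors of the live-window mean,
    using the spread of the reference data.
    """
    standard_error = stdev(reference) / len(live) ** 0.5
    if standard_error == 0:
        return mean(live) != mean(reference)
    return abs(mean(live) - mean(reference)) / standard_error > z_threshold


def monitor(reference, live, retrain):
    """Call the retraining trigger when drift is detected."""
    if has_drifted(reference, live):
        retrain()
        return True
    return False


# Made-up feature values: the second live window has clearly shifted.
reference = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
triggered = []
monitor(reference, [1.0, 1.1, 0.9], lambda: triggered.append("stable"))
monitor(reference, [3.0, 3.2, 2.9], lambda: triggered.append("shifted"))
print(triggered)
```

In a real setup the `retrain` callback would start the automated ML workflow pipeline instead of appending to a list.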

4. Wrap-up

The paper defines MLOps at the end, and I think this one sentence captures it well.

MLOps is aimed at productionizing machine learning systems by bridging the gap between development (Dev) and operations (Ops).

It made me feel that MLOps is essential for putting machine learning into products.

Thanks to this paper, I was able to take one step closer to MLOps.

I doubt there is another write-up that covers MLOps from A to Z, tools included, as well as this paper does.

If you are interested in MLOps, I strongly recommend reading it.

[Paper link]

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

The final goal of all industrial machine learning (ML) projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize ML products and thus many ML endeavors fail to deliver on the

arxiv.org

License: Attribution-NonCommercial
