AIOps Engineer (2-Year Contract)

Data

Contract

About Us

Do you want to be part of Thailand's banking transformation? Data is at the core of the new financial services era, and we are offering you the opportunity to help drive this change from the core.

SCB DATAx is a new venture of the Siam Commercial Bank (SCB) holdings, a leading financial services and digital services holding group in Thailand and ASEAN. As part of the transformation of SCB into a group of product and technology companies under the SCBx brand, SCB DATAx is the technology company that centralizes data and provides AI and data science services and products to the group. With a leading-edge, cloud-native data & AI platform, our vision is to support the group in providing everyone in our region with the opportunity to prosper.

We work on forward-thinking challenges of centralizing, analyzing, and sharing information. We collaborate with companies and experts in many different domains and embrace diversity, all while having a good laugh and finding joy in our work.

Discover job openings on our career page. To apply, email us with the role's title as the subject, attach your CV, and include your contact information. We're eager to learn more about you.

Qualifications

Technical Skills

Strong Python programming skills with experience in production-grade development practices, including writing maintainable, scalable, and well-structured code. Knowledge of TypeScript or Go is a plus.
Developer mindset with solid understanding of backend development, microservices architecture, and RESTful API design.
Experience building automation workflows and data pipelines using Python, ETL processes, Bash scripting, and scheduled jobs (e.g., CronJobs).
Hands-on experience with cloud platforms (Azure) including Azure AI Services, Azure AI Foundry, and related AI infrastructure.
Experience with containerization and orchestration technologies, such as Docker and Kubernetes, for deploying scalable AI services.
Good understanding of Agentic AI architectures, AI Agents, and modern LLM frameworks such as LangChain, LangGraph, and AI SDK.
Ability to design and implement AI agents using:
- Pro-code frameworks (e.g., Agent SDK)
- Low-code / no-code solutions (e.g., Azure AI Foundry Agents)

AI Observability & Monitoring

Experience implementing AI observability and monitoring systems for LLM-based applications and AI agents.
Hands-on experience with AI observability tools such as LangFuse, and logging platforms such as Azure Log Analytics.
Experience building monitoring dashboards and system telemetry using Grafana.
Strong understanding of AI system logging, tracing, and performance monitoring for production AI systems.
Familiarity with LLM evaluation techniques, including LLM-as-a-Judge frameworks to measure agent quality and response performance.
Understanding of AI evaluation metrics, including:
- model accuracy and response quality
- latency and throughput
- reliability and system health
- token usage and cost efficiency

Operational Skills

Experience with monitoring and observability platforms such as Prometheus, Grafana, and ELK Stack.
Ability to design operational dashboards to track AI agent KPIs, system health, and service reliability.
Understanding of model drift detection, data quality monitoring, and AI system observability practices.
Strong communication and collaboration skills to work with AI engineers, risk teams, product teams, and business stakeholders.

Responsibilities

Design and implement AI observability frameworks to monitor the performance, reliability, and behavior of AI agents and LLM-based systems in production environments.
Integrate AI observability platforms with third-party systems (e.g., SAS solutions) for governance, compliance monitoring, and operational reporting, using ETL pipelines and data integration workflows.
Build predictive analytics and automation frameworks to support AI system operations and operational decision-making.
Develop real-time monitoring dashboards and operational analytics tools for AI systems, including capabilities such as:
- anomaly detection
- predictive forecasting
- incident monitoring and alerting
- root cause analysis for system failures or abnormal agent behavior
Define and monitor AI agent KPIs and performance metrics, including quality evaluation using LLM-as-a-Judge approaches.
Collaborate with AI Engineers, AI Scientists, and Risk teams to define:
- experiment metrics
- evaluation frameworks
- continuous monitoring strategies for AI systems
Work closely with Product teams, customers, and stakeholders to define business-level KPIs for AI agents and measure their impact on business outcomes.
Feed operational insights, monitoring results, and evaluation metrics back into the AI development lifecycle to drive continuous improvement of models and agents.
Manage AI platform operations, including platform upgrades, governance compliance, and SLA monitoring for production AI services.
Design and maintain data pipelines and operational data infrastructure to ensure standardized, clean, and reliable data for analytics, monitoring, and reporting.
Collaborate with DevOps, SRE, and IT teams on tooling, infrastructure, CI/CD pipelines, and deployment processes to ensure reliable AI system delivery.
