MLOps Engineer (Edge AI Specialist)

Casablanca, Casablanca-Settat, Morocco

Introduction to the position

Join Haldorix, a startup studio turning industrial challenges into scalable ventures powered by AI.

We’re building NITRA, an intelligent industrial vision system revolutionizing real-time monitoring in textile production. As we scale to more sites, we’re transitioning from cloud-based processing to Edge AI to optimize performance, latency, and cost.

We’re looking for an MLOps Engineer (Edge AI Specialist) to lead the design and deployment of our on-premise inference infrastructure. You’ll play a pivotal role in enabling our deep learning and generative models (YOLOv8, Stable Diffusion, BERT) to run efficiently on embedded GPU hardware, delivering reliable, low-latency insights directly on the factory floor.

Your role

- Architecture & Infrastructure: Design and deploy a hybrid edge/cloud architecture optimized for real-time video analytics. Define hardware specs (Jetson Orin, RTX A2000, Intel NUC) and ensure reliable communication between edge servers and the cloud.
- Model Optimization: Convert and optimize deep learning models for embedded GPUs using ONNX Runtime and TensorRT. Apply quantization (INT8, FP16) and pruning techniques to reduce latency and memory footprint.
- MLOps Pipeline: Build and maintain a CI/CD pipeline tailored for edge deployment: containerized models, version control, automated OTA updates, and proactive performance monitoring.
- Orchestration & Deployment: Deploy and manage fleets of edge servers using K3s/MicroK8s. Implement declarative deployments (ArgoCD/Flux) and centralized management via KubeEdge or AWS IoT Greengrass.
- Security & Compliance: Enforce full data locality, end-to-end encryption (TLS/mTLS), and anonymization pipelines to ensure GDPR compliance.
- Monitoring & Reliability: Set up comprehensive dashboards (Prometheus, Grafana, Loki) to track inference performance, GPU utilization, and uptime (>99%).
- LLM Integration: Support deployment of a centralized LLM server (Claude, GPT-4, or open-source) powering RAG-based analytics and real-time conversational interfaces for clients.
- Field Operations: Conduct on-site installations, validations, and troubleshooting sessions with client teams. Train local technicians and maintain up-to-date documentation for reproducibility and scalability.

Your team

You’ll join a multidisciplinary engineering team focused on bringing real-time AI to industrial environments. Collaborating closely with computer vision, backend, and infrastructure engineers, you’ll report to the Technical Lead overseeing deployment strategy.

Our culture values autonomy, precision, and hands-on problem solving. Every team member contributes to the full lifecycle, from architecture to on-site deployment.

Your qualifications

Required:
- 3–5 years of experience deploying AI models in production
- Strong expertise in MLOps, edge computing, and embedded GPU environments
- Proven track record with TensorRT, ONNX Runtime, quantization (INT8/FP16), and model pruning
- Proficiency in Python (PyTorch, TensorFlow, FastAPI) and DevOps tools (Docker, CI/CD, Ansible)
- Solid understanding of Kubernetes/K3s, networking, and Linux administration
- Experience with Prometheus, Grafana, and GPU performance profiling
- Excellent documentation and troubleshooting skills

Nice to Have:
- Familiarity with NVIDIA Jetson and other embedded AI hardware
- Experience with fleet management systems (AWS IoT Greengrass, KubeEdge, Balena)
- Knowledge of Stable Diffusion and LLM pipelines (RAG, Pinecone, Weaviate, ChromaDB)
- Background in industrial computer vision, IoT, or real-time systems
- Understanding of GDPR compliance and data anonymization for on-prem AI systems

Benefits

- Join a startup studio scaling high-impact AI ventures from prototype to production
- Work on cutting-edge Edge AI systems deployed across industrial sites
- Collaborate with an agile, expert team blending AI, hardware, and DevOps engineering
- Gain hands-on experience with inference optimization, GPU benchmarking, and large-scale orchestration
- Be part of a project delivering tangible cost and performance breakthroughs in manufacturing AI

Recruitment process

Jobzyn AI interview (25–45 min)

Technical interview (1h) with the Lead Developer or Technical Architect

Practical test (2–3h) simulating a real-world MLOps deployment case

Final interview with the NITRA team and Haldorix partners
