
Machine Learning Operations Engineer

The Associated Press
life insurance, parental leave, sick time
United States, New York, New York
200 Liberty Street
May 28, 2026

Date: May 28, 2026

Location:
New York, NY, US, 10281

Company:
Associated Press

The Associated Press is an independent global news organization dedicated to factual reporting. Founded in 1846, AP today remains the most trusted source of fast, accurate, unbiased news in all formats and the essential provider of the technology and services vital to the news business. More than half the world's population sees AP journalism every day.

Why this role matters:
Partnering with Machine Learning Engineers, Data Scientists, and Platform Engineering, the Machine Learning Operations Engineer owns the production lifecycle of machine learning systems at AP. This role is responsible for deploying, operating, scaling, monitoring, and governing ML workloads so they run reliably, securely, and cost-effectively in production.

The Machine Learning Operations Engineer ensures that models and inference pipelines built by ML Engineers can be safely promoted across Dev, QA, and Prod, meet operational SLAs, and evolve without introducing instability or uncontrolled cost.

This is an individual contributor production operations role, focused on runtime behavior, infrastructure, and reliability. It reports directly to our Director, Application Operations.

What you will do:

  • Design, deploy, and operate end-to-end production ML pipelines across Dev, QA, and Prod environments.
  • Set up and manage AWS SageMaker pipelines, endpoints, and monitoring for large-scale inference workloads, including embedding generation, named entity recognition, reranking, and video processing.
  • Own GPU and CPU infrastructure selection, scaling, and optimization, including instance benchmarking, autoscaling behavior, and load testing.
  • Deploy, monitor, and operate inference services that support hundreds of thousands of queries per day across text, image, and video pipelines.
  • Establish standardized ML deployment patterns at AP, including:
    • Containerization and orchestration strategies
    • Environment isolation (Dev / QA / Prod)
    • Versioned promotion, rollback, and recovery mechanisms
  • Implement monitoring, alerting, drift detection, and evaluation metrics for production ML systems, tracking latency, error rates, throughput, and model/data drift.
  • Enable A/B testing and controlled rollout strategies for ML models in production, in partnership with engineering and product teams.
  • Partner closely with ML Engineers, Data Scientists, DevOps, and Platform teams to:
    • Operationalize new models and pipeline improvements
    • Promote systems across environments safely
    • Ensure deployments meet reliability, scale, and cost targets
  • Manage high-throughput I/O and data movement for large collections of media assets (text, images, video), avoiding CPU, network, and storage bottlenecks.
  • Reduce operational risk by enforcing reproducibility, observability, security, and cost controls across all production ML systems.



Who you are:

  • 5+ years of experience deploying and operating ML inference systems in production.
  • Strong experience with AWS SageMaker, including pipelines, endpoints, monitoring, and multi-environment deployments.
  • Expertise deploying ML models using PyTorch and TensorFlow from an operational and serving perspective.
  • Proven experience with model deployment and orchestration, including containerized inference and autoscaling.
  • Experience selecting, evaluating, and optimizing compute resources (GPU/CPU) for production ML workloads.
  • Experience setting up monitoring, evaluation metrics, and A/B testing frameworks for ML systems in production.
  • Ability to collaborate effectively with ML Engineers, Data Scientists, and platform teams in a shared ownership model.



What will set you apart:

  • Operational experience supporting ML systems involving:
    • Transformer-based NLP models (e.g., BERT-family models)
    • Computer vision models
    • Ranking and reranking systems
  • Familiarity with running systems that use common ML model types, such as:
    • Convolutional and feedforward neural networks
    • Ranking algorithms
    • Approximate Nearest Neighbor methods (e.g., HNSW)
  • Experience running ML workloads over large-scale text, image, and video datasets.



Why join us:

  • A mission-driven, inclusive environment focused on both individual and collective success.
  • Opportunities for professional development to help you reach your career goals.
  • Access to tools, mentorship, and resources tailored to elevate your proficiency and contributions.



Salary & Benefits:

The anticipated salary range for this position is $125,000 - $155,000, based on a candidate's skills, qualifications and location. The Associated Press offers comprehensive benefits, which include:

  • Competitive medical, dental and vision coverage
  • Retirement benefits
  • Company-paid life insurance
  • Paid vacation and sick days
  • Paid parental leave for any new parent
  • Mental well-being resources





AP seeks to build an inclusive organization grounded in respect for differences. We support all aspects of diversity and provide equal employment opportunities to all employees and applicants without regard to race, color, religion, sex, marital status, national origin, age, sexual orientation, gender identity, disability, status as a veteran, or other characteristic protected by law.






Nearest Major Market: New York City



Job Segment:
Operations Manager, Quality Assurance, Learning, Operations, Technology, Human Resources
