Description
BN31P2- Data Scientist CAS Columbus, OH
CAS uses unparalleled scientific content, specialized technology and unmatched human expertise to help R&D organizations across Commercial, Government and Academic sectors create groundbreaking innovations that benefit the world. As the Scientific Information Solutions Division of the American Chemical Society, CAS manages the largest curated reservoir of scientific knowledge, and for 119 years, has helped innovators mine, assess and apply that information to keep businesses thriving. The CAS team is global, diverse, and endlessly curious, and strives to make actionable scientific insights accessible to innovators worldwide.
CAS is currently seeking a Data Scientist. This position will be located in our headquarters in Columbus, Ohio.
Job Summary
This is an individual contributor role on the Data Analytics and Insights (DAI) team, which builds the AI that powers CAS's scientific information products. As a data scientist, you will build the models, agents, and AI-powered features that ship in those products, working alongside data engineers, product managers, and scientific domain experts.
This role requires dual-domain expertise: you must be both a capable software engineer and formally trained in a scientific discipline relevant to CAS's customers (chemistry, life sciences, materials science, or a related field). DAI builds AI that is grounded in CAS's scientific content rather than layered on top of it, so the people who build it need to read the science as well as write the code.
You will work on features such as the Newton research assistants in SciFinder and BioFinder, the natural-language query agent in IPFinder, predictive models that ship inside products (for example property, toxicity, or biologic developability prediction), and AI-assisted content curation pipelines, using your scientific training to judge whether their outputs are correct and useful to working scientists.
Success in this role requires strong programming fundamentals, formal scientific training, a genuine interest in agentic AI, and a collaborative approach to building reliable, production-grade systems. Candidates who bring only software engineering or only scientific training are not a fit for this position.
Job Accountability
- Develop and deploy agent-based workflows, retrieval-augmented generation (RAG) pipelines, and LLM-powered features within CAS products using Python, LangGraph/LangChain, and related tools.
- Develop and deploy machine learning models that ship inside CAS products (for example property, toxicity, or biologic developability prediction), using your scientific training to inform feature design and to validate model behavior.
- Ground agentic and predictive features in CAS's authoritative scientific content, and use your scientific training to judge whether system outputs are accurate, defensible, and useful to researchers.
- Engineer for production, not for demos: deliver clean, tested, well-documented software with appropriate safeguards such as content safety guardrails, entitlement-based access control, evaluation, and model fallback, following team engineering standards (unit, integration, and end-to-end tests, containerized development, and CI/CD practices).
- Work with product managers, scientific domain experts, data engineers, and other teams to translate scientific and research requirements into working technical solutions.
- Contribute to evaluation frameworks for GenAI systems, focusing on accuracy, reliability, scientific correctness, and continuous improvement of agentic features.
- Build and operate solutions on cloud infrastructure (AWS, Azure, GCP, or similar), supporting the production deployment and ongoing operations of AI systems.
- Use agentic software development tools such as Claude Code as a core part of the daily workflow to accelerate delivery and maintain quality.
- Contribute to DAI's innovation and competitive-intelligence work by prototyping emerging AI capabilities and running structured experiments that test where AI can replicate or extend CAS capabilities.
- Stay current with advances in generative AI and agentic architectures, and share knowledge with the team through demos, documentation, and internal learning sessions.
Required Qualifications
This role requires expertise in two domains. Both the scientific and the software engineering qualifications below are required, not alternatives.
Scientific domain
- Bachelor's degree or higher in a scientific discipline relevant to CAS's customer domains (chemistry, life sciences, materials science, or a closely related field). This scientific training is a gating requirement for the role, not a preference.
- The ability to read scientific content in your field and reason about it accurately enough to evaluate the outputs of AI systems that serve working scientists (for example, recognizing when a predicted property or a generated answer is wrong).
Software engineering and AI domain
- 2+ years of professional experience in software engineering, applied machine learning, or data science, with demonstrated delivery of working systems.
- Strong Python programming skills with solid software engineering fundamentals (version control, testing, code review, documentation).
- Hands-on experience building applications with large language models, including agentic workflows, RAG, prompt engineering, or similar patterns.
- Experience with or strong interest in agentic AI frameworks such as LangGraph, LangChain, or equivalent platforms.
- Proficiency with agentic software development tools (e.g., Claude Code or similar AI-assisted development environments).
- Experience deploying and operating software in at least one cloud environment (AWS, Azure, or GCP).
Both domains
- Strong written and verbal communication skills, with the ability to work effectively in cross-functional teams that include both engineers and scientists.
Desired but Not Required
- Graduate degree (M.S. or Ph.D.) in a scientific discipline relevant to CAS's customer domains, giving deeper command of the science behind the products.
- Coursework, a second degree, or equivalent experience in Computer Science, Engineering, or Statistics that complements the scientific training.
- Experience building and deploying predictive ML models in a scientific or regulated domain.
- Experience with cloud AI/ML services and model hosting platforms.
- Familiarity with containerized development workflows and API design.
- Experience with evaluation frameworks for AI/ML systems, including accuracy, reliability, and cost optimization.
CAS offers a competitive salary and comprehensive benefits package, including a generous vacation plan; medical, dental, and vision insurance plans; and employee savings and retirement plans.
Candidates for this position must be authorized to work in the United States and not require work authorization sponsorship by our company for this position now or in the future.
EEO/Disabled/Veteran
Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities
This employer is required to notify all applicants of their rights pursuant to federal employment laws. For further information, please review the Know Your Rights notice from the Department of Labor.