Cloud MLOps Consultant

  • Location:

    Romania

  • Sector:

    ConSol UK Enterprise & Cloud

  • Job type:

    Temporary & Contract

  • Salary:

    Negotiable

  • Contact:

    Yashica Bharvirkar

  • Contact email:

    Yashica.Bharvirkar@consolpartners.com

  • Job ref:

    BBBH229926_1637248184

  • Published:

    20 days ago

  • Duration:

    1 Year + Ext

  • Expiry date:

    2021-12-18

  • Start date:

    ASAP

  • Client:

    ConSol Partners

We are recruiting on behalf of a leading IT services provider. They are seeking a Cloud MLOps Consultant.

On offer is a long-term opportunity to work with a large European institution, with training and development in addition to a competitive rate.

Job type - Contract

Duration - 1 year + Extensions (long term project)

Location - Fully remote from anywhere in Europe

Start Date - ASAP


Skills

  • Fluent English communication skills.
  • MLOps SRE/MLRE (Site Reliability Engineer/ML Reliability Engineer) - CNO (Cloud-native Operations) for MLOps, based upon CNO for DevOps, with knowledge of and hands-on experience with:
    • Keeping the infrastructure healthy, ensuring the reliability of infrastructure, apps, services, databases
    • Ensuring availability of inference services to products
    • Following up on alerts
    • Solving operational infrastructure issues (including Kubernetes-related issues and issues with the selected frameworks, such as Pachyderm and Kubeflow)
  • Linux, including standard shell scripting.
  • GCP/Cloud knowledge
  • Systems and software architecture concepts.


Experience

  • Arranging and configuring the required infrastructure (incl. storage solutions, GPUs, memory)
  • Installing, configuring and upgrading selected frameworks like Pachyderm and Kubeflow
  • Configuring and maintaining monitoring dashboards and alerts
  • Tracking the quality of predictions
  • Setting up automated re-training and redeployment when a quality metric indicates it is needed
  • Production systems monitoring. Troubleshooting and incident analysis (first / second level).
  • Continuous Integration / Continuous Delivery platforms (Jenkins)
  • Source code management systems (Git/SVN/Bitbucket)