name: inverse layout: true class: center, middle --- class: center .brand-image-top[![HealthPartnersBig](/rhug-feb-2019/images/hp-1v-4c.png)] # Managing our OpenShift Infrastructure
## How HealthPartners Builds and Manages OpenShift Clusters --- layout: true name: top-bar .brand-image[![HealthPartners](/rhug-feb-2019/images/hp-1v-4c.png)] .twitter-handle[ `@jmcshane` ] --- # Introduction * James McShane - Cloud Engineering team @ HealthPartners * Started OpenShift back in late 2016 * Production apps in late 2017 * Onboarding teams rapidly to our environment --- # Basic Principles * Don't touch VMs manually * Stay up to date with minimal effort * Validate all changes before production * Prepare for the future * Make building clusters easy ??? * After a couple false starts, we've been able to patch the environment every month * got to 3.11 within a couple days of release --- # Getting Started .center[![OpenshiftAnsible](/rhug-feb-2019/images/openshift-ansible.png)] ??? * RedHat has published much of what it takes to manage an out of the box openshift cluster * Given infrastructure exists and has pre-reqs set up * This is where the rubber hits the road * Version specific code(!) --- # Our Infrastructure Choices * UCS Chassis * RHEV for Virtualization * VMWare for Virtualization (stay tuned) * F5 external load balancer * A few network components to be discussed * NetApp for Storage * Ansible for automation ??? This is just a few of our choices. We built some of these components ourselves, then integrated back into the company to begin migration. --- # Jobs to be Done * Building clusters * Patching clusters * Scaling clusters * Updating openshift software * Updating cluster admin level components * Day 2 operations - observability, disaster recovery ??? These are some of the jobs we must do regularly. We want to make all these jobs easy. --- # Code Structure * Put all of our code in a single repository * Master branch does not contain openshift-ansible * Version specific branches that we clone the openshift-ansible version release branch to * Most of our code is standalone, except for the turnkey stuff ??? The openshift-ansible code is specific to the version of OpenShift it is targeting. This version structure means that we can distinguish between code that requires openshift-ansible components and code that does not depend on them. --- # `openshiftcluster` ![OpenshiftCluster](/rhug-feb-2019/images/openshiftcluster.png) ![OpenshiftCluster311](/rhug-feb-2019/images/openshiftcluster-3.11.png) ??? * You can see the repository structure in the master branch here * vs 3.11 branch which has openshift-ansible code checked out --- # Referencing `openshift-ansible` * Wrap openshift-ansible plays in playbooks with pre/post installation steps ``` ### Pre Installation Stuff ### OpenShift Ansible Stuff - name: Scale up new nodes import_playbook: openshift-ansible/playbooks/openshift-node/scaleup.yml tags: scaleup ### Post Installation Stuff ``` * Hacking `group_vars` ``` find . -path './openshift-ansible/playbooks/*' -type d -depth 3 | xargs -L1 ln -s ../../../group_vars $1/group_vars find . -path './openshift-ansible/playbooks/*' -type d -depth 4 | xargs -L1 ln -s ../../../../group_vars $1/group_vars find . -path './openshift-ansible/playbooks/*' -type d -depth 5 | xargs -L1 ln -s ../../../../../group_vars $1/group_vars find . -path './openshift-ansible/playbooks/*' -type d -depth 6 | xargs -L1 ln -s ../../../../../../group_vars $1/group_vars ``` ??? We use ansible group_vars to distinguish behavior between clusters and share common variables across all playbooks. This little hack allows openshift-ansible to read our group vars. Its also analagous to what the openshift-ansible code itself does. --- # Externalizing Roles * OpenShift => RHEL 7 system administration * Satellite registration * RHEL 7 CIS standard * DNS management with InfoBlox * Prometheus node exporter * Centralize these ansible roles using ansible-galaxy ``` - name: prometheus_node_exporter src: https://gitlab.healthpartners.com/ansible-roles/prometheus_node_exporter.git scm: git ``` ``` # Load External Roles - hosts: localhost tags: always tasks: - name: Import Roles command: ansible-galaxy install --ignore-certs --force -r requirements.yml ... ``` ??? This allows us to build good things for the enterprise in general that we then consume via our own plays. --- # Customizing OpenShift * An out of the box cluster is useful for experimentation, but not production * What things do we add: * LDAP group management * Logging configuration * Connect to our Artifactory docker repository * Cleanup scripts * Trident installation * Security scanning * Router customization * Additional `SecurityContextConstraints` * And much more... ??? * Lots of stuff to do to move from an out of the box cluster to a "useful" cluster in our environment * We are providing our own platform for teams to consume. Should correspond to what the enterprise wants/needs --- # Managing our Customization * Define three types of Ansible roles * OC only - interact with OpenShift master API * Infra only - manage packages and files on the VMs * OpenShift software - interact with `openshift-ansible` * Defined jobs that can run arbitrary roles of each type by name and target --- # Orchestration and Automation * All ansible jobs are run through Ansible Tower * Ansible tower jobs send their job logs to splunk * Orchestrate Ansible Tower through Jenkins * (because Tower workflow run is ... bad) --- # Tower Configuration * Define each cluster as an inventory in tower * Use inventory sync in tower to manage this from source code * Each cluster has an associated SSH private key * SSHKeyRotate (an infra role) puts cluster level public key in `authorized_keys` * Sends private key through tower API * An environment ties a inventory & credential together to configure access to the cluster --- # Tower Configuration - Inventory Source ``` app_only: hosts: rs-dev1-node-abc123.healthpartners.com: vars: openshift_node_group_name: 'node-config-compute' dev1: children: OSEv3: vars: children: masters: etcd: children: masters: new_masters: nodes: children: app_only: infra_only: masters: new_nodes: new_masters: remove_nodes: nodes_only: children: app_only: infra_only: ``` ??? * Source inventory files from gitlab that can get automatically updated (groovy yaml FTW) * Can do validations on this file separately from our ansible code --- # Jenkins Setup - Automating change control with standard changes - See my previous talk, linked at the end - A tower job has a limited number of components, our jobs simplify the selection of these values: - extra vars - code location - playbook to run - credential & inventory - Create "higher level" jobs that are easy to operate --- # Jenkins Setup ![OpenshiftSoftware](/rhug-feb-2019/images/openshift-software-plays.png) ![OpenshiftRoles](/rhug-feb-2019/images/openshift-roles.png) --- # Jenkins Code ``` environments: [ "uat1": [ "credentialId": "21", "inventoryId": "10", "type": 'non-prod' ], "prd1": [ "credentialId": "22", "inventoryId": "19", "type": 'prod' ], ... ] ``` --- # Jenkins Code ``` messageJson.putAll([ name : ("${projectConfig.repo}-${projectConfig.branch}"+ "-${projectConfig.job}-${env.BUILD_NUMBER}").toString(), scm_type : 'git', scm_url : "${repoUrl}:${projectConfig.group}/${projectConfig.repo}.git".toString(), scm_branch : projectConfig.branch, credential : projectConfig.credential, organization : projectConfig.organization ]) def project = towerCall([ body: messageJson.toString(), apiPath: 'projects/', method: 'POST' ]) ``` --- # Jenkins Code towerCall.groovy ``` def call(config) { def request = [ url: "https://rs-tower.healthpartners.com/api/v2/${config.apiPath}", httpMode: config.method, authentication: 'd02124a8-d133-4bbf-a41f-16f310b25232', customHeaders: [[name: 'Content-Type', value: 'application/json']], consoleLogResponseBody: true ] as LinkedHashMap if (config.body) { request << [requestBody: config.body] } if(config.validResponseCodes) { request << [validResponseCodes: config.validResponseCodes] } def httpResponse = httpRequest request if(httpResponse.status != 204) { return new groovy.json.JsonSlurper().parseText(httpResponse.content) } } ``` --- # Patching - Dont change in flight - Scale up, then scale down - Except for master nodes, but working on it - Scaling becomes the same as patching, which is great! --- # Scaling ``` tasks: - name: Set up the new VMs for the cluster import_tasks: plays/ovirt/create-host.yml when: infrastructure_type == "rhev" - name: Set up the new VMs in vmware for the dmz cluster import_tasks: plays/vmware/vmware-create-host.yml when: infrastructure_type == "vmware" ``` --- # Moving from RHEV to VMWare * Built out a whole new RHEV environment that our team owns * Need to make good ownership decisions, use existing services * Began to migrate to existing VMWare infrastructure in an isolated DMZ for internet facing web traffic --- # New Challenges * Our internet facing DMZ does not have DHCP * Assigned an infoblox range that we can assign static IP addresses ``` - name: Add new host to DNS in range infoblox: server: "{ { server.prod } }" username: "{ { infoblox_username }}" password: "{ { infoblox_password }}" action: add_host host: "{ { new_name }}" start_addr: "{ { start_addr }}" end_addr: "{ { end_addr }}" delegate_to: localhost ``` ``` vmware_guest: networks: ip: "{ { lookup('dig', inventory_hostname) }}" ``` --- # Executing a Change * Commit ansible changes to a branch * Execute role against an engineering cluster from branch * Create pull request linked to successful job * Merge pull request * Run all lower environment clusters from master or version specific branch * Validate functionality * Run all production clusters * Change request automatically tied to Jenkins job log --- # Takeaways * Separated the concerns of infrastructure automation, infrastructure access, and orchestration * Automation through Ansible, extending the RedHat openshift-ansible repository * Infrastructure access & centralization through Ansible Tower * OpenShift may be a product, but what you build on top of it becomes its own product * Use this to your advantage, and don't be suprised by it * Come work with us: https://healthpartners.com/careers --- # References * https://github.com/openshift/openshift-ansible * https://github.com/HealthPartners/rhug-sept-2018 * https://github.com/jmcshane/openshift-commons-dec-2018 * https://plugins.jenkins.io/ansible-tower * https://docs.ansible.com/ansible-tower/latest/html/towerapi/index.html