---
title: Upgrades
weight: 3800
indent: true
---

# Rook-Ceph Upgrades

This guide will walk you through the steps to upgrade the software in a Rook-Ceph cluster from one
version to the next. This includes both the Rook-Ceph operator software itself as well as the Ceph
cluster software.

Upgrades for both the operator and for Ceph are nearly entirely
automated save for where Rook's permissions need to be explicitly updated by an admin or when
incompatibilities need to be addressed manually due to customizations.

We welcome feedback and opening issues!

## Supported Versions

This guide is for upgrading from **Rook v1.3.x to Rook v1.4.x**.

Please refer to the upgrade guides from previous releases for supported upgrade paths.
Rook upgrades are only supported between official releases. Upgrades to and from `master` are not
supported.

For a guide to upgrade previous versions of Rook, please refer to the version of documentation for
those releases.

* [Upgrade 1.2 to 1.3](https://rook.io/docs/rook/v1.3/ceph-upgrade.html)
* [Upgrade 1.1 to 1.2](https://rook.io/docs/rook/v1.2/ceph-upgrade.html)
* [Upgrade 1.0 to 1.1](https://rook.io/docs/rook/v1.1/ceph-upgrade.html)
* [Upgrade 0.9 to 1.0](https://rook.io/docs/rook/v1.0/ceph-upgrade.html)
* [Upgrade 0.8 to 0.9](https://rook.io/docs/rook/v0.9/ceph-upgrade.html)
* [Upgrade 0.7 to 0.8](https://rook.io/docs/rook/v0.8/upgrade.html)
* [Upgrade 0.6 to 0.7](https://rook.io/docs/rook/v0.7/upgrade.html)
* [Upgrade 0.5 to 0.6](https://rook.io/docs/rook/v0.6/upgrade.html)

## Considerations

With this upgrade guide, there are a few notes to consider:

* **WARNING**: Upgrading a Rook cluster is not without risk. There may be unexpected issues or
  obstacles that damage the integrity and health of your storage cluster, including data loss.
* The Rook cluster's storage may be unavailable for short periods during the upgrade process for
  both Rook operator updates and for Ceph version updates.
* We recommend that you read this document in full before you undertake a Rook cluster upgrade.

## Before you Upgrade

Rook v1.4 has a breaking change that should be considered before upgrading.

1. Make sure the `lvm2` package is installed on the host where OSDs are running. If not, the prepare job will fail
   and the upgrade of OSDs will be blocked.
   * This only applies to non-PVC OSDs. Beginning in v1.3, OSDs created on PVCs no longer rely on LVM.
2. CSI Snapshots: See the next section if you are currently using Ceph-CSI snapshots.


### CSI Snapshots

CSI snapshots have moved from Alpha to Beta and are **not** backward compatible. The snapshots
created with the Alpha version must be deleted before the upgrade.

If you desire to continue using Ceph-CSI 2.x with Alpha snapshots, then:

* Skip this section
* Set the CSI version to 2.x as described below in the [CSI Version](#csi-version) section.

To continue with the Ceph-CSI v3.0 driver that is recommended with v1.4:

1. List all the volumesnapshots created

```bash
kubectl get volumesnapshot
NAME               AGE
rbd-pvc-snapshot   22s
```

2. Delete all volumesnapshots

```bash
kubectl delete volumesnapshot rbd-pvc-snapshot
volumesnapshot.snapshot.storage.k8s.io "rbd-pvc-snapshot" deleted
```

3. List all volumesnapshotclasses created

```bash
kubectl get volumesnapshotclass
NAME                      AGE
csi-rbdplugin-snapclass   86s
```

4. Delete all volumesnapshotclasses

```bash
kubectl delete volumesnapshotclass csi-rbdplugin-snapclass
volumesnapshotclass.snapshot.storage.k8s.io "csi-rbdplugin-snapclass" deleted
```

*Note:* The underlying snapshots on the storage system will be deleted by ceph-csi

#### Delete the Alpha CRDs

As we are updating the snapshot resources from `Alpha` to `Beta` we need to
delete the old `alphav1` snapshot CRD created by external-snapshotter sidecar container

Check if we have any `v1alpha1` CRD created in our kubernetes cluster

```console
kubectl get crd volumesnapshotclasses.snapshot.storage.k8s.io -o yaml |grep v1alpha1
  - name: v1alpha1
  - v1alpha1
kubectl get crd volumesnapshotcontents.snapshot.storage.k8s.io -o yaml |grep v1alpha1
  - name: v1alpha1
  - v1alpha1
kubectl get crd volumesnapshots.snapshot.storage.k8s.io -o yaml |grep v1alpha1
  - name: v1alpha1
  - v1alpha1
```

As we have `v1alpha1` CRD created in our kubernetes cluster, we need to delete
the Alpha CRD

```console
kubectl delete crd volumesnapshotclasses.snapshot.storage.k8s.io volumesnapshotcontents.snapshot.storage.k8s.io volumesnapshots.snapshot.storage.k8s.io

customresourcedefinition.apiextensions.k8s.io "volumesnapshotclasses.snapshot.storage.k8s.io" deleted
customresourcedefinition.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" deleted
customresourcedefinition.apiextensions.k8s.io "volumesnapshots.snapshot.storage.k8s.io" deleted
```

Finally, if you desire to use the beta snapshots, check that the [prerequisites](ceph-csi-snapshot.md#prerequisites) are met.


## Upgrading the Rook-Ceph Operator

## Patch Release Upgrades

Unless otherwise noted due to extenuating requirements, upgrades from one patch release of Rook to
another are as simple as updating the image of the Rook operator. For example, when Rook v1.4.2 is
released, the process of updating from v1.4.2 is as simple as running the following:

```console
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.4.2
```

## Helm Upgrades

If you have installed Rook via the Helm chart, Helm will handle some details of the upgrade for you.
In particular, Helm will handle updating the RBAC and trigger the operator update and restart.


## Upgrading from v1.3 to v1.4

**Rook releases from master are expressly unsupported.** It is strongly recommended that you use
[official releases](https://github.com/rook/rook/releases) of Rook. Unreleased versions from the
master branch are subject to changes and incompatibilities that will not be supported in the
official releases. Builds from the master branch can have functionality changed and even removed at
any time without compatibility support and without prior notice.

## Prerequisites

We will do all our work in the Ceph example manifests directory.

```sh
cd $YOUR_ROOK_REPO/cluster/examples/kubernetes/ceph/
```

Unless your Rook cluster was created with customized namespaces, namespaces for Rook clusters
created before v0.8 are likely to be:

* Clusters created by v0.7 or earlier: `rook-system` and `rook`
* Clusters created in v0.8 or v0.9: `rook-ceph-system` and `rook-ceph`
* Clusters created in v1.0 or newer: only `rook-ceph`

With this guide, we do our best not to assume the namespaces in your cluster. To make things as easy
as possible, modify and use the below snippet to configure your environment. We will use these
environment variables throughout this document.

```sh
# Parameterize the environment
export ROOK_SYSTEM_NAMESPACE="rook-ceph"
export ROOK_NAMESPACE="rook-ceph"
```

In order to successfully upgrade a Rook cluster, the following prerequisites must be met:

* The cluster should be in a healthy state with full functionality. Review the
  [health verification section](#health-verification) in order to verify your cluster is in a good
  starting state.
* All pods consuming Rook storage should be created, running, and in a steady state. No Rook
  persistent volumes should be in the act of being created or deleted.

### Helm

* Your Helm version should be newer than v3.2.0 to avoid [this issue](https://github.com/helm/helm/issues/7697).
* For a Rook cluster already deployed with Helm older than v3.2.0, also execute the following commands.

```sh
KIND=ClusterRole
NAME=psp:rook
RELEASE=your-apps-release-name
NAMESPACE=your-apps-namespace
kubectl annotate $KIND $NAME meta.helm.sh/release-name=$RELEASE
kubectl annotate $KIND $NAME meta.helm.sh/release-namespace=$NAMESPACE
kubectl label $KIND $NAME app.kubernetes.io/managed-by=Helm
```

## Health Verification

Before we begin the upgrade process, let's first review some ways that you can verify the health of
your cluster, ensuring that the upgrade is going smoothly after each step. Most of the health
verification checks for your cluster during the upgrade process can be performed with the Rook
toolbox. For more information about how to run the toolbox, please visit the
[Rook toolbox readme](./ceph-toolbox.md).

See the common issues pages for troubleshooting and correcting health issues:

* [General troubleshooting](./common-issues.md)
* [Ceph troubleshooting](./ceph-common-issues.md)

### Pods all Running

In a healthy Rook cluster, the operator, the agents and all Rook namespace pods should be in the
`Running` state and have few, if any, pod restarts. To verify this, run the following commands:

```sh
kubectl -n $ROOK_NAMESPACE get pods
```

### Status Output

The Rook toolbox contains the Ceph tools that can give you status details of the cluster with the
`ceph status` command. Let's look at an output sample and review some of the details:

```sh
TOOLS_POD=$(kubectl -n $ROOK_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
kubectl -n $ROOK_NAMESPACE exec -it $TOOLS_POD -- ceph status
```

```console
  cluster:
    id:     a3f4d647-9538-4aff-9fd1-b845873c3fe9
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum b,c,a
    mgr: a(active)
    mds: myfs-1/1/1 up  {0=myfs-a=up:active}, 1 up:standby-replay
    osd: 6 osds: 6 up, 6 in
    rgw: 1 daemon active

  data:
    pools:   9 pools, 900 pgs
    objects: 67  objects, 11 KiB
    usage:   6.1 GiB used, 54 GiB / 60 GiB avail
    pgs:     900 active+clean

  io:
    client:   7.4 KiB/s rd, 681 B/s wr, 11 op/s rd, 4 op/s wr
    recovery: 164 B/s, 1 objects/s
```

In the output above, note the following indications that the cluster is in a healthy state:

* Cluster health: The overall cluster status is `HEALTH_OK` and there are no warning or error status
  messages displayed.
* Monitors (mon):  All of the monitors are included in the `quorum` list.
* Manager (mgr): The Ceph manager is in the `active` state.
* OSDs (osd): All OSDs are `up` and `in`.
* Placement groups (pgs): All PGs are in the `active+clean` state.
* (If applicable) Ceph filesystem metadata server (mds): all MDSes are `active` for all filesystems
* (If applicable) Ceph object store RADOS gateways (rgw): all daemons are `active`

If your `ceph status` output has deviations from the general good health described above, there may
be an issue that needs to be investigated further. There are other commands you may run for more
details on the health of the system, such as `ceph osd status`. See the
[Ceph troubleshooting docs](https://docs.ceph.com/docs/master/rados/troubleshooting/) for help.

Rook will prevent the upgrade of the Ceph daemons if the health is in a `HEALTH_ERR` state.
If you desired to proceed with the upgrade anyway, you will need to set either
`skipUpgradeChecks: true` or `continueUpgradeAfterChecksEvenIfNotHealthy: true`
as described in the [cluster CR settings](https://rook.github.io/docs/rook/v1.4/ceph-cluster-crd.html#cluster-settings).

### Container Versions

The container version running in a specific pod in the Rook cluster can be verified in its pod spec
output. For example for the monitor pod `mon-b`, we can verify the container version it is running
with the below commands:

```sh
POD_NAME=$(kubectl -n $ROOK_NAMESPACE get pod -o custom-columns=name:.metadata.name --no-headers | grep rook-ceph-mon-a)
kubectl -n $ROOK_NAMESPACE get pod ${POD_NAME} -o jsonpath='{.spec.containers[0].image}'
```

The status and container versions for all Rook pods can be collected all at once with the following
commands:

```sh
kubectl -n $ROOK_SYSTEM_NAMESPACE get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\t"}{.spec.initContainers[0]}{"\n"}{end}' && \
kubectl -n $ROOK_NAMESPACE get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\t"}{.spec.initContainers[0].image}{"\n"}{end}'
```

The `rook-version` label exists on Ceph controller resources. For various resource controllers, a
summary of the resource controllers can be gained with the commands below. These will report the
requested, updated, and currently available replicas for various Rook-Ceph resources in addition to
the version of Rook for resources managed by the updated Rook-Ceph operator. Note that the operator
and toolbox deployments do not have a `rook-version` label set.

```sh
kubectl -n $ROOK_NAMESPACE get deployments -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'

kubectl -n $ROOK_NAMESPACE get jobs -o jsonpath='{range .items[*]}{.metadata.name}{"  \tsucceeded: "}{.status.succeeded}{"      \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
```

### Rook Volume Health

Any pod that is using a Rook volume should also remain healthy:

* The pod should be in the `Running` state with few, if any, restarts
* There should be no errors in its logs
* The pod should still be able to read and write to the attached Rook volume.

## Rook Operator Upgrade Process

In the examples given in this guide, we will be upgrading a live Rook cluster running `v1.3.9` to
the version `v1.4.2`. This upgrade should work from any official patch release of Rook v1.3 to any
official patch release of v1.4.

**Rook release from `master` are expressly unsupported.** It is strongly recommended that you use
[official releases](https://github.com/rook/rook/releases) of Rook. Unreleased versions from the
master branch are subject to changes and incompatibilities that will not be supported in the
official releases. Builds from the master branch can have functionality changed or removed at any
time without compatibility support and without prior notice.

Let's get started!

## 1. Remove Alpha Snapshots

As described above in the [CSI Snapshots](#csi-snapshots) section, if you are using alpha snapshots
it is required to remove them before the upgrade since they are not compatible with beta snaphots
available with Ceph-CSI v3.0.

## 2. Update the RBAC and CRDs

> Automatically updated if you are upgrading via the helm chart

First apply new resources. This includes modified privileges (RBAC) needed by the Operator
and updates to the Custom Resource Definitions (CRDs).

If you are not using the default `rook-ceph` namespace, replace the namespace in the following manifest:
```sh
sed -i "s/namespace: rook-ceph/namespace: $ROOK_SYSTEM_NAMESPACE/g" upgrade-from-v1.3-apply.yaml
```

Now apply the updated privileges:

```sh
kubectl delete -f upgrade-from-v1.3-delete.yaml
kubectl apply -f upgrade-from-v1.3-apply.yaml -f upgrade-from-v1.3-crds.yaml
```

## 3. Update Ceph CSI version to v3.0

> Rook v1.4 will install Ceph-CSI v3.0 use the latest drivers by default.
> If you have not specified custom CSI images in the Operator deployment this step is unnecessary.

If you have specified custom CSI images in the Rook-Ceph Operator deployment, it is recommended to update to use the latest Ceph-CSI v3.0 driver.
See the section [CSI Version](#csi-version) for more details.

## 4. Update the Rook Operator

> Automatically updated if you are upgrading via the helm chart

The largest portion of the upgrade is triggered when the operator's image is updated to `v1.4.x`.
When the operator is updated, it will proceed to update all of the Ceph daemons.

```sh
kubectl -n $ROOK_SYSTEM_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.4.2
```

## 5. Wait for the upgrade to complete

Watch now in amazement as the Ceph mons, mgrs, OSDs, rbd-mirrors, MDSes and RGWs are terminated and
replaced with updated versions in sequence. The cluster may be offline very briefly as mons update,
and the Ceph Filesystem may fall offline a few times while the MDSes are upgrading. This is normal.

The versions of the components can be viewed as they are updated:

```sh
watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
```

As an example, this cluster is midway through updating the OSDs from v1.3 to v1.4. When all
deployments report `1/1/1` availability and `rook-version=v1.4.2`, the Ceph cluster's core
components are fully updated.

```console
Every 2.0s: kubectl -n rook-ceph get deployment -o j...

rook-ceph-mgr-a         req/upd/avl: 1/1/1      rook-version=v1.4.2
rook-ceph-mon-a         req/upd/avl: 1/1/1      rook-version=v1.4.2
rook-ceph-mon-b         req/upd/avl: 1/1/1      rook-version=v1.4.2
rook-ceph-mon-c         req/upd/avl: 1/1/1      rook-version=v1.4.2
rook-ceph-osd-0         req/upd/avl: 1//        rook-version=v1.4.2
rook-ceph-osd-1         req/upd/avl: 1/1/1      rook-version=v1.3.9
rook-ceph-osd-2         req/upd/avl: 1/1/1      rook-version=v1.3.9
```

An easy check to see if the upgrade is totally
finished is to check that there is only one `rook-version` reported across the cluster.

```console
# kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
This cluster is not yet finished:
  rook-version=v1.3.9
  rook-version=v1.4.2
This cluster is finished:
  rook-version=v1.4.2
```

## 6. Verify the updated cluster

At this point, your Rook operator should be running version `rook/ceph:v1.4.2`.

Verify the Ceph cluster's health using the [health verification section](#health-verification).


## Ceph Version Upgrades

Rook v1.4 supports Ceph Nautilus 14.2.5 or newer and Ceph Octopus v15.2.0 or newer. These are the
only supported major versions of Ceph.

> **IMPORTANT: When an update is requested, the operator will check Ceph's status, if it is in `HEALTH_ERR` it will refuse to do the upgrade.**

Rook is cautious when performing upgrades. When an upgrade is requested (the Ceph image has been
updated in the CR), Rook will go through all the daemons one by one and will individually perform
checks on them. It will make sure a particular daemon can be stopped before performing the upgrade.
Once the deployment has been updated, it checks if this is ok to continue. After each daemon is
updated we wait for things to settle (monitors to be in a quorum, PGs to be clean for OSDs, up for
MDSs, etc.), then only when the condition is met we move to the next daemon. We repeat this process
until all the daemons have been updated.

### Ceph images

Official Ceph container images can be found on [Docker Hub](https://hub.docker.com/r/ceph/ceph/tags/).
These images are tagged in a few ways:

* The most explicit form of tags are full-ceph-version-and-build tags (e.g., `v15.2.4-20200630`).
  These tags are recommended for production clusters, as there is no possibility for the cluster to
  be heterogeneous with respect to the version of Ceph running in containers.
* Ceph major version tags (e.g., `v15`) are useful for development and test clusters so that the
  latest version of Ceph is always available.

**Ceph containers other than the official images from the registry above will not be supported.**

### Example upgrade to Ceph Octopus

#### 1. Update the main Ceph daemons

The majority of the upgrade will be handled by the Rook operator. Begin the upgrade by changing the
Ceph image field in the cluster CRD (`spec.cephVersion.image`).

```sh
NEW_CEPH_IMAGE='ceph/ceph:v15.2.4-20200630'
CLUSTER_NAME="$ROOK_NAMESPACE"  # change if your cluster name is not the Rook namespace
kubectl -n $ROOK_NAMESPACE patch CephCluster $CLUSTER_NAME --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"
```

#### 2. Wait for the daemon pod updates to complete

As with upgrading Rook, you must now wait for the upgrade to complete. Status can be determined in a
similar way to the Rook upgrade as well.

```sh
watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
```

Determining when the Ceph has fully updated is rather simple.

```console
# kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq
This cluster is not yet finished:
    ceph-version=14.2.7-0
    ceph-version=15.2.4-0
This cluster is finished:
    ceph-version=15.2.4-0
```

#### 3. Verify the updated cluster

Verify the Ceph cluster's health using the [health verification section](#health-verification).

## CSI Version

If you have a cluster running with CSI drivers enabled and you want to configure Rook
to use non-default CSI images, the following settings will need to be applied for the desired
version of CSI.

The operator configuration variables have recently moved from the operator deployment to the
`rook-ceph-operator-config` ConfigMap. The values in the operator deployment can still be set,
but if the ConfigMap settings are applied, they will override the operator deployment settings.

> If the cluster was originally installed prior to v1.3, the configmap may not exist.
> See the latest operator.yaml for example configmap settings.

```console
kubectl -n $ROOK_NAMESPACE edit configmap rook-ceph-operator-config
```

The default upstream images are included below, which you can change to your desired images.

```yaml
ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.1.0"
ROOK_CSI_REGISTRAR_IMAGE: "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_PROVISIONER_IMAGE: "k8s.gcr.io/sig-storage/csi-provisioner:v2.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.0"
ROOK_CSI_ATTACHER_IMAGE: "k8s.gcr.io/sig-storage/csi-attacher:v3.0.0"
ROOK_CSI_RESIZER_IMAGE: "k8s.gcr.io/sig-storage/csi-resizer:v1.0.0"
```

### Use default images

If you would like Rook to use the inbuilt default upstream images, then you may simply remove all
variables matching `ROOK_CSI_*_IMAGE` from the above ConfigMap and/or the operator deployment.

### Verifying updates

You can use the below command to see the CSI images currently being used in the cluster.

```console
# kubectl --namespace rook-ceph get pod -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}' -l 'app in (csi-rbdplugin,csi-rbdplugin-provisioner,csi-cephfsplugin,csi-cephfsplugin-provisioner)' | sort | uniq
quay.io/cephcsi/cephcsi:v3.1.0
k8s.gcr.io/sig-storage/csi-attacher:v3.0.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1
k8s.gcr.io/sig-storage/csi-provisioner:v2.0.0
k8s.gcr.io/sig-storage/k8scsi/csi-resizer:v1.0.0
k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.0
k8s.gcr.io/sig-storage/csi-resizer:v1.0.0
```
