Resource Sets and Racks
Resource Sets
The operator allows you to create multiple sets of Pulsar proxies/brokers/bookies. Each set is a dedicated Deployment/StatefulSet with its own Service and ConfigMap. When multiple sets are specified, an umbrella service is created as the main entrypoint of the cluster; in addition, a dedicated service is created for each set. You can customize each service individually. For example, it's straightforward to have different DNS domains for each set.
Having different endpoints for the cluster allows new deployment strategies, such as canary deployments.
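As a minimal sketch of this idea (the set names, replica counts and external-dns hostnames are illustrative assumptions; the per-set replicas and service fields follow the same pattern used in the proxy sets example further below), a stable/canary split could look like:

spec:
  global:
    resourceSets:
      stable: {}
      canary: {}
  proxy:
    sets:
      stable:
        replicas: 3
        service:
          annotations:
            external-dns.alpha.kubernetes.io/hostname: proxy.pulsar.local
      canary:
        replicas: 1
        service:
          annotations:
            external-dns.alpha.kubernetes.io/hostname: canary.proxy.pulsar.local

Clients can then be pointed at the canary endpoint first and moved to the main endpoint once the rollout is validated.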
Racks
A rack defines a fault domain. A resource set can be mapped to a rack; when it is, all of its replicas are placed in the same failure domain. The available failure domains are zone (a region's availability zone) and host (a cluster node).
In order to guarantee high availability across availability zones, you must create multiple sets mapped to different racks. One of the benefits of using racks is that you know in advance whether a proxy and a broker are in the same zone.
To use a rack you must assign it to a resource set:
spec:
  global:
    racks:
      rack1: {}
      rack2: {}
      rack3: {}
    resourceSets:
      shared-az1:
        rack: rack1
      shared-az2:
        rack: rack2
      shared-az3:
        rack: rack3
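A component then references these resource sets by name, getting one set per rack. The following sketch uses the broker as an example (replica counts are illustrative; broker sets follow the same sets pattern shown for proxies in the next section):

spec:
  broker:
    sets:
      shared-az1:
        replicas: 1
      shared-az2:
        replicas: 1
      shared-az3:
        replicas: 1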
Proxy Sets
With proxy sets it's straightforward to have a dedicated proxy set per extension. Pulsar can communicate with clients of other messaging systems, such as Apache Kafka and RabbitMQ, through proxy extensions. It's therefore possible to have a dedicated proxy set that only accepts a specific protocol.
spec:
  global:
    resourceSets:
      shared: {}
      kafka: {}
  proxy:
    sets:
      shared:
        replicas: 5
        service:
          annotations:
            external-dns.alpha.kubernetes.io/hostname: proxy.pulsar.local
      kafka:
        replicas: 3
        config:
          <config to enable kafka proxy extension>:
        service:
          annotations:
            external-dns.alpha.kubernetes.io/hostname: kafka.proxy.pulsar.local
Bookkeeper Sets
Thanks to racks, the operator is able to set the data placement policy automatically. Leveraging the rack-awareness support of the Pulsar and BookKeeper clients, every entry is stored across different failure domains whenever possible.
The auto-configuration of rack-awareness is enabled by default. It's configurable in the bookkeeper configuration section:
bookkeeper:
  autoRackConfig:
    enabled: true
    periodMs: 60000
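Putting this together with the racks described earlier, a per-rack BookKeeper layout could look like the following sketch (set names and replica counts are illustrative; the bookkeeper component is assumed to follow the same sets pattern shown for proxies):

spec:
  global:
    racks:
      rack1: {}
      rack2: {}
    resourceSets:
      bk-az1:
        rack: rack1
      bk-az2:
        rack: rack2
  bookkeeper:
    autoRackConfig:
      enabled: true
      periodMs: 60000
    sets:
      bk-az1:
        replicas: 3
      bk-az2:
        replicas: 3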
Note that these features require bookkeeperClientRegionawarePolicyEnabled=true in the broker. The operator automatically adds this configuration property to the broker and to autorecovery. If you wish to disable the region-aware policy, you need to explicitly set bookkeeperClientRegionawarePolicyEnabled=false in both the broker and autorecovery.
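For example, opting out explicitly could look like the following sketch (it assumes the broker and autorecovery components expose a config map of key/value overrides, as the proxy sets above do; adjust the value type to your operator's schema):

spec:
  broker:
    config:
      bookkeeperClientRegionawarePolicyEnabled: "false"
  autorecovery:
    config:
      bookkeeperClientRegionawarePolicyEnabled: "false"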
Pod placement affinity and anti-affinity
For a single resource set, it's possible to specify the anti-affinity rules. There are two levels of anti-affinity, zone and host. The former sets the failure domain to the region's availability zone; the latter sets it to the node.
You can configure whether the requirement must be satisfied or only honored when possible. This mechanism leverages the Kubernetes requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution properties.
The default is:
global:
  antiAffinity:
    host:
      enabled: true
      required: true
    zone:
      enabled: false
      required: false
This means each replica of any Deployment/StatefulSet is forced onto a different node. There is no requirement to place the pods in different availability zones, so all the pods could end up in the same zone. In order to achieve multi-zone availability, you need to set:
global:
  antiAffinity:
    host:
      enabled: true
      required: true
    zone:
      enabled: true
      required: false
In this way each pod will be placed in a different zone, if possible. If you want to enforce it, you have to set:
global:
  antiAffinity:
    host:
      enabled: true
      required: true
    zone:
      enabled: true
      required: true
Note that if, during an upgrade, there is no availability zone free of pods of that kind, the new pod won't be scheduled and the upgrade will be blocked until a pod is manually deleted to free up a zone.
Resource set pod placement affinity and anti-affinity
A rack defines a fault domain. A resource set can be mapped to a rack; when it is, all of its replicas are placed in the same failure domain. There are two levels of affinity, zone and host. The former sets the failure domain to the region's availability zone; the latter sets it to the node.
When a rack is specified, the default configuration is:
global:
  racks:
    rack1:
      host:
        enabled: false
        requireRackAffinity: false
        requireRackAntiAffinity: true
      zone:
        enabled: false
        requireRackAffinity: false
        requireRackAntiAffinity: true
        enableHostAntiAffinity: true
        requireRackHostAntiAffinity: true
The default configuration doesn't enable any placement policy. If you want to place all the pods of the rack on the same node, you have to set:
global:
  racks:
    rack1:
      host:
        enabled: true
With requireRackAffinity=false, each pod of the rack is placed, if possible, on a node where another pod of the same rack already exists (if any). Set requireRackAffinity=true to enforce it; note that if the target node is full (it can't accept a new pod with those requirements), the pod will stay pending until the node is able to accept new pods.
With requireRackAntiAffinity=false, each pod of the rack is placed, if possible, on a node where no pod of a different rack is already scheduled. With requireRackAntiAffinity=true, this behavior is enforced; note that if no such node is free, the pod will stay pending until a new node is added.
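For example, a rack that prefers co-locating its pods on one node while strictly keeping other racks off that node could be expressed as follows (the flag values are only an illustration of the knobs described above):

global:
  racks:
    rack1:
      host:
        enabled: true
        requireRackAffinity: false
        requireRackAntiAffinity: true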
If you want to place all the pods in the same zone, you have to set:
global:
  racks:
    rack1:
      zone:
        enabled: true
With enableHostAntiAffinity=true, in addition to placing the pods in different availability zones, a different node is also chosen. This requirement can be disabled (enableHostAntiAffinity=false), enforced (requireRackHostAntiAffinity=true) or applied in a best-effort manner (requireRackHostAntiAffinity=false).
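Putting it together, a zone-scoped rack that also strictly requires different nodes could be configured like the following sketch (all flag values are illustrative):

global:
  racks:
    rack1:
      zone:
        enabled: true
        requireRackAffinity: true
        requireRackAntiAffinity: true
        enableHostAntiAffinity: true
        requireRackHostAntiAffinity: true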