Skip to content

Conversation

@spetrosi
Copy link
Collaborator

@spetrosi spetrosi commented Apr 1, 2022

TODO:

  • Add idempotency to all templates
  • Solve WRITE_LEASE_VALIDITY issue. MS says that it must work with WRITE_LEASE_VALIDITY=10, while VMs only work with WRITE_LEASE_VALIDITY=60, default is 60.

@spetrosi spetrosi requested a review from richm as a code owner April 1, 2022 09:36
@spetrosi spetrosi changed the title Add support for configuring HA cluster Draft: Add support for configuring HA cluster Apr 1, 2022
permanent: true
runtime: true
include_role:
name: fedora.linux_system_roles.firewall
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm - I guess this is ok for now - for CI you'll need to add a meta/requirements.yml with the fedora.linux_system_roles collection

collections:
  - name: fedora.linux_system_roles

tox -e qemu[-ansible-core-x.xx] will also use this file to install necessary collections

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planing to take a look at the CI and how to implement it a bit later

@spetrosi spetrosi force-pushed the configure_ha branch 6 times, most recently from 32d6264 to 48127f2 Compare May 12, 2022 12:03
@spetrosi
Copy link
Collaborator Author

[citest pending]

1 similar comment
@spetrosi
Copy link
Collaborator Author

[citest pending]

@spetrosi
Copy link
Collaborator Author

@richm rhel-8-y/ansible-2.12 and rhel-8/ansible-2.12 CI tests fail with include_role: name: fedora.linux_system_roles.ha_cluster being incorrect. Although other tests are fine with this? Do you have any idea why this can be?

@spetrosi
Copy link
Collaborator Author

[citest pending]

@spetrosi spetrosi requested review from nhosoi and richm May 17, 2022 16:30
@richm
Copy link
Contributor

richm commented May 17, 2022

[citest bad]

@spetrosi spetrosi force-pushed the configure_ha branch 2 times, most recently from 1a64fb4 to 42780c5 Compare May 25, 2022 10:28
@spetrosi
Copy link
Collaborator Author

I'll also need to add comments to new tasks files to describe why these files are in tasks/.

@richm
Copy link
Contributor

richm commented Jun 10, 2022

@tomjelinek can you please take a look at this PR and review the usage of the ha_cluster role?

tasks/main.yml Outdated
Comment on lines 788 to 789
- name: with-rsc-role
value: Master
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already specified as resource_leader.role. Please, remove these two lines.

tasks/main.yml Outdated
ha_cluster_constraints_colocation:
- resource_leader:
id: ag_cluster-clone
role: master
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
role: master
role: Promoted

Master is deprecated and should not be used. The ha_cluster role will automatically change Promoted to Master if needed.

README.md Outdated
2.4. Configure the user provided with the ``mssql_ha_login` variable for
Pacemaker.
3. Include the System Roles ha_cluster role to configure pacemaker.
Note that this role does not configure STONITH devices in Pacemaker.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been done probably due to the fact, that STONITH depends on customers' environment and thus cannot be defined by the role. However, running a cluster without STONITH is unsupported, as it may lead to data corruption and other issues. I suggest adding a variable, e.g. mssql_ha_stonith_resources as a list defaulting to [], add the list to ha_cluster_resource_primitives, and instruct users to define their STONITH resources.

Not all environments include devices for traditional STONITH, Therefore, SBD variables should be exposed in a similar way to make it possible to configure SBD STONITH in such cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent here was to avoid making the mssql role wrap around the ha_cluster role much. We are going to provide example playbooks that will run the mssql role and then run the ha_cluster role with all requried ha_cluster-related variables.
I thought that setting ha_cluster_resource_primitives is not required because stonith-enabled is set to true by default, and it seemed to work. Could you please provide an example of what mssql_ha_stonith_resources value should look like?
In the mssql role, I guess adding mssql_ha_stonith_resources would look like this?

        ha_cluster_resource_primitives:
          - "{{ mssql_ha_stonith_resources }}"
          - id: ag_cluster
          ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to provide example playbooks that will run the mssql role and then run the ha_cluster role with all requried ha_cluster-related variables.

Ok, that would work for me. Just bear in mind this part of the ha_cluster role readme: "The role replaces the configuration of HA Cluster on specified nodes. Any settings not specified in the role variables will be lost."

I thought that setting ha_cluster_resource_primitives is not required because stonith-enabled is set to true by default

That merely enables stonith. You still need to tell the cluster which stonith devices it can use, how it can connect to them, and which nodes they can fence. That is done by creating stonith resources. For further details, see for example:

Could you please provide an example of what mssql_ha_stonith_resources value should look like?

ha_cluster_resource_primitives:
  - id: myapc
    agent: stonith:fence_apc_snmp
    instance_attrs:
      - attrs:
          - name: ipaddr
            value: apc-switch.example.com
          - name: pcmk_host_map
            value: rhel8-node1.example.com:1;rhel8-node2.example.com:2
          - name: login
            value: apc
          - name: passwd
            value: apc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that would work for me. Just bear in mind this part of the ha_cluster role readme: "The role replaces the configuration > of HA Cluster on specified nodes. Any settings not specified in the role variables will be lost."

Oh, so in the case where a user runs the mssql role and then runs the ha_cluster role with ha_cluster_resource_primitives containing one item, the configuration applied by the mssql role would be removed? In this case, the mssql role would need to completely wrap around the ha_cluster role. As well as any other role that would use ha_cluster.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomjelinek, in the case where a user runs the mssql role and then runs the ha_cluster role with ha_cluster_resource_primitives containing one item, the cluster configuration applied previously by the mssql role would be removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible outcome is that we document that you cannot use the mssql in conjunction with the ha_cluster role on the same set of hosts. I think this is ok. If you want to manage mssql and ha_cluster in the same playbook, you'll have to create separate host groups for mssql and ha_cluster, and do not run the ha_cluster role against the mssql host group, and do not run the mssql role against the ha_cluster host group.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomjelinek, in the case where a user runs the mssql role and then runs the ha_cluster role with ha_cluster_resource_primitives containing one item, the cluster configuration applied previously by the mssql role would be removed?

In the case where a user runs the mssql role and then runs the ha_cluster role (with any settings), the cluster configuration applied previously by the mssql role would be removed.

It is unfortunate, but it is the best I can do now. Making the ha_cluster role work in a way where users can specify only partial cluster configuration and the role would not touch other parts of the cluster requires a lot more resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unfortunate, but it is the best I can do now. Making the ha_cluster role work in a way where users can specify only partial cluster configuration and the role would not touch other parts of the cluster requires a lot more resources.

I think this is fine. We just document that you cannot use the ha_cluster role on mssql nodes, and vice versa.

@spetrosi
Copy link
Collaborator Author

[citest]

README.md Outdated
[`mssql_ha_cluster_print_vars:`](#mssql_ha_cluster_print_vars)`true` to print
planned `ha_cluster` variables. Then you can merge the printed variables with
your custom `ha_cluster` variables and specify the resulting set of variables
with the `microsoft.sql.server` role invocation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for this? If you are using the ha_cluster role and the mssql role in the same playbook, and you are configuring mssql for ha, and you want to ensure you use the same ha parameters for both ha_cluster and mssql? I guess this is if you first run mssql and configure it for ha, then do a subsequent playbook run with ha_cluster (with or without mssql) and you want to ensure that you do not overwrite the previous ha cluster settings used when deploying mssql?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or vice versa - you deploy ha_cluster role first, then you want to grab those ha_cluster variables to use with the mssql role? If that's the case, then it seems like we need support in the ha_cluster role - have it export the settings in a format that you can use to configure something else to use ha_cluster on those same machines. Other roles have a similar feature - the ability to export the role configuration - linux-system-roles/kernel_settings#58 - linux-system-roles/firewall#83 (these are wip, but I believe some other roles have already implemented this)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we need a functionality to export settings in both the ha_cluster role and in mssql or other roles that include ha_cluster. Seems like here I do the same thing as in linux-system-roles/kernel_settings#58 but with pure ansible.

@spetrosi
Copy link
Collaborator Author

[citest pending]

Add firewall, ha_cluster roles, required vars
Take setup tasks form ha_cluster role, the mssql_ha_replica_type fact
Print sqlcmd output only when it's available
Add HA functionality description to README.md
Add dependency for fedora.linux-system-roles in galaxy.yml
Not print output of gathering facts because output is too long with -v
Wait for mssql-server to prepare for client connections after restart
Only print sqlcmd output when it exists
Add tests for templates, fix bugs in templates
Add error message when running against RHEL < 8
Add mssql_ha_cluster_run_role and other related variables
Add mssql_ha_cluster_run_role
Add mssql_ha_stonith_resources
Add mssql_ha_cluster_print_vars
@spetrosi spetrosi merged commit 294941e into linux-system-roles:master Jul 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants