Data corruption issue in Pacemaker 1.1

March 28, 2014

According to https://blog.clusterlabs.org/, a problem that can lead to data corruption has been found in Pacemaker versions 1.1.6 through 1.1.9.

All users are strongly encouraged to upgrade to 1.1.10 or later.

The cause is faulty logic in the tengine_stonith_notify() function. It can incorrectly add a successfully fenced node to a list, and Pacemaker then erases that node's status section when the next DC election occurs.

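With the status section erased, the cluster assumes that the node is safely down and begins starting that node's services on other nodes, even though those services are already active.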

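For the faulty logic to be triggered, all of the following must be true of the fenced node:

1. It was the previous DC.
2. It was still functional enough to request its own fencing.
3. The fencing notification arrived after the new DC was elected, but before the new DC invoked the policy engine.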

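The problem was introduced in August 2011, and this is the first time it has been reported. That suggests the sequence of events above is hard to hit under normal conditions.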

Logs symptomatic of the issue look like this:

# grep -e do_state_transition -e reboot -e do_dc_takeover -e tengine_stonith_notify -e S_IDLE /var/log/corosync.log

Mar 08 08:43:22 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Peer gandalf was terminated (st_notify_fence) by mordor for gandalf: OK (ref=10d27664-33ed-43e0-a5bd-7d0ef850eb05) by client crmd.31561
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Notified CMAN that 'gandalf' is now fenced
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Target may have been our leader gandalf (recorded: <unset>)
Mar 08 09:13:52 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
Mar 08 09:13:52 [9934] lorien crmd: notice: do_dc_takeover: Marking gandalf, target of a previous stonith action, as clean
Mar 08 08:43:22 [9934] lorien crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Mar 08 08:43:28 [9934] lorien crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]

Note in particular the final entry from tengine_stonith_notify():

Target may have been our leader gandalf (recorded: <unset>)

If this message appears after "Taking over DC status for this partition" but before "State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE", resources are likely to be running in more than one location after the next DC election.
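The ordering check described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name `find_suspect_entries` is hypothetical, the three message strings are taken from the sample log in this article, and the log is assumed to be read in file order with one entry per line.

```python
import re
from typing import Iterable, List

# Message fragments copied from the sample log in this article.
TAKEOVER = "do_dc_takeover: Taking over DC status for this partition"
POLICY = "State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE"
SUSPECT = re.compile(
    r"tengine_stonith_notify: Target may have been our leader (\S+)"
)


def find_suspect_entries(lines: Iterable[str]) -> List[str]:
    """Return the node names from 'Target may have been our leader' entries
    that occur after a DC takeover and before the policy engine transition.

    Hypothetical helper for illustration; it is not part of Pacemaker.
    """
    in_window = False  # True between a takeover and the policy engine call
    hits: List[str] = []
    for line in lines:
        if TAKEOVER in line:
            in_window = True
        elif POLICY in line:
            in_window = False
        elif in_window:
            match = SUSPECT.search(line)
            if match:
                hits.append(match.group(1))
    return hits
```

Passing an open log file (for example the `/var/log/corosync.log` used in the grep command) to `find_suspect_entries` yields the affected node names, and an empty list means the pattern was not found. An empty result does not prove the cluster is unaffected, since rotated or truncated logs are not examined.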

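The issue was fixed in @f30e1e43, during a routine cleanup prior to Pacemaker 1.1.10. However, the implications of what the old code allowed were not fully appreciated at the time.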

[Recommended action]
If you are running Pacemaker 1.1.6 through 1.1.9, data corruption is possible, so upgrading to 1.1.10 or later is recommended.
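As a rough aid for auditing hosts, the affected range can be tested numerically. The sketch below assumes the installed version string has already been obtained by some other means (for example from the package manager) and begins with three dot-separated numbers. The name `is_affected` is hypothetical.

```python
import re

# Affected range as stated in the advisory: 1.1.6 through 1.1.9 inclusive.
AFFECTED_MIN = (1, 1, 6)
AFFECTED_MAX = (1, 1, 9)


def is_affected(version: str) -> bool:
    """Return True if the version string falls in the affected range.

    Hypothetical helper; only the leading MAJOR.MINOR.PATCH part is parsed,
    so a suffix such as a distribution release tag is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parsed = tuple(int(part) for part in match.groups())
    # Tuples compare element by element, so (1, 1, 10) sorts after (1, 1, 9).
    return AFFECTED_MIN <= parsed <= AFFECTED_MAX
```

Comparing integer tuples rather than raw strings matters here, because as plain strings "1.1.10" sorts before "1.1.9" and the fixed release would be misclassified as affected.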

[Original text]
https://blog.clusterlabs.org/blog/2014/potential-for-data-corruption-in-pacemaker-1-dot-1-6-through-1-dot-1-9/

It has come to my attention that the potential for data corruption exists in Pacemaker versions 1.1.6 to 1.1.9.

Everyone is strongly encouraged to upgrade to 1.1.10 or later.

Those using RHEL 6.4 or later (or a RHEL clone) should already have access to 1.1.10 via the normal update channels.

At issue is some faulty logic in a function called tengine_stonith_notify() which can incorrectly add successfully fenced nodes to a list, causing Pacemaker to subsequently erase that node’s status section when the next DC election occurs.

With the status section erased, the cluster thinks that node is safely down and begins starting any services it has on other nodes - despite those already being active.

In order to trigger the logic, the fenced node must:
1. have been the previous DC
2. have been sufficiently functional to request its own fencing, and
3. the fencing notification must arrive after the new DC has been elected, but before it invokes the policy engine

Given that this is the first we have heard of the issue since the problem was introduced in August 2011, the above sequence of events is apparently hard to hit under normal conditions.

Logs symptomatic of the issue look as follows:
# grep -e do_state_transition -e reboot -e do_dc_takeover -e tengine_stonith_notify -e S_IDLE /var/log/corosync.log

Mar 08 08:43:22 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Peer gandalf was terminated (st_notify_fence) by mordor for gandalf: OK (ref=10d27664-33ed-43e0-a5bd-7d0ef850eb05) by client crmd.31561
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Notified CMAN that 'gandalf' is now fenced
Mar 08 08:43:22 [9934] lorien crmd: notice: tengine_stonith_notify: Target may have been our leader gandalf (recorded: <unset>)
Mar 08 09:13:52 [9934] lorien crmd: info: do_dc_takeover: Taking over DC status for this partition
Mar 08 09:13:52 [9934] lorien crmd: notice: do_dc_takeover: Marking gandalf, target of a previous stonith action, as clean
Mar 08 08:43:22 [9934] lorien crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Mar 08 08:43:28 [9934] lorien crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]

Note in particular the final entry from tengine_stonith_notify():

Target may have been our leader gandalf (recorded: <unset>)

If you see this after "Taking over DC status for this partition" but prior to "State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE", then you are likely to have resources running in more than one location after the next DC election.

The issue was fixed during a routine cleanup prior to Pacemaker-1.1.10 in @f30e1e43. However, the implications of what the old code allowed were not fully appreciated at the time.