renay****@ybb*****
Mon Mar 16 21:48:53 JST 2015
Fukuda-san,

Good evening, this is Yamauchi.

There appears to be a fencing_topology sample from last year's OSC Tokyo at the following URL:

 * http://linux-ha.sourceforge.jp/wp/wp-content/uploads/osc2014_crm.txt

With fencing_topology you can control which node is targeted and which stonith agent is executed for it.

-----------------
fencing_topology \
    server01: prmStonith1 \
    server02: prmStonith2
-----------------

In this format, each line gives a target node followed by the stonith agent to execute for that node (more than one agent may be listed).

The upstream project also has information here:

 * http://clusterlabs.org/wiki/Fencing_topology

That is all.

----- Original Message -----
>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>To: "linux****@lists*****" <linux****@lists*****>
>Date: 2015/3/16, Mon 19:24
>Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
>
>
>Matsushima-san,
>
>Good evening, this is Fukuda.
>Thank you for your prompt reply.
>
>Here is the output of crm_mon -rfA.
>
>Last updated: Mon Mar 16 18:26:37 2015
>Last change: Mon Mar 16 18:04:31 2015
>Stack: heartbeat
>Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>Version: 1.1.12-561c4cf
>2 Nodes configured
>10 Resources configured
>
>
>Online: [ lbv1.beta.com lbv2.beta.com ]
>
>Full list of resources:
>
> Resource Group: HAvarnish
>     vip_208 (ocf::heartbeat:IPaddr2): Stopped
>     varnishd (lsb:varnish): Stopped
> Resource Group: grpStonith1
>     Stonith1-1 (stonith:external/stonith-helper): Stopped
>     Stonith1-2 (stonith:external/xen0): Stopped
>     Stonith1-3 (stonith:meatware): Stopped
> Resource Group: grpStonith2
>     Stonith2-1 (stonith:external/stonith-helper): Stopped
>     Stonith2-2 (stonith:external/xen0): Stopped
>     Stonith2-3 (stonith:meatware): Stopped
> Clone Set: clone_ping [ping]
>     Stopped: [ lbv1.beta.com lbv2.beta.com ]
>
>Node Attributes:
>* Node lbv1.beta.com:
>* Node lbv2.beta.com:
>
>Migration summary:
>* Node lbv2.beta.com:
>   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
>   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
>* Node lbv1.beta.com:
>   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:48 2015'
>   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:55 2015'
>
>Failed actions:
>    Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=39, status=Error, last-rc-change='Mon Mar 16 18:23:44 2015', queued=0ms, exec=2014ms
>    ping_start_0 on lbv2.beta.com 'unknown error' (1): call=40, status=complete, last-rc-change='Mon Mar 16 18:23:45 2015', queued=0ms, exec=995ms
>    Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=39, status=Error, last-rc-change='Mon Mar 16 18:23:45 2015', queued=0ms, exec=2009ms
>    ping_start_0 on lbv1.beta.com 'unknown error' (1): call=41, status=complete, last-rc-change='Mon Mar 16 18:23:54 2015', queued=0ms, exec=182ms
>
>
>There was no standard output or standard error output, so here are the logs (/var/log/ha-debug).
>
>Node 1 (lbv1)
>
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: info: Pacemaker support: yes
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: WARN: File /etc/ha.d//haresources exists.
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: WARN: This file is not used because pacemaker is enabled
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
>Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Core dumps could be lost if multiple dumps occur.
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: info: **************************
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: info: Configuration validated. Starting heartbeat 3.0.6
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: heartbeat: version 3.0.6
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: Heartbeat generation: 1423534103
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: seed is -1702799346
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: bound send socket to device: eth1
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: set SO_REUSEADDR
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: bound receive socket to device: eth1
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.133
>Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: Local status now set to: 'up'
>Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: info: Link lbv2.beta.com:eth1 up.
>Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: info: Status update for node lbv2.beta.com: status up
>Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: debug: get_delnodelist: delnodelist=
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Comm_now_up(): updating status to active
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Local status now set to: 'active'
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Status update for node lbv2.beta.com: status active
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [2868]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 2868)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [2866]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109 gid 113 (pid 2866)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [2871]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109 gid 113 (pid 2871)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [2869]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0 gid 0 (pid 2869)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [2867]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109 gid 113 (pid 2867)
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [2870]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109 gid 113 (pid 2870)
>Mar 16 18:22:54 lbv1.beta.com ccm: [2866]: info: Hostname: lbv1.beta.com
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client ccm is set to 1024
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client attrd is set to 1024
>Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client stonithd is set to 1024
>Mar 16 18:22:55 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client cib is set to 1024
>Mar 16 18:22:58 lbv1.beta.com heartbeat: [1957]: WARN: 1 lost packet(s) for [lbv2.beta.com] [33:35]
>Mar 16 18:22:58 lbv1.beta.com heartbeat: [1957]: info: No pkts missing from lbv2.beta.com!
>Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client crmd is set to 1024
>Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: WARN: 1 lost packet(s) for [lbv2.beta.com] [40:42]
>Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: info: No pkts missing from lbv2.beta.com!
>ping(ping)[3164]: 2015/03/16_18:23:54 WARNING: Could not update default_ping_set = 100: rc=127
>
>Node 2 (lbv2)
>
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: Pacemaker support: yes
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: File /etc/ha.d//haresources exists.
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: This file is not used because pacemaker is enabled
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Core dumps could be lost if multiple dumps occur.
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: **************************
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: Configuration validated. Starting heartbeat 3.0.6
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: heartbeat: version 3.0.6
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: Heartbeat generation: 1423534179
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: seed is 2086609325
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: bound send socket to device: eth1
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: set SO_REUSEADDR
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: bound receive socket to device: eth1
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.132
>Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: Local status now set to: 'up'
>Mar 16 18:22:48 lbv2.beta.com heartbeat: [1977]: info: Link lbv1.beta.com:eth1 up.
>Mar 16 18:22:48 lbv2.beta.com heartbeat: [1977]: info: Status update for node lbv1.beta.com: status up
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: debug: get_delnodelist: delnodelist=
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Comm_now_up(): updating status to active
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Local status now set to: 'active'
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [3026]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109 gid 113 (pid 3026)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [3023]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109 gid 113 (pid 3023)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [3025]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0 gid 0 (pid 3025)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [3024]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 3024)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [3022]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109 gid 113 (pid 3022)
>Mar 16 18:22:53 lbv2.beta.com heartbeat: [3027]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109 gid 113 (pid 3027)
>Mar 16 18:22:54 lbv2.beta.com ccm: [3022]: info: Hostname: lbv2.beta.com
>Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client ccm is set to 1024
>Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client attrd is set to 1024
>Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: Status update for node lbv1.beta.com: status active
>Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client stonithd is set to 1024
>Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client cib is set to 1024
>Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: quorum plugin: majority
>Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
>Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
>Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: quorum plugin: twonodes
>Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
>Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
>Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: info: Break tie for 2 nodes cluster
>Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: WARN: 1 lost packet(s) for [lbv1.beta.com] [30:32]
>Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: info: No pkts missing from lbv1.beta.com!
>Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client crmd is set to 1024
>Mar 16 18:22:59 lbv2.beta.com heartbeat: [1977]: WARN: 1 lost packet(s) for [lbv1.beta.com] [35:37]
>Mar 16 18:22:59 lbv2.beta.com heartbeat: [1977]: info: No pkts missing from lbv1.beta.com!
>Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: quorum plugin: majority
>Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
>Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
>ping(ping)[3144]: 2015/03/16_18:23:46 WARNING: Could not update default_ping_set = 100: rc=127
>
>
>
>Thank you in advance for your help.
>
>That is all.
>
>
>
>
>On 16 March 2015 at 18:53, Takehiro Matsushima <takeh****@gmail*****> wrote:
>
>Fukuda-san,
>>
>>Good evening, this is Matsushima.
>>Let me quickly check one point with you.
>>
>>I am also concerned that the start of the ping RA is failing with an unknown error,
>>so for ping and Stonith Helper, I would appreciate it if you could quote the logs that
>>seem relevant, including anything each RA wrote to standard output and standard error.
>>
>>----
>>Takehiro Matsushima
>>
>>_______________________________________________
>>Linux-ha-japan mailing list
>>Linux****@lists*****
>>http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>>
>>
>
>
>--
>
>ELF Systems
>Masamichi Fukuda
>mail to: masamichi_fukud****@elf-s*****
>
>
>
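As a rough sketch only, the fencing_topology format described in the reply at the top of this message could be applied to the resources shown in the quoted crm_mon output roughly as follows. It assumes, and the thread does not confirm this, that Stonith1-1 to Stonith1-3 are the devices that fence lbv1.beta.com and that Stonith2-1 to Stonith2-3 fence lbv2.beta.com; swap the mapping if the actual configuration is the other way around.

-----------------
# Assumption (not confirmed in the thread): Stonith1-* fence lbv1, Stonith2-* fence lbv2.
# Space-separated resources are separate levels, tried in the order listed.
fencing_topology \
    lbv1.beta.com: Stonith1-1 Stonith1-2 Stonith1-3 \
    lbv2.beta.com: Stonith2-1 Stonith2-2 Stonith2-3
-----------------

With a layout like this, stonith-helper would be attempted first for each node, then xen0, with meatware as the last resort.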