Monitoring resource HA with Pacemaker

Create one MailTo resource per node and pin each resource to its node.
The mail command is required, so install it beforehand if it is missing.

# Monitor node 1
pcs resource create MailTo1 ocf:heartbeat:MailTo email="<email address>"
pcs constraint location MailTo1 prefers <node1>
# Monitor node 2
pcs resource create MailTo2 ocf:heartbeat:MailTo email="<email address>"
pcs constraint location MailTo2 prefers <node2>
# Monitor node 3
pcs resource create MailTo3 ocf:heartbeat:MailTo email="<email address>"
pcs constraint location MailTo3 prefers <node3>
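
The repeated command pairs above could also be generated by a loop. The following is a minimal sketch, assuming hypothetical node names (node1 to node3) and a placeholder address; the function name gen_mailto_cmds is made up for illustration. It only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Print (do not run) the pcs commands that create one MailTo resource per node.
# The address and node names passed in are placeholders, not the real setup.
gen_mailto_cmds() {
    email="$1"; shift
    i=1
    for node in "$@"; do
        echo "pcs resource create MailTo${i} ocf:heartbeat:MailTo email=\"${email}\""
        echo "pcs constraint location MailTo${i} prefers ${node}"
        i=$((i + 1))
    done
}

# Example: review the output, then pipe it to sh on a cluster node to apply it.
gen_mailto_cmds "admin@example.com" node1 node2 node3
```

Generating the resource name from the loop counter avoids copy-and-paste slips such as reusing the same resource name for two nodes.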

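Cannot sync with an NTP server on the Lenovo m75s small gen2

ntpd appears to compare the client's local hardware clock against the time reported by the NTP server for a while and then decide whether to accept or reject that server. The association list below is the client's own view, so the reject is issued by the client's ntpd against the server, not the other way round. If the client's clock is too inaccurate, the server is judged to be too far away and is rejected with flash=400 peer_dist. The jitter is also very high at 1198.290, and the five filtoffset samples range from 109.76 to 1869.19.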

[root@m75s-1-host ~]# ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 47715  9024   yes   yes  none    reject   reachable  2

[root@m75s-1-host ~]# ntpq -c "rv 47715"
associd=47715 status=9024 conf, reach, sel_reject, 2 events, reachable,
srcadr=raspberrypi.mkashi.com, srcport=123, dstadr=192.168.151.241,
dstport=123, leap=00, stratum=2, precision=-18, rootdelay=3.922,
rootdisp=5.783, refid=133.243.238.164,
reftime=e4118348.66206e7a  Fri, Apr  2 2021 11:49:28.398,
rec=e41183d1.b1b84388  Fri, Apr  2 2021 11:51:45.694, reach=037,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=1,
flash=400 peer_dist, keyid=0, offset=1869.194, delay=0.979,
dispersion=438.293, jitter=1198.290, xleave=0.265,
filtdelay=     0.98    0.94    0.67    0.43    1.05    0.00    0.00    0.00,
filtoffset= 1869.19 1441.02 1006.12  557.80  109.76    0.00    0.00    0.00,
filtdisp=      0.00    0.96    1.94    2.94    3.95 16000.0 16000.0 16000.0
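
To track these values over time, individual fields can be pulled out of the rv output. This is a rough sketch; rv_field is a hypothetical helper name, and it assumes the comma-separated name=value layout shown above.

```shell
#!/bin/sh
# Extract one field from `ntpq -c "rv <assid>"` output read on stdin.
# Usage: ntpq -c "rv 47715" | rv_field jitter
rv_field() {
    # Put each "name=value" pair on its own line, then print the value
    # of the pair whose name matches exactly (so "delay" skips "rootdelay").
    tr ',' '\n' | sed -n "s/^ *$1=//p"
}

# Example (on the affected host):
#   ntpq -c "rv 47715" | rv_field flash
#   ntpq -c "rv 47715" | rv_field offset
```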

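On the m75s, the clocksource can be chosen from tsc, hpet, and acpi_pm. tsc seems to be the default because it is the most precise. On the m75s, however, tsc is inaccurate: it runs far slower than real time, which appears to make ntpd judge the NTP server as too distant. With hpet the host syncs without problems, so set clocksource=hpet in the kernel boot parameters.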

[root@m75s-1-host ~]# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm 
[root@m75s-1-host ~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
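
Before making the change permanent, the clocksource can be switched at runtime through the same sysfs directory to confirm that hpet helps. The following is a sketch; the function name set_clocksource and its optional directory argument are assumptions for illustration, and writing to sysfs requires root.

```shell
#!/bin/sh
# Switch the kernel clocksource at runtime if the requested one is available.
# Usage: set_clocksource hpet [sysfs_dir]
set_clocksource() {
    want="$1"
    dir="${2:-/sys/devices/system/clocksource/clocksource0}"
    # available_clocksource is a space-separated list, e.g. "tsc hpet acpi_pm"
    for cs in $(cat "$dir/available_clocksource"); do
        if [ "$cs" = "$want" ]; then
            echo "$want" > "$dir/current_clocksource"
            return 0
        fi
    done
    echo "clocksource '$want' is not available" >&2
    return 1
}

# Example (as root): set_clocksource hpet
# To persist across reboots, add clocksource=hpet to the kernel command line,
# e.g. on a RHEL7-like host: grubby --update-kernel=ALL --args="clocksource=hpet"
```

The runtime switch is lost on reboot, which makes it a low-risk way to test before touching the boot configuration.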

After switching to hpet, the host synced with the NTP server as shown below.

[root@m75s-1-host ~]# ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 35947  963a   yes   yes  none  sys.peer    sys_peer  3

[root@m75s-1-host ~]# ntpq -c "rv 35947"
associd=35947 status=963a conf, reach, sel_sys.peer, 3 events, sys_peer,
srcadr=raspberrypi.mkashi.com, srcport=123, dstadr=192.168.151.241,
dstport=123, leap=00, stratum=2, precision=-18, rootdelay=4.929,
rootdisp=29.007, refid=133.243.238.164,
reftime=e4125f84.662e55a0  Sat, Apr  3 2021  3:29:08.399,
rec=e4126306.366cee04  Sat, Apr  3 2021  3:44:06.212, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=9, ppoll=9, headway=0, flash=00 ok,
keyid=0, offset=0.164, delay=0.866, dispersion=11.731, jitter=0.211,
xleave=0.235,
filtdelay=     1.02    0.87    1.11    0.99    1.80    1.01    0.98    1.01,
filtoffset=    0.24    0.16    0.08   -0.12    0.15   -0.09   -0.09   -0.13,
filtdisp=      0.01    4.03    8.08   12.07   16.12   19.99   23.86   27.86

With kernel 4.19.84-2.el7.nutanix.20190916.276.x86_64 the host could not sync while using tsc, but with 5.8.0-55-generic it synced without problems using the defaults. It may be a kernel bug.

Reference: http://log.or.cz/?p=80

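Splitting IOMMU groups on the Lenovo m75s gen2 by applying the ACS override patch to the kernel

The kernel used is 4.19.84-2.el7.nutanix.20190916.276.x86_64, which ships with AHV in Nutanix CE. It is presumably more or less the same as the RHEL7 kernel.

With ACS override enabled, the onboard SATA controller moved to IOMMU group 12 with no other PCI device in the same group, so the SATA controller alone could be passed through to the CVM. Without ACS override, PCI devices that the host needs are passed through as well and the whole host panics.

Before applying the ACS override patch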

IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 10 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ef]
IOMMU Group 10 02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43eb]
IOMMU Group 10 02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43e9]
IOMMU Group 10 03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 10 03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 10 03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 10 03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 10 03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 10 03:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 10 03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 10 07:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller [1912:0015] (rev 02)
IOMMU Group 10 09:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0e)
IOMMU Group 11 0b:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
IOMMU Group 2 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 3 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU Group 4 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU Group 5 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 5 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
IOMMU Group 5 0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev d8)
IOMMU Group 5 0c:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
IOMMU Group 5 0c:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
IOMMU Group 5 0c:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU Group 5 0c:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU Group 5 0c:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor [1022:15e2] (rev 01)
IOMMU Group 5 0c:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
IOMMU Group 6 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
IOMMU Group 6 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 7 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 0 [1022:1448]
IOMMU Group 7 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 1 [1022:1449]
IOMMU Group 7 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 2 [1022:144a]
IOMMU Group 7 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 3 [1022:144b]
IOMMU Group 7 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 4 [1022:144c]
IOMMU Group 7 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 5 [1022:144d]
IOMMU Group 7 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 6 [1022:144e]
IOMMU Group 7 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 7 [1022:144f]
IOMMU Group 8 01:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
IOMMU Group 9 01:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)

After applying the ACS override patch
Set pcie_acs_override=downstream,multifunction and reboot

IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 10 01:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
IOMMU Group 11 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ef]
IOMMU Group 12 02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43eb]
IOMMU Group 13 02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43e9]
IOMMU Group 14 03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 15 03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 16 03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 17 03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 18 03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 19 03:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
IOMMU Group 20 03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU Group 21 07:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller [1912:0015] (rev 02)
IOMMU Group 22 09:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0e)
IOMMU Group 23 0b:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 24 0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev d8)
IOMMU Group 25 0c:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:1637]
IOMMU Group 26 0c:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
IOMMU Group 27 0c:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU Group 28 0c:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1 [1022:1639]
IOMMU Group 29 0c:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor [1022:15e2] (rev 01)
IOMMU Group 2 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 30 0c:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
IOMMU Group 3 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU Group 4 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1634]
IOMMU Group 5 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
IOMMU Group 6 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
IOMMU Group 7 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
IOMMU Group 7 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 8 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 0 [1022:1448]
IOMMU Group 8 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 1 [1022:1449]
IOMMU Group 8 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 2 [1022:144a]
IOMMU Group 8 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 3 [1022:144b]
IOMMU Group 8 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 4 [1022:144c]
IOMMU Group 8 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 5 [1022:144d]
IOMMU Group 8 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 6 [1022:144e]
IOMMU Group 8 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 7 [1022:144f]
IOMMU Group 9 01:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)

Reference: Arch Linux wiki
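
Listings like the ones above can be produced by walking /sys/kernel/iommu_groups. The following is a sketch along the lines of the script commonly found on the Arch Linux wiki, not necessarily the exact script used here. The function name list_iommu_groups and its optional directory argument are assumptions, and lspci is needed for the device descriptions.

```shell
#!/bin/sh
# Print one line per PCI device together with its IOMMU group number.
# Usage: list_iommu_groups [iommu_groups_dir]
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when there are no groups (IOMMU off or unsupported)
        [ -e "$dev" ] || continue
        # Path layout is <root>/<group>/devices/<pci address>
        group="$(basename "$(dirname "$(dirname "$dev")")")"
        addr="$(basename "$dev")"
        # lspci -nns prints the description with [vendor:device] IDs;
        # fall back to the bare address if lspci is missing or prints nothing
        desc="$(lspci -nns "$addr" 2>/dev/null)" || desc="$addr"
        echo "IOMMU Group $group ${desc:-$addr}"
    done
}

list_iommu_groups
```

Running it before and after enabling the override makes it easy to diff the two groupings.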

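Installing Nutanix CE 2020.09.16 on the Lenovo m75s gen2 small

The following slides are required reading before installing
https://www.slideshare.net/smzksts/nutanix-community-edition-518

The drive layout is USB flash drive (AHV), NVMe (CVM), SATA SSD (CVM), SATA HDD (DATA)
The points where I stumbled on the m75s gen2 are as follows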

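  • After booting from USB, the installer does not start unless the USB drive is unplugged and plugged back in once; otherwise it fails with the message "StandardError: Failed command: [hdparm -z /dev/sdc] with reason [BLKRRPART failed: Device or resource busy"
  • Following p19 of the slides above did not let the installation continue either. That page says to change the boot device priority so that the installer drive comes first, because booting from the manual boot device selection screen reportedly leaves the screen too small to continue the installation (many reports). As a workaround I attached a USB NIC (QNA-UC5G1T), started sshd, and ran the installation with ./ce_installer && screen -r.
  • If an NVMe drive is selected for the CVM, the CVM may fail to boot after installation depending on the NVMe manufacturer. It failed with the Transcend TS512GMTE110S and succeeded with the Samsung 970 EVO. This seems to be caused by a bug in the SMI controller chip used in the TS512GMTE110S. https://bugzilla.kernel.org/show_bug.cgi?id=20205
    When the boot failed, the Qemu log contained the following message.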
qemu-system-x86_64: -device vfio-pci,host=01:00.0,id=hostdev2,bus=pci.0,addr=0x8: vfio error: 0000:01:00.0: failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align

Kernel panic from 4.19.84-300.el7.x86_64

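Building a kernel for AHV with the ACS override patch applied resulted in a panic.
I looked at the source but could not work out the cause, so I left it alone.
The patch is unrelated; the panic appears to come from the crypto self-tests.
Turning FIPS off stops this self-test from running.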
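
FIPS mode is requested with the fips=1 kernel parameter, so one way to confirm the setting is to inspect the kernel command line. The following is a sketch; the function name cmdline_has_fips is an assumption, and it only checks for the presence of fips=1.

```shell
#!/bin/sh
# Report whether a kernel command line requests FIPS mode (fips=1).
# Usage: cmdline_has_fips "<cmdline string>"
cmdline_has_fips() {
    # Unquoted $1 is split on whitespace into individual parameters
    for arg in $1; do
        [ "$arg" = "fips=1" ] && return 0
    done
    return 1
}

if cmdline_has_fips "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "fips=1 is set; boot with fips=0 (or drop fips=1) to turn FIPS mode off"
else
    echo "fips=1 is not set on the kernel command line"
fi
# The running state is also exposed as 0 or 1 in /proc/sys/crypto/fips_enabled.
```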

[    1.346310] alg: self-tests for sha1-ssse3 (sha1) passed
[    1.346392] alg: self-tests for sha1-ni (sha1) passed
[    1.349783] alg: self-tests for sha256-ssse3 (sha256) passed
[    1.349884] alg: self-tests for sha224-ssse3 (sha224) passed
[    1.350011] alg: self-tests for sha256-ni (sha256) passed
[    1.350058] alg: self-tests for sha224-ni (sha224) passed
[    1.357928] alg: self-tests for sha512-generic (sha512) passed
[    1.357997] alg: self-tests for sha384-generic (sha384) passed
[    1.360691] alg: self-tests for sha512-ssse3 (sha512) passed
[    1.360762] alg: self-tests for sha384-ssse3 (sha384) passed
[    1.374665] CPU feature 'AVX registers' is not supported.
[    1.410444] alg: self-tests for sha1 (sha1) passed
[    1.410566] alg: self-tests for ecb(des3_ede) (ecb(des3_ede)) passed
[    1.410687] alg: self-tests for cbc(des3_ede) (cbc(des3_ede)) passed
[    1.410838] alg: self-tests for ctr(des3_ede) (ctr(des3_ede)) passed
[    1.410861] alg: self-tests for sha256 (sha256) passed
[    1.410897] alg: self-tests for ecb(aes) (ecb(aes)) passed
[    1.410935] alg: self-tests for cbc(aes) (cbc(aes)) passed
[    1.410973] alg: self-tests for xts(aes) (xts(aes)) passed
[    1.411013] alg: self-tests for ctr(aes) (ctr(aes)) passed
[    1.413166] alg: self-tests for rfc3686(ctr-aes-aesni) (rfc3686(ctr(aes))) passed
[    1.413230] alg: self-tests for rfc3686(ctr(aes)) (rfc3686(ctr(aes))) passed
[    1.415995] alg: skcipher: Failed to load transform for cfb(aes): -2
[    1.415999] Kernel panic - not syncing: cfb(aes): cfb(aes) alg self test failed in fips mode!

[    1.416347] CPU: 1 PID: 407 Comm: modprobe Not tainted 4.19.84-300.el7.x86_64 #1
[    1.416577] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[    1.416776] Call Trace:
[    1.416934]  dump_stack+0x64/0x83
[    1.417087]  panic+0xe8/0x25c
[    1.417250]  alg_test.part.14+0x360/0x380
[    1.417414]  do_test+0x4e74/0x5eed [tcrypt]
[    1.417595]  do_test+0x5edd/0x5eed [tcrypt]
[    1.417782]  ? 0xffffffffc026a000
[    1.417949]  tcrypt_mod_init+0x50/0x1000 [tcrypt]
[    1.418079]  ? 0xffffffffc026a000
[    1.418216]  do_one_initcall+0x4e/0x1d4
[    1.418379]  ? free_unref_page_commit+0x85/0xf0
[    1.418560]  ? _cond_resched+0x15/0x30
[    1.418878]  ? kmem_cache_alloc_trace+0x17f/0x1e0
[    1.419224]  do_init_module+0x5a/0x210
[    1.419464]  load_module.isra.71+0x20cb/0x27e0
[    1.419704]  ? m_show+0x1c0/0x1c0
[    1.419964]  ? vmap_page_range_noflush+0x282/0x420
[    1.420286]  __do_sys_init_module+0x11c/0x180
[    1.420575]  do_syscall_64+0x5b/0x1a0
[    1.420834]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    1.421277] RIP: 0033:0x7faeb2b42fca
[    1.421516] Code: 48 8b 0d a9 7e 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 7e 2c 00 f7 d8 64 89 01 48
[    1.422164] RSP: 002b:00007fff584491d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000af
[    1.422555] RAX: ffffffffffffffda RBX: 0000000000a903d0 RCX: 00007faeb2b42fca
[    1.422828] RDX: 000000000041a96e RSI: 000000000001f05b RDI: 0000000000a9f350
[    1.423099] RBP: 000000000041a96e R08: 0000000000000000 R09: 0000000000a93ee0
[    1.423449] R10: 0000000000a93e00 R11: 0000000000000206 R12: 0000000000a9f350
[    1.423722] R13: 0000000000a905a0 R14: 0000000000040000 R15: 0000000000000000
[    1.424063] Kernel Offset: 0x2d000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    1.424682] ---[ end Kernel panic - not syncing: cfb(aes): cfb(aes) alg self test failed in fips mode!
                ]---
[    1.425439] ------------[ cut here ]------------
[    1.425751] sched: Unexpected reschedule of offline CPU#0!
[    1.426092] WARNING: CPU: 1 PID: 407 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
[    1.426728] Modules linked in: tcrypt(+) authenc cmac wp512 twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common tea sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 serpent_sse2_x86_64 serpent_generic seed salsa20_generic rmd320 rmd256 rmd160 rmd128 michael_mic md4 khazad fcrypt dm_crypt des3_ede_x86_64 des_generic crc32c_intel ccm cast6_generic cast_common camellia_generic camellia_x86_64 blowfish_generic blowfish_x86_64 blowfish_common arc4 ansi_cprng vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio ipmi_devintf ipmi_msghandler
[    1.428579] CPU: 1 PID: 407 Comm: modprobe Not tainted 4.19.84-300.el7.x86_64 #1
[    1.429074] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[    1.429404] RIP: 0010:native_smp_send_reschedule+0x39/0x40
[    1.429661] Code: 0f 92 c0 84 c0 74 15 48 8b 05 03 55 14 01 be fd 00 00 00 48 8b 40 30 e9 d5 d7 ba 00 89 fe 48 c7 c7 98 bc 0b af e8 c7 fb 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 48 83 ec 20 65 48 8b 04 25
[    1.430412] RSP: 0018:ffff9a6438043cf8 EFLAGS: 00010082
[    1.430723] RAX: 0000000000000000 RBX: ffff9a642f583b80 RCX: 0000000000000006
[    1.431079] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff9a64380568f0
[    1.431464] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000390
[    1.431808] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a642f58435c
[    1.432167] R13: 0000000000000000 R14: 0000000000000046 R15: 0000000000020400
[    1.432505] FS:  00007faeb367e740(0000) GS:ffff9a6438040000(0000) knlGS:0000000000000000
[    1.433027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.433367] CR2: 00007f73d061c288 CR3: 00000001304d6000 CR4: 00000000003006e0
[    1.433718] Call Trace:
[    1.434014]  <IRQ>
[    1.434310]  try_to_wake_up+0x3f0/0x450
[    1.434623]  __wake_up_common+0x8f/0x160
[    1.434939]  ep_poll_callback+0x1af/0x300
[    1.435263]  __wake_up_common+0x8f/0x160
[    1.435563]  __wake_up_common_lock+0x7a/0xc0
[    1.435895]  irq_work_run_list+0x4c/0x70
[    1.436228]  ? tick_sched_do_timer+0x80/0x80
[    1.436548]  update_process_times+0x3b/0x50
[    1.436854]  tick_sched_handle+0x25/0x60
[    1.437176]  tick_sched_timer+0x37/0x70
[    1.437485]  __hrtimer_run_queues+0xfb/0x270
[    1.437800]  hrtimer_interrupt+0x122/0x270
[    1.438104]  smp_apic_timer_interrupt+0x6a/0x140
[    1.438446]  apic_timer_interrupt+0xf/0x20
[    1.438764]  </IRQ>
[    1.439049] RIP: 0010:panic+0x209/0x25c
[    1.439381] Code: 83 3d 36 d6 8e 01 00 74 05 e8 3f 60 02 00 48 c7 c6 20 32 9a af 48 c7 c7 d0 6e 0c af 31 c0 e8 8e 5e 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 0b c0 0c 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 db d5
[    1.440264] RSP: 0018:ffffadfd0144baf8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[    1.440783] RAX: 0000000000000060 RBX: 0000000000000000 RCX: 0000000000000006
[    1.441211] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff9a64380568f0
[    1.441639] RBP: ffffadfd0144bb68 R08: 0000000000000000 R09: 000000000000038e
[    1.442034] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffaf105b38
[    1.442449] R13: 0000000000000000 R14: 0000000000000000 R15: ffffadfd0144be88
[    1.442829]  ? panic+0x202/0x25c
[    1.443135]  alg_test.part.14+0x360/0x380
[    1.443494]  do_test+0x4e74/0x5eed [tcrypt]
[    1.443838]  do_test+0x5edd/0x5eed [tcrypt]
[    1.444160]  ? 0xffffffffc026a000
[    1.444484]  tcrypt_mod_init+0x50/0x1000 [tcrypt]
[    1.444811]  ? 0xffffffffc026a000
[    1.445121]  do_one_initcall+0x4e/0x1d4
[    1.445466]  ? free_unref_page_commit+0x85/0xf0
[    1.445802]  ? _cond_resched+0x15/0x30
[    1.446104]  ? kmem_cache_alloc_trace+0x17f/0x1e0
[    1.446467]  do_init_module+0x5a/0x210
[    1.446789]  load_module.isra.71+0x20cb/0x27e0
[    1.447140]  ? m_show+0x1c0/0x1c0
[    1.447474]  ? vmap_page_range_noflush+0x282/0x420
[    1.447815]  __do_sys_init_module+0x11c/0x180
[    1.448188]  do_syscall_64+0x5b/0x1a0
[    1.448540]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    1.448867] RIP: 0033:0x7faeb2b42fca
[    1.449181] Code: 48 8b 0d a9 7e 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 7e 2c 00 f7 d8 64 89 01 48
[    1.450132] RSP: 002b:00007fff584491d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000af
[    1.450718] RAX: ffffffffffffffda RBX: 0000000000a903d0 RCX: 00007faeb2b42fca
[    1.451076] RDX: 000000000041a96e RSI: 000000000001f05b RDI: 0000000000a9f350
[    1.451493] RBP: 000000000041a96e R08: 0000000000000000 R09: 0000000000a93ee0
[    1.451867] R10: 0000000000a93e00 R11: 0000000000000206 R12: 0000000000a9f350
[    1.452213] R13: 0000000000a905a0 R14: 0000000000040000 R15: 0000000000000000
[    1.452568] ---[ end trace bce24ab776ddad45 ]---

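Automatic Let's Encrypt renewal with Certbot

Use a self-hosted DNS server to renew Let's Encrypt certificates automatically with Certbot.
Run the certbot-renew.sh below on a regular schedule, for example once a day. For --deploy-hook, specify the command that reloads the httpd you use.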

#!/bin/sh

/usr/bin/certbot renew --preferred-challenges dns-01 --manual-auth-hook /root/bin/auth-hook.sh --manual-cleanup-hook /root/bin/clean-hook.sh --deploy-hook "systemctl reload apache2"

auth-hook.sh: shell script that adds the TXT record to the DNS server

#!/bin/sh

echo "_acme-challenge.${CERTBOT_DOMAIN}. IN TXT \"${CERTBOT_VALIDATION}\"" | ssh dns "cat >>/etc/bind/mkashi.com.zone"
ssh dns systemctl restart bind9
ssh dns cat /etc/bind/mkashi.com.zone

clean-hook.sh: script that removes the TXT record from the DNS server

#!/bin/sh

ssh dns "sed -i -e '/_acme-challenge.${CERTBOT_DOMAIN}. IN TXT \"${CERTBOT_VALIDATION}\"/d' /etc/bind/mkashi.com.zone"
ssh dns systemctl restart bind9
ssh dns cat /etc/bind/mkashi.com.zone

VM fails to start in libvirt with the error "backing store for image is self-referential"

How to handle the case where libvirt fails to start a VM as follows

error: Failed to start domain dns
error: internal error: backing store for /var/lib/libvirt/images/dns.latest (/var/lib/libvirt/images/dns.20201226050002-snap) is self-referential

Check the list of snapshots for the affected VM

root@mks-m75q-1:~# ll /var/lib/libvirt/images/dns.*
-rw------- 1 libvirt-qemu kvm   2721972224 12月 18 11:00 /var/lib/libvirt/images/dns.20201212001502-snap
-rw------- 1 libvirt-qemu kvm     24838144 12月 18 11:00 /var/lib/libvirt/images/dns.20201218050002-snap
-rw------- 1 libvirt-qemu kvm      3407872 12月 18 09:48 /var/lib/libvirt/images/dns.20201218093744-snap
-rw------- 1 libvirt-qemu kvm      4521984 12月 18 09:59 /var/lib/libvirt/images/dns.20201218094746-snap
-rw------- 1 libvirt-qemu kvm     45416448 12月 18 10:16 /var/lib/libvirt/images/dns.20201218095908-snap
-rw------- 1 libvirt-qemu kvm      8781824 12月 18 10:36 /var/lib/libvirt/images/dns.20201218101540-snap
-rw------- 1 libvirt-qemu kvm      3670016 12月 18 10:39 /var/lib/libvirt/images/dns.20201218103530-snap
-rw------- 1 libvirt-qemu kvm     33816576 12月 18 10:48 /var/lib/libvirt/images/dns.20201218103837-snap
-rw------- 1 libvirt-qemu kvm      4194304 12月 18 10:51 /var/lib/libvirt/images/dns.20201218104836-snap
-rw------- 1 libvirt-qemu kvm      4259840 12月 18 10:56 /var/lib/libvirt/images/dns.20201218105121-snap
-rw------- 1 libvirt-qemu kvm      3735552 12月 18 11:00 /var/lib/libvirt/images/dns.20201218105635-snap
-rw------- 1 libvirt-qemu kvm    288358400 12月 19 05:00 /var/lib/libvirt/images/dns.20201218110021-snap
-rw------- 1 libvirt-qemu kvm    591462400 12月 20 05:00 /var/lib/libvirt/images/dns.20201219050001-snap
-rw------- 1 libvirt-qemu kvm    252968960 12月 21 05:00 /var/lib/libvirt/images/dns.20201220050001-snap
-rw------- 1 libvirt-qemu kvm    283901952 12月 22 05:00 /var/lib/libvirt/images/dns.20201221050001-snap
-rw------- 1 libvirt-qemu kvm     83034112 12月 23 05:00 /var/lib/libvirt/images/dns.20201222050001-snap
-rw------- 1 libvirt-qemu kvm    243269632 12月 24 05:00 /var/lib/libvirt/images/dns.20201223050001-snap
-rw------- 1 libvirt-qemu kvm    261750784 12月 25 05:00 /var/lib/libvirt/images/dns.20201224050002-snap
-rw------- 1 libvirt-qemu kvm    246022144 12月 26 05:00 /var/lib/libvirt/images/dns.20201225050001-snap
-rw------- 1 root         root   225443840 12月 26 18:40 /var/lib/libvirt/images/dns.20201226050002-snap
lrwxrwxrwx 1 root         root          47 12月 26 05:00 /var/lib/libvirt/images/dns.latest -> /var/lib/libvirt/images/dns.20201226050002-snap
-rw-rw-r-- 1 libvirt-qemu kvm  13776781312 12月 12 10:07 /var/lib/libvirt/images/dns.qcow2

Check the backing file of each snapshot
For some reason /var/lib/libvirt/images/dns.20201218104836-snap points to /var/lib/libvirt/images/dns.latest

root@mks-m75q-1:~# ls /var/lib/libvirt/images/dns.* | while read line;do qemu-img info "$line";done | grep -E "^(image|backing file):"
image: /var/lib/libvirt/images/dns.20201212001502-snap
backing file: /var/lib/libvirt/images/dns.qcow2
image: /var/lib/libvirt/images/dns.20201218050002-snap
backing file: /var/lib/libvirt/images/dns.20201212001502-snap
image: /var/lib/libvirt/images/dns.20201218093744-snap
backing file: /var/lib/libvirt/images/dns.20201218050002-snap
image: /var/lib/libvirt/images/dns.20201218094746-snap
backing file: /var/lib/libvirt/images/dns.20201218093744-snap
image: /var/lib/libvirt/images/dns.20201218095908-snap
backing file: /var/lib/libvirt/images/dns.20201218094746-snap
image: /var/lib/libvirt/images/dns.20201218101540-snap
backing file: /var/lib/libvirt/images/dns.20201218095908-snap
image: /var/lib/libvirt/images/dns.20201218103530-snap
backing file: /var/lib/libvirt/images/dns.20201218101540-snap
image: /var/lib/libvirt/images/dns.20201218103837-snap
backing file: /var/lib/libvirt/images/dns.20201218103530-snap
image: /var/lib/libvirt/images/dns.20201218104836-snap
backing file: /var/lib/libvirt/images/dns.latest
image: /var/lib/libvirt/images/dns.20201218105121-snap
backing file: /var/lib/libvirt/images/dns.20201218104836-snap
image: /var/lib/libvirt/images/dns.20201218105635-snap
backing file: /var/lib/libvirt/images/dns.20201218105121-snap
image: /var/lib/libvirt/images/dns.20201218110021-snap
backing file: /var/lib/libvirt/images/dns.20201218105635-snap
image: /var/lib/libvirt/images/dns.20201219050001-snap
backing file: /var/lib/libvirt/images/dns.20201218110021-snap
image: /var/lib/libvirt/images/dns.20201220050001-snap
backing file: /var/lib/libvirt/images/dns.20201219050001-snap
image: /var/lib/libvirt/images/dns.20201221050001-snap
backing file: /var/lib/libvirt/images/dns.20201220050001-snap
image: /var/lib/libvirt/images/dns.20201222050001-snap
backing file: /var/lib/libvirt/images/dns.20201221050001-snap
image: /var/lib/libvirt/images/dns.20201223050001-snap
backing file: /var/lib/libvirt/images/dns.20201222050001-snap
image: /var/lib/libvirt/images/dns.20201224050002-snap
backing file: /var/lib/libvirt/images/dns.20201223050001-snap
image: /var/lib/libvirt/images/dns.20201225050001-snap
backing file: /var/lib/libvirt/images/dns.20201224050002-snap
image: /var/lib/libvirt/images/dns.20201226050002-snap
backing file: /var/lib/libvirt/images/dns.20201225050001-snap
image: /var/lib/libvirt/images/dns.latest
backing file: /var/lib/libvirt/images/dns.20201225050001-snap
image: /var/lib/libvirt/images/dns.qcow2
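
The check above can also be automated by following the chain from the image libvirt complains about and stopping when an image shows up twice. The following is a sketch; backing_of and walk_chain are hypothetical helper names, and it assumes absolute backing file paths as in the output above and a readlink that supports -f.

```shell
#!/bin/sh
# Print the backing file of an image, or nothing if it has none.
backing_of() {
    qemu-img info "$1" | sed -n 's/^backing file: //p'
}

# Follow the backing chain from a starting image; fail if an image repeats.
# Usage: walk_chain /var/lib/libvirt/images/dns.latest
walk_chain() {
    img="$(readlink -f "$1")"          # resolve symlinks such as dns.latest
    seen=""
    while [ -n "$img" ]; do
        # Exact whole-line match against the images visited so far
        if printf '%s\n' "$seen" | grep -Fxq -- "$img"; then
            echo "LOOP: $img is referenced again"
            return 1
        fi
        echo "$img"
        seen="$seen
$img"
        next="$(backing_of "$img")"
        [ -n "$next" ] || break        # reached the base image
        img="$(readlink -f "$next")"
    done
    return 0
}
```

The image printed just before the LOOP line is the one whose backing file needs to be corrected.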

Fix the backing file so that it points to the correct snapshot

qemu-img rebase -u -f qcow2 -F qcow2 -b /var/lib/libvirt/images/dns.20201218103837-snap /var/lib/libvirt/images/dns.20201218104836-snap

Confirm that the backing file is now correct

root@mks-m75q-1:~# qemu-img info /var/lib/libvirt/images/dns.20201218104836-snap
image: /var/lib/libvirt/images/dns.20201218104836-snap
file format: qcow2
virtual size: 25 GiB (26843545600 bytes)
disk size: 3.95 MiB
cluster_size: 65536
backing file: /var/lib/libvirt/images/dns.20201218103837-snap
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

References
https://blog.programster.org/qemu-img-cheatsheet

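Recovering an LVM raid1

# Drop the failed (missing) PV from the volume group
vgreduce vgubuntu --removemissing --force
# Initialize the replacement partition and add it to the volume group
pvcreate /dev/nvme0n1p2
vgextend vgubuntu /dev/nvme0n1p2
# Rebuild the missing raid1 leg of each LV on the new PV
lvconvert --repair /dev/vgubuntu/root
lvconvert --repair /dev/vgubuntu/swap_1
# Check resync progress and which devices each LV uses
lvs -a -o name,copy_percent,devices vgubuntu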

Building aqc111 on Ubuntu 20.04

Build the driver for the QNAP QNA-UC5G1T automatically using dkms.
Download aqc111 1.3.3.0 from Aquantia and extract it.
The build fails unless the Makefile is modified, so change it as follows

--- aqc111-1.3.3.0/Makefile     2019-07-02 18:36:06.000000000 +0900
+++ aqc111-1.3.3.1/Makefile     2020-10-05 23:57:00.873643349 +0900
@@ -23,7 +23,7 @@
 obj-m      :=  $(TARGET).o

 default:
-       make -C $(BUILD_DIR) SUBDIRS=$(PWD) modules
+       make -C $(BUILD_DIR) M=$(shell pwd) modules

 $(TARGET).o: $(OBJS)
        $(LD) $(LD_RFLAG) -r -o $@ $(OBJS)
@@ -32,7 +32,7 @@
        cp -v $(TARGET).ko $(DEST) && /sbin/depmod -a

 clean:
-       $(MAKE) -C $(BUILD_DIR) SUBDIRS=$(PWD) clean
+       $(MAKE) -C $(BUILD_DIR) M=$(shell pwd) clean

 .PHONY: modules clean

Apply this patch

Create the following dkms.conf in the same directory as the source code

PACKAGE_NAME="aqc111"
PACKAGE_VERSION="1.3.3.1"
BUILT_MODULE_NAME[0]="aqc111"
DEST_MODULE_LOCATION[0]="/kernel/drivers/net/usb"
AUTOINSTALL="yes"

Move the source directory to /usr/src/aqc111-1.3.3.1

apt -y install dkms
dkms add -m aqc111 -v 1.3.3.1
dkms build -m aqc111 -v 1.3.3.1
dkms install -m aqc111 -v 1.3.3.1
dkms status

Making bonding work with aqc111

Bonding does not work properly with either the 1.3.3.0 driver published by Aquantia or the aqc111 driver included in the Ubuntu 20.04 kernel 5.4.0, so the following patch is needed.

--- aqc111-1.3.3.0/aqc111.c     2019-07-02 18:36:06.000000000 +0900
+++ aqc111-1.3.3.1/aqc111.c     2020-10-07 21:50:30.412367747 +0900
@@ -21,7 +21,7 @@
 #include "aq_compat.h"
 #include "aqc111.h"

-#define DRIVER_VERSION "1.3.3.0"
+#define DRIVER_VERSION "1.3.3.1"
 #define DRIVER_NAME "aqc111"

 static int aqc111_read_cmd_nopm(struct usbnet *dev, u8 cmd, u16 value,
@@ -724,17 +724,23 @@
 }

 static int aqc111_set_mac_addr(struct net_device *net, void *p)
-{
-       struct usbnet *dev = netdev_priv(net);
-       int ret = 0;

-       ret = eth_mac_addr(net, p);
-       if (ret < 0)
-               return ret;
+{
+  struct usbnet *dev = netdev_priv(net);
+  int ret = 0;

-       /* Set the MAC address */
-       return aqc111_write_cmd(dev, AQ_ACCESS_MAC, SFR_NODE_ID, ETH_ALEN,
-                               ETH_ALEN, net->dev_addr);
+  ret = eth_mac_addr(net, p);
+  if (ret < 0)
+    return ret;
+
+  /* Set the MAC address */
+  ret = aqc111_write_cmd(dev, AQ_ACCESS_MAC, SFR_NODE_ID, ETH_ALEN,
+                        ETH_ALEN, net->dev_addr);
+
+  if (ret < 0)
+    return ret;
+
+  return (ret == ETH_ALEN) ? 0 : -1;
 }

 static int aqc111_vlan_rx_kill_vid(struct net_device *net,
@@ -1013,6 +1019,7 @@
        dev->net->hw_features |= AQ_SUPPORT_HW_FEATURE;
        dev->net->features |= AQ_SUPPORT_FEATURE;
        dev->net->vlan_features |= AQ_SUPPORT_VLAN_FEATURE;
+       dev->net->priv_flags |= IFF_LIVE_ADDR_CHANGE;

        aqc111_read_fw_version(dev, aqc111_data);
        aqc111_data->autoneg = AUTONEG_ENABLE;