Opened 13 years ago
Closed 12 years ago
#611 closed defect (fixed)
Incorrect restoration of grub on SLES
Reported by: | Bruno Cornec | Owned by: | Bruno Cornec |
---|---|---|---|
Priority: | high | Milestone: | 3.0.3 |
Component: | mondo | Version: | 3.0.1 |
Severity: | major | Keywords: | |
Cc: |
Description
When erasing the MBR of disks, mondorestore does not restore correctly the bootloader content with grub.
The workaround is to perform a manual reinstall of grub in chroot mode.
Will need /var/log/mondorestore.log to solve this issue correctly.
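For reference, the chroot workaround could be sketched as below. The mount point /mnt/RESTORING and disk /dev/cciss/c0d0 match the examples used later in this ticket; the DRY_RUN switch and the restore_grub helper name are additions for illustration only, not mondorescue code.

```shell
#!/bin/sh
# Sketch of the manual workaround: reinstall grub from a chroot of the
# restored system. DRY_RUN is an illustration-only switch that prints
# the commands instead of running them.
run() { [ -n "$DRY_RUN" ] && echo "$*" || "$@"; }

restore_grub() {
    root="$1"    # e.g. /mnt/RESTORING
    disk="$2"    # e.g. /dev/cciss/c0d0
    # grub-install may need device nodes inside the chroot
    run mount --bind /dev "$root/dev"
    run chroot "$root" grub-install "$disk"
    rc=$?
    run umount "$root/dev"
    return $rc
}

# Example (dry run, only prints the three commands):
# DRY_RUN=1 restore_grub /mnt/RESTORING /dev/cciss/c0d0
```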
Attachments (2)
Change History (23)
comment:1 by , 13 years ago
Status: | new → assigned |
---|
comment:2 by , 13 years ago
This error:
expr: non-numeric argument
is not coming from the grub-MR script, so it probably comes from the SLES grub-install.
However, this command returns 0, so mondorestore deduces that grub was correctly installed, whereas it may not have been, due to the mentioned error.
We would need an interactive session at that time so that we could run:
chroot /mnt/restoring grub-install /dev/cciss/c0d0
comment:3 by , 13 years ago
In mondorestore.log we get:

Now I'll use grub-install
expr: non-numeric argument

Instead, in mondorestore.log, we should get, echoed by the grub-MR shell script:

Now I'll use grub-install
grub-install returned 0 (0 if ok) or 1 (if problem)

So it is clear that the grub-install shell script fails.
While checking SLES 10 grub-install shell-script, I see that it begins with:
#! /bin/sh
And, when I did some tests for ticket #600, I also got some "expr: non-numeric argument" errors.
So I think that this problem (the grub-install shell script failing) could be related to the mondorescue /bin/sh problem that I submitted in ticket #600.
That problem is: at mondorescue ISO boot, /bin/sh is linked to the limited busybox sh instead of being linked to /bin/bash (as it is inside SLES and inside RHEL).
So I think that the solution is to modify mondorescue so that, at boot, /bin/sh is a soft link to /bin/bash instead of a soft link to busybox sh.
From there, we could test again and check if this problem still occurs.
comment:4 by , 13 years ago
Likely, this is not the same issue: grub-install is launched chrooted, under /mnt/RESTORING, where bash is the default standard shell, not busybox sh.
comment:5 by , 13 years ago
That 'expr' error message seems to be an upstream grub issue found on multiple distributions:
- https://bugzilla.redhat.com/show_bug.cgi?id=508189 (Fedora 11)
- https://bugzilla.redhat.com/show_bug.cgi?id=402151 (RHEL 5)
- https://bugzilla.redhat.com/show_bug.cgi?id=736833 (RHEL 6)
- https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/720558 (Ubuntu 10.04)
- https://qa.mandriva.com/show_bug.cgi?id=52397 (Mandriva 2009)
- http://www.linuxforen.de/forums/archive/index.php/t-213139.html (SuSE 10)
Maybe there is a SLES update that fixes it?
comment:6 by , 13 years ago
Milestone: | 3.0.2 → 3.0.3 |
---|
comment:7 by , 13 years ago
The bug is not solved in SLES; SLES 11 uses the following workaround.
When called without arguments, the grub-install shell script runs:

test -x /usr/sbin/grub && \
  grep -q quit /etc/grub.conf 2>/dev/null && \
  grub --batch < /etc/grub.conf && exit 0
So, a mondo solution for SLES 11 could be:
In grub-MR (the mondorestore script), modify the "grub-install.unsupported" test sections (lines 109 and 115) to call:
/usr/sbin/grub-install
instead of:
/usr/sbin/grub-install.unsupported $1
And, because grub-install uses them, add these to the mondorescue backup (in /etc/mindi/deplist.d/minimal.conf):

# SLES 11 and grub
/etc/grub.conf
/usr/sbin/grub
/sbin/yast2
/lib/libncurses.so.5
/lib/libc.so.6
/lib/libdl.so.2
comment:8 by , 13 years ago
I tested the solution of modifying grub-MR to call the SLES 11 /usr/sbin/grub-install without argument (so that it launches the grub command with the content of /etc/grub.conf).
It failed to configure some GRUB parts correctly: after mondorestore the machine did not boot from the hard disk, and the GRUB prompt did not even appear on the screen.
comment:9 by , 13 years ago
Here is the information stored in mondorestore.log when using the SLES 11 /usr/sbin/grub-install without argument (so that it launches the grub command with the content of /etc/grub.conf):

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

grub> setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "install --force-lba --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0,0) /boot/grub/stage2 p /boot/grub/menu.lst "... succeeded
Done.
grub> quit
grub-install in chroot returned 0

But, at reboot, the machine did not boot from the hard disk; the GRUB prompt did not even appear on the screen.
comment:10 by , 13 years ago
That solution did not configure the whole GRUB chain: after the "# mondorestore -Z nuke" the machine did not boot from the hard disk and switched directly to the next boot device (a live CD).
I also tried "# mondorestore -Z mbr", but the result in mondorestore.log is the same and the problem remains.
I manually copied the MBR (the BOOTLOADER.MBR saved by mondorescue) to the machine's MBR with dd. After that, at reboot, it stops at the "GRUB _" prompt (flashing cursor) and does not boot the OS. It is not the "grub>" prompt; I am unable to type anything or to ESC from it.
I finally booted from a SLES install CD and repaired grub manually; this time everything succeeded (even: "embed /boot/grub/e2fs_stage1_5 (hd0)"... 17 sectors are embedded.).
Next time I'll try: # /sbin/yast2 bootloader
comment:11 by , 13 years ago
In my SLES 11 /etc/grub.conf file I see:

setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
quit
I thought that the grub boot problem could come from there, so I modified it manually to:

setup --stage2=/boot/grub/stage2 --force-lba (hd0) (hd0,0)
quit
I tested it:

# bash -x /usr/sbin/grub-install
+ '[' 0 -gt 0 ']'
+ test -x /usr/sbin/grub
+ grep -q quit /etc/grub.conf
+ grub --batch < /etc/grub.conf
GNU GRUB version 0.97 (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename. ]

grub> setup --stage2=/boot/grub/stage2 --force-lba (hd0) (hd0,0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 17 sectors are embedded.
succeeded
 Running "install --force-lba --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0) (hd0)1+17 p (hd0,0)/boot/grub/stage2 /boot/grub/menu.lst"... succeeded
Done.
grub> quit
+ exit 0
The reboot after this grub-install was OK, which confirms the origin of the problem: my server's /etc/grub.conf was badly configured.
comment:12 by , 13 years ago
To check further, I put my server's /etc/grub.conf back to its original configuration:

setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
quit

and I tested grub-install again:
# cat /etc/grub.conf
setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
quit
# /usr/sbin/grub-install
GNU GRUB version 0.97 (640K lower / 3072K upper memory)

grub> setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "install --force-lba --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0,0) /boot/grub/stage2 p /boot/grub/menu.lst "... succeeded
Done.
grub> quit
#
After reboot, the machine also booted fine.
I also checked another SLES 11 machine; its /etc/grub.conf has the same syntax as my original configuration:

setup --stage2=/boot/grub/stage2 --force-lba (hd0,1) (hd0,1)
quit
comment:13 by , 12 years ago
My test didn't work because device IDs are used by SLES 11. Using device names (/dev/sda for example) in /etc/fstab, /boot/grub/device.map and /boot/grub/menu.lst, as described in the p2v document, should solve the grub install issue.
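As a hedged illustration of that device-name change (the by-id paths below are invented examples, not taken from this ticket's logs), the edits would look like:

```
# /boot/grub/device.map, before (by-id path is an invented example):
(hd0)  /dev/disk/by-id/cciss-3600508b1xxxxxxxx
# after, using the plain device name:
(hd0)  /dev/cciss/c0d0

# /etc/fstab, before:
/dev/disk/by-id/cciss-3600508b1xxxxxxxx-part1  /boot  ext2  defaults  1 2
# after:
/dev/cciss/c0d0p1                              /boot  ext2  defaults  1 2
```

The same substitution would be applied to the root/kernel device references in /boot/grub/menu.lst.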
So the solution for SLES 11 is the following.
In grub-MR (the mondorestore script), modify the "grub-install.unsupported" test sections (lines 109 and 115) to call:
/usr/sbin/grub-install
instead of:
/usr/sbin/grub-install.unsupported $1
And, because grub-install uses them, add these to the mondorescue backup (in /etc/mindi/deplist.d/minimal.conf):

# SLES 11 and grub
/etc/grub.conf
/usr/sbin/grub
/sbin/yast2
/lib/libncurses.so.5
/lib/libc.so.6
/lib/libdl.so.2
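The proposed grub-MR change can be paraphrased as the shell sketch below; grub_install_cmd is an invented helper name for illustration, not the actual mondorescue code. The idea is that the presence of /usr/sbin/grub-install.unsupported marks the new SLES interface, where grub-install takes no argument and drives grub with /etc/grub.conf.

```shell
#!/bin/sh
# Paraphrase of the proposed grub-MR change; grub_install_cmd is an
# invented helper, not the actual mondorescue code.
grub_install_cmd() {
    disk="$1"    # e.g. /dev/cciss/c0d0
    if [ -x /usr/sbin/grub-install.unsupported ]; then
        # New SLES 11 interface: grub-install takes no parameter and
        # drives grub with the content of /etc/grub.conf.
        echo "/usr/sbin/grub-install"
    else
        # Classic interface: pass the target disk.
        echo "/usr/sbin/grub-install $disk"
    fi
}

# Usage inside a chroot of the restored system, for example:
# chroot /mnt/RESTORING $(grub_install_cmd /dev/cciss/c0d0)
```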
comment:14 by , 12 years ago
I restored the mondorescue backup in nuke mode, on the backed-up hard disk, in order to avoid any device ID problem.
The restored machine didn't boot.
I checked why.
Before the backup there was no boot problem; in the following fdisk result we see that the c0d0p1 partition had the bootable flag set:

Device Boot Start End Blocks Id System
/dev/cciss/c0d0p1 * 2048 258047 128000 83 Linux

I checked after the restore: the problem is that the c0d0p1 partition no longer had the bootable flag set.
Solution
I booted again on the mondorestore ISO, in expert mode, and just set the c0d0p1 partition's bootable flag manually:

# fdisk /dev/cciss/c0d0
a
1
w
After that, the restored SLES 11 booted well on c0d0p1.
To be implemented
So for SLES 11, in addition to the modification I recommended in comment:13, we should also:
- during mondoarchive, check which partition has the bootable flag set,
- during the restore, set the bootable flag on the right partition.
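Those two steps could be sketched as below. The use of an "sfdisk -d" dump for detection and of scripted fdisk keystrokes (the same "a", partition number, "w" sequence used manually above) is an assumption for illustration, not mondorescue's actual implementation.

```shell
#!/bin/sh
# Sketch of the two steps above; not mondorescue's actual code.

# Backup side: read an "sfdisk -d <disk>" dump on stdin and print the
# partition carrying the bootable flag.
boot_partition_of() {
    awk '/bootable/ { print $1 }'
}

# Restore side: build the fdisk keystrokes used manually in this ticket
# ("a", partition number, "w").
fdisk_script() {
    printf 'a\n%s\nw\n' "$1"
}

set_boot_flag() {
    disk="$1"; part="$2"    # e.g. /dev/cciss/c0d0  1
    fdisk_script "$part" | fdisk "$disk"
}

# Example (would rewrite the partition table, hence commented out):
# sfdisk -d /dev/cciss/c0d0 | boot_partition_of    # at backup time
# set_boot_flag /dev/cciss/c0d0 1                  # at restore time
```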
comment:15 by , 12 years ago
So it seems, linking this BR and #412, that after openSUSE 11.2 and SLES 11 SP2 grub-install was changed: the previous script was renamed /usr/sbin/grub-install.unsupported and a new one was created with a different interface (no parameter supported).
So rev [3047] now uses the presence of /usr/sbin/grub-install.unsupported as a trigger to call /usr/sbin/grub-install without parameters.
It remains to be seen why the active flag is not recreated correctly in that context.
comment:16 by , 12 years ago
Could someone with a SLES 11 SP2 distro run the following commands as root and report back (ideally on a machine using the CCISS driver, but results without it are also interesting):

mindi --makemountlist /tmp/1
/usr/share/mondo/restore-scripts/mondo/make-me-bootable /tmp/1 dummy

WARNING: the second command calls fdisk to set the bootable flag, so use it with the "dummy" keyword as shown above to avoid this.
If the second command does not report the correct bootable partition, please rerun it with "sh -x" in front of it.
comment:17 by , 12 years ago
This is the result on our SLES 11 SP2:

# fdisk -l | grep c0
Disk /dev/mapper/vgRAC-lvRAC doesn't contain a valid partition table
Disk /dev/mapper/vg2-lv2 doesn't contain a valid partition table
Disk /dev/mapper/vg1-lv1 doesn't contain a valid partition table
Disk /dev/mapper/vg8-lvol8 doesn't contain a valid partition table
Disk /dev/mapper/vg7-lvol7 doesn't contain a valid partition table
Disk /dev/cciss/c0d0: 250.0 GB, 250023444480 bytes
/dev/cciss/c0d0p1 * 2048 540879 269416 83 Linux
/dev/cciss/c0d0p2 540880 488327039 243893080 5 Extended
/dev/cciss/c0d0p5 542928 2767135 1112104 82 Linux swap / Solaris
/dev/cciss/c0d0p6 * 2769184 223493327 110362072 83 Linux
/dev/cciss/c0d0p7 223495376 355914343 66209484 8e Linux LVM
/dev/cciss/c0d0p8 355916392 488327038 66205323+ 8e Linux LVM
Disk /dev/cciss/c0d1: 250.0 GB, 250023444480 bytes
/dev/cciss/c0d1p1 * 2048 540879 269416 83 Linux
/dev/cciss/c0d1p2 540880 488327039 243893080 5 Extended
/dev/cciss/c0d1p5 542928 2767135 1112104 82 Linux swap / Solaris
/dev/cciss/c0d1p6 2769184 223493327 110362072 83 Linux
/dev/cciss/c0d1p7 223495376 355914343 66209484 8e Linux LVM
/dev/cciss/c0d1p8 355916392 488327038 66205323+ 8e Linux LVM
Disk /dev/cciss/c0d2: 250.0 GB, 250023444480 bytes
/dev/cciss/c0d2p1 2048 258047 128000 83 Linux
/dev/cciss/c0d2p2 * 258048 20738047 10240000 83 Linux
/dev/cciss/c0d2p3 20738048 22835199 1048576 82 Linux swap / Solaris
/dev/cciss/c0d2p4 22835200 488327039 232745920 5 Extended
/dev/cciss/c0d2p5 22835232 23043839 104304 83 Linux
/dev/cciss/c0d2p6 23045888 25143039 1048576 8e Linux LVM
/dev/cciss/c0d2p7 25145088 25349887 102400 82 Linux swap / Solaris
/dev/cciss/c0d2p8 25351936 46323455 10485760 8e Linux LVM
/dev/cciss/c0d2p9 46325504 67297023 10485760 8e Linux LVM
/dev/cciss/c0d2p10 67299072 109242111 20971520 8e Linux LVM
The c0d0p1 is our boot partition.
# mindi --makemountlist /tmp/1
Your mountlist will look like this:
Analyzing LVM...
DEVICE MOUNTPOINT FORMAT SIZE (MB) LABEL/UUID
/dev/cciss/c0d2p10 lvm lvm 20480
/dev/cciss/c0d2p8 lvm lvm 10240
/dev/cciss/c0d2p6 lvm lvm 1024
/dev/cciss/c0d2p9 lvm lvm 10240
/dev/cciss/c0d1p8 lvm lvm 64653
/dev/cciss/c0d1p7 lvm lvm 64657
/dev/cciss/c0d0p8 lvm lvm 64653
/dev/cciss/c0d0p7 lvm lvm 64657
/dev/cciss/c0d0p5 swap swap 1086
/dev/cciss/c0d0p6 / ext3 107775
/dev/cciss/c0d0p1 /boot ext2 263
/dev/dm-4 /data/lv7 ext3 30720
/dev/dm-3 /data/lv8 reiserfs 30720
# /usr/share/mondo/restore-scripts/mondo/make-me-bootable /tmp/1 dummy
1
# cat /tmp/1
/dev/cciss/c0d2p10 lvm lvm 20971520
/dev/cciss/c0d2p8 lvm lvm 10485760
/dev/cciss/c0d2p6 lvm lvm 1048576
/dev/cciss/c0d2p9 lvm lvm 10485760
/dev/cciss/c0d1p8 lvm lvm 66205323
/dev/cciss/c0d1p7 lvm lvm 66209484
/dev/cciss/c0d0p8 lvm lvm 66205323
/dev/cciss/c0d0p7 lvm lvm 66209484
/dev/cciss/c0d0p5 swap swap 1112100
/dev/cciss/c0d0p6 / ext3 110362072
/dev/cciss/c0d0p1 /boot ext2 269416
/dev/dm-4 /data/lv7 ext3 31457280
/dev/dm-3 /data/lv8 reiserfs 31457280
comment:18 by , 12 years ago
About the active flag (bootable flag): mondorestore had set it, but on c0d0p6 instead of c0d0p1, so I had to set it manually on c0d0p1 through fdisk to be able to boot. That is why you now see it on both c0d0p1 and c0d0p6 in the list in comment:17. I emailed more information to your professional email, Bruno (mondoarchive.log, mondorestore.log, etc.).
comment:19 by , 12 years ago
I tested again: at mondorestore time, the boot flag is set properly on c0d0p1 if the hard disk is not already formatted with a partition carrying a boot flag.
comment:20 by , 12 years ago
Rev [3057] should provide a correct solution to partition activation when old flags were still there.
comment:21 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
related log extract: