Opened 12 years ago

Closed 11 years ago

#611 closed defect (fixed)

Incorrect restoration of grub on SLES

Reported by: Bruno Cornec
Owned by: Bruno Cornec
Priority: high
Milestone: 3.0.3
Component: mondo
Version: 3.0.1
Severity: major
Keywords:
Cc:

Description

When the disks' MBR has been erased, mondorestore does not correctly restore the GRUB bootloader.

The workaround is to reinstall GRUB manually from a chroot.

/var/log/mondorestore.log will be needed to solve this issue properly.

Attachments (2)

patch for SLES 11 SP2.zip (5.5 KB ) - added by victor gattegno 12 years ago.
grub-MR and minimal.conf modified
manually restored grub.jpg (169.4 KB ) - added by victor gattegno 12 years ago.
grub manual restoration


Change History (23)

comment:1 by Bruno Cornec, 12 years ago

Status: new → assigned

Related log extract:

[Main] mondo-rstr-tools.c->offer_to_make_initrd#1267: Non-interactive mode: no way to give you the keyboard so that you re-generate your initrd. Hope it's OK
                        [Main] libmondo-files.c->find_home_of_exe#363: find_home_of_exe() --- Could not find pico
                        [Main] libmondo-files.c->find_home_of_exe#363: find_home_of_exe() --- Could not find nano
                        [Main] libmondo-files.c->find_home_of_exe#363: find_home_of_exe() --- Could not find e3em
                        [Main] libmondo-files.c->find_home_of_exe#363: find_home_of_exe() --- Could not find e3vi
                        [Main] libmondo-files.c->find_home_of_exe#360: find_home_of_exe () --- Found vi at /bin/vi
running: which grub-MR > //mondo.tmp.wGg6Og/mondo-run-prog-thing.tmp 2> //mondo.tmp.wGg6Og/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
/mondo/grub-MR
--------------------------------end of output------------------------------
...ran just fine. :-)
[Main] mondo-rstr-tools.c->run_grub#1427: Yay! grub-MR found...
[Main] mondo-rstr-tools.c->run_grub#1429: command = grub-MR /dev/cciss/c0d0 /tmp/mountlist.txt
Running GRUB...
        [Main] mondo-rstr-tools.c->run_grub#1508: grub-MR /dev/cciss/c0d0 /tmp/mountlist.txt
Now I'll use grub-install
expr: non-numeric argument
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(hd0)   /dev/cciss/c0d0
grub-install returned 0
running: grub-MR /dev/cciss/c0d0 /tmp/mountlist.txt > //mondo.tmp.wGg6Og/mondo-run-prog-thing.tmp 2> //mondo.tmp.wGg6Og/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
--------------------------------end of output------------------------------
...ran just fine. :-)
Done.
Your boot loader ran OK
[Main] mondorestore.c->nuke_mode#882: Great! Boot loader was installed. No need for msg at end.

comment:2 by Bruno Cornec, 12 years ago

This error:

expr: non-numeric argument

does not come from the grub-MR script, so it probably comes from the SLES grub-install script.

However, that command returns 0, so mondorestore concludes that GRUB was installed correctly, even though the error above suggests it may not have been.

We would need an interactive session at that point so that we could run:

chroot /mnt/RESTORING grub-install /dev/cciss/c0d0
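Because grub-install exits with 0 here despite the expr error, one option is for the caller to inspect stderr as well as the exit status. The following is a minimal sketch, not grub-MR's actual code: the function name run_and_check_stderr is hypothetical, and treating the string "non-numeric argument" as a failure is an assumption based only on the log above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of grub-MR): run a command, keep its exit
# status, but downgrade a 0 status to 1 if stderr contains the
# "non-numeric argument" message seen in mondorestore.log.
run_and_check_stderr() {
    errfile=$(mktemp) || return 2
    "$@" 2>"$errfile"
    status=$?
    # Re-emit the captured stderr so it still reaches the log.
    cat "$errfile" >&2
    if [ "$status" -eq 0 ] && grep -q 'non-numeric argument' "$errfile"; then
        status=1
    fi
    rm -f "$errfile"
    return "$status"
}
```

For example, `run_and_check_stderr chroot /mnt/RESTORING grub-install /dev/cciss/c0d0` would be expected to return 1 for a run like the one logged in comment:1, since expr writes its error message to stderr.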

comment:3 by victor gattegno, 12 years ago

In mondorestore.log we get:

Now I'll use grub-install
expr: non-numeric argument

Instead, mondorestore.log should show the following, echoed by the grub-MR shell script:

Now I'll use grub-install
grub-install returned 0       (0 if OK, or 1 if there is a problem)

So the grub-install shell script clearly fails.

Checking the SLES 10 grub-install shell script, I see that it begins with:

#! /bin/sh

Also, when I ran some tests for ticket #600, I got some "expr: non-numeric argument" errors too.

So I think this problem (the grub-install shell script failing) could be related to the mondorescue /bin/sh problem I reported in ticket #600.

That problem is that, when the mondorescue ISO boots, /bin/sh is linked to the limited BusyBox sh instead of to /bin/bash (as it is on SLES and RHEL).

So I think the solution is to modify mondorescue so that, at boot, /bin/sh is a symlink to /bin/bash rather than to BusyBox sh.

Then we could test again and check whether the problem still occurs.

comment:4 by Bruno Cornec, 12 years ago

This is likely not the same issue: grub-install is launched in a chroot under /mnt/RESTORING, where bash is the default standard shell, not BusyBox sh.
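One way to settle which shell /bin/sh really is in each environment is to look at the symlink directly. A minimal sketch; the function name sh_link_target is hypothetical, and it assumes /bin/sh is a symlink in both the rescue ramdisk and the restored system.

```shell
#!/bin/sh
# Hypothetical check: print what ROOT/bin/sh points to, so one can compare
# the rescue environment ("/") with the restored system (/mnt/RESTORING).
# Prints "not a symlink" if ROOT/bin/sh is a regular file.
sh_link_target() {
    root=${1:-/}
    if [ -L "$root/bin/sh" ]; then
        readlink "$root/bin/sh"
    else
        echo "not a symlink"
    fi
}
```

Running `sh_link_target /` on the booted rescue media and `sh_link_target /mnt/RESTORING` would show whether the two environments differ.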

comment:6 by Bruno Cornec, 12 years ago

Milestone: 3.0.2 → 3.0.3

comment:7 by victor gattegno, 12 years ago

The bug is not fixed in SLES; SLES 11 uses the following workaround.

When called without arguments, the grub-install shell script runs:

test -x /usr/sbin/grub && \
  grep -q quit /etc/grub.conf 2>/dev/null && \
  grub --batch < /etc/grub.conf && exit 0
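The chain above only reaches `grub --batch` if /usr/sbin/grub is executable and /etc/grub.conf contains "quit". As a diagnostic aid, the sketch below checks those two preconditions under a chosen root and reports which one fails; it does not run grub itself, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical diagnostic mirroring the first two tests chained in the
# SLES 11 grub-install snippet; ROOT defaults to "/" (empty prefix).
sles_grub_batch_check() {
    root=${1:-}
    if [ ! -x "$root/usr/sbin/grub" ]; then
        echo "not executable or missing: $root/usr/sbin/grub"
        return 1
    fi
    # Same loose match as the original: "quit" anywhere in the file.
    if ! grep -q quit "$root/etc/grub.conf" 2>/dev/null; then
        echo "no 'quit' found in $root/etc/grub.conf (or file missing)"
        return 1
    fi
    echo "ok: preconditions met for: grub --batch < /etc/grub.conf"
}
```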

So a possible mondo solution for SLES 11 would be:

In grub-MR (a mondorestore script), modify the "grub-install.unsupported" test sections (lines 109 and 115) to call:

/usr/sbin/grub-install

instead of:

/usr/sbin/grub-install.unsupported $1

And, because grub-install uses them, add the following to the mondorescue backup (in /etc/mindi/deplist.d/minimal.conf):

    # SLES 11 and grub
    /etc/grub.conf
    /usr/sbin/grub
    /sbin/yast2
    /lib/libncurses.so.5
    /lib/libc.so.6
    /lib/libdl.so.2
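Before adding these entries, it can help to confirm that they exist on the system being backed up, since library locations such as /lib/libncurses.so.5 can differ between releases and architectures. A minimal sketch; the function name and the optional root prefix are hypothetical, and it assumes one path per line with "#" comments, as in the fragment above.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a deplist fragment: print every listed
# path that does not exist. Leading whitespace is stripped; "#" comments
# and blank lines are skipped. ROOT is an optional path prefix.
missing_deplist_entries() {
    list=$1
    root=${2:-}
    sed -e 's/^[[:space:]]*//' -e '/^#/d' -e '/^$/d' "$list" |
    while IFS= read -r path; do
        [ -e "$root$path" ] || echo "$path"
    done
}
```

For example, `missing_deplist_entries /etc/mindi/deplist.d/minimal.conf` prints nothing when every listed file is present.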

by victor gattegno, 12 years ago

Attachment: patch for SLES 11 SP2.zip added

grub-MR and minimal.conf modified

comment:8 by victor gattegno, 12 years ago

I tested modifying grub-MR to call the SLES 11 /usr/sbin/grub-install without arguments (so that it runs the grub command with the contents of /etc/grub.conf).

It failed to configure some parts of GRUB correctly: after mondorestore, the machine did not boot from the hard disk, and not even the GRUB prompt appeared on the screen.

comment:9 by victor gattegno, 12 years ago

Here is what was stored in mondorestore.log when using the SLES 11 /usr/sbin/grub-install without arguments (so that it runs the grub command with the contents of /etc/grub.conf):

    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

grub> setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "install --force-lba --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0,0) /boot/grub/stage2 p /boot/grub/menu.lst "... succeeded
Done.
grub> quit

grub-install in chroot returned 0

But at reboot the machine did not boot from the hard disk; not even the GRUB prompt appeared on the screen.

comment:10 by victor gattegno, 12 years ago

That solution did not configure the whole GRUB boot chain: after "# mondorestore -Z nuke", the machine did not boot from the hard disk and switched directly to the next boot device (a live CD).

I also tried "# mondorestore -Z mbr", but the result in mondorestore.log was the same and the problem remained.

I manually copied the MBR saved by mondorescue (BOOTLOADER.MBR) onto the machine's MBR with dd. After that, the reboot stopped at the "GRUB _" prompt (flashing cursor) and did not boot the OS. It is not the "grub>" prompt: I cannot type anything or escape from it.
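The exact dd command used is not given. If the saved image is a full 512-byte MBR, writing all of it also rewrites the partition table and its boot flags as they were at backup time. A hedged sketch that copies only the first 446 bytes (the boot code area of a classic MBR) is shown below; the function name is hypothetical, it should be tried on an image file before any real disk, and it does not by itself explain or fix the "GRUB _" hang.

```shell
#!/bin/sh
# Hypothetical helper: copy only the 446-byte boot code from a saved MBR
# image onto a disk or disk image, leaving the partition table
# (bytes 446-509) and the 0x55AA signature (bytes 510-511) untouched.
restore_mbr_bootcode() {
    src=$1    # saved MBR image, e.g. BOOTLOADER.MBR
    dest=$2   # target disk (e.g. /dev/cciss/c0d0) or an image file
    dd if="$src" of="$dest" bs=446 count=1 conv=notrunc 2>/dev/null
}
```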

Finally, I booted from a SLES install CD and repaired GRUB manually; this time everything succeeded (even "embed /boot/grub/e2fs_stage1_5 (hd0)"... 17 sectors are embedded.).

Next time I'll try: # /sbin/yast2 bootloader

by victor gattegno, 12 years ago

Attachment: manually restored grub.jpg added

grub manual restoration

comment:11 by victor gattegno, 12 years ago

In my SLES 11 /etc/grub.conf file I see:

setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
quit

I thought the GRUB boot problem could come from there, so I manually changed it to:

setup --stage2=/boot/grub/stage2 --force-lba (hd0) (hd0,0)
quit

I tested it:

# bash -x /usr/sbin/grub-install
+ '[' 0 -gt 0 ']'
+ test -x /usr/sbin/grub
+ grep -q quit /etc/grub.conf
+ grub --batch < /etc/grub.conf


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename. ]
grub> setup --stage2=/boot/grub/stage2 --force-lba (hd0) (hd0,0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  17 sectors are embedded.
succeeded
 Running "install --force-lba --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0) (hd0)1+17 p (hd0,0)/boot/grub/stage2 /boot/grub/menu.lst"... succeeded
Done.
grub> quit
+ exit 0

The reboot after this grub-install was OK, which confirms the origin of the problem: the /etc/grub.conf on my server was misconfigured.
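In GRUB legacy, `setup` takes the install device first and the image device (where the stage files live) second, so the edit in this comment moves stage1 from the boot sector of (hd0,0) to the MBR of (hd0). Installing into a partition boot sector typically relies on the MBR code chain-loading the partition marked active, which may relate to the bootable-flag findings later in this ticket. A sketch of the same edit as a filter; the function name is hypothetical, it prints the result instead of modifying the file, and it only rewrites setup lines where both arguments are partitions.

```shell
#!/bin/sh
# Hypothetical filter: turn "setup ... (hdX,Y) (hdX,Y)" into
# "setup ... (hdX) (hdX,Y)" so stage1 is installed to the disk's MBR
# while the stage files are still read from partition (hdX,Y).
grub_conf_setup_to_mbr() {
    sed 's/^\(setup .*\) (hd\([0-9]*\),[0-9]*) (hd\([0-9]*\),\([0-9]*\))$/\1 (hd\2) (hd\3,\4)/' "$1"
}
```

For example, `grub_conf_setup_to_mbr /etc/grub.conf` on the original file shown above prints the modified version used in this test.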

comment:12 by victor gattegno, 12 years ago

To check further, I restored my server's /etc/grub.conf to its original configuration:

setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
quit

and tested grub-install again:

# cat /etc/grub.conf                  
setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
quit

# /usr/sbin/grub-install

    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

grub> setup --stage2=/boot/grub/stage2 --force-lba (hd0,0) (hd0,0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "embed /boot/grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "install --force-lba --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0,0) /boot/grub/stage2 p /boot/grub/menu.lst "... succeeded
Done.
grub> quit
#

After reboot, the machine also booted fine.

I also checked another SLES 11 machine; its /etc/grub.conf uses the same syntax as my original configuration:

setup --stage2=/boot/grub/stage2 --force-lba (hd0,1) (hd0,1)
quit

comment:13 by victor gattegno, 12 years ago

My test didn't work because SLES 11 uses device IDs. Using device names (for example /dev/sda) in /etc/fstab, /boot/grub/device.map, and /boot/grub/menu.lst, as described in the p2v document, should solve the GRUB install issue.

So the solution for SLES 11 is the following.

In grub-MR (a mondorestore script), modify the "grub-install.unsupported" test sections (lines 109 and 115) to call:

/usr/sbin/grub-install

instead of:

/usr/sbin/grub-install.unsupported $1

And, because grub-install uses them, add the following to the mondorescue backup (in /etc/mindi/deplist.d/minimal.conf):

    # SLES 11 and grub
    /etc/grub.conf
    /usr/sbin/grub
    /sbin/yast2
    /lib/libncurses.so.5
    /lib/libc.so.6
    /lib/libdl.so.2
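To see which of the files named at the start of this comment still refer to device IDs, a quick scan can help. A sketch; the function name is hypothetical, and it assumes the IDs appear as /dev/disk/by-id/ paths (other persistent forms such as UUID= or LABEL= are not detected).

```shell
#!/bin/sh
# Hypothetical check: list lines in fstab, device.map and menu.lst under
# ROOT (default "/") that still reference /dev/disk/by-id/ paths.
# Output format: /path/to/file:LINENO:line
find_by_id_refs() {
    root=${1:-}
    for f in etc/fstab boot/grub/device.map boot/grub/menu.lst; do
        [ -f "$root/$f" ] || continue
        # grep -n adds the line number; sed prefixes the file name.
        grep -n '/dev/disk/by-id/' "$root/$f" | sed "s|^|/$f:|"
    done
}
```

Running `find_by_id_refs /mnt/RESTORING` after a restore would list the lines to switch to device names.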

comment:14 by victor gattegno, 11 years ago

I restored the mondorescue backup in nuke mode onto the same hard disk that had been backed up, to avoid any device ID problem.

The restored machine didn't boot.

I checked why.

Before the backup there was no boot problem; the following fdisk output shows that partition c0d0p1 had the bootable flag set.

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d0p1   *        2048      258047      128000   83  Linux

After the restore, I found the problem: partition c0d0p1 no longer had the bootable flag set.

Solution

I booted again from mondorestore.iso in expert mode and simply set the bootable flag on partition c0d0p1 manually:

# fdisk /dev/cciss/c0d0
a
1
w

After that, the restored SLES 11 booted well on c0d0p1.

To be implemented

So for SLES 11, in addition to the modification I recommended in comment:13, the following should be added:

  1. during mondoarchive, record which partition has the bootable flag set;
  2. during the restore, set the bootable flag on that partition.
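For step 1, the boot indicator of each primary partition is the first byte of its 16-byte entry in the MBR partition table, which starts at byte offset 446; 0x80 means active. A minimal read-only sketch (the function name is hypothetical) that lists active primary entries; flags on logical partitions are stored in extended boot records and are not covered.

```shell
#!/bin/sh
# Hypothetical sketch: print the number (1-4) of each primary partition
# whose boot indicator byte is 0x80. Entry N starts at 446 + 16*(N-1).
# Works on a disk device or an image file; reads only.
active_partitions() {
    disk=$1
    for n in 1 2 3 4; do
        off=$((446 + 16 * (n - 1)))
        flag=$(od -An -tx1 -j "$off" -N1 "$disk" | tr -d ' \n')
        if [ "$flag" = "80" ]; then
            echo "$n"
        fi
    done
    return 0
}
```

On the disk from this comment, `active_partitions /dev/cciss/c0d0` would be expected to print 1 before the backup.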

comment:15 by Bruno Cornec, 11 years ago

Linking this bug report with #412, it seems that grub-install changed after openSUSE 11.2 and SLES 11 SP2: the previous script was renamed /usr/sbin/grub-install.unsupported, and a new one was created with a different interface (it takes no parameters).

So rev [3047] now uses the presence of /usr/sbin/grub-install.unsupported as the trigger to call /usr/sbin/grub-install without parameters.

It remains to be seen why the active flag is not recreated correctly in that context.
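The trigger described for rev [3047] can be sketched as a small decision function. This is an illustration, not the committed grub-MR code: the function name and ROOT prefix are hypothetical, and the fallback command shown for older systems (grub-install with the device argument) is an assumption about what grub-MR runs otherwise.

```shell
#!/bin/sh
# Hypothetical sketch: print the grub-install command to run, depending on
# whether the renamed legacy script exists under ROOT.
choose_grub_install_cmd() {
    root=${1:-}
    device=$2
    if [ -e "$root/usr/sbin/grub-install.unsupported" ]; then
        # New-style grub-install (openSUSE 11.2+/SLES 11 SP2): no parameters.
        echo "/usr/sbin/grub-install"
    else
        # Assumed classic call taking the boot device as its argument.
        echo "grub-install $device"
    fi
}
```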

comment:16 by Bruno Cornec, 11 years ago

Could someone with a SLES 11 SP2 system run the following commands as root and report back (ideally on a machine using the CCISS driver, but results without it are also interesting):

mindi --makemountlist /tmp/1
/usr/share/mondo/restore-scripts/mondo/make-me-bootable /tmp/1 dummy

WARNING: the second command calls fdisk to set the bootable flag, so run it with the "dummy" keyword as shown above to avoid this.

If the second command does not report the correct bootable partition, please run it again prefixed with sh -x.

Last edited 11 years ago by Bruno Cornec

comment:17 by victor gattegno, 11 years ago

Here is the result on our SLES 11 SP2 system:

# fdisk -l | grep c0
Disk /dev/mapper/vgRAC-lvRAC doesn't contain a valid partition table
Disk /dev/mapper/vg2-lv2 doesn't contain a valid partition table
Disk /dev/mapper/vg1-lv1 doesn't contain a valid partition table
Disk /dev/mapper/vg8-lvol8 doesn't contain a valid partition table
Disk /dev/mapper/vg7-lvol7 doesn't contain a valid partition table
Disk /dev/cciss/c0d0: 250.0 GB, 250023444480 bytes
/dev/cciss/c0d0p1   *        2048      540879      269416   83  Linux
/dev/cciss/c0d0p2          540880   488327039   243893080    5  Extended
/dev/cciss/c0d0p5          542928     2767135     1112104   82  Linux swap / Solaris
/dev/cciss/c0d0p6   *     2769184   223493327   110362072   83  Linux
/dev/cciss/c0d0p7       223495376   355914343    66209484   8e  Linux LVM
/dev/cciss/c0d0p8       355916392   488327038    66205323+  8e  Linux LVM
Disk /dev/cciss/c0d1: 250.0 GB, 250023444480 bytes
/dev/cciss/c0d1p1   *        2048      540879      269416   83  Linux
/dev/cciss/c0d1p2          540880   488327039   243893080    5  Extended
/dev/cciss/c0d1p5          542928     2767135     1112104   82  Linux swap / Solaris
/dev/cciss/c0d1p6         2769184   223493327   110362072   83  Linux
/dev/cciss/c0d1p7       223495376   355914343    66209484   8e  Linux LVM
/dev/cciss/c0d1p8       355916392   488327038    66205323+  8e  Linux LVM
Disk /dev/cciss/c0d2: 250.0 GB, 250023444480 bytes
/dev/cciss/c0d2p1            2048      258047      128000   83  Linux
/dev/cciss/c0d2p2   *      258048    20738047    10240000   83  Linux
/dev/cciss/c0d2p3        20738048    22835199     1048576   82  Linux swap / Solaris
/dev/cciss/c0d2p4        22835200   488327039   232745920    5  Extended
/dev/cciss/c0d2p5        22835232    23043839      104304   83  Linux
/dev/cciss/c0d2p6        23045888    25143039     1048576   8e  Linux LVM
/dev/cciss/c0d2p7        25145088    25349887      102400   82  Linux swap / Solaris
/dev/cciss/c0d2p8        25351936    46323455    10485760   8e  Linux LVM
/dev/cciss/c0d2p9        46325504    67297023    10485760   8e  Linux LVM
/dev/cciss/c0d2p10       67299072   109242111    20971520   8e  Linux LVM

c0d0p1 is our boot partition.

# mindi --makemountlist /tmp/1
Your mountlist will look like this:
Analyzing LVM...
        DEVICE          MOUNTPOINT      FORMAT          SIZE (MB)     LABEL/UUID     
        /dev/cciss/c0d2p10 lvm             lvm               20480                
        /dev/cciss/c0d2p8 lvm             lvm               10240                
        /dev/cciss/c0d2p6 lvm             lvm                1024                
        /dev/cciss/c0d2p9 lvm             lvm               10240                
        /dev/cciss/c0d1p8 lvm             lvm               64653                
        /dev/cciss/c0d1p7 lvm             lvm               64657                
        /dev/cciss/c0d0p8 lvm             lvm               64653                
        /dev/cciss/c0d0p7 lvm             lvm               64657                
        /dev/cciss/c0d0p5 swap            swap               1086                
        /dev/cciss/c0d0p6 /               ext3             107775                
        /dev/cciss/c0d0p1 /boot           ext2                263                
        /dev/dm-4       /data/lv7       ext3              30720                
        /dev/dm-3       /data/lv8       reiserfs          30720   

# /usr/share/mondo/restore-scripts/mondo/make-me-bootable /tmp/1 dummy
1
# cat /tmp/1
/dev/cciss/c0d2p10 lvm lvm 20971520 
/dev/cciss/c0d2p8 lvm lvm 10485760 
/dev/cciss/c0d2p6 lvm lvm 1048576 
/dev/cciss/c0d2p9 lvm lvm 10485760 
/dev/cciss/c0d1p8 lvm lvm 66205323 
/dev/cciss/c0d1p7 lvm lvm 66209484 
/dev/cciss/c0d0p8 lvm lvm 66205323 
/dev/cciss/c0d0p7 lvm lvm 66209484 
/dev/cciss/c0d0p5 swap swap 1112100 
/dev/cciss/c0d0p6 / ext3 110362072 
/dev/cciss/c0d0p1 /boot ext2 269416 
/dev/dm-4 /data/lv7 ext3 31457280 
/dev/dm-3 /data/lv8 reiserfs 31457280

comment:18 by victor gattegno, 11 years ago

About the active (bootable) flag: mondorestore had set it, but on c0d0p6 instead of c0d0p1, so I had to set it manually on c0d0p1 with fdisk to be able to boot. That is why it now appears on both c0d0p1 and c0d0p6 in the listing in comment:17. I emailed more information (mondoarchive.log, mondorestore.log, etc.) to your work address, Bruno.

comment:19 by victor gattegno, 11 years ago

I tested again: at restore time, mondorestore sets the boot flag correctly on c0d0p1 if the hard disk does not already contain a partition with the boot flag set.

comment:20 by Bruno Cornec, 11 years ago

Rev [3057] should correctly handle partition activation when old boot flags are still present.
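The changeset itself is not shown here. As an illustration of the idea only, the sketch below makes one primary partition the sole active entry by writing 0x80 to its boot indicator and clearing any stale flag on the other three. It is not the code of rev [3057], the function name is hypothetical, and it ignores logical partitions, so a stale flag on c0d0p6 (as in comment:17) would not be cleared. Try it on an image file before pointing it at a real disk.

```shell
#!/bin/sh
# Hypothetical sketch: make primary partition N (1-4) the only active one.
# Boot indicator of entry N is at byte 446 + 16*(N-1); 0x80 = active.
set_only_active() {
    disk=$1
    want=$2
    case $want in
        1|2|3|4) ;;
        *) echo "partition must be 1-4" >&2; return 1 ;;
    esac
    for n in 1 2 3 4; do
        off=$((446 + 16 * (n - 1)))
        if [ "$n" = "$want" ]; then
            printf '\200' | dd of="$disk" bs=1 seek="$off" conv=notrunc 2>/dev/null
        else
            # Clear any stale flag by writing a single zero byte.
            dd if=/dev/zero of="$disk" bs=1 count=1 seek="$off" conv=notrunc 2>/dev/null
        fi || return 1
    done
}
```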

comment:21 by Bruno Cornec, 11 years ago

Resolution: fixed
Status: assigned → closed