Opened 12 years ago

Closed 11 years ago

#172 closed defect (fixed)

kernel panic during mindi process when using mondo rescue on x86_64 HP Proliant dl360g4 (RHEL4.5)

Reported by: cdmaestas Owned by: bruno
Priority: high Milestone: 2.2.5
Component: mondo Version: 2.2.3
Severity: critical Keywords:
Cc: simo.syvajarvi@…, jgatenc@…, cdmaest@…, rdscott@…, christopher.maestas@…, jatencio@…, rdscott@…, christmasboy_81@…, shane_chartrand@…, alexrixhardson@…

Description

When booting the Mondo Rescue ISO image we get:

RAX: ffffffff803f0520 RBX: 0000000000000000 RCX: 000000000000003f
RDX: 000001007b8bfef8 RSI: 000001007ba517c8 RDI: 000001007fbf9180
RBP: 000001007b8bfef8 R08: 0000000000000000 R09: 0000000000631e98
R10: 000000000000000c R11: 0000000000000000 R12: 0000007fbfffff47
R13: 0000000000000000 R14: 00000000006376b8 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff804ed700(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process find-and-mount- (pid: 1034, threadinfo 000001007b8be000, task 000001007bb70030)
Stack: ffffffff801823d7 000001007ba517c8 000001007fbf9180 0000000000000202
       ffffffff8012442d 0000000100000001 ffffffff00000000 0000000000000007
       0000000b0000000e 000001007b8bff18
Call Trace:<ffffffff801823d7>{vfs_stat64+47} <ffffffff8012442d>{do_page_fault+577}
       <ffffffff8013bdff>{do_wait+3350} <ffffffff80182717>{sys_newstat+17}
       <ffffffff801342a4>{default_wake_function+0} <ffffffff80110d91>{error_exit+0}
       <ffffffff8011026a>{system_call+126}

Code:  Bad RIP value.
RIP [<0000000000000000>] RSP <000001007b8bfe50>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Oops

All the firmware is up to date. We've been able to use Mondo Rescue on a dl360g3 (32-bit platform, RHEL 4.4) without issues.

Attachments (1)

mondoarchive.log.gz (22.4 KB) - added by jatencio 12 years ago.
mondo archive log file

Change History (40)

comment:1 Changed 12 years ago by bruno

  • Status changed from new to assigned

Could you please provide the /var/log/mondoarchive.log file from the original system? (Or /var/log/mondo-archive.log, depending on the version used.)

Could you try http://trac.mondorescue.org/wiki/FAQ#Q26/Whyismykerneldoingapanicatrestoretimewhenitworksperfectlyatarchivetime

Changed 12 years ago by jatencio

mondo archive log file

comment:2 Changed 12 years ago by jatencio

I modified the following line in /usr/sbin/mindi:

ADDITIONAL_BOOT_PARAMS="apm=off devfs=nomount noresume selinux=0 barrier=off"

However, we still get the same kernel panic when attempting to boot the mondorescue image. I also uploaded the mondoarchive.log to this ticket.
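A quick way to confirm that such an edit is really in place before rebuilding the image is sketched below. This is only an illustration: has_boot_param is a hypothetical helper name, and it assumes the assignment sits on a single double-quoted line, as in the line quoted above.

```shell
# Hypothetical helper: succeed if PARAM appears in the ADDITIONAL_BOOT_PARAMS
# assignment of the given mindi script (default path taken from the comment above).
# Assumption: the assignment is on one line and uses double quotes.
has_boot_param() {
    param="$1"
    script="${2:-/usr/sbin/mindi}"
    # Extract the quoted value of the first matching assignment.
    value=$(sed -n 's/^ADDITIONAL_BOOT_PARAMS="\(.*\)"$/\1/p' "$script" | head -n 1)
    # Pad with spaces so only whole parameters match.
    case " $value " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# has_boot_param barrier=off && echo "barrier=off is set"
```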

comment:3 Changed 12 years ago by cdmaestas

Bruno,

I hope you've been able to get the data we updated.

Thanks, -cdm

comment:4 Changed 12 years ago by jatencio

An update: I was able to install the archive using a 32-bit mondorescue.iso image. When it reaches the point where it attempts to install grub, it fails, and I cannot chroot into the environment because of the 32-bit/64-bit mismatch. However, there is an option to skip the grub install, so I can install it later with a rescue CD. I am not sure whether anything else needs to happen after the grub install for the rescue to boot successfully.

comment:5 Changed 12 years ago by jatencio

I was able to successfully restore an x86_64 image using an i386 mondorescue image. However, the problem with the x86_64 mondorescue image still exists.

comment:6 Changed 12 years ago by jatencio

Even though I was able to successfully restore an image, that image will not boot, and now we get the following kernel panic during boot:

  Booting 'Red Hat Enterprise Linux WS (2.6.9-55.ELsmp)'
kernel direct mapping tables upto 10100000000 @ 8000-d000
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /vmlinuz-2.6.9-55.ELsmp ro root=/dev/VolGroup00/LogVol00 rhgb quiet
   [Linux-bzImage, setup=0x1e00, size=0x19772d]
initrd /initrd-2.6.9-55.ELsmp.img
   [Linux-initrd @ 0x37e45000, 0x1aa725 bytes]

.
Decompressing Linux...done.
Booting the kernel.
Red Hat nash version 4.2.1.10 starting
  Reading all physical volumes.  This may take a while...
  No volume groups found
  Volume group "VolGroup00" not found
ERROR: /bin/lvm exited abnormally! (pid 480)
mount: error 6 mounting ext3
mount: error 2 mounting none
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

comment:7 Changed 12 years ago by bruno

The kernel panic you now have at boot time is due to a problem restoring the LVM environment of your system (no VolGroup found, so the root FS cannot be mounted).

Can you save the /tmp/mondorestore.log file at the end of the restore, so that I can see what happens during the restore?

Of course you're right that you can skip the grub stage and restore it with another rescue CD later, but I guess you tried and succeeded in that.

For the kernel panic in the first case (x86_64), what would help diagnose it is to change, at archiving time, the script /usr/lib64/mindi/rootfs/sbin/find-and-mount-cdrom to add a

set -x

command after the shebang at the beginning. That way we should be able to see at restore time which part of the script is causing the problem (there are tests on various devices; maybe one of them triggers the anomaly).
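A minimal sketch of that change, for anyone who prefers to script it rather than edit the file by hand. The helper name add_trace and the .orig backup suffix are assumptions made for illustration; they are not part of mindi.

```shell
# Hypothetical helper: insert "set -x" right after the shebang line of a script.
# A backup copy is kept next to the original with a .orig suffix.
add_trace() {
    script="$1"
    # Refuse to touch files that do not start with a shebang.
    head -n 1 "$script" | grep -q '^#!' || { echo "no shebang in $script" >&2; return 1; }
    # Do nothing if tracing is already enabled on line 2.
    [ "$(sed -n '2p' "$script")" = "set -x" ] && return 0
    cp -p "$script" "$script.orig" || return 1
    # Rewrite the script in place: line 1, then "set -x", then the rest.
    awk 'NR == 1 { print; print "set -x"; next } { print }' "$script.orig" > "$script"
}

# Example (path taken from the comment above; run before mondoarchive):
# add_trace /usr/lib64/mindi/rootfs/sbin/find-and-mount-cdrom
```

Because the script is overwritten in place rather than replaced, its permissions stay as they were; the .orig copy can be moved back once the trace has been captured.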

Last point: you use 2.2.4, which is NOT officially released yet. But in your case I don't think it would change anything; I even consider it should be better than the current official 2.2.3.

comment:8 Changed 12 years ago by Krisztian_Toth

I'm able to reproduce the problem on a HP BL460.

My environment:

RHAS4 U5 x86_64
Kernel: 2.6.9-55.ELsmp

Mondorescue packages:

afio-2.4.7-1.x86_64.rpm
buffer-1.19-1.x86_64.rpm
mindi-1.2.4-1.rhel4.i586.rpm
mindi-busybox-1.2.2-3.rhel4.i586.rpm
mondo-2.2.4-1.rhel4.i586.rpm
mondo-doc-2.2.4-1.rhel4.noarch.rpm

My partition table:

# fdisk -l /dev/cciss/c0d0

Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes
255 heads, 63 sectors/track, 8920 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d0p1   *           1          14      112423+  83  Linux
/dev/cciss/c0d0p2              15        8920    71537445   8e  Linux LVM


My filesystems:

/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
/dev/cciss/c0d0p1 on /boot type ext3 (rw)

I created my backups on NFS. Here's the command I used:

mondoarchive -O9 -l GRUB -F -p Krisztian_Toth -s 700M -d /my_system -S /mondo_tmp -T /mondo_tmp -E "/mondo_tmp /backup" -N -n 192.168.128.77:/exports

I modified /usr/sbin/mindi because I ran into another problem, so I forced the following modules:

FORCE_MODS="nls_utf8 sr_mod md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac bnx2 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2400 cciss qla2xxx scsi_transport_fc usb_storage uhci_hcd ohci_hcd ehci_hcd sd_mod scsi_mod"

I used ILO2 Virtual Media and NFS to restore my system. I used ILO2 to boot my first mondorescue CD and NFS to obtain the data itself.

So that was my environment.

Afterwards, I tried some configuration changes to find the root of the problem.

A little summary of my actions:

  • I tried to install afio-2.4.7-1.i586.rpm and buffer-1.19-1.i386.rpm instead of afio-2.4.7-1.x86_64.rpm and buffer-1.19-1.x86_64.rpm. I created a new backup. Crash occurred.
  • I tried 2.6.9-55.EL instead of 2.6.9-55.ELsmp, hoping the crash would not occur in a non-SMP environment. I created a new backup. Crash occurred.
  • I tried different kernel versions. I first installed the previous kernel version to check whether it works: 2.6.9-42.0.10.ELsmp, which was the latest Red Hat-provided kernel before 2.6.9-55.ELsmp. I put back afio-2.4.7-1.x86_64.rpm and buffer-1.19-1.x86_64.rpm. The backup and the restore were successful.
  • I added "set -x" to /usr/lib/mindi/rootfs/sbin/find-and-mount-cdrom to see what happened before the crash:
+ ln -sf /dev/scd0 /dev/cdrom
+ [ 0 -ne 0 ]
+ LogIt CD-ROM found at /dev/scd0
+ mount /mnt/cdrom
+ mount: /dev/cdrom is write-protected, mounting read-only
+ [ 0 -ne 0 ]
+ [ ! -d /mnt/cdrom/archive ]
<<<< The crash occurred here >>>>>
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
...

Summary:

  • I was able to reproduce the problem.
  • The crash does not occur if you use the latest mondorescue packages (see my package list above) with the 2.6.9-42.0.10.ELsmp kernel.

Conclusion?

  • More testing is needed; however, there is something strange with 2.6.9-55.ELsmp. Maybe something changed that is not "tolerated" by mindi, or this kernel version introduced a new kernel bug.

Best Regards, Krisztian Toth

comment:9 Changed 12 years ago by Krisztian_Toth

I also ran the above tests on a RHAS 4 U5 i386 environment and can confirm I got the same results, so this is not an x86_64-specific problem. For my results, see my previous post.

comment:10 Changed 12 years ago by jatencio

Hello Bruno,

I have not been successful in my attempt to save /tmp/mondoarchive.log after restoring. I tried using a RHEL 4.5 rescue CD; however, it too was unable to mount the root filesystem after I restored the file system.

Since it appears to be a 2.6.9-55.EL issue, has there been any progress as to what the problem may be?

Thanks,

Jonathan

comment:11 Changed 12 years ago by bruno

The workaround for now is to step back and use the previous kernel version.

I'll try to reproduce it in a pure RH context so that I can open a bug report on RH's bugzilla.

comment:12 Changed 12 years ago by anonymous

  • Milestone 2.2.4 deleted

comment:13 Changed 12 years ago by bruno

  • Milestone set to 2.2.5

comment:14 Changed 12 years ago by sciencewhiz

  • Cc christmasboy_81@… added

Anyone had a chance to try this with 2.6.9-55.0.2?

comment:15 Changed 12 years ago by gavro

I'm sad to say that the same goes for 2.6.9-55.0.2... Is there any other option besides rebooting into an older kernel to make a working backup with mondoarchive? Rebooting a production server is not really appreciated...

comment:16 Changed 12 years ago by gavro

Has anyone already tried it with kernel-2.6.9-55.0.6?

comment:17 Changed 12 years ago by nico

I tried mondorestore with kernel-2.6.9-55.0.6.ELsmp (32bit) on a HP Proliant DL380G5 and ran into the kernel-panic too. :-(

My rpms: buffer-1.19-1 mindi-busybox-1.2.2-3.rhel4 mondo-2.2.4-1.rhel4 mindi-1.2.4-1.rhel4 afio-2.4.7-1

comment:18 Changed 12 years ago by sciencewhiz

Looks like it still happens with 2.6.9-55.0.9.EL.

comment:19 Changed 12 years ago by bruno

  • Cc simo.syvajarvi@… added
  • Component changed from mindi-kernel to mondo

I've just made a full backup and restore of a BL 460 G1 (same family as used by Krisztian) with 2.6.9-55 and mondo-2.2.5 + mindi-1.2.5 (from ftp://ftp.mondorescue.org/rhel/4/) with success.

Could any of you having problems with 2.2.4 try again with that version to see if it's also fixed for you?

My main problem is that I do not have a clear idea of what fixes the issue :-(

comment:20 Changed 12 years ago by nico

Unfortunately the issue is not fixed for me. I used the newest mindi-1.2.5-1.rhel4 to create the boot DVD and made a full backup on the HP Proliant DL380G5, but got the same kernel panic with kernel 2.6.9-55.0.9.ELsmp when booting the DVD.

comment:21 Changed 12 years ago by ylihemmo

Not working here either on ProLiant BL685c (rhel 4.5)

Tried it with:

mondo-2.2.5-1.rhel4 mindi-1.2.5-1.rhel4

and ADDITIONAL_BOOT_PARAMS="apm=off devfs=nomount noresume selinux=0 barrier=off"

comment:22 Changed 12 years ago by nico

Tried once more on DL380G5 and DL580G3, RHEL 4.5. Not working on either machine. Tried ylihemmo's ADDITIONAL_BOOT_PARAMS without success.

comment:23 Changed 11 years ago by bruno

So the bug is clearly in find-and-mount-cdrom when unmounting the CD. When I edit the script and add a call to sh before the umount, everything is fine. Typing umount /dev/hda in another shell doesn't trigger it. A sequence of 20 mount/umount doesn't either, but in the shell it's triggered every time. This still happens with 2.2.5 as of 2007-10-31, and selinux=0 doesn't change anything either.

Still searching.
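For anyone who wants to repeat the mount/umount sequence mentioned above (which was reported not to trigger the panic on its own), a rough sketch follows. The function name stress_mount and the DRY_RUN variable are hypothetical; setting DRY_RUN=echo only prints the commands instead of running them.

```shell
# Hypothetical helper: mount and unmount a device COUNT times in a row.
# DRY_RUN=echo prints the commands instead of executing them.
stress_mount() {
    device="$1"
    mountpoint="$2"
    count="${3:-20}"
    i=1
    while [ "$i" -le "$count" ]; do
        ${DRY_RUN:-} mount "$device" "$mountpoint" || return 1
        ${DRY_RUN:-} umount "$mountpoint" || return 1
        i=$((i + 1))
    done
}

# Example (needs root on a real system):
# stress_mount /dev/hda /mnt/cdrom 20
```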

comment:24 Changed 11 years ago by bruno

Of course (in case it wasn't clear earlier), you may use 2.2.4 with the -k option, e.g. -k /boot/vmlinuz-2.6.9-34.ELsm, so that only mondo uses the old kernel during the restore. The kernel used at run time and after the restore remains -55*.

Hope this helps as a workaround
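To find a candidate image for the -k option, something like the following could be used. It is only a sketch: list_alt_kernels is a hypothetical helper, and it assumes kernel images are named vmlinuz-<release> in the boot directory.

```shell
# Hypothetical helper: list kernel images in a boot directory other than the
# one matching the given release (defaults to the running kernel).
list_alt_kernels() {
    bootdir="${1:-/boot}"
    running="${2:-$(uname -r)}"
    for img in "$bootdir"/vmlinuz-*; do
        [ -e "$img" ] || continue          # glob did not match anything
        [ "$img" = "$bootdir/vmlinuz-$running" ] && continue
        echo "$img"
    done
}

# Example:
# list_alt_kernels /boot
```

Any image printed this way can then be passed to mondoarchive with -k, as described above.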

comment:25 Changed 11 years ago by triumvir

Try to mount the /mnt/cdrom in the "find-and-mount-cdrom" function at line 43 not with "mount /mnt/cdrom" but with "mount $device -t iso9660 -o ro /mnt/cdrom 2> /tmp/mount.log" like the test if it is a cdrom.
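The suggestion above can be sketched as a small function. The difference from a plain "mount /mnt/cdrom" is that the device, filesystem type and options are given explicitly. The function name mount_cdrom_explicit and the MOUNT_CMD override (for a dry run with "echo mount") are assumptions for illustration only, not the actual find-and-mount-cdrom code.

```shell
# Hypothetical helper: mount a CD device with explicit type and options,
# as in the command quoted above. Errors go to a log file.
# MOUNT_CMD="echo mount" prints the command instead of running it.
mount_cdrom_explicit() {
    device="$1"
    mountpoint="${2:-/mnt/cdrom}"
    logfile="${3:-/tmp/mount.log}"
    ${MOUNT_CMD:-mount} "$device" -t iso9660 -o ro "$mountpoint" 2> "$logfile"
}

# Example (needs root on a real system):
# mount_cdrom_explicit /dev/scd0
```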

comment:26 Changed 11 years ago by bruno

Fixed with latest 2.2.5 + mindi 2.0.0 + mindi-busybox 1.7.3. Please check on your side.

comment:27 Changed 11 years ago by julien

I'm sorry, but that problem is not fixed with latest 2.2.5 + mindi 2.0.0 + mindi-busybox 1.7.3.

I'm using :

[root@TSM... ~]# rpm -qa | grep mindi
mindi-busybox-1.7.3-1.rhel4
mindi-2.0.0-1.rhel4
[root@TSM.. ~]# rpm -qa | grep mondo
mondo-2.2.5-1.rhel4
[root@TSM... ~]# arch
x86_64

My server: HP PROLIANT BL465c G1... and the same problem prevents me from restoring backups from the ISO CD...

Code: Bad RIP value.
RIP [<0000000000000000>] RSP <000001012bac9e50>
CR2: 0000000000000000

<0>Kernel panic - not syncing: Oop

comment:28 Changed 11 years ago by sciencewhiz

Still having the problem with 2.6.9-55.0.12.EL and the latest 2.2.5 + mindi 2.0.0 + mindi-busybox 1.7.3 from 12/14.

comment:29 Changed 11 years ago by siffland

I have been getting the same error here at work using mondo 2.2.4 and now 2.2.5. I am using RHEL 4 update 5 with kernel vmlinuz-2.6.9-55.0.2.ELsmp. It also appears to be find-and-mount causing the error. Like others on here, I am eager to get this resolved so I can use Mondo. I have some ProLiant BL460c G1 blades that are not going into production for about 3 weeks, and I am willing to do a bit of testing and send logs if you think this will aid in resolving this problem.

Sean

comment:30 Changed 11 years ago by alan.walker

I am also seeing a kernel panic when booting from a Mondorescue restore CD using Centos (RHEL)4.5 kernel 2.6.9-55.0.12.ELsmp on a HP DL380G5, but when using Mondoarchive with the -k option and specifying a 2.6.9-42.ELsmp kernel image it seems to work OK.

I am using the following packages: afio-2.4.7-1.i586.rpm buffer-1.19-1.i386.rpm mindi-1.2.0-2.rhel4.i586.rpm mindi-busybox-1.2.2-2.rhel4.i586.rpm mondo-2.2.0-2.rhel4.i586.rpm mondo-doc-2.2.0-2.rhel4.noarch.rpm

(I had tried using more recent versions but then I could not even do a successful backup, as it is I find that I have to eject the tape during a restore while it is searching for the file lists to make it continue, otherwise it just seems to hang there)

I have a screen photo of the panic message, if it helps. Thanks for the -k suggestion above, Alan.

comment:31 Changed 11 years ago by adrianmarsh

Did this get progressed? I've been hanging onto Centos 2.6.9-42.0.10.EL. I just tried it in a VM using 2.6.9-67.0.1.EL.plus.c4 and saw the same kernel panic using these RPMs:

mindi-1.2.4-1.rh9 mindi-busybox-1.2.2-3.rh9 mondo-2.2.4-1.rh9

comment:32 Changed 11 years ago by shanec

  • Cc shane_chartrand@… added

comment:33 Changed 11 years ago by shanec

I am having the same issue with the 2.6.9-67.ELsmp Redhat es4.

I have the following installed:

mindi-1.2.4-1.rhel4.i586.rpm mindi-busybox-1.2.2-3.rhel4.i586.rpm mondo-2.2.4-1.rhel4.i586.rpm

Can anyone point me to instructions on installing and configuring the 2.2.4 linux kernel for use with the -k option?

comment:34 Changed 11 years ago by alexrixhardson

The same problem also persists in the newest RHEL4 kernel: 2.6.9-67.0.4.ELsmp.

Does anyone know if there is some other fairly new vmlinuz available that I could use in order to avoid this problem? (It should, however, support the latest RAID controllers.)

comment:35 Changed 11 years ago by alexrixhardson

  • Cc alexrixhardson@… added

comment:36 Changed 11 years ago by bruno

I've updated the test version of 2.2.5 to fix the call to mount that was creating that issue. Could any of you having that issue test and report back please ?

Cf: ftp://ftp.mondorescue.org/rhel/4/test

comment:37 Changed 11 years ago by amaura

Hello,

I am running RHEL 4.6 x86_64 (kernel 2.6.9-67.0.4) on a Dell PE 2950 and I had the kernel panic problem with version 2.2.4. And your test version of 2.2.5 solved the problem. It also solved the problem on a virtual machine running RHEL 4.5 32 bits (kernel 2.6.9-55).

However, although everything seems to be OK on the VM, on kernel 2.6.9-67.0.4 x86_64 I have a problem mounting ext3. The live boot CD that was created doesn't support the ext3 file system (ext2 is OK, but I do not want to mount ext2), and therefore I get an error at the mountlist step :(. Any idea?

comment:38 Changed 11 years ago by bruno

Thanks for your report.

Concerning your other issue, I can't do much without logs. I guess an additional module may be needed or something like that. However, I'd prefer that you open another ticket to follow that issue.

Do any of the other people having that issue want to report their feedback ?

comment:39 Changed 11 years ago by bruno

  • Resolution set to fixed
  • Status changed from assigned to closed

Should be fixed with official 2.2.5. Reopen if you still experience an issue with that.
