Opened 8 years ago

Closed 6 years ago

Last modified 6 years ago

#358 closed defect (fixed)

mondo never takes HP LTO SAS drive out of OBDR mode during restore

Reported by: tastle73 Owned by: bruno
Priority: normal Milestone: 3.0.0
Component: mondo Version: 2.2.9
Severity: normal Keywords:
Cc:

Description

I was testing an OBDR restore in mondo/mindi with my new LTO-2 drive attached via SAS, and it boots OK right up until the point where I think it has to switch the drive out of OBDR mode. It panics, saying that /dev/nst0 is not an extended data drive.

If I power-cycle the drive at that point, let it be rediscovered, and type in the /dev/nst0 device, the restore proceeds and finishes.

CentOS 5.3 x86_64, HP SAS LTO-2 drive

Attachments (3)

mondorestore.log (382.5 KB) - added by tastle73 8 years ago.
mondorestore.log
hpsa_obdr_mode.c (15.1 KB) - added by bruno 6 years ago.
Source code for the hpsa_obdr_mode program
Makefile (571 bytes) - added by bruno 6 years ago.
Makefile to build the hpsa_obdr_mode program


Change History (17)

comment:1 Changed 8 years ago by bruno

  • Status changed from new to assigned

Can you attach the /var/log/mondorestore.log file generated during that restore? Even better, launch mondorestore -Z 99 manually to generate a more verbose one.

Changed 8 years ago by tastle73

mondorestore.log

comment:2 Changed 8 years ago by tastle73

  • Version changed from 2.2.8 to 2.2.9

comment:3 Changed 8 years ago by bruno

OK, I see where the problem arises.

It's not a panic. Some commands do not succeed, so the init process asks you to give the correct device instead. Of course, in your case /dev/nst0 is indeed the name that needs to be given.

So to fix this issue, we need to know why the following commands are failing at that point (after the OBDR boot):

mt -f /dev/nst0 rewind
mt -f /dev/nst0 fsf 2
dd if=/dev/nst0 bs=32k count=1024 | tar -zx
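
To narrow down which of the three commands fails, each step could be run on its own and its exit status reported. The following is only a sketch: `run_step` and `check_tape` are hypothetical helper names, `TAPE` defaults to the /dev/nst0 device from the ticket, and the last step reads a single 32k block to /dev/null instead of piping the archive into tar.

```shell
# Sketch: run each post-boot tape step separately and report the first failure.
# run_step and check_tape are hypothetical helpers, not part of mondo/mindi.
TAPE="${TAPE:-/dev/nst0}"

run_step() {
    # $1 = label, remaining arguments = command to run
    label="$1"; shift
    if "$@"; then
        echo "OK: $label"
    else
        rc=$?
        echo "FAILED ($rc): $label"
        return "$rc"
    fi
}

check_tape() {
    run_step "rewind" mt -f "$TAPE" rewind || return
    run_step "skip 2 filemarks" mt -f "$TAPE" fsf 2 || return
    # Read one block only, to test that the drive returns data at all.
    run_step "read first block" dd if="$TAPE" of=/dev/null bs=32k count=1 || return
}
```

Calling `check_tape` from the rescue shell stops at the first failing step, which shows whether the drive rejects positioning commands or only reads.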

I know another person who did the following to his tape drive to make it work (in a completely different context, though):

I have reconfigured the tape drive in non OBDR mode, in this way:

1) power cycle tape drive
2) from another shell (Alt+F2)
   - echo "scsi remove-single-device 1 0 3 0" > /proc/scsi/scsi
   - modprobe st
   - echo "scsi add-single-device 1 0 3 0" > /proc/scsi/scsi
   to reconfigure the tape drive in Sequential-Access mode

Does it also work in your case?
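
The remove/add sequence quoted above could be wrapped in a small helper so the host, channel, id and lun are typed only once. This is a sketch: both function names are hypothetical, the address `1 0 3 0` is the one from the quoted example and will differ on other machines, and the writes need root and a kernel that exposes /proc/scsi/scsi.

```shell
# Build the command string written to /proc/scsi/scsi.
# $1 = add|remove, then HOST CHANNEL ID LUN (1 0 3 0 in the quoted example).
scsi_single_device_cmd() {
    case "$1" in
        add|remove) ;;
        *) echo "usage: scsi_single_device_cmd add|remove HOST CHANNEL ID LUN" >&2
           return 2 ;;
    esac
    [ "$#" -eq 5 ] || { echo "expected HOST CHANNEL ID LUN" >&2; return 2; }
    echo "scsi $1-single-device $2 $3 $4 $5"
}

# Remove the device, load the st driver, then add the device back.
rescan_scsi_device() {
    scsi_single_device_cmd remove "$@" > /proc/scsi/scsi || return
    modprobe st || return
    scsi_single_device_cmd add "$@" > /proc/scsi/scsi
}
```

For the quoted case the call would be `rescan_scsi_device 1 0 3 0`, run after power-cycling the drive.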

comment:4 Changed 6 years ago by bruno

  • Milestone changed from 2.2.10 to 2.2.9.8

I have now access to a similar HW configuration, and will be doing tests next week to try to reproduce it.

comment:5 Changed 6 years ago by bruno

  • Milestone changed from 3.0.0 to 3.0.1

I can confirm I see the same problem both with Firmware WS92 and WS95 on my HP DAT 160 SAS connected to a Smart Array P812 with FW 3.66 and 5.12.

In your case, you still see the drive after the boot:

  Vendor: HP        Model: Ultrium 2-SCSI    Rev: T61D
  Type:   CD-ROM                             ANSI SCSI revision: 05

which could allow detecting this case and attempting a fix.
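
One way such a detection might look is to scan /proc/scsi/scsi for a device whose model looks like a tape drive but whose type is CD-ROM. This is a sketch only: `obdr_devices` is a hypothetical name, and matching on the model strings "Ultrium" and "DAT" is an assumption based on the two drives mentioned in this ticket.

```shell
# Print the model of every tape-like device that reports Type: CD-ROM.
# Reads text in /proc/scsi/scsi format on stdin.
obdr_devices() {
    awk '
        /Vendor:/ { vendor_line = $0 }
        /Type:/ && /CD-ROM/ && vendor_line ~ /Model: *(Ultrium|DAT)/ {
            model = vendor_line
            sub(/.*Model: */, "", model)   # drop everything up to the model name
            sub(/ *Rev:.*/, "", model)     # drop the revision field
            print model
        }
    '
}
```

Typical use would be `obdr_devices < /proc/scsi/scsi`; empty output means no drive was found in CD-ROM mode, which is also what happens when the drive is not visible at all.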

In my case, there is no device available to talk to the hardware: nothing at all in /proc/scsi/scsi, nor in the dmesg output. As I have an external drive connected to a Smart Array controller, if I turn it off, then on, and run

rmmod cciss
rmmod hpsa
rmmod st
modprobe hpsa
modprobe st

then I can talk to my tape drive, and it loads the rest from the tape.

However, if you have an internal drive, there is no way to do that!
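
For the external-drive workaround, the module reload could be scripted as below. This is a sketch under assumptions: `reload_tape_stack` and the `RUN` hook are hypothetical, failures of rmmod are ignored because cciss and hpsa are normally not both loaded, and the drive still has to be power-cycled by hand first.

```shell
# Sketch of the reload sequence. RUN is a hypothetical hook: set RUN=echo
# to print the commands instead of executing them.
RUN="${RUN:-}"

reload_tape_stack() {
    # Unload in the order used in the ticket; a module that is not loaded
    # is not treated as an error here.
    for mod in cciss hpsa st; do
        $RUN rmmod "$mod" 2>/dev/null || true
    done
    # Reload the controller driver first, then the SCSI tape driver.
    $RUN modprobe hpsa && $RUN modprobe st
}
```

Running `RUN=echo reload_tape_stack` first shows what would be executed before touching the loaded drivers.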

It remains to be seen whether there is a software way to reset the tape drive from the CLI; I have not been able to find one so far.

comment:6 Changed 6 years ago by bruno

In another case I'm working on, I see:

scsi2 : cciss
  Vendor: HP        Model: DAT160            Rev: WS95
  Type:   CD-ROM                             ANSI SCSI revision: 03
sr1: scsi-1 drive

This could be due to a driver difference between SLES 10 (2.6.16.60-0.77.1-smp) with cciss and RHEL 6 (2.6.32-131.17.1.el6) with hpsa which doesn't show the device in CD-ROM mode at all.

comment:7 Changed 6 years ago by bruno

Booting the RHEL 6.1 server with hpsa, with the drive in boot mode but no tape inserted, lets the native RHEL 6.1 boot and confirms that the behaviour is the same (nothing in /proc/scsi/scsi, no message detecting the tape drive when hpsa loads).

Using hpacucli doesn't seem to help reset the tape drive to sequential mode either (needs more research). The next step is to use the USB drive with the same OBDR tape to check what happens, and to check with another distro (SLES 10 SP3) to see whether a different driver (cciss in that case) behaves better.

comment:8 Changed 6 years ago by bruno

Booting the RHEL 6.1 server with usb_storage, with the drive in boot mode, lets the native RHEL 6.1 boot; the drive is then put back into sequential mode and the rest of the data can be accessed in that configuration (with the exact same tape that doesn't work with the SAS drive).

comment:9 Changed 6 years ago by bruno

  • Milestone changed from 3.0.1 to 3.0.0

comment:10 Changed 6 years ago by bruno

I'm working with an HP colleague to get a piece of software that will solve this issue. It can be called from the mindi init script to put the tape drive back into Sequential mode if it is still in CD-ROM mode (and the reverse as well, which will allow fully automated DR with OBDR in that version).

This will be handled in 3.0.0, and that additional program should be available soon as well.

comment:11 Changed 6 years ago by bruno

This is now fixed with rev [2915] and [2913] at least for SLES 10 (cciss driver). Will check next week for RHEL 6 as well (hpsa driver). It fixes this issue by using an external program (hpsa_obdr_mode) which can set the mode of the tape to CD-ROM or Sequential at will.

That program will have to be downloaded from http://cciss.sf.net

comment:12 Changed 6 years ago by bruno

Here is an example script showing how to set up tape-based DR correctly, entirely from the CLI, on SLES 10:

mkdir -p /mondo/images /mondo/scratch /mondo/tmp
echo "engage scsi" > /proc/driver/cciss/cciss1
modprobe st
hpsa_obdr_mode -m tape /dev/cciss/c1d0
mondoarchive -G -N -O -E "/mondo" -t -o -d /dev/st0 -T /mondo/tmp -S /mondo/scratch
hpsa_obdr_mode -m cd /dev/cciss/c1d0
#reboot

The machine is backed up and then rebooted in the OBDR mode ... without using the button ;-)

Last edited 6 years ago by bruno

comment:13 Changed 6 years ago by bruno

  • Resolution set to fixed
  • Status changed from assigned to closed

I can now confirm that on SLES 10 with rev [2918] and with the additional hpsa_obdr_mode command, the problem is fixed.

Changed 6 years ago by bruno

Source code for the hpsa_obdr_mode program

Changed 6 years ago by bruno

Makefile to build the hpsa_obdr_mode program

comment:14 Changed 6 years ago by bruno

Pending the availability of the official source code from the upstream SourceForge project mentioned above, I attach a copy of the source and the Makefile so it can be built.
