Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#600 closed defect (fixed)

At boot: occur "no space left on device" errors

Reported by: victor gattegno
Owned by: Bruno Cornec
Priority: highest
Milestone: 3.0.2
Component: mindi
Version: 3.0.1
Severity: blocker
Keywords:
Cc:

Description

With a RHEL/CentOS 5 mondo backup, booting the mindi media fails with many "no space left on device" errors.

I cannot reproduce the problem with RHEL 6.

Five users reported the problem with mindi 2.2.1 and mondo 3.0.1 on the mondo-devel mailing list in March 2012.

In fact, the errors are generated by the RHEL 5 /sbin/start_udev shell script, which the mindi rcS starts at boot.

If start_udev is replaced by the RHEL 6 version, the "no space left on device" error messages disappear.

Nevertheless, replacing it with the RHEL 6 version is not a solution, because RHEL 5 lacks some items that this version needs.

For more details, see the 27 March 2012 discussion in the mailing-list archive, with the subject: [Mondo-devel] mondorescue: no space left on device

I attach to this ticket two screenshots taken by a user.

Attachments (11)

mindi-1.png (26.2 KB ) - added by victor gattegno 12 years ago.
On mindi-1.png the beginning of the problem, see mindi-2.png for the rest
mindi-2.png (25.9 KB ) - added by victor gattegno 12 years ago.
the rest of error messages, "no space left on device"
start_udev-RHEL-5.7 (4.4 KB ) - added by victor gattegno 12 years ago.
RHEL 5 start_udev (not modified)
start_udev-RHEL-5.7-modified (4.5 KB ) - added by victor gattegno 12 years ago.
RHEL 5 start_udev (that I modified) working well with mindi 2.1.1
start_udev-RHEL-6.1 (8.2 KB ) - added by victor gattegno 12 years ago.
RHEL 6 start_udev (just for info), working well with mindi 2.1.1
debux-xv make_extra_nodes.doc (30.0 KB ) - added by victor gattegno 12 years ago.
Alan start_udev "set -xv" results
testfordevdir.sh (248 bytes ) - added by victor gattegno 12 years ago.
testfordevdir-result.txt (608 bytes ) - added by victor gattegno 12 years ago.
test2.sh (288 bytes ) - added by victor gattegno 12 years ago.
test2-result.txt (1.0 KB ) - added by victor gattegno 12 years ago.
start_udev-RHEL-5.7-added-pushd (4.9 KB ) - added by victor gattegno 12 years ago.
start_udev script modified (I added a "pushd /", and I added a popd at the end of the for loop)


Change History (27)

by victor gattegno, 12 years ago

Attachment: mindi-1.png added

On mindi-1.png the beginning of the problem, see mindi-2.png for the rest

by victor gattegno, 12 years ago

Attachment: mindi-2.png added

the rest of error messages, "no space left on device"

comment:1 by victor gattegno, 12 years ago

Summary: At boot: occur "no space left of device" errors → At boot: occur "no space left on device" errors

comment:2 by victor gattegno, 12 years ago

The error is "no space left on device", and not "no space left of device".

comment:3 by victor gattegno, 12 years ago

Today a user reported on the mailing list that, after he downgraded mindi-2.1.1 to mindi-2.1.0 on RHEL 5.6 and RHEL 5.8, the "cp write error no space left on device" messages no longer appeared.

comment:4 by Bruno Cornec, 12 years ago

Priority: normal → high
Severity: major → blocker
Status: new → assigned

comment:5 by victor gattegno, 12 years ago

I modified the tmpfs mount section of the RHEL 5 start_udev script; it is now like the tmpfs mount section of the RHEL 6 start_udev.

I attach the modified start_udev here. A user tested my modified start_udev on RHEL 5 and it worked well; mindi.iso now boots fine with the mindi-2.1.1 package.

Diff between the two /sbin/start_udev versions:

# diff /sbin/start_udev /sbin/start_udev_ori
136c136
< LANG=C awk "\$2 == \"${udev_root%/}\" && ( \$3 == \"devtmpfs\" || \$3 == \"tmpfs\" ) { exit 1 }" /proc/mounts && {
---
> LANG=C awk "\$2 == \"${udev_root%/}\" && \$3 == \"tmpfs\" { exit 1 }" /proc/mounts && {
145,147c145
<       # First try to mount a devtmpfs on $udev_root
<       mount -n -o mode=0755 -t devtmpfs none "$udev_root" 2>/dev/null \
<       || mount -n -o mode=0755 -t tmpfs none "$udev_root"
---
>       mount -n -o mode=0755 -t tmpfs none "$udev_root"

by victor gattegno, 12 years ago

Attachment: start_udev-RHEL-5.7 added

RHEL 5 start_udev (not modified)

by victor gattegno, 12 years ago

Attachment: start_udev-RHEL-5.7-modified added

RHEL 5 start_udev (that I modified) working well with mindi 2.1.1

by victor gattegno, 12 years ago

Attachment: start_udev-RHEL-6.1 added

RHEL 6 start_udev (just for info), working well with mindi 2.1.1

comment:6 by victor gattegno, 12 years ago

Two users said that they still get "cp write error no space left on device" with mindi 2.1.1 and with the RHEL 5 start_udev that I modified.

I haven't tried it with RHEL 5, because I already upgraded to RHEL 6.

comment:7 by Bruno Cornec, 12 years ago

The major difference I see between 2.1.0 and 2.1.1 for mindi is that more files are copied onto the boot media. So we may now fill the ramdrive, whereas before we didn't.

It could be worth changing the ramdrive_size at boot time, and thus the EXTRA_SPACE variable in mindi, to see if that improves things.

comment:8 by victor gattegno, 12 years ago

I recommended that Alan add "set -xv" to the start_udev script.

Thanks for the -xv results; they shed some light on the bug.

I see in the "debux-xv make_extra_nodes.doc" file:

+ pushd /lib/udev/devices
+ set README THIS-IS-A-RAMDISK ataraid.tgz bin cciss.tgz dev dev.static dm.tgz etc i20.tgz ida.tgz init lib lib64 linuxrc lost+found mnt nst.tgz proc raw.tgz rd.tgz root sbin symlinks.tgz sys tmp usr var vc.tgz
+ [ read != * ]

That is why the start_udev script tries to copy all of that (README, etc.) to /dev through the "cp -ar "$@" $udev_root/" command, and why users then get the "cp: write error: no space left on device" messages.

Normally, since Alan's /lib/udev/devices directory was empty, he should have got:

 + pushd /etc/udev/devices
 /etc/udev/devices ~/test5
 + set '*'
 + '[' '*' '!=' '*' ']'

I get "~/test5" here too because I ran a test shell script from the ~/test5 directory.

In Alan's case, it seems that the "pushd /lib/udev/devices" was not successful, so /lib/udev/devices was not added to the list of currently remembered directories.

It's strange, because Alan's "ls -al /lib/udev/devices" shows that the directory exists, so the "pushd /lib/udev/devices" should be successful.

If the start_udev script contained:

pushd $devdir

instead of:

pushd $devdir &> "$udev_root/null"

we might see:

+ pushd /etc/udev/devices

..... pushd: dir: No such file or directory

You'll find attached my tests on a RHEL 4.

In testfordevdir-result.txt you'll see what you should get:

 + pushd /etc/udev/devices
 /etc/udev/devices ~/test5
 + set '*'
 + '[' '*' '!=' '*' ']'

In test2-result.txt you'll see that:

  • I used "dir" instead of /etc/udev/devices
  • and I added a "pushd /"

so I got:

+ pushd /
/ ~/test5
+ pushd dir
./test2.sh: line 8: pushd: dir: No such file or directory
+ set audit bin boot dev dir1 etc home initrd lib lost+found media misc mnt opt proc root sbin selinux srv sys test2.sh test4popd.sh testhrea.410 tftpboot tmp usr var
+ '[' audit '!=' '*' ']'

This is not a bug, because my server has no "dir" directory under /. But it looks similar to the bug.

I asked Alan to check whether he is sure that, after the line:

+ pushd /lib/udev/devices

he saw no line before this one:

+ set README THIS-IS-A-RAMDISK .......

If there was nothing, maybe the list of currently remembered directories is empty...

In that case, a "pushd /" could be added to the start_udev script just before the line:

pushd $devdir &> "$udev_root/null"

You'll find attached the start_udev script modified that way (I added a "pushd /", and I added a popd at the end of the for loop).
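
The failure mode discussed in this comment can be sketched as follows. This is a minimal, hypothetical reproduction, not the RHEL 5 start_udev code: fake_pushd stands in for a shell that has no pushd builtin, and the directory names are made up. It only illustrates that when the directory change fails and the error is discarded, the glob expands in the current directory instead of the intended one.

```shell
#!/bin/sh
# Hypothetical layout in a throwaway temp directory.
work=$(mktemp -d)
mkdir "$work/root" "$work/root/bin" "$work/root/devices"
touch "$work/root/readme"
cd "$work/root" || exit 1

# Stand-in for a missing pushd builtin: the call simply fails.
fake_pushd() { return 127; }

# The error is discarded and the failure ignored, as in the ticket.
fake_pushd devices >/dev/null 2>&1 || true
set -- *                 # still in $work/root, so this lists its entries
unguarded="$*"

# Guarded variant: only glob when the directory change really succeeded.
guarded=""
if cd devices 2>/dev/null; then
    set -- *             # devices/ is empty, so the pattern stays literal
    guarded="$1"
    cd ..
fi

echo "unguarded: $unguarded"
echo "guarded:   $guarded"
cd / && rm -rf "$work"
```

In the unguarded case every entry of the current directory would be handed to cp, which matches the symptom in the screenshots. In the guarded case the first argument stays a literal "*", so a test such as [ "$1" != "*" ] would skip the copy.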

by victor gattegno, 12 years ago

Attachment: debux-xv make_extra_nodes.doc added

Alan's start_udev "set -xv" results

by victor gattegno, 12 years ago

Attachment: testfordevdir.sh added

by victor gattegno, 12 years ago

Attachment: testfordevdir-result.txt added

by victor gattegno, 12 years ago

Attachment: test2.sh added

by victor gattegno, 12 years ago

Attachment: test2-result.txt added

by victor gattegno, 12 years ago

Attachment: start_udev-RHEL-5.7-added-pushd added

start_udev script modified (I added a "pushd /", and I added a popd at the end of the for loop)

comment:9 by Bruno Cornec, 12 years ago

Priority: high → highest

On the second image provided, there is a mention of recursion in cp. Maybe that's an area we need to explore more. A recursive link could be a problem here.

comment:10 by victor gattegno, 12 years ago

I suggested a shell problem, because pushd didn't work in the original RHEL 5 start_udev script.

Moreover, in RHEL 6, "#!/bin/sh" is replaced by "#!/bin/bash" in the start_udev script.

So, in the original RHEL 5 start_udev script, Alan replaced "#!/bin/sh" with "#!/bin/bash".

Then everything worked fine; he successfully created a new archive DVD and restored the server.
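
For anyone who wants to try the same workaround, the interpreter line can be switched with sed. The sketch below works on a hypothetical stand-in file in a temp directory, not on the real /sbin/start_udev, and it only rewrites line 1 when that line is exactly the sh shebang.

```shell
#!/bin/sh
# Hypothetical stand-in for a copy of start_udev.
work=$(mktemp -d)
script="$work/start_udev"
printf '#!/bin/sh\npushd / > /dev/null\n' > "$script"

# Rewrite only line 1, and only if it is exactly "#!/bin/sh".
sed '1s|^#!/bin/sh$|#!/bin/bash|' "$script" > "$script.new" \
    && mv "$script.new" "$script"

first_line=$(head -n 1 "$script")
echo "$first_line"
```

Writing to a new file and moving it into place avoids relying on sed -i, which not every sed implementation supports.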

comment:11 by victor gattegno, 12 years ago

I found the problem:

  • in RHEL 5, 6, SLES 10, etc., /bin/sh is a soft link to /bin/bash, so there is no problem.
  • in the mondo boot environment, /bin/sh is a soft link to busybox, so it calls the tiny shell embedded in busybox.

And busybox sh has neither pushd nor popd built in: if I start /bin/sh under busybox and type "pushd /", I get "pushd: not found"; the same goes for popd.

If I start /bin/bash under busybox and type "pushd /", it works; popd works too.

That is why I didn't get the errors with the RHEL 6 start_udev, which uses the /bin/bash shell instead of the /bin/sh shell (used by the RHEL 5 start_udev).
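
A quick way to check which shells provide pushd is sketched below. has_pushd is a hypothetical helper name, not part of mindi or busybox. It simply asks the given shell to run pushd and popd and reports whether that succeeded.

```shell
#!/bin/sh
# has_pushd SHELL: succeed if SHELL can run pushd and popd.
has_pushd() {
    "$1" -c 'pushd / && popd' >/dev/null 2>&1
}

for candidate in /bin/sh /bin/bash; do
    if has_pushd "$candidate"; then
        echo "$candidate: pushd available"
    else
        echo "$candidate: pushd missing"
    fi
done
```

The result for /bin/sh depends on what it points to on the system where this runs, which is exactly the difference between the installed RHEL system and the boot environment described above.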

comment:12 by Bruno Cornec, 12 years ago

Some good feedback provided by Stefan Heijmans:

I noticed that in mindi 2.1.0 /sbin/MAKEDEV is not there and in mindi 2.1.1 it is. /sbin/MAKEDEV is also used in /sbin/start_udev -> line 180 -> make_extra_nodes

This made me wonder why, so I did a diff on the mindi script between 2.1.0 and 2.1.1. The result is shown here; the first part is 2.1.0 and the second part is 2.1.1:

2488c2477,2493
<
---
>
>       # Handle the case where busybox and mount are dynamically linked
>       file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
>       if [ $? -eq 0 ]; then
>               # We want to use the real mount and all the supported variants (nfs, cifs, ...)
>               rm -f bin/mount $MINDI_TMP/busy.lis
>               mountlis=`grep -E "mount|fuse|ssh" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2 | sort -u`
>               LocateDeps $MINDI_LIB/rootfs/bin/busybox $mountlis >> $MINDI_TMP/busy.lis
>               # Special for libs
>               for f in `grep -E "libnss" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2`; do
>                       echo "`ReadAllLink $f`" >> $MINDI_TMP/busy.lis
>               done
>               # Initial / are trucated by tar
>               tar cf - $mountlis `sort -u $MINDI_TMP/busy.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in mount analysis" $MINDI_TMP/$$.log
>               rm -f $MINDI_TMP/busy.lis
>       fi
>
2500,2521c2505
<       # Handle the case where busybox and mount are dynamically linked
<       file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
<       if [ $? -eq 0 ]; then
<               # We want to use the real mount and all the supported variants (nfs, cifs, ...)
<               rm -f bin/mount
<       fi
<
<       # Copy of files from the minimal env needed as per the deplist.d/minimal.conf file (which includes all busybox deps)
<       minimallis=`grep -Ev '^#' $DEPLIST_DIR/minimal.conf`
<       rm -f $MINDI_TMP/minimal.lis
<       for f in $MINDI_LIB/rootfs/bin/busybox $minimallis; do
<               echo $f >> $MINDI_TMP/minimal.lis
<       done
<       LocateDeps $MINDI_LIB/rootfs/bin/busybox $minimallis >> $MINDI_TMP/minimal.lis
<       for f in `cat $MINDI_TMP/minimal.lis`; do
<               echo "`ReadAllLink $f`" >> $MINDI_TMP/minimal.lis
<       done
<       # Initial / are trucated by tar
<       tar cf - `sort -u $MINDI_TMP/minimal.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in minimal analysis" $MINDI_TMP/$$.log
<       rm -f $MINDI_TMP/minimal.lis
<
<       # Avoids an issue on some distro (RHEL5)
---

In mindi 2.1.1 the $DEPLIST_DIR/minimal.conf is processed, whereas mindi 2.1.0 only processes the binaries matching "mount|fuse|ssh". So I put this back into mindi 2.1.1, like this:

        # Handle the case where busybox and mount are dynamically linked
        file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
        if [ $? -eq 0 ]; then
                # We want to use the real mount and all the supported variants (nfs, cifs, ...)
                rm -f bin/mount
        fi
        # Copy of files from the minimal env needed as per the deplist.d/minimal.conf file (which includes all busybox deps)
        minimallis=`grep -Ev '^#' $DEPLIST_DIR/minimal.conf`
        mountlis=`grep -E "mount|fuse|ssh" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2 | sort -u`    # <== extra line
        rm -f $MINDI_TMP/minimal.lis
        for f in $MINDI_LIB/rootfs/bin/busybox $mountlis; do    # <== edited line
                echo $f >> $MINDI_TMP/minimal.lis
        done
        LocateDeps $MINDI_LIB/rootfs/bin/busybox $mountlis >> $MINDI_TMP/minimal.lis    # <== edited line
        for f in `cat $MINDI_TMP/minimal.lis`; do
                echo "`ReadAllLink $f`" >> $MINDI_TMP/minimal.lis
        done
        # Initial / are trucated by tar
        tar cf - `sort -u $MINDI_TMP/minimal.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in minimal analysis" $MINDI_TMP/$$.log
        rm -f $MINDI_TMP/minimal.lis

I created the mindi iso and it booted fine into the prompt.
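
To make the mountlis pipeline from the diff easier to follow, here is a small sketch that runs it on made-up deplist files. The file names and contents are hypothetical; only the grep, cut and sort stages come from the mindi lines quoted above. LC_ALL=C is added here just to keep the sort order predictable.

```shell
#!/bin/sh
# Hypothetical deplist files in a throwaway temp directory.
work=$(mktemp -d)
printf '/bin/mount\n#/sbin/umount.old\n/usr/bin/ssh\n' > "$work/a.conf"
printf '/sbin/mount.nfs\n/bin/ls\n' > "$work/b.conf"

# grep -E : keep lines mentioning mount, fuse or ssh (output is "file:line")
# grep -v : drop every line that contains a '#'
# cut     : strip the "file:" prefix added by grep
# sort -u : remove duplicates
mountlis=$(grep -E "mount|fuse|ssh" "$work"/*.conf \
    | grep -v " *#.*" | cut -d: -f2 | LC_ALL=C sort -u)

echo "$mountlis"
rm -rf "$work"
```

Note that the cut stage relies on grep being given more than one file, because only then does grep prefix each match with the file name.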

comment:13 by victor gattegno, 12 years ago

I think that the best solution is to have /bin/sh be a soft link to /bin/bash at boot, instead of a soft link to the busybox binary.
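
As an illustration of that idea only, the link could be switched in a staged root filesystem along these lines. The rootfs directory and its contents are hypothetical stand-ins, not the layout mindi actually uses, and this is not the change that was committed.

```shell
#!/bin/sh
# Hypothetical staged root filesystem with empty placeholder binaries.
rootfs=$(mktemp -d)
mkdir "$rootfs/bin"
: > "$rootfs/bin/busybox"
: > "$rootfs/bin/bash"
ln -s busybox "$rootfs/bin/sh"      # situation described in this ticket

# Only repoint sh when bash is actually present in the image.
if [ -e "$rootfs/bin/bash" ]; then
    ln -sf bash "$rootfs/bin/sh"
fi

sh_target=$(readlink "$rootfs/bin/sh")
echo "bin/sh -> $sh_target"
```

Keeping the busybox link as the fallback means the boot media would still have a working /bin/sh when bash is left out of the image.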

comment:14 by Bruno Cornec, 12 years ago

That's indeed a solution. But I'd like to know why this is the right solution in 5.7 when it was not in 5.2, for example. I think the cause of the problem is that /sbin/MAKEDEV is now included, whereas in 2.1.0 it wasn't. I haven't had time to look at its content to be sure.

When it is present, it is called to create some devices, which seems to make the subsequent cp fail. That is not the case when we just skip that step in start_udev.

It still needs some digging so that we can document why this is happening. But like you, I'm tempted to systematically use bash as the main shell, because as we use more and more distribution scripts, we will probably hit this type of issue again in the future.

comment:15 by Bruno Cornec, 12 years ago

Resolution: fixed
Status: assigned → closed

As bash may be used anyway, and is now part of minimal.conf by default, if it's not removed, it will be used as the default shell. This should fix the issue with rev [3000] (interesting to see that bug #600 is fixed by rev [3000]! Numbers are magic ;-)
