Ticket #600 (closed defect: fixed)

Opened 14 months ago

Last modified 13 months ago

At boot: occur "no space left on device" errors

Reported by: vicgat Owned by: bruno
Priority: highest Milestone: 3.0.2
Component: mindi Version: 3.0.1
Severity: blocker Keywords:
Cc:

Description

With a RHEL/CentOS 5 mondo backup, the mindi boot fails with many "no space left on device" errors.

I cannot reproduce this problem with RHEL 6.

Five users reported this problem with mindi 2.2.1 and mondo 3.0.1 on the mondo-devel mailing list in March 2012.

In fact, the errors are generated by the RHEL 5 /sbin/start_udev shell script, which is started at boot by mindi's rcS.

If start_udev is replaced by the RHEL 6 version, the "no space left on device" errors no longer appear.

Nevertheless, replacing it with the RHEL 6 version is not a solution, because some items it needs are missing in RHEL 5.

For more details, see the 27 March 2012 discussion in the mailing-list archive, with the subject: [Mondo-devel] mondorescue: no space left on device

I have attached two screenshots taken by a user to this ticket.

Attachments

mindi-1.png (26.2 KB) - added by vicgat 14 months ago.
mindi-1.png shows the beginning of the problem; see mindi-2.png for the rest
mindi-2.png (25.9 KB) - added by vicgat 14 months ago.
the rest of the error messages, "no space left on device"
start_udev-RHEL-5.7 (4.4 KB) - added by vicgat 14 months ago.
RHEL 5 start_udev (not modified)
start_udev-RHEL-5.7-modified (4.5 KB) - added by vicgat 14 months ago.
RHEL 5 start_udev (modified by me), working well with mindi 2.1.1
start_udev-RHEL-6.1 (8.2 KB) - added by vicgat 14 months ago.
RHEL 6 start_udev (just for info), working well with mindi 2.1.1
debux-xv make_extra_nodes.doc (30.0 KB) - added by vicgat 14 months ago.
Alan's start_udev "set -xv" results
testfordevdir.sh (248 bytes) - added by vicgat 14 months ago.
testfordevdir-result.txt (608 bytes) - added by vicgat 14 months ago.
test2.sh (288 bytes) - added by vicgat 14 months ago.
test2-result.txt (1.0 KB) - added by vicgat 14 months ago.
start_udev-RHEL-5.7-added-pushd (4.9 KB) - added by vicgat 14 months ago.
start_udev script modified (I added a "pushd /", and I added a popd at the end of the for loop)

Change History

Changed 14 months ago by vicgat

mindi-1.png shows the beginning of the problem; see mindi-2.png for the rest

Changed 14 months ago by vicgat

the rest of the error messages, "no space left on device"

comment:1 Changed 14 months ago by vicgat

  • Summary changed from At boot: occur "no space left of device" errors to At boot: occur "no space left on device" errors

comment:2 Changed 14 months ago by vicgat

The error is "no space left on device", and not "no space left of device".

comment:3 Changed 14 months ago by vicgat

Today a user reported on the mailing list that, after downgrading from mindi 2.1.1 to mindi 2.1.0 on RHEL 5.6 and RHEL 5.8, there were no more "cp: write error: no space left on device" errors.

comment:4 Changed 14 months ago by bruno

  • Priority changed from normal to high
  • Status changed from new to assigned
  • Severity changed from major to blocker

comment:5 Changed 14 months ago by vicgat

I modified the tmpfs mount section of the RHEL 5 start_udev script so that it now matches the tmpfs mount section of the RHEL 6 start_udev.

I have attached the modified start_udev here. A user tested my modified start_udev on RHEL 5 and it worked well: mindi.iso now boots fine with the mindi-2.1.1 package.

Diff between the modified /sbin/start_udev and the original /sbin/start_udev_ori:

# diff /sbin/start_udev /sbin/start_udev_ori
136c136
< LANG=C awk "\$2 == \"${udev_root%/}\" && ( \$3 == \"devtmpfs\" || \$3 == \"tmpfs\" ) { exit 1 }" /proc/mounts && {
---
> LANG=C awk "\$2 == \"${udev_root%/}\" && \$3 == \"tmpfs\" { exit 1 }" /proc/mounts && {
145,147c145
<       # First try to mount a devtmpfs on $udev_root
<       mount -n -o mode=0755 -t devtmpfs none "$udev_root" 2>/dev/null \
<       || mount -n -o mode=0755 -t tmpfs none "$udev_root"
---
>       mount -n -o mode=0755 -t tmpfs none "$udev_root"
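The patched test treats a devtmpfs already mounted on $udev_root like a tmpfs one, and the patched mount line tries devtmpfs first, falling back to tmpfs. A minimal sketch of that check in isolation, assuming bash and a hypothetical helper name (dev_needs_mount); it passes the path to awk with -v instead of the escaped double quotes used in start_udev, and takes the mount table as a parameter so it can be tried against a sample file:

```shell
#!/bin/bash
# Hypothetical helper mirroring the patched awk test from start_udev.
# Returns 0 when $1 is NOT yet mounted as tmpfs or devtmpfs, i.e. when
# start_udev would go on to mount a fresh filesystem there.
dev_needs_mount() {
    local udev_root=$1
    # Mount table to inspect; a parameter so the check can be tested offline.
    local mounts=${2:-/proc/mounts}
    # ${udev_root%/} strips a trailing slash, as in the original script.
    # awk exits 1 on the first matching line, and 0 if no line matches.
    LANG=C awk -v root="${udev_root%/}" \
        '$2 == root && ($3 == "devtmpfs" || $3 == "tmpfs") { exit 1 }' "$mounts"
}

# Usage on a live system (the mount itself needs root and is not done here):
# dev_needs_mount /dev && echo "start_udev would mount devtmpfs (or tmpfs) on /dev"
```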

Changed 14 months ago by vicgat

RHEL 5 start_udev (not modified)

Changed 14 months ago by vicgat

RHEL 5 start_udev (modified by me), working well with mindi 2.1.1

Changed 14 months ago by vicgat

RHEL 6 start_udev (just for info), working well with mindi 2.1.1

comment:6 Changed 14 months ago by vicgat

Two users said that they still get "cp: write error: no space left on device" with mindi 2.1.1 and the RHEL 5 start_udev that I modified.

I didn't try it with RHEL 5, because I have already upgraded to RHEL 6.

comment:7 Changed 14 months ago by bruno

The major difference I see between mindi 2.1.0 and 2.1.1 is that more files are copied onto the boot media. So we may now fill the ramdrive, whereas before we didn't.

It could be worth changing the ramdrive_size at boot time, through the EXTRA_SPACE variable in mindi, to see if that improves things.
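To check the ramdrive hypothesis from a rescue shell, one can compare the size of what is about to be copied with the free space of the target filesystem. This is a hypothetical diagnostic: fits_in is not a mindi function and it does not touch ramdrive_size or EXTRA_SPACE; it assumes du and df accept the -c, -k and -P options, as GNU coreutils does.

```shell
#!/bin/bash
# Hypothetical diagnostic, not part of mindi: does the data in the given
# source paths fit into the free space of the filesystem holding $target?
# Usage: fits_in TARGET SOURCE...
fits_in() {
    local target=$1
    shift
    local need avail
    # Total size of the sources in KiB (the last line of du -c is the total).
    need=$(du -skc "$@" | tail -n 1 | cut -f 1)
    # Available KiB on the target filesystem (-P keeps each entry on one line).
    avail=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
    echo "need=${need}K avail=${avail}K"
    [ "$need" -le "$avail" ]
}
```

For example, running fits_in /dev /lib/udev/devices/* on the booted rescue system would show whether the device directory contents fit on the /dev filesystem.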

comment:8 Changed 14 months ago by vicgat

I recommended that Alan add "set -xv" to the start_udev script.

Thanks for the -xv results; they shed some light on the bug.

In the "debux-xv make_extra_nodes.doc" file I see:

+ pushd /lib/udev/devices
+ set README THIS-IS-A-RAMDISK ataraid.tgz bin cciss.tgz dev dev.static dm.tgz etc i20.tgz ida.tgz init lib lib64 linuxrc lost+found mnt nst.tgz proc raw.tgz rd.tgz root sbin symlinks.tgz sys tmp usr var vc.tgz
[ read != * ]

That is why the start_udev script tries to copy all of that (README, etc.) to /dev through the "cp -ar "$@" $udev_root/" command, and why the user then gets the "cp: write error: no space left on device" messages.

Normally, since Alan's /lib/udev/devices directory was empty, he should have got:

 + pushd /etc/udev/devices
 /etc/udev/devices ~/test5
 + set '*'
 + '[' '*' '!=' '*' ']'

Here I also get "~/test5" because I ran a test shell script from the ~/test5 directory.

In Alan's case, it seems that "pushd /lib/udev/devices" did not succeed, so /lib/udev/devices was not added to the list of currently remembered directories (the directory stack).

This is strange, because Alan's "ls -al /lib/udev/devices" output shows that the directory exists, so "pushd /lib/udev/devices" should have succeeded.

If the start_udev script had:

pushd $devdir

instead of:

pushd $devdir &> "$udev_root/null"

then we might see:

+ pushd /etc/udev/devices

..... pushd: dir: No such file or directory

You'll find my tests on RHEL 4 attached.

In testfordevdir-result.txt you'll see what you should get:

 + pushd /etc/udev/devices
 /etc/udev/devices ~/test5
 + set '*'
 + '[' '*' '!=' '*' ']'

In test2-result.txt you'll see that:

  • I used "dir" instead of /etc/udev/devices
  • and I added a "pushd /"

so I got:

+ pushd /
/ ~/test5
+ pushd dir
./test2.sh: line 8: pushd: dir: No such file or directory
+ set audit bin boot dev dir1 etc home initrd lib lost+found media misc mnt opt proc root sbin selinux srv sys test2.sh test4popd.sh testhrea.410 tftpboot tmp usr var
+ '[' audit '!=' '*' ']'

This is not a bug, because on my server there is no "dir" directory under /. But it looks similar to the bug.
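The same failure mode can be reproduced outside start_udev. This sketch (the function name is hypothetical) hides any pushd failure the way start_udev does, then globs: when pushd fails, whether because the directory is missing or because the shell has no pushd builtin, * expands in the current directory instead of the device directory.

```shell
#!/bin/bash
# Hypothetical reproduction of the start_udev pattern:
#   pushd $devdir &> "$udev_root/null"; set *; [ "$1" != "*" ] && cp ...
# Prints the first word that "set *" would hand to cp.
first_glob_after_pushd() {
    local devdir=$1
    # Any pushd failure is hidden, as in start_udev: a missing directory,
    # or a shell (such as busybox sh) without a pushd builtin at all.
    pushd "$devdir" > /dev/null 2>&1
    # If pushd failed, this globs the CURRENT directory, not $devdir.
    set -- *
    echo "$1"
}
```

Call it in a subshell, e.g. (cd / && first_glob_after_pushd /no/such/dir), because a successful pushd changes the caller's working directory. Run from the ramdisk root, it would print the first entry there, consistent with the README ... list in Alan's trace.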

I asked Alan to check whether, after the line:

+ pushd /lib/udev/devices

there was really no other line before this one:

+ set README THIS-IS-A-RAMDISK .......

If there was nothing, maybe the list of currently remembered directories is empty...

Then a "pushd /" could be added to the start_udev script just before the line:

pushd $devdir &> "$udev_root/null"

You'll find attached the start_udev script modified that way (I added a "pushd /", and I added a popd at the end of the for loop).

Changed 14 months ago by vicgat

Alan's start_udev "set -xv" results

Changed 14 months ago by vicgat

Changed 14 months ago by vicgat

Changed 14 months ago by vicgat

Changed 14 months ago by vicgat

Changed 14 months ago by vicgat

start_udev script modified (I added a "pushd /", and I added a popd at the end of the for loop)

comment:9 Changed 14 months ago by bruno

  • Priority changed from high to highest

On the second image provided, there is a mention of recursion in cp. Maybe that's an area we need to explore more. A recursive link could be a problem here.
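One way to look for such a link is to list every symlink whose resolved target is the directory holding it or one of its ancestors, since a copy that follows such a link would recurse. A hypothetical diagnostic (the function name is an assumption), assuming a readlink that supports -f, as GNU coreutils does:

```shell
#!/bin/bash
# Hypothetical diagnostic: list symlinks under $1 whose resolved target is
# the directory containing the link or one of its ancestors. A tool that
# follows such a link while copying recursively would loop.
find_recursive_links() {
    local root=$1 link target dir
    # find does not follow symlinks by default, so this walk cannot loop.
    find "$root" -type l | while IFS= read -r link; do
        # Fully resolved target; skip links that cannot be resolved.
        target=$(readlink -f "$link") || continue
        # Physical path of the directory holding the link.
        dir=$(cd "$(dirname "$link")" && pwd -P) || continue
        # ${target%/} handles a link to "/", which is an ancestor of everything.
        case "$dir/" in
            "${target%/}"/*) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}
```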

comment:10 Changed 14 months ago by vicgat

I suggested that this was a shell problem, because pushd didn't work in the original RHEL 5 start_udev script.

Moreover, in the RHEL 6 start_udev script, "#!/bin/sh" is replaced by "#!/bin/bash".

So, in the original RHEL 5 start_udev script, Alan replaced "#!/bin/sh" with "#!/bin/bash".

Then everything worked fine: he successfully created a new archive DVD and restored the server.
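Alan's manual edit can be scripted. A minimal sketch with a hypothetical helper name; it only touches a file whose first line is exactly #!/bin/sh, and rewrites the file in place so that its permissions are kept:

```shell
#!/bin/bash
# Hypothetical helper reproducing Alan's edit: if the first line of the
# script is exactly "#!/bin/sh", replace it with "#!/bin/bash".
force_bash_shebang() {
    local f=$1 tmp rc
    [ "$(head -n 1 "$f")" = '#!/bin/sh' ] || return 0   # nothing to do
    tmp=$(mktemp) || return 1
    { printf '#!/bin/bash\n'; tail -n +2 "$f"; } > "$tmp" &&
        cat "$tmp" > "$f"    # rewrite in place, keeping the file's permissions
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

For example, force_bash_shebang could be run on the copy of start_udev that ends up in the rescue image rather than on the live system's /sbin/start_udev.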

comment:11 Changed 14 months ago by vicgat

I found the problem:

  • in RHEL 5, 6, SLES 10, etc., /bin/sh is a soft link to /bin/bash, so there is no problem.
  • in mondo boot, /bin/sh is a soft link to busybox, so it calls the tiny shell embedded in busybox.

And busybox sh doesn't have pushd (or popd) built in: if I start /bin/sh under busybox and type "pushd /", I get "pushd: not found"; the same goes for popd.

If I start /bin/bash instead and type "pushd /", it works; popd works too.
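Whether a given shell has pushd can be checked without running the boot script. A hypothetical helper, assuming the shell implements the POSIX command -v, which prints nothing and returns non-zero for an unknown name:

```shell
#!/bin/bash
# Hypothetical check: does the given shell provide pushd?
# "command -v pushd" succeeds only if the shell knows the name.
shell_has_pushd() {
    "$1" -c 'command -v pushd' > /dev/null 2>&1
}
```

On the rescue system, shell_has_pushd /bin/sh || echo "/bin/sh has no pushd" would flag the busybox case described above.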

That is why I didn't get the errors with the RHEL 6 start_udev, which uses /bin/bash instead of /bin/sh (used by the RHEL 5 start_udev).
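A start_udev-style copy loop can also avoid pushd altogether. As a sketch (not the RHEL code; the function name is hypothetical), entering each device directory with cd inside a subshell behaves the same under bash, dash and busybox sh, and a failed cd skips the directory instead of globbing the current one:

```shell
#!/bin/sh
# Hypothetical POSIX sh replacement for the pushd / "set *" / cp pattern.
# Usage: copy_extra_nodes UDEV_ROOT DEVDIR...
# UDEV_ROOT should be absolute, because the copy runs after cd.
copy_extra_nodes() {
    udev_root=$1
    shift
    for devdir in "$@"; do
        (
            # Each directory is entered in a subshell, so no popd is needed.
            # If cd fails, skip this directory instead of globbing the
            # current one.
            cd "$devdir" 2> /dev/null || exit 0
            set -- *
            # An unmatched * stays literal: nothing to copy.
            [ -e "$1" ] || [ -L "$1" ] || exit 0
            cp -a "$@" "$udev_root"/
        )
    done
}
```

Like the original, the * glob does not pick up dot files.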

comment:12 Changed 13 months ago by bruno

Some good feedback provided by Stefan Heijmans:

I noticed that in mindi 2.1.0 /sbin/MAKEDEV is not there, and in mindi 2.1.1 it is. /sbin/MAKEDEV is also used in /sbin/start_udev (line 180, make_extra_nodes).

This made me wonder why, so I diffed the mindi script between 2.1.0 and 2.1.1. Here is the result; the first part is 2.1.0 and the second part is 2.1.1:

2488c2477,2493
<
---
>
>       # Handle the case where busybox and mount are dynamically linked
>       file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
>       if [ $? -eq 0 ]; then
>               # We want to use the real mount and all the supported variants (nfs, cifs, ...)
>               rm -f bin/mount $MINDI_TMP/busy.lis
>               mountlis=`grep -E "mount|fuse|ssh" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2 | sort -u`
>               LocateDeps $MINDI_LIB/rootfs/bin/busybox $mountlis >> $MINDI_TMP/busy.lis
>               # Special for libs
>               for f in `grep -E "libnss" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2`; do
>                       echo "`ReadAllLink $f`" >> $MINDI_TMP/busy.lis
>               done
>               # Initial / are trucated by tar
>               tar cf - $mountlis `sort -u $MINDI_TMP/busy.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in mount analysis" $MINDI_TMP/$$.log
>               rm -f $MINDI_TMP/busy.lis
>       fi
>
2500,2521c2505
<       # Handle the case where busybox and mount are dynamically linked
<       file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
<       if [ $? -eq 0 ]; then
<               # We want to use the real mount and all the supported variants (nfs, cifs, ...)
<               rm -f bin/mount
<       fi
<
<       # Copy of files from the minimal env needed as per the deplist.d/minimal.conf file (which includes all busybox deps)
<       minimallis=`grep -Ev '^#' $DEPLIST_DIR/minimal.conf`
<       rm -f $MINDI_TMP/minimal.lis
<       for f in $MINDI_LIB/rootfs/bin/busybox $minimallis; do
<               echo $f >> $MINDI_TMP/minimal.lis
<       done
<       LocateDeps $MINDI_LIB/rootfs/bin/busybox $minimallis >> $MINDI_TMP/minimal.lis
<       for f in `cat $MINDI_TMP/minimal.lis`; do
<               echo "`ReadAllLink $f`" >> $MINDI_TMP/minimal.lis
<       done
<       # Initial / are trucated by tar
<       tar cf - `sort -u $MINDI_TMP/minimal.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in minimal analysis" $MINDI_TMP/$$.log
<       rm -f $MINDI_TMP/minimal.lis
<
<       # Avoids an issue on some distro (RHEL5)
---

In mindi 2.1.1 the whole $DEPLIST_DIR/minimal.conf is processed, whereas in mindi 2.1.0 only the binaries matching "mount|fuse|ssh" were. So I put this back into mindi 2.1.1, like this:

        # Handle the case where busybox and mount are dynamically linked
        file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
        if [ $? -eq 0 ]; then
                # We want to use the real mount and all the supported variants (nfs, cifs, ...)
                rm -f bin/mount
        fi
        # Copy of files from the minimal env needed as per the deplist.d/minimal.conf file (which includes all busybox deps)
        minimallis=`grep -Ev '^#' $DEPLIST_DIR/minimal.conf`
        mountlis=`grep -E "mount|fuse|ssh" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2 | sort -u`    # <== extra line
        rm -f $MINDI_TMP/minimal.lis
        for f in $MINDI_LIB/rootfs/bin/busybox $mountlis; do    # <== edited line
                echo $f >> $MINDI_TMP/minimal.lis
        done
        LocateDeps $MINDI_LIB/rootfs/bin/busybox $mountlis >> $MINDI_TMP/minimal.lis    # <== edited line
        for f in `cat $MINDI_TMP/minimal.lis`; do
                echo "`ReadAllLink $f`" >> $MINDI_TMP/minimal.lis
        done
        # Initial / are truncated by tar
        tar cf - `sort -u $MINDI_TMP/minimal.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in minimal analysis" $MINDI_TMP/$$.log
        rm -f $MINDI_TMP/minimal.lis

I created the mindi ISO and it booted fine to the prompt.
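The technique in the mindi code above is: build a list of files, add the target of each symlink, de-duplicate, and copy everything through a tar pipe so that full paths are recreated under the destination. A minimal sketch of that pattern, assuming a tar that reads member names from stdin with -T - (GNU tar and bsdtar do) and using readlink -f as a stand-in for mindi's ReadAllLink, which may behave differently for chains of links; the helper name is hypothetical:

```shell
#!/bin/bash
# Hypothetical sketch of the mindi file-list pattern: copy the given absolute
# paths, plus the resolved target of each symlink, under $dest.
# Usage: copy_with_links DEST FILE...
copy_with_links() {
    local dest=$1
    shift
    local list f
    list=$(mktemp) || return 1
    for f in "$@"; do
        echo "$f" >> "$list"
        # Also ship what a symlink points to, or the copy would dangle.
        # (readlink -f stands in for mindi's ReadAllLink.)
        if [ -L "$f" ]; then
            readlink -f "$f" >> "$list"
        fi
    done
    # tar strips the leading "/" ("Initial / are truncated by tar"), so the
    # members are relative and extract under $dest with their full paths.
    sort -u "$list" | tar cf - -T - 2> /dev/null | (cd "$dest" && tar xf -)
    local rc=$?
    rm -f "$list"
    return "$rc"
}
```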

comment:13 Changed 13 months ago by vicgat

I think the better solution is to make /bin/sh, at boot, a soft link to /bin/bash instead of a soft link to the busybox binary.

comment:14 Changed 13 months ago by bruno

That's indeed a solution. But I'd like to know why it is needed in 5.7 when it was not in 5.2, for example. I think the cause of the problem is that /sbin/MAKEDEV is now included, whereas in 2.1.0 it wasn't. I haven't had time to look at its content to be sure.

When it is present, it is called to create some devices, which seems to make the subsequent cp fail. This is not the case when that step in start_udev is simply skipped.

It still needs some digging so that we can document why this is happening. But, like you, I'm tempted to systematically use bash as the main shell, because as we use more and more distribution scripts, we will probably hit this type of issue again in the future.

comment:15 Changed 13 months ago by bruno

  • Status changed from assigned to closed
  • Resolution set to fixed

As bash may be used anyway and is now part of minimal.conf by default, if it's not removed it will be used as the default shell. This should fix the issue with rev [3000] (interesting to see that bug #600 is fixed by rev [3000]! Numbers are magic ;-)
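The idea behind the fix can be illustrated on a rescue root filesystem: point bin/sh at bash when bash is present, and at busybox otherwise. This is a sketch of the idea only, not the code of rev [3000]; the helper name and the layout (bin/bash and bin/busybox under the root) are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch: choose the default shell of a rescue root filesystem.
# Usage: set_default_shell ROOTFS_DIR
set_default_shell() {
    local rootfs=$1
    if [ -x "$rootfs/bin/bash" ]; then
        # bash is available: scripts such as start_udev get pushd/popd.
        ln -sf bash "$rootfs/bin/sh"
    else
        # Fall back to the busybox shell.
        ln -sf busybox "$rootfs/bin/sh"
    fi
}
```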

Last edited 8 months ago by bruno