Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#600 closed defect (fixed)

At boot: occur "no space left on device" errors

Reported by: vicgat Owned by: bruno
Priority: highest Milestone: 3.0.2
Component: mindi Version: 3.0.1
Severity: blocker Keywords:
Cc:

Description

With a RHEL/CentOS 5 mondo backup, the mindi boot fails with many "no space left on device" errors.

I cannot reproduce the problem with RHEL 6.

Five users reported this problem with mindi 2.2.1 and mondo 3.0.1 on the mondo-devel mailing list in March 2012.

The errors are in fact generated by the RHEL 5 /sbin/start_udev shell script, which is started at boot by the mindi rcS script.

If start_udev is replaced by the RHEL 6 version, the "no space left on device" error messages disappear.

However, replacing it with the RHEL 6 version is not a solution, because RHEL 5 lacks some items that this version needs.

For more details, see the discussion of 27 March 2012 in the mailing-list archive, with the subject: [Mondo-devel] mondorescue: no space left on device

I am attaching to this ticket two screenshots taken by a user.

Attachments (11)

mindi-1.png (26.2 KB) - added by vicgat 7 years ago.
On mindi-1.png the beginning of the problem, see mindi-2.png for the rest
mindi-2.png (25.9 KB) - added by vicgat 7 years ago.
the rest of error messages, "no space left on device"
start_udev-RHEL-5.7 (4.4 KB) - added by vicgat 7 years ago.
RHEL 5 start_udev (not modified)
start_udev-RHEL-5.7-modified (4.5 KB) - added by vicgat 7 years ago.
RHEL 5 start_udev (that I modified) working well with mindi 2.1.1
start_udev-RHEL-6.1 (8.2 KB) - added by vicgat 7 years ago.
RHEL 6 start_udev (just for info), working well with mindi 2.1.1
debux-xv make_extra_nodes.doc (30.0 KB) - added by vicgat 7 years ago.
Alan start_udev "set -xv" results
testfordevdir.sh (248 bytes) - added by vicgat 7 years ago.
testfordevdir-result.txt (608 bytes) - added by vicgat 7 years ago.
test2.sh (288 bytes) - added by vicgat 7 years ago.
test2-result.txt (1.0 KB) - added by vicgat 7 years ago.
start_udev-RHEL-5.7-added-pushd (4.9 KB) - added by vicgat 7 years ago.
start_udev script modified (I added a "pushd /", and I added a popd at the end of the for loop)


Change History (27)

Changed 7 years ago by vicgat

On mindi-1.png the beginning of the problem, see mindi-2.png for the rest

Changed 7 years ago by vicgat

the rest of error messages, "no space left on device"

comment:1 Changed 7 years ago by vicgat

  • Summary changed from At boot: occur "no space left of device" errors to At boot: occur "no space left on device" errors

comment:2 Changed 7 years ago by vicgat

The error is "no space left on device", and not "no space left of device".

comment:3 Changed 7 years ago by vicgat

Today a user reported on the mailing list that, when he downgraded from mindi-2.1.1 to mindi-2.1.0 on RHEL 5.6 and RHEL 5.8, the "cp write error no space left on device" messages no longer appeared.

comment:4 Changed 7 years ago by bruno

  • Priority changed from normal to high
  • Severity changed from major to blocker
  • Status changed from new to assigned

comment:5 Changed 7 years ago by vicgat

I modified the tmpfs mount section of the RHEL 5 start_udev script so that it now matches the tmpfs mount section of the RHEL 6 start_udev.

I attach the modified start_udev here. A user tested it on RHEL 5 and it worked well: mindi.iso now boots fine with the mindi-2.1.1 package.

Diff between the two /sbin/start_udev scripts:

# diff /sbin/start_udev /sbin/start_udev_ori
136c136
< LANG=C awk "\$2 == \"${udev_root%/}\" && ( \$3 == \"devtmpfs\" || \$3 == \"tmpfs\" ) { exit 1 }" /proc/mounts && {
---
> LANG=C awk "\$2 == \"${udev_root%/}\" && \$3 == \"tmpfs\" { exit 1 }" /proc/mounts && {
145,147c145
<       # First try to mount a devtmpfs on $udev_root
<       mount -n -o mode=0755 -t devtmpfs none "$udev_root" 2>/dev/null \
<       || mount -n -o mode=0755 -t tmpfs none "$udev_root"
---
>       mount -n -o mode=0755 -t tmpfs none "$udev_root"
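The check that this diff changes can be tried outside of a boot environment. The following is only a minimal sketch, not the start_udev code: the function name dev_is_mounted and its first parameter are hypothetical, added so that a sample file can stand in for /proc/mounts. Note that it returns 0 when a mount is found, which is the opposite of the "exit 1" convention used in the script.

```shell
# dev_is_mounted MOUNTS_FILE UDEV_ROOT   (hypothetical helper, not part of start_udev)
# Returns 0 when UDEV_ROOT already has a tmpfs or a devtmpfs mounted on it.
# MOUNTS_FILE is normally /proc/mounts; it is a parameter so a sample file can be used.
dev_is_mounted() {
    LANG=C awk -v root="${2%/}" '
        # field 2 is the mount point, field 3 the filesystem type
        $2 == root && ($3 == "devtmpfs" || $3 == "tmpfs") { found = 1; exit }
        END { exit !found }
    ' "$1"
}
```

On a running system it could be called as: dev_is_mounted /proc/mounts /dev && echo "already mounted"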

Changed 7 years ago by vicgat

RHEL 5 start_udev (not modified)

Changed 7 years ago by vicgat

RHEL 5 start_udev (that I modified) working well with mindi 2.1.1

Changed 7 years ago by vicgat

RHEL 6 start_udev (just for info), working well with mindi 2.1.1

comment:6 Changed 7 years ago by vicgat

Two users said that they still get "cp write error no space left on device" with mindi 2.1.1 and the RHEL 5 start_udev that I modified.

I didn't try it on RHEL 5 myself, because I have already upgraded to RHEL 6.

comment:7 Changed 7 years ago by bruno

The major difference I see between 2.1.0 and 2.1.1 for mindi is that more files are copied onto the boot media. So we may now fill the ramdrive, whereas before we didn't.

It could be worth changing the ramdrive_size at boot time, or the EXTRA_SPACE variable in mindi, to see if that improves things.
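To see whether a filesystem really is full before and after such a change, the free space can be read from df. This is a small sketch under assumptions: free_kb is a hypothetical helper name, and it assumes the filesystem name printed by df contains no spaces.

```shell
# free_kb PATH   (hypothetical helper)
# Print the available space, in 1 KiB blocks, of the filesystem holding PATH.
# -P asks df for the POSIX one-line-per-filesystem layout, -k for 1 KiB units.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

At the boot prompt it could be run as "free_kb /" and "free_kb /dev" to compare the ramdrive with the tmpfs mounted by start_udev.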

comment:8 Changed 7 years ago by vicgat

I recommended that Alan add "set -xv" to the start_udev script.

Thanks for the -xv results, it sheds some light on the bug.

I see in the "debux-xv make_extra_nodes.doc" file:

+ pushd /lib/udev/devices
+ set README THIS-IS-A-RAMDISK ataraid.tgz bin cciss.tgz dev dev.static dm.tgz etc i20.tgz ida,tgz init lib lib64 linuxrc lost+found mnt nst.tgz proc raw.tgz rd.tgz root sbin symlinks.tgz sys tmp usr var vc.tgz
+ [ read != * ]

That is why the start_udev script tries to copy all of that (README, etc.) to /dev through the "cp -ar "$@" $udev_root/" command, and why users then get the "cp: write error: no space left on device" messages.
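The failure mode is a directory change that fails silently, followed by a glob that expands in whatever directory the shell is still in. A guarded variant could look like the sketch below. It is not the start_udev code: list_device_templates is a hypothetical name, and it uses cd in a subshell rather than pushd so that it also runs in shells without pushd.

```shell
# list_device_templates DIR   (hypothetical helper)
# Print the entries of DIR, one per line. Print nothing if DIR cannot be
# entered or is empty, instead of listing the current directory by mistake.
list_device_templates() {
    (
        cd "$1" 2>/dev/null || exit 0   # stop here rather than glob in the wrong place
        set -- *
        [ "$1" != "*" ] || exit 0       # empty directory: the glob stayed literal
        printf '%s\n' "$@"
    )
}
```

Because the work happens in a subshell, the caller's working directory is left unchanged whether or not the cd succeeds.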

Normally, as Alan's /lib/udev/devices directory was empty, he should have got:

 + pushd /etc/udev/devices
 /etc/udev/devices ~/test5
 + set '*'
 + '[' '*' '!=' '*' ']'

Here I get "~/test5" too because I ran a test shell script from the ~/test5 directory.

In Alan's case, it seems that the "pushd /lib/udev/devices" was not successful, so /lib/udev/devices was not added to the list of currently remembered directories.

This is strange, because Alan's "ls -al /lib/udev/devices" shows that the directory exists, so the "pushd /lib/udev/devices" should succeed.

If the start_udev script contained:

pushd $devdir

instead of:

pushd $devdir &> "$udev_root/null"

then maybe we could see:

+ pushd /etc/udev/devices

..... pushd: dir: No such file or directory

You'll find attached my tests on RHEL 4.

In testfordevdir-result.txt you'll see what you should get:

 + pushd /etc/udev/devices
 /etc/udev/devices ~/test5
 + set '*'
 + '[' '*' '!=' '*' ']'

In test2-result.txt you'll see that:

  • I used "dir" instead of /etc/udev/devices
  • and I added a "pushd /"

so I got:

+ pushd /
/ ~/test5
+ pushd dir
./test2.sh: line 8: pushd: dir: No such file or directory
+ set audit bin boot dev dir1 etc home initrd lib lost+found media misc mnt opt proc root sbin selinux srv sys test2.sh test4popd.sh testhrea.410 tftpboot tmp usr var
+ '[' audit '!=' '*' ']'

Which is not a bug, because on my server I have no "dir" directory under /. But it looks similar to the bug.

I asked Alan to check whether he is sure that, after the line:

+ pushd /lib/udev/devices

there was no other line before this one:

+ set README THIS-IS-A-RAMDISK .......

If there was nothing, maybe the list of currently remembered directories is empty...

In that case, a "pushd /" could be added to the start_udev script just before the line:

pushd $devdir &> "$udev_root/null"

You'll find attached the start_udev script modified that way (I added a "pushd /", and I added a popd at the end of the for loop).

Changed 7 years ago by vicgat

Alan start_udev "set -xv" results

Changed 7 years ago by vicgat

start_udev script modified (I added a "pushd /", and I added a popd at the end of the for loop)

comment:9 Changed 7 years ago by bruno

  • Priority changed from high to highest

On the second image provided, there is a mention of recursion in cp. Maybe that's an area we need to explore more. A recursive link could be a problem here.

comment:10 Changed 7 years ago by vicgat

I suggested a shell problem, because pushd didn't work in the original RHEL 5 start_udev script.

Moreover, in RHEL 6 "#!/bin/sh" is replaced by "#!/bin/bash" in the start_udev script.

So, in the original RHEL 5 start_udev script, Alan replaced "#!/bin/sh" by "#!/bin/bash".

Then everything worked fine; he successfully created a new archive DVD and restored the server.

comment:11 Changed 7 years ago by vicgat

I found the problem:

  • in RHEL 5, 6, SLES 10, etc, /bin/sh is a soft link to /bin/bash, so no problem.
  • in mondo boot, /bin/sh is a soft link to busybox, so it calls the tiny shell embedded in busybox.

And busybox sh has no built-in pushd (nor popd): if I start /bin/sh under busybox and type "pushd /", I get "pushd: not found"; the same goes for popd.

If I start /bin/bash instead and type "pushd /", it works; popd works too.

That is why I didn't get the errors with the RHEL 6 start_udev, which uses the /bin/bash shell instead of the /bin/sh shell (used by the RHEL 5 start_udev).
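A script can test for the builtin before relying on it, or avoid pushd and popd altogether. The sketch below shows both ideas; has_command and in_dir are hypothetical helper names, and neither comes from start_udev or mindi.

```shell
# has_command NAME   (hypothetical helper)
# Return 0 if the running shell knows NAME as a builtin, function or program.
has_command() {
    command -v "$1" >/dev/null 2>&1
}

# in_dir DIR CMD [ARGS...]   (hypothetical helper)
# Run CMD inside DIR without pushd/popd. The subshell keeps the caller's
# working directory untouched, and a failed cd means CMD is never run.
in_dir() {
    ( cd "$1" 2>/dev/null && shift && "$@" )
}
```

For example, "has_command pushd || echo 'no pushd in this shell'" should report the missing builtin in a shell like the busybox sh described above, while "in_dir /lib/udev/devices ls" runs ls only if the directory could be entered.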

comment:12 Changed 7 years ago by bruno

Some good feedback provided by Stefan Heijmans:

I noticed that in mindi 2.1.0 /sbin/MAKEDEV is not there and in mindi 2.1.1 it is. /sbin/MAKEDEV is also used in /sbin/start_udev -> line 180 -> make_extra_nodes

This made me wonder why this happens, so I did a diff on the mindi script between 2.1.0 and 2.1.1. It shows this (first part is 2.1.0 and second part is 2.1.1):

2488c2477,2493
<
---
>
>       # Handle the case where busybox and mount are dynamically linked
>       file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
>       if [ $? -eq 0 ]; then
>               # We want to use the real mount and all the supported variants (nfs, cifs, ...)
>               rm -f bin/mount $MINDI_TMP/busy.lis
>               mountlis=`grep -E "mount|fuse|ssh" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2 | sort -u`
>               LocateDeps $MINDI_LIB/rootfs/bin/busybox $mountlis >> $MINDI_TMP/busy.lis
>               # Special for libs
>               for f in `grep -E "libnss" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2`; do
>                       echo "`ReadAllLink $f`" >> $MINDI_TMP/busy.lis
>               done
>               # Initial / are trucated by tar
>               tar cf - $mountlis `sort -u $MINDI_TMP/busy.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in mount analysis" $MINDI_TMP/$$.log
>               rm -f $MINDI_TMP/busy.lis
>       fi
>
2500,2521c2505
<       # Handle the case where busybox and mount are dynamically linked
<       file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
<       if [ $? -eq 0 ]; then
<               # We want to use the real mount and all the supported variants (nfs, cifs, ...)
<               rm -f bin/mount
<       fi
<
<       # Copy of files from the minimal env needed as per the deplist.d/minimal.conf file (which includes all busybox deps)
<       minimallis=`grep -Ev '^#' $DEPLIST_DIR/minimal.conf`
<       rm -f $MINDI_TMP/minimal.lis
<       for f in $MINDI_LIB/rootfs/bin/busybox $minimallis; do
<               echo $f >> $MINDI_TMP/minimal.lis
<       done
<       LocateDeps $MINDI_LIB/rootfs/bin/busybox $minimallis >> $MINDI_TMP/minimal.lis
<       for f in `cat $MINDI_TMP/minimal.lis`; do
<               echo "`ReadAllLink $f`" >> $MINDI_TMP/minimal.lis
<       done
<       # Initial / are trucated by tar
<       tar cf - `sort -u $MINDI_TMP/minimal.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in minimal analysis" $MINDI_TMP/$$.log
<       rm -f $MINDI_TMP/minimal.lis
<
<       # Avoids an issue on some distro (RHEL5)
---

In mindi 2.1.1 the whole $DEPLIST_DIR/minimal.conf is processed, whereas mindi 2.1.0 handled only the binaries matching "mount|fuse|ssh". So I put that behaviour back into mindi 2.1.1, like this:

        # Handle the case where busybox and mount are dynamically linked
        file $MINDI_LIB/rootfs/bin/busybox 2>&1 | grep -q "dynamically"
        if [ $? -eq 0 ]; then
                # We want to use the real mount and all the supported variants (nfs, cifs, ...)
                rm -f bin/mount
        fi
        # Copy of files from the minimal env needed as per the deplist.d/minimal.conf file (which includes all busybox deps)
        minimallis=`grep -Ev '^#' $DEPLIST_DIR/minimal.conf`
        mountlis=`grep -E "mount|fuse|ssh" $DEPLIST_FILE $DEPLIST_DIR/* | grep -v " *#.*" | cut -d: -f2 | sort -u`    # <== extra line
        rm -f $MINDI_TMP/minimal.lis
        for f in $MINDI_LIB/rootfs/bin/busybox $mountlis; do    # <== edited line
                echo $f >> $MINDI_TMP/minimal.lis
        done
        LocateDeps $MINDI_LIB/rootfs/bin/busybox $mountlis >> $MINDI_TMP/minimal.lis    # <== edited line
        for f in `cat $MINDI_TMP/minimal.lis`; do
                echo "`ReadAllLink $f`" >> $MINDI_TMP/minimal.lis
        done
        # Initial / are trucated by tar
        tar cf - `sort -u $MINDI_TMP/minimal.lis` 2>> $MINDI_TMP/$$.log | tar xf - || LogIt "Problem in minimal analysis" $MINDI_TMP/$$.log
        rm -f $MINDI_TMP/minimal.lis

I created the mindi iso and it booted fine into the prompt.
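The mountlis pipeline above can be tried on its own against sample files. This is only a sketch: select_deps is a hypothetical name, the directory argument stands in for $DEPLIST_DIR, and /dev/null is added to the file list so that grep always prefixes each match with "file:", which the cut step relies on. The pattern " *#.*" matches any line containing a "#", so a plain '#' is used here.

```shell
# select_deps DIR PATTERN   (hypothetical helper, modelled on the mountlis line)
# Print the sorted, de-duplicated entries from the files in DIR that match
# the extended regex PATTERN, skipping any line that contains a '#'.
# Assumes that the path of DIR contains neither ':' nor '#'.
select_deps() {
    grep -E "$2" "$1"/* /dev/null | grep -v '#' | cut -d: -f2 | LC_ALL=C sort -u
}
```

It could be called as: select_deps /path/to/deplist.d 'mount|fuse|ssh' (the path here is a placeholder).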

comment:13 Changed 6 years ago by vicgat

I think the better solution is to have /bin/sh be a soft link to /bin/bash at boot, instead of a soft link to the busybox binary.
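As an illustration of that idea only, and not of how mindi implements it, the link could be inspected and switched like this. Both function names are hypothetical, and ROOT is assumed to be the top of the boot filesystem tree being prepared.

```shell
# sh_target ROOT   (hypothetical helper)
# Print what ROOT/bin/sh points to; prints nothing if it is not a symlink.
sh_target() {
    readlink "$1/bin/sh"
}

# point_sh_at_bash ROOT   (hypothetical helper)
# Make ROOT/bin/sh a relative symlink to bash, but only when ROOT/bin/bash
# exists and is executable, so the tree is never left without a shell.
point_sh_at_bash() {
    [ -x "$1/bin/bash" ] || return 1
    ln -sf bash "$1/bin/sh"
}
```

Refusing to relink when bash is absent matters here, because a boot image whose /bin/sh points to a missing file would be worse than one using the busybox shell.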

comment:14 Changed 6 years ago by bruno

That's indeed a solution. But I'd like to know why this is the right solution in 5.7 when it was not in 5.2, for example. I think the cause of the problem is that /sbin/MAKEDEV is now included, whereas in 2.1.0 it wasn't. I haven't had time to look at its content to be sure.

When it is present, it is called to create some devices, which seems to make the subsequent cp fail. That is not the case when we just skip that step in start_udev.

It still needs some digging so that we can document why this is happening. But like you, I'm tempted to systematically use bash as the main shell, because as we use more and more distribution scripts, we will probably hit this type of issue again in the future.

comment:15 Changed 6 years ago by bruno

  • Resolution set to fixed
  • Status changed from assigned to closed

As bash may be used anyway, and is now part of minimal.conf by default, if it's not removed, it will be used as the default shell. This should fix the issue with rev [3000] (interesting to see that bug #600 is fixed by rev [3000]! Numbers are magic ;-)

Last edited 6 years ago by bruno