Opened 19 years ago
Closed 17 years ago
#82 closed defect (worksforme)
Archiving to tape leaves out most files (biggiefiles are ok)
| Reported by: | andree | Owned by: | andree |
|---|---|---|---|
| Priority: | normal | Milestone: | 2.2.7 |
| Component: | mondo | Version: | 2.0.9 |
| Severity: | normal | Keywords: | |
| Cc: |
Description
The underlying reason is that mondoarchive calls function make_afioballs_and_images_OLD() rather than make_afioballs_and_images() for tape archives. (This happens in function make_those_afios_phase().) At least this explains why the behaviour is different.
Changing the code so it calls make_afioballs_and_images() works fine until towards the end of archiving the normal files where it segfaults. Stupidly, gdb segfaults as well when this happens and I can't even get a backtrace. :-(
The screenout looks like this:
---progress-form---4--- TASK: [********************] 97% done; 0:32 to go
---progress-form---1--- I am backing up your live filesystem now.
---progress-form---2--- Volume 1: 1,181,312 kilobytes archived so far
---progress-form---3--- Please wait. This may take a couple of hours.
---progress-form---E---
---progress-form---4--- TASK: [********************] 98% done; 0:21 to go
---progress-form---1--- I am backing up your live filesystem now.
---progress-form---2--- Volume 1: 1,181,312 kilobytes archived so far
---progress-form---3--- Please wait. This may take a couple of hours.
---progress-form---E---
---progress-form---4--- TASK: [********************] 98% done; 0:21 to go
---progress-form---1--- I am backing up your live filesystem now.
---progress-form---2--- Volume 1: 1,190,016 kilobytes archived so far
---progress-form---3--- Please wait. This may take a couple of hours.
---progress-form---E---
---progress-form---4--- TASK: [********************] 99% done; 0:10 to go
(Unknown) signal received from OS
Fatal error... Mondoarchive is terminating in response to a signal from the OS
Fatal error... Execution run aborted (main loop)
Execution run ended; result=1
Type 'less /var/log/mondo-archive.log' to see the output log
The final rows of mondo-archive.log are this:
[Main] libmondo-archive.c->archive_this_fileset#527: command='afio -o -b 131072 -M 16m -Z -P bzip2 -G 3 -T 3k -E /usr/share/mondo/do-not-compress-these /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/117.afio.bz2 < /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.117 2>> /var/log/mondo-archive.log'
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
[Main] libmondo-archive.c->make_afioballs_and_images#1683: Storing set 116
[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.116 exists
[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 exists
running: cp -f /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/backcatalog > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
--------------------------------end of output------------------------------
...ran just fine. :-)
[Main] libmondo-archive.c->make_afioballs_and_images#1699: libmondo-archive.c, make_afioballs_and_images, 1699: Writing EXAT files
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXTENDED_ATTRIBUTES (fname=xattr_list.116.gz, size=0 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXAT_FILE (fname=xattr_list.116.gz, size=0 K)
[Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file 'xattr_list.116.gz' to tape (0 KB)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=xattr_list.116.gz, size=0 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=728eeead8447, size=0 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXAT_FILE (fname=xattr_list.116.gz, size=-1 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXAT_FILE (fname=acl_list.116.gz, size=0 K)
[Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file 'acl_list.116.gz' to tape (0 KB)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=acl_list.116.gz, size=0 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=76103a4c9048, size=0 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXAT_FILE (fname=acl_list.116.gz, size=-1 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXTENDED_ATTRIBUTES (fname=xattr_list.116.gz, size=0 K)
[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 exists
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_AN_AFIO_OR_SLICE (fname=116.afio.bz2, size=13184 K)
[Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file '116.afio.bz2' to tape (13184 KB)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=116.afio.bz2, size=13184 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=0d7e3982cbca, size=0 K)
[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_AN_AFIO_OR_SLICE (fname=finished-writing-file, size=0 K)
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
[Main] libmondo-archive.c->create_afio_files_in_background#1222: [8973:0] - archived 117 OK
[Main] libmondo-files.c->does_file_exist#258: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.118 does not exist
[Main] libmondo-archive.c->create_afio_files_in_background#1233: [8973:0] - exiting
(Unknown) signal received from OS
[Main] libmondo-fifo.c->kill_buffer#247: kill_buffer() --- command = ps auxww | grep -F "buffer -m 32m -p75 -B -s32768 -o /dev/ht0 2>> /var/log/mondo-archive.log" | grep -Fv grep | awk '{print $2;}' | grep -v PID | tr -s '
' ' ' | awk '{ print $1; }'
[Main] libmondo-fifo.c->kill_buffer#250: kill_buffer() --- command = kill 2800
running: kill 2800 > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
--------------------------------end of output------------------------------
...ran just fine. :-)
[Main] newt-specific.c->fatal_error#377: Fatal error received - 'Mondoarchive is terminating in response to a signal from the OS'
[Main] newt-specific.c->fatal_error#395: OK, I think I'm the main PID.
[Main] newt-specific.c->fatal_error#403: I'm going to do some cleaning up now.
[Main] newt-specific.c->fatal_error#404: killall mindi 2> /dev/null
running: kill `ps auxww | grep " /mondo/do-not " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
--------------------------------end of output------------------------------
...ran with res=256
running: kill `ps auxww | grep " tmp.mondo " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
--------------------------------end of output------------------------------
...ran with res=256
running: kill `ps auxww | grep " ntfsclone " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
--------------------------------end of output------------------------------
...ran with res=256
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
[Main] newt-specific.c->fatal_error#377: Fatal error received - 'Execution run aborted (main loop)'
[Main] newt-specific.c->fatal_error#395: OK, I think I'm the main PID.
[Main] newt-specific.c->fatal_error#397: ...I'm already exiting. Give me time, Julian!
[Main] libmondo-files.c->register_pid#812: Unregistering PID
running: umount /mnt/cdrom > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
umount: /mnt/cdrom: not mounted
--------------------------------end of output------------------------------
...ran with res=256
running: rm -Rf /mondo.scratch.* /tmp.mondo.* > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
--------------------------------end of output------------------------------
...ran just fine. :-)
running: rm -Rf /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027 /home/mondorescue/scratch/mondo.scratch.7496/mondo.scratch.11697 > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
rm: cannot remove directory `/home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs': Device or resource busy
--------------------------------end of output------------------------------
...ran with res=256
[Main] libmondo-tools.c->do_libmondo_global_strings_thing#1586: libmondo-tools.c, do_libmondo_global_strings_thing, 1586: Freeing globals
All not very informative I'm afraid.
We can either try to figure out what the differences in behaviour between make_afioballs_and_images_OLD() and make_afioballs_and_images() are and fix them, or we can try to fix the segmentation fault above.
If you have an idea, Bruno, please let me know. Otherwise I'll keep plodding along...
(As a side note, the kill command fails because the awk call should print $2, not $1; either that or ps_options needs to be different.)
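To illustrate the side note: in `ps auxww` output the first field is the user name and the second is the PID, which is why `kill` is handed `root` in the log above. The following is only a minimal sketch in C of picking the right field; `pid_from_ps_line()` is a hypothetical helper written for this ticket, not a function in mondo.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of mondo): return the PID found in one line
 * of `ps auxww` output, or -1 if the second field is not a number.
 * Field 1 is the user name, field 2 is the PID. */
long pid_from_ps_line(const char *line)
{
    long pid = -1;

    assert(line != NULL);
    /* %*s skips the USER column without storing it; %ld reads the PID. */
    if (sscanf(line, "%*s %ld", &pid) != 1) {
        return -1;
    }
    return pid;
}
```

The header line of `ps` (whose second field is the text `PID`) and empty lines both yield -1, so no separate filtering of the header would be needed in this sketch.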
Change History (14)
comment:1 by , 19 years ago
comment:2 by , 19 years ago
> The underlying reason is that mondoarchive calls function
> make_afioballs_and_images_OLD() rather than make_afioballs_and_images()
> for tape archives. (This happens in function make_those_afios_phase().) At
> least this explains why there is different behaviour.
The difference, as documented in the code, is:
use the singlethreaded make_afioballs_and_images_OLD()
instead of the multithreaded make_afioballs_and_images()
if backing up to tape
(It's nearly the first time I find a useful comment in mondo ;-)
{{{
> [Main] libmondo-files.c->does_file_exist#258:
> /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.118 does not
> exist
> [Main] libmondo-archive.c->create_afio_files_in_background#1233:
> [8973:0] - exiting
> (Unknown) signal received from OS
}}}
Isn't there a link between the fact that the last filelist doesn't exist
(which should always be the case), that we are writing to tape, and that
we are multi-threaded?
{{{
> running: kill `ps auxww | grep " /mondo/do-not " | awk '{print $1;}' |
> grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2>
> /tmp/mondo-run-prog-thing.err
> --------------------------------start of output-----------------------------
> sh: line 0: kill: root: arguments must be process or job IDs
}}}
> We can either try to figure out what the differences in behaviour between
> make_afioballs_and_images_OLD() and make_afioballs_and_images() are
> and fix this. Or we can try to fix above segmentation fault.
I'm all for removing a function if we can obsolete it.
> If you have an idea, Bruno, please let me know. Otherwise I'll keep
> plodding along...
Not offhand; I need to have a closer look.
> (As a side note, the kill command fails because it should be $2 in the awk
> call not $1- that or ps_options needs be different.)
ps_options was introduced recently and should be OK.
Clearly for me $1 is wrong for a ps result. It's always $2.
Bruno.
comment:3 by , 19 years ago
| Milestone: | → 2.2.1 |
|---|---|
| Owner: | changed from to |
comment:4 by , 19 years ago
| Status: | new → assigned |
|---|
Some more information:
- only happens on i386, amd64 is not affected
- does not seem to depend on kernel version - bug occurs both with 2.6.18 and 2.6.16
The reason for make_afioballs_and_images_OLD() skipping the normal files is this:
[Main] libmondo-archive.c->make_afioballs_and_images_OLD#2379: Archiving set 0
[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/filelist.0 exists
[Main] libmondo-files.c->does_file_exist#261: /usr/share/mondo/do-not-compress-these exists
[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908 exists
[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/scratch/mondo.scratch.23479/mondo.scratch.18669 exists
[Main] libmondo-archive.c->archive_this_fileset#527: command='afio -o -b 131072 -M 16m -Z -P bzip2 -G 3 -T 3k -E /usr/share/mondo/do-not-compress-these /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 < /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/filelist.0 2>> /var/log/mondo-archive.log'
[Main] libmondo-archive.c->archive_this_fileset#552: min(free_ramdisk_space) is now 335
running: cp -f /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 /tmpfs/backcatalog > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
cp: cannot create regular file `/tmpfs/backcatalog': No such file or directory
--------------------------------end of output------------------------------
...ran with res=256
[Main] libmondo-stream.c->maintain_collection_of_recent_archives#604: Warning - failed to copy /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 to backcatalog at /tmpfs/backcatalog
The backcatalog directory cannot be created because variable td of maintain_collection_of_recent_archives() is blank, so the target becomes /tmpfs/backcatalog instead of a path under the temporary directory.
Adding a log_it() debug line to output bkpinfo->tmpdir before the call to maintain_collection_of_recent_archives() in make_afioballs_and_images_OLD() makes this go away but then make_afioballs_and_images_OLD() loops forever doing fileset 1.
This looks like a memory allocation bug (what else is new).
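A blank td turns the expected `<tmpdir>/tmpfs/backcatalog` into `/tmpfs/backcatalog`, which is what the failing cp shows. As a rough sketch only, a guard like the one below would at least make the failure explicit instead of silently copying to a bogus path. `build_backcatalog_path()` is a hypothetical helper and does not exist in mondo; it would not fix the underlying corruption, only surface it earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of mondo): write "<tmpdir>/tmpfs/backcatalog"
 * into out. Returns 0 on success, -1 if tmpdir is NULL or empty, or if out
 * is too small to hold the result. On failure out is left as "". */
int build_backcatalog_path(const char *tmpdir, char *out, size_t outlen)
{
    int n;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    /* A blank tmpdir is exactly the symptom seen in the log: refuse it. */
    if (tmpdir == NULL || tmpdir[0] == '\0') {
        return -1;
    }

    n = snprintf(out, outlen, "%s/tmpfs/backcatalog", tmpdir);
    if (n < 0 || (size_t) n >= outlen) {
        out[0] = '\0';          /* truncated or failed: do not use it */
        return -1;
    }
    return 0;
}
```

A caller would check the return value and log an error before attempting the cp, rather than letting the copy fail with "No such file or directory".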
comment:5 by , 19 years ago
I wrote:
The backcatalog directory can not be created because variable td of maintain_collection_of_recent_archives() is blank.
This looks like a compiler bug in gcc 4.1.2. When I compile with -O0 instead of -O2, td has the correct value.
comment:6 by , 19 years ago
Tested with gcc 3.4.6 with same result. Also, to clarify, gcc 4.1.2 in previous comment is actually "4.1.2 20061007 (prerelease)".
comment:7 by , 19 years ago
Have gone back now to trying with make_afioballs_and_images(), i.e.:
--- mondo-2.20.orig/mondo/common/libmondo-archive.c
+++ mondo-2.20/mondo/common/libmondo-archive.c
@@ -2468,7 +2468,7 @@
"Using single-threaded make_afioballs_and_images() to suit b0rken FreeBSD 5.0");
res = make_afioballs_and_images_OLD(bkpinfo);
#else
- res = make_afioballs_and_images_OLD(bkpinfo);
+ res = make_afioballs_and_images(bkpinfo);
#endif
write_header_block_to_stream((off_t)0, "stop-afioballs",
BLK_STOP_AFIOBALLS);
I have also turned off signal handling, to avoid gdb crashing, like this:
--- mondo-2.20.orig/mondo/mondoarchive/main.c
+++ mondo-2.20/mondo/mondoarchive/main.c
@@ -121,7 +121,7 @@
//static char cvsid[] = "$Id: main.c 814 2006-09-23 08:36:11Z bruno $";
/************************* external variables *************************/
-extern void set_signals(int);
+//extern void set_signals(int);
extern int g_current_media_number;
extern void register_pid(pid_t, char *);
extern int g_currentY;
@@ -315,7 +315,7 @@
log_msg(9, "This");
register_pid(g_main_pid, "mondo");
- set_signals(TRUE); // catch SIGTERM, etc.
+ //set_signals(TRUE); // catch SIGTERM, etc.
run_program_and_log_output("dmesg -n1", TRUE);
log_msg(9, "Next");
@@ -538,7 +538,7 @@
}
distro_specific_kludges_at_end_of_mondoarchive();
register_pid(0, "mondo");
- set_signals(FALSE);
+ //set_signals(FALSE);
chdir("/tmp"); // just in case there's something wrong with g_erase_tmpdir_and_scratchdir
system(g_erase_tmpdir_and_scratchdir);
free_libmondo_global_strings();
Result is that I am getting pretty much the same behaviour as described above now for make_afioballs_and_images_OLD(). Now it is a double-free error for storing_filelist_fname (line 1748 in libmondo-archive.c). Compiling with -O0 instead of -O2 makes the problem go away. I suspect something fishy going on with move_files_to_cd().
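For reference, a common defensive pattern against a double free of a single variable such as storing_filelist_fname is to reset the pointer to NULL right after freeing it. The sketch below assumes nothing about mondo's actual code; `free_and_clear()` is a hypothetical name. It only protects against freeing the same variable twice, not against two different pointers to the same block, and since the failure here disappears with -O0 it may well hide the real cause rather than fix it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of mondo): free the buffer that *pp points
 * to and reset *pp to NULL, so a second call on the same variable is a
 * harmless no-op. */
void free_and_clear(char **pp)
{
    assert(pp != NULL);
    free(*pp);      /* free(NULL) is defined to do nothing */
    *pp = NULL;
}
```

Calling `free_and_clear(&ptr)` twice in a row is safe, whereas two plain `free(ptr)` calls are undefined behaviour.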
comment:8 by , 19 years ago
| Milestone: | 2.2.1 → 2.2.2 |
|---|
comment:9 by , 19 years ago
| Milestone: | 2.2.2 → 2.2.3 |
|---|
comment:10 by , 19 years ago
| Milestone: | 2.2.3 → 2.2.4 |
|---|
comment:11 by , 18 years ago
| Milestone: | 2.2.4 → 2.2.5 |
|---|
Is there any way valgrind could be used to diagnose where the problem is?
comment:12 by , 18 years ago
| Milestone: | 2.2.5 → 2.2.6 |
|---|
comment:13 by , 17 years ago
| Milestone: | 2.2.6 → 2.2.7 |
|---|
comment:14 by , 17 years ago
| Resolution: | → worksforme |
|---|---|
| Status: | assigned → closed |
I'm closing this bug as my own tests with tape on Debian have been successful. Feel free to reopen it if you think the problem is still there with the latest gcc, mondo 2.2.7, kernel and Debian 4/lenny.

I didn't mean to assign this to you, and it also happens in 2.2.0.
Cheers, Andree