Opened 18 years ago
Closed 16 years ago
#82 closed defect (worksforme)
Archiving to tape leaves out most files (biggiefiles are ok)
Reported by: | andree | Owned by: | andree |
---|---|---|---|
Priority: | normal | Milestone: | 2.2.7 |
Component: | mondo | Version: | 2.0.9 |
Severity: | normal | Keywords: | |
Cc: |
Description
The undelying reason os that mondoarchive calls function make_afioballs_and_images_OLD() rather than make_afioballs_and_images() for tape archives. (This happens in function make_those_afios_phase().) At least this explains why there is different behaviour.
Changing the code so it calls make_afioballs_and_images() works fine until towards the end of archiving the normal files where it segfaults. Stupidly, gdb segfaults as well when this happens and I can't even get a backtrace. :-(
The screenout looks like this:
---progress-form---4--- TASK: [********************] 97% done; 0:32 to go ---progress-form---1--- I am backing up your live filesystem now. ---progress-form---2--- Volume 1: 1,181,312 kilobytes archived so far ---progress-form---3--- Please wait. This may take a couple of hours. ---progress-form---E--- ---progress-form---4--- TASK: [********************] 98% done; 0:21 to go ---progress-form---1--- I am backing up your live filesystem now. ---progress-form---2--- Volume 1: 1,181,312 kilobytes archived so far ---progress-form---3--- Please wait. This may take a couple of hours. ---progress-form---E--- ---progress-form---4--- TASK: [********************] 98% done; 0:21 to go ---progress-form---1--- I am backing up your live filesystem now. ---progress-form---2--- Volume 1: 1,190,016 kilobytes archived so far ---progress-form---3--- Please wait. This may take a couple of hours. ---progress-form---E--- ---progress-form---4--- TASK: [********************] 99% done; 0:10 to go (Unknown) signal received from OS Fatal error... Mondoarchive is terminating in response to a signal from the OS Fatal error... Execution run aborted (main loop) Execution run ended; result=1 Type 'less /var/log/mondo-archive.log' to see the output log
The final rows of mondo-archive.log are this:
[Main] libmondo-archive.c->archive_this_fileset#527: command='afio -o -b 131072 -M 16m -Z -P bzip2 -G 3 -T 3k -E /usr/share/mondo/do-not-compress-these /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/117.afio.bz2 < /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.117 2>> /var/log/mondo-archive.log' [Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images() [Main] libmondo-archive.c->make_afioballs_and_images#1683: Storing set 116 [Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.116 exists [Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 exists running: cp -f /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/backcatalog > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- --------------------------------end of output------------------------------ ...ran just fine. :-) [Main] libmondo-archive.c->make_afioballs_and_images#1699: libmondo-archive.c, make_afioballs_and_images, 1699: Writing EXAT files [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXTENDED_ATTRIBUTES (fname=xattr_list.116.gz, size=0 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXAT_FILE (fname=xattr_list.116.gz, size=0 K) [Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file 'xattr_list.116.gz' to tape (0 KB) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=xattr_list.116.gz, size=0 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=728eeead8447, size=0 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXAT_FILE (fname=xattr_list.116.gz, size=-1 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXAT_FILE (fname=acl_list.116.gz, size=0 K) [Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file 'acl_list.116.gz' to tape (0 KB) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=acl_list.116.gz, size=0 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=76103a4c9048, size=0 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXAT_FILE (fname=acl_list.116.gz, size=-1 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXTENDED_ATTRIBUTES (fname=xattr_list.116.gz, size=0 K) [Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 exists [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_AN_AFIO_OR_SLICE (fname=116.afio.bz2, size=13184 K) [Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file '116.afio.bz2' to tape (13184 KB) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=116.afio.bz2, size=13184 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=0d7e3982cbca, size=0 K) [Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_AN_AFIO_OR_SLICE (fname=finished-writing-file, size=0 K) [Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images() [Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images() [Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images() [Main] libmondo-archive.c->create_afio_files_in_background#1222: [8973:0] - archived 117 OK [Main] libmondo-files.c->does_file_exist#258: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.118 does not exist [Main] libmondo-archive.c->create_afio_files_in_background#1233: [8973:0] - exiting (Unknown) signal received from OS [Main] libmondo-fifo.c->kill_buffer#247: kill_buffer() --- command = ps auxww | grep -F "buffer -m 32m -p75 -B -s32768 -o /dev/ht0 2>> /var/log/mondo-archive.log" | grep -Fv grep | awk '{print $2;}' | grep -v PID | tr -s ' ' ' ' | awk '{ print $1; }' [Main] libmondo-fifo.c->kill_buffer#250: kill_buffer() --- command = kill 2800 running: kill 2800 > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- --------------------------------end of output------------------------------ ...ran just fine. :-) [Main] newt-specific.c->fatal_error#377: Fatal error received - 'Mondoarchive is terminating in response to a signal from the OS' [Main] newt-specific.c->fatal_error#395: OK, I think I'm the main PID. [Main] newt-specific.c->fatal_error#403: I'm going to do some cleaning up now. [Main] newt-specific.c->fatal_error#404: killall mindi 2> /dev/null running: kill `ps auxww | grep " /mondo/do-not " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- sh: line 0: kill: root: arguments must be process or job IDs sh: line 0: kill: root: arguments must be process or job IDs --------------------------------end of output------------------------------ ...ran with res=256 running: kill `ps auxww | grep " tmp.mondo " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- sh: line 0: kill: root: arguments must be process or job IDs sh: line 0: kill: root: arguments must be process or job IDs sh: line 0: kill: root: arguments must be process or job IDs --------------------------------end of output------------------------------ ...ran with res=256 running: kill `ps auxww | grep " ntfsclone " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- sh: line 0: kill: root: arguments must be process or job IDs sh: line 0: kill: root: arguments must be process or job IDs sh: line 0: kill: root: arguments must be process or job IDs --------------------------------end of output------------------------------ ...ran with res=256 [Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images() [Main] newt-specific.c->fatal_error#377: Fatal error received - 'Execution run aborted (main loop)' [Main] newt-specific.c->fatal_error#395: OK, I think I'm the main PID. [Main] newt-specific.c->fatal_error#397: ...I'm already exiting. Give me time, Julian! [Main] libmondo-files.c->register_pid#812: Unregistering PID running: umount /mnt/cdrom > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- umount: /mnt/cdrom: not mounted --------------------------------end of output------------------------------ ...ran with res=256 running: rm -Rf /mondo.scratch.* /tmp.mondo.* > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- --------------------------------end of output------------------------------ ...ran just fine. :-) running: rm -Rf /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027 /home/mondorescue/scratch/mondo.scratch.7496/mondo.scratch.11697 > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- rm: cannot remove directory `/home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs': Device or resource busy --------------------------------end of output------------------------------ ...ran with res=256 [Main] libmondo-tools.c->do_libmondo_global_strings_thing#1586: libmondo-tools.c, do_libmondo_global_strings_thing, 1586: Freeing globals
All not very informative I'm afraid.
We can either try to figure out what the differences in behaviour between make_afioballs_and_images_OLD() and than make_afioballs_and_images() are and fix this. Or we can try to fix above segmentation fault.
If you have an idea, Bruno, please let me know. Otherwise I'll keep plodding along...
(As a side note, the kill command fails because it should be $2 in the awk call not $1- that or ps_options needs be different.)
Change History (14)
comment:1 by , 18 years ago
comment:2 by , 18 years ago
> The undelying reason os that mondoarchive calls function > make_afioballs_and_images_OLD() rather than make_afioballs_and_images() > for tape archives. (This happens in function make_those_afios_phase().) At > least this explains why there is different behaviour. The difference, as it is documented, is : use the singlethreaded make_afioballs_and_images_OLD() instead of the multithreaded make_afioballs_and_images() if backing up to tape (It's nearly the first time I find a useful comment in mondo ;-) {{{ > [Main] libmondo-files.c->does_file_exist#258: > /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.118 does not > exist > [Main] libmondo- > archive.c->create_afio_files_in_background#1233: > [8973:0] - exiting > (Unknown) signal received from OS }}} Isn't there any link between the fact that the latest doesn't exist (which should be always the case), that we are with a tape, and multi-threaded ? {{{ > running: kill `ps auxww | grep " /mondo/do-not " | awk '{print $1;}' | > grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog- > thing.err > --------------------------------start of > output----------------------------- > sh: line 0: kill: root: arguments must be process or job IDs }}} > We can either try to figure out what the differences in behaviour between > make_afioballs_and_images_OLD() and than make_afioballs_and_images() are > and fix this. Or we can try to fix above segmentation fault. I'm all for removing a function if we can obsolete it. > If you have an idea, Bruno, please let me know. Otherwise I'll keep > plodding along... Not like that, I need to have a closer look. > (As a side note, the kill command fails because it should be $2 in the awk > call not $1- that or ps_options needs be different.) ps_options was introduced recently and should be OK. Clearly for me $1 is wrong for a ps result. It's always $2. Bruno.
comment:3 by , 18 years ago
Milestone: | → 2.2.1 |
---|---|
Owner: | changed from | to
comment:4 by , 18 years ago
Status: | new → assigned |
---|
Some more information:
- only happens on i386, amd64 is not affected
- does not seem to depend on kernel version - bug occurs both with 2.6.18 and 2.6.16
The reason for make_afioballs_and_images_OLD() skipping the normal files is this:
[Main] libmondo-archive.c->make_afioballs_and_images_OLD#2379: Archiving set 0 [Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/filelist.0 exists [Main] libmondo-files.c->does_file_exist#261: /usr/share/mondo/do-not-compress-these exists [Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908 exists [Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/scratch/mondo.scratch.23479/mondo.scratch.18669 exists [Main] libmondo-archive.c->archive_this_fileset#527: command='afio -o -b 131072 -M 16m -Z -P bzip2 -G 3 -T 3k -E /usr/share/mondo/do-not-compress-these /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 < /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/filelist.0 2>> /var/log/mondo-archive.log' [Main] libmondo-archive.c->archive_this_fileset#552: min(free_ramdisk_space) is now 335 running: cp -f /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 /tmpfs/backcatalog > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err --------------------------------start of output----------------------------- cp: cannot create regular file `/tmpfs/backcatalog': No such file or directory --------------------------------end of output------------------------------ ...ran with res=256 [Main] libmondo-stream.c->maintain_collection_of_recent_archives#604: Warning - failed to copy /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 to backcatalog at /tmpfs/backcatalog
The backcatalog directory can not be created because variable td of maintain_collection_of_recent_archives() is blank.
Adding a log_it() debug line to output bkpinfo->tmpdir before the call to maintain_collection_of_recent_archives() in make_afioballs_and_images_OLD() makes this go away but then make_afioballs_and_images_OLD() loops forever doing fileset 1.
This looks like a memory allocation bug (what else is new).
comment:5 by , 18 years ago
I wrote:
The backcatalog directory can not be created because variable td of maintain_collection_of_recent_archives() is blank.
This looks like a compiler bug in gcc 4.1.2. When I compile with -O0 instead of -O2, td has the correct value.
comment:6 by , 18 years ago
Tested with gcc 3.4.6 with same result. Also, to clarify, gcc 4.1.2 in previous comment is actually "4.1.2 20061007 (prerelease)".
comment:7 by , 18 years ago
Have gone back now to trying with make_afioballs_and_images(), i.e.:
--- mondo-2.20.orig/mondo/common/libmondo-archive.c +++ mondo-2.20/mondo/common/libmondo-archive.c @@ -2468,7 +2468,7 @@ "Using single-threaded make_afioballs_and_images() to suit b0rken FreeBSD 5.0"); res = make_afioballs_and_images_OLD(bkpinfo); #else - res = make_afioballs_and_images_OLD(bkpinfo); + res = make_afioballs_and_images(bkpinfo); #endif write_header_block_to_stream((off_t)0, "stop-afioballs", BLK_STOP_AFIOBALLS);
I have also turned off singal handling to avoid gdb crashing like this:
--- mondo-2.20.orig/mondo/mondoarchive/main.c +++ mondo-2.20/mondo/mondoarchive/main.c @@ -121,7 +121,7 @@ //static char cvsid[] = "$Id: main.c 814 2006-09-23 08:36:11Z bruno $"; /************************* external variables *************************/ -extern void set_signals(int); +//extern void set_signals(int); extern int g_current_media_number; extern void register_pid(pid_t, char *); extern int g_currentY; @@ -315,7 +315,7 @@ log_msg(9, "This"); register_pid(g_main_pid, "mondo"); - set_signals(TRUE); // catch SIGTERM, etc. + //set_signals(TRUE); // catch SIGTERM, etc. run_program_and_log_output("dmesg -n1", TRUE); log_msg(9, "Next"); @@ -538,7 +538,7 @@ } distro_specific_kludges_at_end_of_mondoarchive(); register_pid(0, "mondo"); - set_signals(FALSE); + //set_signals(FALSE); chdir("/tmp"); // just in case there's something wrong with g_erase_tmpdir_and_scratchdir system(g_erase_tmpdir_and_scratchdir); free_libmondo_global_strings();
Result is that I am getting pretty much the same behaviour as described above now for make_afioballs_and_images_OLD(). Now it is a double-free error for storing_filelist_fname (line 1748 in libmondo-archive.c). Compiling with -O0 instead of -O2 makes the problem go away. I suspect something fishy going on with move_files_to_cd().
comment:8 by , 17 years ago
Milestone: | 2.2.1 → 2.2.2 |
---|
comment:9 by , 17 years ago
Milestone: | 2.2.2 → 2.2.3 |
---|
comment:10 by , 17 years ago
Milestone: | 2.2.3 → 2.2.4 |
---|
comment:11 by , 17 years ago
Milestone: | 2.2.4 → 2.2.5 |
---|
Any way valgrind could be use to diagnostic where the problem is ?
comment:12 by , 17 years ago
Milestone: | 2.2.5 → 2.2.6 |
---|
comment:13 by , 16 years ago
Milestone: | 2.2.6 → 2.2.7 |
---|
comment:14 by , 16 years ago
Resolution: | → worksforme |
---|---|
Status: | assigned → closed |
I close this bug as my own tests with tape on Debian have been successful. Feel free to reopen if you think it's still there with latest gcc/mondo 2.2.7/kernel/debian 4/lenny ...
I'd didn't mean to assign to you and it also happens in 2.2.0.
Cheers, Andree