Opened 13 years ago

Closed 11 years ago

#82 closed defect (worksforme)

Archiving to tape leaves out most files (biggiefiles are ok)

Reported by: andree Owned by: andree
Priority: normal Milestone: 2.2.7
Component: mondo Version: 2.0.9
Severity: normal Keywords:
Cc:

Description

The undelying reason os that mondoarchive calls function make_afioballs_and_images_OLD() rather than make_afioballs_and_images() for tape archives. (This happens in function make_those_afios_phase().) At least this explains why there is different behaviour.

Changing the code so it calls make_afioballs_and_images() works fine until towards the end of archiving the normal files where it segfaults. Stupidly, gdb segfaults as well when this happens and I can't even get a backtrace. :-(

The screenout looks like this:

---progress-form---4--- TASK:  [********************]  97% done;  0:32 to go
---progress-form---1--- I am backing up your live filesystem now.
---progress-form---2--- Volume 1: 1,181,312 kilobytes archived so far
---progress-form---3--- Please wait. This may take a couple of hours.
---progress-form---E---
---progress-form---4--- TASK:  [********************]  98% done;  0:21 to go
---progress-form---1--- I am backing up your live filesystem now.
---progress-form---2--- Volume 1: 1,181,312 kilobytes archived so far
---progress-form---3--- Please wait. This may take a couple of hours.
---progress-form---E---
---progress-form---4--- TASK:  [********************]  98% done;  0:21 to go
---progress-form---1--- I am backing up your live filesystem now.
---progress-form---2--- Volume 1: 1,190,016 kilobytes archived so far
---progress-form---3--- Please wait. This may take a couple of hours.
---progress-form---E---
---progress-form---4--- TASK:  [********************]  99% done;  0:10 to go
(Unknown) signal received from OS

Fatal error... Mondoarchive is terminating in response to a signal from the OS
Fatal error... Execution run aborted (main loop)
Execution run ended; result=1
Type 'less /var/log/mondo-archive.log' to see the output log

The final rows of mondo-archive.log are this:

				[Main] libmondo-archive.c->archive_this_fileset#527: command='afio -o -b 131072 -M 16m -Z -P bzip2 -G 3 -T 3k -E /usr/share/mondo/do-not-compress-these /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/117.afio.bz2 < /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.117 2>> /var/log/mondo-archive.log'
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
	[Main] libmondo-archive.c->make_afioballs_and_images#1683: Storing set 116
																			[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.116 exists
																			[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 exists
running: cp -f /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/backcatalog > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
--------------------------------end of output------------------------------
...ran just fine. :-)
	[Main] libmondo-archive.c->make_afioballs_and_images#1699: libmondo-archive.c, make_afioballs_and_images, 1699: Writing EXAT files
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXTENDED_ATTRIBUTES (fname=xattr_list.116.gz, size=0 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXAT_FILE (fname=xattr_list.116.gz, size=0 K)
	[Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file 'xattr_list.116.gz' to tape (0 KB)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=xattr_list.116.gz, size=0 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=728eeead8447, size=0 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXAT_FILE (fname=xattr_list.116.gz, size=-1 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_EXAT_FILE (fname=acl_list.116.gz, size=0 K)
	[Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file 'acl_list.116.gz' to tape (0 KB)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=acl_list.116.gz, size=0 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=76103a4c9048, size=0 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXAT_FILE (fname=acl_list.116.gz, size=-1 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_EXTENDED_ATTRIBUTES (fname=xattr_list.116.gz, size=0 K)
																			[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs/116.afio.bz2 exists
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_AN_AFIO_OR_SLICE (fname=116.afio.bz2, size=13184 K)
	[Main] libmondo-stream.c->write_file_to_stream_from_file#1620: Writing file '116.afio.bz2' to tape (13184 KB)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_START_FILE (fname=116.afio.bz2, size=13184 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_FILE (fname=0d7e3982cbca, size=0 K)
					[Main] libmondo-stream.c->write_header_block_to_stream#1728: BLK_STOP_AN_AFIO_OR_SLICE (fname=finished-writing-file, size=0 K)
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
			[Main] libmondo-archive.c->create_afio_files_in_background#1222:                                          [8973:0] - archived 117 OK
																			[Main] libmondo-files.c->does_file_exist#258: /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.118 does not exist
		[Main] libmondo-archive.c->create_afio_files_in_background#1233:                                          [8973:0] - exiting
(Unknown) signal received from OS

	[Main] libmondo-fifo.c->kill_buffer#247: kill_buffer() --- command = ps auxww | grep -F "buffer -m 32m -p75 -B -s32768 -o /dev/ht0 2>> /var/log/mondo-archive.log" | grep -Fv grep | awk '{print $2;}' | grep -v PID | tr -s '
' ' ' | awk '{ print $1; }'
	[Main] libmondo-fifo.c->kill_buffer#250: kill_buffer() --- command = kill 2800
running: kill 2800 > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
--------------------------------end of output------------------------------
...ran just fine. :-)
[Main] newt-specific.c->fatal_error#377: Fatal error received - 'Mondoarchive is terminating in response to a signal from the OS'
		[Main] newt-specific.c->fatal_error#395: OK, I think I'm the main PID.
	[Main] newt-specific.c->fatal_error#403: I'm going to do some cleaning up now.
			[Main] newt-specific.c->fatal_error#404: killall mindi 2> /dev/null
running: kill `ps auxww | grep " /mondo/do-not " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
--------------------------------end of output------------------------------
...ran with res=256
running: kill `ps auxww | grep " tmp.mondo " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
--------------------------------end of output------------------------------
...ran with res=256
running: kill `ps auxww | grep " ntfsclone " | awk '{print $1;}' | grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
sh: line 0: kill: root: arguments must be process or job IDs
--------------------------------end of output------------------------------
...ran with res=256
[Main] libmondo-archive.c->make_afioballs_and_images#1729: AL DEBUG: At end of if in make_afioballs_and_images()
[Main] newt-specific.c->fatal_error#377: Fatal error received - 'Execution run aborted (main loop)'
		[Main] newt-specific.c->fatal_error#395: OK, I think I'm the main PID.
		[Main] newt-specific.c->fatal_error#397: ...I'm already exiting. Give me time, Julian!
	[Main] libmondo-files.c->register_pid#812: Unregistering PID
running: umount /mnt/cdrom > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
umount: /mnt/cdrom: not mounted
--------------------------------end of output------------------------------
...ran with res=256
running: rm -Rf /mondo.scratch.* /tmp.mondo.* > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
--------------------------------end of output------------------------------
...ran just fine. :-)
running: rm -Rf /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027 /home/mondorescue/scratch/mondo.scratch.7496/mondo.scratch.11697 > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
rm: cannot remove directory `/home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/tmpfs': Device or resource busy
--------------------------------end of output------------------------------
...ran with res=256
	[Main] libmondo-tools.c->do_libmondo_global_strings_thing#1586: libmondo-tools.c, do_libmondo_global_strings_thing, 1586: Freeing globals

All not very informative I'm afraid.

We can either try to figure out what the differences in behaviour between make_afioballs_and_images_OLD() and than make_afioballs_and_images() are and fix this. Or we can try to fix above segmentation fault.

If you have an idea, Bruno, please let me know. Otherwise I'll keep plodding along...

(As a side note, the kill command fails because it should be $2 in the awk call not $1- that or ps_options needs be different.)

Change History (14)

comment:1 Changed 13 years ago by andree

I'd didn't mean to assign to you and it also happens in 2.2.0.

Cheers, Andree

comment:2 Changed 13 years ago by Bruno Cornec

>  The undelying reason os that mondoarchive calls function
>  make_afioballs_and_images_OLD() rather than make_afioballs_and_images()
>  for tape archives. (This happens in function make_those_afios_phase().) At
>  least this explains why there is different behaviour.

The difference, as it is documented, is :
  use the singlethreaded make_afioballs_and_images_OLD()
  instead of the multithreaded make_afioballs_and_images()
  if backing up to tape
(It's nearly the first time I find a useful comment in mondo ;-)

{{{
>  [Main] libmondo-files.c->does_file_exist#258:
>  /home/mondorescue/tmp/tmp.mondo.30609/tmp.mondo.7027/filelist.118 does not
>  exist
>                  [Main] libmondo-
>  archive.c->create_afio_files_in_background#1233:
>  [8973:0] - exiting
>  (Unknown) signal received from OS
}}}

Isn't there any link between the fact that the latest doesn't exist
(which should be always the case), that we are with a tape, and
multi-threaded ?

{{{
>  running: kill `ps auxww | grep " /mondo/do-not " | awk '{print $1;}' |
>  grep -vx "\?"` > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-
>  thing.err
>  --------------------------------start of
>  output-----------------------------
>  sh: line 0: kill: root: arguments must be process or job IDs
}}}

>  We can either try to figure out what the differences in behaviour between
>  make_afioballs_and_images_OLD() and than make_afioballs_and_images() are
>  and fix this. Or we can try to fix above segmentation fault.

I'm all for removing a function if we can obsolete it.

>  If you have an idea, Bruno, please let me know. Otherwise I'll keep
>  plodding along...

Not like that, I need to have a closer look.

>  (As a side note, the kill command fails because it should be $2 in the awk
>  call not $1- that or ps_options needs be different.)

ps_options was introduced recently and should be OK.
Clearly for me $1 is wrong for a ps result. It's always $2.

Bruno.

comment:3 Changed 13 years ago by Bruno Cornec

Milestone: 2.2.1
Owner: changed from Bruno Cornec to andree

comment:4 Changed 13 years ago by andree

Status: newassigned

Some more information:

  • only happens on i386, amd64 is not affected
  • does not seem to depend on kernel version - bug occurs both with 2.6.18 and 2.6.16

The reason for make_afioballs_and_images_OLD() skipping the normal files is this:

[Main] libmondo-archive.c->make_afioballs_and_images_OLD#2379: Archiving set 0
																			[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/filelist.0 exists
																			[Main] libmondo-files.c->does_file_exist#261: /usr/share/mondo/do-not-compress-these exists
																			[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908 exists
																			[Main] libmondo-files.c->does_file_exist#261: /home/mondorescue/scratch/mondo.scratch.23479/mondo.scratch.18669 exists
				[Main] libmondo-archive.c->archive_this_fileset#527: command='afio -o -b 131072 -M 16m -Z -P bzip2 -G 3 -T 3k -E /usr/share/mondo/do-not-compress-these /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 < /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/filelist.0 2>> /var/log/mondo-archive.log'
	[Main] libmondo-archive.c->archive_this_fileset#552: min(free_ramdisk_space) is now 335
running: cp -f /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 /tmpfs/backcatalog > /tmp/mondo-run-prog-thing.tmp 2> /tmp/mondo-run-prog-thing.err
--------------------------------start of output-----------------------------
cp: cannot create regular file `/tmpfs/backcatalog': No such file or directory
--------------------------------end of output------------------------------
...ran with res=256
	[Main] libmondo-stream.c->maintain_collection_of_recent_archives#604: Warning - failed to copy /home/mondorescue/tmp/tmp.mondo.11723/tmp.mondo.2908/tmpfs/0.afio.bz2 to backcatalog at /tmpfs/backcatalog

The backcatalog directory can not be created because variable td of maintain_collection_of_recent_archives() is blank.

Adding a log_it() debug line to output bkpinfo->tmpdir before the call to maintain_collection_of_recent_archives() in make_afioballs_and_images_OLD() makes this go away but then make_afioballs_and_images_OLD() loops forever doing fileset 1.

This looks like a memory allocation bug (what else is new).

comment:5 Changed 13 years ago by andree

I wrote:

The backcatalog directory can not be created because variable td of
maintain_collection_of_recent_archives() is blank.

This looks like a compiler bug in gcc 4.1.2. When I compile with -O0 instead of -O2, td has the correct value.

comment:6 Changed 13 years ago by andree

Tested with gcc 3.4.6 with same result. Also, to clarify, gcc 4.1.2 in previous comment is actually "4.1.2 20061007 (prerelease)".

comment:7 Changed 13 years ago by andree

Have gone back now to trying with make_afioballs_and_images(), i.e.:

--- mondo-2.20.orig/mondo/common/libmondo-archive.c
+++ mondo-2.20/mondo/common/libmondo-archive.c
@@ -2468,7 +2468,7 @@
                                "Using single-threaded make_afioballs_and_images() to suit b0rken FreeBSD 5.0");
                res = make_afioballs_and_images_OLD(bkpinfo);
 #else
-               res = make_afioballs_and_images_OLD(bkpinfo);
+               res = make_afioballs_and_images(bkpinfo);
 #endif
                write_header_block_to_stream((off_t)0, "stop-afioballs",
                                                                         BLK_STOP_AFIOBALLS);

I have also turned off singal handling to avoid gdb crashing like this:

--- mondo-2.20.orig/mondo/mondoarchive/main.c
+++ mondo-2.20/mondo/mondoarchive/main.c
@@ -121,7 +121,7 @@
 //static char cvsid[] = "$Id: main.c 814 2006-09-23 08:36:11Z bruno $";

 /************************* external variables *************************/
-extern void set_signals(int);
+//extern void set_signals(int);
 extern int g_current_media_number;
 extern void register_pid(pid_t, char *);
 extern int g_currentY;
@@ -315,7 +315,7 @@
        log_msg(9, "This");

        register_pid(g_main_pid, "mondo");
-       set_signals(TRUE);                      // catch SIGTERM, etc.
+       //set_signals(TRUE);                    // catch SIGTERM, etc.
        run_program_and_log_output("dmesg -n1", TRUE);

        log_msg(9, "Next");
@@ -538,7 +538,7 @@
        }
        distro_specific_kludges_at_end_of_mondoarchive();
        register_pid(0, "mondo");
-       set_signals(FALSE);
+       //set_signals(FALSE);
        chdir("/tmp");                          // just in case there's something wrong with g_erase_tmpdir_and_scratchdir
        system(g_erase_tmpdir_and_scratchdir);
        free_libmondo_global_strings();

Result is that I am getting pretty much the same behaviour as described above now for make_afioballs_and_images_OLD(). Now it is a double-free error for storing_filelist_fname (line 1748 in libmondo-archive.c). Compiling with -O0 instead of -O2 makes the problem go away. I suspect something fishy going on with move_files_to_cd().

comment:8 Changed 13 years ago by Bruno Cornec

Milestone: 2.2.12.2.2

comment:9 Changed 13 years ago by Bruno Cornec

Milestone: 2.2.22.2.3

comment:10 Changed 13 years ago by Bruno Cornec

Milestone: 2.2.32.2.4

comment:11 Changed 13 years ago by Bruno Cornec

Milestone: 2.2.42.2.5

Any way valgrind could be use to diagnostic where the problem is ?

comment:12 Changed 12 years ago by Bruno Cornec

Milestone: 2.2.52.2.6

comment:13 Changed 12 years ago by Bruno Cornec

Milestone: 2.2.62.2.7

comment:14 Changed 11 years ago by Bruno Cornec

Resolution: worksforme
Status: assignedclosed

I close this bug as my own tests with tape on Debian have been successful. Feel free to reopen if you think it's still there with latest gcc/mondo 2.2.7/kernel/debian 4/lenny ...

Note: See TracTickets for help on using tickets.