Opened 9 years ago

Closed 8 years ago

#421 closed defect (fixed)

buffer overflow detected: mondoarchive terminated

Reported by: awnugent
Owned by: bruno
Priority: normal
Milestone: 2.2.9.5
Component: mondo
Version: 2.2.9.3
Severity: normal
Keywords:
Cc:

Description

We are getting the following when trying to create a backup. Logs attached. Command line used:

mondoarchive -OVi -N -K 99 -d /mondo -I /dev/cciss/c0d0 -E "/net /mondo" -p dborad01.20100503-mondoa-1 -s 4480m

Snip:

Archiving only the following file systems on /dev/cciss/c0d0:
/mondo /boot /home /var /tmp /usr /
Not archiving the following file systems:
*** buffer overflow detected ***: mondoarchive terminated

System is RHEL 5, other systems in the environment are working fine.

Attachments (5)

mondoarchive.log.gz (4.0 KB) - added by awnugent 9 years ago.
mindi.log.gz (5.8 KB) - added by awnugent 9 years ago.
mondoarchive.2.2.9.4-0.20100513013504.rhel5.x86_64.log.gz (1.6 KB) - added by awnugent 9 years ago.
Log file from 2.2.9.4 test run.
gdb.txt.gz (79.1 KB) - added by awnugent 9 years ago.
gdb from debug run.
mondoarchive.2.2.9.4-0.20100517031424.rhel5.x86_64.log.gz (1.6 KB) - added by awnugent 9 years ago.

Change History (18)

comment:1 Changed 9 years ago by bruno

  • Status changed from new to assigned

Could you run valgrind/gdb on it to check precisely where the issue is (as described in http://trac.mondorescue.org/wiki/TroubleShooting#Trouble-Shootingmondo)?

It may also be linked to #368.

Could you also test the latest 2.2.9.4 beta at ftp://ftp.mondorescue.org/test/rhel/5, which I built to try to address this issue?

comment:2 Changed 9 years ago by awnugent

Thanks for getting back to me Bruno. In the interest of speed, I went straight to the 2.2.9.4 (mondo-2.2.9.4-0.20100513013504.rhel5.x86_64.rpm) test and it still failed. I'm attaching the mondoarchive.log and will pursue building and running under gdb.

I saw somewhere that there is a limit of 384 characters on an exclude list, and I suspect this is the issue, but I can't find the reference now. Our exclude lists can get pretty long.

Changed 9 years ago by awnugent

Log file from 2.2.9.4 test run.

comment:3 Changed 9 years ago by awnugent

Here's the gdb run; I'll attach gdb.txt as well.

Your backup will probably occupy a single ISO. Maybe two.
Done.
Detaching after fork from child process 22743.
Detaching after fork from child process 22769.
Detaching after fork from child process 22773.
Detaching after fork from child process 22779.
Copying Mondo's core files to the scratch directory
Detaching after fork from child process 22784.
Detaching after fork from child process 22792.
Fatal error... Failed to copy Mondo's stuff to scratchdir
Detaching after fork from child process 22796.
Detaching after fork from child process 22808.
Detaching after fork from child process 22875.
Detaching after fork from child process 22877.
Detaching after fork from child process 22883.
Detaching after fork from child process 22885.
Detaching after fork from child process 22890.
Detaching after fork from child process 22892.
Detaching after fork from child process 22898.
Detaching after fork from child process 22900.
Detaching after fork from child process 22905.
Detaching after fork from child process 22908.
Detaching after fork from child process 22922.
---FATALERROR--- Failed to copy Mondo's stuff to scratchdir
If you require technical support, please contact the mailing list.
See http://www.mondorescue.org for details.
The list's members can help you, if you attach that file to your e-mail.
Log file: /var/log/mondoarchive.log
Mondo has aborted.
Detaching after fork from child process 22926.
Detaching after fork from child process 22928.
Execution run ended; result=254
Type 'less /var/log/mondoarchive.log' to see the output log
Detaching after fork from child process 22930.
Detaching after fork from child process 22931.

Program exited with code 0376.
(gdb) bt
No stack.

Changed 9 years ago by awnugent

gdb from debug run.

comment:4 Changed 9 years ago by awnugent

  • Summary changed from buffer overflow detected: mondoarhive terminated to buffer overflow detected: mondoarchive terminated

comment:5 Changed 9 years ago by awnugent

--- Workaround found ---

Based on some information found at "http://us.generation-nt.com/bug-328682-found-source-bug-code-help-166181561.html", we recompiled with "MAX_STR_LEN 1024" in my-stuff.h in an attempt to prove or disprove that this was at least part of the issue. It appears to have worked! We'd be willing to do some additional testing if you can tell us what to test. The value 1024 was a guess, chosen because it is larger than 384 and still divisible by 8.

Here is the abbreviated output from the run:

./mondoarchive -OVi -N -K 2 -9 -S /mondo -T /mondo -d /mondo -I /dev/cciss/c0d0 -E "/net /mondo" -p testMSL1024 -s 4480m
. . .
Done.
Backup and/or verify ran to completion. Everything appears to be fine.
/var/cache/mindi/mondorescue.iso, a boot/utility CD, is available if you want it
Data archived OK.
Mondoarchive ran OK.
See /var/log/mondoarchive.log for details of backup run.
Execution run ended; result=0
Type 'less /var/log/mondoarchive.log' to see the output log
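
To illustrate why raising MAX_STR_LEN from 384 to 1024 could make the difference, here is a minimal C sketch, not Mondo's actual code. It assumes the overflow comes from formatting a message that embeds the whole exclude list into a fixed-size buffer. The function names and the message text are hypothetical; only the two capacities come from this ticket. It uses snprintf with a NULL buffer to measure the space a formatted line would need, without writing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Mondo): number of bytes, including the
 * terminating NUL, needed to format a line that embeds the exclude list. */
size_t exclude_line_size(const char *exclude_paths)
{
    assert(exclude_paths != NULL);
    /* With a NULL buffer and size 0, snprintf writes nothing and returns
     * the length the formatted string would have had. */
    int n = snprintf(NULL, 0, "exclude_paths is now '%s'", exclude_paths);
    return n < 0 ? 0 : (size_t)n + 1;
}

/* Returns 1 if the formatted line fits in a buffer of `capacity` bytes,
 * 0 if it would overflow or if formatting failed. */
int exclude_line_fits(const char *exclude_paths, size_t capacity)
{
    size_t needed = exclude_line_size(exclude_paths);
    return needed != 0 && needed <= capacity;
}
```

Under these assumptions, a 500-character exclude list does not fit in a 384-byte buffer but does fit in a 1024-byte one, which is consistent with the workaround succeeding without proving that this is the exact code path involved.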

comment:6 Changed 9 years ago by bruno

I fully agree with what is mentioned at your URL.

Since then the code has changed; in particular, the limit for the exclusion path has been raised fourfold (cf. http://trac.mondorescue.org/browser/branches/2.2.9/mondo/src/common/mondostructures.h).

However, it seems that an error was introduced in the course of 2.2.9 (probably a bad backport from 2.2.10 :-(); cf. rev [2633].

Could you please try again with the latest 2.2.9.4 beta, which I'm rebuilding now at ftp://ftp.mondorescue.org/test/rhel/5?

comment:7 follow-up: Changed 9 years ago by awnugent

Installed mondo-2.2.9.4-0.20100513013504.rhel5.x86_64.rpm and ran again. Still erroring. I'll attach the log as well. The mondo package is the only one I updated.

Also, after the error, I retried using the custom compiled binary with the 2.2.9.4 package still installed and it runs clean. I'm not sure how valid that is, but thought you might like to know.

# mondoarchive -v
mondoarchive v2.2.9.4-r2632

Archiving only the following file systems on /dev/cciss/c0d0:
==> /mondo /boot /home /var /tmp /usr /
Not archiving the following file systems:
*** buffer overflow detected ***: /usr/sbin/mondoarchive terminated
======= Backtrace: =========
/lib64/libc.so.6(__chk_fail+0x2f)[0x33602e77af]
/lib64/libc.so.6[0x33602e6c19]
/lib64/libc.so.6(_IO_default_xsputn+0x94)[0x336026e294]
/lib64/libc.so.6(_IO_vfprintf+0x3e13)[0x3360246503]
/lib64/libc.so.6(__vsprintf_chk+0x9d)[0x33602e6cbd]
/usr/sbin/mondoarchive[0x42e110]
/usr/sbin/mondoarchive[0x410435]
/usr/sbin/mondoarchive[0x430c84]
/usr/sbin/mondoarchive[0x43318b]
/usr/sbin/mondoarchive[0x403323]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x336021d994]
/usr/sbin/mondoarchive[0x402bb9]
======= Memory map: ========
00400000-0044e000 r-xp 00000000 fd:01 587096 /usr/sbin/mondoarchive
0064e000-00650000 rw-p 0004e000 fd:01 587096 /usr/sbin/mondoarchive
00650000-00654000 rw-p 00650000 00:00 0
19a2c000-19a4e000 rw-p 19a2c000 00:00 0 [heap]
335fe00000-335fe1c000 r-xp 00000000 fd:00 95234 /lib64/ld-2.5.so
336001b000-336001c000 r--p 0001b000 fd:00 95234 /lib64/ld-2.5.so
336001c000-336001d000 rw-p 0001c000 fd:00 95234 /lib64/ld-2.5.so
3360200000-336034d000 r-xp 00000000 fd:00 95236 /lib64/libc-2.5.so
336034d000-336054d000 ---p 0014d000 fd:00 95236 /lib64/libc-2.5.so
336054d000-3360551000 r--p 0014d000 fd:00 95236 /lib64/libc-2.5.so
3360551000-3360552000 rw-p 00151000 fd:00 95236 /lib64/libc-2.5.so
3360552000-3360557000 rw-p 3360552000 00:00 0
3360600000-3360602000 r-xp 00000000 fd:00 95247 /lib64/libdl-2.5.so
3360602000-3360802000 ---p 00002000 fd:00 95247 /lib64/libdl-2.5.so
3360802000-3360803000 r--p 00002000 fd:00 95247 /lib64/libdl-2.5.so
3360803000-3360804000 rw-p 00003000 fd:00 95247 /lib64/libdl-2.5.so
3360a00000-3360a82000 r-xp 00000000 fd:00 95249 /lib64/libm-2.5.so
3360a82000-3360c81000 ---p 00082000 fd:00 95249 /lib64/libm-2.5.so
3360c81000-3360c82000 r--p 00081000 fd:00 95249 /lib64/libm-2.5.so
3360c82000-3360c83000 rw-p 00082000 fd:00 95249 /lib64/libm-2.5.so
3360e00000-3360e16000 r-xp 00000000 fd:00 95256 /lib64/libpthread-2.5.so
3360e16000-3361015000 ---p 00016000 fd:00 95256 /lib64/libpthread-2.5.so
3361015000-3361016000 r--p 00015000 fd:00 95256 /lib64/libpthread-2.5.so
3361016000-3361017000 rw-p 00016000 fd:00 95256 /lib64/libpthread-2.5.so
3361017000-336101b000 rw-p 3361017000 00:00 0
3361200000-3361212000 r-xp 00000000 fd:01 460219 /usr/lib64/libnewt.so.0.52.1
3361212000-3361411000 ---p 00012000 fd:01 460219 /usr/lib64/libnewt.so.0.52.1
3361411000-3361413000 rw-p 00011000 fd:01 460219 /usr/lib64/libnewt.so.0.52.1
3361600000-33616c2000 r-xp 00000000 fd:01 459448 /usr/lib64/libslang.so.2.0.6
33616c2000-33618c1000 ---p 000c2000 fd:01 459448 /usr/lib64/libslang.so.2.0.6
33618c1000-33618dc000 rw-p 000c1000 fd:01 459448 /usr/lib64/libslang.so.2.0.6
33618dc000-336190f000 rw-p 33618dc000 00:00 0
336f400000-336f40d000 r-xp 00000000 fd:00 95251 /lib64/libgcc_s-4.1.2-20080825.so.1
336f40d000-336f60d000 ---p 0000d000 fd:00 95251 /lib64/libgcc_s-4.1.2-20080825.so.1
336f60d000-336f60e000 rw-p 0000d000 fd:00 95251 /lib64/libgcc_s-4.1.2-20080825.so.1
2b6e3bfe1000-2b6e3bfe3000 rw-p 2b6e3bfe1000 00:00 0
2b6e3bffa000-2b6e3bffe000 rw-p 2b6e3bffa000 00:00 0
7fffabd6b000-7fffabd82000 rw-p 7ffffffe8000 00:00 0 [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
SIGABRT signal received from OS
Abort - probably failed assertion. I'm sleeping for a few seconds so you can rea
Fatal error... MondoRescue is terminating in response to a signal from the OS
---FATALERROR--- MondoRescue is terminating in response to a signal from the OS
If you require technical support, please contact the mailing list.
See http://www.mondorescue.org for details.
The list's members can help you, if you attach that file to your e-mail.
Log file: /var/log/mondoarchive.log
Mondo has aborted.
Execution run ended; result=254
Type 'less /var/log/mondoarchive.log' to see the output log

comment:8 in reply to: ↑ 7 Changed 9 years ago by awnugent

Disregard the above; I was working with the 2.2.9.4*0513 version. I'll repost the corrected information.

comment:9 Changed 9 years ago by awnugent

Corrected run information:

Installed mondo-2.2.9.4-0.20100517031424.rhel5.x86_64.rpm and ran again. Still erroring. I'll attach the log as well. The mondo package is the only one I updated.

Also, after the error, I retried using the custom compiled binary with the 2.2.9.4 package still installed and it runs clean. I'm not sure how valid that is, but thought you might like to know.

mondoarchive -v

mondoarchive v2.2.9.4-r2636

Archiving only the following file systems on /dev/cciss/c0d0:
==> /mondo /boot /home /var /tmp /usr /
Not archiving the following file systems:
*** buffer overflow detected ***: /usr/sbin/mondoarchive terminated
======= Backtrace: =========
/lib64/libc.so.6(__chk_fail+0x2f)[0x33602e77af]
/lib64/libc.so.6[0x33602e6c19]
/lib64/libc.so.6(_IO_default_xsputn+0x94)[0x336026e294]
/lib64/libc.so.6(_IO_vfprintf+0x3e13)[0x3360246503]
/lib64/libc.so.6(__vsprintf_chk+0x9d)[0x33602e6cbd]
/usr/sbin/mondoarchive[0x42e110]
/usr/sbin/mondoarchive[0x410435]
/usr/sbin/mondoarchive[0x430c84]
/usr/sbin/mondoarchive[0x43318b]
/usr/sbin/mondoarchive[0x403323]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x336021d994]
/usr/sbin/mondoarchive[0x402bb9]
======= Memory map: ========
00400000-0044e000 r-xp 00000000 fd:01 587420 /usr/sbin/mondoarchive
0064e000-00650000 rw-p 0004e000 fd:01 587420 /usr/sbin/mondoarchive
00650000-00654000 rw-p 00650000 00:00 0
057c9000-057eb000 rw-p 057c9000 00:00 0 [heap]
335fe00000-335fe1c000 r-xp 00000000 fd:00 95234 /lib64/ld-2.5.so
336001b000-336001c000 r--p 0001b000 fd:00 95234 /lib64/ld-2.5.so
336001c000-336001d000 rw-p 0001c000 fd:00 95234 /lib64/ld-2.5.so
3360200000-336034d000 r-xp 00000000 fd:00 95236 /lib64/libc-2.5.so
336034d000-336054d000 ---p 0014d000 fd:00 95236 /lib64/libc-2.5.so
336054d000-3360551000 r--p 0014d000 fd:00 95236 /lib64/libc-2.5.so
3360551000-3360552000 rw-p 00151000 fd:00 95236 /lib64/libc-2.5.so
3360552000-3360557000 rw-p 3360552000 00:00 0
3360600000-3360602000 r-xp 00000000 fd:00 95247 /lib64/libdl-2.5.so
3360602000-3360802000 ---p 00002000 fd:00 95247 /lib64/libdl-2.5.so
3360802000-3360803000 r--p 00002000 fd:00 95247 /lib64/libdl-2.5.so
3360803000-3360804000 rw-p 00003000 fd:00 95247 /lib64/libdl-2.5.so
3360a00000-3360a82000 r-xp 00000000 fd:00 95249 /lib64/libm-2.5.so
3360a82000-3360c81000 ---p 00082000 fd:00 95249 /lib64/libm-2.5.so
3360c81000-3360c82000 r--p 00081000 fd:00 95249 /lib64/libm-2.5.so
3360c82000-3360c83000 rw-p 00082000 fd:00 95249 /lib64/libm-2.5.so
3360e00000-3360e16000 r-xp 00000000 fd:00 95256 /lib64/libpthread-2.5.so
3360e16000-3361015000 ---p 00016000 fd:00 95256 /lib64/libpthread-2.5.so
3361015000-3361016000 r--p 00015000 fd:00 95256 /lib64/libpthread-2.5.so
3361016000-3361017000 rw-p 00016000 fd:00 95256 /lib64/libpthread-2.5.so
3361017000-336101b000 rw-p 3361017000 00:00 0
3361200000-3361212000 r-xp 00000000 fd:01 460219 /usr/lib64/libnewt.so.0.52.1
3361212000-3361411000 ---p 00012000 fd:01 460219 /usr/lib64/libnewt.so.0.52.1
3361411000-3361413000 rw-p 00011000 fd:01 460219 /usr/lib64/libnewt.so.0.52.1
3361600000-33616c2000 r-xp 00000000 fd:01 459448 /usr/lib64/libslang.so.2.0.6
33616c2000-33618c1000 ---p 000c2000 fd:01 459448 /usr/lib64/libslang.so.2.0.6
33618c1000-33618dc000 rw-p 000c1000 fd:01 459448 /usr/lib64/libslang.so.2.0.6
33618dc000-336190f000 rw-p 33618dc000 00:00 0
336f400000-336f40d000 r-xp 00000000 fd:00 95251 /lib64/libgcc_s-4.1.2-20080825.so.1
336f40d000-336f60d000 ---p 0000d000 fd:00 95251 /lib64/libgcc_s-4.1.2-20080825.so.1
336f60d000-336f60e000 rw-p 0000d000 fd:00 95251 /lib64/libgcc_s-4.1.2-20080825.so.1
2b6a6ea84000-2b6a6ea86000 rw-p 2b6a6ea84000 00:00 0
2b6a6ea9d000-2b6a6eaa1000 rw-p 2b6a6ea9d000 00:00 0
7fff91e18000-7fff91e2e000 rw-p 7ffffffe9000 00:00 0 [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
SIGABRT signal received from OS
Abort - probably failed assertion. I'm sleeping for a few seconds so you can rea
Fatal error... MondoRescue is terminating in response to a signal from the OS
---FATALERROR--- MondoRescue is terminating in response to a signal from the OS
If you require technical support, please contact the mailing list.
See http://www.mondorescue.org for details.
The list's members can help you, if you attach that file to your e-mail.
Log file: /var/log/mondoarchive.log
Mondo has aborted.
Execution run ended; result=254
Type 'less /var/log/mondoarchive.log' to see the output log
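
The backtrace above runs through glibc's fortified vsprintf check and ends in its failure handler, which suggests that a vsprintf-style call in mondoarchive tried to write more than its destination buffer could hold. As a hedged illustration only, the sketch below shows a bounded variadic formatter built on the standard vsnprintf. The name format_checked is hypothetical and is not a Mondo function. It reports truncation to the caller instead of letting the process be aborted.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bounded replacement for a vsprintf-style helper.
 * Writes at most `capacity` bytes (including the NUL) into `dst`.
 * Returns 0 on success, -1 if the output was truncated or formatting failed. */
int format_checked(char *dst, size_t capacity, const char *fmt, ...)
{
    assert(dst != NULL && fmt != NULL && capacity > 0);

    va_list args;
    va_start(args, fmt);
    /* vsnprintf never writes past `capacity` bytes and returns the length
     * the complete output would have had. */
    int n = vsnprintf(dst, capacity, fmt, args);
    va_end(args);

    if (n < 0 || (size_t)n >= capacity) {
        return -1; /* caller decides how to handle an oversized string */
    }
    return 0;
}
```

A caller that gets -1 back can log the problem or fall back to a larger buffer; either is gentler than the SIGABRT seen in this run.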

comment:10 Changed 9 years ago by bruno

I'm unable to reproduce the issue with an exclude list longer than 400 characters:

[Main] libmondo-devices.c->mr_make_devlist_from_pathlist#2027: exclude_paths is now ' /users  /var/log  /usr/share/doc  /mondo  /var/cache  /usr/src  /usr/share/texmf  /usr/share/games  /usr/share/webmin  /usr/lib/jdk-1.4.2_07  /pub  /usr/share/gnome-background-properties/  /usr/share/gnome-bluetooth/  /usr/share/gnome-control-center/  /usr/share/gnome-doc-utils/  /usr/share/gnome-games-common/  /usr/share/gnome-games/  /usr/share/gnome-media/  /usr/share/gnome-mount/  /usr/share/mobile-broadband-provider-info/ '
                        [Main] libmondo-cli.c->process_switches#647: Finished with the -E option

This is with the official 2.2.9.3 version.

So I'm really puzzled here :-( I wonder how we can make progress.

comment:11 Changed 9 years ago by awnugent

The workaround of updating MAX_STR_LEN has been stable for us, and we haven't run into this on any other systems yet. This is currently the largest system in our environment in terms of the number of file systems, but our environment is growing. I'll monitor and update this ticket if we run into the problem again, and I'll update and test as new releases come out. If you have any ideas on how to identify the cause, I'd be happy to try them. Thanks for all your help!

Andrew

comment:12 Changed 9 years ago by bruno

  • Milestone changed from 2.2.9.4 to 2.2.9.5

comment:13 Changed 8 years ago by bruno

  • Resolution set to fixed
  • Status changed from assigned to closed

This should be fixed as of rev [2709] in 2.2.9.5, which backports dynamic allocation for the exclude list from 2.2.10.
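
For readers unfamiliar with the approach, here is a minimal sketch of what dynamic allocation for an exclude list can look like in C. It is not the code from rev [2709]. The function name exclude_list_append and the space-separated layout are assumptions made for illustration. The buffer grows with realloc as paths are added, so no fixed MAX_STR_LEN-style limit applies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append `path` to a heap-allocated, space-separated
 * list. `*list` may be NULL for an empty list. Returns 0 on success, or -1
 * if memory could not be allocated (in which case `*list` is unchanged). */
int exclude_list_append(char **list, const char *path)
{
    assert(list != NULL && path != NULL);

    size_t old_len = (*list != NULL) ? strlen(*list) : 0;
    size_t add_len = strlen(path);
    /* existing text + optional separator + new path + terminating NUL */
    size_t new_size = old_len + (old_len ? 1 : 0) + add_len + 1;

    char *grown = realloc(*list, new_size);
    if (grown == NULL) {
        return -1;
    }
    if (old_len) {
        grown[old_len++] = ' ';
    }
    memcpy(grown + old_len, path, add_len + 1); /* copies the NUL too */
    *list = grown;
    return 0;
}
```

The caller owns the resulting string and releases it with free() once the list is no longer needed.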
