Keeping data small

When many applets are compiled into busybox, the rw data and
bss of all applets are concatenated, including those from libc
if busybox is built statically. When busybox is started, _all_ this
data is allocated, not just the part belonging to the selected applet.

What "allocated" means exactly depends on the arch.
On NOMMU it probably bites the most, since real RAM is
actually used for rwdata and bss. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep busybox friendly to NOMMU and small-mem systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures.

Small experiment to measure "parasitic" bbox memory consumption:
here we start 1000 "busybox sleep 10" in parallel.
The busybox binary is practically an allyesconfig static one,
built against uclibc. Run on an x86-64 machine with a 64-bit kernel:

bash-3.2# nmeter '%t %c %m %p %[pn]'
23:17:28 .......... 168M 0 147
23:17:29 .......... 168M 0 147
23:17:30 U......... 168M 1 147
23:17:31 SU........ 181M 244 391
23:17:32 SSSSUUU... 223M 757 1147
23:17:33 UUU....... 223M 0 1147
23:17:34 U......... 223M 1 1147
23:17:35 .......... 223M 0 1147
23:17:36 .......... 223M 0 1147
23:17:37 S......... 223M 0 1147
23:17:38 .......... 223M 1 1147
23:17:39 .......... 223M 0 1147
23:17:40 .......... 223M 0 1147
23:17:41 .......... 210M 0 906
23:17:42 .......... 168M 1 147
23:17:43 .......... 168M 0 147

The memory column rises from 168M to 223M while the process count
rises from 147 to 1147, so this requires 55M of memory. Thus 1 trivial
busybox applet takes 55k of memory on a 64-bit x86 kernel.

On a 32-bit kernel we need ~26k per applet.

Script:

i=1000; while test $i != 0; do
    echo -n .
    busybox sleep 30 &
    i=$((i - 1))
done
echo
wait

(Data from NOMMU arches are sought. Provide 'size busybox' output too)


Example 1

One example of how to reduce global data usage is in
archival/libunarchive/decompress_unzip.c:

/* This is somewhat complex-looking arrangement, but it allows
 * to place decompressor state either in bss or in
 * malloc'ed space simply by changing #defines below.
 * Sizes on i386:
 *   text    data     bss     dec     hex
 *   5256       0     108    5364    14f4 - bss
 *   4915       0       0    4915    1333 - malloc
 */
#define STATE_IN_BSS 0
#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
Required memory is allocated in unpack_gz_stream() [its main function]
and then passed down as a parameter to all subroutines which need
to access 'globals'.


Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order to not duplicate ptr_to_globals in every applet, you can
reuse a single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but struct globals itself is
NOT defined in libbb.h. You first define your own struct:

struct globals { int a; char buf[1000]; };

and then declare that ptr_to_globals is a pointer to it:

#define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the PTR_TO_GLOBALS macro:

PTR_TO_GLOBALS = xzalloc(sizeof(G));

Typically this is done in <applet>_main().

Now you can reference "globals" as G.a, G.buf and so on, in any function.


bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its own needs. Library functions are prohibited from using it.

The 'G.' trick can be done using bb_common_bufsiz1 instead of a malloced buffer:

#define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if globals fit into bb_common_bufsiz1.
Since bb_common_bufsiz1 is BUFSIZ + 1 bytes long and BUFSIZ can change
from one libc to another, you have to add a compile-time check for it:

if (sizeof(struct globals) > sizeof(bb_common_bufsiz1))
    BUG_<applet>_globals_too_big();


Drawbacks

You have to initialize it by hand. xzalloc() can be helpful in clearing
allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

#define dev_fd (G.dev_fd)
#define sector (G.sector)


Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


gcc's data alignment problem

The following attribute added in vi.c:

static int tabstop;
static struct termios term_orig __attribute__ ((aligned (4)));
static struct termios term_vi __attribute__ ((aligned (4)));

reduces bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. asm output diff for above example:

 tabstop:
        .zero   4
        .section        .bss.term_orig,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_orig, @object
        .size   term_orig, 60
 term_orig:
        .zero   60
        .section        .bss.term_vi,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_vi, @object
        .size   term_vi, 60

gcc doesn't seem to have options for altering this behaviour.

gcc 3.4.3 and 4.1.1 tested:
char c = 1;
// gcc aligns to 32 bytes if sizeof(struct) >= 32
struct {
    int a,b,c,d;
    int i1,i2,i3;
} s28 = { 1 }; // struct will be aligned to 4 bytes
struct {
    int a,b,c,d;
    int i1,i2,i3,i4;
} s32 = { 1 }; // struct will be aligned to 32 bytes
// same for arrays
char vc31[31] = { 1 }; // unaligned
char vc32[32] = { 1 }; // aligned to 32 bytes

-fpack-struct=1 reduces alignment of s28 to 1 (but probably
will break layout of many libc structs) but s32 and vc32
are still aligned to 32 bytes.


I will try to cook up a patch to add a gcc option for disabling it.
Meanwhile, this is where it can be disabled in gcc source:

gcc/config/i386/i386.c
int
ix86_data_alignment (tree type, int align)
{
#if 0
  if (AGGREGATE_TYPE_P (type)
      && TYPE_SIZE (type)
      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
      && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
          || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
    return 256;
#endif

Result (non-static busybox built against glibc):

# size /usr/srcdevel/bbox/fix/busybox.t0/busybox busybox
   text    data     bss     dec     hex filename
 634416    2736   23856  661008   a1610 busybox
 632580    2672   22944  658196   a0b14 busybox_noalign