source: branches/2.2.9/mindi-busybox/docs/keep_data_small.txt

Last change on this file was 3320, checked in by Bruno Cornec, 6 years ago
  • Re-add (thanks git BTW) the 2.2.9 branch which had been destroyed in the move to 3.0
  • Property svn:eol-style set to native
        Keeping data small

When many applets are compiled into busybox, the rw data and
bss of all applets are concatenated, including those from libc
if a static busybox is built. When busybox is started, _all_ this data
is allocated, not just the part for the selected applet.

What "allocated" means exactly depends on the arch.
On NOMMU it probably bites the most, actually using real
RAM for rwdata and bss. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep busybox friendly to NOMMU and small-mem systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures.

A small experiment to measure "parasitic" bbox memory consumption:
here we start 1000 "busybox sleep 10" processes in parallel.
The busybox binary is practically an allyesconfig static one,
built against uclibc. Run on an x86-64 machine with a 64-bit kernel:

bash-3.2# nmeter '%t %c %m %p %[pn]'
23:17:28 .......... 168M    0  147
23:17:29 .......... 168M    0  147
23:17:30 U......... 168M    1  147
23:17:31 SU........ 181M  244  391
23:17:32 SSSSUUU... 223M  757 1147
23:17:33 UUU....... 223M    0 1147
23:17:34 U......... 223M    1 1147
23:17:35 .......... 223M    0 1147
23:17:36 .......... 223M    0 1147
23:17:37 S......... 223M    0 1147
23:17:38 .......... 223M    1 1147
23:17:39 .......... 223M    0 1147
23:17:40 .......... 223M    0 1147
23:17:41 .......... 210M    0  906
23:17:42 .......... 168M    1  147
23:17:43 .......... 168M    0  147

The 1000 extra processes require 55M of memory (223M - 168M).
Thus one trivial busybox applet takes 55k of memory on a 64-bit x86 kernel.

On a 32-bit kernel we need ~26k per applet.

Script:

i=1000; while test $i != 0; do
        echo -n .
        busybox sleep 30 &
        i=$((i - 1))
done
echo
wait

(Data from NOMMU arches are sought. Provide 'size busybox' output too)


        Example 1

One example of how to reduce global data usage is in
archival/libarchive/decompress_unzip.c:

/* This is somewhat complex-looking arrangement, but it allows
 * to place decompressor state either in bss or in
 * malloc'ed space simply by changing #defines below.
 * Sizes on i386:
 * text    data     bss     dec     hex
 * 5256       0     108    5364    14f4 - bss
 * 4915       0       0    4915    1333 - malloc
 */
#define STATE_IN_BSS 0
#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
The required memory is allocated in unpack_gz_stream() [its main function]
and then passed down as a parameter to all subroutines which need
to access 'globals'.


        Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order not to duplicate ptr_to_globals in every applet, you can
reuse a single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but struct globals itself is
NOT defined in libbb.h. You first define your own struct:

struct globals { int a; char buf[1000]; };

and then define a shorthand for the storage that ptr_to_globals points to:

#define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the SET_PTR_TO_GLOBALS macro:

    SET_PTR_TO_GLOBALS(xzalloc(sizeof(G)));

Typically this is done in <applet>_main().

Now you can reference "globals" as G.a, G.buf and so on, in any function.


        bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
older mechanism for reducing bss usage. Each applet can use it for
its needs. Library functions are prohibited from using it.

The 'G.' trick can also be done using bb_common_bufsiz1 instead of
a malloced buffer:

#define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if globals fit into bb_common_bufsiz1.
Since bb_common_bufsiz1 is BUFSIZ + 1 bytes long and BUFSIZ can change
from one libc to another, you have to add a compile-time check for it.
BUG_<applet>_globals_too_big() is deliberately left undefined, so if
the globals do not fit, the call is not optimized away and the link fails:

if (sizeof(struct globals) > sizeof(bb_common_bufsiz1))
    BUG_<applet>_globals_too_big();


        Drawbacks

You have to initialize the globals by hand. xzalloc() helps by clearing
the allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

#define dev_fd (G.dev_fd)
#define sector (G.sector)


        Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


        Finding non-shared duplicated strings

strings busybox | sort | uniq -c | sort -nr


        gcc's data alignment problem

Adding the following attribute in vi.c:

static int tabstop;
static struct termios term_orig __attribute__ ((aligned (4)));
static struct termios term_vi __attribute__ ((aligned (4)));

reduces bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. asm output diff for the above example:

 tabstop:
        .zero   4
        .section        .bss.term_orig,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_orig, @object
        .size   term_orig, 60
 term_orig:
        .zero   60
        .section        .bss.term_vi,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_vi, @object
        .size   term_vi, 60

gcc doesn't seem to have options for altering this behaviour.

Tested with gcc 3.4.3 and 4.1.1:
char c = 1;
// gcc aligns to 32 bytes if sizeof(struct) >= 32
struct {
    int a,b,c,d;
    int i1,i2,i3;
} s28 = { 1 };    // struct will be aligned to 4 bytes
struct {
    int a,b,c,d;
    int i1,i2,i3,i4;
} s32 = { 1 };    // struct will be aligned to 32 bytes
// same for arrays
char vc31[31] = { 1 }; // unaligned
char vc32[32] = { 1 }; // aligned to 32 bytes

-fpack-struct=1 reduces alignment of s28 to 1 (but probably
will break layout of many libc structs), but s32 and vc32
are still aligned to 32 bytes.

I will try to cook up a patch to add a gcc option for disabling it.
Meanwhile, this is where it can be disabled in gcc source:

gcc/config/i386/i386.c
int
ix86_data_alignment (tree type, int align)
{
#if 0
  if (AGGREGATE_TYPE_P (type)
       && TYPE_SIZE (type)
       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
       && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
           || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
    return 256;
#endif

Result (non-static busybox built against glibc):

# size /usr/srcdevel/bbox/fix/busybox.t0/busybox busybox
   text    data     bss     dec     hex filename
 634416    2736   23856  661008   a1610 busybox
 632580    2672   22944  658196   a0b14 busybox_noalign



        Keeping code small

Set CONFIG_EXTRA_CFLAGS="-fno-inline-functions-called-once",
run "make bloatcheck", and look at the biggest auto-inlined functions.
Now, set CONFIG_EXTRA_CFLAGS back to "", but add NOINLINE
to some of these functions. In the 1.16.x timeframe, the results were
(annotated "make bloatcheck" output):

function             old     new   delta
expand_vars_to_list    -    1712   +1712 win
lzo1x_optimize         -    1429   +1429 win
arith_apply            -    1326   +1326 win
read_interfaces        -    1163   +1163 loss, leave w/o NOINLINE
logdir_open            -    1148   +1148 win
check_deps             -    1148   +1148 loss
rewrite                -    1039   +1039 win
run_pipe             358    1396   +1038 win
write_status_file      -    1029   +1029 almost the same, leave w/o NOINLINE
dump_identity          -     987    +987 win
mainQSort3             -     921    +921 win
parse_one_line         -     916    +916 loss
summarize              -     897    +897 almost the same
do_shm                 -     884    +884 win
cpio_o                 -     863    +863 win
subCommand             -     841    +841 loss
receive                -     834    +834 loss

855 bytes saved in total.

scripts/mkdiff_obj_bloat may be useful to automate this process: run
"scripts/mkdiff_obj_bloat NORMALLY_BUILT_TREE FORCED_NOINLINE_TREE"
and select modules which shrank.