source: MondoRescue/branches/3.0/mindi-busybox/docs/unicode.txt@ 3119

Last change on this file since 3119 was 2725, checked in by Bruno Cornec, 14 years ago
Update mindi-busybox to 1.18.3 to avoid problems with the tar command which is now failing on recent versions with busybox 1.7.3
Property svn:eol-style set to `native`
File size: 2.3 KB

Unicode support in busybox

There are several scenarios where we need to handle unicode
correctly.

Shell input

We want to correctly handle input of unicode characters.
There are several problems with it. Just handling input
as a sequence of bytes would break any editing. This was fixed,
and lineedit now operates on an array of wchar_t's.
But we also need to handle the following problem areas:

* It is unreasonable to expect that the output device supports
  _any_ unicode chars. Perhaps we need to avoid printing
  those chars which are not supported by the output device.
  Examples: chars which are not present in the font,
  chars which are not assigned in unicode,
  combining chars (especially trying to combine bad pairs:
  a_chinese_symbol + "combining grave accent" = ??!)

* We need to account for the fact that unicode chars have
  different widths: 0 for combining chars, 1 for usual,
  2 for ideograms (are there 3+ wide chars?).

* Bidirectional handling. If the user wants to echo a phrase
  in Hebrew, he types: echo "srettel werbeH"
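
The width rule from the list above (0 for combining chars, 1 for
usual, 2 for ideograms) could be sketched in C roughly as follows.
This is an illustration only, not busybox's code: sketch_width() and
sketch_line_width() are hypothetical names, and the ranges are a small
hand-picked subset of what a real width table would contain. Input is
assumed to be already decoded into code points, as in lineedit's
wchar_t array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper with a deliberately tiny table.
 * Returns -1 for control characters, 0 for NUL and a few combining
 * marks, 2 for a few East Asian wide ranges, 1 for everything else. */
static int sketch_width(uint32_t c)
{
	if (c == 0)
		return 0;
	if (c < 0x20 || (c >= 0x7f && c < 0xa0))
		return -1;                      /* C0/C1 controls, DEL */
	if (c >= 0x0300 && c <= 0x036f)
		return 0;                       /* Combining Diacritical Marks */
	if ((c >= 0x1100 && c <= 0x115f)    /* Hangul Jamo initial consonants */
	 || (c >= 0x4e00 && c <= 0x9fff)    /* CJK Unified Ideographs */
	 || (c >= 0xac00 && c <= 0xd7a3)    /* Hangul syllables */
	 || (c >= 0xff01 && c <= 0xff60))   /* Fullwidth forms */
		return 2;
	return 1;
}

/* Screen columns needed for n code points. Unprintable ones count
 * as 1, assuming the caller substitutes a placeholder such as '?'. */
static int sketch_line_width(const uint32_t *s, size_t n)
{
	int cols = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int w = sketch_width(s[i]);
		cols += (w < 0) ? 1 : w;
	}
	return cols;
}
```

With this table, 'a' followed by U+0300 (combining grave accent) and
U+4E2D (a CJK ideograph) needs 1 + 0 + 2 = 3 columns.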

Editors (vi, ed)

This case is a bit similar to "shell input", but unlike the shell,
editors may encounter many more unexpected unicode sequences
(try to load a random binary file...), and they need to preserve
them, unlike the shell, which can afford to drop bogus input.
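
One possible way to preserve bogus input, sketched here under
assumptions and not taken from busybox: decode UTF-8 strictly, and map
each undecodable byte b to the otherwise unused code point U+DC00 + b,
so that writing the buffer back reproduces the original bytes. This is
the same idea as Python's "surrogateescape" error handler. The names
sketch_decode() and sketch_encode() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one character from s[0..n-1] (n > 0) into *out.
 * Returns the number of bytes consumed. An invalid, overlong or
 * truncated sequence consumes 1 byte and yields 0xDC00 + that byte. */
static size_t sketch_decode(const unsigned char *s, size_t n, uint32_t *out)
{
	unsigned char b = s[0];
	size_t len, i;
	uint32_t c, min;

	if (b < 0x80) {
		*out = b;
		return 1;
	}
	if (b >= 0xc2 && b <= 0xdf)      { len = 2; c = b & 0x1f; min = 0x80; }
	else if ((b & 0xf0) == 0xe0)     { len = 3; c = b & 0x0f; min = 0x800; }
	else if (b >= 0xf0 && b <= 0xf4) { len = 4; c = b & 0x07; min = 0x10000; }
	else
		goto bad;
	if (len > n)
		goto bad;                       /* truncated at end of buffer */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)
			goto bad;                   /* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3f);
	}
	if (c < min || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
		goto bad;                       /* overlong, too large, surrogate */
	*out = c;
	return len;
bad:
	*out = 0xdc00 + b;                  /* b >= 0x80 here: U+DC80..U+DCFF */
	return 1;
}

/* Inverse of sketch_decode(): writes up to 4 bytes, returns the count. */
static size_t sketch_encode(uint32_t c, unsigned char *out)
{
	if (c >= 0xdc80 && c <= 0xdcff) {   /* escaped raw byte */
		out[0] = (unsigned char)(c - 0xdc00);
		return 1;
	}
	if (c < 0x80) {
		out[0] = (unsigned char)c;
		return 1;
	}
	if (c < 0x800) {
		out[0] = (unsigned char)(0xc0 | (c >> 6));
		out[1] = (unsigned char)(0x80 | (c & 0x3f));
		return 2;
	}
	if (c < 0x10000) {
		out[0] = (unsigned char)(0xe0 | (c >> 12));
		out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
		out[2] = (unsigned char)(0x80 | (c & 0x3f));
		return 3;
	}
	out[0] = (unsigned char)(0xf0 | (c >> 18));
	out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3f));
	out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
	out[3] = (unsigned char)(0x80 | (c & 0x3f));
	return 4;
}
```

Decoding a buffer with sketch_decode() and re-encoding every code
point with sketch_encode() gives back the input byte for byte, even
for a random binary file.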

more, less

Need to correctly display any input file. Ideally, with an
ASCII/unicode/filtered_unicode option or keyboard switch.
Note: need to handle tabs and backspaces specially
(bksp is for manpage compat).
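
To illustrate why tabs and backspaces need special handling in a
pager: a tab advances to the next tab stop rather than by one column,
and formatted man pages overstrike characters with backspace (for
example "N", backspace, "N" for a bold N), which moves the cursor
back. A minimal sketch of the column bookkeeping follows. The name
sketch_end_column() and the tab stop of 8 are assumptions, and every
other byte is counted as one column for brevity.

```c
#include <stddef.h>

#define SKETCH_TABSTOP 8    /* assumed tab stop interval */

/* Returns the cursor column after printing s[0..n-1] from column 0. */
static int sketch_end_column(const char *s, size_t n)
{
	int col = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i] == '\t') {
			/* jump to the next multiple of the tab stop */
			col += SKETCH_TABSTOP - col % SKETCH_TABSTOP;
		} else if (s[i] == '\b') {
			if (col > 0)
				col--;              /* overstrike: step back one column */
		} else {
			col++;                  /* a real pager would add the char width */
		}
	}
	return col;
}
```

Here "ab\tc" ends in column 9, while the overstruck "N\bNA\bA" ends
in column 2, the same as plain "NA".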

cut, fold, watch

May need the ability to cut a unicode string to a specified number
of wchars and/or to a specified screen width. Need to handle tabs
specially.
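
The difference between "number of wchars" and "screen width" can be
shown with a short sketch. It is not busybox's code: toy_width() is a
hypothetical stand-in for a real width table, sketch_fit() is a
hypothetical name, and tabs are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy width rule for this sketch only: combining diacritics are
 * 0 wide, CJK unified ideographs 2 wide, everything else 1 wide. */
static int toy_width(uint32_t c)
{
	if (c >= 0x0300 && c <= 0x036f)
		return 0;
	if (c >= 0x4e00 && c <= 0x9fff)
		return 2;
	return 1;
}

/* Returns how many leading code points of s[0..n-1] fit into
 * max_cols screen columns. A zero-width char that follows a char
 * which still fits is kept together with it. */
static size_t sketch_fit(const uint32_t *s, size_t n, int max_cols)
{
	int cols = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int w = toy_width(s[i]);
		if (cols + w > max_cols)
			break;                  /* next char would overflow */
		cols += w;
	}
	return i;
}
```

For two ideographs followed by 'a', a limit of 3 columns keeps only
the first ideograph (1 wchar), whereas a limit of 3 wchars would keep
all three and occupy 5 columns.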

sed, awk, grep

Handle unicode-aware regexp match.

ls (multi-column display)

ls will fail to line up columnar output if it does not account
for character widths (and maybe filter out some of them, see
above). OTOH, non-columnar views (ls -1, ls -l, ls | cat)
should NOT filter out bad unicode (but need to filter out
control chars; coreutils does that). Note that unlike more/less,
tabs and backspaces need no special handling.
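
The filtering for non-columnar views might look like the sketch
below: control chars are replaced, while bytes >= 0x80 pass through
untouched so that bad unicode is not filtered. The name
sketch_mask_controls() and the '?' replacement are assumptions.

```c
#include <stddef.h>

/* Replace ASCII control bytes (0x01..0x1f and 0x7f) in a
 * NUL-terminated name with '?', in place. Bytes >= 0x80 are left
 * alone: they may be part of valid or invalid UTF-8, and this view
 * is not supposed to filter those. */
static void sketch_mask_controls(char *s)
{
	for (; *s; s++) {
		unsigned char b = (unsigned char)*s;
		if (b < 0x20 || b == 0x7f)
			*s = '?';
	}
}
```

A name containing a tab and an escape sequence, such as
"a\tb\033[31m", comes out as "a?b?[31m", so it can no longer change
terminal colors or move the cursor.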

top, ps

Need to perform filtering similar to ls.

Filename display (in error messages and elsewhere)

Need to perform filtering similar to ls.


TODO: write an email to Asmus Freytag (asmus@unicode.org),
author of http://unicode.org/reports/tr11/
