charmap(4)
NAME
charmap - Defines character symbols as character encodings
DESCRIPTION
The character set description (charmap) file defines character symbols as
character encodings. This file is the source file for a coded character
set, or codeset. All supported codesets have the Portable Character Set
(PCS) as a proper subset. The PCS consists of the following character
symbols (listed by their standardized symbolic names) and hexadecimal
encodings:
__________________________________________
Symbol Name          Hexadecimal Encoding
__________________________________________
<NUL> \x00
<SOH> \x01
<STX> \x02
<ETX> \x03
<EOT> \x04
<ENQ> \x05
<ACK> \x06
<alert> \x07
<backspace> \x08
<tab> \x09
<newline> \x0A
<vertical-tab> \x0B
<form-feed> \x0C
<carriage-return> \x0D
<SO> \x0E
<SI> \x0F
<DLE> \x10
<DC1> \x11
<DC2> \x12
<DC3> \x13
<DC4> \x14
<NAK> \x15
<SYN> \x16
<ETB> \x17
<CAN> \x18
<EM> \x19
<SUB> \x1A
<ESC> \x1B
<IS4> \x1C
<IS3> \x1D
<IS2> \x1E
<IS1> \x1F
<space> \x20
<exclamation-mark> \x21
<quotation-mark> \x22
<number-sign> \x23
<dollar-sign> \x24
<percent> \x25
<ampersand> \x26
<apostrophe> \x27
<left-parenthesis> \x28
<right-parenthesis> \x29
<asterisk> \x2A
<plus-sign> \x2B
<comma> \x2C
<hyphen> \x2D
<period> \x2E
<slash> \x2F
<zero> \x30
<one> \x31
<two> \x32
<three> \x33
<four> \x34
<five> \x35
<six> \x36
<seven> \x37
<eight> \x38
<nine> \x39
<colon> \x3A
<semi-colon> \x3B
<less-than> \x3C
<equal-sign> \x3D
<greater-than> \x3E
<question-mark> \x3F
<commercial-at> \x40
<A> \x41
<B> \x42
<C> \x43
<D> \x44
<E> \x45
<F> \x46
<G> \x47
<H> \x48
<I> \x49
<J> \x4A
<K> \x4B
<L> \x4C
<M> \x4D
<N> \x4E
<O> \x4F
<P> \x50
<Q> \x51
<R> \x52
<S> \x53
<T> \x54
<U> \x55
<V> \x56
<W> \x57
<X> \x58
<Y> \x59
<Z> \x5A
<left-bracket> \x5B
<backslash> \x5C
<right-bracket> \x5D
<circumflex> \x5E
<underscore> \x5F
<grave-accent> \x60
<a> \x61
<b> \x62
<c> \x63
<d> \x64
<e> \x65
<f> \x66
<g> \x67
<h> \x68
<i> \x69
<j> \x6A
<k> \x6B
<l> \x6C
<m> \x6D
<n> \x6E
<o> \x6F
<p> \x70
<q> \x71
<r> \x72
<s> \x73
<t> \x74
<u> \x75
<v> \x76
<w> \x77
<x> \x78
<y> \x79
<z> \x7A
<left-brace> \x7B
<vertical-line> \x7C
<right-brace> \x7D
<tilde> \x7E
<DEL> \x7F
__________________________________________
The charmap file has the following components:
· An optional special symbolic name declarations section
Each declaration in this section consists of a special symbolic name,
followed by one or more space or tab characters, and a value. The
following list describes the special symbolic names that you can
include in the declarations section:
<code_set_name>
Specifies the name of the codeset for which the charmap file is
defined. This value determines the value returned by the
nl_langinfo(CODESET) subroutine. If <code_set_name> is not
declared, the name for the Portable Character Set is used.
<mb_cur_max>
Specifies the maximum number of bytes in a character for the
codeset. Valid values are 1 to 4. The default value is 1.
<mb_cur_min>
Specifies the minimum number of bytes in a character for the
codeset. Since all supported codesets have the Portable Character
Set as a proper subset, this value must be 1.
<escape_char>
Specifies the escape character that indicates encodings in
hexadecimal or octal notation. The default value is a \
(backslash).
<comment_char>
Specifies the character used to indicate a comment within a
charmap file. The default value is a # (number sign).
· The CHARMAP section header
This header marks the beginning of the section that associates
character symbols with encodings.
· Mapping statements for characters in the codeset
Each statement lists a symbolic name for a character and its
associated encoding. The format of a mapping statement is:
<char_symbol> encoding
A symbolic name begins with the < (left-angle bracket) character and
ends with the > (right-angle bracket) character. The characters for
char_symbol (between < and >) can be any characters from the Portable
Character Set, except for control and space characters. The right-angle
bracket (>) can also occur within char_symbol, not only in the last
position of the name. You must precede every > character except the
last one with the escape character (as specified by the <escape_char>
special symbolic name).
An encoding is specified as one or more character constants, with the
maximum number of character constants specified by the <mb_cur_max>
special symbolic name. The encoding may be listed as decimal, octal,
or hexadecimal constants with the following formats:
Hexadecimal constant
\xhh, where h is a hexadecimal digit
Octal constant
\ooo or \oo, where o is an octal digit
Decimal constant
\dnnn or \dnn, where n is a decimal digit
Some examples of character symbol definitions are the following:
<A> \d65 #decimal constant
<B> \x42 #hexadecimal constant
<j10101> \x81\xA1 #multiple hexadecimal constants
A range of symbolic names and corresponding encoded values may also be
defined, where the nonnumeric prefix for each symbolic name is common,
and the numeric portion of the second symbolic name is equal to or
greater than the numeric portion of the first symbolic name. In this
format, a symbolic name value consists of zero or more nonnumeric
characters followed by an integer of one or more decimal digits.
This format defines a series of symbolic names. For example, the
string <j0101>...<j0104> is interpreted as the <j0101>, <j0102>,
<j0103>, and <j0104> symbolic names, in that order.
In statements defining ranges of symbolic names, the encoded value
listed is the value for the first symbolic name in the range.
Subsequent symbolic names have encoded values in increasing order.
For example:
<j0101>...<j0104> \d129\d254
The preceding statement is interpreted as follows:
<j0101> \d129\d254
<j0102> \d129\d255
<j0103> \d130\d0
<j0104> \d130\d1
Although you cannot assign multiple encodings to one symbolic name,
you can create multiple names for one encoded value. This is allowed
because some characters have several common names. For example, the
"." character is called a period in some parts of the world, and a
full stop in others. Both names may appear in the charmap. For
example:
<period> \x2e
<full-stop> \x2e
If used, comments must begin with the character specified by the
<comment_char> special symbolic name. When an entire line is a
comment, you must specify <comment_char> in the first column of the
line.
· The END CHARMAP trailer
This entry denotes the end of character map statements.
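The encoding formats and the range notation described above can be sketched in Python as follows. This is an illustration only, not part of localedef or any system library: the function names parse_encoding and expand_range are hypothetical, the default escape character (\) is assumed, and the encodings in a range are assumed to increase as one big-endian number, which matches the carry from \d129\d255 to \d130\d0 in the example above.

```python
import re

# One character constant, assuming the default escape character (backslash):
#   \xhh           hexadecimal
#   \dnn or \dnnn  decimal
#   \oo or \ooo    octal
_CONSTANT = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|d([0-9]{2,3})|([0-7]{2,3}))")

# <prefixNNN>...<prefixMMM>: nonnumeric prefix followed by decimal digits.
_RANGE = re.compile(r"<([^0-9>]*)([0-9]+)>\.\.\.<([^0-9>]*)([0-9]+)>")


def parse_encoding(text):
    """Convert an encoding such as r'\\x81\\xA1' into a bytes object."""
    out = bytearray()
    pos = 0
    while pos < len(text):
        m = _CONSTANT.match(text, pos)
        if m is None:
            raise ValueError(f"bad character constant at offset {pos}: {text!r}")
        hex_digits, dec_digits, oct_digits = m.groups()
        if hex_digits is not None:
            value = int(hex_digits, 16)
        elif dec_digits is not None:
            value = int(dec_digits, 10)
        else:
            value = int(oct_digits, 8)
        if value > 0xFF:
            raise ValueError(f"constant does not fit in one byte: {m.group(0)}")
        out.append(value)
        pos = m.end()
    if not out:
        raise ValueError("empty encoding")
    return bytes(out)


def expand_range(names, encoding):
    """Expand '<j0101>...<j0104>' into a list of (symbolic name, bytes) pairs."""
    m = _RANGE.fullmatch(names)
    if m is None:
        raise ValueError(f"not a symbolic name range: {names!r}")
    prefix, first, prefix2, last = m.groups()
    if prefix != prefix2:
        raise ValueError("both names in a range must share the same prefix")
    if int(last) < int(first):
        raise ValueError("second name must not be numerically lower than the first")
    first_bytes = parse_encoding(encoding)
    # Assumption: the encoding is incremented as a single big-endian integer.
    start = int.from_bytes(first_bytes, "big")
    width = len(first)  # keep leading zeros, e.g. 0101 -> 0102
    pairs = []
    for offset, number in enumerate(range(int(first), int(last) + 1)):
        name = f"<{prefix}{number:0{width}d}>"
        pairs.append((name, (start + offset).to_bytes(len(first_bytes), "big")))
    return pairs
```

For instance, expand_range("<j0101>...<j0104>", r"\d129\d254") yields four pairs whose byte values are 129 254, 129 255, 130 0, and 130 1, in that order. The sketch does not check the number of constants against <mb_cur_max>.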
The following example is a portion of a possible charmap file:
<code_set_name> "ISO8859-1"
<mb_cur_max> 1
<mb_cur_min> 1
<escape_char> \
<comment_char> #
CHARMAP
<NUL> \x00
<SOH> \x01
<STX> \x02
<ETX> \x03
<EOT> \x04
<ENQ> \x05
<ACK> \x06
<alert> \x07
<backspace> \x08
<tab> \x09
<newline> \x0a
<vertical-tab> \x0b
<form-feed> \x0c
<carriage-return> \x0d
END CHARMAP
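As a rough illustration of how the components of a charmap file fit together, the following Python sketch reads charmap source text into two dictionaries. It is not how localedef is implemented: parse_charmap and SPECIAL_NAMES are hypothetical names, encodings are kept as written rather than converted to bytes, ranges of symbolic names are not expanded, and escaped > characters inside symbolic names are not handled.

```python
# The five special symbolic names described in this reference page.
SPECIAL_NAMES = {
    "<code_set_name>",
    "<mb_cur_max>",
    "<mb_cur_min>",
    "<escape_char>",
    "<comment_char>",
}


def parse_charmap(text):
    """Return (declarations, mappings) parsed from charmap source text."""
    # Documented defaults; <code_set_name> has no default in this sketch.
    declarations = {
        "<mb_cur_max>": "1",
        "<escape_char>": "\\",
        "<comment_char>": "#",
    }
    mappings = {}
    for raw_line in text.splitlines():
        # Leading blanks are tolerated here, although the format expects a
        # full-line comment to start in the first column.
        line = raw_line.strip()
        if not line or line.startswith(declarations["<comment_char>"]):
            continue
        if line == "CHARMAP":
            continue  # section header: mapping statements follow
        if line == "END CHARMAP":
            break  # trailer: nothing after it is read
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("<"):
            raise ValueError(f"unrecognized line: {raw_line!r}")
        name, value = fields[0], fields[1]  # any trailing comment is dropped
        if name in SPECIAL_NAMES:
            declarations[name] = value.strip('"')
        else:
            mappings[name] = value
    return declarations, mappings
```

Special symbolic names are recognized wherever they appear, so the sketch accepts declarations on either side of the CHARMAP header. Because several names may share one encoded value, the mappings dictionary is keyed by symbolic name rather than by encoding.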
FILES
/usr/lib/nls/loc/charmaps/*
Character set description (charmap) source files for supported locales.
The /usr/lib/nls/loc/charmaps directory does not exist when source
files for installed locales are not provided.
SEE ALSO
Commands: locale(1), localedef(1)
Files: locale(4)