diff --git a/doc/Text Formatting.html b/doc/Text Formatting.html
deleted file mode 100644
index e20eae36..00000000
--- a/doc/Text Formatting.html
+++ /dev/null
@@ -1,886 +0,0 @@
-2016-08-19
-Victor Zverovich, victor.zverovich@gmail.com
-This paper proposes a new text formatting functionality that can be used as a
-safe and extensible alternative to the printf family of functions.
-It is intended to complement the existing C++ I/O streams library and reuse
-some of its infrastructure, such as overloaded insertion operators for
-user-defined types.
-
-Example:
-std::string message = std::format("The answer is {}.", 42);
-
-
-
-Variations of the printf format string syntax are arguably the most popular
-among programming languages, and C++ itself inherits printf from C [1].
-The advantage of the printf syntax is that many programmers are familiar
-with it. However, in its current form it has a number of issues:
-
-* Length modifiers such as hh, h, l, j, etc. are used only to convey type
-  information. They are redundant in type-safe formatting and would
-  unnecessarily complicate specification and parsing.
-* Using '%' in a custom format specifier, e.g. for put_time-like time
-  formatting, poses difficulties.
-
-Although it is possible to address these issues, doing so would break
-compatibility and could be more confusing to users than introducing a
-different syntax.
- -
-Therefore we propose a new syntax based on the ones used in Python
-[3], the .NET family of languages [4], and Rust [5]. This syntax
-employs '{' and '}' as replacement field delimiters instead of '%',
-and it is described in detail in the syntax reference.
-Here are some of the advantages:
-
-The syntax is expressive enough to enable translation, possibly automated,
-of most printf format strings. The correspondence between printf
-and the new syntax is given in the following table.
-
printf | new |
---|---|
- | < |
+ | + |
space | space |
# | # |
0 | 0 |
hh | unused |
h | unused |
l | unused |
ll | unused |
j | unused |
z | unused |
t | unused |
L | unused |
c | c (optional) |
s | s (optional) |
d | d (optional) |
i | d (optional) |
o | o |
x | x |
X | X |
u | d (optional) |
f | f |
F | F |
e | e |
E | E |
a | a |
A | A |
g | g (optional) |
G | G |
n | unused |
p | p (optional) |
-Width and precision are represented similarly in printf and the
-proposed syntax, the only difference being that a runtime value is
-specified by '*' in the former and by '{}' in the latter, possibly with
-the index of the argument inside the braces.
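Because the correspondence table and the width/precision rule are mechanical, translating a single conversion can be sketched in C++17. The function translate_printf_spec below is hypothetical and not part of the proposal. It ignores positional conversions and "%%". It relies on the default alignments given in the format specification section (left for strings and characters, right for numbers), which is why a right-aligned "%10s" needs an explicit '>'.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the proposal): translates one printf
// conversion specification such as "%-10d" into an equivalent replacement
// field of the proposed syntax such as "{:<10d}".
std::string translate_printf_spec(const std::string& spec) {
  if (spec.size() < 2 || spec[0] != '%')
    throw std::invalid_argument("expected a conversion specification");
  std::size_t i = 1;

  // printf flags may appear in any order, so collect them first.
  bool left = false, plus = false, space = false, alt = false, zero = false;
  for (; i < spec.size(); ++i) {
    char c = spec[i];
    if (c == '-') left = true;
    else if (c == '+') plus = true;
    else if (c == ' ') space = true;
    else if (c == '#') alt = true;
    else if (c == '0') zero = true;
    else break;
  }

  auto is_digit = [&](std::size_t k) {
    return k < spec.size() && std::isdigit(static_cast<unsigned char>(spec[k]));
  };

  // Width: digits, or '*' for a runtime value, which becomes a nested "{}".
  std::string width;
  if (i < spec.size() && spec[i] == '*') {
    width = "{}";
    ++i;
  } else {
    while (is_digit(i)) width += spec[i++];
  }

  // Precision: '.' followed by digits or '*'.
  std::string precision;
  if (i < spec.size() && spec[i] == '.') {
    ++i;
    if (i < spec.size() && spec[i] == '*') {
      precision = ".{}";
      ++i;
    } else {
      precision = ".";
      while (is_digit(i)) precision += spec[i++];
      if (precision == ".") precision = ".0";  // "%.f" means precision 0
    }
  }

  // Length modifiers only convey type information, which the new syntax
  // takes from the argument itself, so they are dropped.
  while (i < spec.size() && std::string("hljztL").find(spec[i]) != std::string::npos) ++i;

  if (i + 1 != spec.size())
    throw std::invalid_argument("expected exactly one conversion specifier");
  char type = spec[i];
  if (type == 'i' || type == 'u') type = 'd';  // as in the table above
  if (std::string("csdoxXfFeEaAgGp").find(type) == std::string::npos)
    throw std::invalid_argument("conversion has no equivalent in the new syntax");

  std::string out;
  if (left) {
    out += '<';
  } else if (!width.empty() && (type == 's' || type == 'c')) {
    // printf right-aligns everything by default, but the proposed syntax
    // left-aligns strings and characters by default, so make it explicit.
    out += '>';
  }
  if (plus) out += '+';  // in printf '+' overrides ' '
  else if (space) out += ' ';
  if (alt) out += '#';
  if (zero && !left) out += '0';  // printf ignores '0' when '-' is present
  out += width + precision + type;
  return "{:" + out + "}";
}
```

For example, "%-10d" becomes "{:<10d}" and "%*d" becomes "{:{}d}".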
-
-As can be seen from the table above, most of the specifiers remain the same,
-which simplifies migration from printf. A notable difference is in the
-alignment specification. The proposed syntax allows left, center, and right
-alignment, represented by '<', '^', and '>' respectively, which is more
-expressive than the corresponding printf syntax: the latter supports only
-left and right (the default) alignment.
-
-The following example uses center alignment and '*' as a fill
-character:
-
-std::format("{:*^30}", "centered");
-
-resulting in "***********centered***********".
-The same formatting cannot be easily achieved with printf.
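The fill and alignment step itself is plain padding. The following minimal sketch uses a hypothetical pad helper. It assumes something the text above does not specify: when the padding for '^' cannot be split evenly, the extra fill character goes on the right. With width 30 the split is even, which gives 11 '*' characters on each side as in the result above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: pads `text` to at least `width` characters with `fill`,
// using the alignment options '<' (left), '>' (right) or '^' (center).
// Width is counted in chars, which is only accurate for single-byte text.
std::string pad(const std::string& text, char fill, char align, std::size_t width) {
  if (text.size() >= width) return text;  // width is a minimum; never truncate
  std::size_t padding = width - text.size();
  std::size_t left = 0;
  switch (align) {
    case '<': left = 0; break;
    case '>': left = padding; break;
    case '^': left = padding / 2; break;  // assumption: odd extra fill goes right
    default: throw std::invalid_argument("unsupported alignment option");
  }
  return std::string(left, fill) + text + std::string(padding - left, fill);
}
```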
-
-Both the format string syntax and the API are designed with extensibility in
-mind. The mini-language can be extended for user-defined types, and users can
-provide functions that do parsing and formatting for such types.
-
-The general syntax of a replacement field in a format string is
-replacement-field ::= '{' [arg-id] [':' format-spec] '}'
-
-
-
-where format-spec is predefined for built-in types but can be
-customized for user-defined types. For example, the syntax can be extended
-for put_time-like date and time formatting:
-
-std::time_t t = std::time(nullptr);
-std::string date = std::format("The date is {0:%Y-%m-%d}.", *std::localtime(&t));
-
-
-by providing an overload of std::format_arg for std::tm:
TODO: example
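The format_arg overload is still a TODO above. To illustrate only the idea, the sketch below uses a different, hypothetical extension point: a formatter class template, not std::format_arg, whose specialization for std::tm hands the format-spec to std::strftime. The make_date and format_field helpers are also illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <stdexcept>
#include <string>

// Hypothetical extension point, NOT the proposal's std::format_arg: a type
// opts in to custom formatting by specializing `formatter`.
template <class T>
struct formatter;

// For std::tm the whole format-spec is treated as a strftime pattern, so a
// field such as "{0:%Y-%m-%d}" would pass "%Y-%m-%d" through unchanged.
template <>
struct formatter<std::tm> {
  static std::string format(const std::tm& value, const std::string& spec) {
    char buf[256];
    std::size_t n = std::strftime(buf, sizeof buf, spec.c_str(), &value);
    // strftime returns 0 both on overflow and for empty output; this sketch
    // treats a zero result for a non-empty pattern as an error.
    if (n == 0 && !spec.empty()) throw std::runtime_error("strftime failed");
    return std::string(buf, n);
  }
};

// Hypothetical helper a formatting function could call for each replacement
// field once its arg-id and format-spec have been parsed.
template <class T>
std::string format_field(const T& value, const std::string& spec) {
  return formatter<T>::format(value, spec);
}

// Hypothetical helper that builds a calendar date (month is 1-based here).
std::tm make_date(int year, int month, int day) {
  std::tm t{};
  t.tm_year = year - 1900;
  t.tm_mon = month - 1;
  t.tm_mday = day;
  return t;
}
```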
- -
-Formatting functions rely on variadic templates instead of the mechanism
-provided by <cstdarg>. The type information is captured automatically and
-passed to formatters, guaranteeing type safety and making many of the printf
-specifiers redundant (see Format String Syntax). Buffer management is also
-automatic, which prevents the buffer overflow errors common with the printf
-family.
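The following C++17 sketch shows what automatic type capture means in practice. The compiler deduces the type of each argument and maps it to a tag, so 42 and 42L need no distinct specifiers the way %d and %ld do. The arg_type tags and capture_types are hypothetical. They only hint at what a type-erased argument list such as format_args (still a TODO in this paper) might record.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical type tags; a real implementation would distinguish more types
// (for example bool and char, which are simply integral here).
enum class arg_type { integer, floating_point, string, other };

template <class T>
constexpr arg_type type_of() {
  using U = std::decay_t<T>;  // arrays (string literals) decay to pointers
  if constexpr (std::is_integral_v<U>) return arg_type::integer;
  else if constexpr (std::is_floating_point_v<U>) return arg_type::floating_point;
  else if constexpr (std::is_same_v<U, std::string> || std::is_same_v<U, const char*> ||
                     std::is_same_v<U, char*>)
    return arg_type::string;
  else return arg_type::other;
}

// Args is deduced from the call, so the type of every argument is recorded
// without any type information in the format string.
template <class... Args>
constexpr std::array<arg_type, sizeof...(Args)> capture_types(const Args&...) {
  return {type_of<Args>()...};
}
```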
-
-As pointed out in P0067R1: Elementary string conversions, there are a
-number of use cases that do not require internationalization support but do
-require high throughput when produced by a server. These include various
-text-based interchange formats such as JSON or XML. The need for
-locale-independent functions for conversions between integers and strings
-and between floating-point numbers and strings has also been highlighted in
-N4412: Shortcomings of iostreams. Therefore a user should be able to easily
-control whether or not locales are used during formatting.
- -
-We follow Python's approach [3] and designate a separate format
-specifier 'n' for locale-aware numeric formatting. It applies to
-all integral and floating-point types. All other specifiers produce output
-unaffected by locale settings. This can also have a positive performance
-effect, because locale-independent formatting can be implemented more
-efficiently.
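The following C++17 sketch illustrates this split. The default path formats integers with std::to_chars, which never consults a locale. The path that stands in for 'n' goes through a stream imbued with a locale. The function names and the comma_grouping facet are hypothetical. The facet only keeps the example deterministic without depending on which named locales are installed.

```cpp
#include <cassert>
#include <charconv>
#include <locale>
#include <sstream>
#include <string>

// Locale-independent path, as for every presentation type other than 'n'.
std::string format_decimal(long long value) {
  char buf[32];  // more than enough for a 64-bit value and its sign
  std::to_chars_result result = std::to_chars(buf, buf + sizeof buf, value);
  return std::string(buf, result.ptr);
}

// Locale-aware path, sketching the 'n' type: the separator and the digit
// grouping come from the std::numpunct facet of the given locale.
std::string format_decimal_n(long long value, const std::locale& loc) {
  std::ostringstream os;
  os.imbue(loc);
  os << value;
  return os.str();
}

// Hypothetical facet that groups digits in threes with ','.
struct comma_grouping : std::numpunct<char> {
 protected:
  char do_thousands_sep() const override { return ','; }
  std::string do_grouping() const override { return "\3"; }
};
```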
-
-An important feature for localization is the ability to rearrange formatting
-arguments, because the word order may vary in different languages [7].
-For example:
- -
-printf("String `%s' has %d characters\n", string, length(string));
-
-
-A possible German translation of the format string might be:
- -
-"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
-
-
-using POSIX positional arguments [2]. Unfortunately, these positional
-specifiers are not portable [6]. The C++ I/O streams don't support such
-rearranging of arguments by design, because the arguments are interleaved
-with portions of the literal string:
- -
-std::cout << "String `" << string << "' has " << length(string) << " characters\n";
-
-
-The current proposal allows both positional and automatically numbered
-arguments, for example:
- -
-std::format("String `{}' has {} characters\n", string, length(string));
-
-
-with the German translation of the format string:
- -
-"{1} Zeichen lang ist die Zeichenkette `{0}'\n"
-
-
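The two German format strings above differ only in how they refer to arguments, so converting one into the other is mechanical. The following sketch is hypothetical and not part of the proposal. It handles only the simple "%N$c" form (no flags, width, precision or length modifiers), doubles literal braces, and drops the types marked optional in the correspondence table.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical translator for printf format strings that use POSIX
// positional conversions of the form "%N$c". "%N$" counts arguments from 1,
// while the proposed syntax counts from 0.
std::string translate_positional(const std::string& fmt) {
  std::string out;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    char c = fmt[i];
    if (c == '{' || c == '}') {  // literal braces must be doubled
      out += c;
      out += c;
      continue;
    }
    if (c != '%') {
      out += c;
      continue;
    }
    if (i + 1 < fmt.size() && fmt[i + 1] == '%') {  // "%%" is a literal '%'
      out += '%';
      ++i;
      continue;
    }
    std::size_t j = i + 1;
    unsigned n = 0;
    while (j < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[j])))
      n = n * 10 + static_cast<unsigned>(fmt[j++] - '0');
    if (j == i + 1 || n == 0 || j + 1 >= fmt.size() || fmt[j] != '$')
      throw std::invalid_argument("expected a conversion of the form %N$c");
    char type = fmt[j + 1];
    if (type == 'i' || type == 'u') type = 'd';
    out += '{';
    out += std::to_string(n - 1);
    // Types listed as optional in the correspondence table are dropped.
    if (std::string("cdgps").find(type) == std::string::npos) {
      out += ':';
      out += type;
    }
    out += '}';
    i = j + 1;  // the loop increment then moves past the conversion letter
  }
  return out;
}
```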
-TODO
- -TODO
- -<format>
synopsis
-namespace std {
- class format_error;
-
- class format_args;
-
- template <class Char>
- basic_string<Char> format(const Char* fmt, format_args args);
-
- template <class Char, class ...Args>
- basic_string<Char> format(const Char* fmt, const Args&... args);
-}
-
-
-
-Format strings contain replacement fields surrounded by curly braces {}.
-Anything that is not contained in braces is considered literal text, which
-is copied unchanged to the output. A brace character can be included in the
-literal text by doubling: {{ and }}.
-
-The grammar for a replacement field is as follows:
- - -
-replacement-field ::= '{' [arg-id] [':' format-spec] '}'
-arg-id ::= integer
-integer ::= digit+
-digit ::= '0'...'9'
-
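The following C++17 sketch shows how a parser might split one replacement field according to the grammar above. The names replacement_field, parse_field_at and parse_field are hypothetical. Escaped braces ("{{", "}}") are literal text and are assumed to be handled by the caller. Nested fields inside the format-spec are not supported.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Parsed form of replacement-field ::= '{' [arg-id] [':' format-spec] '}'.
struct replacement_field {
  std::optional<unsigned> arg_id;  // empty when the arg-id is omitted
  std::string format_spec;         // empty when the format-spec is omitted
};

// Hypothetical parser. `pos` must index the opening '{'; on success it is
// left one past the closing '}'.
replacement_field parse_field_at(const std::string& s, std::size_t& pos) {
  if (pos >= s.size() || s[pos] != '{') throw std::invalid_argument("expected '{'");
  ++pos;
  replacement_field field;
  if (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    unsigned id = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
      id = id * 10 + static_cast<unsigned>(s[pos++] - '0');
    field.arg_id = id;
  }
  if (pos < s.size() && s[pos] == ':') {
    // The fill character may not be '}', so the first '}' ends the field.
    std::size_t close = s.find('}', ++pos);
    if (close == std::string::npos) throw std::invalid_argument("missing '}'");
    field.format_spec = s.substr(pos, close - pos);
    pos = close;
  }
  if (pos >= s.size() || s[pos] != '}') throw std::invalid_argument("missing '}'");
  ++pos;
  return field;
}

// Convenience overload for a string that consists of exactly one field.
replacement_field parse_field(const std::string& s) {
  std::size_t pos = 0;
  replacement_field field = parse_field_at(s, pos);
  if (pos != s.size()) throw std::invalid_argument("trailing characters");
  return field;
}
```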
-
-
-In less formal terms, the replacement field can start with an arg-id that
-specifies the argument whose value is to be formatted and inserted into the
-output instead of the replacement field. The arg-id is optionally followed
-by a format-spec, which is preceded by a colon ':'. These specify a
-non-default format for the replacement value.
-
-See also the Format Specification Mini-Language section.
- -
-If the numerical arg-ids in a format string are 0, 1, 2, ... in sequence,
-they can all be omitted (not just some), and the numbers 0, 1, 2, ... will
-be automatically inserted in that order.
-
-Some simple format string examples:
- -
-"First, thou shalt count to {0}" // References the first argument
-"Bring me a {}" // Implicitly references the first argument
-"From {} to {}" // Same as "From {0} to {1}"
-
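The rule that the numbers may be omitted only when all of them are omitted can be enforced after parsing. The following sketch uses a hypothetical resolve_arg_ids helper. It rejects a mix of omitted and explicit ids, such as "{} {0}", by throwing std::invalid_argument. In the proposed library this error would presumably surface as a format_error.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: takes the arg-ids of all replacement fields in order
// of appearance (std::nullopt where the arg-id was omitted) and returns the
// argument index each field refers to.
std::vector<unsigned> resolve_arg_ids(const std::vector<std::optional<unsigned>>& ids) {
  bool any_explicit = false, any_omitted = false;
  for (const auto& id : ids) {
    if (id) any_explicit = true;
    else any_omitted = true;
  }
  if (any_explicit && any_omitted)
    throw std::invalid_argument("cannot mix automatic and explicit argument indexing");
  std::vector<unsigned> result;
  unsigned next = 0;  // automatic numbering: 0, 1, 2, ...
  for (const auto& id : ids) result.push_back(id ? *id : next++);
  return result;
}
```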
-
-
-The format-spec field contains a specification of how the value should be
-presented, including such details as field width, alignment, padding,
-decimal precision and so on. Each value type can define its own formatting
-mini-language or interpretation of the format-spec.
-
-Most built-in types support a common formatting mini-language, which is
-described in the next section.
- -
-A format-spec field can also include nested replacement fields in certain
-positions within it (the width and the precision, see the grammar below).
-These nested replacement fields can contain only an argument index; format
-specifications are not allowed. This allows the formatting of a value to be
-specified dynamically.
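One way to handle such nested fields is to substitute their values first, which produces an ordinary format-spec. The following sketch takes that approach and simplifies, hypothetically, by assuming that the referenced arguments are already available as a std::vector<int>. Automatic "{}" numbering inside the spec is omitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: replaces nested fields "{N}" inside a format-spec with
// the integer argument N, turning e.g. "*^{1}" into "*^30". A nested field may
// contain only an argument index, so anything else is rejected.
std::string resolve_nested(const std::string& spec, const std::vector<int>& int_args) {
  std::string out;
  for (std::size_t i = 0; i < spec.size(); ++i) {
    if (spec[i] != '{') {  // the fill character may not be '{'
      out += spec[i];
      continue;
    }
    std::size_t close = spec.find('}', i);
    if (close == std::string::npos) throw std::invalid_argument("unmatched '{'");
    std::string id = spec.substr(i + 1, close - i - 1);
    if (id.empty()) throw std::invalid_argument("explicit argument index required here");
    for (char c : id)
      if (!std::isdigit(static_cast<unsigned char>(c)))
        throw std::invalid_argument("nested field may contain only an argument index");
    out += std::to_string(int_args.at(std::stoul(id)));  // throws if out of range
    i = close;
  }
  return out;
}
```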
-
-Format specifications are used within replacement fields contained within a
-format string to define how individual values are presented (see Format
-String Syntax). Each formattable type may define how the format
-specification is to be interpreted.
-
-Most built-in types implement the following options for format
-specifications, although some of the formatting options are only supported
-by the numeric types.
-
-The general form of a standard format specifier is:
- -
-format-spec ::= [[fill] align] [sign] ['#'] ['0'] [width] ['.' precision] [type]
-fill ::= <a character other than '{' or '}'>
-align ::= '<' | '>' | '=' | '^'
-sign ::= '+' | '-' | ' '
-width ::= integer | '{' arg-id '}'
-precision ::= integer | '{' arg-id '}'
-type ::= int-type | 'a' | 'A' | 'c' | 'e' | 'E' | 'f' | 'F' | 'g' | 'G' | 'p' | 's'
-int-type ::= 'b' | 'B' | 'd' | 'n' | 'o' | 'x' | 'X'
-
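The following C++17 sketch parses the standard format-spec. The names are hypothetical. It follows the grammar above and also accepts the locale-aware type 'n' described in the type tables. It applies the fill rule explained in the next paragraph: a fill character is recognized only when the character after it is an alignment option. Nested "{arg-id}" widths and precisions are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical parsed form of
// format-spec ::= [[fill] align] [sign] ['#'] ['0'] [width] ['.' precision] [type]
struct format_spec {
  char fill = ' ';
  char align = '\0';   // '\0': use the default alignment for the value's type
  char sign = '\0';    // '\0': no sign option, which behaves like '-'
  bool alt = false;    // the '#' option
  bool zero = false;   // the '0' option
  int width = 0;       // 0: no minimum width
  int precision = -1;  // -1: not specified
  char type = '\0';    // '\0': use the default presentation type
};

namespace {
bool is_align(char c) { return c == '<' || c == '>' || c == '=' || c == '^'; }
bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

int parse_int(const std::string& s, std::size_t& i) {
  int value = 0;
  while (i < s.size() && is_digit(s[i])) value = value * 10 + (s[i++] - '0');
  return value;
}
}  // namespace

format_spec parse_format_spec(const std::string& s) {
  format_spec spec;
  std::size_t i = 0;
  // A fill character is present only if the *second* character is an
  // alignment option; otherwise the first character may be one on its own.
  if (s.size() >= 2 && is_align(s[1])) {
    spec.fill = s[0];
    spec.align = s[1];
    i = 2;
  } else if (!s.empty() && is_align(s[0])) {
    spec.align = s[0];
    i = 1;
  }
  if (i < s.size() && (s[i] == '+' || s[i] == '-' || s[i] == ' ')) spec.sign = s[i++];
  if (i < s.size() && s[i] == '#') { spec.alt = true; ++i; }
  if (i < s.size() && s[i] == '0') { spec.zero = true; ++i; }
  spec.width = parse_int(s, i);
  if (i < s.size() && s[i] == '.') {
    ++i;
    if (i == s.size() || !is_digit(s[i])) throw std::invalid_argument("missing precision");
    spec.precision = parse_int(s, i);
  }
  if (i < s.size()) {
    if (std::string("bBdnoxXaAceEfFgGps").find(s[i]) == std::string::npos)
      throw std::invalid_argument("invalid presentation type");
    spec.type = s[i++];
  }
  if (i != s.size()) throw std::invalid_argument("unexpected characters in format-spec");
  return spec;
}
```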
-
-
-The fill character can be any character other than '{' or '}'. The
-presence of a fill character is signaled by the character following it,
-which must be one of the alignment options. If the second character of
-format-spec is not a valid alignment option, then no fill character is
-present, although the first character may still be an alignment option on
-its own.
-
-The meaning of the various alignment options is as follows:
-
Option | Meaning |
---|---|
'<' | Forces the field to be left-aligned within the available space (this is the default for most objects). |
'>' | Forces the field to be right-aligned within the available space (this is the default for numbers). |
'=' | Forces the padding to be placed after the sign (if any) but before the digits. This is used for printing fields in the form +000000120. This alignment option is only valid for numeric types. |
'^' | Forces the field to be centered within the available space. |
-
-Note that unless a minimum field width is defined, the field width will
-always be the same size as the data to fill it, so the alignment option has
-no effect in this case.
- -
-The sign option is only valid for number types, and can be one of the
-following:
-
Option | Meaning |
---|---|
'+' | Indicates that a sign should be used for both positive and negative numbers. |
'-' | Indicates that a sign should be used only for negative numbers (this is the default behavior). |
space | Indicates that a leading space should be used on positive numbers, and a minus sign on negative numbers. |
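The three sign options differ only for non-negative values. The following is a small, hypothetical sketch for integers, in which '\0' stands for "no sign option given".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: renders an integer according to the sign option
// '+', '-' or ' ' ('\0' means no sign option, i.e. the default '-').
std::string with_sign(long long value, char sign) {
  // Compute the magnitude in unsigned arithmetic so LLONG_MIN does not overflow.
  unsigned long long magnitude = value < 0 ? 0ULL - static_cast<unsigned long long>(value)
                                           : static_cast<unsigned long long>(value);
  std::string digits = std::to_string(magnitude);
  if (value < 0) return "-" + digits;  // negative numbers always get '-'
  if (sign == '+') return "+" + digits;
  if (sign == ' ') return " " + digits;
  return digits;
}
```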
-The '#' option causes the alternate form to be used for the conversion.
-For integers, when binary, octal, or hexadecimal output is used, it adds
-the respective prefix "0b" ("0B"), "0", or "0x" ("0X") to the output value.
-Whether the prefix is lower-case or upper-case is determined by the case of
-the type specifier; for example, the prefix "0x" is used for the type 'x'
-and "0X" is used for 'X'. For floating-point numbers the alternate form
-causes the result of the conversion to always contain a decimal-point
-character, even if no digits follow it. Normally, a decimal-point character
-appears in the result of these conversions only if a digit follows it. In
-addition, for 'g' and 'G' conversions, trailing zeros are not removed from
-the result.
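For integers, the effect of '#' is a base-dependent prefix whose case follows the type letter. The following hypothetical C++17 sketch covers the integer presentation types listed later in this section. It uses std::to_chars, which emits lower-case digits. For octal zero it borrows printf's "%#o" behavior of not adding a second '0', which is an assumption on our part.

```cpp
#include <cassert>
#include <cctype>
#include <charconv>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper for the integer types 'b', 'B', 'd', 'o', 'x', 'X';
// `alt` corresponds to the '#' option.
std::string format_integer(unsigned long long value, char type, bool alt) {
  int base = 10;
  std::string prefix;
  switch (type) {
    case 'b': base = 2;  prefix = "0b"; break;
    case 'B': base = 2;  prefix = "0B"; break;
    case 'o': base = 8;  prefix = "0";  break;
    case 'x': base = 16; prefix = "0x"; break;
    case 'X': base = 16; prefix = "0X"; break;
    case 'd': break;
    default: throw std::invalid_argument("not an integer presentation type");
  }
  // One char per bit is the worst case (base 2).
  char buf[std::numeric_limits<unsigned long long>::digits];
  std::to_chars_result result = std::to_chars(buf, buf + sizeof buf, value, base);
  std::string digits(buf, result.ptr);
  if (type == 'X')
    for (char& c : digits) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  if (type == 'o' && digits[0] == '0') prefix.clear();  // like printf's "%#o" for 0
  return (alt ? prefix : std::string()) + digits;
}
```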
-
-width is a decimal integer defining the minimum field width. If not
-specified, then the field width will be determined by the content.
-
-Preceding the width field by a zero ('0') character enables sign-aware
-zero-padding for numeric types. This is equivalent to a fill character of
-'0' with an alignment type of '='.
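The following sketch shows sign-aware padding with a hypothetical pad_after_sign helper. It assumes that the number has already been converted to text together with its sign. With fill '0' the helper reproduces the '0' option. With another fill character it behaves like the '=' alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for the '=' alignment option: inserts the fill between
// a leading sign (if any) and the digits.
std::string pad_after_sign(const std::string& number, char fill, std::size_t width) {
  if (number.size() >= width) return number;
  bool has_sign = !number.empty() &&
                  (number[0] == '+' || number[0] == '-' || number[0] == ' ');
  std::size_t sign_len = has_sign ? 1 : 0;
  return number.substr(0, sign_len) + std::string(width - number.size(), fill) +
         number.substr(sign_len);
}
```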
-
-The precision is a decimal number indicating how many digits should be
-displayed after the decimal point for a floating-point value formatted with
-'f' and 'F', or before and after the decimal point for a floating-point
-value formatted with 'g' or 'G'. For non-number types the field indicates
-the maximum field size; in other words, how many characters will be used
-from the field content. The precision is not allowed for integer,
-character, Boolean, and pointer values.
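For strings, the precision rule amounts to truncation. The following is a hypothetical helper. It assumes lengths are measured in char units (bytes), which differ from characters in multi-byte encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: applies the precision of a non-number value as a
// maximum field size. A negative precision means "not specified".
std::string apply_string_precision(const std::string& value, int precision) {
  if (precision < 0 || static_cast<std::size_t>(precision) >= value.size()) return value;
  return value.substr(0, static_cast<std::size_t>(precision));
}
```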
-
-Finally, the type determines how the data should be presented.
-
The available string presentation types are:
-
Type | Meaning |
---|---|
's' | String format. This is the default type for strings and may be omitted. |
none | The same as 's'. |
-
The available character presentation types are:
-
Type | Meaning |
---|---|
'c' | Character format. This is the default type for characters and may be omitted. |
none | The same as 'c'. |
-
The available integer presentation types are:
-
Type | Meaning |
---|---|
'b' | Binary format. Outputs the number in base 2. Using the '#' option with this type adds the prefix "0b" to the output value. |
'B' | Binary format. Outputs the number in base 2. Using the '#' option with this type adds the prefix "0B" to the output value. |
'd' | Decimal integer. Outputs the number in base 10. |
'o' | Octal format. Outputs the number in base 8. Using the '#' option with this type adds the prefix "0" to the output value. |
'x' | Hex format. Outputs the number in base 16, using lower-case letters for the digits above 9. Using the '#' option with this type adds the prefix "0x" to the output value. |
'X' | Hex format. Outputs the number in base 16, using upper-case letters for the digits above 9. Using the '#' option with this type adds the prefix "0X" to the output value. |
'n' | Number. This is the same as 'd', except that it uses the current locale setting to insert the appropriate number separator characters. |
none | The same as 'd'. |
-
-Integer presentation types can also be used with character and Boolean
-values. Boolean values are formatted using their textual representation,
-either true or false, if the presentation type is not specified.
-
The available presentation types for floating-point values are:
-
Type | Meaning |
---|---|
'a' | Hexadecimal floating point format. Prints the number in base 16 with prefix "0x" and lower-case letters for digits above 9. Uses 'p' to indicate the exponent. |
'A' | Same as 'a' except it uses upper-case letters for the prefix, digits above 9 and to indicate the exponent. |
'e' | Exponent notation. Prints the number in scientific notation using the letter 'e' to indicate the exponent. |
'E' | Exponent notation. Same as 'e' except it uses an upper-case 'E' as the separator character. |
'f' | Fixed point. Displays the number as a fixed-point number. |
'F' | Fixed point. Same as 'f', but converts nan to NAN and inf to INF. |
'g' | General format. For a given precision p >= 1, this rounds the number to p significant digits and then formats the result in either fixed-point format or in scientific notation, depending on its magnitude. A precision of 0 is treated as equivalent to a precision of 1. |
'G' | General format. Same as 'g' except it uses an upper-case 'E' for the exponent and converts nan to NAN and inf to INF. |
'n' | Number. This is the same as 'g', except that it uses the current locale setting to insert the appropriate number separator characters. |
none | The same as 'g'. |
-
The available presentation types for pointers are:
-
Type | Meaning |
---|---|
'p' | Pointer format. This is the default type for pointers and may be omitted. |
none | The same as 'p'. |
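Each floating-point presentation type in the tables above has a printf conversion with the same letter, so a simple but unoptimized sketch can delegate to std::snprintf. The helper format_float is hypothetical and omits 'n' and the default presentation. Because std::snprintf honors the C locale's decimal point, the sketch is locale-independent only while the global C locale is "C", which is the default at program start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper for 'a', 'A', 'e', 'E', 'f', 'F', 'g', 'G'.
// A negative precision means "not specified"; `alt` is the '#' option.
std::string format_float(double value, char type, int precision = -1, bool alt = false) {
  if (std::string("aAeEfFgG").find(type) == std::string::npos)
    throw std::invalid_argument("unsupported floating-point presentation type");
  std::string fmt = "%";
  if (alt) fmt += '#';
  if (precision >= 0) fmt += ".*";  // precision passed as an int argument
  fmt += type;
  char buf[512];
  int n = precision >= 0 ? std::snprintf(buf, sizeof buf, fmt.c_str(), precision, value)
                         : std::snprintf(buf, sizeof buf, fmt.c_str(), value);
  if (n < 0 || n >= static_cast<int>(sizeof buf))
    throw std::runtime_error("formatting failed or output too long for this sketch");
  return std::string(buf, static_cast<std::size_t>(n));
}
```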
-
format_error
-class format_error : public std::runtime_error {
-public:
- explicit format_error(const string& what_arg);
- explicit format_error(const char* what_arg);
-};
-
-
-
-The class format_error defines the type of objects thrown as exceptions to
-report errors from the formatting library.
-
format_error(const string& what_arg);
Effects: Constructs an object of class format_error.
Postcondition: strcmp(what(), what_arg.c_str()) == 0.
format_error(const char* what_arg);
Effects: Constructs an object of class format_error.
Postcondition: strcmp(what(), what_arg) == 0.
format_args
TODO
- -format
template <class Char>
- basic_string<Char> format(const Char* fmt, format_args args);
-
-template <class Char, class ...Args>
- basic_string<Char> format(const Char* fmt, const Args&... args);
Requires: fmt shall not be a null pointer.
-Effects: Each function returns a basic_string object constructed from the
-format string argument fmt with each replacement field substituted with
-the character representation of the argument it refers to, formatted
-according to the specification given in the field.
Returns: The formatted string.
-Throws: format_error if fmt is not a valid format string.
-
-The ideas proposed in this paper have been implemented in the open-source
-fmt library. TODO: link and mention other implementations (Boost Format,
-FastFormat)
-
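The following deliberately small sketch of the char overload pair of format, as specified above, pulls the pieces of this paper together. The variadic overload renders each argument with its operator<<, reusing the iostreams infrastructure, and forwards the results to a non-template function. That function substitutes replacement fields, handles "{{" and "}}", enforces the all-or-nothing rule for automatic numbering, and throws format_error for invalid format strings. Everything lives in a hypothetical sketch namespace. Arguments are passed as std::vector<std::string> instead of the still unspecified format_args, and format-specs are rejected rather than interpreted.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Mirrors the format_error class specified above.
class format_error : public std::runtime_error {
 public:
  explicit format_error(const std::string& what_arg) : std::runtime_error(what_arg) {}
  explicit format_error(const char* what_arg) : std::runtime_error(what_arg) {}
};

// Non-template core working on pre-rendered arguments.
inline std::string vformat(const char* fmt, const std::vector<std::string>& args) {
  assert(fmt != nullptr);  // Requires: fmt shall not be a null pointer.
  std::string out;
  std::size_t next_auto = 0;
  bool used_auto = false, used_explicit = false;
  for (const char* p = fmt; *p != '\0'; ++p) {
    if (*p == '}') {  // outside a field, '}' must be doubled
      if (p[1] != '}') throw format_error("unmatched '}' in format string");
      out += '}';
      ++p;
      continue;
    }
    if (*p != '{') {
      out += *p;
      continue;
    }
    if (p[1] == '{') {  // "{{" is a literal '{'
      out += '{';
      ++p;
      continue;
    }
    ++p;  // skip '{'
    std::size_t index = 0;
    if (*p == '}') {
      used_auto = true;
      index = next_auto++;
    } else {
      if (*p < '0' || *p > '9') throw format_error("invalid replacement field");
      while (*p >= '0' && *p <= '9') index = index * 10 + static_cast<std::size_t>(*p++ - '0');
      if (*p != '}') throw format_error("format-spec not supported by this sketch");
      used_explicit = true;
    }
    if (used_auto && used_explicit)
      throw format_error("cannot mix automatic and explicit argument indexing");
    if (index >= args.size()) throw format_error("argument index out of range");
    out += args[index];  // p now points at the closing '}'
  }
  return out;
}

// Renders one argument with its operator<<.
template <class T>
std::string render(const T& value) {
  std::ostringstream os;
  os << value;
  return os.str();
}

// Variadic front end: captures the arguments and forwards to the core.
template <class... Args>
std::string format(const char* fmt, const Args&... args) {
  return sketch::vformat(fmt, std::vector<std::string>{render(args)...});
}

}  // namespace sketch
```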
-[1] The fprintf function. ISO/IEC 9899:2011. 7.21.6.1.
-[2] fprintf, printf, snprintf, sprintf - print formatted output. The Open
-Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 Edition.
-[3] 6.1.3. Format String Syntax. Python 3.5.2 documentation.
-[4] String.Format Method. .NET Framework Class Library.
-[5] Module std::fmt. The Rust Standard Library.
-[6] Format Specification Syntax: printf and wprintf Functions. C++ Language
-and Standard Libraries.
-[7] 10.4.2 Rearranging printf Arguments. The GNU Awk User's Guide.
-