BufferWriter Formatting

Synopsis

#include "swoc/bwf_base.h"

Formatted output was added to BufferWriter for several reasons.

  • Type safe formatted output in addition to buffer safe formatted output. Rather than non-obvious cleverness with snprintf and BufferWriter::commit(), build the formatting in directly.

  • Specialized output functions for complex types, to have the class provide the formatting logic instead of cut and pasted code in multiple locations. This also avoids breaking modularity to get the data needed for good formatting. This also enables formatting wrappers which can provide generic and simple ways to do specific styles of output beyond formatting codes (e.g. As_Hex).

  • Argument naming, both for ordering, repeating, and for “global” names which can be used without arguments. This is also intended for use where there are context dependent names, e.g. for printing in the context of an HTTP header, the header field names could be made so their use is replaced by the value of that field.

  • The ability to pass arbitrary “extra” data to formatting functions for special, type dependent purposes.

The formatting style is the “prefix” or “printf” style: the format is specified first and then all the arguments. The syntax is based on Python formatting. This contrasts with the “infix” or “streaming” style where formatting, literals, and arguments are intermixed in the order of output. There are various arguments for both styles but conversations within the Traffic Server community indicated a clear preference for the prefix style. Therefore creating formatted output consists of a format string, containing literal text and format specifiers, which are replaced with generated text, usually based on the values of arguments to the print function.

The design is optimized for formatted output to fixed buffers. This is by far the dominant style in the expected use cases, and during the design phase I was told any performance loss compared to snprintf must be minimal. While work has been and will be done to extend BufferWriter to operate on non-fixed buffers, such use is secondary to operating directly on contiguous buffers.

Important

The overriding design goal is to provide the type specific formatting and flexibility of C++ stream operators with the performance of snprintf and memcpy.

Usage

As noted, BufferWriter formatting is modeled on Python string formatting because the Traffic Server project uses quite a bit of Python. It seemed a good model for prefix style formatting, mapping easily into the set of desired features. The primary divergences are:

  • Names do not refer to in scope variables, but to output generators local to the print context via Name Binding.

  • The addition of a third colon separated field to provide extension data to the formatting logic.

The primary entry point for this is BufferWriter::print().

A format string consists of literal text in which format specifiers are embedded. Each specifier marks a place where generated output will be placed. The specifier is marked by paired braces and is divided into three fields, separated by colons. These fields are optional: if default output is acceptable, a pair of braces will suffice. In a sense, {} serves the same function for output as auto does for programming: the compiler knows the type, so it should be able to do something reasonable without the programmer needing to be explicit. The fields are used in the less common cases where greater control of the output is required.

Format Specifier Grammar

This is the grammar for the fields inside a format specifier.

specifier ::= "{" [name] [":" [style] [":" extension]] "}"
name      ::= index | ICHAR+
index     ::= non-negative integer
extension ::= ICHAR*
ICHAR     ::= a printable ASCII character except for '{', '}', ':'
style     ::= formatting instructions.

The three fields are name, style, and extension.

name

The name of the argument to use. This can be a non-negative integer in which case it is the zero based index of the argument to the method call. E.g. {0} means the first argument and {2} is the third argument after the format.

bw.print("{0} {1}", 'a', 'b') => a b

bw.print("{1} {0}", 'a', 'b') => b a

The name can be omitted in which case it is treated as an index in parallel to the position in the format string relative to other argument based specifiers. Only the position in the format string matters, not what arguments other format specifiers may have used.

bw.print("{0} {2} {}", 'a', 'b', 'c') => a c c

bw.print("{0} {2} {2}", 'a', 'b', 'c') => a c c

Note an argument can be printed more than once if the name is used more than once.

bw.print("{0} {} {0}", 'a', 'b') => a b a

bw.print("{0} {1} {0}", 'a', 'b') => a b a
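
The indexing rules above can be summarized with a small sketch. resolve_indices is a hypothetical helper, not part of libswoc: it reports which argument index each specifier would use, and it deliberately ignores escaping and the style and extension fields.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of libswoc): list the argument index used by each
// argument based specifier in `fmt`. An empty name uses the specifier's position
// among argument based specifiers; a numeric name is used as the index directly.
// Alphanumeric (bound) names are skipped and do not advance the position.
std::vector<std::size_t> resolve_indices(std::string_view fmt) {
  std::vector<std::size_t> result;
  std::size_t position = 0; // argument based specifiers seen so far
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '{') {
      continue;
    }
    auto close = fmt.find('}', i);
    if (close == std::string_view::npos) {
      break; // unterminated specifier, stop scanning
    }
    auto body = fmt.substr(i + 1, close - i - 1);
    auto name = body.substr(0, body.find(':')); // the name precedes the first ':'
    bool numeric = true;
    std::size_t index = 0;
    for (char c : name) {
      if (c < '0' || c > '9') {
        numeric = false;
        break;
      }
      index = index * 10 + static_cast<std::size_t>(c - '0');
    }
    if (name.empty()) {
      result.push_back(position++); // implicit index: position in the format string
    } else if (numeric) {
      result.push_back(index); // explicit index, still counts toward the position
      ++position;
    } // otherwise a bound name, which does not count for default indexing
    i = close;
  }
  return result;
}
```

For "{0} {2} {}" this yields 0, 2, 2, matching the first example above.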

Alphanumeric names refer to values in a format context table; these are described under Name Binding. Such names do not count in terms of default argument indexing. These rules are designed to be natural, but any ambiguity can be eliminated by explicit indexing in the specifiers.

style

Basic formatting control.

style     ::= [[fill]align][sign]["#"]["0"][[min][.precision][,max][type]]
fill      ::= fill-char | URI-char
URI-char  ::= "%" hex-digit hex-digit
fill-char ::= printable character except "{", "}", ":", "%"
align     ::= "<" | ">" | "=" | "^"
sign      ::= "+" | "-" | " "
min       ::= non-negative integer
precision ::= positive integer
max       ::= non-negative integer
type      ::= "g" | "s" | "S" | "x" | "X" | "d" | "o" | "b" | "B" | "p" | "P"
hex-digit ::= "0" .. "9" | "a" .. "f" | "A" .. "F"

The output is placed in a field that is at least min wide and no more than max wide. If the output is shorter than min, then:

  • The fill character is used for the extra space required. This can be an explicit character or a URI encoded one (to allow otherwise reserved characters).

  • The output is shifted according to the align value.

    <

    Align to the left, fill to the right.

    >

    Align to the right, fill to the left.

    ^

    Align in the middle, fill to left and right.

    =

    Numerically align, putting the fill between the sign character (left aligned) and the value (right aligned).

The output is clipped by max width characters and by the end of the buffer. precision is used by floating point values to specify the number of places of precision.
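
A minimal sketch of the min, max, fill and alignment rules, assuming the text to place has already been generated. apply_alignment is a hypothetical helper, not the libswoc implementation; in particular, putting the odd extra fill character on the right for ‘^’ is a choice made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of the field rules (not the libswoc code): pad `text` to at
// least `min` characters using `fill` according to `align`, then clip the result
// to at most `max` characters.
std::string apply_alignment(std::string text, char fill, char align,
                            std::size_t min, std::size_t max) {
  if (text.size() < min) {
    std::size_t pad = min - text.size();
    switch (align) {
    case '<': // align left, fill to the right
      text.append(pad, fill);
      break;
    case '^': // center; an odd extra fill character goes on the right
      text = std::string(pad / 2, fill) + text + std::string(pad - pad / 2, fill);
      break;
    case '=': // keep a leading sign on the left, fill between sign and value
      if (!text.empty() && (text[0] == '+' || text[0] == '-' || text[0] == ' ')) {
        text.insert(1, pad, fill);
        break;
      }
      [[fallthrough]]; // no sign: behave like right alignment
    case '>':          // align right, fill to the left
    default:
      text.insert(0, pad, fill);
      break;
    }
  }
  if (text.size() > max) {
    text.resize(max); // clip to the maximum width
  }
  return text;
}
```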

type is used to indicate type specific formatting. For integers it indicates the output radix, and if # is present the radix prefix is generated (one of 0b, 0, 0x). Format types of the same letter are equivalent, varying only in the character case used for output. Most commonly ‘x’ prints values in lower case hexadecimal (0x1337beef) while ‘X’ prints in upper case hexadecimal (0X1337BEEF). Note there is no upper case decimal or octal type because case is irrelevant for those.
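
The radix selection and prefix rule can be sketched as follows. format_radix is hypothetical and handles only unsigned values; sign handling and the other style fields are left out.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch (not the libswoc implementation): render an unsigned value
// in the radix selected by a format type character, optionally with the radix
// prefix that '#' requests. Upper case types use upper case digits and prefix.
std::string format_radix(std::uintmax_t value, char type, bool radix_lead) {
  unsigned radix = 10;
  const char *digits = "0123456789abcdef";
  std::string prefix;
  switch (type) {
  case 'b': radix = 2; prefix = "0b"; break;
  case 'B': radix = 2; prefix = "0B"; break;
  case 'o': radix = 8; prefix = "0"; break;
  case 'x': radix = 16; prefix = "0x"; break;
  case 'X': radix = 16; prefix = "0X"; digits = "0123456789ABCDEF"; break;
  default: break; // 'd', 'g' and anything else: plain decimal, no prefix
  }
  std::string out;
  do {
    out.insert(out.begin(), digits[value % radix]); // prepend the lowest digit
    value /= radix;
  } while (value != 0);
  return radix_lead ? prefix + out : out;
}
```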

g

generic, default.

b

binary

B

Binary

d

decimal

o

octal

x

hexadecimal

X

Hexadecimal

p

pointer (hexadecimal address)

P

Pointer (Hexadecimal address)

s

string

S

String (upper case)

For several specializations the hexadecimal format means printing the bytes of the value in hexadecimal, in effect providing a hex dump of the value. This is the case for std::string_view, so a hex dump of an object can be done by creating a std::string_view covering the data and then printing it with {:x}.
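
A sketch of what a hex dump of a view amounts to, assuming one pair of hexadecimal digits per byte with no separators. hex_dump is a hypothetical helper, not the libswoc formatter.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch: each byte of the view becomes two hexadecimal digits,
// lower case for 'x' style output and upper case for 'X' style output.
std::string hex_dump(std::string_view view, bool upper) {
  const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
  std::string out;
  out.reserve(view.size() * 2);
  for (unsigned char c : view) {
    out.push_back(digits[c >> 4]);  // high nibble
    out.push_back(digits[c & 0xF]); // low nibble
  }
  return out;
}
```

To dump an arbitrary object, a view such as std::string_view{reinterpret_cast<char const *>(&obj), sizeof(obj)} covers its bytes.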

The string type (‘s’ or ‘S’) is generally used to cause alphanumeric output for a value that would normally use numeric output. For instance, a bool is normally 0 or 1; using the type ‘s’ yields true or false. The upper case form, ‘S’, applies only in those cases where the formatter generates the text; it does not apply to normally text based values unless specifically noted. Therefore a bool printed with the type ‘S’ yields TRUE or FALSE. This is frequently done with formatting for enumerations, printing the numeric value by default and printing a text equivalent for format ‘s’ or ‘S’.
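
The bool case can be sketched in a few lines. format_bool is a hypothetical helper that shows only the type selection, not the libswoc formatter.

```cpp
#include <string>

// Hypothetical sketch of the 's' / 'S' convention for bool: numeric by default,
// text for 's', upper case text for 'S'.
std::string format_bool(bool value, char type) {
  switch (type) {
  case 's':
    return value ? "true" : "false";
  case 'S':
    return value ? "TRUE" : "FALSE";
  default:
    return value ? "1" : "0";
  }
}
```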

extension

Text (excluding braces) passed to the type specific formatter function. This can be used to provide extensions for specific argument types (e.g., IP addresses). It is never examined by BufferWriter formatting, it is only effective in type specific formatting overloads.

When a format specifier is parsed, the result is placed in an instance of bwf::Spec.

Examples

Some examples, comparing snprintf and BufferWriter::print().

if (len > 0) {
   auto n = snprintf(buff, len, "count %d", count);
   len -= n;
   buff += n;
}

bw.print("count {}", count);

// --

if (len > 0) {
   auto n = snprintf(buff, len, "Size %zu bytes", sizeof(thing));
   len -= n;
   buff += n;
}

bw.print("Size {} bytes", sizeof(thing));

// --

if (len > 0) {
   auto n = snprintf(buff, len, "Number of items %ld", thing->count());
   len -= n;
   buff += n;
}

bw.print("Number of items {}", thing->count());

Enumerations become easier. Note in this case argument indices are used in order to print both a name and a value for the enumeration. A key benefit here is that a developer does not need to know the specific free function or method needed to do the name lookup, in this case HttpDebugNames::get_server_state_name. Rather than every developer having to memorize the association between the type and the name lookup function, or grub through the code hoping for an example, the compiler is told once and henceforth does the lookup. The implementation of the formatter is described in the Enum Example section below. The following is a sample of code previously used to output an error message using this enumeration.

if (len > 0) {
   auto n = snprintf(buff, len, "Unexpected event %d in state %s[%d] for %.*s",
      event,
      HttpDebugNames::get_server_state_name(t_state.current.state),
      t_state.current.state,
      static_cast<int>(host_len), host);
   buff += n;
   len -= n;
}

Using BufferWriter

bw.print("Unexpected event {0} in state {1}[{1:d}] for {2}",
   event, t_state.current.state, std::string_view{host, host_len});

Adapting code to use std::string_view illustrates the advantage of a formatter overload knowing how to get the size from the object, and not having to deal with restrictions on the numeric type (e.g., that %.*s requires an int, not a size_t).

if (len > 0) {
   auto n = snprintf(buff, len, "%.*s", static_cast<int>(s.size()), s.data());
   len -= n;
   buff += n;
}

vs

bw.print("{}", s);

or even

bw.write(s);

The difference is even more stark when dealing with IP addresses. There are two big advantages here. One is not having to know the conversion function name. The other is not having to declare local variables and remember the appropriate size for them. Not requiring local variables can be particularly nice in the context of a switch statement, where local variables for a case mean having to add extra braces or declare the temporaries at an outer scope.

char ip_buff1[INET6_ADDRPORTSTRLEN];
char ip_buff2[INET6_ADDRPORTSTRLEN];
ats_ip_nptop(ip_buff1, sizeof(ip_buff1), addr1);
ats_ip_nptop(ip_buff2, sizeof(ip_buff2), addr2);
if (len > 0) {
   snprintf(buff, len, "Connecting to %s from %s", ip_buff1, ip_buff2);
}

vs

bw.print("Connecting to {} from {}", addr1, addr2);

User Defined Formatting

To get the full benefit of type safe formatting it is necessary to provide type specific formatting functions which are called when a value of that type is formatted. This is how type specific knowledge such as the names of enumeration values are encoded in a single location. The special formatting for IP address data is done by providing default formatters; it is not built into the base formatting logic.

Most of the support for this is in the nested namespace bwf.

The format style is stored in an instance of bwf::Spec.

class Spec

Format specifier data.

Additional type specific formatting can be provided via the extension field. This provides another option for tweaking formatted output vs. using wrapper classes.

To provide a formatter for a type V the function bwformat is overloaded. The signature would look like this:

swoc::BufferWriter&
swoc::bwformat( swoc::BufferWriter& w
              , swoc::bwf::Spec const& spec
              , V const& v
              )

w is the output and spec the parsed format specifier, including the name and extension (if any). The calling framework handles basic alignment as specified by spec, so the overload normally does not need to do so. In some cases, however, the alignment requirements are more detailed (e.g. integer alignment operations) or performance is critical. In the latter case the formatter should make sure to output at least the minimum width, which disables any framework alignment operation.

It is important to note a formatter can call another formatter. For example, the formatter for std::string simply converts the string to a std::string_view and passes it, with the same format specifier, to the std::string_view formatter.
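
To make the delegation concrete, here is a self-contained sketch. Writer and Spec are minimal stand-ins assumed for illustration (the real swoc::BufferWriter and swoc::bwf::Spec are much richer), and the case forcing mimics the ‘s’ and ‘S’ behavior listed for std::string_view under Default Type Specific Formatting.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Minimal stand-ins for swoc::bwf::Spec and swoc::BufferWriter, assumed only so
// the sketch compiles on its own.
struct Spec {
  char _type = 'g';
};

struct Writer {
  std::string _out;
};

// Stand-in std::string_view formatter: copies the text, forcing lower case
// for type 's' and upper case for type 'S'.
Writer &bwformat(Writer &w, Spec const &spec, std::string_view sv) {
  for (char c : sv) {
    auto uc = static_cast<unsigned char>(c);
    if (spec._type == 's') {
      c = static_cast<char>(std::tolower(uc));
    } else if (spec._type == 'S') {
      c = static_cast<char>(std::toupper(uc));
    }
    w._out.push_back(c);
  }
  return w;
}

// The std::string formatter adds no logic of its own: it converts to a view and
// forwards the unchanged specifier, so every string_view option also applies.
Writer &bwformat(Writer &w, Spec const &spec, std::string const &s) {
  return bwformat(w, spec, std::string_view{s});
}
```

The design choice is that string specific options live in exactly one place; any later change to the view formatter is picked up by std::string automatically.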

A more complex example of this which illustrates other mechanisms is formatting a character pointer.

inline BufferWriter &
bwformat(BufferWriter &w, bwf::Spec const &spec, const char *v) {
  if (spec._type == 'x' || spec._type == 'X' || spec._type == 'p' || spec._type == 'P') {
    bwformat(w, spec, static_cast<const void *>(v));
  } else if (v != nullptr) {
    bwformat(w, spec, std::string_view(v));
  } else {
    bwformat(w, spec, nullptr);
  }
  return w;
}

This checks the format and if it’s a pointer or hex format, delegates to generic pointer formatting. Otherwise if it’s not nullptr then it’s treated as a C-string and delegated to the string_view formatter. If it is nullptr then it’s delegated to the formatter for nullptr_t.

The generic pointer formatter works as follows.

It first copies the format specification and forces a leading radix. Next it does special handling for nullptr. If the pointer is valid, the code checks if the type p or P was used in order to select the appropriate case, then delegates the actual rendering to the integer formatter with a type of x or X as appropriate. In turn, other formatters, if given the type p or P, can cast the value to const void* and call bwformat on that to output the value as a pointer. The difference between calling bwformat vs. BufferWriter::write() is the ability to pass the format specifier instance. If all of the formatting is handled directly, then direct BufferWriter methods are a good choice. If the formatter wants to use the built in formatting then bwformat is the right choice. This is exactly what the pointer formatter does: the format specifier is copied and tweaked, and then passed on so that any formatting provided from the original format string remains valid.
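
The steps just described can be sketched with the same kind of stand-ins. Writer, Spec and the member _radix_lead_p are assumptions made for this sketch, as is the text printed for a null pointer; only the overall flow (copy the specifier, force the radix, map p/P to x/X, delegate to the integer formatter) follows the description above.

```cpp
#include <cstdint>
#include <string>

// Minimal stand-ins assumed for this sketch; `_radix_lead_p` is a hypothetical
// name for "a '#' was present in the specifier".
struct Spec {
  char _type = 'g';
  bool _radix_lead_p = false;
};

struct Writer {
  std::string _out;
};

// Stand-in integer formatter: hexadecimal only, lower case unless type is 'X'.
Writer &bwformat(Writer &w, Spec const &spec, std::uintptr_t n) {
  bool upper = spec._type == 'X';
  const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
  std::string text;
  do {
    text.insert(text.begin(), digits[n % 16]);
    n /= 16;
  } while (n != 0);
  if (spec._radix_lead_p) {
    text.insert(0, upper ? "0X" : "0x");
  }
  w._out += text;
  return w;
}

// Generic pointer formatting following the steps described above.
Writer &bwformat(Writer &w, Spec const &spec, const void *ptr) {
  Spec ptr_spec = spec;          // copy, so the caller's specifier is untouched
  ptr_spec._radix_lead_p = true; // always show the radix prefix
  if (ptr == nullptr) {
    w._out += "nullptr"; // placeholder text chosen for this sketch
    return w;
  }
  // 'P' selects upper case; anything else renders in lower case.
  ptr_spec._type = (spec._type == 'P') ? 'X' : 'x';
  return bwformat(w, ptr_spec, reinterpret_cast<std::uintptr_t>(ptr));
}
```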

To help reduce duplication, the output stream operator operator<< on a BufferWriter is defined to call bwformat with a default constructed bwf::Spec instance. This makes

w << thing;

identical to

bwformat(w, swoc::bwf::Spec::DEFAULT, thing);

which is also the same as

w.print("{}", thing);
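
A sketch of the forwarding operator, with stand-in Writer and Spec types assumed for illustration. Spec::DEFAULT here is a member of the stand-in that mirrors the swoc::bwf::Spec::DEFAULT named above; the template form of the operator is a choice of this sketch.

```cpp
#include <string>
#include <string_view>

// Minimal stand-ins assumed for this sketch.
struct Spec {
  char _type = 'g';
  static const Spec DEFAULT; // a default constructed specifier
};
const Spec Spec::DEFAULT{};

struct Writer {
  std::string _out;
};

Writer &bwformat(Writer &w, Spec const &, std::string_view sv) {
  w._out.append(sv.data(), sv.size());
  return w;
}

Writer &bwformat(Writer &w, Spec const &, int n) {
  w._out += std::to_string(n);
  return w;
}

// Stream style output: any value is handed to bwformat with the default
// specifier, so `w << x` behaves like formatting x with "{}".
template <typename T>
Writer &operator<<(Writer &w, T const &value) {
  return bwformat(w, Spec::DEFAULT, value);
}
```

Because the operator only forwards, every type that gains a bwformat overload also becomes streamable with no extra code.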

Enum Example

For a specific example of using BufferWriter formatting to make debug messages easier, consider the case of HttpDebugNames in the Traffic Server code base. This is a class that serves as a namespace to provide various methods that convert state machine related enumerations into descriptive strings. Currently this is undocumented (and uncommented) and is therefore used infrequently, as that requires either blind cut and paste, or tracing through header files to understand the code. The result is much less useful diagnostics. This can be greatly simplified by adding formatters to proxy/http/HttpDebugNames.h

inline swoc::BufferWriter &
bwformat(swoc::BufferWriter &w, swoc::bwf::Spec const &spec, HttpTransact::ServerState_t state)
{
   if (spec.has_numeric_type()) {
      // allow the user to force numeric output with '{:d}' or other numeric type.
      return bwformat(w, spec, static_cast<uintmax_t>(state));
   } else {
      return bwformat(w, spec, HttpDebugNames::get_server_state_name(state));
   }
}

With this in place, the code to print the name of the server state enumeration is

bw.print("{}", t_state.current.state);

There is no need to remember names like HttpDebugNames nor which method in it does the conversion. The developer making the HttpDebugNames class or equivalent can take care of that in the same header file that provides the type. The type specific formatting is incorporated into the general printing mechanism and from that point on works without any local code required, or memorization by the developer.

Argument Forwarding

It will frequently be useful for other libraries to support formatting of input strings. For such use cases the class methods will need to take variable arguments and then forward them on to the formatter. BufferWriter provides BufferWriter::print_v() for this purpose. Instead of taking C style variable arguments, these overloads take a reference to a std::tuple of arguments. Such a tuple is easily created with std::forward_as_tuple. An example of this is a container of messages. The message class is

class Message {
  using self_type = Message; ///< Self reference type.

public:
  // Message severity level.
  enum Severity { LVL_DEBUG, LVL_INFO, LVL_WARN, LVL_ERROR };

protected:
  std::string _text; // Text of the message.
  Severity _severity{LVL_DEBUG};
  int _indent{0}; // indentation level for display.
  // ... (other members elided)
};

The container class has a debug method that appends a Message instance using BufferWriter formatting. The method takes a format string and a variadic list of arguments, gathers references to the arguments with std::forward_as_tuple, and passes the resulting tuple to bwprint_v() to generate the text of the new message.

This gathers the arguments (generally as references) into a single tuple, which is then passed by reference to avoid restacking the arguments for every nested function call. In essence, references to the arguments are put on the stack (inside the tuple) once and a reference to that tuple is passed to nested functions. This replaces the C style va_list and provides not just the arguments but also complete type information.

The example code uses bwprint_v() to print to a std::string. There is a corresponding method, BufferWriter::print_v(), which takes a tuple instead of an explicit list of arguments when working with BufferWriter instances. Internally, of course, bwprint_v() is implemented using a local FixedBufferWriter instance and BufferWriter::print_v().
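
A self-contained sketch of the tuple plumbing. print_v and debug here are hypothetical stand-ins that return a std::string and only understand plain {} specifiers; the point is that the variadic front end packs references once with std::forward_as_tuple, and the tuple is unpacked with std::apply only where the values are actually formatted.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>

// Hypothetical stand-in for print_v: substitute each "{}" in fmt, in order, with
// the next element of the argument tuple.
template <typename... Args>
std::string print_v(std::string_view fmt, std::tuple<Args...> const &args) {
  std::ostringstream out;
  std::size_t pos = 0;
  // Expand the tuple back into a parameter pack at the bottom of the call chain.
  std::apply(
      [&](auto const &...arg) {
        auto emit_one = [&](auto const &value) {
          auto spot = fmt.find("{}", pos);
          if (spot == std::string_view::npos) {
            return; // more arguments than specifiers: ignore the extras
          }
          out << fmt.substr(pos, spot - pos) << value;
          pos = spot + 2;
        };
        (emit_one(arg), ...);
      },
      args);
  out << fmt.substr(pos); // literal text after the last specifier
  return out.str();
}

// Hypothetical variadic front end: gather references to the arguments into one
// tuple and pass that tuple down by reference.
template <typename... Args>
std::string debug(std::string_view fmt, Args &&...args) {
  return print_v(fmt, std::forward_as_tuple(args...));
}
```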

Default Type Specific Formatting

BufferWriter formatting has a number of user defined formatting overloads built in, primarily for types used inside the BufferWriter formatting implementation, to avoid circular reference problems. There is also support for formatting IP addresses via an additional include file.

Specific types

std::string_view

Generally the contents of the view.

‘x’ or ‘X’

A hexadecimal dump of the contents of the view in lower (‘x’) or upper (‘X’) case.

‘p’ or ‘P’

The pointer and length value of the view in lower (‘p’) or upper (‘P’) case.

‘s’

The string in (forced) lower case.

‘S’

The string in (forced) upper case.

For printing substrings, views are sufficiently cheap to do this in the arguments. For instance, printing the 10th through 20th characters of the view text means passing text.substr(9,11) instead of text.

  std::string_view text{"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};

  bw.clear().print("Text: |{0:->20}|", text.substr(9, 11));
  REQUIRE(bw.view() == "Text: |---------9abcdefghij|");

However, for those terminally addicted to C style formatting, this can also be done by setting the precision.

  bw.clear().print("Text: |{0:->20.11}|", text.substr(9));
  REQUIRE(bw.view() == "Text: |---------9abcdefghij|");

  bw.clear().print("Text: |{:20}|", text.substr(0, 10));
  REQUIRE(bw.view() == "Text: |0123456789          |");
  bw.clear().print("Text: |{:20.10}|", text);
  REQUIRE(bw.view() == "Text: |0123456789          |");

TextView

Because this is a subclass of std::string_view, all of the formatting for that works the same for this class.

sockaddr const *

#include "swoc/bwf_ip.h"

The IP address is printed. Fill is used to fill in address segments if provided, not to the minimum width if specified. IPEndpoint and IPAddr are supported with the same formatting. The formatting support in this case is extensive because of the commonality and importance of IP address data.

Type overrides

‘p’ or ‘P’

The pointer address is printed as hexadecimal lower (‘p’) or upper (‘P’) case.

The extension can be used to control which parts of the address are printed. These can be in any order; the output is always address, port, family. The default is the equivalent of “ap”. In addition, the character ‘=’ (“numeric align”) can be used to internally right justify the elements.

‘a’

The address.

‘p’

The port (host order).

‘f’

The IP address family.

‘=’

Internally justify the numeric values. This must be the first or second character. If it is the second, the first character is treated as the internal fill character. If there is no fill character, ‘0’ (zero) is used.

E.g.

void func(sockaddr const* addr) {
  bw.print("To {}", addr); // -> "To 172.19.3.105:4951"
  bw.print("To {0::a} on port {0::p}", addr); // -> "To 172.19.3.105 on port 4951"
  bw.print("To {::=}", addr); // -> "To 172.019.003.105:04951"
  bw.print("Using address family {::f}", addr);
  bw.print("{::a}",addr);      // -> "172.19.3.105"
  bw.print("{::=a}",addr);     // -> "172.019.003.105"
  bw.print("{::0=a}",addr);    // -> "172.019.003.105"
  bw.print("{:: =a}",addr);    // -> "172. 19.  3.105"
  bw.print("{:>20:a}",addr);   // -> "        172.19.3.105"
  bw.print("{:>20:=a}",addr);  // -> "     172.019.003.105"
  bw.print("{:>20: =a}",addr); // -> "     172. 19.  3.105"
}
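
The extension rules can be sketched as two small hypothetical helpers, neither of which is part of libswoc: parse_ip_extension decodes the extension text, and justify_ipv4 applies the internal fill to the segments of a dotted IPv4 address string. Port and IPv6 handling are omitted.

```cpp
#include <string>
#include <string_view>

// Hypothetical decoded form of the IP extension text.
struct IpExtension {
  bool address = false;
  bool port = false;
  bool family = false;
  char internal_fill = '\0'; // '\0' means no internal justification
};

IpExtension parse_ip_extension(std::string_view ext) {
  IpExtension result;
  if (!ext.empty() && ext[0] == '=') {
    result.internal_fill = '0'; // '=' first: fill defaults to '0'
    ext.remove_prefix(1);
  } else if (ext.size() > 1 && ext[1] == '=') {
    result.internal_fill = ext[0]; // '=' second: first character is the fill
    ext.remove_prefix(2);
  }
  for (char c : ext) {
    if (c == 'a') {
      result.address = true;
    } else if (c == 'p') {
      result.port = true;
    } else if (c == 'f') {
      result.family = true;
    }
  }
  if (!result.address && !result.port && !result.family) {
    result.address = result.port = true; // default is the equivalent of "ap"
  }
  return result;
}

// Right justify each segment of a dotted IPv4 address to three characters.
std::string justify_ipv4(std::string_view addr, char fill) {
  std::string out;
  while (true) {
    auto dot = addr.find('.');
    auto segment = addr.substr(0, dot);
    if (segment.size() < 3) {
      out.append(3 - segment.size(), fill);
    }
    out.append(segment.data(), segment.size());
    if (dot == std::string_view::npos) {
      break;
    }
    out.push_back('.');
    addr.remove_prefix(dot + 1);
  }
  return out;
}
```

With a space as the fill, justify_ipv4 reproduces the "172. 19.  3.105" layout from the examples above.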

Format Classes

Although the extension for a format can be overloaded to provide additional features, this can become too confusing and complex to use if it is used for fundamentally different semantics on the same base type. In that case it is better to provide a format wrapper class that holds the base type but can be overloaded to produce different (wrapper class based) output. The classic example is errno, which is an integral type but frequently should be formatted with additional information such as the descriptive string for the value. To do this the format wrapper class swoc::bwf::Errno is provided. Using it is simple:

w.print("File not open - {}", swoc::bwf::Errno(errno));

which will produce output that looks like

“File not open - EACCES: Permission denied [13]”

For errno this is handy in another way as swoc::bwf::Errno will preserve the value of errno across other calls that might change it. E.g.:

swoc::bwf::Errno last_err(errno);
// some other code generating diagnostics that might tweak errno.
w.print("File not open - {}", last_err);

This can also be useful for user defined data types. For instance, in the HostDB component of Traffic Server the type of the entry is printed in multiple places and each time this code is repeated

"%s%s %s", r->round_robin ? "Round-Robin" : "",
   r->reverse_dns ? "Reverse DNS" : "", r->is_srv ? "SRV" : "DNS"

This could be wrapped in a class, HostDBFmt, such as

struct HostDBFmt {
   HostDBInfo* _r { nullptr };
   HostDBFmt(HostDBInfo* r) : _r(r) {}
};

Then define a formatter for the wrapper

swoc::BufferWriter& bwformat( swoc::BufferWriter& w
                            , swoc::bwf::Spec const&
                            , HostDBFmt const& wrap
) {
   return w.print("{}{} {}", wrap._r->round_robin ? "Round-Robin" : "",
      wrap._r->reverse_dns ? "Reverse DNS" : "",
      wrap._r->is_srv ? "SRV" : "DNS");
}

Now all of the cut and paste formatting code is replaced with

w.print("{}", HostDBFmt(r));

These are the existing format classes in the header file bwf_std_format.h. All are in the swoc::bwf namespace.

class Errno

Formatting for errno. Generically the formatted output is the short name, the description, and the numeric value. A format type of d will generate just the numeric value, while a format type of s will generate the short name and description without a number.

For more detailed output, the extension can be used to pick just the short or long name. For non-numeric format codes, if the extension has the character ‘s’ then the short name is output, and if it contains the character ‘l’ the long name is output.

Examples:

Format     Result
:n         [13]
:s         EACCES: Permission denied
:s:sl      EACCES: Permission denied
:s:s       EACCES
:s:l       Permission denied
::s        EACCES [13]

class Date

Date formatting in the strftime style. An instance can be constructed with a strftime compatible format, or with a time_t and format string.

When used, the format specification can take an extension of “local” which formats the time as local time. Otherwise it is GMT. w.print("{}", Date("%H:%M")); will print the hour and minute as GMT values. w.print("{::local}", Date("%H:%M")); will print the hour and minute in the local time zone. w.print("{::gmt}", Date("%H:%M")); will output in GMT if additional explicitness is desired.
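
A sketch of the local versus GMT choice, using only the C standard library. format_date is a hypothetical helper, not swoc::bwf::Date; note that std::gmtime and std::localtime return pointers to shared static storage and are not thread safe.

```cpp
#include <cstddef>
#include <ctime>
#include <string>
#include <string_view>

// Hypothetical sketch of Date style output: format `t` with a strftime pattern,
// in GMT unless the extension is "local".
std::string format_date(std::time_t t, const char *pattern, std::string_view ext) {
  std::tm parts{};
  if (ext == "local") {
    parts = *std::localtime(&t); // local time zone
  } else {
    parts = *std::gmtime(&t); // "gmt" or no extension
  }
  char buff[128];
  std::size_t n = std::strftime(buff, sizeof(buff), pattern, &parts);
  return std::string(buff, n); // n is 0 if the result did not fit
}
```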

template<typename ...Args>
FirstOf(Args&&... args)

Print the first non-empty string in an argument list. All arguments must be convertible to std::string_view.

By far the most common case is the two argument case used to print a special string if the base string is null or empty. For instance, something like this:

w.print("{}", name != nullptr ? name : "<void>")

This could also be done like:

w.print("{}", swoc::bwf::FirstOf(name, "<void>"));

If the first argument is a local variable that exists only to do the empty check, that variable can be eliminated entirely.

const char * name = thing.get_name();
w.print("{}", name != nullptr ? name : "<void>");

can be simplified to

w.print("{}", swoc::bwf::FirstOf(thing.get_name(), "<void>"));

In general avoiding ternary operators in the print argument list makes the code cleaner and easier to understand.
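
A sketch of the “first non-empty” rule. first_of is a hypothetical stand-in for swoc::bwf::FirstOf; treating a null C string as empty is an assumption matching the name != nullptr examples above.

```cpp
#include <initializer_list>
#include <string_view>

// Normalize an argument to a view; a null C string becomes an empty view.
inline std::string_view as_view(const char *s) {
  return s != nullptr ? std::string_view{s} : std::string_view{};
}
inline std::string_view as_view(std::string_view s) { return s; }

// Hypothetical stand-in for swoc::bwf::FirstOf: the first non-empty argument.
template <typename... Args>
std::string_view first_of(Args &&...args) {
  for (std::string_view sv : {as_view(args)...}) {
    if (!sv.empty()) {
      return sv;
    }
  }
  return {};
}
```

Because the result is a view, the arguments must outlive its use, which is the case when it is passed directly to print.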

class Optional

A wrapper for optional output generation. This wraps a format string and a set of arguments and generates output conditionally: either the format string with the arguments applied, or nothing. This is useful for output data that requires additional delimiters if present, but nothing if not. A common pattern for this is something like

printf("Text: %d%s%s", count, data ? data : "", data ? " " : "");

or something like

printf("Text: %d", count);
if (data) {
   printf(" %s", data);
}

In both cases, the leading space separating data from the previous output is printed iff data is not nullptr. Using Optional with BufferWriter formatting this is done with something like

w.print("Text: {}{}", count, swoc::bwf::Optional(data != nullptr, " {}", data));

The first argument is a conditional, which determines if output is generated, followed by a format string and then arguments for the format string. The number of specifiers in the format string and the number of arguments must agree.

Because the case where the argument and the conditional are effectively the same is so common, there is a specialization of Optional which takes just a format string and an argument. This requires the format string to take only one parameter, and the argument to either

  • Have the method empty which returns false if there is content.

  • Be convertible to bool such that the argument converts to true if there is content.

This enables the example to be further reduced to

w.print("Text: {}{}", count, swoc::bwf::Optional(" {}", data));

Note this works with raw C strings, the STL string classes, and TextView. The more general form can be used if this specialization doesn’t suffice.
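
A sketch of the conditional behavior. optional_text is a hypothetical stand-in for swoc::bwf::Optional that builds a std::string and supports a single {} specifier; its single argument overload handles only C strings, treating null or empty as “no content”.

```cpp
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical stand-in for swoc::bwf::Optional: when `cond` is true, the result
// is `fmt` with its single "{}" replaced by `arg`; otherwise it is empty.
template <typename T>
std::string optional_text(bool cond, std::string_view fmt, T const &arg) {
  if (!cond) {
    return {};
  }
  std::ostringstream out;
  auto spot = fmt.find("{}");
  if (spot == std::string_view::npos) {
    out << fmt;
  } else {
    out << fmt.substr(0, spot) << arg << fmt.substr(spot + 2);
  }
  return out.str();
}

// Single argument form for C strings: null or empty means "no content".
inline std::string optional_text(std::string_view fmt, const char *arg) {
  return optional_text(arg != nullptr && *arg != '\0', fmt, arg != nullptr ? arg : "");
}
```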

Writing a Format Class

Writing additional format classes is designed to be easy, taking two or three steps. For example, consider a wrapper to output a string in rot13.

The first step is to declare the wrapper class.

struct As_Rot13 {
  std::string_view _src;
};

This class simply stores the std::string_view for later use.

Next the formatting for the wrapper class must be provided by overloading bwformat.

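// Per character rot13 transform, defined separately from the formatter.
static constexpr auto rot13 = [](char c) -> char {
  if ('a' <= c && c <= 'z') return static_cast<char>('a' + (c - 'a' + 13) % 26);
  if ('A' <= c && c <= 'Z') return static_cast<char>('A' + (c - 'A' + 13) % 26);
  return c;
};

swoc::BufferWriter &
bwformat(swoc::BufferWriter &w, swoc::bwf::Spec const &spec, As_Rot13 const &wrap) {
  return bwformat(w, spec, swoc::transform_view_of(rot13, wrap._src));
}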

This uses transform_view_of() to do the character rotation. The lambda to perform the per character transform is defined separately for code cleanliness; it could just as easily have been defined directly as an argument.

That’s all that is strictly required - this code now works as expected.

w.clear().print("Rot {}.", As_Rot13{"Sepideh"});
REQUIRE(w.view() == "Rot Frcvqru.");

Note the universal initializer must be used because there is no constructor. That is easily fixed.

struct As_Rot13 {
  std::string_view _src;

  As_Rot13(std::string_view src) : _src(src) {}
};

and now this works as expected.

w.clear().print("Rot {}.", As_Rot13("Sepideh"));
REQUIRE(w.view() == "Rot Frcvqru.");

Obviously other constructors can be provided for different ways to use the wrapper.

An optional third step is to use free functions, rather than constructors, to access the wrapper. This is useful in some circumstances, one example being when other classes need to overload the format class construction, which is not possible using only constructors. In this case, a wrapper function could be done as

As_Rot13
Rotter(std::string_view const &sv) {
  return As_Rot13(sv);
}

and used

w.clear().print("Rot {}.", Rotter("Frcvqru"));
REQUIRE(w.view() == "Rot Sepideh.");

Now, if there was a struct that needed Rot13 support

struct Thing {
  std::string _name;
  unsigned _n{0};
};

then the wrapper could be overloaded with

As_Rot13
Rotter(Thing const &thing) {
  return As_Rot13(thing._name);
}

and used

Thing thing{"Frcvqru", 1};
w.clear().print("Rot {}.", Rotter(thing));
REQUIRE(w.view() == "Rot Sepideh.");

In general, provide wrapper class constructors unless there is a specific need for using free functions instead. Care should be taken with the content of the format class to avoid expensive copies. In this case a std::string_view is very cheap to copy and the style of the wrapper takes advantage of return value optimization.

Working with standard I/O

For convenience a stream operator for std::ostream is provided to make the use more natural.

std::cout << bw;
std::cout << bw.view(); // identical effect as the previous line.

Using a BufferWriter with printf is straightforward by use of the sized string format code if necessary (generally using C++ I/O streams is a better choice).

swoc::LocalBufferWriter<256> bw;
bw.print("Failed to connect to {}", addr1);
printf("%.*s\n", int(bw.size()), bw.data());

Alternatively the output can be null terminated in the formatting to avoid having to pass the size.

swoc::LocalBufferWriter<256> bw;
printf("%s\n", bw.print("Failed to connect to {}\0", addr1).data());

When using C++ stream I/O, writing to a stream can be done without any local variables at all.

std::cout << swoc::LocalBufferWriter<256>().print("Failed to connect to {}", addr1)
          << std::endl;

If done repeatedly, a using alias improves the look.

using LBW = swoc::LocalBufferWriter<256>;
// ...
std::cout << LBW().print("Failed to connect to {}", addr1) << std::endl;

This is handy for temporary debugging messages as it avoids having to clean up local variable declarations later, particularly when the types involved themselves require additional local declarations (such as in this example, an IP address which would normally require a local text buffer for conversion before printing). As noted previously, this is particularly useful inside a switch case where local variables are more annoying to set up.

Name Binding

The first part of each format specifier is a name. This was originally done to be more compliant with Python formatting and is most commonly left blank, although sometimes it is used to format arguments out of order or use them multiple times. To make this a more useful feature, BufferWriter formatting supports name binding which binds names to text generator functors. The generator is expected to write output to a BufferWriter instance to replace the specifier, rather than a formatting argument.

The base formatting logic is passed a functor by constant reference which provides the name binding service. The functor is expected to have the signature

unspecified_type operator()(BufferWriter &w, bwf::Spec const &spec) const

As the format string is processed, if a format specifier has a name that is not numeric, the formatting logic calls the functor, ignoring the return value (which can therefore be of any type, including void). w is the output buffer and spec is the specifier that caused the functor to be invoked. The binding functor is expected to generate text in w in accordance with the format specifier spec. Generally this involves looking up a functor based on the name and calling that in turn to generate the text. The name for the binding is contained in the Spec::_name member of spec.

The class NameBinding is provided as a base class for supporting name binding. It

  • Forces a virtual destructor.

  • Provides a pure virtual declaration to ensure the correct function operator is implemented.

  • Provides a standardized “missing name” method.

This class is handy but not required.
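To make the shape of such a base class concrete, here is a small standalone sketch. It is not the libswoc declaration: std::string stands in for BufferWriter, Spec is reduced to a name, and the class names and the “{~name~}” missing-name marker are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical reduced specifier: only the name is modeled.
struct Spec {
  std::string_view _name;
};

// Sketch of a name binding base class (hypothetical, not the libswoc API).
class NameBindingSketch {
public:
  // Virtual destructor so subclasses are destroyed correctly through the base.
  virtual ~NameBindingSketch() = default;

  // Pure virtual: every binding must provide the function operator.
  virtual std::string &operator()(std::string &w, Spec const &spec) const = 0;

protected:
  // Standardized output for a name that has no generator.
  static std::string &missing(std::string &w, Spec const &spec) {
    w += "{~";
    w += spec._name;
    w += "~}";
    return w;
  }
};

// Example binding that knows a single name, "version".
class VersionBinding : public NameBindingSketch {
public:
  std::string &operator()(std::string &w, Spec const &spec) const override {
    if (spec._name == "version") {
      return w += "1.0.2";
    }
    return missing(w, spec);
  }
};
```

The formatter only ever sees the base class reference, so any subclass can be swapped in without changing the formatting logic.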

BufferWriter formatting provides support for two use cases.

External Generators

The first use case is for an “external generator” which generates text based on static or global data. An example would be a “timestamp” generator which generates a timestamp based on the current time. This could be associated with the name “timestamp” and used like

  w.print("{timestamp} Test Started");

to generate output such as “Nov 16 12:21:05.545 Test Started”.

Context Generators

The second is a “context generator” which generates text based on a context object. This use case presumes a set of generators which access parts of a context object for text generation such that the output of the generator depends on the state of the context object. For example, the context object might be an HTTP request and the generators would be field accessors, each of which outputs the value for a specific field of the request. Because the name is handed to the name binding object, an implementation could subclass ContextNames and override the function operator to check the name first against fields in the request, and only if that doesn’t match, do a lookup for a generator. ContextNames provides an implementation for storing and using name bindings.
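A rough standalone model of a context binding, again with std::string standing in for BufferWriter and a reduced Spec. ContextNamesSketch and its members are hypothetical. The point is the split between a shared name-to-generator table and a per-context binding produced by bind(), plus a virtual dispatch operator that a subclass could override to intercept names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical reduced specifier.
struct Spec {
  std::string_view _name;
  std::string_view _ext;
};

template <typename Ctx> class ContextNamesSketch {
public:
  using Generator = std::function<void(std::string &, Spec const &, Ctx const &)>;

  virtual ~ContextNamesSketch() = default;

  // Bind a name to a generator, replacing any previous binding.
  void assign(std::string_view name, Generator g) { _map.insert_or_assign(std::string(name), std::move(g)); }

  // Dispatch on the name; a subclass may override this to intercept names first.
  virtual void operator()(std::string &w, Spec const &spec, Ctx const &ctx) const {
    if (auto spot = _map.find(spec._name); spot != _map.end()) {
      spot->second(w, spec, ctx);
    } else {
      w.append("{~").append(spec._name).append("~}");
    }
  }

  // A binding pairs the shared table with one specific context object.
  class Binding {
  public:
    Binding(ContextNamesSketch const &names, Ctx const &ctx) : _names(names), _ctx(ctx) {}
    void operator()(std::string &w, Spec const &spec) const { _names(w, spec, _ctx); }

  private:
    ContextNamesSketch const &_names;
    Ctx const &_ctx;
  };

  Binding bind(Ctx const &ctx) const { return Binding(*this, ctx); }

private:
  // std::less<> allows lookup by std::string_view without building a std::string.
  std::map<std::string, Generator, std::less<>> _map;
};
```

The table is built once; each transaction only creates a cheap Binding that refers to its own context.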

Global Names

The external name generator support is used to create a set of default global names. A global singleton instance of an external name binding, ExternalNames, is used by default when generating formatting output. Generators assigned to this instance are therefore available in the default printing context. Here are a couple of examples for illustration of how this can be used.

A “timestamp” name was used as an example of a name useful to implement, so the example here will start by doing that.

First, the generator is defined.

BufferWriter &
BWF_Timestamp(BufferWriter &w, Spec const &spec) {
  auto now   = std::chrono::system_clock::now();
  auto epoch = std::chrono::system_clock::to_time_t(now);
  LocalBufferWriter<48> lw;

  ctime_r(&epoch, lw.aux_data());
  lw.commit(19); // take only the prefix.
  lw.print(".{:03}", std::chrono::time_point_cast<std::chrono::milliseconds>(now).time_since_epoch().count() % 1000);
  bwformat(w, spec, lw.view().substr(4));
  return w;
}

This generates a time stamp with the month through seconds. ctime_r produces text of the form “Www Mmm dd hh:mm:ss yyyy”; committing only the first 19 characters clips the year and everything past the seconds, and substr(4) drops the leading day of the week. It then adds milliseconds. Sample output looks like “Nov 16 11:40:20.833”. This is then attached to the default global name binding in an initialization function called during process startup.
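The slicing relies on the fixed layout of ctime output. The sketch below isolates that step with std::string and std::snprintf instead of LocalBufferWriter, using a fixed input so the result is predictable; clip_ctime is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// ctime output has the fixed layout "Www Mmm dd hh:mm:ss yyyy\n".
// Keep "Mmm dd hh:mm:ss" and append milliseconds as ".mmm".
std::string clip_ctime(std::string_view ctime_text, long long millis) {
  std::string_view prefix = ctime_text.substr(0, 19); // "Www Mmm dd hh:mm:ss", year dropped
  std::string out{prefix.substr(4)};                  // drop the leading "Www " day of week
  char ms[8];
  std::snprintf(ms, sizeof(ms), ".%03lld", millis % 1000); // same effect as ".{:03}"
  out += ms;
  return out;
}
```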

void
EX_BWF_Format_Init() {
  swoc::bwf::Global_Names().assign("timestamp", &BWF_Timestamp);
}

Because the test code is statically linked to the library, this must be done via a function called from main to be sure the library statics have been fully initialized. That taken care of, using the global name is trivial.

  w.print("{timestamp} Test Started");

The output from a run is “Nov 16 12:21:05.545 Test Started”. Note that because this is a format specifier, all of the supported format styles work without additional effort. That’s not very useful with a timestamp, but consider printing the epoch time. Again, the generator is defined first.

BufferWriter &
BWF_Now(BufferWriter &w, Spec const &spec) {
  return swoc::bwf::Format_Integer(w, spec, std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()), false);
}

The generator is then assigned to the name “now”.

void
EX_BWF_Format_Init() {
  swoc::bwf::Global_Names().assign("timestamp", &BWF_Timestamp);
  swoc::bwf::Global_Names().assign("now", &BWF_Now);
}

And used with various styles.

  w.print("Time is {now} {now:x} {now:X} {now:#x}");

Sample output from a run is “Time is 1542393187 5bef0d63 5BEF0D63 0x5bef0d63”.
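The sample output shows the value in decimal, lowercase hex, uppercase hex, and hex with a radix prefix. For readers coming from printf, these correspond roughly to the %lu, %lx, %lX and %#lx conversions. A sketch reproducing the sample line with std::snprintf and the fixed value from that run (epoch_styles is a hypothetical name):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// printf analogue of the four styles used in the sample output.
std::string epoch_styles(unsigned long value) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "Time is %lu %lx %lX %#lx", value, value, value, value);
  return buf;
}
```

Unlike printf, the BufferWriter version needs no length modifier, since the argument type is known.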

Context Binding Example

Context name binding is useful for front ends to BufferWriter, not for direct use. The expected use case is a format string provided by an external agent, with format specifiers to pull data from a context object where explicitly naming the context object isn’t possible. As an example use case, consider a Traffic Server plugin that provides a cookie manipulation function. When setting a cookie value, it is useful to access transaction specific data such as the URL, portions of the URL (e.g. the path), HTTP field values, some other cookie item value, etc. This can be provided easily by setting up a context binding which binds a request context and binds the various names to the appropriate elements in the context.

To start the example, a very simplified context will be used. It is hardwired for comprehensibility; in production code the elements would be initialized for each transaction.

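struct Context {
  // Name to value map for fields and cookies (map type assumed for this excerpt).
  using Fields = std::unordered_map<std::string_view, std::string_view>;
  // host and path match the expected test output below; the full URL is assumed
  // to be the scheme, host, path, and query combined.
  std::string url{"http://docs.solidwallofcode.com/libswoc/index.html?sureness=outofbounds"};
  std::string_view host{"docs.solidwallofcode.com"};
  std::string_view path{"/libswoc/index.html"};
  std::string_view scheme{"http"};
  std::string_view query{"sureness=outofbounds"};
  std::string tls_version{"tls/1.2"};
  std::string ip_family{"ipv4"};
  std::string ip_remote{"172.99.80.70"};
  Fields http_fields = {
    {{"Host", "docs.solidwallofcode.com"},
     {"YRP", "10.28.56.112"},
     {"Connection", "keep-alive"},
     {"Age", "956"},
     {"ETag", "1337beef"}}
  };
  static inline std::string A{"A"};
  static inline std::string alpha{"alpha"};
  static inline std::string B{"B"};
  static inline std::string bravo{"bravo"};
  Fields cookie_fields = {
    {{A, alpha}, {B, bravo}}
  };
};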

This holds the interesting information. Next up is a context name binding class that binds an instance of Context. This can be done with the template ContextNames. The template class provides both a map of names to generators and the subclass of NameBinding to pass to the formatter.

  using CookieBinding = swoc::bwf::ContextNames<Context const>;

For each supported name a function is defined to extract that data. For fields and cookies, the extension will hold the field name, so the generator needs to look up the name from the extension in the specifier. The field generators are done as local lambda functions. The other generators are done as in-place lambdas, since they simply pass a member of Context to bwformat. In production code this might be done with lambdas, file scope functions, or methods in Context. For writing the example, lambdas were easiest and so those were used.

First the field generators, as those are more complex.

  auto field_gen = [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & {
    if (auto spot = ctx.http_fields.find(spec._ext); spot != ctx.http_fields.end()) {
      bwformat(w, spec, spot->second);
    } else {
      bwformat(w, spec, NA);
    }
    return w;
  };
  auto cookie_gen = [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & {
    if (auto spot = ctx.cookie_fields.find(spec._ext); spot != ctx.cookie_fields.end()) {
      bwformat(w, spec, spot->second);
    } else {
      bwformat(w, spec, NA);
    }
    return w;
  };

NA is a constant string used to indicate a missing field / cookie.


With the field generators in place, time to hook up the generators. For the direct member ones, just define a lambda in place.

  CookieBinding cb;
  cb.assign("field", field_gen);
  cb.assign("cookie", cookie_gen);
  cb.assign("url",
            [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & { return bwformat(w, spec, ctx.url); });
  cb.assign("scheme",
            [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & { return bwformat(w, spec, ctx.scheme); });
  cb.assign("host",
            [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & { return bwformat(w, spec, ctx.host); });
  cb.assign("path",
            [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & { return bwformat(w, spec, ctx.path); });

In production code, cb would be a process static, initialized at process start up, as the relationship between the names and the generators doesn’t change. Time to try it out.

This test gets the “YRP” field and the “B” cookie.

  w.print_n(cb.bind(CTX), TextView{"YRP is {field::YRP}, Cookie B is {cookie::B}."});
  REQUIRE(w.view() == "YRP is 10.28.56.112, Cookie B is bravo.");

This test reconstructs the URL without the query parameters.

  w.print_n(cb.bind(CTX), "{scheme}://{host}{path}");
  REQUIRE(w.view() == "http://docs.solidwallofcode.com/libswoc/index.html");

That’s a minimalist approach, using as little additional code as possible. But it’s a bit funky to require the field names in the extension. There are various alternative approaches that could be used. The one considered here is to do more parsing work to make it easier for the users, by making the names more structured in the form “cookie.name” which means the value of the cookie element with the name “name”. The two implementations shown here were chosen to demonstrate features of BufferWriter formatting.

One type of implementation is to change how names are handled by the context binding (see Name Binding Example). Note that the base formatting logic does not do name lookup; it only passes the name (embedded in the specifier) to the binding. By subclassing the binding, this lookup can be intercepted and done differently, specifically by checking for names of the form “A.B” and using A to select the table in which to look up B. The other alternative is to change the parsing of the format string so that a field name such as “{cookie.name}” is parsed as if it had been “{cookie::name}” (see Parsing Example). Both of these approaches require understanding the core formatting logic and how to customize it, as explained in Custom Formatting.
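Both approaches hinge on splitting a structured name at the first period. A standalone sketch with std::string_view (split_structured is a hypothetical helper, not part of TextView) that returns an empty tag for an unstructured name, signaling that a normal lookup should be done:

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Split "A.B" into {"A", "B"} at the first period.
// A name without a period yields {"", name}: not structured.
std::pair<std::string_view, std::string_view> split_structured(std::string_view name) {
  if (auto dot = name.find('.'); dot != std::string_view::npos) {
    return {name.substr(0, dot), name.substr(dot + 1)};
  }
  return {std::string_view{}, name};
}
```

Splitting only at the first period keeps any further periods in the key, which matters for names that themselves contain periods.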

Custom Formatting

The internals of BufferWriter formatting are designed to enable using other format syntax. The one described in this document is simply the one implemented by default. Any format which can be used to generate literal output along with bwf::Spec instances can be made to work. Along with support for binding names, this makes it relatively easy to create custom format styles for use in specialized applications, particularly with formatting user input, e.g. for user defined diagnostic messages.

This starts with the BufferWriter::print_nfv() method. This is the formatted output implementation, all of the other variants serving as shims to call this method. The method has three arguments.

names

This is a container for bound names. If a specifier has a name that is not numeric, the specifier is passed to the name binding for output.

ex

The format extractor. This is a functor that detects the end of input and extracts literals and specifiers. It has two required operators and one optional method.

class Extractor
explicit operator bool() const
Returns:

true if there is more format string to process, otherwise false.

bool operator()(std::string_view &literal, bwf::Spec &spec)
Returns:

true if a specifier was parsed and spec updated, otherwise false.

Extract the next literal and/or specifier. It may be assumed both literal and spec are initialized as if default constructed. If no literal is available, literal should be left unmodified; otherwise it should be set to the literal. If a specifier is found, spec must be updated to the parsed value of the specifier and the method must return true; otherwise it must return false. If the extractor is not empty, the method must always provide at least one of literal or spec.

void capture(BufferWriter &w, const bwf::Spec &spec, std::any &&value)

This is an optional method used to capture an argument. A pointer to the argument is placed in value with full type information. The method may generate output but this is not required. If this method is not present and the extractor returns a specifier with the type Spec::CAPTURE_TYPE, an exception will be thrown.

args

A tuple containing the arguments to be formatted.

The formatting logic in BufferWriter::print_nfv() is, in outline:

  while ex() is not empty:
    ex(literal, spec)
    if literal is not empty:
      w.write(literal)
    if a specifier was found:
      if spec._name is numeric or empty:
        format the selected member of args using spec
      else:
        names(spec)

If the name in spec is not empty and not numeric, rather than selecting a member of args the specifier is passed to the name binding, which presumably generates the appropriate output. The name is embedded in the specifier spec in the Spec::_name member for use by the name binding. Otherwise, an empty or numeric name means an argument is selected and passed to a bwformat overload, the specific overload selected based on the type of the argument.
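To show the loop end to end, here is a self-contained toy formatter in the same shape. It is a simplified model, not print_nfv(): the hypothetical BraceEx extractor handles only “{name}” specifiers without styles, the name binding is reduced to a map of fixed strings, and arguments are pre-rendered strings. The rule that empty names take the next argument while numeric names select by index is a simplification for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical reduced specifier.
struct Spec {
  std::string_view _name;
};

// Toy extractor for "{name}" specifiers; brace escapes are not handled.
class BraceEx {
public:
  explicit BraceEx(std::string_view fmt) : _fmt(fmt) {}
  explicit operator bool() const { return !_fmt.empty(); }

  // Sets literal to any text before the specifier; returns true if a specifier was found.
  bool operator()(std::string_view &literal, Spec &spec) {
    auto open  = _fmt.find('{');
    auto close = open == std::string_view::npos ? open : _fmt.find('}', open);
    if (close == std::string_view::npos) { // no complete specifier: the rest is literal
      literal = _fmt;
      _fmt    = {};
      return false;
    }
    literal    = _fmt.substr(0, open);
    spec._name = _fmt.substr(open + 1, close - open - 1);
    _fmt.remove_prefix(close + 1);
    return true;
  }

private:
  std::string_view _fmt;
};

using Names = std::map<std::string, std::string, std::less<>>;

// Core loop: copy literals; numeric or empty names select an argument,
// any other name is handed to the name binding.
std::string format_sketch(BraceEx ex, Names const &names, std::vector<std::string> const &args) {
  std::string w;
  std::size_t next_arg = 0; // used for empty names only
  while (ex) {
    std::string_view literal;
    Spec spec;
    bool found = ex(literal, spec);
    w += literal;
    if (!found) {
      continue;
    }
    std::string_view name = spec._name;
    if (name.empty()) {
      w += args.at(next_arg++);
    } else if (std::isdigit(static_cast<unsigned char>(name[0]))) {
      w += args.at(std::stoul(std::string(name)));
    } else if (auto spot = names.find(name); spot != names.end()) {
      w += spot->second;
    } else {
      w += "{~";
      w += name;
      w += "~}";
    }
  }
  return w;
}
```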

For examples of this, the Context Binding Example will be redone in two different ways, each illustrating a different approach to customizing output formatting.

Parsing Example

For this case, the parsing of the format specifier is overridden and if the name is of the form “A.B” it is treated as “A::B”, that is “A” is put in the _name member and “B” is put in the _ext member. Any extension is ignored. In addition, to act more like a Traffic Server plugin (and illustrate how to use alternate specifier formats), the parser requires format specifiers to be of the form “%{name:style}”. A double percent “%%” will mark a percent that is not part of a format specifier.

The first step is to declare a class that will be the extractor functor.

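// Alternate format string parsing.
// This is the extractor, an instance of which is passed to the formatting logic.
struct AltFormatEx {
  AltFormatEx(TextView fmt);

  explicit operator bool() const;
  bool operator()(std::string_view &literal, swoc::bwf::Spec &spec);
  // This holds the format string being parsed.
  TextView _fmt;
};

// Construct by copying a view of the format string.
AltFormatEx::AltFormatEx(TextView fmt) : _fmt{fmt} {}

// The extractor is empty if the format string is empty.
AltFormatEx::operator bool() const {
  return !_fmt.empty();
}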

This will be used only as a temporary passed to BufferWriter::print_nfv() and is therefore always constructed with the format string. The format string left to parse is kept in _fmt which means the empty check is really just a check on that.

The function operator, which parses the format string to extract literals and specifiers, is a bit more complex.

bool
AltFormatEx::operator()(std::string_view &literal, swoc::bwf::Spec &spec) {
  if (!_fmt.empty()) {
    literal = _fmt.take_prefix_at('%');
    if (_fmt.empty()) { // no '%' found, it's all literal, we're done.
      return false;
    }

    if (_fmt.size() >= 1) { // Something left that's a potential specifier.
      char c = _fmt[0];
      if (c == '%') { // %% -> not a specifier, slap the leading % on the literal, skip the trailing.
        literal = {literal.data(), literal.size() + 1};
        ++_fmt;
      } else if (c == '{') {
        ++_fmt; // drop open brace.
        auto style = _fmt.split_prefix_at('}');
        if (style.empty()) {
          throw std::invalid_argument("Unclosed open brace");
        }
        spec.parse(style);        // stuff between the braces
        if (spec._name.empty()) { // no format args, must have a name to be usable.
          throw std::invalid_argument("No name in specifier");
        }
        // Check for structured name - put the tag in _name and the value in _ext if found.
        TextView name{spec._name};
        auto key = name.split_prefix_at('.');
        if (key) {
          spec._ext  = name;
          spec._name = key;
        }
        return true;
      }
    }
  }
  return false;
}


The rough logic is

  • Search for a ‘%’ - if not found, it’s all literal, return that.

  • Make sure the ‘%’ isn’t ‘%%’ - if it is, need to return just a literal with the leading ‘%’ and skip the trailing ‘%’, doing more parsing on the next call.

  • Check for an open brace, and if found, find the close brace, then parse the internals into a specifier. Because the same style format as the default is used, the parser for bwf::Spec can be used. If a different style were needed, custom parsing logic would replace the call to spec.parse().

  • If a specifier was found, check the name for a period. If found, split it and put the prefix in the name and the suffix in the extension.
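The same logic can be sketched without libswoc, using std::string_view and a reduced Spec that stores the style text instead of parsing it. PercentEx is hypothetical, and it deliberately differs from the code above in two edge cases: it throws on a ‘%’ followed by anything other than ‘%’ or ‘{’, and it keeps a trailing lone ‘%’ as literal text, where the code above drops the ‘%’.

```cpp
#include <cassert>
#include <stdexcept>
#include <string_view>

// Hypothetical reduced specifier: name, extension, and raw style text.
struct Spec {
  std::string_view _name;
  std::string_view _ext;
  std::string_view _style;
};

class PercentEx {
public:
  explicit PercentEx(std::string_view fmt) : _fmt(fmt) {}
  explicit operator bool() const { return !_fmt.empty(); }

  bool operator()(std::string_view &literal, Spec &spec) {
    auto pct = _fmt.find('%');
    if (pct == std::string_view::npos || pct + 1 >= _fmt.size()) {
      literal = _fmt; // no specifier left
      _fmt    = {};
      return false;
    }
    char c = _fmt[pct + 1];
    if (c == '%') { // "%%": literal through the first '%', skip the second.
      literal = _fmt.substr(0, pct + 1);
      _fmt.remove_prefix(pct + 2);
      return false;
    }
    if (c != '{') {
      throw std::invalid_argument("'%' must be followed by '%' or '{'");
    }
    auto close = _fmt.find('}', pct + 2);
    if (close == std::string_view::npos) {
      throw std::invalid_argument("Unclosed open brace");
    }
    literal               = _fmt.substr(0, pct);
    std::string_view body = _fmt.substr(pct + 2, close - pct - 2);
    _fmt.remove_prefix(close + 1);

    // Separate "name:style", then split a structured name "A.B" into name A, extension B.
    std::string_view name = body.substr(0, body.find(':'));
    spec._style           = name.size() < body.size() ? body.substr(name.size() + 1) : std::string_view{};
    if (name.empty()) {
      throw std::invalid_argument("No name in specifier");
    }
    if (auto dot = name.find('.'); dot != std::string_view::npos) {
      spec._ext = name.substr(dot + 1);
      name      = name.substr(0, dot);
    }
    spec._name = name;
    return true;
  }

private:
  std::string_view _fmt;
};
```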


A name binding is declared and names are assigned in the usual way.

  using AltNames = swoc::bwf::ContextNames<Context>;
  AltNames names;
  names.assign("scheme", [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & {
    return bwformat(w, spec, ctx.scheme);
  });

In addition to assigning context related names, external generators can also be assigned to the name binding, which can be a useful feature to inject external names in addition to the context specific ones.

After that, everything is ready to try it out.

  w.clear().print_nfv(names.bind(CTX), AltFormatEx("I hear %{dave} wants to see YRP=%{field.YRP} and cookie A is %{cookie.A}"));
  REQUIRE(w.view() == "I hear Evil Dave wants to see YRP=10.28.56.112 and cookie A is alpha");
  w.clear().print_nfv(names.bind(CTX), AltFormatEx("Width |%{proto:>10}| dig?"));


Name Binding Example

Another approach is to override how name lookup is done in the binding. Because the field handling will be done in the override, methods are added to the Context to do the generation for structured names, rather than placing that logic in the binding.

  // Add structured name generation as methods of the context.
  struct ExContext : public Context {
    void
    field_gen(BufferWriter &w, Spec const &spec, TextView const &field) const {
      if (auto spot = http_fields.find(field); spot != http_fields.end()) {
        bwformat(w, spec, spot->second);
      } else {
        bwformat(w, spec, NA);
      }
    }

    void
    cookie_gen(BufferWriter &w, Spec const &spec, TextView const &tag) const {
      if (auto spot = cookie_fields.find(tag); spot != cookie_fields.end()) {
        bwformat(w, spec, spot->second);
      } else {
        bwformat(w, spec, NA);
      }
    }
  } CTX;

Next a subclass of ContextNames is created which binds to an ExContext object.

  // Container for name bindings.
  // Override the name lookup to handle structured names.
  class CookieBinding : public swoc::bwf::ContextNames<ExContext const> {
    using super_type = swoc::bwf::ContextNames<ExContext const>;

  public:

Inside the class the function operator is overridden to handle name lookup.

    // Intercept name dispatch to check for structured names and handle those. If not structured,
    // chain up to super class to dispatch normally.
    BufferWriter &
    operator()(BufferWriter &w, Spec const &spec, ExContext const &ctx) const override {
      // Structured name prefixes, matching the names used in the tests below.
      static constexpr TextView FIELD_TAG{"field"};
      static constexpr TextView COOKIE_TAG{"cookie"};

      TextView name{spec._name};
      TextView key = name.split_prefix_at('.');
      if (key == FIELD_TAG) {
        ctx.field_gen(w, spec, name);
      } else if (key == COOKIE_TAG) {
        ctx.cookie_gen(w, spec, name);
      } else if (!key.empty()) {
        // error case - unrecognized prefix
        w.print("!{}!", name);
      } else { // direct name, do normal dispatch.
        this->super_type::operator()(w, spec, ctx);
      }
      return w;
    }
  };

The incoming name is taken from the specifier and split on a period. If that yields a non-empty prefix, it is checked against the two valid structure names and the appropriate method on ExContext is called to generate the output; an unrecognized prefix is flagged in the output with exclamation marks. Otherwise the normal name lookup is done to find the direct access generators.

An instance is constructed and the direct access names assigned

  // Hook up the generators.
  CookieBinding cb;
  cb.assign("url",
            [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & { return bwformat(w, spec, ctx.url); });
  cb.assign("scheme",
            [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & { return bwformat(w, spec, ctx.scheme); });
  cb.assign("host",
            [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & { return bwformat(w, spec, ctx.host); });
  cb.assign("path",
            [](BufferWriter &w, Spec const &spec, Context const &ctx) -> BufferWriter & { return bwformat(w, spec, ctx.path); });
  cb.assign("version", BWF_Version);

and it’s time to try it out.

  w.print_n(cb.bind(CTX), "B cookie is {cookie.B}");
  REQUIRE(w.view() == "B cookie is bravo");
  w.clear();
  w.print_n(cb.bind(CTX), "{scheme}://{host}{path}");
  REQUIRE(w.view() == "http://docs.solidwallofcode.com/libswoc/index.html");
  w.clear();
  w.print_n(cb.bind(CTX), "Version is {version}");
  REQUIRE(w.view() == "Version is 1.0.2");
  w.clear();
  w.print_n(cb.bind(CTX), "Potzrebie is {field.potzrebie}");
  REQUIRE(w.view() == "Potzrebie is N/A");
  w.clear();
  w.print_n(cb.bind(CTX), "Align: |{host:<30}|");
  REQUIRE(w.view() == "Align: |docs.solidwallofcode.com      |");
  w.clear();
  w.print_n(cb.bind(CTX), "Align: |{host:>30}|");
  REQUIRE(w.view() == "Align: |      docs.solidwallofcode.com|");

This tests structured names, direct access names, external names (“version”), and some formatting.
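The interception pattern can be modeled without libswoc. In this standalone sketch std::string is the output, the context has two lookup tables, and Names and StructuredNames are hypothetical classes. Missing fields print “N/A” as in the tests above, and an unrecognized prefix is wrapped in ‘!’.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <utility>

struct Ctx {
  std::map<std::string, std::string, std::less<>> fields;
  std::map<std::string, std::string, std::less<>> cookies;
  std::string host;
};

// Base binding: plain name -> generator dispatch.
class Names {
public:
  using Gen = std::function<void(std::string &, Ctx const &)>;
  virtual ~Names() = default;
  void assign(std::string name, Gen g) { _map[std::move(name)] = std::move(g); }
  virtual void operator()(std::string &w, std::string_view name, Ctx const &ctx) const {
    if (auto spot = _map.find(name); spot != _map.end()) {
      spot->second(w, ctx);
    } else {
      w += "{~";
      w += name;
      w += "~}";
    }
  }

private:
  std::map<std::string, Gen, std::less<>> _map;
};

// Subclass: intercept "field.X" and "cookie.X", otherwise chain up to the base.
class StructuredNames : public Names {
public:
  void operator()(std::string &w, std::string_view name, Ctx const &ctx) const override {
    auto dot = name.find('.');
    if (dot == std::string_view::npos) { // direct name: normal dispatch
      Names::operator()(w, name, ctx);
      return;
    }
    std::string_view tag = name.substr(0, dot);
    std::string_view key = name.substr(dot + 1);
    auto const *table    = tag == "field" ? &ctx.fields : tag == "cookie" ? &ctx.cookies : nullptr;
    if (table == nullptr) { // unrecognized prefix
      w += '!';
      w += name;
      w += '!';
      return;
    }
    auto spot = table->find(key);
    w += spot != table->end() ? std::string_view{spot->second} : std::string_view{"N/A"};
  }
};
```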

C Style

The formatting is sufficiently flexible to emulate C style or “printf” formatting. Given that a major motivation for this work was the inadequacy of C style formatting, it’s a bit odd to have this example, but it was done to show that even when emulating printf, it’s still better. I must note that although this works reasonably well, it’s still an example and not suitable for production code. There are still some edge cases not handled, but as a proof of concept it’s not worth fixing every detail.

The first step is creating a format extractor, since the format string syntax is completely different from the default. This is done by creating a class to perform the extraction and hold state, although it will only be used as a temporary passed to BufferWriter::print_nfv(). The state is required to track “captured” arguments. These are used to emulate the ‘*’ marker for integers in format specifiers, which indicates that the value is in an argument, not the format string. This can be done for both the width and the precision, so both of them must be capturable. The basic logic is to keep a bwf::Spec in the class to hold the captured values, along with flags indicating the capture state (it may be necessary to do two captures, if both the width and precision are variable).

class C_Format {
public:
  /// Construct for @a fmt.
  C_Format(TextView const &fmt);

  /// Check if there is any more format to process.
  explicit operator bool() const;

  /// Get the next pieces of the format.
  bool operator()(std::string_view &literal, Spec &spec);

  /// Capture an argument use as a specifier value.
  void capture(BufferWriter &w, Spec const &spec, std::any const &value);

protected:
  TextView _fmt;        // The format string.
  Spec _saved;          // spec for which the width and/or prec is needed.
  bool _saved_p{false}; // flag for having a saved _spec.
  bool _prec_p{false};  // need the precision captured?
};

The empty indicator needs to be a bit different in that even if the format is empty, if the last part of the format string had a capture (indicated by _saved_p being true) a non-empty state needs to be returned to get an invocation to output that last specifier.

inline C_Format::operator bool() const {
  return _saved_p || !_fmt.empty();
}

The capture logic takes advantage of the fact that only integers can be captured, and in fact printf itself requires exactly an int. This logic is a bit more flexible, accepting unsigned and size_t also, but otherwise is fairly restrictive. It should also generate an error instead of silently returning on a bad type, but you can’t have everything.

void
C_Format::capture(BufferWriter &, Spec const &spec, std::any const &value) {
  unsigned v;
  if (typeid(int *) == value.type())
    v = static_cast<unsigned>(*std::any_cast<int *>(value));
  else if (typeid(unsigned *) == value.type())
    v = *std::any_cast<unsigned *>(value);
  else if (typeid(size_t *) == value.type())
    v = static_cast<unsigned>(*std::any_cast<size_t *>(value));
  else
    return;

  if (spec._ext == "w")
    _saved._min = v;
  if (spec._ext == "p") {
    _saved._prec = v;
  }
}

The setup for the capture passes the capture element in the extension of the returned specifier (“w” for the width, “p” for the precision), which this logic checks to know where to stash the captured value.
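An alternative way to write the type check, sketched as a hypothetical free function: the pointer form of std::any_cast returns nullptr on a type mismatch instead of throwing, and returning std::optional lets the caller report a bad type rather than silently ignoring it.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <optional>

// Return the captured integer if value holds a pointer to an accepted type.
std::optional<unsigned> capture_int(std::any const &value) {
  if (auto p = std::any_cast<int *>(&value)) {
    return static_cast<unsigned>(**p);
  }
  if (auto p = std::any_cast<unsigned *>(&value)) {
    return **p;
  }
  if (auto p = std::any_cast<std::size_t *>(&value)) {
    return static_cast<unsigned>(**p);
  }
  return std::nullopt; // unsupported type; the caller decides how to report it
}
```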

The actual parsing logic will be skipped - it’s in the example file ex_bw_format.cc in the function operator method.


This handles all the basics of C style formatting including sign control, minimum and maximum widths, precision, and leading radix support. One thing of note is that integer size indicators (such as “l” in “%ld”) are ignored: the type is known, so the sizing information is redundant at best and wrong at worst, and it is parsed and discarded. If a capture is needed, state is set in the extractor instance and the specifier type is set to bwf::Spec::CAPTURE_TYPE, which causes the formatting logic to call the extractor’s capture method with the corresponding argument. The specifier name is always empty, as strict in-order processing is mandatory.
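Since the parsing code is not shown here, the following is a hypothetical standalone sketch of decoding the text after a ‘%’ (flags, width, precision, length modifiers, conversion character). It is not the code in ex_bw_format.cc; CSpec and parse_cspec are illustrative names, and ‘*’ is recorded as the sentinel -2 to mark a value that must be captured from an argument.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical decoded printf conversion.
struct CSpec {
  bool left  = false; // '-' flag
  bool plus  = false; // '+' flag
  bool alt   = false; // '#' flag
  bool zero  = false; // '0' flag
  int width  = -1;    // -1: absent, -2: from an argument ('*')
  int prec   = -1;    // same encoding as width
  char type  = 0;     // conversion character, e.g. 'd', 's', 'x'
};

// Parse the text following '%', e.g. "-12.5ld".
// Returns the number of characters consumed, or 0 if no conversion character was found.
std::size_t parse_cspec(std::string_view s, CSpec &spec) {
  std::size_t i = 0;
  auto number   = [&](int &out) {
    if (i < s.size() && s[i] == '*') {
      out = -2;
      ++i;
      return;
    }
    if (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
      out = 0;
      while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        out = out * 10 + (s[i++] - '0');
      }
    }
  };
  for (; i < s.size(); ++i) { // flags
    char c = s[i];
    if (c == '-') spec.left = true;
    else if (c == '+') spec.plus = true;
    else if (c == '#') spec.alt = true;
    else if (c == '0') spec.zero = true;
    else break;
  }
  number(spec.width);
  if (i < s.size() && s[i] == '.') { // '.' alone means precision 0
    ++i;
    spec.prec = 0;
    number(spec.prec);
  }
  // Length modifiers add nothing when the argument type is known: skip them.
  while (i < s.size() && std::string_view("hlLqjzt").find(s[i]) != std::string_view::npos) {
    ++i;
  }
  if (i >= s.size()) {
    return 0;
  }
  spec.type = s[i++];
  return i;
}
```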

Some example uses, along with verification of the results.

  LocalBufferWriter<256> w;

  bwprintf(w.clear(), "Fifty Six = %d", 56);
  REQUIRE(w.view() == "Fifty Six = 56");
  bwprintf(w.clear(), "int is %i", 101);
  REQUIRE(w.view() == "int is 101");
  bwprintf(w.clear(), "int is %zd", 102);
  REQUIRE(w.view() == "int is 102");
  bwprintf(w.clear(), "int is %ld", 103);
  REQUIRE(w.view() == "int is 103");
  bwprintf(w.clear(), "int is %s", 104);
  REQUIRE(w.view() == "int is 104");
  bwprintf(w.clear(), "int is %ld", -105);
  REQUIRE(w.view() == "int is -105");

  TextView digits{"0123456789"};
  bwprintf(w.clear(), "Chars |%*s|", 12, digits);
  REQUIRE(w.view() == "Chars |  0123456789|");
  bwprintf(w.clear(), "Chars %.*s", 4, digits);
  REQUIRE(w.view() == "Chars 0123");
  bwprintf(w.clear(), "Chars |%*.*s|", 12, 5, digits);
  REQUIRE(w.view() == "Chars |       01234|");

Summary

These examples show that changing the format style and/or syntax can be done with relatively little code. Even the C style formatting takes less than 100 lines of code to be mostly complete, even though it can’t take advantage of the parsing in bwf::Spec and must handle captures. This makes adopting BufferWriter formatting in existing projects that already define a format syntax different from the default a low hurdle to get over.

Design Notes

This is essentially my own work but I want to call out Uthira Mohan, who was there at the start of what became BufferWriter formatting, a quick joint project to play with variadic templates and formatting. This code is based directly on that project, rather excessively extended, as is my wont. Alan Wang contributed the floating point support, along with useful comments on the code and API while he was an intern. Thanks, Uthira and Alan!

Type safe formatting has two major benefits -

  • No mismatch between the format specifier and the argument. Although some modern compilers do better at catching this at compile time, there is still risk (especially with non-constant format strings) and divergence between operating systems such that there is no universally correct choice. In addition, the number of arguments can be verified to be correct, which is often useful.

  • Formatting can be customized per type or even per partial type (e.g. T* for generic T). This enables embedding common formatting work in the format system once, rather than duplicating it in many places (e.g. converting enum values to names). This makes it easier for developers to make useful error messages. See this example for more detail.
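A small standard C++ sketch of per-type dispatch: overloads chosen by argument type, an enum-to-name conversion written once, and a template covering a family of types (std::vector<T>). The names put and concat are hypothetical; BufferWriter formatters are bwformat overloads that also take a specifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

enum class State { Idle, Running, Stopped };

// Overloads selected by argument type, so a specifier cannot mismatch the argument.
void put(std::string &w, int v) { w += std::to_string(v); }
void put(std::string &w, std::string_view v) { w += v; }

// Enum-to-name conversion, written once and reused by every caller.
void put(std::string &w, State s) {
  static constexpr std::string_view names[] = {"Idle", "Running", "Stopped"};
  w += names[static_cast<int>(s)];
}

// Formatting for a family of types: any std::vector<T> whose elements can be put.
template <typename T> void put(std::string &w, std::vector<T> const &v) {
  w += '[';
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (i) {
      w += ", ";
    }
    put(w, v[i]);
  }
  w += ']';
}

// Format every argument in order using its type's overload (C++17 fold expression).
template <typename... Args> std::string concat(Args const &...args) {
  std::string w;
  (put(w, args), ...);
  return w;
}
```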

As a result of these benefits there has been other work on similar projects, to replace printf with a better mechanism. Unfortunately most of these are rather project specific and don’t suit the use case in Traffic Server. The two best options, Boost.Format and fmt, while good, are also not quite close enough to outweigh the benefits of a version specifically tuned for Traffic Server. Boost.Format is not acceptable because of the Boost footprint. fmt has the problem of depending on C++ stream operators and therefore not having the required level of performance or memory characteristics. Its main benefit, of reusing stream operators, doesn’t apply to Traffic Server because of the nigh non-existence of such operators. The possibility of using C++ stream operators was investigated but changing those to use pre-existing buffers not allocated internally was very difficult, judged worse than building a relatively simple implementation from scratch. The actual core implementation of formatted output for BufferWriter is not very large - most of the overall work will be writing formatters, work which would need to be done in any case but in contrast to current practice, only done once.

This code has undergone multiple large scale revisions, some driven by use (the most recent only triggered by trying to write the examples in this document and finding some rough edges) and others by a need for additional functionality (the format extractor support). I think it’s close to its final form and I am quite pleased with it. The most recent revisions to the alternate formatting support have made it rather simple to retrofit this work into existing / legacy applications. I do expect to have some ongoing work on the documentation, which I consider currently basically a first pass.