Coroutines in C++/Boost (2)

Also see my previous article: Coroutines in C++/Boost.

C++ finally has a native implementation in C++20. The principal difference between coroutines and routines is that a coroutine enables explicit suspend and resume of its progress via additional operations by preserving execution state and thus provides an enhanced control flow (maintaining the execution context).

1. Asymmetric vs Symmetric

From boost:

An asymmetric coroutine knows its invoker, using a special operation to implicitly yield control specifically to its invoker.

By contrast, all symmetric coroutines are equivalent; one symmetric coroutine may pass control to any other symmetric coroutine. Because of this, a symmetric coroutine must specify the coroutine to which it intends to yield control.

So C++20 coroutines are asymmetric ones. A coroutine only knows its parent. With the dependency, symmetric coroutines can be chained, just like a normal function calls another one. No goto semantics as with a symmetric one.

C++23 generators are also asymmetric. They are resumed repeatedly to generate a series of return values.

2. Stackless vs Stackful

Again From boost:

In contrast to a stackless coroutine, a stackful coroutine can be suspended from within a nested stackframe. Execution resumes at exactly the same point in the code where it was suspended before.

With a stackless coroutine, only the top-level routine may be suspended. Any routine called by that top-level routine may not itself suspend. This prohibits providing suspend/resume operations in routines within a general-purpose library.

Well, these two are confusing. Tutorials and Blogs have different description. To make it simple, if there is await/yield definition, it’s stackless. Then if there is something called Fiber in the language, it’s stackful.

Fibers are just like threads, they can be suspended at any stackframe. While await/yield is used as a suspend point, a stackless coroutine can only suspend at exactly that point.

A stackless coroutine shares a default stack among all the coroutines, while a stackful coroutine assigns a separate stack to each coroutine. With stackless coroutine, the code is transformed into event handlers at compile time, and driven by an event engine at run time, i.e. the scheduler of stackless coroutine. Transferring control of CPU to a stackless coroutine is merely a function call with an argument pointing to its context. Conversely, transferring CPU control to a stackful coroutine requires a context switch.

Here’s a summary of how coroutine is implemented in most popular programming languages.

Language Stackful coroutines (Fibers) Stackless coroutines (await/yield)
Java (Y2023) Virtual threads in Java 21 n/a
C n/a n/a
C++ n/a (Y2020) co_await, co_yield, co_return in C++ 20
Python n/a (Y2015) async, await/yield in Python 3.5
C# n/a (Y2012) async, await/yield in C# 5.0
Javascript n/a (Y2017) async, await/yield in ES 2017
PHP (Y2021) Fiber in PHP 8.1 n/a
Go (Y2012) Goroutine in Go 1.0
(Y2020) asynchronously preemptible in 1.14
n/a
Objective-C n/a n/a
Swift n/a (Y2021) async, await/yield in Swift 5.5
Rust n/a (Y2019) async, await in Rust 1.39

Reference

Boost.Coroutine2
Fibers under the magnifying glass
Stackful Coroutine Made Fast

Coroutines in C++/Boost

Starting with 1.56, boost/asio provides asio::spawn() to work with coroutines. Just paste the sample code here, with minor modifications:

The Python in my previous article can be used to work with the code above. I also tried to write a TCP server with only boost::coroutines classes. select() is used, since I want the code to be platform independent. NOTE: with coroutines, we have only _one_ thread.

Basic Usage of Boost MultiIndex Containers

Just take a simple note here.
The Boost Multi-index Containers Library provides a class template named multi_index_container which enables the construction of containers maintaining one or more indices with different sorting and access semantics.

Output:

To use with pointer values, only limited change needed as highlighted:

Spurious Wakeups

http://vladimir_prus.blogspot.com/2005/07/spurious-wakeups.html

One of the two basic synchronisation primitives in multithreaded programming is called “condition variables”. Here’s a small example:

Here, the call to “c.wait()” unlocks the mutex (allowing the other thread to eventually lock it), and suspends the calling thread. When another thread calls ‘notify’, the first thread wakes up, locks the mutex again (implicitly, inside ‘wait’), sees that variable is set to ‘true’ and goes on.

But why do we need the while loop, can’t we write:

We can’t. And the killer reason is that ‘wait’ can return without any ‘notify’ call. That’s called spurious wakeup and is explicitly allowed by POSIX. Essentially, return from ‘wait’ only indicates that the shared data might have changed, so that data must be evaluated again.

Okay, so why this is not fixed yet? The first reason is that nobody wants to fix it. Wrapping call to ‘wait’ in a loop is very desired for several other reasons. But those reasons require explanation, while spurious wakeup is a hammer that can be applied to any first year student without fail.

The second reason is that fixing this is supposed to be hard. Most sources I’ve seen say that fixing that would require very large overhead on certain architectures. Strangely, no details were ever given, which made me wonder if avoiding spurious wakeups is simple, but all the threading experts secretly decided to tell everybody it’s hard.

After asking on comp.programming.thread, I at least know the reason for Linux (thanks to Ben Hutchings). Internally, wait is implemented as a call to the ‘futex’ system call. Each blocking system call on Linux returns abruptly when the process receives a signal — because calling signal handler from kernel call is tricky. What if the signal handler calls some other system function? And a new signal arrives? It’s easy to run out of kernel stack for a process. Exactly because each system call can be interrupted, when glibc calls any blocking function, like ‘read’, it does it in a loop, and if ‘read’ returns EINTR, calls ‘read’ again.

Can the same trick be used to conditions? No, because the moment we return from ‘futex’ call, another thread can send us notification. And since we’re not waiting inside ‘futex’, we’ll miss the notification(A third thread can get it, and change the value of predicate. — gonwan). So, we need to return to the caller, and have it reevaluate the predicate. If another thread indeed set it to true, we’ll break out of the loop.

So much for spurious wakeups on Linux. But I’m still very interested to know what the original reasons were.

==============================
Also see the explanation for spurious wakeups on the linux man page: pthread_cond_signal.
Last note: PulseEvent() in windows(manual-reset) = pthread_cond_signal() in linux, while SetEvent() in windows(auto-reset) = pthread_cond_broadcast() in linux, see here and here. And spurious wakeups are also possible on windows when using condition variables.

Exception Safety with shared_ptr

Code snippet:

Output:

Exception safety is ensured, when using shared_ptr. Memory allocated by m_a is freed even when an exception is thrown. The trick is: the destructor of class shared_ptr is invoked after the destructor of class C.

Writing UTF-8 String Using ofstream in C++

I’ve googled a lot to find the answer. But none really solve the problem simply and gracefully, even on stackoverflow. So we’ll do ourselves here 🙂

Actually, std::string supports operation using multibytes characters. This is the base of our solution:

g_cs is a Chinese word(“你好” which means hello) encoded in UTF-8. The code works under both Windows(WinXP+VS2005) and Linux(Ubuntu12.04+gcc4.6). You may wanna open a.txt to check whether the string is correctly written.

NOTE: Under Linux, we print the string directly since the default console encoding is UTF-8, and we can view the string. While under Window, the console DOES NOT support UTF-8(codepage 65001) encoding. Printing to it simply causes typo. We just convert it to a std::wstring and use MessageBox() API to check the result. I will cover the encoding issue in windows console in my next post, maybe.

I began to investigate the problem, since I cannot find a solution to read/write a UTF-8 string to XML file using boost::property_tree. Actually, it’s a bug and is already fixed in boost 1.47 and later versions. Unfortunately, Ubuntu 12.04 came with boost 1.46.1. When reading non-ASCII characters, some bytes are incorrectly skipped. The failure function is boost::property_tree::detail::rapidxml::internal::get_index(). My test code looks like:

Almost the same structure with the previous function. And finally the utf8_to_ucs2() function:

Please add header files yourselves to make it compile 🙂

Logging in Multithreaded Environment Using Thread-Local Storage

Generally, A logger is a singleton class. The declaration may look like:

The Init function is used to set log name or maybe other configuration information. And We can use the Write function to write logs.

Well, in a multithreaded environment, locks must be added to prevent concurrent issues and keep the output log in order. And sometimes we want to have separate log configurations. How can we implement it without breaking the original interfaces?

One easy way is to maintain a list of all available Logger instances, so that we can find and use a unique Logger in each thread. The approach is somehow like the one used in log4j. But log4j reads configuration files to initialize loggers, while our configuration information is set in runtime.

Another big issue is that we must add a new parameter to the GetInstance function to tell our class which Logger to return. The change breaks interfaces.

By utilizing TLS (thread-local storage), we can easily solve the above issues. Every logger will be thread-local, say every thread has its own logger instance which is stored in its thread context. Here comes the declaration for our new Logger class, boost::thread_specific_ptr from boost library is used to simplify our TLS operations:

Simply use boost::thread_specific_ptr to wrap the original 2 static variables, and they will be in TLS automatically, that’s all. The implementation:

Our test code:

Output when using the original Logger may look like:

When using the TLS version, it may look like:

Everything is in order now. You may want to know what OS API boost uses to achieve TLS. I’ll show you the details in boost 1.43:

The underlying API is TlsGetValue under windows and pthread_getspecific under *nix platforms.

Smart Pointers in C++0x and Boost (2)

1. Environment

– windows xp
– gcc-4.4
– boost-1.43

2. auto_ptr

A smart pointer is an abstract data type that simulates a pointer while providing additional features, such as automatic garbage collection or bounds checking. There’s auto_ptr in C++03 library for general use. But it’s not so easy to deal with it. You may encounter pitfalls or limitations. The main drawback of auto_ptr is that it has the transfer-of-ownership semantic. I just walk through it. Please read comments in code carefully:

3. unique_ptr

To resolve the drawbacks, C++0x deprecates usage of auto_ptr, and unique_ptr is the replacement. unique_ptr makes use of a new C++ langauge feature called rvalue reference which is similar to our current (left) reference (&), but spelled (&&). GCC implemented this feature in 4.3, but unique_ptr is only available begin from 4.4.

What is rvalue?

rvalues are temporaries that evaporate at the end of the full-expression in which they live (“at the semicolon”). For example, 1729, x + y, std::string(“meow”), and x++ are all rvalues.

While, lvalues name objects that persist beyond a single expression. For example, obj, *ptr, ptr[index], and ++x are all lvalues.

NOTE: It’s important to remember: lvalueness versus rvalueness is a property of expressions, not of objects.

We may have another whole post to address the rvalue feature. Now, let’s take a look of the basic usage. Please carefully reading the comments:

One can ONLY make a copy of an rvalue unique_ptr. This confirms no ownership issues occur like that of auto_ptr. Since temporary values cannot be referenced after the current expression, it is impossible for two unique_ptr to refer to a same pointer. You may also noticed the move function. We will also discuss it in a later post.

Some more snippet:

unique_ptr can hold pointers to an array. unique_ptr defines deleters to free memory of its internal pointer. There are pre-defined default_deleter using delete and delete[](array) for general deallocation. You can also define your customized ones. In addition, a void type can be used.

NOTE: To compile the code, you must specify the -std=c++0x flag.

4. shared_ptr

A shared_ptr is used to represent shared ownership; that is, when two pieces of code needs access to some data but neither has exclusive ownership (in the sense of being responsible for destroying the object). A shared_ptr is a kind of counted pointer where the object pointed to is deleted when the use count goes to zero.

Following snippet shows the use count changes when using shared_ptr. The use count changes from 0 to 3, then changes back to 0:

Snippets showing pointer type conversion:

The void type can be used directly without a custom deleter, which is required in unique_ptr. Actually, shared_ptr has already save the exact type info in its constructor. Refer to source code for details :). And static_pointer_cast function is used to convert between pointer types.

Unlike auto_ptr, Since shared_ptr can be shared, it can be used in STL containers:

NOTE: shared_ptr is available in both TR1 and Boost library. You can use either of them, for their interfaces are compatible. In addition, there are dual C++0x and TR1 implementation. The TR1 implementation is considered relatively stable, so is unlikely to change unless bug fixes require it.

5. weak_ptr

weak_ptr objects are used for breaking cycles in data structures. See snippet:

If we use uncomment to use shared_ptr, head is not freed since there still one reference to it when exiting the function. By using weak_ptr, this code works fine.

6. scoped_ptr

scoped_ptr template is a simple solution for simple needs. It supplies a basic “resource acquisition is initialization” facility, without shared-ownership or transfer-of-ownership semantics.

This class is only available in Boost. Since unique_ptr is already there in C++0x, this class may be thought as redundant. Snippet is also simple:

Complete and updated code can be found on google code host here. I use conditional compilation to swith usage between TR1 and Boost implementation in code. Hope you find it useful.

Smart Pointers in C++0x and Boost

Let clarify some concepts first. What is C++0x? Wikipedia gives some overview here:

C++0x is intended to replace the existing C++ standard, ISO/IEC 14882, which was published in 1998 and updated in 2003. These predecessors are informally but commonly known as C++98 and C++03. The new standard will include several additions to the core language and will extend the C++ standard library, incorporating most of the C++ Technical Report 1 (TR1) libraries — with the exception of the library of mathematical special functions.

Then why it is called C++0x? As Bjarne Stroustrup addressed here:

The aim is for the ‘x’ in C++0x to become ‘9’: C++09, rather than (say) C++0xA (hexadecimal :-).

You may also noticed TR1, also refer here in Wikipedia:

C++ Technical Report 1 (TR1) is the common name for ISO/IEC TR 19768, C++ Library Extensions, which is a document proposing additions to the C++ standard library. The additions include regular expressions, smart pointers, hash tables, and random number generators. TR1 is not a standard itself, but rather a draft document. However, most of its proposals are likely to become part of the next official standard.

You got the relationship? C++0x is the standard adding features to both language and standard library. A large set of TR1 libraries and some additional libraries. For instance, unique_ptr is not defined in TR1, but is included in C++0x.

As of 12 August 2011, the C++0x specification has been approved by the ISO.

Another notable concept is the Boost library. It can be regarded as a portable, easy-to-use extension to the current C++03 standard library. And some libraries like smart pointers, regular expressions have already been included in TR1. You can find license headers regarding the donation of the boost code in libstdc++ source files. While in TR2, some more boost code are to be involved.

TR1 libraries can be accessed using std::tr1 namespace. More info on Wikipedia here:

Various full and partial implementations of TR1 are currently available using the namespace std::tr1. For C++0x they will be moved to namespace std. However, as TR1 features are brought into the C++0x standard library, they are upgraded where appropriate with C++0x language features that were not available in the initial TR1 version. Also, they may be enhanced with features that were possible under C++03, but were not part of the original TR1 specification.

The committee intends to create a second technical report (called TR2) after the standardization of C++0x is complete. Library proposals which are not ready in time for C++0x will be put into TR2 or further technical reports.

The article seems to be a bit too long so far, I decide to give my snippets in a later post.