I am a Staff Research Scientist at Visa Research, where I work on Cryptography and Secure Computation. My primary research interest is the development and application of efficient methods for computing on encrypted data, most notably the area of research known as Secure Computation:

A group of distrusting parties wish to compute a function on their private data sets. The parties should learn the output of the function, but no additional information about the data sets themselves, all without relying on a trusted third party.

Below is a collection of my recent blog posts and publications.


Back Porting C++20 Coroutines to C++14 -- A Tutorial

This post gives another perspective on C++20 coroutines by presenting a small library that emulates them using C++14 and a couple of macros to aid readability. For example, the two functions g20() and g14() below both implement the same coroutine functionality. While the C++20 version is clearly a bit simpler, both are functionally equivalent and sufficiently readable. Next, we will unpack what the ‘‘coroutine transformation’’ is and how it can [partially] be implemented without specialized compiler support.

my_future<int>  h();
 
// C++20 version
my_future<int> g20() {
    int i = co_await h();
    if (i == 42)
        i = 0;
    co_return i;
}
 
// C++14 version
my_future<int> g14() {
    // Specify the return type and lambda capture and local variables.
    CO_BEGIN(my_future<int>, i = int{});
 
    // assign the result of h() to i
    CO_AWAIT(i, h());
    if (i == 42)
        i = 0;
    CO_RETURN(i);
 
    CO_END();
}

Due to certain limitations of macros, all local variables whose lifetime spans a CO_XXXX macro must be “lambda captured” in the CO_BEGIN macro. We will discuss this more below, but CO_BEGIN must be passed the return type followed by the body of a lambda capture, as if the rest of the code lived in a lambda that cannot have local variables. For example, CO_BEGIN(my_future<int>, i = int{}); loosely gets translated into

[i = int{}] ()->my_future<int> {
    //<body-of-the-coroutine>
}

This likely gives the reader a hint at how we will port C++20 coroutines back to C++14.

Coroutine Usage

First, let us review some basic coroutine functionality. If the reader is familiar with coroutines, feel free to skip to the next section. A coroutine can be thought of as a stateful function that can “pause” its execution and then be resumed later by the caller or someone else.

co_await

There are a couple of situations where a coroutine might be paused, or “suspended” in coroutine terminology. The most common is when the coroutine executes a co_await on some awaitable (e.g. some other coroutine) and the result is not ready yet. For example, in the functions above we call co_await h(). The function h() returns a my_future<int> which implements the interface of an “awaitable” with value type int. This interface allows the compiler to auto-generate code that determines whether the underlying value is ready now or whether the current coroutine should suspend while we wait for h() to complete.

To make the example a bit more concrete, h() could be fetching a value from some server on the internet which is slow. While we wait, we want to pause this coroutine and allow the caller to go do something else. Once h() finishes, our coroutine can be resumed either by h(), by some other thread, or we can be notified in some way and manually resume it ourselves. How, when and by whom the coroutine is resumed can be customized by our return type my_future<int> and by what we are co_awaiting on.

If the underlying int that is provided by h() is ready immediately, our coroutine does not need to suspend. Whenever the underlying value is ready, either immediately after the call to h() or when the coroutine is resumed, execution of the coroutine continues. The next thing to be executed is compiler generated code that passes the value held in the my_future<int> to the current coroutine, i.e. assigns it to int i in our examples. More details on this compiler generated code later. Finally, execution of the coroutine continues with the next statement after the co_await.
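
For reference, below is a minimal sketch of the awaiter interface that this compiler-generated code talks to. The type ready_int_awaiter is hypothetical; only the member names await_ready, await_suspend and await_resume are dictated by C++20, and my_future<int> would expose something similar (either directly or via operator co_await).

#include <coroutine>
 
// Hypothetical awaiter for a value that happens to be available immediately.
// The three members below are the standard C++20 awaiter interface that the
// compiler-generated co_await code calls.
struct ready_int_awaiter {
    int value;
 
    // The value is ready now, so the coroutine never suspends on it.
    bool await_ready() const noexcept { return true; }
 
    // Only runs when await_ready() returned false; a real implementation
    // would store the handle and resume it once the value arrives.
    void await_suspend(std::coroutine_handle<>) noexcept {}
 
    // Produces the result of the co_await expression.
    int await_resume() const noexcept { return value; }
};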

co_yield

Another example of when a coroutine can be suspended is with the co_yield keyword. The exact semantics of this are customizable, but typically it is used to suspend the current coroutine and pass a value back to the caller. This is similar to returning a value, except that we are allowed to resume the coroutine later. Typically, the caller will use this value and then resume the coroutine, which can co_yield another value back to the caller or run to completion. A classic example of this is lazily generating a range of values. For example, range20() and range14() below both return a my_range<int> which allows the caller to iterate over the values that are co_yield-ed out of the coroutine. This example prints the string 2,3,4,5,6,7,8,9,.

my_range<int> range20(int begin, int end) {
    for (int i = begin; i < end; ++i)
        co_yield i;
}
my_range<int> range14(int begin, int end) {
    CO_BEGIN(my_range<int>, begin, end, i = int{});
    for (i = begin; i < end; ++i)
        CO_YIELD(i);
    CO_END();
}
 
void example() {
    for (int i : range14(2, 10))
        std::cout << i << ',';
}

Importantly, the return type of the coroutine, in this case my_range<int>, must implement a certain interface in order for the coroutine body to interact with it. In brief, this interface consists of the return type having an associated promise_type which implements certain functions. For co_yield <expr>; statements, the <expr> value is forwarded to promise_type::yield_value(<expr>) which, in our example, somehow forwards it to the return value my_range<int> so that the caller can consume it. Similar “customization points” are required for determining how the other situations are handled.
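
As a rough sketch of that interface: the type my_range_promise below is hypothetical and heavily simplified (the post does not show my_range's internals), but the member names and std::suspend_always are the standard C++20 pieces.

#include <coroutine>
 
// Hypothetical sketch of the promise_type behind my_range<int>. Only the
// member names are dictated by C++20; how the yielded value reaches the
// caller is up to my_range's implementation.
struct my_range_promise {
    int current = 0;                    // last value passed to co_yield
 
    auto get_return_object();           // would construct the my_range<int>
    std::suspend_always initial_suspend() noexcept { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
 
    // co_yield i;   is rewritten to   co_await promise.yield_value(i);
    std::suspend_always yield_value(int i) noexcept {
        current = i;                    // stash the value for the caller
        return {};                      // and suspend back to the caller
    }
 
    void return_void() noexcept {}
    void unhandled_exception() {}
};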

co_return

To return a value from a coroutine we must use co_return instead of a normal return. This special return keyword helps the compiler know that the current function is a coroutine as opposed to a normal function. In particular, from the function signature my_future<int> g20() there is no way to tell if the body of the function is a coroutine or a normal function that just happens to return a my_future<int>.

More generally, whenever the compiler sees at least one co_await, co_yield or co_return keyword, it will convert the function body into a coroutine by performing the “coroutine transformation.” Importantly, this transformation is just an implementation detail. The caller does not know or care if my_future<int> g20() is a coroutine or implements the required logic in some other way.

Next, we will discuss precisely that: how to implement the coroutine transformation manually, and then introduce the macros you saw above to aid readability.

The Coroutine Transformation

At the core of the coroutine specification is a relatively mechanical transformation. Every coroutine function is effectively transformed into two functions. The first (outer function) performs the following:

  • Constructs the so-called promise type <function-return-type>::promise_type promise. The role of the promise type is to decide when to start and/or resume the coroutine and what to do when it yields a value or returns.
  • Constructs the return type <function-return-type>, which in our first example is my_future<int>. This is obtained by calling promise.get_return_object() and will be what is returned to the caller.
  • Queries the promise_type to decide if the coroutine body should be started now (before returning to the caller) or later (possibly by the caller). This is determined by the result of promise.initial_suspend(). If we should start now, then we call the second function which contains the user code.
  • Finally, returns the result of promise.get_return_object() to the caller.

The second function (inner function) of the transformation will contain the actual body of the coroutine that the user wrote. Each time co_await, co_yield or co_return is seen, code is inserted to perform the required logic and allow the coroutine to be suspended/resumed at this location. These are called suspension points.

Below is some C++ inspired pseudocode that loosely implements the required logic, which we explain next. The function g_inline() below is intended to have the same behavior as g20() and g14() above.

my_future<int> g_inline()
{
    // Construct the coroutine frame and promise
    struct coroutine_frame {
        my_future<int>::promise_type promise;
        std::function<void(void)> inner_fn;
        void* suspend_address = &InitialSuspend;
    };
    auto frame = new coroutine_frame;
 
    // define the inner function
    frame->inner_fn = [frame]() {
        // jump to wherever we left off.
        goto frame->suspend_address;
InitialSuspend:
        // <---- beginning of user code. ---->
 
        // int i = co_await h();
        auto awaiter_h = h();
        if (!awaiter_h.await_ready()) {
            frame->suspend_address = &Suspend_1;
            awaiter_h.await_suspend(frame);
            return;
        }
Suspend_1:
        int i = awaiter_h.await_resume();
 
        // normal code
        if (i == 42)
            i = 0;
 
        // co_return i;
        frame->suspend_address = &FinalSuspend;
        frame->promise.return_value(i);
 
        // <---- end of user code. ---->
        // final suspend
        if (!frame->promise.final_suspend().await_ready()) {
            frame->suspend_address = &FinalSuspend;
            return;
        }
FinalSuspend:
        delete frame;
    };
 
    my_future<int> ret = frame->promise.get_return_object();
 
    // call the inner function if we are ready.
    if (frame->promise.initial_suspend().await_ready())
        frame->inner_fn();
 
    return ret;
}

First, the so-called coroutine_frame is allocated. The frame contains the promise_type and space for the local variables used by the inner function, which for clarity is stored as a std::function<void(void)>.

This inner function can be called many times with the desired logic being that the function should resume from wherever it left off. The code above achieves this using a hypothetical syntax for storing the addresses of goto labels. In particular, each time we enter the inner function we will perform goto suspend_address where suspend_address is the location in the function that is just after where we suspended last.
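
As an aside, the “hypothetical syntax” above is quite close to a real, non-standard extension: GCC and Clang support “labels as values”, where &&label yields the address of a label and goto *ptr jumps to it. The sketch below is not portable C++ and is shown only to make the pseudocode concrete; the switch-based rewrite discussed later achieves the same effect in standard C++.

// Non-portable sketch using the GCC/Clang "labels as values" extension.
void resume_demo(void*& suspend_address) {
    if (suspend_address)
        goto *suspend_address;          // jump back to where we left off
 
    // ... run some user code ...
    suspend_address = &&Suspend_1;      // remember the resume point
    return;                             // suspend
 
Suspend_1:
    // ... execution continues here on the next call ...
    suspend_address = nullptr;
}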

The first time we enter the function, suspend_address will have its initial value InitialSuspend and we begin executing the user code. At some point execution will hit a co_* statement, in this case int i = co_await h(). This statement is transformed as shown above to first query whether the value returned by auto awaiter_h = h() is ready to be consumed. This is achieved by calling awaiter_h.await_ready().

If awaiter_h.await_ready() returns false then we need to suspend the coroutine. This is achieved by updating the suspend_address to be just after the current co_await statement and calling awaiter_h.await_suspend(frame), which gives awaiter_h the responsibility to resume this coroutine whenever the value does become available. Finally, we return from the inner function, in this case back to the outer function and then to the initial caller. Through some other mechanism, awaiter_h will eventually have a value (or exception) and the inner function will be resumed. This time the goto suspend_address statement will jump us back to just after the return that was taken last time, i.e. Suspend_1.

Alternatively, awaiter_h.await_ready() could have returned true, and the program would similarly arrive at the Suspend_1 location. Regardless, the result of the co_await h() statement is ready and can be obtained by calling awaiter_h.await_resume().
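
To make this hand-off concrete, below is a hypothetical sketch of an awaiter that remembers the frame it is given in await_suspend and later resumes the coroutine by re-entering the inner function. The types and names follow the pseudocode above, not any standard API.

#include <functional>
 
// Follows the pseudocode: a minimal frame with just the part the awaiter needs.
struct coroutine_frame {
    std::function<void()> inner_fn;       // re-entering this resumes the coroutine
};
 
struct my_future_awaiter {
    coroutine_frame* waiting = nullptr;   // the frame suspended on us, if any
    int value = 0;
    bool ready = false;
 
    bool await_ready() const { return ready; }
    void await_suspend(coroutine_frame* frame) { waiting = frame; }
    int  await_resume() const { return value; }
 
    // Called by whatever eventually produces the value (another thread, an
    // I/O callback, ...); resuming is simply calling the inner function again.
    void set_value(int v) {
        value = v;
        ready = true;
        if (waiting)
            waiting->inner_fn();
    }
};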

The co_return i statement is handled by calling promise.return_value(i); and then immediately proceeding to the so-called final_suspend. The role of final_suspend is to determine whether the current coroutine_frame, which contains the user’s promise_type, is safe to be deleted. Often it is preferred to not delete the frame at this time and instead allow the caller/consumer to delete the frame once they have obtained the result, i.e. i in this case.

Issues and Solutions

The code given above should give you an intuition about what the compiler is doing. However, there are several issues with emulating it exactly as described above.

goto suspend_point

The first issue is that standard C++ does not allow you to take the address of a goto label. To address this, the goto statements can be replaced with a switch statement. The function body would then look like

{
    switch (frame->suspend_point)
    {
    case SuspensionPoint::InitialSuspend:
        // int i = co_await h();
        auto awaiter_h = h();
        if (!awaiter_h.await_ready()) {
            frame->suspend_point = (SuspensionPoint)1;
            awaiter_h.await_suspend(frame);
            return;
        }
    case (SuspensionPoint)1:
        int i = awaiter_h.await_resume();
 
        //...
 
    case SuspensionPoint::FinalSuspend:
        delete frame;
    }
}

Using a switch statement is almost functionally equivalent. The main downside is that the user cannot use their own switch statements across co_* keywords (looking forward, CO_* macros). Interestingly, switch statements allow case labels that are in a nested scope, e.g. we can suspend in the middle of a for-loop and the switch statement can jump back into it upon resume.
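
To illustrate why that matters, here is a stripped-down, hypothetical hand-written resumable function that suspends in the middle of a loop; the case label inside the for-loop lets the switch jump straight back into the loop body on resume, which is essentially what the CO_YIELD expansion does.

// Hypothetical resumable counter: yields one value per call and resumes
// inside the for-loop via a nested case label.
struct counter_state {
    int suspend_point = 0;
    int i = 0;
    int current = 0;
    bool done = false;
};
 
void counter_resume(counter_state& s, int begin, int end) {
    switch (s.suspend_point) {
    case 0:
        for (s.i = begin; s.i < end; ++s.i) {
            s.current = s.i;            // the value we "co_yield"
            s.suspend_point = 1;
            return;                     // suspend back to the caller
    case 1:;                            // resume point, inside the loop body
        }
        s.done = true;                  // loop finished, nothing more to yield
    }
}

Calling counter_resume(s, 2, 10) repeatedly on the same counter_state produces 2, 3, ..., 9 and then sets done.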

Local variables

The second issue regards saving the state of local variables between suspend and resume calls. For example, if the user constructs some local variable before co_await h() and the coroutine suspends on h(), then when the inner function is re-entered that local variable would be uninitialized.

The compiler handles this in the transformation by moving the declarations of such local variables into the coroutine_frame instead of the body of the inner function. They are then destroyed when the coroutine completes.

While one can manually perform the transformation, we will instead consider an alternative solution. In particular, we will place/require all local variables to be declared in the lambda capture. While this results in slightly different semantics, it will enable us to write the CO_BEGIN macro.

A similar issue regards the lifetime of the awaiter when the coroutine is suspended, i.e. awaiter_h in the example above. This can be resolved by storing the awaiter either in a fixed sized buffer in coroutine_frame or allocating it on the heap if it is too large to fit in the buffer.
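
A hypothetical sketch of that storage strategy is shown below: construct the awaiter in a small inline buffer when it fits, otherwise fall back to the heap. None of the names here are a standard or macoro API; they simply illustrate the idea.

#include <cstddef>
#include <new>
#include <utility>
 
// Hypothetical awaiter storage used by the frame: a small inline buffer with
// a heap fallback for awaiters that are too large.
template<std::size_t BufferSize = 64>
struct awaiter_storage {
    alignas(std::max_align_t) char buffer[BufferSize];
    void* heap = nullptr;               // non-null when the awaiter did not fit
 
    template<typename Awaiter, typename Expr>
    Awaiter& construct(Expr&& expr) {
        if (sizeof(Awaiter) <= BufferSize)
            return *new (buffer) Awaiter{ std::forward<Expr>(expr) };
 
        auto ptr = new Awaiter{ std::forward<Expr>(expr) };
        heap = ptr;
        return *ptr;
    }
 
    template<typename Awaiter>
    void destroy() {
        if (heap) { delete static_cast<Awaiter*>(heap); heap = nullptr; }
        else       reinterpret_cast<Awaiter*>(buffer)->~Awaiter();
    }
};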

std::coroutine_handle<promise_type>

The final issue regards compatibility with C++20 coroutines. Ideally we can choose to implement our coroutines manually using the pattern outlined here or use the C++20 keywords. However, the primary barrier in achieving this is that C++20 defines <awaiter>.await_suspend as taking a std::coroutine_handle<promise_type> as input while our example takes <function-name>::coroutine_frame*.

Intuitively, these two parameters represent the same thing. Both are a pointer to the coroutine frame and provide a way for the awaiter to resume this coroutine once the thing being awaited becomes ready. However, the way we resume a <function-name>::coroutine_frame* vs a C++20 coroutine is concretely different. Moreover, there is no way to convert our <function-name>::coroutine_frame* into a std::coroutine_handle<promise_type> since we can not modify that type.

This issue could be resolved by giving promise_type another customization point which std::coroutine_handle<promise_type> interacts with and then dispatches to the correct method for resuming the coroutine. However, this is not part of the standard.

Instead, we will define our own macoro::coroutine_handle<promise_type> which can hold either a std::coroutine_handle<promise_type> or a <function-name>::coroutine_frame*. As such, our coroutines will only work with a promise_type that can take a macoro::coroutine_handle<*> as input to <awaiter>.await_suspend(…).

This is somewhat unfortunate because third-party awaitable types which are intended for C++20 will most likely not work out of the box. However, this is probably unavoidable given that they will likely take other dependencies on C++20.
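
A hypothetical sketch of such a handle: it records whether it wraps a std::coroutine_handle<> or one of our frames, and resume() dispatches accordingly. The real macoro type is more involved (and in a pure C++14 build the std:: branch would be compiled out); this only illustrates the shape.

#include <coroutine>
 
// Minimal base class for our emulated frames; resuming means re-entering them.
struct frame_base {
    virtual void resume() = 0;
    virtual ~frame_base() = default;
};
 
// Hypothetical handle that can refer to either kind of coroutine.
class my_coroutine_handle {
    std::coroutine_handle<> std_handle{};  // set when wrapping a C++20 coroutine
    frame_base* frame = nullptr;           // set when wrapping one of our frames
 
public:
    my_coroutine_handle(std::coroutine_handle<> h) : std_handle(h) {}
    my_coroutine_handle(frame_base* f) : frame(f) {}
 
    void resume() {
        if (frame) frame->resume();        // our emulated coroutine
        else       std_handle.resume();    // a real C++20 coroutine
    }
};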

Allocations

The current strategy requires several allocations. First, we must allocate the coroutine_frame. C++20 defines a way to allow the user to control where this allocation happens. A description of it can be found in Lewis Baker’s excellent blog posts. A similar strategy can be applied to our framework with some modifications.
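
For reference, the C++20 hook is that the promise type can provide its own operator new/delete, which the compiler then uses for the frame. A minimal sketch follows; pool_allocate and pool_free are hypothetical stand-ins for a real pool allocator.

#include <cstddef>
#include <cstdlib>
 
// Hypothetical pool allocator stand-ins; a real one would use a memory pool.
inline void* pool_allocate(std::size_t size)   { return std::malloc(size); }
inline void  pool_free(void* ptr, std::size_t) { std::free(ptr); }
 
// In C++20, if the promise type provides operator new/delete, the compiler
// uses them to allocate and free the coroutine frame.
struct my_promise {
    static void* operator new(std::size_t size) { return pool_allocate(size); }
    static void  operator delete(void* ptr, std::size_t size) { pool_free(ptr, size); }
    // ... the usual promise members (get_return_object, initial_suspend, ...) ...
};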

Another allocation likely occurs for the std::function<void(void)>. However, this is not required. We can instead define the function

template<typename promise_type, typename lambda_type>
coroutine_frame<lambda_type, promise_type>* makeFrame(lambda_type&& inner_fn);

that returns a frame that stores the inner function with type lambda_type inside the coroutine_frame. Then the inner function will take a pointer to the frame as its only parameter.
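
A hypothetical, simplified sketch of that frame layout and the makeFrame helper is below; the names mirror the macro code later in the post, but this is only the shape, not the full macoro implementation.

#include <utility>
 
// FrameBase holds everything the macro body accesses through the frame
// pointer; coroutine_frame adds the inline-stored lambda.
template<typename promise_type>
struct FrameBase {
    promise_type promise;
    // suspend-point bookkeeping and awaiter storage elided ...
};
 
template<typename lambda_type, typename promise_type>
struct coroutine_frame : FrameBase<promise_type> {
    lambda_type inner_fn;                      // stored inline, no std::function
 
    coroutine_frame(lambda_type fn) : inner_fn(std::move(fn)) {}
 
    // resume the coroutine by re-entering the inner function
    void operator()(FrameBase<promise_type>* self) { inner_fn(self); }
};
 
// promise_type is given explicitly (as in CO_BEGIN); lambda_type is deduced.
template<typename promise_type, typename lambda_type>
coroutine_frame<lambda_type, promise_type>* makeFrame(lambda_type inner_fn) {
    return new coroutine_frame<lambda_type, promise_type>(std::move(inner_fn));
}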

The final allocation is with respect to the storage of any awaiters that are produced during an active co_await. As suggested previously, these can be partially handled by having an additional buffer within the coroutine_frame. However, it is still possible that the coroutine body awaits an awaiter that does not fit in the buffer, in which case an allocation would occur. Alternatively, this situation can be turned into a compile-time error, and the buffer size can be made a template parameter, i.e. an input to CO_BEGIN.
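
One hedged way to get that compile-time error, as a variation on the storage sketch above (again, not the macoro API): make the inline buffer size a template parameter and static_assert that every awaiter fits.

#include <cstddef>
#include <new>
#include <utility>
 
// Hypothetical strict storage: reject awaiters that do not fit, at compile time.
template<std::size_t BufferSize>
struct strict_awaiter_storage {
    alignas(std::max_align_t) char buffer[BufferSize];
 
    template<typename Awaiter, typename Expr>
    Awaiter& construct(Expr&& expr) {
        static_assert(sizeof(Awaiter) <= BufferSize,
            "awaiter does not fit in the coroutine's inline buffer; "
            "increase the buffer size given to CO_BEGIN");
        return *new (buffer) Awaiter{ std::forward<Expr>(expr) };
    }
 
    template<typename Awaiter>
    void destroy() { reinterpret_cast<Awaiter*>(buffer)->~Awaiter(); }
};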

Combining these, we could obtain a coroutine that is guaranteed not to allocate any memory, something that C++20 coroutines can not currently guarantee 100 percent of the time.

Macros for Readability

Using the pattern above we can define the macros shown in the beginning. A prototype of these ideas can be found in the macoro repo. Below is a simplified definition of the main macros. In summary, CO_BEGIN begins the call to makeFrame that allocates the frame with the inner function being stored inline. The closing brace of the lambda and makeFrame function is in CO_END. The first time we enter the inner function we finish the initial suspend (which was started in the outer function in CO_END) by calling await_resume and destroying the awaiter.

CO_AWAIT first computes the type of the awaiter and then constructs it within the frame (or on the heap). It then performs the await_ready/await_suspend/await_resume procedure. If it does suspend, it updates the suspend point with the line number.

CO_END performs the catch block, passing any exception to the promise and then performs the final suspend. This completes the definition of the inner function. CO_END then completes the outer function by constructing the return object and starting the initial suspend. If ready we start the coroutine by calling the inner function and then return the return type.

CO_BEGIN(return_type, …)

using promise_type = return_type::promise_type;
using Handle = coroutine_handle<promise_type>;
// make frame with inline body.
auto frame = makeFrame<promise_type>(
    [__VA_ARGS__](FrameBase<promise_type>* frame) mutable {
 
        promise_type& promise = frame->promise;
        try {
            // jump to where we suspended
            switch (frame->getSuspendPoint())
            {
            case SuspensionPoint::InitialSuspend:
            { // ------- initial suspend continued -------
                using Awaiter = typename awaiter_for<promise_type,
                    decltype(promise.initial_suspend())>;
                frame->getAwaiter<Awaiter>().await_resume();
                frame->destroyAwaiter<Awaiter>();
            } // -------- end of initial suspend ---------
            // ---------- beginning of user code -----------

CO_AWAIT(RETURN_VAL, EXPR)

{
    using Awaiter = awaiter_for<promise_type, decltype(EXPR)>;
    {
        auto& awaiter = frame->constructAwaiter(EXPR);
        if (!awaiter.await_ready()) {
            // suspend-coroutine
            frame->setSuspendPoint((SuspensionPoint)__LINE__);
 
            // call awaiter.await_suspend(). If it's void return, then return true.
            if (await_suspend(awaiter, Handle::from_promise(promise)))
                return;
        }
    }
    // resume-point, __LINE__ will be the same in macro.
case SuspensionPoint(__LINE__):
    RETURN_VAL = frame->getAwaiter<Awaiter>().await_resume();
    frame->destroyAwaiter<Awaiter>();
}

CO_END()

            // ---------- end of user code -----------
            default: break;
            }
        }
        catch (...) {
            frame->promise.unhandled_exception();
        }
 
        // ---------- beginning of final suspend -----------
        using Awaiter = typename awaiter_for<promise_type, decltype(promise.final_suspend())>;
        auto& awaiter = frame->constructAwaiter(promise.final_suspend());
        auto handle = Handle::from_promise(promise);
        if (!awaiter.await_ready()) {
            // suspend-coroutine
            frame->setSuspendPoint(SuspensionPoint::FinalSuspend);
            // call awaiter.await_suspend(). If it's void return, then return true.
            if (await_suspend(awaiter, handle))
                return;
        }
        frame->getAwaiter<Awaiter>().await_resume();
        frame->destroyAwaiter<Awaiter>();
        handle.destroy();
        // ---------- end of final suspend -----------
    });
 
promise_type& promise = frame->promise;
auto ret = promise.get_return_object();
 
// ------- beginning of initial suspend ----------
using Awaiter = awaiter_for<promise_type, decltype(promise.initial_suspend())>;
auto& awaiter = frame->constructAwaiter(promise.initial_suspend());
auto handle = Handle::from_promise(promise);
if (!awaiter.await_ready())
{
    // suspend-coroutine
    frame->setSuspendPoint(SuspensionPoint::InitialSuspend);
    //call awaiter.await_suspend(). If it's void return, then return true
    if (await_suspend(awaiter, handle))
        return ret;
}
 
/*begin coroutine*/
(*frame)(frame);
return ret;

result< T, E >: Seamless error_code's with C++ Coroutines

The Idea

The other day I was reviewing some code when I saw the use of a macro to simplify handling an error_code-like type. I’m sure most of us have seen code like this

// Either return the error or assign 
// the value type of rexpr to lhs.
#define ASSIGN_OR_RETURN(lhs, rexpr) ...

result<int, error_code> foo() { ... }

result<bool, error_code> bar() {
  ASSIGN_OR_RETURN(int i, foo());
  return i == 3;
}

In this example, result<T,E> is a discriminated union / sum type: it will either hold the expected type T or, in the case of an error, an E. The macro ASSIGN_OR_RETURN returns the error if the error type is active and otherwise assigns the value type to i.
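
For readers who have not seen one, here is a rough, hypothetical sketch of how such a macro is often written. The member names has_error(), error() and value() are assumptions about this particular result<T,E>, and real implementations also uniquify the temporary (e.g. with __LINE__) so the macro can be used more than once per scope.

#include <utility>
 
// Hypothetical sketch: evaluate the expression into a temporary, return the
// error if one is present, otherwise declare/assign lhs from the value.
#define ASSIGN_OR_RETURN(lhs, rexpr)          \
    auto _assign_or_return_tmp = (rexpr);     \
    if (_assign_or_return_tmp.has_error())    \
        return _assign_or_return_tmp.error(); \
    lhs = std::move(_assign_or_return_tmp).value();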

I think it’s pretty obvious why the use of the macro is less than desirable.

More …

Fast Database Joins for Secret Shared Data

Payman Mohassel, Peter Rindal & Mike Rosulek ~ eprint/2019/518

We present a scalable database join protocol for secret shared data in the honest majority three party setting. The key features of our protocol are a rich set of SQL-like join/select queries and the ability to compose join operations together due to the inputs and outputs being generically secret shared between the parties. Given that the keys being joined on are unique, no information is revealed to any party during the protocol. In particular, not even the sizes of intermediate joins are revealed. All of our protocols are constant-round and achieve O(n) communication and computation overhead for joining two tables of n rows.

In addition to the database join protocol itself, we implement two applications on top of our framework. The first performs joins between different governmental agencies to identify voter registration errors in a privacy-preserving manner. The second application considers the scenario where several organizations wish to compare network security logs to more accurately identify common security threats, e.g. the IP addresses of a botnet. In both cases, the practicality of these applications depends on efficiently performing joins on millions of secret shared records. For example, our three party protocol can perform a join on two sets of 1 million records in 4.9 seconds or, alternatively, compute the cardinality of this join in just 3.1 seconds.

Efficient Two-Round OT Extension and Silent Non-Interactive Secure Computation

Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, Peter Rindal, Peter Scholl ~ eprint/2019/1159 ~ CCS’19

We consider the problem of securely generating useful instances of two-party correlations, such as many independent copies of a random oblivious transfer (OT) correlation, using a small amount of communication. This problem is motivated by the goal of secure computation with silent preprocessing, where a low-communication input-independent setup, followed by local (“silent”) computation, enables a lightweight “non-cryptographic” online phase once the inputs are known.

Recent works of Boyle et al. (CCS 2018, Crypto 2019) achieve this goal with good concrete efficiency for useful kinds of two-party correlations, including OT correlations, under different variants of the Learning Parity with Noise (LPN) assumption, and using a small number of “base” oblivious transfers. The protocols of Boyle et al. have several limitations. First, they require a large number of communication rounds. Second, they are only secure against semi-honest parties. Finally, their concrete efficiency estimates are not backed by an actual implementation. In this work we address these limitations, making three main contributions:

  • Eliminating interaction. Under the same assumption, we obtain the first concretely efficient 2-round protocols for generating useful correlations, including OT correlations, in the semi-honest security model. This implies the first efficient 2-round OT extension protocol of any kind and, more generally, protocols for non-interactive secure computation (NISC) that are concretely efficient and have the silent preprocessing feature.

  • Malicious security. We provide security against malicious parties (in the random oracle model) without additional interaction and with only a modest concrete overhead; prior to our work, no similar protocols were known with any number of rounds.

  • Implementation. Finally, we implemented, optimized, and benchmarked our 2-round OT extension protocol, demonstrating that it offers a more attractive alternative to the OT extension protocol of Ishai et al. (Crypto 2003) in many realistic settings.

Endemic Oblivious Transfer

Daniel Masny & Peter Rindal ~ eprint/2019/706 ~ CCS’19

Oblivious Transfer has played a crucial role in the design of secure multi party computation. Nevertheless, there are not many practical solutions that achieve simulation based security and are at the same time instantiable from different assumptions.

In this work, we consider a simulation based security notion that we call endemic security. We show how to construct highly efficient oblivious transfer in the random oracle model that achieves endemic security under a wide range of assumptions, among them DDH, CDH, LWE and coding based assumptions. We construct a secure oblivious transfer based on DDH that takes only a single communication round which allows significant performance gains. We also instantiate our oblivious transfer with the Crystals.Kyber key agreement. Our implementation shows that both instantiations can be computed in under one millisecond.

Further, we revisit, correct and improve existing oblivious transfer extension techniques. We provide an implementation of an oblivious transfer extension protocol in the ideal cipher model that is actively secure, processing up to 23 million OTs per second, up to 10 times faster than previous secure implementations. We also show that our framework can compute endemically secure OT extension and the base OTs in just two rounds.