Icicle C++ Usage Guide
Overview
This guide covers the usage of ICICLE's C++ API, including device management, memory operations, data transfer, synchronization, and compute APIs.
Device Management
See all ICICLE runtime APIs in runtime.h
Loading a Backend
A backend can be loaded from an explicit install directory, or from the location named by an environment variable with a fallback to the default location. Load the backend before setting a device that it provides.
#include "icicle/runtime.h"
eIcicleError result = icicle_load_backend_from_env_or_default();
// or load from custom install dir
eIcicleError result = icicle_load_backend("/path/to/backend/installdir", true);
Setting and Getting Active Device
You can set the active device for the current thread and retrieve it when needed:
icicle::Device device = {"CUDA", 0}; // or other
eIcicleError result = icicle_set_device(device);
// or query current (thread) device
eIcicleError result = icicle_get_active_device(device);
Setting and Getting the Default Device
You can set the default device for all threads:
icicle::Device device = {"CUDA", 0}; // or other
eIcicleError result = icicle_set_default_device(device);
Set the default device once, from the main thread of the application. If a specific thread needs another device or backend, call icicle_set_device from that thread instead.
Querying Device Information
Retrieve the number of available devices and check whether a pointer is allocated on the host or on the active device:
int device_count;
eIcicleError result = icicle_get_device_count(device_count);
// The memory queries report their answer through the returned error code
bool is_host_memory = (eIcicleError::SUCCESS == icicle_is_host_memory(ptr));
bool is_device_memory = (eIcicleError::SUCCESS == icicle_is_active_device_memory(ptr));
Memory Management
Allocating and Freeing Memory
Memory can be allocated and freed on the active device:
void* ptr;
eIcicleError result = icicle_malloc(&ptr, 1024); // Allocate 1024 bytes
eIcicleError result = icicle_free(ptr); // Free the allocated memory
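Every successful allocation has to be paired with a free, which is easy to miss on early returns. The following is a minimal sketch of an owning wrapper that frees in its destructor. To stay self-contained it uses stand-ins: `demo::Error`, `demo_malloc`, `demo_free` and `DeviceBuffer` are hypothetical names and not ICICLE APIs. The stand-ins only mirror the call shape of `icicle_malloc(&ptr, size)` and `icicle_free(ptr)` shown above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <utility>

namespace demo {

// Stand-ins for this sketch only (NOT ICICLE APIs).
enum class Error { SUCCESS, ALLOCATION_FAILED };
static int live_allocations = 0; // lets the sketch observe leaks

Error demo_malloc(void** ptr, std::size_t size)
{
  *ptr = std::malloc(size);
  if (*ptr == nullptr) return Error::ALLOCATION_FAILED;
  ++live_allocations;
  return Error::SUCCESS;
}

Error demo_free(void* ptr)
{
  std::free(ptr);
  --live_allocations;
  return Error::SUCCESS;
}

// Owns one allocation and frees it when the object goes out of scope.
class DeviceBuffer
{
public:
  explicit DeviceBuffer(std::size_t size_bytes) : size_(size_bytes)
  {
    if (demo_malloc(&ptr_, size_bytes) != Error::SUCCESS) {
      throw std::runtime_error("allocation failed");
    }
  }

  ~DeviceBuffer()
  {
    if (ptr_ != nullptr) demo_free(ptr_);
  }

  // Copying would lead to a double free, so only moves are allowed.
  DeviceBuffer(const DeviceBuffer&) = delete;
  DeviceBuffer& operator=(const DeviceBuffer&) = delete;
  DeviceBuffer(DeviceBuffer&& other) noexcept
      : ptr_(std::exchange(other.ptr_, nullptr)), size_(std::exchange(other.size_, 0))
  {
  }
  DeviceBuffer& operator=(DeviceBuffer&&) = delete;

  void* get() const { return ptr_; }
  std::size_t size() const { return size_; }

private:
  void* ptr_ = nullptr;
  std::size_t size_ = 0;
};

} // namespace demo
```

To adapt the sketch, one would presumably replace the two stand-ins with icicle_malloc and icicle_free and compare against eIcicleError::SUCCESS.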
Asynchronous Memory Operations
You can perform memory allocation and deallocation asynchronously using streams:
icicleStreamHandle stream;
eIcicleError err = icicle_create_stream(&stream);
void* ptr;
err = icicle_malloc_async(&ptr, 1024, stream);
err = icicle_free_async(ptr, stream);
Querying Available Memory
Retrieve the total and available memory on the active device:
size_t total_memory, available_memory;
eIcicleError err = icicle_get_available_memory(total_memory, available_memory);
Setting Memory Values
Set memory to a specific value on the active device, synchronously or asynchronously:
eIcicleError err = icicle_memset(ptr, 0, 1024); // Set 1024 bytes to 0
eIcicleError err = icicle_memset_async(ptr, 0, 1024, stream);
Data Transfer
Copying Data
Data can be copied between host and device, or between devices. The location of the memory is inferred from the pointers:
eIcicleError result = icicle_copy(dst, src, size);
eIcicleError result = icicle_copy_async(dst, src, size, stream);
Explicit Data Transfers
To avoid device-inference overhead, use explicit copy functions:
eIcicleError result = icicle_copy_to_host(host_dst, device_src, size);
eIcicleError result = icicle_copy_to_host_async(host_dst, device_src, size, stream);
eIcicleError result = icicle_copy_to_device(device_dst, host_src, size);
eIcicleError result = icicle_copy_to_device_async(device_dst, host_src, size, stream);
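The size argument of the copy functions is a byte count, as the MSM example in this guide shows with `sizeof(scalar_t) * msm_size`. A small helper can keep that multiplication in one place and reject counts that would overflow. This is a sketch only: `bytes_for` is a hypothetical name, not an ICICLE function.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not part of ICICLE): number of bytes occupied by
// `count` elements of type T. Throws if the byte size does not fit in size_t.
template <typename T>
std::size_t bytes_for(std::size_t count)
{
  if (count > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
    throw std::overflow_error("byte size does not fit in size_t");
  }
  return count * sizeof(T);
}
```

A call such as `icicle_copy_to_device(device_dst, host_src, bytes_for<scalar_t>(n))` would then pass the byte count rather than the element count.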
Stream Management
Creating and Destroying Streams
Streams are used to manage asynchronous operations:
icicleStreamHandle stream;
eIcicleError result = icicle_create_stream(&stream);
eIcicleError result = icicle_destroy_stream(stream);
Synchronization
Synchronizing Streams and Devices
Ensure all previous operations on a stream or device are completed before proceeding:
eIcicleError result = icicle_stream_synchronize(stream);
eIcicleError result = icicle_device_synchronize();
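The reason to synchronize is that an asynchronous call only submits work; its result is not guaranteed to be visible until synchronization returns. The toy model below illustrates that ordering. It is not how ICICLE implements streams, and `ToyStream` is a hypothetical class: it defers all work until `synchronize()`, whereas a real runtime may finish earlier.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy model for illustration only. Submitted operations are deferred and run,
// in submission order, when synchronize() is called.
class ToyStream
{
public:
  // Submit an operation without running it.
  void enqueue(std::function<void()> op) { pending_.push(std::move(op)); }

  // Run every pending operation in the order it was submitted.
  void synchronize()
  {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

  std::size_t pending() const { return pending_.size(); }

private:
  std::queue<std::function<void()>> pending_;
};
```

In the model, reading a host value right after `enqueue` still sees the old contents; only after `synchronize()` is the update observable. The same discipline applies to host buffers written by icicle_copy_to_host_async.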
Device Properties
Checking Device Availability
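Check whether a device is available (runtime.h also exposes a query for the list of registered devices, not shown here):
icicle::Device dev = {"CUDA", 0}; // device to query
eIcicleError result = icicle_is_device_available(dev); // SUCCESS if the device is available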
Querying Device Properties
Retrieve properties of the active device:
DeviceProperties properties;
eIcicleError result = icicle_get_device_properties(properties);
/******************/
// where DeviceProperties is
struct DeviceProperties {
bool using_host_memory; // Indicates if the device uses host memory
int num_memory_regions; // Number of memory regions available on the device
bool supports_pinned_memory; // Indicates if the device supports pinned memory
// Add more properties as needed
};
Compute APIs
Multi-Scalar Multiplication (MSM) Example
Icicle provides high-performance compute APIs such as the Multi-Scalar Multiplication (MSM) for cryptographic operations. Here's a simple example of how to use the MSM API.
#include <iostream>
#include <memory> // std::make_unique
#include "icicle/runtime.h"
#include "icicle/api/bn254.h"
using namespace bn254;
int main()
{
// Load installed backends
icicle_load_backend_from_env_or_default();
// Use CUDA if it is available; otherwise fall back to the CPU (default device)
const bool is_cuda_device_available = (eIcicleError::SUCCESS == icicle_is_device_available("CUDA"));
if (is_cuda_device_available) {
Device device = {"CUDA", 0}; // GPU-0
ICICLE_CHECK(icicle_set_device(device)); // ICICLE_CHECK asserts that the api call returns eIcicleError::SUCCESS
} // else we stay on CPU backend
// Setup inputs
int msm_size = 1024;
auto scalars = std::make_unique<scalar_t[]>(msm_size);
auto points = std::make_unique<affine_t[]>(msm_size);
projective_t result;
// Generate random inputs
scalar_t::rand_host_many(scalars.get(), msm_size);
projective_t::rand_host_many(points.get(), msm_size);
// (optional) copy scalars to device memory explicitly
scalar_t* scalars_d = nullptr;
// ICICLE_CHECK asserts that each call succeeded
ICICLE_CHECK(icicle_malloc((void**)&scalars_d, sizeof(scalar_t) * msm_size));
ICICLE_CHECK(icicle_copy(scalars_d, scalars.get(), sizeof(scalar_t) * msm_size));
// MSM configuration
MSMConfig config = default_msm_config();
// tell icicle that the scalars are on device. Note that EC points and result are on host memory in this example.
config.are_scalars_on_device = true;
// Execute the MSM kernel (on the current device)
eIcicleError result_code = msm(scalars_d, points.get(), msm_size, config, &result);
// OR call bn254_msm(scalars_d, points.get(), msm_size, config, &result);
// Free the device memory
icicle_free(scalars_d);
// Check for errors
if (result_code == eIcicleError::SUCCESS) {
std::cout << "MSM result: " << projective_t::to_affine(result) << std::endl;
} else {
std::cerr << "MSM computation failed with error: " << get_error_string(result_code) << std::endl;
}
return 0;
}
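For intuition about what an MSM computes, the result is the sum over i of scalars[i] * points[i]. The sketch below is a naive reference over a toy group (integers modulo a small number under addition) rather than elliptic-curve points. `ToyGroup` and `naive_msm` are hypothetical names for illustration and say nothing about how the ICICLE backends implement MSM.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy group for illustration: integers modulo `modulus` under addition.
// Assumes a small modulus (below 2^63) and operands already reduced.
struct ToyGroup {
  std::uint64_t modulus;

  std::uint64_t add(std::uint64_t a, std::uint64_t b) const { return (a + b) % modulus; }

  // Scalar multiplication by double-and-add, the pattern also used for curve points.
  std::uint64_t mul(std::uint64_t scalar, std::uint64_t point) const
  {
    std::uint64_t acc = 0; // group identity
    std::uint64_t base = point % modulus;
    while (scalar != 0) {
      if (scalar & 1) acc = add(acc, base);
      base = add(base, base); // "doubling"
      scalar >>= 1;
    }
    return acc;
  }
};

// Reference MSM: accumulates scalars[i] * points[i] one term at a time.
std::uint64_t naive_msm(
  const ToyGroup& group, const std::vector<std::uint64_t>& scalars, const std::vector<std::uint64_t>& points)
{
  std::uint64_t acc = 0;
  for (std::size_t i = 0; i < scalars.size() && i < points.size(); ++i) {
    acc = group.add(acc, group.mul(scalars[i], points[i]));
  }
  return acc;
}
```

With scalars {2, 3, 4} and points {5, 7, 11} modulo 101, the sum is 2*5 + 3*7 + 4*11 = 75. A reference of this kind costs one scalar multiplication per term, which is the work a dedicated MSM routine is meant to speed up.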
Polynomial Operations Example
Here's another example demonstrating polynomial operations using Icicle:
#include <iostream>
#include <memory> // std::make_unique
#include "icicle/runtime.h"
#include "icicle/polynomials/polynomials.h"
#include "icicle/api/bn254.h"
using namespace bn254;
// define bn254Poly to be a polynomial over the scalar field of bn254
using bn254Poly = Polynomial<scalar_t>;
static bn254Poly randomize_polynomial(uint32_t size)
{
auto coeff = std::make_unique<scalar_t[]>(size);
for (uint32_t i = 0; i < size; i++) // same type as size, avoids a signed/unsigned comparison
coeff[i] = scalar_t::rand_host();
return bn254Poly::from_rou_evaluations(coeff.get(), size);
}
int main()
{
// Load backend and set device
icicle_load_backend_from_env_or_default();
// Use CUDA if it is available; otherwise fall back to the CPU (default device)
const bool is_cuda_device_available = (eIcicleError::SUCCESS == icicle_is_device_available("CUDA"));
if (is_cuda_device_available) {
Device device = {"CUDA", 0}; // GPU-0
ICICLE_CHECK(icicle_set_device(device)); // ICICLE_CHECK asserts that the API call returns eIcicleError::SUCCESS
} // else we stay on CPU backend
int poly_size = 1024;
// Some polynomial operations rely on NTT, so an NTT domain must be initialized first
ntt_init_domain(scalar_t::omega(12), default_ntt_init_domain_config());
// randomize polynomials f(x),g(x) over the scalar field of bn254
bn254Poly f = randomize_polynomial(poly_size);
bn254Poly g = randomize_polynomial(poly_size);
// Perform polynomial multiplication
auto result = f * g; // Executes on the current device
ICICLE_LOG_INFO << "Done";
return 0;
}
In this example, f * g runs on the CUDA device if one is available and on the CPU otherwise; the same code works with either backend.
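The example initializes the NTT domain with `scalar_t::omega(12)`. Assuming that the argument is the log2 of the domain size and that the domain has to hold every coefficient of the product, the minimum can be worked out as follows. `min_log_domain_size` is a hypothetical helper, not an ICICLE function.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ICICLE): smallest log2 domain size whose
// power of two holds every coefficient of the product f * g.
// Assumes size_f and size_g are coefficient counts, each at least 1 and far below 2^62.
std::uint32_t min_log_domain_size(std::uint64_t size_f, std::uint64_t size_g)
{
  // A product of polynomials with size_f and size_g coefficients
  // has size_f + size_g - 1 coefficients.
  const std::uint64_t product_size = size_f + size_g - 1;
  std::uint32_t log_size = 0;
  while ((std::uint64_t{1} << log_size) < product_size) {
    ++log_size;
  }
  return log_size;
}
```

For two polynomials of size 1024 the product has 2047 coefficients, so under these assumptions a domain of 2^11 would be the minimum and the 2^12 used in the example leaves headroom.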
Error Handling
Checking for Errors
Icicle APIs return an eIcicleError enumeration value. Always check the returned value to ensure that operations were successful.
if (result != eIcicleError::SUCCESS) {
// Handle error
}
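Repeating this check after every call is easy to forget. The examples in this guide use ICICLE_CHECK, which asserts success. As an alternative, the sketch below turns a failing code into an exception. `DemoError` is a stand-in enum so the sketch compiles on its own, and `throw_on_error` is a hypothetical helper, not part of ICICLE.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for this sketch only; with ICICLE the enum would be eIcicleError.
enum class DemoError { SUCCESS, INVALID_DEVICE };

// Throws if `code` differs from `success`, so a failure cannot be ignored silently.
// `what` names the operation and ends up in the exception message.
template <typename ErrorEnum>
void throw_on_error(ErrorEnum code, ErrorEnum success, const std::string& what)
{
  if (code != success) {
    throw std::runtime_error(what + " failed with error code " + std::to_string(static_cast<int>(code)));
  }
}
```

A call site would then look like `throw_on_error(icicle_set_device(device), eIcicleError::SUCCESS, "icicle_set_device")`, assuming eIcicleError is an enum that can be cast to int.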
This guide provides an overview of the essential APIs available in Icicle for C++. The provided examples should help you get started with integrating Icicle into your high-performance computing projects.