BlackBox2C — Quickstart¶
BlackBox2C converts any trained scikit-learn model into a minimal if-else function ready to run on a microcontroller — with zero runtime dependencies.
This notebook gets you from a trained model to deployable embedded code in under 5 minutes.
Installation¶
pip install blackbox2c
On Google Colab, run the cell below:
In [1]:
# Uncomment on Colab
# !pip install blackbox2c -q
1. Train any scikit-learn model¶
In [2]:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"Model accuracy: {model.score(X_test, y_test):.4f}")
Model accuracy: 0.8889
2. Convert to C with one call¶
In [3]:
from blackbox2c import convert
c_code = convert(
model,
X_train,
X_test=X_test,
feature_names=["sepal_length", "sepal_width", "petal_length", "petal_width"],
class_names=["setosa", "versicolor", "virginica"],
)
print(c_code)
Starting conversion for model: RandomForestClassifier
Task: Classification, Features: 4, Classes: 3, Max depth: 5
[1/4] Extracting surrogate decision tree...
Surrogate fidelity: 0.9778
[2/4] Optimizing decision rules...
Nodes: 47, Leaves: 29, Depth: 5
[3/4] Generating C code...
[4/4] Estimating code size...
Estimated FLASH: 382 bytes, RAM: 32 bytes
[OK] Conversion complete!
/*
* Auto-generated C code by BlackBox2C
*
* Model Information:
* - Input features: 4
* - Output classes: 3
* - Precision: 8-bit
* - Fixed-point: No
*
* This code is optimized for embedded systems with limited resources.
*/
#include <stdint.h>
/* Class labels */
#define SETOSA 0
#define VERSICOLOR 1
#define VIRGINICA 2
/* Prediction function */
uint8_t predict(float features[4]) {
if (features[2] <= 2.485839f) {
if (features[3] <= 0.717061f) {
return 0;
} else {
if (features[1] <= 3.143289f) {
if (features[0] <= 5.550231f) {
return 0;
} else {
if (features[3] <= 1.630129f) {
return 1;
} else {
return 2;
}
}
} else {
if (features[2] <= 2.432759f) {
return 0;
} else {
if (features[3] <= 1.602186f) {
return 1;
} else {
return 0;
}
}
}
}
} else {
if (features[3] <= 1.687708f) {
if (features[3] <= 0.698930f) {
if (features[2] <= 4.947937f) {
if (features[1] <= 3.690938f) {
return 1;
} else {
return 0;
}
} else {
return 0;
}
} else {
if (features[2] <= 4.950762f) {
return 1;
} else {
if (features[0] <= 6.299237f) {
return 1;
} else {
return 2;
}
}
}
} else {
if (features[2] <= 4.844587f) {
if (features[0] <= 6.045236f) {
if (features[2] <= 4.814570f) {
return 1;
} else {
return 2;
}
} else {
return 2;
}
} else {
if (features[3] <= 1.699546f) {
if (features[0] <= 5.970683f) {
return 1;
} else {
return 2;
}
} else {
return 2;
}
}
}
}
}
/*
* Usage Example:
*
* float input[4] = {...}; // Your feature values
* uint8_t result = predict(input);
*
* Input features: sepal_length, sepal_width, petal_length, petal_width
* Output classes: setosa, versicolor, virginica
*/
That's it! The generated function has this signature:
uint8_t predict(float features[4]);
// returns: 0=setosa, 1=versicolor, 2=virginica
3. Understanding the output¶
The generated C code is:
- Self-contained — no external libraries; the only header it needs is the standard <stdint.h>
- Zero-allocation — no heap usage, safe for bare-metal systems
- Standard C99 — compiles on any toolchain (GCC, Clang, MSVC, AVR-GCC, arm-none-eabi-gcc)
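Because the generated code is just nested comparisons, the same logic can be exercised from Python as a quick sanity check before flashing. The sketch below transcribes the thresholds from the generated C above; the three sample inputs are typical iris measurements chosen here for illustration:

```python
def predict(f):
    """Python transcription of the generated C decision tree.
    f = [sepal_length, sepal_width, petal_length, petal_width]"""
    if f[2] <= 2.485839:
        if f[3] <= 0.717061:
            return 0
        if f[1] <= 3.143289:
            if f[0] <= 5.550231:
                return 0
            return 1 if f[3] <= 1.630129 else 2
        if f[2] <= 2.432759:
            return 0
        return 1 if f[3] <= 1.602186 else 0
    if f[3] <= 1.687708:
        if f[3] <= 0.698930:
            if f[2] <= 4.947937:
                return 1 if f[1] <= 3.690938 else 0
            return 0
        if f[2] <= 4.950762:
            return 1
        return 1 if f[0] <= 6.299237 else 2
    if f[2] <= 4.844587:
        if f[0] <= 6.045236:
            return 1 if f[2] <= 4.814570 else 2
        return 2
    if f[3] <= 1.699546:
        return 1 if f[0] <= 5.970683 else 2
    return 2

print(predict([5.1, 3.5, 1.4, 0.2]))  # 0 = setosa
print(predict([5.9, 3.0, 4.2, 1.5]))  # 1 = versicolor
print(predict([6.3, 3.3, 6.0, 2.5]))  # 2 = virginica
```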
What happened under the hood?¶
Since RandomForest is a black-box ensemble, BlackBox2C:
- Generated synthetic samples around the decision boundaries
- Used the RF's predictions as labels to train a surrogate DecisionTree
- Converted that single decision tree to C if-else logic
The fidelity metric tells you how closely the surrogate matches the original model.
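The surrogate idea can be sketched in a few lines of plain scikit-learn. This illustrates the principle, not BlackBox2C's internal implementation — in particular it skips the synthetic-sample generation step:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# The surrogate learns from the forest's *predictions*, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=5, random_state=42)
surrogate.fit(X_train, rf.predict(X_train))

# Fidelity = fraction of test samples where surrogate and forest agree.
fidelity = (surrogate.predict(X_test) == rf.predict(X_test)).mean()
print(f"Fidelity: {fidelity:.4f}")
```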
4. Check conversion metrics¶
In [4]:
from blackbox2c import Converter, ConversionConfig
config = ConversionConfig(max_depth=5, optimize_rules="medium")
converter = Converter(config)
c_code = converter.convert(
model,
X_train,
X_test=X_test,
feature_names=["sepal_length", "sepal_width", "petal_length", "petal_width"],
class_names=["setosa", "versicolor", "virginica"],
)
metrics = converter.get_metrics()
print(f"Fidelity: {metrics['fidelity']:.4f} (surrogate vs original agreement)")
print(f"Flash estimate: {metrics['size_estimate']['flash_bytes']} bytes")
print(f"Tree depth: {metrics['complexity']['max_depth']}")
print(f"Decision nodes: {metrics['complexity']['n_internal_nodes']}")
Starting conversion for model: RandomForestClassifier
Task: Classification, Features: 4, Classes: 3, Max depth: 5
[1/4] Extracting surrogate decision tree...
Surrogate fidelity: 0.9778
[2/4] Optimizing decision rules...
Nodes: 47, Leaves: 29, Depth: 5
[3/4] Generating C code...
[4/4] Estimating code size...
Estimated FLASH: 382 bytes, RAM: 32 bytes
[OK] Conversion complete!
Fidelity: 0.9778 (surrogate vs original agreement)
Flash estimate: 382 bytes
Tree depth: 5
Decision nodes: 18
5. Use the generated code on a microcontroller¶
Save the output to a .c file and include it in your firmware:
// main.c
#include "iris_model.c"
void classify_iris(void) {
float features[4];
features[0] = read_sensor(SEPAL_LENGTH); // e.g. 5.1
features[1] = read_sensor(SEPAL_WIDTH); // e.g. 3.5
features[2] = read_sensor(PETAL_LENGTH); // e.g. 1.4
features[3] = read_sensor(PETAL_WIDTH); // e.g. 0.2
uint8_t result = predict(features);
// result: 0=setosa, 1=versicolor, 2=virginica
}
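Writing the string returned by convert() out to that file is a one-liner. A minimal sketch, using a stand-in c_code string in place of the real output from step 2:

```python
from pathlib import Path

# Stand-in for the string returned by convert() in step 2.
c_code = "/* Auto-generated C code by BlackBox2C */\n#include <stdint.h>\n"

Path("iris_model.c").write_text(c_code)
print(f"Wrote {Path('iris_model.c').stat().st_size} bytes")
```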
Next Steps¶
| Notebook | Topic |
|---|---|
| 02 — Classification | Multiple model types, config comparison |
| 03 — Regression | Regression models, fixed-point arithmetic |
| 04 — Feature Analysis | Sensor reduction with sensitivity analysis |
| 05 — Multi-Format Export | C++, Arduino, MicroPython |
| 06 — End-to-End IoT | Real dataset, full production pipeline |