Commit 77cb855

Improve DSL kernels: add syntax diagnostics, enable zero-input shape-based kernels, and expand tutorial coverage
1 parent a5ab2e4 commit 77cb855

10 files changed

Lines changed: 470 additions & 15 deletions

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ endif()
 
 FetchContent_Declare(miniexpr
 GIT_REPOSITORY https://github.com/Blosc/miniexpr.git
-GIT_TAG 1bd8d0cfe92b63ad463cd28783e824b5e64afea8
+GIT_TAG 24c8ce8d02ff0d6f52c29ebc9406215a7b81607b
 # SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/../miniexpr
 )
 FetchContent_MakeAvailable(miniexpr)

doc/getting_started/overview.rst

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@ and tools in the Python ecosystem, including:
 * Excellent integration with Numba and Cython via
 `User Defined
 Functions <https://www.blosc.org/python-blosc2/getting_started/tutorials/03.lazyarray-udf.html>`_.
+* DSL kernels for miniexpr-backed UDF authoring and validation (see
+`this tutorial <https://www.blosc.org/python-blosc2/getting_started/tutorials/03.lazyarray-udf-kernels.html>`_).
 * By making use of the simple and open
 `C-Blosc2 format <https://github.com/Blosc/c-blosc2/blob/main/README_FORMAT.rst>`_
 for storing compressed data, Python-Blosc2 facilitates seamless integration with many other

doc/getting_started/tutorials.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ Tutorials
 tutorials/01.ndarray-basics
 tutorials/02.lazyarray-expressions
 tutorials/03.lazyarray-udf
+tutorials/03.lazyarray-udf-kernels
 tutorials/04.reductions
 tutorials/05.persistent-reductions
 tutorials/06.remote_proxy
Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"id": "c87d8acac9166018",
+"metadata": {},
+"source": [
+"# LazyArray UDF DSL Kernels\n",
+"\n",
+"`@blosc2.dsl_kernel` lets you write kernels with Python function syntax while executing through the miniexpr DSL path.\n",
+"\n",
+"Use DSL kernels when you want:\n",
+"\n",
+"- A vectorized UDF model (operate over NDArray chunks/blocks, not Python scalar loops)\n",
+"- Optional JIT compilation via miniexpr backends (for example `tcc`/`cc`) without requiring Numba\n",
+"- Early syntax validation and actionable diagnostics for unsupported constructs\n",
+"\n",
+"This tutorial complements `03.lazyarray-udf.ipynb` (generic Python UDFs).\n",
+"\n",
+"For the canonical DSL syntax contract, see the miniexpr docs: `doc/dsl-syntax.md`.\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 1,
+"id": "4743791e5436aa04",
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-02-16T05:32:35.309530Z",
+"start_time": "2026-02-16T05:32:35.071164Z"
+}
+},
+"outputs": [],
+"source": [
+"import numpy as np\n",
+"\n",
+"import blosc2"
+]
+},
+{
+"cell_type": "markdown",
+"id": "c400c3d7e37cda03",
+"metadata": {},
+"source": [
+"## 1. Define a DSL Kernel\n",
+"\n",
+"A valid DSL kernel can be used with `blosc2.lazyudf(...)` like a regular UDF."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 2,
+"id": "8926a0c21237fef3",
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-02-16T05:32:35.322622Z",
+"start_time": "2026-02-16T05:32:35.311059Z"
+}
+},
+"outputs": [],
+"source": [
+"@blosc2.dsl_kernel\n",
+"def kernel_index_ramp(x):\n",
+"    # _i* and _n* are reserved DSL index/shape symbols, so disable linter warnings\n",
+"    return _i0 * _n1 + _i1 # noqa: F821"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 3,
+"id": "fbe9cb59a4515c9c",
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-02-16T05:32:35.365979Z",
+"start_time": "2026-02-16T05:32:35.333433Z"
+}
+},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],\n",
+" [10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],\n",
+" [20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],\n",
+" [30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],\n",
+" [40., 41., 42., 43., 44., 45., 46., 47., 48., 49.]], dtype=float32)"
+]
+},
+"execution_count": 3,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"shape = (5, 10)\n",
+"x = blosc2.zeros(shape, dtype=np.float32)\n",
+"expr = blosc2.lazyudf(kernel_index_ramp, (x,), dtype=np.float32)\n",
+"res = expr[:]\n",
+"res"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 4,
+"id": "3bcf440eef3435f4",
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-02-16T05:32:36.250173Z",
+"start_time": "2026-02-16T05:32:36.234923Z"
+}
+},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"array([[ 0., 1., 2., 3., 4.],\n",
+" [10., 11., 12., 13., 14.]], dtype=float32)"
+]
+},
+"execution_count": 4,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"# Optional: request miniexpr JIT backend for this DSL kernel\n",
+"expr_jit = blosc2.lazyudf(\n",
+"    kernel_index_ramp,\n",
+"    (x,),\n",
+"    dtype=x.dtype,\n",
+"    jit=True,\n",
+"    jit_backend=\"tcc\",\n",
+")\n",
+"res_jit = expr_jit.compute()\n",
+"res_jit[:2, :5]"
+]
+},
+{
+"cell_type": "markdown",
+"id": "2539c7b3c5c828e3",
+"metadata": {},
+"source": [
+"## 2. Preflight Validation (`validate_dsl`)\n",
+"\n",
+"You can validate a kernel and inspect diagnostics without executing it."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 5,
+"id": "e408f3ced12bb48e",
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-02-16T05:32:36.435536Z",
+"start_time": "2026-02-16T05:32:36.402775Z"
+}
+},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"{'valid': True,\n",
+" 'dsl_source': 'def kernel_index_ramp(x):\\n # _i* and _n* are reserved DSL index/shape symbols, so disable linter warnings\\n return _i0 * _n1 + _i1 # noqa: F821',\n",
+" 'input_names': ['x'],\n",
+" 'error': None}"
+]
+},
+"execution_count": 5,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"report_ok = blosc2.validate_dsl(kernel_index_ramp)\n",
+"report_ok"
+]
+},
+{
+"cell_type": "markdown",
+"id": "f62d5a74a417eb12",
+"metadata": {},
+"source": [
+"## 3. Invalid Syntax Example\n",
+"\n",
+"Python ternary expressions are not part of the DSL subset.\n",
+"`validate_dsl` reports the issue, and `lazyudf(...)` raises early with a detailed message."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 6,
+"id": "2cfb6d28ee3cf2d8",
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-02-16T05:32:36.497700Z",
+"start_time": "2026-02-16T05:32:36.475885Z"
+}
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"False\n",
+"Ternary expressions are not supported in DSL; use where(cond, a, b) at line 2, column 14\n",
+"\n",
+"DSL kernel source:\n",
+"1 | def kernel_invalid_ternary(x):\n",
+"2 | return 1 if x else 0\n",
+" | ^\n",
+"\n",
+"See: https://github.com/Blosc/miniexpr/blob/main/doc/dsl-usage.md\n"
+]
+}
+],
+"source": [
+"@blosc2.dsl_kernel\n",
+"def kernel_invalid_ternary(x):\n",
+"    return 1 if x else 0\n",
+"\n",
+"\n",
+"report_bad = blosc2.validate_dsl(kernel_invalid_ternary)\n",
+"print(report_bad[\"valid\"])\n",
+"print(report_bad[\"error\"])"
+]
+},
+{
+"cell_type": "markdown",
+"id": "d8c345f8091b1078",
+"metadata": {},
+"source": [
+"## 4. Advanced Example: Mandelbrot DSL\n",
+"\n",
+"For a more advanced real-world DSL kernel, see:\n",
+"\n",
+"- `examples/ndarray/mandelbrot-dsl.ipynb`\n",
+"\n",
+"GitHub link:\n",
+"\n",
+"- https://github.com/Blosc/python-blosc2/blob/main/examples/ndarray/mandelbrot-dsl.ipynb"
+]
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.11"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 5
+}
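The tutorial's first kernel, `kernel_index_ramp`, can be cross-checked against plain NumPy. The sketch below assumes, from the tutorial's code comment and output, that `_i0`/`_i1` are an element's indices along axes 0 and 1 and that `_n1` is the length of axis 1; `index_ramp_reference` and `c_order_linear_index` are hypothetical helpers, not python-blosc2 API. Under that reading, `_i0 * _n1 + _i1` is the row-major (C-order) flat offset of each element, which is why the (5, 10) result is simply 0 through 49. The second helper generalizes the same formula to any rank.

```python
import numpy as np


def index_ramp_reference(shape, dtype=np.float32):
    """NumPy version of the DSL expression `_i0 * _n1 + _i1` (2-D only).

    Assumes _i0/_i1 are per-element indices along axes 0/1 and _n1 == shape[1].
    """
    i0, i1 = np.indices(shape)  # i0[r, c] == r, i1[r, c] == c
    n1 = shape[1]
    return (i0 * n1 + i1).astype(dtype)


def c_order_linear_index(shape):
    """Row-major flat index for any rank: sum_k i_k * prod(shape[k+1:])."""
    idx = np.indices(shape)  # idx[k] holds the k-th coordinate of every element
    strides = [int(np.prod(shape[k + 1:])) for k in range(len(shape))]
    return sum(i * s for i, s in zip(idx, strides))


res = index_ramp_reference((5, 10))
# Same values as the tutorial's output: 0..49 laid out row by row
assert np.array_equal(res, np.arange(50, dtype=np.float32).reshape(5, 10))
print(res[:2, :5])

# The N-D formula reduces to the 2-D one and extends to higher ranks
assert np.array_equal(c_order_linear_index((2, 3, 4)), np.arange(24).reshape(2, 3, 4))
```

Note that the kernel body never reads `x`; the result depends only on element positions and the array shape.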

doc/reference/lazyarray.rst

Lines changed: 14 additions & 0 deletions
@@ -10,6 +10,7 @@ You can get an object following the LazyArray API with any of the following ways
 * Any expression that involves one or more NDArray objects. e.g. ``a + b``, where ``a`` and ``b`` are NDArray objects (see `this tutorial <../getting_started/tutorials/03.lazyarray-expressions.html>`_).
 * Using the ``lazyexpr`` constructor.
 * Using the ``lazyudf`` constructor (see `a tutorial <../getting_started/tutorials/03.lazyarray-udf.html>`_).
+* Using ``@dsl_kernel`` and ``lazyudf`` for miniexpr-backed DSL kernels (see `this tutorial <../getting_started/tutorials/03.lazyarray-udf-kernels.html>`_).
 
 The LazyArray object is a thin wrapper around the expression or user-defined function that allows for lazy computation. This means that the expression is not computed until the ``compute`` or ``__getitem__`` methods are called. The ``compute`` method will return a new NDArray object with the result of the expression evaluation. The ``__getitem__`` method will return an NumPy object instead.
 
@@ -53,3 +54,16 @@ For getting a LazyUDF object (which is LazyArray-compliant) from a user-defined
 This object follows the `LazyArray`_ API for computation, although storage is not supported yet.
 
 .. autofunction:: lazyudf
+
+.. _DSLKernelReference:
+
+DSL Kernels
+-----------
+
+For miniexpr-backed kernels, see `the dedicated tutorial <../getting_started/tutorials/03.lazyarray-udf-kernels.html>`_.
+
+.. autofunction:: dsl_kernel
+
+.. autofunction:: validate_dsl
+
+.. autoclass:: DSLSyntaxError
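The tutorial's rejected kernel (`return 1 if x else 0`) comes with the hint `use where(cond, a, b)`. Going by that message, the DSL form would presumably be `return where(x, 1, 0)`; that spelling is inferred from the diagnostic and is not verified here. The plain-NumPy sketch below (no blosc2 involved; `ternary_scalar` and `where_vectorized` are hypothetical names) shows why an element-wise select is the natural replacement: a Python conditional expression asks for a single truth value for the whole operand, which a multi-element array cannot supply, whereas `np.where` chooses per element.

```python
import numpy as np


def ternary_scalar(x):
    # Python conditional expression: calls bool(x) once on the whole object
    return 1 if x else 0


def where_vectorized(x):
    # Element-wise select: nonzero elements become 1, zeros become 0
    return np.where(x, 1, 0)


x = np.array([0.0, 2.5, 0.0, -1.0])

try:
    ternary_scalar(x)
except ValueError as exc:
    # NumPy refuses to collapse a multi-element array into a single bool
    print("ternary rejected:", exc)

print(where_vectorized(x))  # -> [0 1 0 1]
```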

src/blosc2/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -503,7 +503,7 @@ def _raise(exc):
 
 from .c2array import c2context, C2Array, URLPath
 
-from .dsl_kernel import DSLKernel, dsl_kernel
+from .dsl_kernel import DSLSyntaxError, DSLKernel, dsl_kernel, validate_dsl
 from .lazyexpr import (
 LazyExpr,
 lazyudf,
@@ -687,6 +687,7 @@ def _raise(exc):
 "Filter",
 "LazyArray",
 "DSLKernel",
+"DSLSyntaxError",
 "LazyExpr",
 "LazyUDF",
 "NDArray",
@@ -805,6 +806,7 @@ def _raise(exc):
 "jit",
 "lazyexpr",
 "dsl_kernel",
+"validate_dsl",
 "lazyudf",
 "lazywhere",
 "less",
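With `validate_dsl` now exported from the top-level package, callers can gate kernel use on its report. The helper below is a hypothetical sketch (`require_valid_dsl` is not part of python-blosc2) that reads only report keys visible in the tutorial output (`valid`, `input_names`, `error`) and raises a plain `ValueError` carrying the diagnostic text; it deliberately does not assume how `DSLSyntaxError` is constructed. It takes the report as plain data so it runs without blosc2; in practice the dictionary would come from `blosc2.validate_dsl(kernel)`.

```python
def require_valid_dsl(report):
    """Return the kernel's input names if `report` says it is valid, else raise.

    `report` is assumed to be a dict like the one validate_dsl returns in the
    tutorial, with at least the keys 'valid' and 'error'.
    """
    if report.get("valid"):
        return list(report.get("input_names", []))
    # 'error' holds the formatted diagnostic (message, source excerpt, docs link)
    raise ValueError(report.get("error") or "DSL kernel rejected without a diagnostic")


# Subsets of the reports shown in the tutorial outputs
report_ok = {"valid": True, "input_names": ["x"], "error": None}
report_bad = {
    "valid": False,
    "error": "Ternary expressions are not supported in DSL; use where(cond, a, b) "
    "at line 2, column 14",
}

print(require_valid_dsl(report_ok))  # -> ['x']
try:
    require_valid_dsl(report_bad)
except ValueError as exc:
    print("rejected:", exc)
```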
