Skip to content

Commit 5aa808c

Browse files
committed
documentation for rank estimation
1 parent d6b61a2 commit 5aa808c

File tree

4 files changed

+266
-11
lines changed

4 files changed

+266
-11
lines changed

docs/make.jl

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,8 @@ makedocs(
2424
"tutorial/constraints.md",
2525
"tutorial/blockupdateorder.md",
2626
"tutorial/iterationstats.md",
27-
"tutorial/multiscale.md"
27+
"tutorial/multiscale.md",
28+
"tutorial/rankestimation.md"
2829
],
2930
"Reference" => [
3031
"reference/types.md",

docs/src/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Block Tensor Decomposition
1+
# BlockTensorFactorization.jl
22

33
## About this package
44
`BlockTensorFactorization.jl` is a package to factorize tensors. The main feature is its flexibility at decomposing input tensors according to many common tensor models (ex. CP, Tucker) with a number of constraints (ex. nonnegative, simplex).
Lines changed: 226 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,226 @@
1+
# Rank Estimation
2+
3+
## Overview
4+
5+
In some applications, we may have a tenor we believe to be closely approximated by a low rank tenor, but do not know what rank to use. In this setting, use `rank_detect_factorize` rather than `factorize` without a `rank=R` keyword. The estimated rank will be added to `kwargs`, and the `decomposition` at this rank returned.
6+
7+
```julia
8+
R = 3
9+
T = Tucker1((10, 10, 10), R)
10+
Y = array(T)
11+
options = (model=Tucker1, curvature_method=splines) # default values
12+
decomposition, stats, kwargs, final_rel_errors = rank_detect_factorize(Y; options...)
13+
14+
@assert kwargs[:rank] == R
15+
```
16+
17+
The rank will be estimated by factorizing the input tensor at every possible rank from 1 up to the maximum rank and recording the relative error. We can pick the rank which balances small error (good fit) and small rank (low complexity). Our criteria is the rank which maximizes the curvature of the error vs. rank function, or in the case of the `:breakpoints` method, a proxy for the curvature.
18+
19+
Note `rank_detect_factorize` also returns the stats and final relative errors of each rank tested.
20+
21+
```@docs; canonical=false
22+
rank_detect_factorize
23+
max_possible_rank
24+
```
25+
26+
!!! tip
27+
If you run `rank_detect_factorize` with different methods and get the same rank, you can feel more confident in the detected rank.
28+
29+
30+
## Estimating Curvature
31+
32+
This package has implemented a few methods for estimating curvature. These are the following list of methods.
33+
34+
```julia
35+
:finite_differences
36+
:circles
37+
:splines
38+
```
39+
40+
We can also use the following curvature-proxy methods.
41+
42+
```julia
43+
:breakpoints
44+
```
45+
46+
They can be passed to `curvature` and `standard_curvature` as a `method` keyword, or `rank_detect_factorize` with the `curvature_method` keyword.
47+
48+
```@docs; canonical=false
49+
curvature
50+
standard_curvature
51+
```
52+
53+
These methods have been specialized to typical relative error vs. rank curves. We assume a relative error of ``1`` at rank ``0`` since the only rank ``0`` tenor is the ``0`` tenor. And that the relative error at the maximum possible rank is ``0``, and stays zero if we try to use a larger rank.
54+
55+
## How do these methods work?
56+
57+
### Finite Differences
58+
59+
We approximate the first and second derivatives separately with three point finite differences and calculate the curvature with the formula
60+
61+
```math
62+
k(x) = \frac{y''(x)}{(1 + y'(x)^2)^{3/2}}.
63+
```
64+
65+
The derivatives are approximated using centred three point finite differences,[^1]
66+
67+
[^1]: M. Abramowitz and I. A. Stegun, "Handbook of mathematical functions: with formulas, graphs and mathematical tables", Unabridged, Unaltered and corr. Republ. of the 1964 ed. in Dover books on advanced mathematics. New York: Dover publ, 1972. (Table 25.2, p. 914)
68+
69+
```math
70+
y'(x_i) \approx \frac{1}{2\Delta x}(y_{i+1} - y_{i-1})\quad\text{and}\quad y''(x_i)\approx \frac{1}{\Delta x^2}(y_{i+1} - 2y_i + y_{i-1}).
71+
```
72+
73+
The end-points use forward/back-differences. For the first derivative, this is
74+
75+
```math
76+
y'(x_1) \approx \frac{1}{2\Delta x}(-3y_{1} + 4y_2 - y_{3})\quad\text{and}\quad y'(x_I) \approx \frac{1}{2\Delta x}(-y_{I-2} + 4y_{I-1} - 3y_{I}),
77+
```
78+
79+
and for the second derivative, this is
80+
81+
```math
82+
y''(x_1) \approx \frac{1}{\Delta x^2}(y_{1} - 2y_2 + y_{3})\quad\text{and}\quad y''(x_I) \approx \frac{1}{\Delta x^2}(y_{I-2} - 2y_{I-1} + y_{I}).
83+
```
84+
85+
```@docs; canonical=false
86+
BlockTensorFactorization.Core.d_dx
87+
BlockTensorFactorization.Core.d2_dx2
88+
```
89+
90+
91+
### Splines
92+
93+
We calculate a third order [spline](https://en.wikipedia.org/wiki/Spline_(mathematics))
94+
95+
```math
96+
g_i(x) = a_i(x-x_i)^3+b_i(x-x_i)^2+c_i(x-x_i)+d_i,\quad x_1 \leq x \leq x_{i+1}
97+
```
98+
99+
and use the coefficients to calculate the curvature of the spline
100+
101+
```math
102+
k(x_i) = \frac{2b_i}{(1+c_i^2)^{3/2}}.
103+
```
104+
105+
We assume the following boundary conditions;
106+
1) ``g_{1}(x_{1}-\Delta x)=1``, ``y``-intercept ``(0,1)``
107+
2) ``g_{I}(x_{I}+\Delta x)=y_{I}``, repeated right end-point (``y_{I+1}=y_{I}``)
108+
3) ``g_{I}''(x_{I}+\Delta x)=0``, flat right end-point,
109+
110+
in addition to the usual continuity and smoothness assumptions of a third order spline:
111+
1) ``g_{i}(x_{i})=y_{i}``
112+
2) ``g_{i}(x_{i+1})=g_{i+1}(x_{i+1})``
113+
3) ``g_{i}'(x_{i+1})=g_{i+1}'(x_{i+1})``
114+
4) ``g_{i}''(x_{i+1})=g_{i+1}''(x_{i+1})``.
115+
116+
We can solve for the coefficients by first solving the tri-diagonal system ``Mb=v``, where
117+
118+
```math
119+
M=\begin{bmatrix}
120+
6 & 0 & & & \\
121+
1 & 4 & 1 & & \\
122+
& \ddots& \ddots& \ddots& \\
123+
& & 1 & 4 & 1& \\
124+
& & & 0 & 1
125+
\end{bmatrix},\quad
126+
b =\begin{bmatrix}
127+
b_{1} \\
128+
b_{2}\\
129+
\vdots \\
130+
b_{I} \\
131+
b_{I+1}
132+
\end{bmatrix},\quad \text{and} \quad
133+
v = \frac{3}{\Delta x^2}\begin{bmatrix}
134+
1-2y_{1}+y_{2} \\
135+
y_{3}-y_{1} \\
136+
\vdots \\
137+
y_{I+1}-y_{I-1} \\
138+
0
139+
\end{bmatrix},
140+
```
141+
142+
and then calculate
143+
144+
```math
145+
a_{i} = \frac{1}{3\Delta x}(b_{i+1}-b_{i})\quad \text{and}\quad c_{i}=\frac{{y_{i+1}-y_{i}}}{\Delta x}-\frac{\Delta x}{3}(b_{i+1}-2b_{i}).
146+
```
147+
148+
```@docs
149+
BlockTensorFactorization.Core.cubic_spline_coefficients
150+
BlockTensorFactorization.Core.spline_mat
151+
BlockTensorFactorization.Core.make_spline
152+
BlockTensorFactorization.Core.d_dx_and_d2_dx2_spline
153+
```
154+
155+
### Circles
156+
157+
We compute the radius ``r_i`` of a circle passing through three neighbouring points ``(x_{i-1}, y_{i-1})``, ``(x_{i}, y_{i})``, and ``(x_{i+1}, y_{i+1})``. The curvature magnitude is approximately the inverse of this radius;
158+
159+
```math
160+
\lvert k(x_i)\rvert \approx \frac{1}{r_i}.
161+
```
162+
163+
We assume two additional points so we can estimate the curvature at the boundary. They are
164+
165+
```math
166+
(x_{0}, y_{0}) = (0, 1)\quad\text{and}\quad (x_{I+1}, y_{I+1}) = (x_{I}+\Delta x, 0).
167+
```
168+
169+
To obtain a the signed curvature ``k(x_i)``, we check if the middle point ``(x_{i}, y_{i})`` is above (negative curvature) or below (positive curvature) the line segment connecting ``(x_{i-1}, y_{i-1})`` and ``(x_{i+1}, y_{i+1})``.
170+
171+
```@docs
172+
BlockTensorFactorization.Core.three_point_circle
173+
BlockTensorFactorization.Core.circle_curvature
174+
BlockTensorFactorization.Core.signed_circle_curvature
175+
```
176+
177+
178+
### Two-line Breakpoint
179+
180+
This method does **not** approximate the curvature of a function passing through the points ``\{(x_i, y_i)\}``, but picks the best rank based on the optimal breakpoint. The optimal breakpoint ``r`` is the breakpoint of segmented linear model that minimizes the model error.[^2] This means
181+
182+
[^2]: J. E. Saylor, K. E. Sundell, and G. R. Sharman, "Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data," Earth and Planetary Science Letters, vol. 512, pp. 46–58, Apr. 2019, doi: [10.1016/j.epsl.2019.01.044](https://doi.org/10.1016/j.epsl.2019.01.044). (Section 3.2)
183+
184+
```math
185+
r = \argmin_{z \in \{1,\dots, R\}} \sum_{i=1}^I (f_z(x_i) - y_i)^2,
186+
```
187+
188+
where
189+
190+
```math
191+
f_z(x; a_z, b_z, c_z) = a_z + b_z(\min(x,z) - z) + c_z(\max(x,z) - z)
192+
```
193+
194+
and the coefficients ``(a_z, b_z, c_z)`` are selected to minimize the error between the model and the data,
195+
196+
```math
197+
(a_z, b_z, c_z) = \argmin_{a,b,c\in\mathbb{R}} \sum_{i=1}^I (f_z(x_i;a,b,c) - y_i)^2.
198+
```
199+
200+
These coefficients have the closed form solution
201+
202+
```math
203+
\begin{bmatrix}
204+
a_{z} \\
205+
b_{z} \\
206+
c_{z}
207+
\end{bmatrix} =(M^\top M)^{-1}M^\top \begin{bmatrix}
208+
y_{1} \\
209+
\vdots \\
210+
y_{I}
211+
\end{bmatrix},\quad\text{where}\quad M = \begin{bmatrix}
212+
1 & \min(x_{1},z) - z&\max(x_{1},z) - z \\
213+
\vdots & \vdots & \vdots\\
214+
1 & \min(x_{I},z) - z&\max(x_{I},z) - z
215+
\end{bmatrix}.
216+
```
217+
218+
Geometrically, the breakpoint model is two half-infinite lines on ``(-\infty, z]`` and ``[z,\infty)`` that meet continuously at ``(z,a)`` with slopes ``b`` and ``c`` respectively.
219+
220+
```@docs
221+
BlockTensorFactorization.Core.best_breakpoint
222+
BlockTensorFactorization.Core.breakpoint_error
223+
BlockTensorFactorization.Core.breakpoint_model
224+
BlockTensorFactorization.Core.breakpoint_model_coefficients
225+
BlockTensorFactorization.Core.breakpoint_curvature
226+
```

src/Core/curvaturetools.jl

Lines changed: 37 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,11 @@
33
"""
44
d_dx(y::AbstractVector{<:Real})
55
6-
Approximate first derivative with finite elements. Assumes y[i] = y(x_i) are samples with unit spaced inputs x_{i+1} - x_i = 1.
6+
Approximate first derivative with finite elements.
7+
8+
``\\frac{d}{dx}y[i] \\approx \\frac{1}{2\\Delta x}(y[i+1] - y[i-1])``
9+
10+
Assumes `y[i] = y(x[i])` are samples with unit spaced inputs `Δx = x[i+1] - x[i] = 1`.
711
"""
812
function d_dx(y::AbstractVector{<:Real})
913
if length(y) < 3
@@ -30,7 +34,11 @@ end
3034
"""
3135
d2_dx2(y::AbstractVector{<:Real})
3236
33-
Approximate second derivative with finite elements. Assumes y[i] = y(x_i) are samples with unit spaced inputs x_{i+1} - x_i = 1.
37+
Approximate second derivative with finite elements.
38+
39+
``\\frac{d^2}{dx^2}y[i] \\approx \\frac{1}{\\Delta x^2}(y[i-1] - 2y[i] + y[i+1])``
40+
41+
Assumes `y[i] = y(x[i])` are samples with unit spaced inputs `Δx = x[i+1] - x[i] = 1`.
3442
"""
3543
function d2_dx2(y::AbstractVector{<:Real})
3644
if length(y) < 3
@@ -50,16 +58,16 @@ end
5058
"""
5159
cubic_spline_coefficients(y::AbstractVector{<:Real}; h=1)
5260
53-
Calculates the list of coefficients `a`, `b`, `c`, `d` for an interpolating spline.
61+
Calculates the list of coefficients `a`, `b`, `c`, `d` for an interpolating spline of `y[i]=y(x[i])`.
5462
5563
The spline is defined as ``f(x) = g_i(x)`` on ``x[i] \\leq x \\leq x[i+1]`` where
5664
5765
``g_i(x) = a[i](x-x[i])^3 + b[i](x-x[i])^2 + c[i](x-x[i]) + d[i]``
5866
5967
Uses the following boundary conditions
60-
- ``g_1(x[1]-h) = 1`` (i.e. the ``y``-intercept is ``(0,1)`` for uniform spaced `x=1:n`)
61-
- ``g_n(x[n]+h) = x[n]`` (i.e. repeated right end-point)
62-
- ``g_n''(x[n]+h) = 0`` (i.e. flat/no-curvature one spacing after end-point)
68+
- ``g_1(x[1]-h) = 1`` (i.e. the ``y``-intercept is ``(0,1)`` for uniform spaced `x=1:I`)
69+
- ``g_I(x[I]+h) = y[I]`` (i.e. repeated right end-point)
70+
- ``g_I''(x[I]+h) = 0`` (i.e. flat/no-curvature one spacing after end-point)
6371
"""
6472
function cubic_spline_coefficients(y::AbstractVector{<:Real}; h=1)
6573
# Set up variables
@@ -242,10 +250,10 @@ Calculates radius `r` and center point `(p, q)` of the circle passing through th
242250
in the xy-plane.
243251
244252
# Example
245-
```
253+
```julia
246254
r, (p, q) = three_point_circle((1,2), (2,1), (5,2))
247-
r, (p, q) == (√5, (3, 3))
248-
````
255+
(r, (p, q)) == (√5, (3, 3))
256+
```
249257
"""
250258
function three_point_circle((a,f),(b,g),(c,h))
251259
fg = f-g
@@ -300,16 +308,36 @@ function breakpoint_model_coefficients(xs, ys, z)
300308
a, b, c = M \ ys
301309
return a, b, c
302310
end
311+
# TODO add an option to compute a cheaper but less accurate model
312+
# a = z; b = (1 - y[z]) / z ; c = y[z] / (R_max - z)
313+
# This fixes the model to the three points (0,1), (z, y[z]), and (R_max, 0)
314+
315+
"""
316+
breakpoint_model(a, b, c, z)
303317
318+
Returns a function `x -> a + b*(min(x, z) - z) + c*(max(x, z) - z)`.
319+
"""
304320
breakpoint_model(a, b, c, z) = x -> a + b*(min(x, z) - z) + c*(max(x, z) - z)
305321

322+
"""
323+
breakpoint_error(xs, ys, z)
324+
325+
Squared L2 error between the best breakpoint model (with breakpoint `z`) evaluated at `xs` and the data `ys`.
326+
327+
See [`breakpoint_model`](@ref) and [`breakpoint_model_coefficients`](@ref).
328+
"""
306329
function breakpoint_error(xs, ys, z)
307330
a, b, c = breakpoint_model_coefficients(xs, ys, z)
308331
f = breakpoint_model(a, b, c, z)
309332
return norm2(@. f(xs) - ys)
310333
# equivalent to sum(((x, y),) -> (f(x) - y)^2, zip(xs, ys))
311334
end
312335

336+
"""
337+
best_breakpoint(xs, ys; breakpoints=xs)
338+
339+
Breakpoint `z in breakpoints` that minimizes the [`breakpoint_error`](@ref).
340+
"""
313341
best_breakpoint(xs, ys; breakpoints=xs) = argmin(z -> breakpoint_error(xs, ys, z), breakpoints)
314342

315343
"""

0 commit comments

Comments
 (0)