Clustering#
Clustering seeks to group data into clusters based on their properties, allowing us to predict which cluster a new point belongs to.
import numpy as np
import matplotlib.pyplot as plt
We’ll use a dataset generator that is part of scikit-learn called make_moons. This generates data that falls into 2 different sets with a shape that looks like half-moons.
from sklearn import datasets
def generate_data():
    # make_moons returns an (N, 2) array of points and an (N,) array of labels
    xvec, val = datasets.make_moons(200, noise=0.2)

    # collect the points and labels into NumPy arrays
    x = []
    v = []
    for xv, vv in zip(xvec, val):
        x.append(np.array(xv))
        v.append(vv)

    return np.array(x), np.array(v)
x, v = generate_data()
Let’s look at a point and its value
print(f"x = {x[0]}, value = {v[0]}")
x = [-0.63374935 1.11804864], value = 0
Now let’s plot the data
def plot_data(x, v):
    xpt = [q[0] for q in x]
    ypt = [q[1] for q in x]

    fig, ax = plt.subplots()
    ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
    ax.set_aspect("equal")
    return fig
fig = plot_data(x, v)

We want to partition this domain into 2 regions, such that when we come in with a new point, we know which group it belongs to.
First we set up and train our network
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Input
from keras.optimizers import RMSprop
model = Sequential()
model.add(Input(shape=(2,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(20, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
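The final sigmoid activation maps the network’s output into the interval (0, 1), which we can read as the probability that a point belongs to cluster 1.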
rms = RMSprop()
model.compile(loss='binary_crossentropy',
              optimizer=rms, metrics=['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 50)             │           150 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 20)             │         1,020 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            21 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 1,191 (4.65 KB)
Trainable params: 1,191 (4.65 KB)
Non-trainable params: 0 (0.00 B)
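The parameter counts are just the weights plus biases in each layer: 2 × 50 + 50 = 150, 50 × 20 + 20 = 1,020, and 20 × 1 + 1 = 21.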
We seem to need a lot of epochs here to get a good result
epochs = 100
results = model.fit(x, v, batch_size=50, epochs=epochs, verbose=2)
Epoch 1/100
4/4 - 0s - 113ms/step - accuracy: 0.4800 - loss: 0.6920
Epoch 2/100
4/4 - 0s - 7ms/step - accuracy: 0.5900 - loss: 0.6443
Epoch 3/100
4/4 - 0s - 6ms/step - accuracy: 0.6900 - loss: 0.6120
Epoch 4/100
4/4 - 0s - 6ms/step - accuracy: 0.7350 - loss: 0.5855
Epoch 5/100
4/4 - 0s - 6ms/step - accuracy: 0.7750 - loss: 0.5617
...
Epoch 99/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2092
Epoch 100/100
4/4 - 0s - 6ms/step - accuracy: 0.9150 - loss: 0.2085
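We can visualize how the training progressed from the History object that fit() returns. A minimal sketch, using the "loss" and "accuracy" entries that our compile call produces in the history dictionary:

# plot the per-epoch loss and accuracy recorded by fit()
fig, ax = plt.subplots()
ax.plot(results.history["loss"], label="loss")
ax.plot(results.history["accuracy"], label="accuracy")
ax.set_xlabel("epoch")
ax.legend()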
score = model.evaluate(x, v, verbose=0)
print(f"score = {score[0]}")
print(f"accuracy = {score[1]}")
score = 0.20589673519134521
accuracy = 0.9150000214576721
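Note that we evaluated on the same data we trained on, so this accuracy is likely optimistic; holding out a test set (e.g., with sklearn.model_selection.train_test_split) would give a fairer estimate.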
Let’s look at a prediction. We need to feed in a single point as an array of shape (N, 2), where N is the number of points.
res = model.predict(np.array([[-2, 2]]))
res
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
array([[8.030023e-06]], dtype=float32)
We see that we get a floating-point number between 0 and 1. We will need to convert this to 0 or 1 by rounding.
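For example, thresholding at 0.5 (using the res array from the prediction above):

cluster = int(res[0, 0] > 0.5)

Here this gives 0, since 8.03e-06 < 0.5.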
Let’s plot the partitioning
M = 128
N = 128
xmin = -1.75
xmax = 2.5
ymin = -1.25
ymax = 1.75
xpt = np.linspace(xmin, xmax, M)
ypt = np.linspace(ymin, ymax, N)
To make the prediction go faster, we want to feed in all of the grid points at once, as an array holding every (x, y) combination:

[[xpt[0], ypt[0]],
 [xpt[0], ypt[1]],
 ...
]
We can check that this packs the points as expected by looking at the first element
pairs = np.array(np.meshgrid(xpt, ypt)).T.reshape(-1, 2)
pairs[0]
array([-1.75, -1.25])
Now we do the prediction. We will get a vector out, which we reshape to match the original domain.
res = model.predict(pairs, verbose=0)
res.shape = (M, N)
Finally, round to 0 or 1
domain = np.where(res > 0.5, 1, 0)
and we can plot the data together with the predicted partitioning
fig, ax = plt.subplots()
ax.imshow(domain.T, origin="lower",
          extent=[xmin, xmax, ymin, ymax], alpha=0.25)
xpt = [q[0] for q in x]
ypt = [q[1] for q in x]
ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
