Clustering#
Clustering seeks to group data into clusters based on their properties, and then allows us to predict which cluster a new point belongs to.
import numpy as np
import matplotlib.pyplot as plt
We’ll use a dataset generator from scikit-learn called make_moons. It generates data that falls into 2 different sets with a shape that looks like half-moons.
from sklearn import datasets
def generate_data():
    xvec, val = datasets.make_moons(200, noise=0.2)

    # repack the points (2 elements each) and their labels into numpy arrays
    x = []
    v = []
    for xv, vv in zip(xvec, val):
        x.append(np.array(xv))
        v.append(vv)

    return np.array(x), np.array(v)
x, v = generate_data()
Let’s look at a point and its value
print(f"x = {x[0]}, value = {v[0]}")
x = [ 1.72491437 -0.36291177], value = 1
Now let’s plot the data
def plot_data(x, v):
    xpt = [q[0] for q in x]
    ypt = [q[1] for q in x]

    fig, ax = plt.subplots()
    ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
    ax.set_aspect("equal")
    return fig
fig = plot_data(x, v)

We want to partition this domain into 2 regions, such that when a new point comes in, we know which group it belongs to.
First we set up and train our network.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Input
from keras.optimizers import RMSprop
model = Sequential()
model.add(Input(shape=(2,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(20, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
rms = RMSprop()
model.compile(loss="binary_crossentropy",
              optimizer=rms, metrics=["accuracy"])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Layer (type)            ┃ Output Shape     ┃   Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ dense (Dense)           │ (None, 50)       │       150 │
├─────────────────────────┼──────────────────┼───────────┤
│ dense_1 (Dense)         │ (None, 20)       │     1,020 │
├─────────────────────────┼──────────────────┼───────────┤
│ dense_2 (Dense)         │ (None, 1)        │        21 │
└─────────────────────────┴──────────────────┴───────────┘
Total params: 1,191 (4.65 KB)
Trainable params: 1,191 (4.65 KB)
Non-trainable params: 0 (0.00 B)
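As a quick check, each Dense layer’s parameter count is just weights plus biases, i.e. (inputs + 1) × outputs. A small sketch confirming the numbers in the summary:
# (inputs + 1) * outputs for each Dense layer
layers = [(2, 50), (50, 20), (20, 1)]
params = [(n_in + 1) * n_out for n_in, n_out in layers]
print(params, sum(params))   # [150, 1020, 21] 1191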
We seem to need a lot of epochs here to get a good result.
epochs = 100
results = model.fit(x, v, batch_size=50, epochs=epochs, verbose=2)
Epoch 1/100
4/4 - 0s - 115ms/step - accuracy: 0.5100 - loss: 0.7133
Epoch 2/100
4/4 - 0s - 6ms/step - accuracy: 0.6050 - loss: 0.6667
Epoch 3/100
4/4 - 0s - 6ms/step - accuracy: 0.7550 - loss: 0.6351
Epoch 4/100
4/4 - 0s - 6ms/step - accuracy: 0.8000 - loss: 0.6112
Epoch 5/100
4/4 - 0s - 6ms/step - accuracy: 0.8100 - loss: 0.5894
Epoch 6/100
4/4 - 0s - 6ms/step - accuracy: 0.8200 - loss: 0.5708
Epoch 7/100
4/4 - 0s - 6ms/step - accuracy: 0.8250 - loss: 0.5523
Epoch 8/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.5341
Epoch 9/100
4/4 - 0s - 6ms/step - accuracy: 0.8400 - loss: 0.5171
Epoch 10/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.5010
Epoch 11/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.4856
Epoch 12/100
4/4 - 0s - 6ms/step - accuracy: 0.8400 - loss: 0.4715
Epoch 13/100
4/4 - 0s - 6ms/step - accuracy: 0.8350 - loss: 0.4575
Epoch 14/100
4/4 - 0s - 6ms/step - accuracy: 0.8400 - loss: 0.4447
Epoch 15/100
4/4 - 0s - 6ms/step - accuracy: 0.8400 - loss: 0.4329
Epoch 16/100
4/4 - 0s - 6ms/step - accuracy: 0.8400 - loss: 0.4209
Epoch 17/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.4097
Epoch 18/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.4003
Epoch 19/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3906
Epoch 20/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3818
Epoch 21/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3738
Epoch 22/100
4/4 - 0s - 6ms/step - accuracy: 0.8400 - loss: 0.3681
Epoch 23/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3609
Epoch 24/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3547
Epoch 25/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3496
Epoch 26/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3443
Epoch 27/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3401
Epoch 28/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3357
Epoch 29/100
4/4 - 0s - 8ms/step - accuracy: 0.8400 - loss: 0.3313
Epoch 30/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3269
Epoch 31/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3247
Epoch 32/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3214
Epoch 33/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3176
Epoch 34/100
4/4 - 0s - 6ms/step - accuracy: 0.8500 - loss: 0.3152
Epoch 35/100
4/4 - 0s - 6ms/step - accuracy: 0.8450 - loss: 0.3125
Epoch 36/100
4/4 - 0s - 6ms/step - accuracy: 0.8500 - loss: 0.3107
Epoch 37/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.3081
Epoch 38/100
4/4 - 0s - 6ms/step - accuracy: 0.8550 - loss: 0.3061
Epoch 39/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3032
Epoch 40/100
4/4 - 0s - 6ms/step - accuracy: 0.8600 - loss: 0.3021
Epoch 41/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.3003
Epoch 42/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2985
Epoch 43/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2965
Epoch 44/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2941
Epoch 45/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2931
Epoch 46/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2931
Epoch 47/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2903
Epoch 48/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2882
Epoch 49/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2874
Epoch 50/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2855
Epoch 51/100
4/4 - 0s - 6ms/step - accuracy: 0.8650 - loss: 0.2850
Epoch 52/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2834
Epoch 53/100
4/4 - 0s - 6ms/step - accuracy: 0.8750 - loss: 0.2823
Epoch 54/100
4/4 - 0s - 7ms/step - accuracy: 0.8700 - loss: 0.2827
Epoch 55/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2804
Epoch 56/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2803
Epoch 57/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2774
Epoch 58/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.2779
Epoch 59/100
4/4 - 0s - 7ms/step - accuracy: 0.8700 - loss: 0.2760
Epoch 60/100
4/4 - 0s - 7ms/step - accuracy: 0.8700 - loss: 0.2772
Epoch 61/100
4/4 - 0s - 7ms/step - accuracy: 0.8750 - loss: 0.2739
Epoch 62/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2727
Epoch 63/100
4/4 - 0s - 7ms/step - accuracy: 0.8750 - loss: 0.2718
Epoch 64/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2716
Epoch 65/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2705
Epoch 66/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2709
Epoch 67/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2677
Epoch 68/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2680
Epoch 69/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2675
Epoch 70/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2666
Epoch 71/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.2662
Epoch 72/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.2642
Epoch 73/100
4/4 - 0s - 7ms/step - accuracy: 0.8850 - loss: 0.2628
Epoch 74/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2624
Epoch 75/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2626
Epoch 76/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2611
Epoch 77/100
4/4 - 0s - 7ms/step - accuracy: 0.8800 - loss: 0.2608
Epoch 78/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2598
Epoch 79/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.2586
Epoch 80/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2583
Epoch 81/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2589
Epoch 82/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2555
Epoch 83/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2550
Epoch 84/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2557
Epoch 85/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2530
Epoch 86/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2554
Epoch 87/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2542
Epoch 88/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2513
Epoch 89/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2506
Epoch 90/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2513
Epoch 91/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2492
Epoch 92/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2480
Epoch 93/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2476
Epoch 94/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2473
Epoch 95/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2457
Epoch 96/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2447
Epoch 97/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2438
Epoch 98/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2431
Epoch 99/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2430
Epoch 100/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2423
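model.fit() returns a history object whose .history dictionary records the loss and accuracy at each epoch. A quick sketch of how we might visualize the training progress, using the results object from above:
# plot how the loss and accuracy evolve over the epochs
fig, ax = plt.subplots()
ax.plot(results.history["loss"], label="loss")
ax.plot(results.history["accuracy"], label="accuracy")
ax.set_xlabel("epoch")
ax.legend()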
score = model.evaluate(x, v, verbose=0)
print(f"score = {score[0]}")
print(f"accuracy = {score[1]}")
score = 0.23966573178768158
accuracy = 0.8949999809265137
Let’s look at a prediction. Even for a single point, we need to feed in an array of shape (N, 2), where N is the number of points.
res = model.predict(np.array([[-2, 2]]))
res
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step
array([[6.217183e-10]], dtype=float32)
We see that we get a floating point number from the sigmoid output. We need to convert this to 0 or 1 by rounding (thresholding at 0.5).
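For example, a quick sketch of converting the res value from above into a class label:
# threshold the sigmoid output at 0.5 to get a 0/1 class label
label = int(res[0, 0] > 0.5)
print(label)   # 0 here, since the prediction is ~6e-10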
Let’s plot the partitioning
M = 128
N = 128
xmin = -1.75
xmax = 2.5
ymin = -1.25
ymax = 1.75
xpt = np.linspace(xmin, xmax, M)
ypt = np.linspace(ymin, ymax, N)
To make the prediction go faster, we want to feed in a single array containing every (x, y) pair on the grid, of the form:
[[xpt[0], ypt[0]],
 [xpt[0], ypt[1]],
 ...
]
The meshgrid / transpose / reshape combination below packs the grid points into this form:
pairs = np.array(np.meshgrid(xpt, ypt)).T.reshape(-1, 2)
pairs[0]
array([-1.75, -1.25])
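As a sanity check (a quick sketch), the array should contain every grid point, with the y coordinate varying fastest for each fixed x:
print(pairs.shape)   # (M * N, 2) = (16384, 2)
print(pairs[1])      # [xpt[0], ypt[1]]
print(pairs[N])      # [xpt[1], ypt[0]]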
Now we do the prediction. We will get a vector out, which we reshape to match the original domain.
res = model.predict(pairs, verbose=0)
res.shape = (M, N)
Finally, we round to 0 or 1 by thresholding at 0.5
domain = np.where(res > 0.5, 1, 0)
and we can plot the data, with the predicted partitioning shown as the shaded background
fig, ax = plt.subplots()
ax.imshow(domain.T, origin="lower",
extent=[xmin, xmax, ymin, ymax], alpha=0.25)
xpt = [q[0] for q in x]
ypt = [q[1] for q in x]
ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
<matplotlib.collections.PathCollection at 0x7fca8c7ce210>
