Clustering

Clustering seeks to group data into clusters based on their properties and then allows us to predict which cluster a new point belongs to.

import numpy as np
import matplotlib.pyplot as plt

We’ll use a dataset generator that is part of scikit-learn called make_moons. This generates data that falls into 2 different sets, each with a shape that looks like a half-moon.

from sklearn import datasets
def generate_data():
    # make_moons() returns the (x, y) coordinates of each point
    # and its cluster label (0 or 1)
    xvec, val = datasets.make_moons(200, noise=0.2)
    return np.array(xvec), np.array(val)
x, v = generate_data()

Let’s look at a point and its value

print(f"x = {x[0]}, value = {v[0]}")
x = [-0.36939631  0.78652053], value = 0

Now let’s plot the data

def plot_data(x, v):
    xpt = [q[0] for q in x]
    ypt = [q[1] for q in x]

    fig, ax = plt.subplots()
    ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
    ax.set_aspect("equal")
    return fig
fig = plot_data(x, v)
[Figure: scatter plot of the two half-moon clusters, colored by label]

We want to partition this domain into 2 regions, such that when we come in with a new point, we know which group it belongs to.

First we set up and train our network

from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import RMSprop
model = Sequential()
model.add(Input(shape=(2,)))
model.add(Dense(50, activation="relu"))
model.add(Dense(20, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
rms = RMSprop()
model.compile(loss='binary_crossentropy',
              optimizer=rms, metrics=['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 50)             │           150 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 20)             │         1,020 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            21 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 1,191 (4.65 KB)
 Trainable params: 1,191 (4.65 KB)
 Non-trainable params: 0 (0.00 B)
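
The parameter counts follow directly from the layer sizes: a Dense layer has (inputs × outputs) weights plus one bias per output, so 2×50 + 50 = 150, 50×20 + 20 = 1,020, and 20×1 + 1 = 21, giving 1,191 in total.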

We seem to need a lot of epochs here to get a good result

epochs = 100
results = model.fit(x, v, batch_size=50, epochs=epochs, verbose=2)
Epoch 1/100
4/4 - 0s - 6ms/step - accuracy: 0.6250 - loss: 0.6576
Epoch 2/100
4/4 - 0s - 6ms/step - accuracy: 0.7700 - loss: 0.6068
Epoch 3/100
4/4 - 0s - 5ms/step - accuracy: 0.8300 - loss: 0.5743
Epoch 4/100
4/4 - 0s - 5ms/step - accuracy: 0.8450 - loss: 0.5484
Epoch 5/100
4/4 - 0s - 5ms/step - accuracy: 0.8450 - loss: 0.5254
Epoch 6/100
4/4 - 0s - 5ms/step - accuracy: 0.8550 - loss: 0.5051
Epoch 7/100
4/4 - 0s - 5ms/step - accuracy: 0.8600 - loss: 0.4862
Epoch 8/100
4/4 - 0s - 5ms/step - accuracy: 0.8700 - loss: 0.4679
Epoch 9/100
4/4 - 0s - 5ms/step - accuracy: 0.8650 - loss: 0.4511
Epoch 10/100
4/4 - 0s - 5ms/step - accuracy: 0.8700 - loss: 0.4366
Epoch 11/100
4/4 - 0s - 6ms/step - accuracy: 0.8700 - loss: 0.4222
Epoch 12/100
4/4 - 0s - 6ms/step - accuracy: 0.8750 - loss: 0.4085
Epoch 13/100
4/4 - 0s - 19ms/step - accuracy: 0.8800 - loss: 0.3960
Epoch 14/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.3846
Epoch 15/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.3734
Epoch 16/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.3632
Epoch 17/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.3536
Epoch 18/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.3446
Epoch 19/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.3355
Epoch 20/100
4/4 - 0s - 8ms/step - accuracy: 0.8800 - loss: 0.3287
Epoch 21/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.3200
Epoch 22/100
4/4 - 0s - 6ms/step - accuracy: 0.8800 - loss: 0.3133
Epoch 23/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.3078
Epoch 24/100
4/4 - 0s - 6ms/step - accuracy: 0.8750 - loss: 0.3020
Epoch 25/100
4/4 - 0s - 21ms/step - accuracy: 0.8750 - loss: 0.2976
Epoch 26/100
4/4 - 0s - 5ms/step - accuracy: 0.8750 - loss: 0.2924
Epoch 27/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.2893
Epoch 28/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.2852
Epoch 29/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.2818
Epoch 30/100
4/4 - 0s - 5ms/step - accuracy: 0.8750 - loss: 0.2803
Epoch 31/100
4/4 - 0s - 5ms/step - accuracy: 0.8750 - loss: 0.2787
Epoch 32/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2748
Epoch 33/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2729
Epoch 34/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2706
Epoch 35/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.2718
Epoch 36/100
4/4 - 0s - 5ms/step - accuracy: 0.8800 - loss: 0.2687
Epoch 37/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2661
Epoch 38/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2649
Epoch 39/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2642
Epoch 40/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2617
Epoch 41/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2602
Epoch 42/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2604
Epoch 43/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2574
Epoch 44/100
4/4 - 0s - 6ms/step - accuracy: 0.8850 - loss: 0.2569
Epoch 45/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2555
Epoch 46/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2543
Epoch 47/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2534
Epoch 48/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2521
Epoch 49/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2498
Epoch 50/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2502
Epoch 51/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2478
Epoch 52/100
4/4 - 0s - 7ms/step - accuracy: 0.8900 - loss: 0.2485
Epoch 53/100
4/4 - 0s - 6ms/step - accuracy: 0.8900 - loss: 0.2457
Epoch 54/100
4/4 - 0s - 30ms/step - accuracy: 0.8900 - loss: 0.2466
Epoch 55/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2438
Epoch 56/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2430
Epoch 57/100
4/4 - 0s - 5ms/step - accuracy: 0.8850 - loss: 0.2418
Epoch 58/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2423
Epoch 59/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2388
Epoch 60/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2388
Epoch 61/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2392
Epoch 62/100
4/4 - 0s - 5ms/step - accuracy: 0.8950 - loss: 0.2355
Epoch 63/100
4/4 - 0s - 5ms/step - accuracy: 0.8900 - loss: 0.2342
Epoch 64/100
4/4 - 0s - 10ms/step - accuracy: 0.8900 - loss: 0.2334
Epoch 65/100
4/4 - 0s - 5ms/step - accuracy: 0.9000 - loss: 0.2317
Epoch 66/100
4/4 - 0s - 5ms/step - accuracy: 0.9000 - loss: 0.2315
Epoch 67/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2293
Epoch 68/100
4/4 - 0s - 6ms/step - accuracy: 0.8950 - loss: 0.2281
Epoch 69/100
4/4 - 0s - 7ms/step - accuracy: 0.8950 - loss: 0.2261
Epoch 70/100
4/4 - 0s - 6ms/step - accuracy: 0.9000 - loss: 0.2246
Epoch 71/100
4/4 - 0s - 21ms/step - accuracy: 0.8900 - loss: 0.2257
Epoch 72/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2225
Epoch 73/100
4/4 - 0s - 5ms/step - accuracy: 0.9000 - loss: 0.2219
Epoch 74/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.2220
Epoch 75/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.2215
Epoch 76/100
4/4 - 0s - 5ms/step - accuracy: 0.9000 - loss: 0.2167
Epoch 77/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.2163
Epoch 78/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.2144
Epoch 79/100
4/4 - 0s - 6ms/step - accuracy: 0.9050 - loss: 0.2148
Epoch 80/100
4/4 - 0s - 5ms/step - accuracy: 0.9150 - loss: 0.2116
Epoch 81/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.2103
Epoch 82/100
4/4 - 0s - 5ms/step - accuracy: 0.9050 - loss: 0.2094
Epoch 83/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2074
Epoch 84/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.2067
Epoch 85/100
4/4 - 0s - 5ms/step - accuracy: 0.9050 - loss: 0.2043
Epoch 86/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.2033
Epoch 87/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.2014
Epoch 88/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.2016
Epoch 89/100
4/4 - 0s - 5ms/step - accuracy: 0.9150 - loss: 0.1982
Epoch 90/100
4/4 - 0s - 5ms/step - accuracy: 0.9150 - loss: 0.1969
Epoch 91/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.1950
Epoch 92/100
4/4 - 0s - 5ms/step - accuracy: 0.9150 - loss: 0.1938
Epoch 93/100
4/4 - 0s - 5ms/step - accuracy: 0.9100 - loss: 0.1930
Epoch 94/100
4/4 - 0s - 5ms/step - accuracy: 0.9150 - loss: 0.1911
Epoch 95/100
4/4 - 0s - 6ms/step - accuracy: 0.9100 - loss: 0.1899
Epoch 96/100
4/4 - 0s - 5ms/step - accuracy: 0.9150 - loss: 0.1874
Epoch 97/100
4/4 - 0s - 5ms/step - accuracy: 0.9150 - loss: 0.1878
Epoch 98/100
4/4 - 0s - 5ms/step - accuracy: 0.9150 - loss: 0.1845
Epoch 99/100
4/4 - 0s - 5ms/step - accuracy: 0.9200 - loss: 0.1838
Epoch 100/100
4/4 - 0s - 33ms/step - accuracy: 0.9150 - loss: 0.1817
score = model.evaluate(x, v, verbose=0)
print(f"score = {score[0]}")
print(f"accuracy = {score[1]}")
score = 0.17942476272583008
accuracy = 0.9200000166893005
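
We can also look at how the loss and accuracy evolved over the epochs. fit() returns a History object whose history dictionary holds the per-epoch metrics; a minimal sketch using the results object from above:

fig, ax = plt.subplots()
ax.plot(results.history["loss"], label="loss")
ax.plot(results.history["accuracy"], label="accuracy")
ax.set_xlabel("epoch")
ax.legend()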

Let’s look at a prediction. Even for a single point, we need to feed it in as an array of shape (N, 2), where N is the number of points

res = model.predict(np.array([[-2, 2]]))
res
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
array([[6.1211895e-06]], dtype=float32)

We see that we get a floating point number between 0 and 1 (the output of the final sigmoid layer). We will need to convert this to 0 or 1 by rounding.
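
As a quick sketch, rounding the prediction from above gives the class label:

label = int(np.rint(res[0, 0]))   # round the sigmoid output to 0 or 1
print(label)   # 0 -- the point (-2, 2) belongs to cluster 0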

Let’s plot the partitioning

M = 128
N = 128

xmin = -1.75
xmax = 2.5
ymin = -1.25
ymax = 1.75

xpt = np.linspace(xmin, xmax, M)
ypt = np.linspace(ymin, ymax, N)

To make the prediction go faster, we want to feed in all of the grid points at once, as an array of (x, y) pairs of the form:

[[x0, y0],
 [x1, y1],
 ...
]

We can use np.meshgrid to pack the grid points into this form, with one row for every combination of a point from xpt and a point from ypt:

pairs = np.array(np.meshgrid(xpt, ypt)).T.reshape(-1, 2)
pairs[0]
array([-1.75, -1.25])
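
As a quick check, the packed array has one row for each of the M × N grid points:

print(pairs.shape)   # (16384, 2) -- 128 * 128 rows, one (x, y) pair per row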

Now we do the prediction. We will get a vector out, which we reshape to match the original domain.

res = model.predict(pairs, verbose=0)
res.shape = (M, N)

Finally, round to 0 or 1

domain = np.where(res > 0.5, 1, 0)

and we can plot the data

fig, ax = plt.subplots()
ax.imshow(domain.T, origin="lower",
          extent=[xmin, xmax, ymin, ymax], alpha=0.25)
xpt = [q[0] for q in x]
ypt = [q[1] for q in x]

ax.scatter(xpt, ypt, s=40, c=v, cmap="viridis")
<matplotlib.collections.PathCollection at 0x7f1cfc7cdbd0>
[Figure: the learned partitioning of the plane, shaded by predicted class, with the data points overlaid]