2025/08/29

Trying Image Vector Search with Supabase vecs + OpenAI CLIP

Supabase, Python

In this post, we try out vector search using Supabase vecs.

First, we connect to vecs from a Python program.

Preparing Supabase vecs

Install vecs with pip.

pip install vecs

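Next, write a program that connects to the database.

Since we want to vectorize images and store the results, we create a collection named image_vectors.

Set dimension to the number of dimensions output by the model used for vectorization (512 for the model used here).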

import vecs

# Connect to vecs
vx = vecs.create_client("<your-db-url>")
image_collection = vx.get_or_create_collection(name="image_vectors", dimension=512)
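Hard-coding the connection string works for a quick local test, but reading it from an environment variable keeps credentials out of the script. The following is a minimal sketch, assuming a hypothetical environment variable named SUPABASE_DB_URL (vecs itself does not read it) and falling back to the local development URL that appears in the later listings.

```python
import os

# Hypothetical variable name chosen for this example; vecs does not read it itself.
ENV_VAR = "SUPABASE_DB_URL"
# Local development URL used in the later listings of this article.
LOCAL_DB_URL = "postgresql://postgres:postgres@127.0.0.1:54322/postgres"


def get_db_url(env=None):
    """Return the connection string from the environment, or the local default."""
    env = os.environ if env is None else env
    url = env.get(ENV_VAR, "").strip()
    return url or LOCAL_DB_URL


if __name__ == "__main__":
    print(get_db_url())
```

The returned string can then be passed to the client, as in vecs.create_client(get_db_url()).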

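Loading the model

Next, write the code that loads the model.

Here we use line-corporation/clip-japanese-base, which is published on Hugging Face.

First, install the libraries needed to load the model. The code below also imports torch, so install it together with transformers.

pip install transformers torch

Load the model with the transformers library.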

import vecs
import torch
from transformers import AutoModel, AutoTokenizer, AutoImageProcessor

# Connect to Supabase vecs
vx = vecs.create_client("postgresql://postgres:postgres@127.0.0.1:54322/postgres")
image_collection = vx.get_or_create_collection(name="image_vectors", dimension=512)

# Load the model
model_name = "line-corporation/clip-japanese-base"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(model_name, trust_remote_code=True)

Vectorizing an image

Next, we load an image and vectorize it.

Install Pillow to read the image.

pip install Pillow

Once it is installed, write the program.

import vecs
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, AutoImageProcessor

# Connect to Supabase vecs
vx = vecs.create_client("postgresql://postgres:postgres@127.0.0.1:54322/postgres")
image_collection = vx.get_or_create_collection(name="image_vectors", dimension=512)

# Load the model
model_name = "line-corporation/clip-japanese-base"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(model_name, trust_remote_code=True)

# Load the image
image = Image.open("sample.png")

# Vectorize the image
image_tensor = image_processor(image, return_tensors="pt").to(device)
with torch.no_grad():
    image_features = model.get_image_features(**image_tensor)

embedding = image_features.cpu().numpy().tolist()

print(embedding)

Once you have gotten this far, try running it.

python vecs_sample.py

If an array of numbers is printed, it worked.
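The printed value is a nested list: one inner list per input image, each holding as many floats as the collection's dimension (512 here). Below is a small sketch of a sanity check that could be run before saving. The helper name first_vector is hypothetical, and it assumes the [batch][dimension] shape produced by the tolist() call above.

```python
import math


def first_vector(embedding, expected_dim=512):
    """Return the first row of a batch-shaped embedding after basic checks."""
    if not embedding or not isinstance(embedding[0], (list, tuple)):
        raise ValueError("expected a nested list shaped [batch][dimension]")
    vector = list(embedding[0])
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(vector)}")
    if not all(isinstance(v, float) and math.isfinite(v) for v in vector):
        raise ValueError("embedding contains non-float or non-finite values")
    return vector


# Stand-in data with the same shape as one encoded image.
dummy = [[0.0] * 512]
print(len(first_vector(dummy)))
```

A length mismatch caught here is easier to diagnose than an error raised by the database when the vector does not fit the collection.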

Saving to vecs

Now that the image can be vectorized, we save the result to vecs.

IDs are generated with the uuid module. It is part of the Python standard library, so no additional installation is needed.
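The listing that follows generates IDs with uuid.uuid4(), which returns a new random value on every run, so running the script twice on the same image stores two records. Because upsert replaces a record whose ID already exists, one option is to derive the ID from something stable such as the file path. This is a minimal sketch using uuid.uuid5 from the standard library; the helper name image_id and the choice of uuid.NAMESPACE_URL are assumptions made for illustration.

```python
import uuid


def image_id(path):
    """Return a stable string ID for an image path (same path, same ID)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, str(path)))


# The same path always maps to the same ID; different paths get different IDs.
print(image_id("sample.png") == image_id("sample.png"))
print(image_id("sample.png") == image_id("other.png"))
```

Whether random or path-based IDs fit better depends on whether re-running the script should add new records or overwrite existing ones.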

Write the program that saves to vecs.

import vecs
import torch
import uuid
import time
from PIL import Image
from transformers import AutoModel, AutoTokenizer, AutoImageProcessor

# Connect to Supabase vecs
vx = vecs.create_client("postgresql://postgres:postgres@127.0.0.1:54322/postgres")
image_collection = vx.get_or_create_collection(name="image_vectors", dimension=512)

# Load the model
model_name = "line-corporation/clip-japanese-base"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(model_name, trust_remote_code=True)

# Load the image
image = Image.open("sample.png")

# Vectorize the image
image_tensor = image_processor(image, return_tensors="pt").to(device)
with torch.no_grad():
    image_features = model.get_image_features(**image_tensor)

embedding = image_features.cpu().numpy().tolist()

embedding_data = embedding[0]

# Save the vector
image_collection.upsert(
    records=[
        (
            str(uuid.uuid4()),  # id (vecs expects a string)
            embedding_data,     # the vectorized data
            {
                "created_at": time.time()  # metadata is registered as key-value pairs
            }
        )
    ],
)
image_collection.create_index()

vx.disconnect()

After running it, the data should be registered in vecs.
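To register several images at once, the same (id, vector, metadata) tuples can be built in a loop and passed to upsert in a single call. The sketch below assumes the vectors have already been computed; build_records and the "path" metadata key are hypothetical names, not part of vecs.

```python
import time
import uuid


def build_records(vectors_by_path, now=None):
    """Build (id, vector, metadata) tuples in the shape passed to upsert above."""
    created_at = time.time() if now is None else now
    records = []
    for path, vector in vectors_by_path.items():
        records.append((
            str(uuid.uuid4()),                         # id as a string
            list(vector),                              # embedding values
            {"path": path, "created_at": created_at},  # key-value metadata
        ))
    return records


# Stand-in vectors; real ones would come from model.get_image_features.
records = build_records({"a.png": [0.0] * 512, "b.png": [1.0] * 512})
print(len(records), len(records[0][1]))
# image_collection.upsert(records=records)  # with the collection from the script above
```

Storing the file path in the metadata makes it possible to tell which image a record came from when the data is read back later.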

This post covered everything up to registering the vectorized image data.

Next time, I hope to cover searching the data.
