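Qlean Dataset Begins Offering the "Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts"

Visual Bank, a company selected for GENIAC, supports ASR, NLP, and LLM development with speech and text data from the music domain

Visual Bank Inc.

January 28, 2026, 12:00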

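Visual Bank Inc. (Minato-ku, Tokyo; Representative Director and CEO: Saneyuki Nagai) has begun offering the "Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts" through Qlean Dataset, the AI training data solution operated by its subsidiary Amana Images Inc. The dataset is intended for the development of speech and language AI such as ASR (automatic speech recognition), NLP (natural language processing), and LLMs (large language models).

The dataset joins AI Data Recipe, Qlean Dataset's lineup of machine learning datasets. It contains Japanese speech in which Japanese speakers talk in monologue form about music, artists, songs, and musical experiences, together with transcripts that faithfully record what was said. Topics grounded in the music domain, such as reflections on works and artists, personal anecdotes about music, and explanations of genres and their historical background, unfold as continuous speech.

Recordings are not strictly controlled by a script; speakers organize the content in their own words as they talk. Because each recording is a sustained utterance by one speaker rather than a dialogue, the speech and text data capture explanatory narration, sustained context, and vocabulary usage. The dataset is intended for research and development of speech recognition, language understanding, and AI systems that process longer inputs.

Qlean Dataset provides AI development data with rights clearance and terms of use organized for everything from research to commercial development. This dataset is offered as part of that effort, with the aim of supporting evaluation environments that use Japanese speech and text data related to the music domain.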

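Overview of the newly available Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts

Data types	Audio, text
Subject attributes	Japanese men and women in their 20s to 50s
Data formats	Audio: mp3, wav; Text: txt, json, csv
Recording time	Approx. 210 hours in total (approx. 5 to 60 minutes per recording)
Sampling rate	44.1 kHz / 48 kHz
Target scenes	Scenes in which a speaker continuously explains or comments on music or music-related themes
Sample details	https://qleandataset.visual-bank.co.jp/lineup/pn-012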

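Example use cases for the Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts

【Research Use (Academia)】

Validating Japanese speech recognition models on domain-specific vocabulary
Using continuous single-speaker speech that contains proper nouns and titles from cultural fields such as music, manga, and film, researchers can examine how stably ASR models recognize explanatory and evaluative narration.

【Industrial Use】

Evaluating language understanding models for review- and commentary-style audio content
Assuming audio content told from a personal viewpoint, such as reviews of works and artist commentary, the dataset can be used to validate NLP and LLM functions such as post-recognition text understanding, key point extraction, and summary generation.
Validating voice-input recommendation and search functions
The dataset can serve as validation data for voice-input search and recommendation functions that extract or classify related content based on the titles, personal names, and evaluative expressions in the speech.

【Other Practical Needs】

Validating subtitle generation and summarization for culture-related audio content
Assuming explanatory audio about film, manga, and music, the dataset can be used to validate speech processing functions for educational and informational purposes, such as subtitle generation and synopsis generation.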

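About Qlean Dataset

Qlean Dataset is a commercially usable AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank.
It handles data in a variety of formats, including images, video, audio, 3D, and text, and provides an environment in which the data can be used safely for both research and commercial purposes.

Through collaboration with data partners including Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., it also continues to expand AI Data Recipe, a data lineup specialized by industry and aligned with the latest trends.

Qlean Dataset reduces the burden of data collection and preparation in AI development and supports building rights-cleared AI development environments free of legal risk.

▶ Qlean Dataset site: https://qleandataset.visual-bank.co.jp/

▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/lineup

Features of AI Data Recipe, the datasets offered by Qlean Dataset

Consent obtained from all subjects
Existing data can be delivered in as little as one day
Custom shooting, recording, and collection to build original datasets is also supported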

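Contact

Visual Bank Inc.

Visual Bank is a startup that builds and provides next-generation data infrastructure to maximize AI development capabilities, operating under the mission "Unlock the potential of all data." In addition to THE PEN, which provides an AI assistance tool supporting manga artists who want to "draw more!", the company wholly owns Amana Images Inc., which provides the AI training dataset development service Qlean Dataset.

Visual Bank has also been selected for GENIAC, a national research and development program, and is accelerating its efforts toward real-world deployment.

Representative Director and CEO: Saneyuki Nagai

Address: 6F C-Cube Minami-Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062

Visual Bank corporate URL: https://visual-bank.co.jp/

Amana Images corporate URL: https://amanaimages.com/about/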

【Translation】

Qlean Dataset Launches a Japanese Single-Speaker Music-Themed Audio Dataset with Transcripts

Long-Form Spoken Content for ASR, NLP, and LLM Evaluation

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai), through its subsidiary Amana Images Inc., has launched a new dataset under its AI training data solution, Qlean Dataset: the Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts.

The dataset is designed to support the development and evaluation of speech- and language-based AI systems, including ASR, NLP, and LLMs.

This dataset is a new addition to Qlean Dataset’s machine learning lineup, AI Data Recipe.
It features Japanese audio recordings in which a single speaker delivers extended, monologue-style speech on music-related topics such as artists, songs, musical experiences, genres, and cultural background. Each recording is paired with accurate transcripts that reflect the spoken content.

All recordings are conducted without strict scripting, allowing speakers to express their thoughts naturally in continuous speech.
As a result, the dataset is well suited for evaluating speech recognition, discourse continuity, vocabulary usage, and language understanding in AI systems that process long-form spoken input.

Qlean Dataset provides AI development data for both research and commercial use, with rights clearance and usage conditions carefully organized.
This dataset is offered to support reliable evaluation environments using Japanese speech and text data in the music and cultural content domain.

Dataset Overview: Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts

Data Types	Audio, Text
Speaker Attributes	Japanese speakers, male and female, aged 20s to 50s
Data Formats	Audio: mp3 / wav; Text: txt / json / csv
Total Duration	Approximately 210 hours (each recording ranges from approximately 5 to 60 minutes)
Audio Sampling Rate	44.1 kHz / 48 kHz
Recorded Scenarios	Single-speaker scenes in which the speaker continuously explains or discusses music-related topics
Sample Details	https://qleandataset.visual-bank.co.jp/en/lineup/pn-012
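
As an illustration of how the figures in the overview might be checked on delivery, the following is a minimal sketch that reads a WAV header and compares it with the stated sampling rates and the approximate per-recording duration range. It is not part of the Qlean Dataset offering: the function names `describe_wav` and `check_wav` are hypothetical, and the release does not describe file naming or directory layout, so the sketch takes a single file path as input.

```python
import wave

# Sampling rates listed in the dataset overview (44.1 kHz / 48 kHz).
EXPECTED_RATES = {44_100, 48_000}


def describe_wav(path):
    """Return (sample_rate_hz, duration_seconds) read from a WAV header."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration


def check_wav(path, min_minutes=5.0, max_minutes=60.0):
    """Return a list of human-readable problems; the list is empty if none.

    The default bounds mirror the "approximately 5 to 60 minutes" figure in
    the overview. They are approximate there, so treat hits as hints to
    review a file, not as hard failures.
    """
    rate, duration = describe_wav(path)
    problems = []
    if rate not in EXPECTED_RATES:
        problems.append(f"unexpected sampling rate: {rate} Hz")
    minutes = duration / 60
    if not (min_minutes <= minutes <= max_minutes):
        problems.append(f"duration outside the stated range: {minutes:.1f} min")
    return problems
```

The sketch uses only the Python standard library, which reads WAV but not mp3; checking the mp3 files would need a separate decoder.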

Use Case Examples for the Japanese Single-Speaker Music-Themed Audio Corpus with Transcripts

- Research Use Cases

Evaluation of Japanese ASR models with domain-specific vocabulary

This dataset can be used to evaluate ASR models on continuous single-speaker speech that includes domain-specific terms, proper nouns, and titles related to cultural fields such as music, comics, and film. It enables assessment of how consistently models recognize explanatory and evaluative speech over extended segments.

- Industrial Use Cases

Evaluation of language understanding models for review-style audio content

Assuming audio content such as music reviews and artist commentary spoken from an individual perspective, the dataset can be used to evaluate downstream NLP and LLM tasks after speech recognition, including content understanding, key point extraction, and summary generation.

Validation of voice-based recommendation and search systems

Based on titles, artist names, and evaluative expressions contained within speech, the dataset can serve as evaluation data for voice-input search and recommendation systems that extract, classify, and relate cultural content.

- Other Practical Use Cases

Subtitle generation and summarization for cultural audio content

The dataset can be applied to the evaluation of speech processing functions for educational and informational use cases, such as subtitle generation and overview creation for explanatory audio related to music, film, and comics.
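
For the ASR evaluation use case above, one common way to compare a model's output with a reference transcript is character error rate (CER). It is often preferred over word error rate for Japanese because the text is written without spaces between words. The release does not name a metric, so the following is only a sketch under that assumption; the function names and the normalization choices (NFKC, dropping whitespace and punctuation) are illustrative and not part of the dataset.

```python
import unicodedata


def normalize(text):
    """NFKC-normalize, then drop whitespace and punctuation before scoring."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion from the reference
                    curr[j - 1] + 1,     # insertion into the reference
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("音楽を聴く。", "音楽を聞く")` counts one substituted character against five reference characters after normalization, giving 0.2. Note that the punctuation filter also removes the katakana middle dot used in some artist and song names, so a stricter evaluation of proper nouns may call for a different normalization.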

About Qlean Dataset

Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.

Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup known as the “AI Data Recipe.”

By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations establish AI development environments that are both legally compliant and risk-free.

▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup

Key Features of Qlean Dataset

Consent obtained from all subjects
Existing datasets deliverable in as little as one day
Custom shooting, recording, and data collection services available

Contact

About Visual Bank Inc.

Visual Bank Inc. is a Tokyo-based startup building next-generation data infrastructure to enhance AI development capabilities under the mission “Unlocking Data Accessibility.”
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.

Its subsidiaries include Amana Images Inc., one of Japan’s largest stock photo providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., which provides an AI-assisted creative tool for manga artists.

CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview
