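Qlean Dataset Launches Emotional Speech Dataset "9 Emotions × 2 Intensities: Japanese Emotional Speech"

~ A 6,800-file corpus labeled with 9 emotion types × 2 intensity levels, cleared for commercial use. Applicable to SER, TTS, and multimodal AI development. Provided by Visual Bank, a company selected for GENIAC ~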

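Visual Bank, Inc.

July 14, 2025, 10:30

Visual Bank, Inc. (Minato-ku, Tokyo; Representative Director and CEO: Saneyuki Nagai) is launching an "Emotionally Expressive Speech Dataset" through Qlean Dataset, the AI training data solution operated by its subsidiary amanaimages Inc.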

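■ What Is an Emotional Speech Dataset?

An emotional speech dataset is a speech corpus annotated with emotion labels. Such corpora are used to train and evaluate speech emotion recognition (SER) models, to build TTS models that express emotion, and for emotion-estimation tasks in multimodal AI. This dataset is distinguished by a fine-grained label design of 9 emotions × 2 intensity levels, based on prior research, which supports intensity-aware model development and per-intensity accuracy evaluation.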

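■ Overview of the "Emotionally Expressive Speech Dataset"

100 Japanese men and women, ranging from their teens to their 50s, recorded speech expressing 9 emotions, selected on the basis of prior emotion-analysis research, at two intensities: "normal" and "strong." Because every speaker reads the same fixed lines, the data is well suited to cross-speaker comparisons and controlled experiments, and it is designed for validating emotion-label accuracy and for benchmark evaluation.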

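Data type:	Audio (single speaker, emotional utterance format)
Speakers:	100 Japanese men and women (teens to 50s)
Emotion types:	9 (neutral, calm, joy, sadness, anger, fear, disgust, surprise, impatience)
Intensity labels:	2 levels (normal, strong)
Scripted lines:	「ラジオからこんなニュースが流れてきた」 ("This news came on over the radio"), 「テーブルに携帯電話が置かれている」 ("A mobile phone has been placed on the table")
Files:	6,800
Format:	wav
Usage:	Commercial use permitted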

Sample data: https://qleandataset.visual-bank.co.jp/lineup/ds-001

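■ Frequently Asked Questions (FAQ)

Q. How can an emotional speech dataset be used for SER (speech emotion recognition) development?

A. Audio classified into 9 emotions × 2 intensities can be used to train and evaluate algorithms that estimate emotion from acoustic features such as F0 and MFCCs. The fixed-script format allows controlled cross-speaker experiments and supports quantitative evaluation of how emotion intensity affects model accuracy.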
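To make "estimating emotion from acoustic features such as F0" more concrete, here is a minimal sketch that estimates the fundamental frequency (F0) of a single frame by picking the strongest autocorrelation peak. It uses only the Python standard library and a synthetic 200 Hz tone; the function name `estimate_f0`, the 16 kHz sample rate, and the 60 to 400 Hz search range are illustrative assumptions, not part of the dataset or any official tooling. Real pipelines usually rely on more robust pitch trackers (for example YIN-style methods) applied to overlapping frames.

```python
import math


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    Hypothetical helper: the name and default search range are assumptions.
    Returns 0.0 for silent frames or when no positive peak is found.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0
    # Only consider lags whose implied pitch lies within [fmin, fmax].
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalized by frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


sr = 16000
tone = [math.sin(2 * math.pi * 200.0 * t / sr) for t in range(1024)]
print(estimate_f0(tone, sr))  # prints a value close to 200.0
```

Per-frame F0 values like this, together with MFCCs, are often summarized per utterance (mean, range, variance) before being passed to a classifier.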

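Q. Can it be used for TTS (text-to-speech)?

A. Yes. It can be used to fine-tune models such as VITS and StyleTTS. The varied prosody across 9 emotions × 2 intensities helps build speech generation engines for highly expressive AI characters and virtual assistants.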

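Q. What about multimodal AI development?

A. By combining audio, the scripted text, and speaker attributes (age group and gender), the data can serve as training and evaluation data for multimodal models that estimate the emotions of VTubers or avatars.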

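Q. What about emotion analysis for contact centers?

A. Audio labeled with the 9 emotions, including anger and impatience, can serve as training data for real-time emotion detection models, which can then be combined with Google STT or Amazon Transcribe to build escalation decision systems.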

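Q. Are custom recordings or additional data available?

A. Yes. We support custom data collection tailored to your development requirements, including emotion types, intensities, age groups, and recording conditions.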

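■ Use Case Examples for the "Emotionally Expressive Speech Dataset"

▷ Training and Validating SER (Speech Emotion Recognition) Models

Train and evaluate F0- and MFCC-based emotion estimation models on audio labeled with 9 emotions × 2 intensities. The fixed-script design enables controlled cross-speaker experiments and benchmark measurements.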
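Cross-speaker experiments are usually run with a speaker-independent split, so that a model is always tested on voices it never heard during training. The sketch below assumes a hypothetical per-file metadata record (`Clip`) with speaker, emotion, and intensity fields; the dataset's actual file naming and metadata layout are not specified in this release, and `speaker_independent_split` is an illustrative helper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical metadata for one wav file (field names are assumptions)."""
    path: str
    speaker_id: str
    emotion: str
    intensity: str  # assumed values: "normal" or "strong"


def speaker_independent_split(clips, test_ratio=0.2, seed=0):
    """Split clips so that no speaker appears in both train and test."""
    speakers = sorted({c.speaker_id for c in clips})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_ratio))
    test_speakers = set(speakers[:n_test])
    train = [c for c in clips if c.speaker_id not in test_speakers]
    test = [c for c in clips if c.speaker_id in test_speakers]
    return train, test
```

Because every speaker reads the same two lines, differences between train and test scores under this split mostly reflect how well the model generalizes across voices rather than across text.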

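▷ Fine-Tuning Expressive TTS and Conversational AI

Use the varied prosody of 9 emotions × 2 intensities to fine-tune VITS or StyleTTS, and build AI characters and virtual assistants that reproduce speaking styles matched to emotional intensity.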
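One simple way to expose the emotion × intensity labels to a multi-speaker TTS recipe is to treat each (emotion, intensity) pair as its own style ID. This sketch assumes a pipe-separated `path|id|text` filelist layout similar to the multi-speaker filelists used by the public VITS reference code; check what your training scripts actually expect. The English label names, the ID scheme, and `filelist_line` are hypothetical.

```python
# Assumed English label names for the 9 emotions and 2 intensities.
EMOTIONS = ["neutral", "calm", "joy", "sadness", "anger",
            "fear", "disgust", "surprise", "impatience"]
INTENSITIES = ["normal", "strong"]

# Hypothetical mapping: each (emotion, intensity) pair gets a unique integer ID.
STYLE_IDS = {
    (emo, inten): i * len(INTENSITIES) + j
    for i, emo in enumerate(EMOTIONS)
    for j, inten in enumerate(INTENSITIES)
}


def filelist_line(path, emotion, intensity, text):
    """Format one 'path|style_id|text' line (the layout is an assumption)."""
    style_id = STYLE_IDS[(emotion, intensity)]
    return f"{path}|{style_id}|{text}"


print(filelist_line("wavs/example.wav", "anger", "strong",
                    "テーブルに携帯電話が置かれている"))
```

Folding emotion and intensity into a single ID discards speaker identity; a fuller setup would likely condition on speaker and style separately.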

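▷ Developing Multimodal AI and Emotion-Estimation Models

Combine three elements (audio, text, and speaker attributes such as age group and gender) as training and evaluation data for multimodal emotion-estimation models aimed at VTubers and avatars.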
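Combining audio, text, and speaker attributes can be as simple as early fusion: concatenating per-modality feature vectors before a classifier. In this minimal sketch the audio vector is assumed to come from an upstream extractor (for example averaged MFCCs), the age bands and gender categories are assumed encodings of the speaker attributes, and `fuse_features` is a hypothetical helper.

```python
# Assumed encodings for the fixed script lines and speaker attributes.
UTTERANCES = ["ラジオからこんなニュースが流れてきた", "テーブルに携帯電話が置かれている"]
AGE_BANDS = ["10s", "20s", "30s", "40s", "50s"]
GENDERS = ["female", "male"]


def one_hot(value, vocab):
    """Return a one-hot vector marking the position of value in vocab."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(value)] = 1.0
    return vec


def fuse_features(audio_features, utterance, age_band, gender):
    """Early fusion: concatenate audio, text, and speaker-attribute vectors."""
    return (list(audio_features)
            + one_hot(utterance, UTTERANCES)
            + one_hot(age_band, AGE_BANDS)
            + one_hot(gender, GENDERS))


print(fuse_features([0.5, 1.5], UTTERANCES[1], "30s", "male"))
```

Because every clip uses one of two fixed lines, the text part reduces to a script ID here; with free-form speech you would replace the one-hot vector with a sentence embedding.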

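▷ Real-Time Emotion Analysis for Contact Centers

Use audio labeled with emotions such as anger and impatience as training data to build a real-time, SER-based emotion detection system, and combine it with Google STT or Amazon Transcribe for automated escalation decisions.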
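On the decision side, an escalation rule can sit on top of a SER model's per-segment outputs. This sketch assumes the model emits a dict of emotion probabilities for each short segment of a call; the window size, the 0.6 threshold, and the choice of summing anger and impatience scores are illustrative assumptions, and `EscalationMonitor` is a hypothetical class. Transcription with Google STT or Amazon Transcribe is left out, since only the rule itself is shown.

```python
from collections import deque


class EscalationMonitor:
    """Flag a call when the recent average negative-emotion score is high.

    Hypothetical rule: window, threshold, and emotion set are assumptions.
    """

    def __init__(self, window=5, threshold=0.6, emotions=("anger", "impatience")):
        self.scores = deque(maxlen=window)  # keeps only the latest segments
        self.threshold = threshold
        self.emotions = emotions

    def update(self, probs):
        """probs: per-segment emotion probabilities from a SER model."""
        score = sum(probs.get(e, 0.0) for e in self.emotions)
        self.scores.append(score)
        full = len(self.scores) == self.scores.maxlen
        # Only decide once the window is full, to avoid reacting to one segment.
        return full and sum(self.scores) / len(self.scores) >= self.threshold


monitor = EscalationMonitor(window=3, threshold=0.6)
for segment in [{"anger": 0.2}, {"anger": 0.7}, {"anger": 0.6, "impatience": 0.3}]:
    print(monitor.update(segment))
```

Averaging over a window trades a little latency for fewer false alarms from a single loud or misclassified segment.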

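▷ Model Evaluation and Benchmarking

The two-level intensity labels support multifaceted benchmarking, including intensity-stratified accuracy evaluation of emotion classifiers, A/B testing, and WER measurement.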
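Per-intensity accuracy is a straightforward way to use the two-level labels for evaluation: group predictions by intensity and score each group separately. The record format (intensity, true label, predicted label) and the helper name `accuracy_by_intensity` are assumptions for this sketch.

```python
from collections import defaultdict


def accuracy_by_intensity(records):
    """records: iterable of (intensity, true_emotion, predicted_emotion)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for intensity, y_true, y_pred in records:
        total[intensity] += 1
        correct[intensity] += int(y_true == y_pred)
    return {k: correct[k] / total[k] for k in total}


preds = [("normal", "joy", "joy"), ("normal", "anger", "calm"),
         ("strong", "anger", "anger"), ("strong", "fear", "fear")]
print(accuracy_by_intensity(preds))  # {'normal': 0.5, 'strong': 1.0}
```

A large gap between the "normal" and "strong" scores may suggest that a classifier relies on exaggerated cues more than on subtler prosodic differences.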

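Features of the Datasets Provided by Qlean Dataset

About Qlean Dataset

Qlean Dataset is a rights-cleared, commercially usable AI training data solution provided by amanaimages Inc., a subsidiary of Visual Bank. It covers audio, image, video, 3D, text, and other formats, giving AI developers, including foundation model developers, an environment in which to source and use high-quality data without legal risk. Through partnerships with data holders in Japan and abroad and with media such as radio stations, newspapers, and news agencies, it continually adds to "AI Data Recipe," its lineup of industry-specific, trend-driven datasets. Existing data can be delivered in as few as 2 business days, and custom recording and collection are also available.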

Qlean Dataset website: https://qleandataset.visual-bank.co.jp/

AI Data Recipe: https://qleandataset.visual-bank.co.jp/lineup

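Visual Bank, Inc.

A startup that builds and provides next-generation data infrastructure to maximize AI development capabilities, Visual Bank pursues its mission to "unlock the potential of all data." It operates THE PEN, an AI assistance tool that supports manga artists who want to "draw more!", and owns 100% of amanaimages Inc., which provides the AI training dataset development service Qlean Dataset.

Visual Bank has also been selected for GENIAC, a national research and development program, and is accelerating its efforts toward real-world deployment.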

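Representative Director and CEO: Saneyuki Nagai

Address: C-Cube Minami-Aoyama Building 6F, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062, Japan

Visual Bank corporate website: https://visual-bank.co.jp/

amanaimages corporate website: https://amanaimages.com/about/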

Qlean Dataset Launches Japanese Emotional Speech Dataset: 9 Emotions × 2 Intensity Levels for SER, TTS, and Multimodal AI

~ 6,800-file, commercially licensed corpus with fine-grained emotion and intensity labels. Applicable to SER model training, expressive TTS fine-tuning, and multimodal emotion recognition. Released by Visual Bank, a GENIAC-selected AI company. ~

Visual Bank, Inc. (Minato, Tokyo; CEO: Saneyuki Nagai), through its subsidiary amanaimages Inc., has released the Emotionally Expressive Japanese Speech Dataset under its AI training data solution Qlean Dataset.

■ What Is a Japanese Emotional Speech Dataset?

A Japanese emotional speech dataset is a speech corpus with fine-grained emotion labels (9 emotion categories × 2 intensity levels) recorded by native Japanese speakers. It can be used to train and evaluate Speech Emotion Recognition (SER) models, to develop expressive TTS, and for multimodal emotion recognition tasks. Because all speakers read the same fixed lines, the corpus supports controlled cross-speaker experiments and intensity-level benchmarking that are difficult to run on free-speech corpora.

■ Dataset Specifications

100 native Japanese speakers (male and female, ages 10s–50s) recorded two fixed utterances in 9 emotion categories at two intensity levels (normal and strong). The corpus contains 6,800 wav files with fine-grained emotion and intensity labels.

Data Type:	Audio (single-speaker, emotion-labeled utterances)
Speakers:	100 native Japanese speakers (male and female, ages 10s–50s)
Emotions:	9 categories (Neutral, Calm, Joy, Sadness, Anger, Fear, Disgust, Surprise, Impatience)
Intensity:	2 levels (Normal / Strong)
Utterances:	「ラジオからこんなニュースが流れてきた」 ("This news came on over the radio"), 「テーブルに携帯電話が置かれている」 ("A mobile phone has been placed on the table")
Files:	6,800
Format:	wav
License:	Commercially licensed

Sample data & full details: https://qleandataset.visual-bank.co.jp/en/lineup/ds-001

■ FAQ

Q: How can this dataset be used for SER development?

A: Train and evaluate 9-class × 2-intensity emotion classifiers using F0, MFCC, and spectral features. Fixed utterances enable controlled cross-speaker experiments and intensity-level accuracy analysis.

Q: Can this data be used for TTS fine-tuning?

A: Yes. Fine-tune VITS or StyleTTS on 9-emotion × 2-intensity prosody for intensity-aware expressive voice synthesis in AI characters and virtual assistants.

Q: How does this dataset support multimodal AI?

A: Combine audio, utterance text (the fixed script), and speaker attributes (age, gender) to train multimodal emotion recognition models for VTubers or conversational agents.

Q: Is this dataset applicable to contact center sentiment analysis?

A: Yes. Use anger/impatience-labeled audio as ground truth for real-time SER models integrated with Google STT or Amazon Transcribe for escalation detection.

Q: Is custom recording available?

A: Yes. Additional emotions, intensity levels, age groups, or recording conditions are available on request.

■ Use Cases

▷ SER Model Training & Intensity-Level Benchmarking

Train and evaluate 9-class emotion classifiers using F0, MFCC, and spectral features. Fixed utterances enable controlled cross-speaker comparison and intensity-stratified accuracy evaluation.

▷ Expressive TTS & Conversational AI Fine-Tuning

Fine-tune VITS/StyleTTS on 9-emotion × 2-intensity prosody to build voice synthesis engines capable of intensity-aware emotional expression for AI characters or virtual assistants.

▷ Multimodal Emotion Recognition Development

Combine audio, utterance text, and speaker attributes to train multimodal models for VTuber/avatar emotion inference or affective computing research.

▷ Contact Center Real-Time Sentiment Analysis

Use anger/impatience-labeled audio as ground-truth training data for SER-based detection, and combine the SER model with Google STT or Amazon Transcribe to drive real-time escalation alerts.

▷ Model Evaluation & Benchmarking

9-emotion × 2-intensity labeled audio enables multi-class emotion classifier evaluation, intensity-level A/B testing, and WER measurement across emotional conditions.
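WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. A minimal sketch follows; `word_error_rate` is an illustrative helper. Japanese is written without spaces, so the token lists in the example stand in for the output of a morphological analyzer (the segmentation shown is illustrative); passing character lists instead yields a character error rate (CER), which is often reported for Japanese.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a row-by-row Levenshtein distance over token lists."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


ref = ["テーブル", "に", "携帯電話", "が", "置かれている"]
hyp = ["テーブル", "に", "携帯", "が", "置かれている"]
print(word_error_rate(ref, hyp))  # 0.2
```

Comparing WER across emotion and intensity labels shows whether an ASR system degrades on strongly emotional speech.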

About Qlean Dataset

Qlean Dataset is a commercially licensed AI training data solution provided by amanaimages Inc., a wholly owned subsidiary of Visual Bank. All datasets are rights-cleared for commercial use, giving AI developers a legally secure environment to source and deploy high-quality training data. The platform covers audio, image, video, 3D, and text modalities, serving foundation model developers and applied AI teams alike. Through partnerships with domestic and international data holders, broadcasters, newspapers, and newswire agencies, Qlean Dataset continuously expands its AI Data Recipe lineup. Existing datasets ship in as few as 2 business days; custom recording and data collection are also available.

Qlean Dataset website: https://qleandataset.visual-bank.co.jp/en

AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup

About Visual Bank Inc.

Visual Bank Group is a technology company developing data infrastructure and AI solutions that support advanced AI development. The company operates THE PEN, an AI tool for manga creators, and its subsidiary, amanaimages Inc., provides commercial digital content and AI training data solutions, including Qlean Dataset. Visual Bank is also a selected participant in GENIAC, a Japanese government initiative supporting the advancement of next-generation AI technologies.

CEO: Saneyuki Nagai
Website: https://visual-bank.co.jp/en
