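Qlean Dataset Begins Offering the “Japanese Children’s Conversational Speech Corpus” Dataset

Visual Bank, a company selected for GENIAC, supports the development of child speech understanding, educational AI, and dialogue models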

Visual Bank Inc.

October 28, 2025, 13:00

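Visual Bank Inc. (Minato-ku, Tokyo; Representative Director and CEO: Saneyuki Nagai; hereinafter “Visual Bank”), through its subsidiary Amana Images Inc., is advancing the provision of “Qlean Dataset (キュリンデータセット)*”, an AI training data solution that supports all kinds of research and commercial AI development.
The company has now added the “Japanese Children’s Conversational Speech Corpus” dataset to the lineup, further expanding “AI Data Recipe*”, its independently built lineup of data for AI development.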

*Qlean Dataset (キュリンデータセット): https://qleandataset.visual-bank.co.jp/

*AI Data Recipe: https://qleandataset.visual-bank.co.jp/lineup

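About “AI Data Recipe” in “Qlean Dataset (キュリンデータセット)”

“AI Data Recipe” is the lineup of original, commercially usable data within “Qlean Dataset”.

Its distinguishing feature is that ready-to-use data materials can be flexibly combined according to use case, accuracy requirements, and delivery schedule. Data is available both partially annotated and unannotated, and the composition can be modified or extended to meet individual requirements.

The lineup is also being expanded through partnerships with Chiba Lotte Marines and Toyo Keizai Inc., through domestic and international networks, and through new recordings.

This significantly reduces the burden of collecting and preparing data in AI development and helps accelerate development.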

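Overview of the newly released “Japanese Children’s Conversational Speech Corpus” dataset

Data type: Audio
Subject attributes: Japanese children
Data format: Audio data: WAV
Notes

[Recording length] Approx. 20 minutes per audio file

[Target scenes] Everyday conversation, etc.
Sample details URL: https://qleandataset.visual-bank.co.jp/lineup/pn-030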

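Example use cases for the “Japanese Children’s Conversational Speech Corpus” dataset

Improving the accuracy of child speech recognition AI

The corpus records natural everyday conversations between Japanese-speaking children and covers features characteristic of young speakers, such as variation in pronunciation and changes in sentence endings.
This Japanese speech data can be used to improve the accuracy of age-specific automatic speech recognition (ASR) models and Japanese voice assistants for children.

Research and development of educational and developmental support AI

The corpus captures the utterance structure of everyday Japanese conversation, enabling quantitative analysis of language comprehension and response tendencies at each developmental stage.
It is well suited to research and development of education-related models that involve Japanese speech understanding, such as educational AI, read-aloud AI, and child developmental support AI.

Development of conversational assistants and educational robots for children

Using a speech corpus that includes the natural conversational tempo and intonation of children talking with each other makes it possible to develop Japanese dialogue AI and educational robots that offer conversational experiences children find approachable.
It is useful for developing voice UX that reproduces natural exchanges.

Training data for speech emotion recognition and empathetic response AI

Japanese conversational speech data containing emotional expressions characteristic of children, such as laughter, voice pitch, and pauses (ma), can be used to train speech emotion recognition AI and empathetic-response conversational AI.
It supports the design of “psychologically gentle dialogue experiences” in educational settings and home AI.

Training material for Japanese conversation models and speech LLMs

Because the corpus is rich in childlike Japanese grammar, vocabulary, and utterance structure, it is useful as tuning data for Japanese conversation models and multimodal speech LLMs.
It contributes to more natural dialogue in child-facing chatbots and speech generation models.

Applications in language development and sociolinguistics research (academic use)

As a corpus that systematically records Japanese conversations between children, it provides a valuable foundation for research in language development and sociolinguistics.
It is expected to be used in education and research fields that analyze vocabulary diversity by age and developmental changes in conversational structure.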

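Features of the datasets provided by “Qlean Dataset”

Supports research and development and commercial use

For the datasets provided by Qlean Dataset, consent forms covering data acquisition and use in AI development have been obtained from “all subjects”, and the datasets also comply with the privacy policies and related rules of each country, so they can be used for research and commercial purposes with confidence.

Speedy delivery and maximized ROI, because datasets are provided from “AI Data Recipe”

The AI Data Recipe delivery model, which is unique to Qlean Dataset, makes it possible to procure data while keeping initial investment low.

Datasets not in the “AI Data Recipe” lineup can be created and built to individual requirements

Even for highly unique data, the capabilities of “Qlean Dataset” can be used to provide datasets individually optimized to your requirements.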

Qlean Dataset contact form: https://qleandataset.visual-bank.co.jp/contact

Qlean Dataset service site URL: https://qleandataset.visual-bank.co.jp/

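Seeking data partners to support AI development together

To strengthen its diverse data provision framework supporting AI development, Visual Bank is expanding data partnerships in areas including audio, images, video, and 3D.

Through collaboration with trusted partners, Qlean Dataset aims to achieve both intellectual property protection suited to the AI era and maximization of the value of data.

Together with research institutions, companies, and creators, we will build an environment in which data can be used with confidence.

Qlean Dataset partner details URL: https://qleandataset.visual-bank.co.jp/partner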

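Visual Bank Inc.

As a startup that builds and provides next-generation data infrastructure to maximize AI development capability, Visual Bank conducts its business under the mission of “unleashing the potential of all data.” In addition to “THE PEN”, which provides an AI assistance tool that supports manga artists’ desire to “draw more!”, it wholly owns Amana Images Inc., which provides the AI training dataset development service “Qlean Dataset (キュリンデータセット)”.

Visual Bank has also been selected for the national research and development program “GENIAC” and is accelerating its efforts toward real-world implementation.

Representative Director and CEO: Saneyuki Nagai

Address: C-Cube Minami Aoyama Bldg. 6F, 7-1-7 Minami Aoyama, Minato-ku, Tokyo 107-0062

Visual Bank corporate URL: https://visual-bank.co.jp/

Amana Images corporate URL: https://amanaimages.com/about/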

【Translation】

New “Children’s Japanese Speech Corpus” Joins Qlean Dataset Lineup

Expanding AI Data Recipes for Speech Recognition, Emotion Analysis, and Educational AI

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai) offers the AI training data solution “Qlean Dataset” through its subsidiary Amana Images Inc. Designed to support both research and commercial AI development, Qlean Dataset offers a diverse range of original datasets, collectively called “AI Data Recipe,” that enable flexible, scalable, and rights-cleared data sourcing.

The new addition, “Japanese Children’s Conversational Speech Corpus,” further expands the AI Data Recipe lineup with data specifically tailored for speech recognition, language development, and educational AI applications.
▶ AI Data Recipes: https://qleandataset.visual-bank.co.jp/en/lineup

▶ Learn more about Qlean Dataset: https://qleandataset.visual-bank.co.jp/en

About “AI Data Recipe” in Qlean Dataset

“AI Data Recipe” is a lineup of original, commercially usable datasets provided under Qlean Dataset.
The datasets can be flexibly combined according to project goals, accuracy requirements, and delivery schedules, and are available both with and without annotation.

The lineup continues to expand through partnerships with organizations such as Chiba Lotte Marines and Toyo Keizai Inc., along with newly recorded materials and international collaborations. This structure greatly reduces the burden of data preparation in AI development while accelerating project execution.

Overview of “Japanese Children’s Conversational Speech Corpus”

Data type: Audio (WAV format)
Subjects: Japanese children
Recording length: Approx. 20 minutes per audio file
Scene type: Everyday conversations among children
Sampling rate / bit depth: 48 kHz / 16-bit
Sample details: https://qleandataset.visual-bank.co.jp/lineup/pn-030
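
As a rough illustration of how the format figures in the overview might be checked when audio files are received, the following Python sketch reads a WAV header with the standard-library wave module and compares it with the stated 48 kHz / 16-bit values. The function names, the synthetic silent test file, and the mono channel count used in the size estimate are assumptions made for illustration; the overview does not state the channel count or any file naming.

```python
import io
import wave

EXPECTED_RATE_HZ = 48_000  # "48kHz" in the overview
EXPECTED_BITS = 16         # "16-bit" in the overview


def describe_wav(source):
    """Read basic header fields from a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def matches_overview(info):
    """True if the sampling rate and bit depth match the overview figures."""
    return (
        info["sample_rate_hz"] == EXPECTED_RATE_HZ
        and info["bits_per_sample"] == EXPECTED_BITS
    )


def pcm_bytes(seconds, channels=1):
    """Uncompressed PCM payload size; channels=1 (mono) is an assumption."""
    return EXPECTED_RATE_HZ * (EXPECTED_BITS // 8) * channels * seconds


def make_silent_wav(seconds=1, channels=1):
    """Build a silent in-memory WAV so the sketch runs without real data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(EXPECTED_BITS // 8)
        wav.setframerate(EXPECTED_RATE_HZ)
        wav.writeframes(b"\x00\x00" * EXPECTED_RATE_HZ * seconds * channels)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    info = describe_wav(make_silent_wav(seconds=1))
    print(info, matches_overview(info))
    print(pcm_bytes(20 * 60), "bytes for 20 minutes of mono audio")
```

Under the mono assumption, a 20-minute recording comes to about 115 MB of uncompressed PCM; a stereo file would be twice that.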

Use Cases of the Dataset

Improving ASR for Child Speech
This corpus captures natural daily conversations among Japanese-speaking children, including pronunciation variations and age-specific phonetic traits. It is well suited to training Automatic Speech Recognition (ASR) models or voice assistants targeting young users.
Research on Educational and Developmental AI
The dataset enables quantitative analysis of linguistic comprehension and response tendencies by age, supporting models for educational AI, reading assistants, and developmental support AI.
Conversational AI and Educational Robots
By leveraging children’s natural tempo and intonation, developers can build Japanese dialogue AI and educational robots that deliver more engaging and child-friendly conversational experiences.
Emotion and Empathy AI Training
Containing laughter, pitch variations, and pauses unique to children’s emotional expression, the corpus is suitable for training emotion recognition or empathetic response AI systems, which are useful in educational and household AI environments.
Japanese Speech and Multimodal LLM Training
Rich in child-specific grammar, vocabulary, and conversational patterns, this dataset can be used for tuning Japanese dialogue models and speech-based LLMs.
Linguistic and Sociolinguistic Research
For academic use, the corpus provides a valuable foundation for studying vocabulary diversity, grammar development, and conversational patterns in child language acquisition.

Features of Qlean Dataset

Research and commercial use supported:
All data subjects have provided explicit consent for data collection and AI use, ensuring compliance with global privacy standards.
Speed and ROI through modular “AI Data Recipe”:
The unique structure of AI Data Recipe enables fast, cost-efficient data acquisition and integration.
Custom datasets available:
Qlean Dataset can create tailored datasets to meet specific requirements, leveraging its full data production and annotation capabilities.

Contact form: https://qleandataset.visual-bank.co.jp/en/contact
Service site: https://qleandataset.visual-bank.co.jp/en/

About Visual Bank Inc.

Visual Bank Inc. is a next-generation data infrastructure company committed to “unleashing the potential of all data.”
The company operates THE PEN, an AI-powered assistance tool for manga artists, and wholly owns Amana Images Inc., which provides the AI training data service Qlean Dataset.

Visual Bank has also been selected for the national R&D program “GENIAC” and continues to advance initiatives toward real-world AI implementation.

CEO: Saneyuki Nagai
Address: C-Cube Minami Aoyama Bldg. 6F, 7-1-7 Minami Aoyama, Minato-ku, Tokyo 107-0062
Corporate website: https://visual-bank.co.jp/en/
Amana Images overview: https://qleandataset.visual-bank.co.jp/en/company-overview
