6. LangChain4j + Multimodal Visual Understanding: A Detailed Guide

阙忆然 posted on 2025-9-4 18:22:47


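Contents

[*]LangChain4j + Multimodal Visual Understanding: A Detailed Guide

[*]Image understanding with LangChain4j

[*]LangChain4j multimodal in practice

[*]Image understanding with LangChain4j, which supports vision-language multimodal tasks
[*]Image generation with Alibaba Tongyi Wanxiang (text-to-image)

[*]Finally: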

Image understanding with LangChain4j

Multimodality, visual understanding:

[*]https://docs.langchain4j.dev/tutorials/chat-and-language-models#multimodality

[*]TextContent: sends text to the model
[*]ImageContent: sends an image to the model
[*]AudioContent: sends audio to the model
[*]VideoContent: sends video to the model
[*]PdfFileContent: sends a PDF file to the model


LangChain4j multimodal in practice

First, the preparation: on Alibaba Cloud Bailian, switch the model to qwen-vl-max, which supports image input.
Alibaba Cloud Bailian console: https://bailian.console.aliyun.com/

[*]https://help.aliyun.com/zh/model-studio/models#850732b1aabs0

[*]https://help.aliyun.com/zh/model-studio/models#3f1f1c8913fvo

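Image understanding with LangChain4j, which supports vision-language multimodal tasks

In other words, we ask the model to interpret the information in an image.
Create the module project:

[*]Add the dependencies to pom.xml.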

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.rainbowsea</groupId>
        <artifactId>langchain4j-studys</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>

    <artifactId>langchain4j-04chat-image</artifactId>
    <packaging>jar</packaging>

    <name>langchain4j-04chat-image</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- Dependencies without a <version> take it from the parent POM -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- OpenAI-compatible client, used to call the DashScope compatible-mode endpoint -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-open-ai</artifactId>
        </dependency>

        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j</artifactId>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>cn.hutool</groupId>
            <artifactId>hutool-all</artifactId>
            <version>5.8.22</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
[*]Write the configuration class that holds the three basic model settings (API key, model name, base URL).

package com.rainbowsea.langchain4j04chatimage.config;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * https://docs.langchain4j.dev/tutorials/chat-and-language-models/#image-content
 */
@Configuration
public class LLMConfig {

    @Bean
    public ChatModel imageModel() {
        return OpenAiChatModel.builder()
                // The API key is read from the aliQwen_api environment variable.
                .apiKey(System.getenv("aliQwen_api"))
                // qwen-vl-max is a multimodal model that accepts combined image and
                // text input, which suits vision-language tasks.
                .modelName("qwen-vl-max")
                // DashScope's OpenAI-compatible endpoint.
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
    }
}
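The configuration class reads the key with `System.getenv("aliQwen_api")`, which returns null when the variable is not set, so a missing key only shows up later as a failed request. The following is a minimal sketch of a fail-fast lookup using only the JDK. The class name `RequiredEnv` and its methods are hypothetical helpers and are not part of LangChain4j or Spring.

```java
import java.util.function.Function;

/** Hypothetical helper: fail fast when a required setting is missing. */
final class RequiredEnv {

    private RequiredEnv() {}

    /** Looks the variable up in the real process environment. */
    static String get(String name) {
        return get(name, System::getenv);
    }

    /**
     * Same check with an injectable lookup, so the logic can be exercised
     * without changing the real environment.
     */
    static String get(String name, Function<String, String> lookup) {
        String value = lookup.apply(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(
                    "Environment variable '" + name + "' is not set");
        }
        // Trim stray whitespace that often sneaks in when exporting keys.
        return value.trim();
    }
}
```

With this in place, the builder call could read `.apiKey(RequiredEnv.get("aliQwen_api"))`, so the application fails at startup with a clear message instead of on the first request.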
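[*]Write the business class:
1) Place mi.jpg under src/main/resources/static/images/, the location the controller loads as classpath:static/images/mi.jpg. This is the image the model will read.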

[*]https://docs.langchain4j.dev/tutorials/chat-and-language-models/#multimodality

package com.rainbowsea.langchain4j04chatimage.controller;

import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.Base64;

/**
 * Sends an image together with a text prompt to the vision model.
 */
@RestController
@Slf4j
public class ImageModelController {

    @Autowired
    private ChatModel chatModel;

    // An image is a resource too, so it can be injected with @Value.
    // "classpath:" refers to the root of the resources directory.
    @Value("classpath:static/images/mi.jpg")
    private Resource resource; // org.springframework.core.io.Resource

    /**
     * Encodes the image as a Base64 string and sends it to the model
     * as an ImageContent alongside a TextContent.
     * Test URL: http://localhost:9004/image/call
     */
    @GetMapping(value = "/image/call")
    public String readImageContent() throws IOException {

        // The request body is text, so the raw image cannot be sent as-is:
        // read it into a byte[] and Base64-encode the bytes into a string.
        byte[] byteArray = resource.getContentAsByteArray();
        String base64Data = Base64.getEncoder().encodeToString(byteArray);

        // Wrap everything sent to the model in a single UserMessage.
        UserMessage userMessage = UserMessage.from(
                // Prompt: "From the image below, extract the source website name,
                // the stock price trend, and the stock price on May 30."
                TextContent.from("从以下图片中获取来源网站名称，股价走势和5月30号股价"),
                // The MIME type tells the model what kind of file it is reading;
                // image/jpeg is the registered type for .jpg files.
                ImageContent.from(base64Data, "image/jpeg")
        );

        ChatResponse chatResponse = chatModel.chat(userMessage);
        String result = chatResponse.aiMessage().text();

        log.info("Model reply: {}", result);

        return result;
    }
}

Run the test:
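The Base64 step in `readImageContent` can be pulled out into a small JDK-only helper, which also makes it easy to choose the MIME type from the file name instead of hard-coding it. This is only a sketch: `ImagePayloads` and its methods are hypothetical names, and the list of extensions is an assumption rather than a statement of what qwen-vl-max accepts. The data URL form is included because OpenAI-style APIs commonly carry inline images that way.

```java
import java.util.Base64;
import java.util.Locale;

/** Hypothetical helper for preparing image bytes for a multimodal request. */
final class ImagePayloads {

    private ImagePayloads() {}

    /** Picks a MIME type from the file extension (case-insensitive). */
    static String mimeTypeFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) {
            return "image/jpeg"; // the registered type for JPEG files
        }
        if (lower.endsWith(".png")) {
            return "image/png";
        }
        if (lower.endsWith(".webp")) {
            return "image/webp";
        }
        if (lower.endsWith(".gif")) {
            return "image/gif";
        }
        throw new IllegalArgumentException("Unsupported image type: " + fileName);
    }

    /** Standard (non-URL-safe) Base64, the same encoder the controller uses. */
    static String toBase64(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    /** Builds a data URL of the form data:<mime>;base64,<payload>. */
    static String toDataUrl(byte[] imageBytes, String mimeType) {
        return "data:" + mimeType + ";base64," + toBase64(imageBytes);
    }
}
```

In the controller this would be used as `ImageContent.from(ImagePayloads.toBase64(byteArray), ImagePayloads.mimeTypeFor(resource.getFilename()))`, assuming the resource has a file name.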

Image generation with Alibaba Tongyi Wanxiang (text-to-image)

[*]https://docs.langchain4j.dev/integrations/language-models/

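LangChain4j brings in third-party platforms alongside its own integrations:
Note: DashScope here is the Alibaba Cloud service that serves the Qwen (Tongyi Qianwen) models.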

[*]https://docs.langchain4j.dev/integrations/language-models/dashscope

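Following the official documentation, add the Maven configuration:

[*]Import the Maven dependency for the model.
To keep things consistent, this configuration goes in the top-level pom.xml.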


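<!-- Top-level pom.xml: inside <properties> -->
<langchain4j-community.version>1.0.1-beta6</langchain4j-community.version>

<!-- Top-level pom.xml: inside <dependencyManagement><dependencies> -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-bom</artifactId>
    <version>${langchain4j-community.version}</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>

Alibaba Tongyi Wanxiang WanxImageModel: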

[*]https://docs.langchain4j.dev/integrations/language-models/dashscope#configurable-parameters

Switch to the Tongyi Wanxiang text-to-image model wanx2.1-t2i-turbo, which generates an image from a single sentence.

[*]https://help.aliyun.com/zh/model-studio/text-to-image

[*]In the child module's pom.xml, add the dependency jar for Tongyi text-to-image.
The version is inherited from the top-level pom.xml.
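The dependency itself is not shown in this step. A likely form is sketched here, assuming the artifact is `langchain4j-community-dashscope`, the community module that provides the `dev.langchain4j.community.model.dashscope` package used by the configuration class. Treat the artifactId as an assumption and confirm it against the DashScope integration page linked above.

```xml
<!-- Child module pom.xml, inside <dependencies>.
     Assumed artifact; no <version> because the community BOM
     imported in the top-level pom.xml manages it. -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-dashscope</artifactId>
</dependency>
```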

[*]In the configuration class, switch the model to "wanx2.1-t2i-turbo", a model that supports text-to-image.

package com.rainbowsea.langchain4j04chatimage.config;

import dev.langchain4j.community.model.dashscope.WanxImageModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * https://docs.langchain4j.dev/tutorials/chat-and-language-models/#image-content
 *
 * The ChatModel bean from the previous step is omitted here. Keep it, together
 * with its imports, if ImageModelController remains in the same module.
 */
@Configuration
public class LLMConfig {

    /**
     * Uses Tongyi Wanxiang to generate images.
     * Reference: https://help.aliyun.com/zh/model-studio/text-to-image
     * Author: zzyybs@126.com
     */
    @Bean
    public WanxImageModel wanxImageModel() {
        return WanxImageModel.builder()
                .apiKey(System.getenv("aliQwen_api"))
                // Text-to-image model, see https://help.aliyun.com/zh/model-studio/text-to-image
                .modelName("wanx2.1-t2i-turbo")
                .build();
    }
}
[*]Write the controller methods for text-to-image:

package com.rainbowsea.langchain4j04chatimage.controller;

import com.alibaba.dashscope.aigc.imagesynthesis.ImageSynthesis;
import com.alibaba.dashscope.aigc.imagesynthesis.ImageSynthesisParam;
import com.alibaba.dashscope.aigc.imagesynthesis.ImageSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import dev.langchain4j.community.model.dashscope.WanxImageModel;
import dev.langchain4j.data.image.Image;
import dev.langchain4j.model.output.Response;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

/**
 * Text-to-image endpoints: one through LangChain4j's WanxImageModel,
 * one through the DashScope SDK directly.
 */
@RestController
@Slf4j
public class WanxImageModelController {

    @Autowired
    private WanxImageModel wanxImageModel;

    // http://localhost:9006/image/create2
    @GetMapping(value = "/image/create2")
    public String createImageContent2() {
        // Prompt: "little rabbit"
        Response<Image> imageResponse = wanxImageModel.generate("小兔子");

        // The model returns a URL pointing at the generated image.
        String imageUrl = imageResponse.content().url().toString();
        log.info("Generated image URL: {}", imageUrl);

        return imageUrl;
    }

    // http://localhost:9006/image/create3
    @GetMapping(value = "/image/create3")
    public String createImageContent3() {

        // Prompt: "Close-up shot, an 18-year-old Chinese girl in ancient costume,
        // round face, looking straight at the camera, elegant ethnic clothing,
        // commercial photography, outdoors, cinematic lighting, half-body close-up,
        // delicate light makeup, sharp edges."
        String prompt = "近景镜头，18岁的中国女孩，古代服饰，圆脸，正面看着镜头，" +
                "民族优雅的服装，商业摄影，室外，电影级光照，半身特写，精致的淡妆，锐利的边缘。";
        ImageSynthesisParam param =
                ImageSynthesisParam.builder()
                        // Same environment variable as the configuration class
                        // (the original used "aliQwen-api" here, which does not match).
                        .apiKey(System.getenv("aliQwen_api"))
                        .model(ImageSynthesis.Models.WANX_V1)
                        .prompt(prompt)
                        .style("<watercolor>")
                        .n(1)
                        .size("1024*1024")
                        .build();

        ImageSynthesis imageSynthesis = new ImageSynthesis();
        ImageSynthesisResult result;

        try {
            log.info("---sync call, please wait a moment----");
            result = imageSynthesis.call(param);
        } catch (ApiException | NoApiKeyException e) {
            // Keep the original exception as the cause so the stack trace survives.
            throw new RuntimeException(e.getMessage(), e);
        }

        String json = JsonUtils.toJson(result);
        log.info(json);

        return json;
    }
}
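Both endpoints hand back a URL (or JSON containing one) rather than the image bytes. If the link is only valid for a limited time, which is an assumption here and worth checking in the DashScope documentation, it can be useful to download the file right away. Below is a minimal JDK-only sketch. `GeneratedImageSaver` is a hypothetical helper, and it assumes the URL is reachable with a plain unauthenticated GET.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that stores a generated image on local disk. */
final class GeneratedImageSaver {

    private GeneratedImageSaver() {}

    /** Derives a file name from the last URL path segment, ignoring the query string. */
    static String fileNameFor(URI imageUrl) {
        String path = imageUrl.getPath();
        String name = (path == null) ? "" : path.substring(path.lastIndexOf('/') + 1);
        // Fall back to a fixed name for empty or directory-like segments.
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return "generated-image";
        }
        return name;
    }

    /** Downloads the image into targetDir and returns the written file. */
    static Path save(URI imageUrl, Path targetDir) throws IOException, InterruptedException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileNameFor(imageUrl));

        HttpRequest request = HttpRequest.newBuilder(imageUrl).GET().build();
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(target));

        if (response.statusCode() != 200) {
            // Do not leave an error page behind under an image file name.
            Files.deleteIfExists(target);
            throw new IOException("Download failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

In `createImageContent2` this could be called as `GeneratedImageSaver.save(imageResponse.content().url(), Path.of("generated"))`, with the method declaring or handling the two checked exceptions.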
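[*]Run the test:

Finally:

"In this final section, I want to express my gratitude to every reader. Your attention and replies are what drive me to write, and I have drawn endless inspiration and courage from you. I will keep your encouragement close and carry on working hard in other areas. Thank you. We will meet again someday."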


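蝌棚煌 posted on 2025-11-12 13:42:14

Thanks for sharing, I'll study this.

岳娅纯 posted on 2025-12-4 07:43:55

Very good, very powerful. Reserving a spot here first, will edit later.

豌畔丛 posted on 2025-12-5 13:35:00

Thanks for sharing, much appreciated.

跟尴 posted on 2025-12-10 05:53:34

Thanks for sharing, I'll give it a try.

撙仿 posted 7 days ago

Thanks for publishing original work. 程序园 is better because of you.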
