Installing LLaMA-Factory Locally on Windows

周濡霈 posted on 2025-9-4 01:37:12

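LLaMA-Factory officially recommends specific versions of its dependencies. On Linux, use those recommended versions. On Windows, the builds that each component publishes are less complete than their Linux counterparts, so for compatibility and to save time (installing published wheel packages instead of compiling locally), this guide does not follow the official recommendations exactly.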

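The detailed steps for installing LLaMA-Factory locally on Windows follow.
1. Update the GPU driver (an NVIDIA GPU is recommended)

[*]Go to the NVIDIA driver download page.
[*]Select your GPU model and download the latest Game Ready Driver or Studio Driver.
[*]Run the installer, choose "Custom installation" and "Perform a clean installation", then restart the computer when it finishes.
Installing LLaMA-Factory on Windows requires the Windows builds of PyTorch, bitsandbytes, and FlashAttention.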
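2. Install the CUDA Toolkit

[*]Choose the CUDA version based on the PyTorch, bitsandbytes, and FlashAttention versions you plan to use. Versions can be incompatible with one another: each bitsandbytes release requires specific PyTorch and CUDA Toolkit versions, and each PyTorch release has its own CUDA Toolkit requirements, so do not install the latest CUDA blindly. (This guide uses CUDA 12.1: https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local)
[*]Go to the CUDA Toolkit download page and select the version that matches PyTorch (for example 12.1), the operating system (Windows), the architecture (x86_64), and the installer type (exe). (Older CUDA releases are available at https://developer.nvidia.com/cuda-toolkit-archive)
[*]Run the installer, choose the "Custom" installation, and leave all components selected (the default).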
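
After the installer finishes, it can help to confirm which toolkit release is actually on PATH before moving on. The following is a minimal sketch, assuming `nvcc` is on PATH and prints a line of the form "release X.Y" (the helper names `parse_nvcc_release` and `installed_cuda_release` are made up for this illustration):

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract the 'release X.Y' number from the text printed by `nvcc --version`."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run nvcc and return the toolkit release, or None if nvcc cannot be run."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # nvcc is missing from PATH or exited with an error
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(f"CUDA toolkit release found by nvcc: {installed_cuda_release()}")
```

If this prints None, the toolkit's bin directory is probably not on PATH in the current shell; opening a new terminal after installation is worth trying first.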

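3. Install Conda

Installing LLaMA-Factory pulls in a large number of Python packages and other components. Conda helps avoid problems caused by conflicting Python versions.

[*]Download Conda. Either the Distribution installer or the Miniconda installer works (Download Success | Anaconda).
[*]Initialize the environment variables: conda init
[*]Create a conda virtual environment that uses Python 3.10:
# Create the Python 3.10 environment
conda create -n llama-factory python=3.10

# Activate the environment
conda activate llama-factory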
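
Several later steps silently depend on the right environment being active, so a quick sanity check can save time. This is a small sketch, assuming conda exposes the active environment name through the `CONDA_DEFAULT_ENV` environment variable; the function names are hypothetical helpers, not part of conda:

```python
import os
import sys


def python_matches(expected=(3, 10), version_info=None) -> bool:
    """Return True if the interpreter's major.minor version equals `expected`."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == tuple(expected)


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print(f"Python is 3.10: {python_matches()}")
    print(f"Active conda environment: {active_conda_env()}")
```

Run inside the activated environment, the script should report Python 3.10 and the environment name llama-factory.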
4. Install Visual Studio Build Tools

If Visual Studio is already installed, you do not need to install the Build Tools separately.
5. Install PyTorch

Check which versions PyTorch supports on the PyTorch website, then install a PyTorch build that is compatible with your CUDA version (this guide uses PyTorch 2.5.1+cu121).
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
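
The `cu121` suffix in the index URL encodes CUDA 12.1. As a rough illustration of that naming pattern, the sketch below derives the suffix from a CUDA version string and reads it back out of a PyTorch version string. Both helper names are invented here, and the sketch does not check that PyTorch actually publishes wheels for a given CUDA version, so confirm on the PyTorch website:

```python
from typing import Optional


def pytorch_index_url(cuda_version: str) -> str:
    """Build the wheel index URL for a CUDA version such as '12.1'."""
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


def cuda_tag(torch_version: str) -> Optional[str]:
    """Return the local build tag (e.g. 'cu121') from a version like '2.5.1+cu121'."""
    _, sep, tag = torch_version.partition("+")
    return tag if sep else None


if __name__ == "__main__":
    print(pytorch_index_url("12.1"))
    print(cuda_tag("2.5.1+cu121"))
```

A version string without a `+cu...` suffix usually means a CPU-only build was installed, which is a common cause of `torch.cuda.is_available()` returning False.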
Use the following script to verify that PyTorch was installed successfully:
import torch

# 1. Print the PyTorch version
print(f"PyTorch Version: {torch.__version__}")

# 2. Print the CUDA version PyTorch was built with (12.1 is expected here)
print(f"PyTorch CUDA Version: {torch.version.cuda}")

# 3. The key step: check whether CUDA is available
print(f"CUDA Available: {torch.cuda.is_available()}")

# 4. If it is available, print GPU information
if torch.cuda.is_available():
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    print(f"Current GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"Current GPU Index: {torch.cuda.current_device()}")

    # 5. Run a simple tensor operation to test functionality
    #    (the tensor values are arbitrary sample data)
    x = torch.tensor([1.0, 2.0, 3.0]).cuda()
    y = torch.tensor([4.0, 5.0, 6.0]).cuda()
    z = x + y
    print(f"Tensor computation on GPU: {z}")
    print(f"Tensor device: {z.device}")

6. Install bitsandbytes (skip this step if you do not need quantized LoRA)

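Go to Release Wheels · jllllll/bitsandbytes-windows-webui · GitHub to see the released wheel files. Pick the bitsandbytes version that is compatible with your installed CUDA Toolkit (12.1) and PyTorch (2.5.1+cu121), then download and install the wheel:
pip install bitsandbytes-0.41.1-py3-none-win_amd64.whl
Use the following script to verify that bitsandbytes was installed successfully: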
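import bitsandbytes as bnb
# Importing bitsandbytes makes it load the CUDA libraries and report the CUDA version it was built against.
# If the import succeeds without errors, it has usually found a matching CUDA environment.

# A more direct check: create a quantized layer and see whether it raises an error
try:
    # Create a 4-bit quantized layer and move it to the GPU;
    # moving it to the GPU is what triggers the CUDA quantization kernel
    linear = bnb.nn.Linear4bit(10, 20).cuda()
    print("✅ bitsandbytes installed successfully and CUDA is working!")
    print("   It is using the same CUDA environment as PyTorch.")
except Exception as e:
    print(f"❌ Error: {e}")

7. Install flash-attention (lldacing/flash-attention-windows-wheel · Hugging Face) (skip this step if you do not need FlashAttention-2)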

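First check whether Releases · kingbri1/flash-attention offers a prebuilt wheel compatible with your local CUDA Toolkit (12.1) and PyTorch (2.5.1+cu121). If it does, download and install it directly. If not, build the wheel locally with the following steps:

[*]Clone the flash-attention source code locally: Dao-AILab/flash-attention: Fast and memory-efficient exact attention
[*]Choose the source version that fits your setup (for example your CUDA Toolkit and PyTorch versions). This guide uses v2.7.0.post2.
[*]Build the wheel with the WindowsWhlBuilder_cuda.bat file provided in lldacing/flash-attention-windows-wheel · Hugging Face. Set the 'CUDA_ARCH' parameter according to your GPU model. Each NVIDIA GPU has its own value, reported as major.minor and usually written as an integer (for example, 8.9 becomes 89). You can query it with:
nvidia-smi --query-gpu=name,compute_cap --format=csv
https://img2024.cnblogs.com/blog/109287/202509/109287-20250903162559331-557543274.png
[*]Run the script in the 'Native Tools Command Prompt for Visual Studio'. Make sure the conda virtual environment you created (llama-factory) is activated, because the build uses the CUDA, PyTorch, and Python versions installed in that environment:
WindowsWhlBuilder_cuda.bat CUDA_ARCH="89" FORCE_CXX11_ABI=TRUE
Depending on the machine, the build can take anywhere from tens of minutes to several hours (it took me 7 hours). The built wheel is named like 'flash_attn-2.7.0.post2+cu121torch2.5.1cxx11abiFALSE-cp310-cp310-win_amd64.whl', which means flash-attention 2.7.0.post2, CUDA 12.1, torch 2.5.1, and Python 3.10.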
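
Two conventions from this step are easy to get wrong by hand: turning the compute capability into a CUDA_ARCH value, and reading the version tags out of a wheel filename. The sketch below illustrates both. The function names and the regular expression are assumptions written for this example, based only on the filename shown above, and are not part of flash-attention or the build script:

```python
import re
from typing import Dict, Optional


def cuda_arch_from_compute_cap(compute_cap: str) -> str:
    """Convert a compute capability such as '8.9' into the integer form '89'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


# Pattern inferred from the example wheel name in this guide (an assumption).
WHEEL_PATTERN = re.compile(
    r"flash_attn-(?P<flash_attn>.+?)\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<cxx11abi>TRUE|FALSE)-cp(?P<python>\d+)-"
)


def describe_wheel(filename: str) -> Optional[Dict[str, str]]:
    """Split a flash_attn wheel filename into its version tags, or return None."""
    match = WHEEL_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(cuda_arch_from_compute_cap("8.9"))
    print(describe_wheel(
        "flash_attn-2.7.0.post2+cu121torch2.5.1cxx11abiFALSE-cp310-cp310-win_amd64.whl"
    ))
```

Comparing the `cuda`, `torch`, and `python` tags against your own environment before running pip install is a cheap way to catch a mismatched wheel.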
Finally, install flash-attention from the built wheel:
pip install flash_attn-2.7.0.post2+cu121torch2.5.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
Use the following script to verify that flash-attention was installed successfully:
import torch
import flash_attn

print("=" * 50)
print("Verifying environment configuration")
print("=" * 50)
print(f"PyTorch version: {torch.__version__}")
print(f"PyTorch CUDA version: {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU device: {torch.cuda.get_device_name(0)}")

print(f"\nFlashAttention version: {flash_attn.__version__}")
print("\n✅ Verification passed! FlashAttention is installed and can be imported.")
print("   It is using the CUDA 12.1 from your PyTorch environment.")

# Optional: run a simple forward pass (in case you are worried about runtime errors)
print("\nRunning a simple computation test...")
try:
    dim = 64
    # flash_attn_func expects tensors shaped (batch, seqlen, nheads, headdim)
    q = torch.randn(1, 8, 128, dim, device='cuda', dtype=torch.float16)
    k = torch.randn(1, 8, 128, dim, device='cuda', dtype=torch.float16)
    v = torch.randn(1, 8, 128, dim, device='cuda', dtype=torch.float16)

    output = flash_attn.flash_attn_func(q, k, v, causal=True)
    print("✅ Computation test passed! The FlashAttention CUDA kernel works.")
except Exception as e:
    print(f"❌ Computation test failed: {e}")

8. Install LLaMA-Factory

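Clone the LLaMA-Factory source code (hiyouga/LLaMA-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)) and install it by following the official documentation (Installation - LLaMA Factory). The core install command is:
pip install -e "."
Start the web UI:
llamafactory-cli webui
Open the web UI at http://localhost:7860/ and you are done!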
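
If the page does not load, a first thing to check is whether anything is listening on port 7860 at all. The following is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper name, and the check only confirms that a TCP connection succeeds, not that the web UI itself is healthy:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print(f"Web UI port reachable: {is_port_open('localhost', 7860)}")
```

A result of False while `llamafactory-cli webui` is still running suggests the server has not finished starting or is bound to a different port.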


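硫辨姥 posted on 2025-12-14 15:20:08

This is useful.

方方仪 posted on 2025-12-20 08:39:00

I enjoy tinkering with this kind of software, though I don't use it much these days. Thanks for sharing!

劳怡月 posted on 2026-1-7 20:25:43

Sharing and helping each other is the spirit of the internet that warms us all.

韶侪 posted on 2026-1-9 11:21:11

People who know the technology and are willing to share it actively and selflessly are getting rarer. Much appreciated.

创蟀征 posted on 2026-1-10 05:48:38

An enthusiastic reply!

寨重 posted on 2026-1-18 04:25:55

People who know the technology and are willing to share it actively and selflessly are getting rarer. Much appreciated.

僭墙覆 posted on 2026-1-21 16:28:11

Nice, this looks very practical.

倡遍竽 posted on 2026-1-22 07:30:54

Sharing and helping each other is the spirit of the internet that warms us all.

能氐吨 posted on 2026-1-22 14:06:36

Thanks, downloaded and saved.

宁觅波 posted on 2026-1-22 21:52:10

This is useful.

玲液 posted on 2026-1-23 06:31:21

Thanks, downloaded and saved.

廖雯华 posted on 2026-1-23 11:37:41

Thanks to the original poster for providing this!

佟棠华 posted on 2026-1-24 04:31:20

Thanks for sharing. Downloaded and saved; it looks powerful.

饨篦 posted on 2026-1-24 11:20:42

I enjoy tinkering with this kind of software, though I don't use it much these days. Thanks for sharing!

喳谍 posted on 2026-1-27 02:26:29

This is useful.

甘子萱 posted on 2026-2-2 02:53:13

Is this the new version? I think it has stopped being updated.

峰襞副 posted on 2026-2-5 04:37:02

Thanks for sharing.

兑谓 posted on 2026-2-5 08:51:28

Very good, very powerful. Reserving this spot for now; will edit later.

印萍 posted on 2026-2-6 04:23:11

Sharing and helping each other is the spirit of the internet that warms us all.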
