Installing the Large-Model Tool KTransformers
Source: cnblogs  Author: DECHIN  Date: 2025/2/20 10:41:09

Technical Background

I have written several articles about DeepSeek before, covering loading models with Ollama and quantizing models with llama.cpp (llama.cpp can also load models itself, with functionality similar to Ollama). Here I introduce another high-performance tool for loading large models, developed in China: KTransformers. This article only covers how to install it; my local GPUs are too old to actually run KTransformers, but the build and install steps themselves went through without problems.

Installing KTransformers from Source

First clone the KTransformers repository from GitHub, then build it following the official instructions. Do not use version 0.2.1! According to the official notes, 0.2.1 degrades model quality; this has been fixed in the latest version, so it is best to install from the latest source:

  $ git clone https://github.com/kvcache-ai/ktransformers.git
  Cloning into 'ktransformers'...
  remote: Enumerating objects: 1866, done.
  remote: Counting objects: 100% (655/655), done.
  remote: Compressing objects: 100% (300/300), done.
  remote: Total 1866 (delta 440), reused 359 (delta 355), pack-reused 1211 (from 2)
  Receiving objects: 100% (1866/1866), 9.33 MiB | 7.65 MiB/s, done.
  Resolving deltas: 100% (990/990), done.
  $ cd ktransformers/
  $ git submodule init
  Submodule 'third_party/llama.cpp' (https://github.com/ggerganov/llama.cpp.git) registered for path 'third_party/llama.cpp'
  Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
  $ git submodule update
  Cloning into '/datb/DeepSeek/ktransformers/third_party/llama.cpp'...
  Cloning into '/datb/DeepSeek/ktransformers/third_party/pybind11'...
  Submodule path 'third_party/llama.cpp': checked out 'a94e6ff8774b7c9f950d9545baf0ce35e8d1ed2f'
  Submodule path 'third_party/pybind11': checked out 'bb05e0810b87e74709d9f4c4545f1f57a1b386f5'
  $ bash install.sh
  Successfully built ktransformers
  Installing collected packages: wcwidth, zstandard, tomli, tenacity, sniffio, six, pyproject_hooks, pydantic-core, psutil, propcache, orjson, ninja, multidict, jsonpointer, h11, greenlet, frozenlist, exceptiongroup, colorlog, click, attrs, async-timeout, annotated-types, aiohappyeyeballs, yarl, uvicorn, SQLAlchemy, requests-toolbelt, pydantic, jsonpatch, httpcore, build, blessed, anyio, aiosignal, starlette, httpx, aiohttp, langsmith, fastapi, accelerate, langchain-core, langchain-text-splitters, langchain, ktransformers
  Successfully installed SQLAlchemy-2.0.38 accelerate-1.3.0 aiohappyeyeballs-2.4.6 aiohttp-3.11.12 aiosignal-1.3.2 annotated-types-0.7.0 anyio-4.8.0 async-timeout-4.0.3 attrs-25.1.0 blessed-1.20.0 build-1.2.2.post1 click-8.1.8 colorlog-6.9.0 exceptiongroup-1.2.2 fastapi-0.115.8 frozenlist-1.5.0 greenlet-3.1.1 h11-0.14.0 httpcore-1.0.7 httpx-0.28.1 jsonpatch-1.33 jsonpointer-3.0.0 ktransformers-0.2.1+cu128torch26fancy langchain-0.3.18 langchain-core-0.3.35 langchain-text-splitters-0.3.6 langsmith-0.3.8 multidict-6.1.0 ninja-1.11.1.3 orjson-3.10.15 propcache-0.2.1 psutil-7.0.0 pydantic-2.10.6 pydantic-core-2.27.2 pyproject_hooks-1.2.0 requests-toolbelt-1.0.0 six-1.17.0 sniffio-1.3.1 starlette-0.45.3 tenacity-9.0.0 tomli-2.2.1 uvicorn-0.34.0 wcwidth-0.2.13 yarl-1.18.3 zstandard-0.23.0
  Installation completed successfully
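
Given the warning about 0.2.1 above, it is worth confirming which version actually got installed; a quick check with plain pip, nothing KTransformers-specific:

  $ python3 -m pip show ktransformers | grep -i "^version"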

At this point the installation has succeeded. If any errors came up along the way, the notes below may help with debugging.

Troubleshooting

If you see this error:

  FileNotFoundError: [Errno 2] No such file or directory: '/xxx/nvcc'

it is caused by a wrong CUDA_HOME setting. First run which nvcc to see where nvcc actually lives:

  $ which nvcc
  /home/llama/bin/nvcc

Then put that path (adapted to your own environment) into the environment variable:

  $ export CUDA_HOME=/home/llama/targets/x86_64-linux
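
If you would rather not hard-code the path, CUDA_HOME can usually be derived from the nvcc location, assuming the conventional <prefix>/bin/nvcc layout (with the nvcc above this resolves to /home/llama; adjust if, as here, your build expects the targets/ subtree instead):

  $ export CUDA_HOME="$(dirname "$(dirname "$(which nvcc)")")"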

If you see this error:

  sh: 1: cicc: not found

it means cicc is not on the system PATH; add the directory containing the cicc executable to the environment:

  $ export PATH=$PATH:/home/llama/nvvm/bin
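
If you are not sure where cicc ended up in your install, a filesystem search under the CUDA prefix will locate it (the path here is the one from the example above; substitute your own):

  $ find /home/llama -name cicc 2>/dev/null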

A particularly nasty one is:

  make[2]: *** No rule to make target '/home/lib64/libcudart.so'

After tweaking PATH, CUDA_HOME, and LD_LIBRARY_PATH for a long time, I finally realized that the lib64 in this path is hard-coded, while a local CUDA install usually only has lib. My blunt fix was to simply copy the libraries over:

  $ cp -r /home/lib/ /home/lib64/

Granted, since dynamic-library lookup is involved, this approach is not exactly elegant, but it did clear the error and spared me a CUDA reinstall.
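
A slightly cleaner alternative, assuming /home/lib64 does not already exist, is a symlink, which avoids duplicating the libraries on disk:

  $ ln -s /home/lib /home/lib64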

Testing the Installation

After installation, you can check that everything works with:

  $ python -m ktransformers.local_chat --help
  NAME
      local_chat.py
  SYNOPSIS
      local_chat.py <flags>
  FLAGS
      --model_path=MODEL_PATH
          Type: Optional[str | None]
          Default: None
      -o, --optimize_rule_path=OPTIMIZE_RULE_PATH
          Type: Optional[str]
          Default: None
      -g, --gguf_path=GGUF_PATH
          Type: Optional[str | None]
          Default: None
      --max_new_tokens=MAX_NEW_TOKENS
          Type: int
          Default: 300
      -c, --cpu_infer=CPU_INFER
          Type: int
          Default: 10
      -u, --use_cuda_graph=USE_CUDA_GRAPH
          Type: bool
          Default: True
      -p, --prompt_file=PROMPT_FILE
          Type: Optional[str | None]
          Default: None
      --mode=MODE
          Type: str
          Default: 'normal'
      -f, --force_think=FORCE_THINK
          Type: bool
          Default: False

Loading a Model with KTransformers

To run local_chat, you also need to install flash-attn:

  $ python3 -m pip install flash-attn --no-build-isolation
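
Since flash-attn compiles CUDA extensions against the local torch, a quick import test before moving on can save a surprise later (a minimal check, assuming the wheel built cleanly):

  $ python3 -c "import flash_attn; print(flash_attn.__version__)"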

The official repository maintains a list of recommended models.

To use KTransformers' chat feature, the model directories need to live under the KTransformers root, i.e. the directory you cd into after the git clone. Loading a model with KTransformers requires both the safetensors model and the GGUF model. For example, to run the DeepSeek-V2.5-IQ4_XS quantization, first download the DeepSeek-V2.5 model from ModelScope (run this in the KTransformers root):

  $ git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-V2.5.git
  Cloning into 'DeepSeek-V2.5'...
  remote: Enumerating objects: 81, done.
  remote: Counting objects: 100% (81/81), done.
  remote: Compressing objects: 100% (80/80), done.
  remote: Total 81 (delta 6), reused 0 (delta 0), pack-reused 0
  Unpacking objects: 100% (81/81), 1.47 MiB | 1017.00 KiB/s, done.
  Filtering content: 100% (55/55), 3.10 GiB | 60.00 KiB/s, done.
  Encountered 55 file(s) that may not have been copied correctly on Windows:
  model-00013-of-000055.safetensors
  model-00052-of-000055.safetensors
  model-00050-of-000055.safetensors
  model-00027-of-000055.safetensors
  model-00053-of-000055.safetensors
  model-00016-of-000055.safetensors
  model-00026-of-000055.safetensors
  model-00051-of-000055.safetensors
  model-00028-of-000055.safetensors
  model-00039-of-000055.safetensors
  model-00015-of-000055.safetensors
  model-00014-of-000055.safetensors
  model-00049-of-000055.safetensors
  model-00040-of-000055.safetensors
  model-00038-of-000055.safetensors
  model-00021-of-000055.safetensors
  model-00020-of-000055.safetensors
  model-00025-of-000055.safetensors
  model-00024-of-000055.safetensors
  model-00037-of-000055.safetensors
  model-00046-of-000055.safetensors
  model-00047-of-000055.safetensors
  model-00045-of-000055.safetensors
  model-00012-of-000055.safetensors
  model-00011-of-000055.safetensors
  model-00023-of-000055.safetensors
  model-00044-of-000055.safetensors
  model-00048-of-000055.safetensors
  model-00010-of-000055.safetensors
  model-00035-of-000055.safetensors
  model-00022-of-000055.safetensors
  model-00033-of-000055.safetensors
  model-00036-of-000055.safetensors
  model-00043-of-000055.safetensors
  model-00034-of-000055.safetensors
  model-00031-of-000055.safetensors
  model-00032-of-000055.safetensors
  model-00054-of-000055.safetensors
  model-00042-of-000055.safetensors
  model-00030-of-000055.safetensors
  model-00019-of-000055.safetensors
  model-00002-of-000055.safetensors
  model-00008-of-000055.safetensors
  model-00018-of-000055.safetensors
  model-00009-of-000055.safetensors
  model-00007-of-000055.safetensors
  model-00001-of-000055.safetensors
  model-00041-of-000055.safetensors
  model-00055-of-000055.safetensors
  model-00005-of-000055.safetensors
  model-00006-of-000055.safetensors
  model-00004-of-000055.safetensors
  model-00003-of-000055.safetensors
  model-00017-of-000055.safetensors
  model-00029-of-000055.safetensors
  See: `git lfs help smudge` for more details.

Then create a directory for the GGUF model, DeepSeek-V2.5-GGUF, in the KTransformers root, enter it, and download the corresponding GGUF shards:

  $ modelscope download --model bartowski/DeepSeek-V2.5-GGUF DeepSeek-V2.5-IQ4_XS/DeepSeek-V2.5-IQ4_XS-00001-of-00004.gguf --local_dir ./
  $ modelscope download --model bartowski/DeepSeek-V2.5-GGUF DeepSeek-V2.5-IQ4_XS/DeepSeek-V2.5-IQ4_XS-00002-of-00004.gguf --local_dir ./
  $ modelscope download --model bartowski/DeepSeek-V2.5-GGUF DeepSeek-V2.5-IQ4_XS/DeepSeek-V2.5-IQ4_XS-00003-of-00004.gguf --local_dir ./
  $ modelscope download --model bartowski/DeepSeek-V2.5-GGUF DeepSeek-V2.5-IQ4_XS/DeepSeek-V2.5-IQ4_XS-00004-of-00004.gguf --local_dir ./
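
The four commands differ only in the shard index, so an equivalent shell loop (same files, same flags) is:

  $ for i in 1 2 3 4; do
      modelscope download --model bartowski/DeepSeek-V2.5-GGUF \
        DeepSeek-V2.5-IQ4_XS/DeepSeek-V2.5-IQ4_XS-0000${i}-of-00004.gguf --local_dir ./
    done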

Once the model files have finished downloading, go back to the KTransformers root and load the model:

  $ python -m ktransformers.local_chat --model_path ./DeepSeek-V2.5/ --gguf_path ./DeepSeek-V2.5-GGUF/
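
With everything in place, the layout under the repository root used by the command above looks roughly like this (a sketch of the paths from the steps so far):

  ktransformers/              # repository root; run local_chat from here
  ├── DeepSeek-V2.5/          # config, tokenizer and safetensors from ModelScope
  └── DeepSeek-V2.5-GGUF/     # the four IQ4_XS GGUF shards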

If your local hardware is adequate and the software installed correctly, a long stream of startup logs ends in the chat prompt. If you hit errors instead, the debugging notes below may help.

Troubleshooting

The first problem you may run into is GPU management. My machine has two 8 GB GPUs, and this model needs about 10 GB of VRAM, so the total should in theory be enough; nevertheless, I hit an OOM error:

  Traceback (most recent call last):
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/datb/DeepSeek/ktransformers/ktransformers/local_chat.py", line 179, in <module>
      fire.Fire(local_chat)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
      component_trace = _Fire(component, args, parsed_flag_args, context, name)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
      component, remaining_args = _CallAndUpdateTrace(
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
      component = fn(*varargs, **kwargs)
    File "/datb/DeepSeek/ktransformers/ktransformers/local_chat.py", line 110, in local_chat
      optimize_and_load_gguf(model, optimize_rule_path, gguf_path, config)
    File "/datb/DeepSeek/ktransformers/ktransformers/optimize/optimize.py", line 129, in optimize_and_load_gguf
      load_weights(module, gguf_loader)
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 87, in load_weights
      module.load()
    File "/datb/DeepSeek/ktransformers/ktransformers/operators/base_operator.py", line 60, in load
      utils.load_weights(child, self.gguf_loader, self.key+".")
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 87, in load_weights
      module.load()
    File "/datb/DeepSeek/ktransformers/ktransformers/operators/base_operator.py", line 60, in load
      utils.load_weights(child, self.gguf_loader, self.key+".")
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 87, in load_weights
      module.load()
    File "/datb/DeepSeek/ktransformers/ktransformers/operators/linear.py", line 432, in load
      self.generate_linear.load(w=w)
    File "/datb/DeepSeek/ktransformers/ktransformers/operators/linear.py", line 215, in load
      w_ref, marlin_q_w, marlin_s, g_idx, sort_indices, _ = marlin_quantize(
    File "/datb/DeepSeek/ktransformers/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_utils.py", line 93, in marlin_quantize
      w_ref, q_w, s, g_idx, rand_perm = quantize_weights(w, num_bits, group_size,
    File "/datb/DeepSeek/ktransformers/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/quant_utils.py", line 84, in quantize_weights
      q_w = reshape_w(q_w)
    File "/datb/DeepSeek/ktransformers/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/quant_utils.py", line 81, in reshape_w
      w = w.reshape((size_k, size_n)).contiguous()
  torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 320.00 MiB. GPU 0 has a total capacity of 7.78 GiB of which 243.12 MiB is free. Including non-PyTorch memory, this process has 6.96 GiB memory in use. Of the allocated memory 6.25 GiB is allocated by PyTorch, and 623.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
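
The tail of that log suggests one generic mitigation for memory fragmentation, which costs nothing to try, though in my case the real cause turned out to be elsewhere:

  $ export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True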

This happens because KTransformers defaults to using the VRAM of a single card. According to the official docs, multi-GPU PCIe communication adds significant overhead and can hurt performance; still, some of us want to try it on low-end cards. The optimize path of the official repository ships several multi-GPU rule files, mainly for V3 and R1, but it turns out that V2.5 can apparently reuse the V2 optimize rules, for example like this:

  $ python -m ktransformers.local_chat --model_path ./DeepSeek-V2.5/ --gguf_path ./DeepSeek-V2.5-GGUF/ -f true --port 11434 --optimize_rule_path ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu.yaml --cpu_infer 45

This manually places part of the model's layers on the second card, and the OOM problem goes away.
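
The optimize rules themselves are plain YAML lists of match/replace entries. Purely as an illustration (the layer ranges and classes here are simplified assumptions; see the shipped DeepSeek-V2-Chat-multi-gpu.yaml for the real rules), splitting layers across two cards looks roughly like:

  - match:
      name: "^model\\.layers\\.([0-9]|1[0-9]|2[0-9])\\."
    replace:
      class: "default"
      kwargs:
        generate_device: "cuda:0"
        prefill_device: "cuda:0"
  - match:
      name: "^model\\.layers\\.([3-5][0-9])\\."
    replace:
      class: "default"
      kwargs:
        generate_device: "cuda:1"
        prefill_device: "cuda:1"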

Configuring an HF Mirror

All my local models were downloaded from ModelScope, but if KTransformers cannot find the model files locally, it will search for and download them from Hugging Face. Network restrictions in China can make that direct download fail. One workaround is to configure an HF mirror endpoint:

  $ export HF_ENDPOINT=https://hf-mirror.com

I have not tested this myself and am only recording it here; feel free to try it.

Docker Installation

If you really do not want to set up the environment or wrestle with build problems, you can install via Docker: a prebuilt image, approachingai/ktransformers:0.2.1, already exists. Pulling from Docker Hub inside China often runs into network trouble too, so look for a domestic mirror registry; here I used a temporary mirror at docker.1ms.run:

  $ docker pull docker.1ms.run/approachingai/ktransformers:0.2.1
  0.2.1: Pulling from approachingai/ktransformers
  43f89b94cd7d: Pull complete
  dd4939a04761: Pull complete
  b0d7cc89b769: Pull complete
  1532d9024b9c: Pull complete
  04fc8a31fa53: Pull complete
  a14a8a8a6ebc: Pull complete
  7d61afc7a3ac: Pull complete
  8bd2762ffdd9: Pull complete
  2a5ee6fadd42: Pull complete
  22ba0fb08ae2: Pull complete
  4d37a6bba88f: Pull complete
  4bc954eb910a: Pull complete
  165aad6fabd0: Pull complete
  c49c630af66e: Pull complete
  4f4fb700ef54: Pull complete
  ad3e53217d5f: Pull complete
  4e5bba86d044: Pull complete
  30e549924dd6: Pull complete
  d01b314170e5: Pull complete
  819a99ec0367: Pull complete
  Digest: sha256:7de7da1cca07197aed37aff6e25daeb56370705f0ee05dbe3ad4803c51cc50a3
  Status: Downloaded newer image for docker.1ms.run/approachingai/ktransformers:0.2.1
  docker.1ms.run/approachingai/ktransformers:0.2.1

Once the pull completes, check the running containers:

  $ docker ps -a

It is empty for now. Next, check the local images:

  $ docker images
  REPOSITORY                                    TAG     IMAGE ID       CREATED      SIZE
  docker.1ms.run/approachingai/ktransformers    0.2.1   614daa66a726   3 days ago   18.5GB

Create a container from the image, mapping in the local directories you need:

  $ docker run --gpus all -v /datb/DeepSeek/:/models --name ktransformers -itd approachingai/ktransformers:0.2.1
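
If you later want to reach a service port (such as the 11434 used with --port elsewhere in this article) from outside the container, you would also publish it at creation time; a hedged variant of the same command:

  $ docker run --gpus all -p 11434:11434 -v /datb/DeepSeek/:/models --name ktransformers -itd approachingai/ktransformers:0.2.1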

Check the running containers again:

  $ docker ps -a
  CONTAINER ID   IMAGE                                              COMMAND               CREATED         STATUS         PORTS   NAMES
  0d3fb9d57002   docker.1ms.run/approachingai/ktransformers:0.2.1   "tail -f /dev/null"   3 minutes ago   Up 3 minutes           ktransformers

The container is now up; the next step is to enter its environment:

  $ docker exec -it ktransformers /bin/bash

Take a look at the working directory and the mapped directory:

  root@0d3fb9d57002:/workspace# ll
  total 12
  drwxr-xr-x  1 root root 4096 Feb 15 05:45 ./
  drwxr-xr-x  1 root root 4096 Feb 19 00:54 ../
  drwxrwxr-x  8 1004 1004 4096 Feb 15 06:40 ktransformers/
  root@0d3fb9d57002:/workspace# ll /models/
  total 72
  drwxr-xr-x  7 1001 1001  4096 Feb 17 06:19 ./
  drwxr-xr-x  1 root root  4096 Feb 19 00:54 ../
  drwxrwxr-x  3 1001 1001  4096 Feb  6 09:19 DeepSeek32B/
  drwxrwxr-x 12 1001 1001  4096 Feb 17 09:14 ktransformers/
  drwxrwxr-x  4 1001 1001  4096 Feb 12 02:31 llama/
  drwxrwxr-x  7 1001 1001  4096 Feb 18 06:46 models/
  -rwxrwxr-x  1 1001 1001 13389 Feb  5 04:16 ollama_install.sh*
  drwxrwxr-x  7 1001 1001  4096 Feb 14 08:31 page-share-app/
  -rw-rw-r--  1 1001 1001 22239 Feb 12 02:14 running.log
  -rw-rw-r--  1 1001 1001   183 Feb 12 02:09 test_llama_cpp.py

Usage is the same as above: start the chat directly, just with the path arguments adjusted:

  root@0d3fb9d57002:/models# python3 -m ktransformers.local_chat --gguf_path ktransformers/DeepSeek-V2.5-GGUF --model_path ktransformers/DeepSeek-V2.5 --cpu_infer 45 -f true --port 11434 --optimize_rule_path ktransformers/ktransformers/optimize/optimize_rules/DeepSeek-V2-Chat-multi-gpu.yaml

If your local hardware is up to it, this should drop you straight into the chat interface.

The TORCH_USE_CUDA_DSA Error

Finally, a problem with no real workaround. While local_chat is compiling and loading the model, this error may pop up:

  loading token_embd.weight to cpu
  /opt/conda/lib/python3.10/site-packages/ktransformers/util/custom_gguf.py:644: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905979055/work/torch/csrc/utils/tensor_numpy.cpp:206.)
    data = torch.from_numpy(data)
  Traceback (most recent call last):
    File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/local_chat.py", line 179, in <module>
      fire.Fire(local_chat)
    File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
      component_trace = _Fire(component, args, parsed_flag_args, context, name)
    File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
      component, remaining_args = _CallAndUpdateTrace(
    File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
      component = fn(*varargs, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/local_chat.py", line 110, in local_chat
      optimize_and_load_gguf(model, optimize_rule_path, gguf_path, config)
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/optimize/optimize.py", line 129, in optimize_and_load_gguf
      load_weights(module, gguf_loader)
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 87, in load_weights
      module.load()
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/operators/base_operator.py", line 60, in load
      utils.load_weights(child, self.gguf_loader, self.key+".")
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 87, in load_weights
      module.load()
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/operators/base_operator.py", line 60, in load
      utils.load_weights(child, self.gguf_loader, self.key+".")
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 85, in load_weights
      load_weights(child, gguf_loader, prefix+name+".")
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/util/utils.py", line 87, in load_weights
      module.load()
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/operators/linear.py", line 430, in load
      self.generate_linear.load(w=w)
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/operators/linear.py", line 215, in load
      w_ref, marlin_q_w, marlin_s, g_idx, sort_indices, _ = marlin_quantize(
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_utils.py", line 93, in marlin_quantize
      w_ref, q_w, s, g_idx, rand_perm = quantize_weights(w, num_bits, group_size,
    File "/opt/conda/lib/python3.10/site-packages/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/quant_utils.py", line 61, in quantize_weights
      w = w.reshape((group_size, -1))
  RuntimeError: CUDA error: no kernel image is available for execution on the device
  CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
  For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Alternatively, after you reach the chat interface and type any question, you may get:

  Chat: who are you?
  Traceback (most recent call last):
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/datb/DeepSeek/ktransformers/ktransformers/local_chat.py", line 179, in <module>
      fire.Fire(local_chat)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
      component_trace = _Fire(component, args, parsed_flag_args, context, name)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
      component, remaining_args = _CallAndUpdateTrace(
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
      component = fn(*varargs, **kwargs)
    File "/datb/DeepSeek/ktransformers/ktransformers/local_chat.py", line 173, in local_chat
      generated = prefill_and_generate(
    File "/datb/DeepSeek/ktransformers/ktransformers/util/utils.py", line 154, in prefill_and_generate
      logits = model(
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
      return forward_call(*args, **kwargs)
    File "/datb/DeepSeek/ktransformers/ktransformers/models/modeling_deepseek.py", line 1731, in forward
      outputs = self.model(
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
      return forward_call(*args, **kwargs)
    File "/datb/DeepSeek/ktransformers/ktransformers/operators/models.py", line 722, in forward
      layer_outputs = decoder_layer(
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
      return forward_call(*args, **kwargs)
    File "/datb/DeepSeek/ktransformers/ktransformers/models/modeling_deepseek.py", line 1238, in forward
      hidden_states, self_attn_weights, present_key_value = self.self_attn(
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
      return forward_call(*args, **kwargs)
    File "/datb/DeepSeek/ktransformers/ktransformers/operators/attention.py", line 407, in forward
      return self.forward_windows(
    File "/datb/DeepSeek/ktransformers/ktransformers/operators/attention.py", line 347, in forward_windows
      return self.forward_chunck(
    File "/datb/DeepSeek/ktransformers/ktransformers/operators/attention.py", line 85, in forward_chunck
      q = self.q_b_proj(self.q_a_layernorm(self.q_a_proj(hidden_states)))
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
      return forward_call(*args, **kwargs)
    File "/datb/DeepSeek/ktransformers/ktransformers/models/modeling_deepseek.py", line 113, in forward
      hidden_states = hidden_states.to(torch.float32)
  RuntimeError: CUDA error: invalid device function
  CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
  For debugging consider passing CUDA_LAUNCH_BLOCKING=1
  Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

According to the official docs, TORCH_USE_CUDA_DSA problems like these are most likely caused by an old GPU model: "no kernel image is available" means the compiled CUDA kernels simply do not cover the card's compute capability. So there is no real fix here other than swapping in a newer card.
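
A quick way to check whether your card is affected is to print its CUDA compute capability with stock PyTorch; KTransformers' kernels target relatively recent architectures, so a low value here is a bad sign:

  $ python3 -c "import torch; print(torch.cuda.get_device_capability(0))"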

Summary

This article covered how to install KTransformers, a high-performance, domestically developed tool for loading large models. It stops at installation rather than usage because the tool makes real demands on local hardware: with a GPU that is too old, you will run into the TORCH_USE_CUDA_DSA-related error, which can only be solved by replacing the card. For that reason I was unable to test it end to end locally; only the source build and the Docker install are confirmed to work.

Copyright Notice

This article was first published at: https://www.cnblogs.com/dechinphy/p/ktransformer.html

Author ID: DechinPhy

More original articles: https://www.cnblogs.com/dechinphy/

Buy the author a coffee: https://www.cnblogs.com/dechinphy/gallery/image/379634.html

