Running a "ChatGPT" Locally with No GPU and No Network (Updated with a Chinese Model)

Source: Pangaroo's blog on CSDN

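Image generation took two or three years to get from DALL·E to Stable Diffusion and then to Apple's on-device deployment.
Chat bots have only been developing for a few months and can already be deployed at the edge. If Apple updates its silicon and doubles the NPU and memory, maybe even the Apple Watch could run one locally; at the earliest, it might be built into Mac, iPad, and iPhone around iOS 18.
This is another open-source project that ordinary users can be happy about: a ChatGPT-style model can even be deployed on a Raspberry Pi. I ran into some problems along the way, and this post records that process.
The GitHub repository was updated recently, so the fixes for the problems I hit no longer apply. I do not have time to look into it right now, so to be helpful I am posting the pre-update code below.
Update (2023-04-07): this post now matches the latest GitHub version, so you can follow it as is and no longer need the old code from the link below.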

Link: https://pan.baidu.com/s/1J9FBxSDhmBcqAnHx3rGhEQ Extraction code: q5xv
Together with the Baidu Cloud link for the model further down, this should be enough to set up the language model yourself.

The author's repository: https://github.com/ggerganov/llama.cpp

Download and build
Open a terminal and run the following commands:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# On Windows, build with CMake instead:
cd <path_to_llama_folder>
mkdir build
cd build
cmake ..
cmake --build . --config Release

Downloading the model
I think downloading the model is the most troublesome part. Fortunately, someone else has shared it:

git clone https://huggingface.co/nyanko7/LLaMA-7B
Never mind, here is a Baidu Cloud link instead:
Link: https://pan.baidu.com/s/1ZC2SCG9X8jZ-GysavQl29Q Extraction code: 4ret

Next, install the Python dependencies and convert the model to FP16 format. This is where the first small bug shows up.

python3 -m pip install torch numpy sentencepiece

# convert the 7B model to ggml FP16 format
python3 convert-pth-to-ggml.py models/7B/ 1

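It reports that it cannot find a file.

Open convert-pth-to-ggml.py, change the path of "/tokenizer.model" so that it points to where your tokenizer file actually is, then run python3 convert-pth-to-ggml.py ./models/7B 1 again. I also changed the name while I was at it.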
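
Before editing the script, it can help to check which files are actually where. The sketch below assumes the layout the llama.cpp README described at the time (tokenizer.model next to the 7B folder, with consolidated.00.pth and params.json inside it). The function name and the file list are my own illustration, so adjust them to whatever your version of the script expects.

```python
from pathlib import Path

# Assumed layout (illustrative, adjust to your script version):
#   models/tokenizer.model
#   models/7B/consolidated.00.pth
#   models/7B/params.json
def missing_model_files(models_dir, size="7B"):
    """Return the expected files that are absent under models_dir."""
    root = Path(models_dir)
    expected = [
        root / "tokenizer.model",
        root / size / "consolidated.00.pth",
        root / size / "params.json",
    ]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    # Run from the llama.cpp folder; prints nothing if everything is in place.
    for path in missing_model_files("models"):
        print("missing:", path)
```

If tokenizer.model is listed as missing but sits inside the 7B folder, either move it up one level or change the path in the script as described above.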

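The file was found, and then the second bug appeared.

At first I could not find the problem. Only after comparing the files on the original site with the ones in my 7B folder did I notice that the file sizes did not match at all. No wonder git had cloned something tens of GB in size so quickly.
Open the repository page on the website and download the two files marked with red boxes in the original post's screenshot, then use them to replace the corresponding two files in the 7B folder.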
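
One plausible cause, which I have not confirmed for this repository, is that the clone fetched small Git LFS pointer files instead of the real weights. A pointer file is plain text that starts with a fixed header line, so it is easy to detect. The sketch below uses helper names of my own; only the header string comes from the Git LFS format.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is a small Git LFS pointer rather than real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def report(model_dir):
    """Print size and pointer status for every file in model_dir."""
    for p in sorted(Path(model_dir).iterdir()):
        if p.is_file():
            kind = "LFS pointer" if is_lfs_pointer(p) else "regular file"
            print(f"{p.name}: {p.stat().st_size} bytes, {kind}")
```

Calling report("models/7B") lists each file with its size. Anything reported as an LFS pointer needs to be downloaded again from the website.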

Then convert the model to 4-bit format:

# quantize the model to 4-bits
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
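
To see why the 4-bit step matters on a machine without a GPU, here is a back-of-the-envelope size estimate. The numbers are assumptions of mine, not measurements: a nominal 7e9 parameters, 16 bits per weight for FP16, and roughly 4.5 effective bits per weight for 4-bit block quantization (the 4-bit values plus a per-block scale; the exact overhead depends on the ggml version).

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate file size in GB (10**9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9


n = 7e9                        # nominal parameter count for "7B" (assumption)
fp16 = model_size_gb(n, 16)    # FP16 stores 16 bits per weight
q4 = model_size_gb(n, 4.5)     # assumed effective bits per weight after quantizing
print(f"FP16: {fp16:.1f} GB, 4-bit: {q4:.1f} GB")
```

Under these assumptions the quantized file is a bit over a quarter of the FP16 size, which is what makes it fit in the memory of an ordinary laptop.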

Inference
# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -n 128

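To chat with it the way you would with ChatGPT, use the command below. -n sets the maximum length of the generated reply; --color colors the output to tell the AI's text from the human's; -i runs in interactive mode; -r sets a reverse prompt, the string at which generation pauses and hands control back to you; -f reads the whole prompt from a file; --repeat_penalty controls how strongly repeated text is penalized; --temp is the temperature, where lower values make replies less random and higher values more random.
Since the update, it runs much faster.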

./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
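
To get a feel for what the temperature does, here is the usual textbook way it is applied when sampling: divide the model's raw scores by the temperature before turning them into probabilities. This is a minimal sketch with made-up scores, not llama.cpp's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temp):
    """Turn raw scores into probabilities, sharpened or flattened by temp."""
    scaled = [x / temp for x in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]                  # made-up scores for three candidate tokens
for t in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability lands on the highest-scoring token, so replies become more predictable. At a higher temperature the probabilities even out and the output gets more varied.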
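Let's open prompts/chat-with-bob.txt and take a look.

As you can see, the file gives the model a scenario and a topic, and your conversation with the AI then continues from there.

My English name is Zale, and I named the bot Kangaroo, so I chat with it under those identities. You can change the command below however you like.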

./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "Zale:" \
Then write a txt file for the prompt:

“Transcript of a dialog, where the Zale interacts with an Assistant named Kangaroo. Kangaroo is helpful, kind, honest, good at writing, and never fails to answer the Zale’s requests immediately and with precision.

Zale: Hello, Kangaroo.
Kangaroo: Hello. How may I help you today?
Zale: Please tell me the largest city in Europe.
Kangaroo: Sure. The largest city in Europe is Moscow, the capital of Russia.
Zale:”

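
If you want different names, the prompt file can be generated instead of edited by hand. The sketch below is my own helper, with the wording lightly adapted from the transcript above; the output path in the usage note is just an example.

```python
from pathlib import Path

# Transcript template; {user} and {bot} are filled in by make_prompt().
TEMPLATE = (
    "Transcript of a dialog, where {user} interacts with an Assistant "
    "named {bot}. {bot} is helpful, kind, honest, good at writing, and "
    "never fails to answer {user}'s requests immediately and with "
    "precision.\n\n"
    "{user}: Hello, {bot}.\n"
    "{bot}: Hello. How may I help you today?\n"
    "{user}: Please tell me the largest city in Europe.\n"
    "{bot}: Sure. The largest city in Europe is Moscow, the capital of Russia.\n"
    "{user}:"
)


def make_prompt(user, bot):
    """Return the transcript with the two names substituted."""
    return TEMPLATE.format(user=user, bot=bot)


def write_prompt(path, user, bot):
    """Write the transcript to a prompt file for the -f option."""
    Path(path).write_text(make_prompt(user, bot), encoding="utf-8")
```

For example, write_prompt("prompts/chat-with-zale.txt", "Zale", "Kangaroo") produces a file like the one above. The text ends with the user's name and a colon, which should match the string you pass to -r so that generation pauses on your turn.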

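It is a little slow-witted, but this is still a huge step forward for edge deployment!
One amusing finding: it clearly understands Chinese, yet it tells me it does not.

Here is an interesting conversation to share.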

Chinese deployment
The GitHub repository from HIT (Harbin Institute of Technology):
https://github.com/ymcui/Chinese-LLaMA-Alpaca

git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git
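Download the Chinese model. Note that this is not a model llama.cpp can load directly: according to the official description, it is a Chinese patch for LLaMA that has to be merged with the original LLaMA/Alpaca model before it can be used.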

Install the dependencies:

pip install git+https://github.com/huggingface/transformers
pip install sentencepiece
pip install peft
For convenience, I have also put the original LLaMA files here.

A few things to note

Check the SHA256 of the files. How to do this differs slightly between platforms, so search online for the method on yours.
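
Since Python is already installed for the conversion scripts, one way to get the same result on every platform is the standard hashlib module. This is a minimal sketch; the function name is my own.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so multi-GB weight files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Compare the returned string with the published checksum for the file. Any difference means the download is incomplete or corrupted.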

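Tidy up the paths of the original LLaMA files

I installed transformers into a conda environment, so my path is rather long. All you need is the path to your own copy of convert_llama_weights_to_hf.py.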
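
If you are not sure where that file lives, Python can look it up for you. The sketch below uses a helper of my own that searches inside an installed package without importing it; the relative path is the one that appears in my command, so it may differ in other transformers versions.

```python
import importlib.util
from pathlib import Path


def find_in_package(package, relative_path):
    """Return the path of a file inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / relative_path
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Prints None if transformers is missing or the script is not at this path.
    print(find_in_package(
        "transformers", "models/llama/convert_llama_weights_to_hf.py"))
```

Run it with the same Python interpreter you installed transformers into, then paste the printed path into the command.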

python /Users/kangaroo/miniconda3/envs/pytorch/lib/python3.10/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py \
--input_dir ./llama_7b \
--model_size 7B \
--output_dir ./llama_hf

Merge the models:

python scripts/merge_llama_with_chinese_lora.py \
--base_model ./llama_hf \
--lora_model ./chinese_llama_lora_7b \
--output_dir ./cn_llama
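
The script and folder names suggest the Chinese patch is a LoRA adapter. In the standard LoRA formulation, merging means adding a scaled low-rank product to each original weight matrix, W' = W + (alpha / r) * B A. The toy sketch below shows only that idea with made-up numbers and plain lists. I have not checked what the real script does beyond this, and it may also handle things such as an extended vocabulary.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy 2x2 weight with a rank-1 update (all numbers made up).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]          # shape r x in  = 1 x 2
lora_b = [[0.5], [0.25]]       # shape out x r = 2 x 1
merged = merge_lora(w, lora_a, lora_b, alpha=2, r=1)
print(merged)
```

After merging, the result is an ordinary full weight matrix, which is why the merged folder can go through the same conversion and quantization steps as the original model.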

Then copy this folder into llama.cpp/models.

Back in llama.cpp, convert and quantize again:

python convert-pth-to-ggml.py models/cn_llama/ 1

./quantize ./models/cn_llama/ggml-model-f16.bin ./models/cn_llama/ggml-model-q4_0.bin 2
It was rather chatty, so I cut it off. I will take another look later.

./main -m ./models/cn_llama/ggml-model-q4_0.bin -n 48 --repeat_penalty 1.0 --color -i -r "Zale:" -f prompts/chat-with-zale.txt

./main -m models/cn_llama/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3
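
The chat commands in this post differ only in a few values, so if you switch models or prompt files often, a small helper that assembles the argument list can save some typing. This is a sketch of my own that uses only the flags shown in this post. Newer llama.cpp versions may rename them, so check ./main --help first.

```python
import shlex


def chat_command(model, prompt_file, reverse_prompt,
                 n_predict=256, repeat_penalty=1.0, binary="./main"):
    """Build the argument list for an interactive chat session."""
    return [
        binary,
        "-m", model,                          # path to the quantized model
        "-n", str(n_predict),                 # maximum reply length
        "--repeat_penalty", str(repeat_penalty),
        "--color",
        "-i",                                 # interactive mode
        "-r", reverse_prompt,                 # pause when this string appears
        "-f", prompt_file,                    # prompt read from a file
    ]


cmd = chat_command("./models/cn_llama/ggml-model-q4_0.bin",
                   "prompts/chat-with-zale.txt", "Zale:", n_predict=48)
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run(cmd) from the llama.cpp folder to start the session directly.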

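----------------
Copyright notice: This is an original article by CSDN blogger "Pangaroo", licensed under CC 4.0 BY-SA. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/weixin_45569617/article/details/129553293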
