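A practical guide to deploying a large language model, using Qwen as the example, with a step-by-step walkthrough of the configuration on Windows. Key points: 1. Laptop hardware and system requirements 2. Conda environment setup and Python dependency installation 3. Common errors and how to fix them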
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:55:00_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
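To record which CUDA toolkit release is installed before picking a PyTorch build, the release number can be read out of the output above. The following is a minimal sketch; the helper name `parse_cuda_release` and the constant `NVCC_OUTPUT` are illustrative and not part of any CUDA tooling.

```python
import re

# Sample text copied from the toolkit version output shown above.
NVCC_OUTPUT = """\
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:55:00_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
"""


def parse_cuda_release(nvcc_output: str):
    """Return (major, minor) from the 'release X.Y' field, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_cuda_release(NVCC_OUTPUT))  # → (12, 6)
```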
https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Windows-x86_64.exe
conda create -n qwen python=3.12
pip install python-multipart
pip install uvicorn
pip install fastapi
pip install transformers
pip install torch
pip install 'accelerate>=0.26.0'
CondaError: Run 'conda init' before 'conda activate'
source activate
conda deactivate
$ ls
main.py  main_test.py  model/  test.py
(qwen)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
import torch

# Target the first GPU
device = torch.device('cuda:0')

if __name__ == "__main__":
    # Prints True when PyTorch can use CUDA
    print(torch.cuda.is_available())
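When the check above prints False, it helps to see a little more detail. The sketch below gathers the CUDA version PyTorch was built against and the GPU name in one place. The function name `cuda_report` and the dictionary keys are illustrative; the PyTorch calls used are `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def cuda_report() -> dict:
    """Collect basic facts about the PyTorch/CUDA setup without raising."""
    report = {
        "torch_installed": False,
        "torch_cuda_build": None,   # CUDA version PyTorch was compiled with
        "cuda_available": False,
        "device_name": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment
        return report

    report["torch_installed"] = True
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```

A `torch_cuda_build` of None usually means a CPU-only wheel was installed, which is one common reason for `is_available()` returning False.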
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download Qwen/Qwen2.5-0.5B-Instruct --local-dir Qwen2.5-0.5B-Instruct
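The same download can also be driven from Python. This is a sketch under a few assumptions: the helper names `configure_mirror` and `download_model` are made up for illustration, and the default `local_dir` is set to `model/Qwen2.5-0.5B-Instruct` so that it matches the path the server code loads from, which differs from the `--local-dir` value in the command above. `HF_ENDPOINT` is read when `huggingface_hub` is imported, so it is set first.

```python
import os


def configure_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Point huggingface_hub at a mirror; call before importing the library."""
    os.environ["HF_ENDPOINT"] = endpoint
    return endpoint


def download_model(
    repo_id: str = "Qwen/Qwen2.5-0.5B-Instruct",
    local_dir: str = "model/Qwen2.5-0.5B-Instruct",  # assumed target directory
) -> str:
    """Download every file of the repository into local_dir and return the path."""
    # Imported lazily so that configure_mirror() can run first
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Typical use (requires network access):
#   configure_mirror()
#   print(download_model())
```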
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from typing import List

# FastAPI application
app = FastAPI()


# Request body schema
class Message(BaseModel):
    role: str
    content: str


class RequestBody(BaseModel):
    model: str
    messages: List[Message]
    max_tokens: int = 100


# Local model path
local_model_path = "model/Qwen2.5-0.5B-Instruct"

# A local path loads the model from disk; a Hub model ID would be downloaded instead
model = AutoModelForCausalLM.from_pretrained(
    local_model_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(local_model_path)


# API route for text generation
@app.post("/v1/chat/completions")
async def generate_chat_response(request: RequestBody):
    # Extract the model name and messages from the request
    model_name = request.model
    messages = request.messages
    max_tokens = request.max_tokens
    print(request.model)

    # Flatten the OpenAI-style messages into one "role: content" string,
    # using attribute access on the Message objects
    combined_message = "\n".join(
        [f"{message.role}: {message.content}" for message in messages]
    )

    # Tokenize the combined string into model inputs
    inputs = tokenizer(
        combined_message, return_tensors="pt", padding=True, truncation=True
    ).to(model.device)

    try:
        # Generate the model output
        generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)

        # Decode the output (this includes the prompt tokens)
        response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

        # Format the response in the OpenAI style
        completion_response = {
            "id": "some-id",  # generate a unique ID here if needed
            "object": "text_completion",
            "created": 1678157176,  # timestamp (replace as needed)
            "model": model_name,
            "choices": [
                {
                    "message": {"role": "assistant", "content": response},
                    "finish_reason": "stop",
                    "index": 0
                }
            ]
        }
        return completion_response
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


# Start the FastAPI application
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
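The comments in the server code note that the hard-coded `id` and `created` values can be replaced. One possible way to do that is sketched below with the standard library only. The helper name `make_completion_response` and the `chatcmpl-` prefix are illustrative choices, and `"chat.completion"` is used as the object type because that is what OpenAI's chat completions endpoint returns.

```python
import time
import uuid


def make_completion_response(model_name: str, content: str) -> dict:
    """Build an OpenAI-style chat completion body with a fresh id and timestamp."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",  # unique per response
        "object": "chat.completion",
        "created": int(time.time()),           # Unix time in seconds
        "model": model_name,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": content},
                "finish_reason": "stop",
            }
        ],
    }


print(make_completion_response("Qwen/Qwen2.5-0.5B-Instruct", "2"))
```

Inside the route, the dictionary literal could then be replaced by a call such as `make_completion_response(model_name, response)`.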
python x.py
$ python main.py
INFO: Started server process [20488]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
curl -X 'POST' 'http://127.0.0.1:8000/v1/chat/completions' -H 'Content-Type: application/json' -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct","messages":[{"role":"system","content":"You are a crazy man."},{"role":"user","content":"can you tell me 1+1=?"}],"max_tokens":100}'
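The same request can be sent from Python, which avoids shell-quoting issues on Windows. This is a minimal sketch using only the standard library; the function names `build_request` and `chat` are illustrative, and the URL assumes the server from this article is running locally on port 8000.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_request(messages, model="Qwen/Qwen2.5-0.5B-Instruct", max_tokens=100):
    """Create the POST request carrying the JSON body the server expects."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs) -> str:
    """Send the request and return the assistant text from the first choice."""
    with urllib.request.urlopen(build_request(messages, **kwargs)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


# Typical use (requires the server to be running):
#   print(chat([
#       {"role": "system", "content": "You are a crazy man."},
#       {"role": "user", "content": "can you tell me 1+1=?"},
#   ]))
```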
{"id":"some-id","object":"text_completion","created":1678157176,"model":"Qwen/Qwen2.5-0.5B-Instruct","choices":[{"message":{"role":"assistant","content":"system: You are a crazy man.\nuser: can you tell me 1+1=? \nalgorithm:\n1. Create an empty string variable called sum\n2. Add the first number to the sum\n3. Repeat step 2 until there is no more numbers left in the list\n4. Print out the value of the sum variable\n\nPlease provide the Python code for this algorithm.\n\nSure! Here's the Python code that performs the addition operation as described:\n\n```python\n# Initialize the sum with the first number\nsum = \"1\"\n\n# Loop until there are no more numbers"},"finish_reason":"stop","index":0}]}
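The reply above begins with the `system:` and `user:` lines because the server decodes the whole generated sequence, prompt included, and because the prompt is a plain `role: content` join rather than the model's chat format. A possible adjustment, sketched here and not taken from the original code, is to build the prompt with `tokenizer.apply_chat_template` and decode only the newly generated tokens. The helper names `strip_prompt` and `generate_reply` are illustrative, and `messages` is assumed to be a list of plain dicts with `role` and `content` keys.

```python
def strip_prompt(generated_ids, prompt_length: int):
    """Drop the first prompt_length tokens, keeping only the new ones."""
    return generated_ids[prompt_length:]


def generate_reply(model, tokenizer, messages, max_new_tokens: int = 100) -> str:
    """Generate an answer for chat messages and return only the assistant text."""
    # Render the messages with the chat template shipped with the tokenizer
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # generate() returns prompt + continuation; keep the continuation only
    prompt_length = inputs["input_ids"].shape[1]
    new_ids = strip_prompt(output_ids[0], prompt_length)
    return tokenizer.decode(new_ids, skip_special_tokens=True)


print(strip_prompt([101, 102, 103, 7, 8], 3))  # → [7, 8]
```

In the route, the Pydantic `Message` objects would first need converting to dicts, for example `[{"role": m.role, "content": m.content} for m in messages]`.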