Running DeepSeek Locally with HuggingFace + LangChain

Introduction

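DeepSeek is one of the most closely watched large language model families of recent years. With HuggingFace Transformers and LangChain, you can run DeepSeek models on your own machine for natural language processing tasks such as question answering. This article uses macOS on Apple Silicon as an example and shows how to deploy and call DeepSeek locally from Python.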

1. Environment Setup

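Make sure Python 3.10 or later is installed (recent transformers and LangChain releases no longer support Python 3.8).
Register a HuggingFace account:
- Sign up on the HuggingFace website.
- Create an API token under Settings > Access Tokens.
Log in to HuggingFace from the terminal:
```bash
huggingface-cli login
```
Enter the API token when prompted (recent huggingface_hub releases also provide the equivalent `hf auth login`). The DeepSeek models used in this article are public, so logging in is optional for them; a token is required for gated or private models.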
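For scripts where an interactive prompt is inconvenient, the token can also be supplied from Python. This sketch assumes the token is stored in an `HF_TOKEN` environment variable (a name huggingface_hub itself recognizes) and calls `huggingface_hub.login`; the helper names `read_hf_token` and `login_from_env` are hypothetical.

```python
import os
from typing import Optional


def read_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `env_var`, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var: str = "HF_TOKEN") -> bool:
    """Log in with the token from `env_var`; return False (doing nothing) if no token is set."""
    token = read_hf_token(env_var)
    if token is None:
        return False
    # huggingface_hub is installed as a dependency of transformers
    from huggingface_hub import login

    login(token=token)
    return True


if __name__ == "__main__":
    print("Logged in." if login_from_env() else "HF_TOKEN is not set; skipping login.")
```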

建议使用虚拟环境：

1 2	python3 -m venv venv source venv/bin/activate
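Once the environment is activated, a quick stdlib-only check can confirm that `python` now resolves to the venv interpreter and is recent enough. The `(3, 10)` default threshold and the function names are assumptions; set the minimum to whatever your installed transformers and LangChain versions require.

```python
import sys
from typing import Tuple


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix then differs from the base interpreter's prefix."""
    return sys.prefix != sys.base_prefix


def python_is_at_least(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """Compare the running interpreter's (major, minor) version against `minimum`."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    print("Inside a virtual environment:", in_virtualenv())
    print("Python version OK:", python_is_at_least())
```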

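Install the dependencies (`langchain-huggingface` provides the `langchain_huggingface` module used in sections 4 and 5):

```bash
pip install torch transformers langchain langchain-huggingface
```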

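PyTorch on Apple Silicon:
The standard macOS arm64 wheels on PyPI already include the MPS (Metal Performance Shaders) backend, which requires macOS 12.3 or later, so no special index URL is needed and the `pip install` above is enough. To install or upgrade PyTorch on its own:
```bash
pip3 install --upgrade torch
```
torchvision and torchaudio are not installed here because this tutorial does not use them; both are available for arm64 macOS if you need them later.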
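After installation, the installed versions can be listed with the standard library's `importlib.metadata`. The code in sections 4 and 5 imports `langchain_huggingface`, which is published as the separate `langchain-huggingface` distribution, so it is included in the (assumed) package list below.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distributions this tutorial relies on (PyPI names, not import names)
REQUIRED = ["torch", "transformers", "langchain", "langchain-huggingface"]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version string, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:<24} {version or 'NOT INSTALLED'}")
```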

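Check that PyTorch can actually use Metal (MPS) acceleration:
```python
import torch

print(torch.backends.mps.is_available())  # True if the MPS (Metal) backend can be used
x = torch.rand(5, 3, device="mps")  # create a random tensor directly on the Apple GPU
print(x)
```
If the output looks similar to the following (the values are random), the installation works:
```
True
tensor([[0.1234, 0.5678, 0.9101],
        [0.1122, 0.3344, 0.5566],
        [0.7788, 0.9900, 0.1234],
        [0.2345, 0.6789, 0.0123],
        [0.4567, 0.8901, 0.2345]], device='mps:0')
```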
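If PyTorch downloads slowly, you can install from the Tsinghua University PyPI mirror by passing `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip.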
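Instead of hard-coding `device="mps"`, a small helper can pick the best available backend and fall back to the CPU, so the same script also runs on machines without an Apple GPU. It also sets `PYTORCH_ENABLE_MPS_FALLBACK=1`, a PyTorch environment variable that lets operators not yet implemented on MPS run on the CPU instead of raising an error; setting it before `torch` is imported is the safe choice. The name `pick_device` is hypothetical.

```python
import os

# Let ops that MPS does not implement fall back to the CPU (set before importing torch)
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device() -> str:
    """Return "mps" on Apple Silicon, "cuda" on NVIDIA GPUs, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # torch not installed: nothing to accelerate
        return "cpu"
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

The returned string can be passed directly as `pipeline(..., device=pick_device())`.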

2. Downloading a DeepSeek Model

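Browse the DeepSeek models on HuggingFace (published under the deepseek-ai organization):
- DeepSeek-R1-Distill-Qwen-1.5B (`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`) is used in this article: with 1.5B parameters it is small enough to run on a laptop.
- Larger DeepSeek models such as `deepseek-ai/deepseek-llm-7b-base` also work but need considerably more memory.
Transformers can download the model automatically the first time it is used, or you can download it to a local directory in advance.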
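For the manual route, `huggingface_hub.snapshot_download` fetches every file of a model repository into a directory of your choice. In this sketch the `models/<name>` layout and the helper names are assumptions for illustration.

```python
from pathlib import Path

REPO_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"


def local_dir_for(repo_id: str, root: str = "models") -> Path:
    """Hypothetical layout: "org/name" on the Hub -> <root>/name on disk."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str = REPO_ID, root: str = "models") -> str:
    """Download all files of `repo_id` into local_dir_for(repo_id) and return that path."""
    # Imported lazily so the path helper above works without huggingface_hub installed
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id, root))
```

Calling `download_model()` downloads several GB of weights; afterwards the returned directory can be passed as `pipeline(..., model=<directory>)`.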

3. A Minimal Example: Calling DeepSeek Locally from Python

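```python
from transformers import pipeline

# Downloads the model on first run, then loads it from the local HuggingFace cache
pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
messages = [
    {"role": "user", "content": "What is DeepSeek?"},
]
# Cap the reply length; R1-style reasoning can be long, so raise this if answers are cut off
msg = pipe(messages, max_new_tokens=512)
print(msg)
```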
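The printed result is the raw pipeline output. With chat-style input, recent Transformers versions return the whole conversation under `generated_text`, with the model's reply appended as the last message, and R1-distilled models typically place their chain of thought before a closing `</think>` tag. The helpers below (hypothetical names) assume that output shape; depending on the chat template version, the opening `<think>` may already be part of the prompt, so they split on the closing tag only.

```python
from typing import Any, Dict, List, Tuple


def extract_reply(outputs: List[Dict[str, Any]]) -> str:
    """Return the assistant's reply from a text-generation pipeline result."""
    generated = outputs[0]["generated_text"]
    # Chat input: generated_text is the message list with the new reply appended last
    if isinstance(generated, list):
        return generated[-1]["content"]
    return generated  # plain-string prompt: generated_text is already a string


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split R1-style output into (reasoning, answer) at the closing </think> tag."""
    reasoning, sep, answer = text.partition("</think>")
    if not sep:  # no reasoning block found: treat everything as the answer
        return "", text.strip()
    return reasoning.replace("<think>", "").strip(), answer.strip()
```

For example, `reasoning, answer = split_reasoning(extract_reply(msg))` separates the model's thinking from its final answer.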

If the model has already been downloaded, pass its local directory as the `model` argument instead of the repo id.
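A small sketch of that switch: use the local directory when it looks like a downloaded model (it contains a `config.json`, as Transformers model repositories normally do) and otherwise fall back to the Hub repo id. The function name and the `config.json` heuristic are assumptions.

```python
from pathlib import Path
from typing import Union


def resolve_model(local_dir: Union[str, Path], repo_id: str) -> str:
    """Prefer a downloaded copy in `local_dir`; otherwise return the Hub repo id."""
    path = Path(local_dir).expanduser()
    # A downloaded Transformers model directory normally contains config.json
    return str(path) if (path / "config.json").is_file() else repo_id
```

Usage: `pipeline("text-generation", model=resolve_model("models/DeepSeek-R1-Distill-Qwen-1.5B", "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"))`.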

4. Calling DeepSeek Through LangChain

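LangChain makes it easier to build more complex applications, such as multi-step chains and chat-style assistants. The following example wraps the Transformers pipeline in LangChain's `HuggingFacePipeline` and pipes a prompt template into it: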

```python
import torch
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate

# Use the Apple Silicon GPU (MPS) when available, otherwise fall back to the CPU
if torch.backends.mps.is_available():
    device = "mps"
    print("Using Apple Silicon GPU (MPS) for inference.")
else:
    device = "cpu"
    print("Using CPU for inference.")

model = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    device=device,  # "mps" = Metal Performance Shaders backend on Apple Silicon
    max_new_tokens=512,  # maximum number of generated tokens (the prompt is not counted)
    truncation=True,  # truncate inputs that exceed the model's maximum input length
    return_full_text=False,  # return only the generated text, not the prompt
)

llm = HuggingFacePipeline(pipeline=model)

template = PromptTemplate.from_template("tell me a joke about {topic} in {language}.")

# LCEL composition: the formatted prompt is passed to the LLM
chain = template | llm
topic = input("Enter a topic: ")
language = input("Enter a language: ")

response = chain.invoke({"topic": topic, "language": language})
print(response)
```

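5. More Complex Tasks with LangChain

This example chains two summarization passes and then answers questions about the summary with an extractive QA model. Note that it uses small task-specific models (DistilBART for summarization, DistilBERT for question answering) rather than DeepSeek.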

```python
import torch
from transformers import pipeline
from transformers.utils.logging import set_verbosity_error
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate

# Silence Transformers warnings so the interactive output stays readable
set_verbosity_error()

device = "mps" if torch.backends.mps.is_available() else "cpu"

# Load the summarization model once and reuse it for both passes
summarization_pipeline = pipeline(
    "summarization",
    model="sshleifer/distilbart-cnn-12-6",
    device=device,
    truncation=True,  # truncate inputs longer than the model's maximum input length
)
summarizer = HuggingFacePipeline(pipeline=summarization_pipeline)
# Second pass summarizes the first summary again, sharing the already loaded weights
refiner = HuggingFacePipeline(pipeline=summarization_pipeline)

# Extractive QA model: answers are spans copied from the context text
qa_pipeline = pipeline(
    "question-answering", model="distilbert-base-cased-distilled-squad", device=device
)

# DistilBART is a summarizer, not an instruction follower, so it may ignore the {length} hint
summary_template = PromptTemplate.from_template(
    "Summarize the following text in a {length} way:\n\n{text}"
)

# prompt -> first summary -> refined summary
summarization_chain = summary_template | summarizer | refiner

text_to_summarize = input("\nEnter text to summarize:\n")
length = input("\nEnter the length (short/medium/long): ")

summary = summarization_chain.invoke({"text": text_to_summarize, "length": length})

print("\n🔹 **Generated Summary:**")
print(summary)

# Interactive Q&A over the generated summary
while True:
    question = input("\nAsk a question about the summary (or type 'exit' to stop):\n")
    if question.lower() == "exit":
        break

    qa_result = qa_pipeline(question=question, context=summary)

    print("\n🔹 **Answer:**")
    print(qa_result["answer"])
```
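DistilBART-CNN is a dedicated summarization model rather than an instruction-following LLM, so the words "short/medium/long" in the prompt are likely just summarized along with the text. A more direct lever is the generation length: the summarization pipeline accepts `min_length` and `max_length` (in tokens). This sketch maps the user's choice to such limits; the preset numbers and the helper names are assumptions to tune for your own texts.

```python
from typing import Any, Dict

# Hypothetical token budgets for each answer to the "short/medium/long" question
LENGTH_PRESETS: Dict[str, Dict[str, int]] = {
    "short": {"min_length": 10, "max_length": 40},
    "medium": {"min_length": 30, "max_length": 100},
    "long": {"min_length": 80, "max_length": 200},
}


def length_kwargs(choice: str) -> Dict[str, int]:
    """Map the user's length choice to generation limits, defaulting to "medium"."""
    return dict(LENGTH_PRESETS.get(choice.strip().lower(), LENGTH_PRESETS["medium"]))


def summarize(summarizer: Any, text: str, choice: str) -> str:
    """Run a transformers summarization pipeline with explicit length limits."""
    result = summarizer(text, truncation=True, do_sample=False, **length_kwargs(choice))
    return result[0]["summary_text"]
```

For example, `summarize(summarization_pipeline, text_to_summarize, length)` reuses the pipeline loaded in the code above.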