billwanhua
Boss, try the original DeepSeek model. Someone compressed it from 720GB down to just 131GB, and you only need more than 80GB of combined CPU + GPU memory to run it smoothly. This is really fun: everyone can run the world's top AI themselves:
https://www.reddit.com/r/selfhosted/comments/1ic8zil/yes_you_can_run_deepseekr1_locally_on_your_device/
- We shrank R1, the 671B-parameter model, from 720GB to just 131GB (an 80% size reduction) while keeping it fully functional and great
- No, the dynamic GGUFs do not work directly with Ollama, but they do work on llama.cpp, which supports sharded GGUFs and disk mmap offloading (see the sketch after this list). For Ollama, you will need to merge the GGUFs manually using llama.cpp.
- Minimum requirements: a CPU with 20GB of RAM (but it will be slow) and 140GB of disk space (to download the model weights)
- Optimal requirements: the sum of your VRAM + RAM should be 80GB+ (this will run reasonably well)
- No, you do not need hundreds of GB of RAM + VRAM, but if you have that much, you can get 140 tokens/s of throughput and 14 tokens/s for single-user inference with 2x H100s
- Our open-source GitHub repo: github.com/unslothai/unsloth
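
For anyone who wants to try this, here is a minimal sketch of loading one of these dynamic GGUFs through the llama-cpp-python bindings. The shard file name and the n_gpu_layers value below are illustrative assumptions, not taken from the thread; set n_gpu_layers to whatever your VRAM can hold (0 for CPU only), and mmap keeps the rest of the 131GB on disk:

```python
# Minimal sketch: run a dynamic DeepSeek-R1 quant locally via llama-cpp-python.
# Assumes `pip install llama-cpp-python` (built with GPU support) and that the
# sharded GGUF has already been downloaded; the file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",  # point at the first shard; llama.cpp loads the rest
    n_gpu_layers=20,  # number of transformer layers to offload to VRAM; tune to your GPU
    n_ctx=4096,       # context window; larger contexts need more memory
    use_mmap=True,    # mmap the weights from disk instead of reading everything into RAM
)

out = llm("Why is the sky blue?", max_tokens=256)
print(out["choices"][0]["text"])
```

The mmap flag is what makes the "minimum requirements" bullet possible: weight pages are faulted in from disk on demand, trading speed for RAM. For Ollama, the shards would first need to be merged into a single file; if I remember the CLI correctly, llama.cpp ships a gguf-split tool with a merge mode for this.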