Simon Willison's Weblog

That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1 - which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those - unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI - and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

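A minimal sketch of that Ollama invocation, assuming Unsloth's Hugging Face repository name from above; the `:Q8_0` quantization tag is my assumption - substitute whichever quantization your hardware can handle:

```shell
# Fetch the distilled 8B GGUF model from Hugging Face and start an
# interactive chat session (the :Q8_0 quantization tag is an assumption)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```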
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:

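Sketched out, the plugin install plus a first prompt looks something like this; the model ID passed to `-m` is my assumption - it should match whatever `ollama list` reports after the fetch above:

```shell
# Install the plugin that teaches LLM to talk to local Ollama models
llm install llm-ollama

# Run a prompt against the fetched model (ID assumed - check `ollama list`)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```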
Each response starts with a `<think>` ... `</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B - currently my favourite GPT-4 class model that I've run on my own machine - I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this - a 34GB download:

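A sketch of the command, assuming the same Unsloth naming pattern as the 8B model; the `:Q3_K_M` tag is my guess at a quantization in the ~34GB range for a 70B model:

```shell
# Fetch and chat with the distilled 70B model (quantization tag assumed)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```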
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

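Something like this - a sketch based on the format LLM uses for that file; the `api_key_name` entry ties the model to the key stored by `llm keys set deepseek`:

```yaml
# Register DeepSeek's OpenAI-compatible endpoint with LLM
# (field values sketched from LLM's extra-openai-models.yaml conventions)
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```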
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

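Spelled out as a sketch, with the same pelican-and-walrus prompt as an example:

```shell
# Store the DeepSeek API key (you'll be prompted to paste it in)
llm keys set deepseek

# Run a prompt through the hosted deepseek-reasoner model
llm -m deepseek-reasoner \
  'a joke about a pelican and a walrus who run a tea room together'
```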
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.