That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
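The command itself was lost in this copy. Ollama can pull GGUF builds directly from Hugging Face, so a plausible invocation looks like this (the quantization tag is an assumption; use whichever quant matches the roughly 8.5GB download described below):

```shell
# Fetch the Unsloth GGUF build from Hugging Face and start an interactive chat.
# The :Q8_0 quantization tag is an assumption - substitute the quant you want.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```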
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
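The commands were dropped in this copy; a sketch of the two steps, assuming the model name that Ollama registered for the fetched GGUF:

```shell
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama

# Run a prompt against the fetched model (the model name is an assumption,
# matching whatever `ollama list` reports for the downloaded GGUF)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```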
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the </think> tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
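The command was dropped in this copy; a sketch, assuming the same Hugging Face pull mechanism as before (the quantization tag is an assumption chosen to match the stated 34GB size):

```shell
# Fetch and run a ~34GB quantized build of the 70B distill via Ollama.
# The :Q3_K_M tag is an assumption - pick a quant that fits your hardware.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```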
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
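The snippet itself was lost in this copy; a sketch of what such an entry looks like for LLM's extra-openai-models.yaml (the model_id and api_key_name values are assumptions, chosen to match the llm commands described below):

```yaml
# Register DeepSeek's OpenAI-compatible endpoint with LLM.
# model_id/api_key_name are assumptions matching the commands that follow.
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```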
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
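Spelled out as commands:

```shell
# Store the DeepSeek API key under the name the config references
llm keys set deepseek

# Run a prompt against the hosted reasoner model
llm -m deepseek-reasoner 'prompt'
```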
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.