FRESH

Hacker News

Home

Llama 3.2: Revolutionizing edge AI and vision with open, customizable models

919 points by nmwnmw

by simonw

8 subcomments

I'm absolutely amazed at how capable the new 1B model is, considering it's just a 1.3GB download (for the Ollama GGUF version).
I tried running a full codebase through it (since it can handle 128,000 tokens) and asking it to summarize the code - it did a surprisingly decent job, incomplete but still unbelievable for a model that tiny: https://gist.github.com/simonw/64c5f5b111fe473999144932bef42...
More of my notes here: https://simonwillison.net/2024/Sep/25/llama-32/
I've been trying out the larger image models to using the versions hosted on https://lmarena.ai/ - navigate to "Direct Chat" and you can select them from the dropdown and upload images to run prompts.

by opdahl

11 subcomments

I'm blown away with just how open the Llama team at Meta is. It is nice to see that they are not only giving access to the models, but they at the same time are open about how they built them. I don't know how the future is going to go in the terms of models, but I sure am grateful that Meta has taken this position, and are pushing more openness.

by a_wild_dandan

6 subcomments

"The Llama jumped over the ______!" (Fence? River? Wall? Synagogue?)
With 1-hot encoding, the answer is "wall", with 100% probability. Oh, you gave plausibility to "fence" too? WRONG! ENJOY MORE PENALTY, SCRUB!
I believe this unforgiving dynamic is why model distillation works well. The original teacher model had to learn via the "hot or cold" game on text answers. But when the child instead imitates the teacher's predictions, it learns semantically rich answers. That strikes me as vastly more compute-efficient. So to me, it makes sense why these Llama 3.2 edge models punch so far above their weight(s). But it still blows my mind thinking how far models have advanced from a year or two ago. Kudos to Meta for these releases.

by alanzhuly

2 subcomments

Llama3.2 3B feels a lot better than other models with same size (e.g. Gemma2, Phi3.5-mini models).
For anyone looking for a simple way to test Llama3.2 3B locally with UI, Install nexa-sdk(https://github.com/NexaAI/nexa-sdk) and type in terminal:
nexa run llama3.2 --streamlit
Disclaimer: I am from Nexa AI and nexa-sdk is an open-sourced. We'd love your feedback.

by freedomben

3 subcomments

If anyone else is looking for the bigger models on ollama and wondering where they are, the Ollama blog post answered that for me. The are "coming soon" so they just aren't ready quite yet[1]. I was a little worried when I couldn't find them but sounds like we just need to be patient.
[1]: https://ollama.com/blog/llama3.2

by moffkalast

1 subcomments

I've just tested the 1B and 3B at Q8, some interesting bits:
- The 1B is extremely coherent (feels something like maybe Mistral 7B at 4 bits), and with flash attention and 4 bit KV cache it only uses about 4.2 GB of VRAM for 128k context
- A Pi 5 runs the 1B at 8.4 tok/s, haven't tested the 3B yet but it might need a lower quant to fit it and with 9T training tokens it'll probably degrade pretty badly
- The 3B is a certified Gemma-2-2B killer
Given that llama.cpp doesn't support any multimodality (they removed the old implementation), it might be a while before the 11B and 90B become runnable. Doesn't seem like they outperform Qwen-2-VL at vision benchmarks though.

by dhbradshaw

2 subcomments

Tried out 3B on ollama, asking questions in optics, bio, and rust.
It's super fast with a lot of knowledge, a large context and great understanding. Really impressive model.

by kingkongjaffa

3 subcomments

llama3.2:3b-instruct-q8_0 is performing better than 3.1 8b-q4 on my macbookpro M1. It's faster and the results are better. It answered a few riddles and thought experiments better despite being 3b vs 8b.
I just removed my install of 3.1-8b.
my ollama list is currently:
$ ollama list
NAME ID SIZE MODIFIED
llama3.2:3b-instruct-q8_0 e410b836fe61 3.4 GB 2 hours ago
gemma2:9b-instruct-q4_1 5bfc4cf059e2 6.0 GB 3 days ago
phi3.5:3.8b-mini-instruct-q8_0 8b50e8e1e216 4.1 GB 3 days ago
mxbai-embed-large:latest 468836162de7 669 MB 3 months ago

by kgeist

4 subcomments

Tried the 1B model with the "think step by step" prompt.
It gets "which is larger: 9.11 or 9.9?" right if it manages to mention that decimals need to be compared first in its step-by-step thinking. If it skips mentioning decimals, then it says 9.11 is larger.
It gets the strawberry question wrong even after enumerating all the letters correctly, probably because it can't properly count.

by JohnHammersley

0 subcomment

Ollama post: https://ollama.com/blog/llama3.2

by arnaudsm

1 subcomments

Is there an up-to-date leaderboard with multiple LLM benchmarks?
Livebench and Lmsys are weeks behind and sometimes refuse to add some major models. And press releases like this cherry pick their benchmarks and ignore better models like qwen2.5.
If it doesn't exist I'm willing to create it

by gdiamos

1 subcomments

Llama 3.2 includes a 1B parameter model. This should be 8x higher throughput for data pipelines. In our experience, smaller models are just fine for simple tasks like reading paragraphs from PDF documents.

by getcrunk

3 subcomments

Still no 14/30b parameter models since llama 2. Seriously killing real usability for power users/diy.
The 7/8B models are great for poc and moving to edge for minor use cases … but there’s a big and empty gap till 70b that most people can’t run.
The tin foil hat in me is saying this is the compromise the powers that be have agreed too. Basically being “open” but practically gimped for average joe techie. Basically arms control

by Ey7NFZ3P0nzAe

0 subcomment

Interesting that its scores are somewhat helow Pixtral 12B https://mistral.ai/news/pixtral-12b/

by kombine

1 subcomments

Are these models suitable for Code assistance - as an alternative to Cursor or Copilot?

by l5870uoo9y

1 subcomments

> These models are enabled on day one for Qualcomm and MediaTek hardware and optimized for Arm processors.
Do they require GPU or can they be deployed on VPS with dedicated CPU?

by gunalx

0 subcomment

3b was pretty good at multimodal (Norwegian) still a lot of gibberish at times, and way more sensitive than 8b but more usable than Gemma 2 2b at multi modal, fine at my python list sorter with args standard question. But 90b vision just refuses all my actually useful tasks like helping recreate the images in html or do anything useful with the image data other than describing it. Have not gotten as stuck with 70b or openai before. Insane amount of refusals all the time.

by resters

3 subcomments

This is great! Does anyone know if the llama models are trained to do function calling like openAI models are? And/or are there any function calling training datasets?

by chriskanan

1 subcomments

The assessments of visual capability really need to be more robust. They are still using datasets like VQAv2, which while providing some insight, have many issues. There are many newer datasets that serve as much more robust tests and that are less prone to being affected by linguistic bias.
I'd like to see more head-to-head comparisons with community created multi-modal LLMs as done in these papers:
https://arxiv.org/abs/2408.05334
https://arxiv.org/abs/2408.03326
I look forward to reading the technical report, once its available. I couldn't find a link to one, yet.

by sgt

2 subcomments

Anyone on HN running models on their own local machines, like smaller Llama models or such? Or something else?

by 404mm

4 subcomments

Can anyone recommend a webUI client for ollama?

by xrd

0 subcomment

I'm currently fighting with a fastapi python app deployed to render. It's interesting because I'm struggling to see how I encode the image and send it using curl. Their example sends directly from the browser and uses a data uri.
But, this is relevant because I'm curious how this new model allows image inputs. Do you paste a base64 image into the prompt?
It feels like these models can start not only providing the text generation backend, but start to replace the infrastructure for the API as well.
Can you input images without something in front of it like openwebui?

by josephernest

0 subcomment

Can it run with llama-cpp-python? If so, where can we find and download the gguf files? Are they distributed directly by meta, or are they converted to gguf format by third parties?

by thimabi

0 subcomment

Does anyone know how these models fare in terms of multilingual real-world usage? I’ve used previous iterations of llama models and they all seemed to be lacking in that regard.

by aussieguy1234

0 subcomment

When using meta.ai, its able to generate images as well as understand them. Has this also been open sourced or just a GPT4o style ability to see images?

by desireco42

0 subcomment

I have to say that running this model locally I was pleasantly suprised how well it ran, it doesn't use as much resources and produce decent output, comparable to ChatGPT, it is not quite as OpenAI but for a lot of tasks, since it doesn't burden the computer, it can be used with local model.
Next I want to try to use Aider with it and see how this would work.

by notpublic

4 subcomments

Llama-3.2-11B-Vision-Instruct does an excellent job extracting/answering questions from screenshots. It is even able to answer questions based on information buried inside a flowchart. How is this even possible??

by GaggiX

0 subcomment

The 90B seem to perform pretty weak on visual tasks compare to Qwen2-VL-72B: https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct, or am I missing something?

by sk11001

2 subcomments

Can one of thse models be run on a single machine? What specs do you need?

by bombi

1 subcomments

Is Termux enough to run the 1B model on Android?

0 subcomment

by oulipo

1 subcomments

Can the 3B run on a M1 macbook? It seems that it hogs all the memory. The 1B runs fine

0 subcomment

by troupo

1 subcomments

It's hallucinating so badly, it's kinda hilarious
Literally everything about the quote below is wrong. The problem with these "amazing ground-breaking models" is that you never know which knowledge they actually possess.
--- start quote ---
>>> what is the last version of Ash Framework for Elixir you know My training data goes up to December 2023, and according to the Ash Framework releases, the last stable version that I'm familiar with is:
* *Ash 2.14*: Released on April 8, 2021.
However, it's worth noting that Ash has been merged into Elixir itself, and most of its features have been integrated into the Elixir language and standard libraries.
In fact, Ash was one of the first frameworks to be included in Elixir, and it played a significant role in shaping the language. Many of Ash's core concepts, such as `defstruct/1` and `defmodule/1`, were later adopted by Elixir itself.
...
Keep in mind that Ash is no longer actively maintained as a separate framework, but its legacy continues to influence Elixir and its community.
--- end quote ---

by bottlepalm

1 subcomments

What mobile devices can the smaller models run on? iPhone, Android?

by ofermend

0 subcomment

Great release. Models just added to Hallucination Leaderboard: https://github.com/vectara/hallucination-leaderboard.
TL;DR: * 90B-Vision: 4.3% hallucination rate * 11B-Vision: 5.5% hallucination rate

by gdiamos

2 subcomments

Do inference frameworks like vllm support vision?

by stogot

1 subcomments

Surprised no mention of audio?

by dharma1

0 subcomment

are these better than qwen at codegen?

by taytus

0 subcomment

meta.ai still running on 3.1

by 84adam

0 subcomment

excited for this

by sva_

8 subcomments

Curious about the multimodal model's architecture. But alas, when I try to request access
> Llama 3.2 Multimodal is not available in your region.
It sounds like they input the continuous output of an image encoder into a transformer, similar to transfusion[0]? Does someone know where to find more details?
Edit:
> Regarding the licensing terms, Llama 3.2 comes with a very similar license to Llama 3.1, with one key difference in the acceptable use policy: any individual domiciled in, or a company with a principal place of business in, the European Union is not being granted the license rights to use multimodal models included in Llama 3.2. [1]
What a bummer.
0. https://www.arxiv.org/abs/2408.11039
1. https://huggingface.co/blog/llama32#llama-32-license-changes...

by minimaxir

3 subcomments

Off topic/meta, but the Llama 3.2 news topic received many, many HN submissions and upvotes but never made it to the front page: the fact that it's on the front page now indicates that moderators intervened to rescue it: https://news.ycombinator.com/from?site=meta.com (showdead on)
If there's an algorithmic penalty against the news for whatever reason, that may be a flaw in the HN ranking algorithm.

by monkfish328

3 subcomments

Zuckerberg has never liked having Android/iOs as gatekeepers i.e. "platforms" for his apps.
He's hoping to control AI as the next platform through which users interact with apps. Free AI is then fine if the surplus value created by not having a gatekeeper to his apps exceeds the cost of the free AI.
That's the strategy. No values here - just strategy folks.

by nmwnmw

1 subcomments

- Llama 3.2 introduces small vision LLMs (11B and 90B parameters) and lightweight text-only models (1B and 3B) for edge/mobile devices, with the smaller models supporting 128K token context.
- The 11B and 90B vision models are competitive with leading closed models like Claude 3 Haiku on image understanding tasks, while being open and customizable.
- Llama 3.2 comes with official Llama Stack distributions to simplify deployment across environments (cloud, on-prem, edge), including support for RAG and safety features.
- The lightweight 1B and 3B models are optimized for on-device use cases like summarization and instruction following.

by TheAceOfHearts

5 subcomments

I still can't access the hosted model at meta.ai from Puerto Rico, despite us being U.S. citizens. I don't know what Meta has against us.
Could someone try giving the 90b model this word search problem [0] and tell me how it performs? So far with every model I've tried, none has ever managed to find a single word correctly.
[0] https://imgur.com/i9Ps1v6

0 subcomment

by alexcpn

0 subcomment

In KungfuPanda there is this line that the Panda says "I love KungFuuuuuuuu", well I normally don't tell like this, but when I saw this and (starting to use this), I feel like yelling"I like Metaaaaa or is it LLAMMMAAA or is it Open source.. or is it this cool ecosystem which gives such value for free...

by 404mm

6 subcomments

Newbie question, what size model would be needed to have a 10x software engineer skills and no knowledge of the human kind (ie, no need to know how to make a pizza or sequence your DNA). Is there such a model?