GPT Neo: open-source GPT model, with pretrained 1.3B & 2.7B weight models

6 hours ago by sillysaurusx

Please test their models before you take it at face value.

Eleuther has a history of claiming to replicate projects when they haven't. For example, they shipped a DALL-E repo a few days after OpenAI announced it (https://twitter.com/theshawwn/status/1348017515897659392) which was broken, and they've walked back their GPT-3 replication claims to replicating 1.5B due to the fact that their architecture doesn't scale.

As far as I can tell, they're generating a large amount of hype with grandiose claims that they can't deliver on.

All I care about is whether you like their models and actually use them in practice. If you do, please let me know and I'll pipe down. But so far, I haven't heard of anyone who uses anything they've produced, and that worries me. Has anyone?

One specific claim they made: https://twitter.com/BlancheMinerva/status/134727697554780980...

"DALL-E is quite straight forward and already coded. We just need data to train it."

No, DALL-E is neither straightforward nor was it successfully coded, especially back on January 7th.

Anyway, carry on. I really don't like speaking badly of AI projects, and I hope that they succeed. The model release today is a good step forward, assuming it works. But it might be better to have the expectation of "the models don't work" until proven otherwise.

I'd also like to point out that there are some capable people doing work at Eleuther. Sid in particular is one of the best TPU hackers in the scene. I just wish they would scale down their claims, release more models, and not claim that they've done X until actually doing X. For example, the readme says they have "the ability to scale up to full GPT3 sizes (and possibly more!), using the mesh-tensorflow library," which they don't.

2 hours ago by loxias

Geez, that's really harsh.

I don't think any single thing you've claimed is factually wrong, and I don't speak for Eleuther nor am I attempting to justify their claims.

But.

As I understand it (mostly from lurking on their discord and reading publicly available materials) this is a group of volunteer academic types trying to replicate something great and awesome, with the only goal of giving it to the world. You could cut them some slack.

I can't speak for you, but as a "for free, weekend project" what they've done certainly makes me feel I need to up my game.

an hour ago by OgAstorga

This has nothing to do with the good work, awesome intentions, nor the fact that they have no financial incentives behind this.

Claiming something that is not true is in itself wrong.

an hour ago by stellaathena

I am sorry that I was misinformed about the state of our DALL-E replication when I made that tweet. It was not malicious - I was reporting what I had been told by someone else.

Yes, I was wrong. That said, I had hoped that maybe after two and a half months Shawn would stop holding it over my head.

2 hours ago by cookiengineer

What I find interesting about their marketing(?) is that they identified a market niche that they want to position themselves in.

Enterprise customers that have no idea about the technical details will just hear about OpenAI's success in this fancy new model and assume that Eleuther can deliver.

I mean, most "big data" projects, which are tiny in comparison with Alphabet's datasets, will probably work just fine with GPT-2.

And for Enterprise customers, hearing those claims and seeing some code, maybe a demo, is enough to start the consultancy process.

In my opinion, that's a policy problem that OpenAI introduced by not requiring full reproducibility of their models upon release: the code, the model, the training procedure, and the dataset.

Stakes are pretty high in the AI industry, and OpenAI actively influences it. In the beginning, my dream was that they would be a source of verification, audits, and "proof" that models are legit... yet lately I have the feeling that they just buzz around like everyone else.

To this date I haven't seen anyone replicate any of the DNC results, for example.

Anyways, just my two cents on this one.

an hour ago by leogao

To date, EleutherAI as an "organization" (read: basically a Discord server) has not really attempted any kind of marketing. It has no PR dept, just individuals tweeting about the work that Eleuther does.

3 hours ago by natch

Good perspective but I would like to hear the response from the developers before concluding too much.

This is not meant as a goad to you, just as more info for everyone: my understanding is that this is an open-source, community-of-like-minded-people type of project (as opposed to a bigco) that actively solicits contributions (by which I mean code and data), so from what I can tell anyone who sees room for improvement is welcome to step in.

I did find your comment helpful and informative; just adding another angle here.

2 hours ago by stellaathena

It's literally a couple people hanging out in a discord channel and doing this as a way to procrastinate their jobs.

3 hours ago by ImprobableTruth

Disclaimer: I know absolutely nothing about machine learning.

Isn't GPT-3 the architecture? Are they doing something different, or why else would it not scale?

2 hours ago by nmfisher

GPT-3 is the name for the architecture, but there are a few different versions/sizes. The OpenAI version that impressed us all was ~175B parameters; this is far smaller.

Going from 2.7B to 175B parameters will take more than just a few config tweaks. There's a whole bunch of hacks and tricks needed to coax a model to train at that scale; the Eleuther version is almost guaranteed to fail out of the box.

2 hours ago by minimaxir

It's worth noting that the GPT-3 paper did train models with saner sizes (e.g. 1.3B) as points of comparison. I am surprised/annoyed they never released them, though.

an hour ago by sendtown_expwy

I would guess that an average FAANG ML engineer could code up and successfully execute a forward/backward pass on a GPT-1 or GPT-2 model with a day of effort or less. (GPT-3 a little harder, but not significantly). But is that model actually going to perform well? Most likely no. Model performance varies significantly due to subtle details in data processing implementations, seemingly insignificant details in code, and even from different numerical methods of calculating the same semantics.
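To make "different numerical methods of calculating the same semantics" concrete, here is a toy Python sketch (not taken from any GPT codebase): two softmax functions that are mathematically identical but behave very differently on large logits.

```python
import math

def naive_softmax(xs):
    # Direct translation of the formula. Correct on paper, but math.exp
    # overflows for inputs above roughly 709 with 64-bit floats.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(xs):
    # Subtracting the max does not change the result mathematically, since
    # exp(x - m) / sum(exp(x_j - m)) == exp(x) / sum(exp(x_j)),
    # but it keeps every exponent <= 0, so nothing overflows.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1000.0, 1001.0]
print(stable_softmax(logits))  # roughly [0.269, 0.731]
try:
    naive_softmax(logits)
except OverflowError as err:
    print("naive softmax failed:", err)
```

Real bugs of this kind are usually subtler (a loss that slowly degrades instead of crashing), which is exactly why they are hard to spot by reading the code.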

If you don't believe me, consider that many ML researchers track their commits (or exact code versions) extremely carefully, because oftentimes they will make some change (or changes) they think are inconsequential and later find that actually, their model broke. If they made too many changes, whoops, guess you have to binary search over the diff to see what happened since your last "good run".
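The "binary search over the diff" step can be sketched in Python as below. This is a minimal illustration, assuming the changes form an ordered list, the first entry is the last known-good run, the last entry is known-bad, and the model stays broken once a breaking change is applied; `is_good` is a hypothetical callback standing in for "apply changes up to i, retrain, evaluate". For changes tracked as git commits, `git bisect` automates the same search.

```python
from typing import Callable, Sequence

def first_bad_change(changes: Sequence[str], is_good: Callable[[int], bool]) -> int:
    """Return the index of the first change whose run is "bad".

    Assumes changes[0] is known-good, changes[-1] is known-bad, and that
    breakage persists once introduced.
    """
    lo, hi = 0, len(changes) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Each probe is one (expensive) retrain-and-evaluate cycle.
        if is_good(mid):
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical history; pretend the 4th change silently broke training.
changes = ["baseline", "new tokenizer", "different init", "fp16 layer norm", "new LR schedule"]
print(changes[first_bad_change(changes, lambda i: i < 3)])  # fp16 layer norm
```

The payoff is that n changes cost about log2(n) training runs instead of n, which matters when each run takes days.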

If the people who spent months (if not years) tuning a model can't tell whether it will work from the code, how could anyone else? Most ML researchers will not bother with most code that doesn't give proof of results (in terms of a model that can actually be evaluated) because it is just so unlikely that it will actually work well. Now, it might "work" in the sense that it converges and does something when you prompt it with examples. But will this GPT-3 reimplementation actually outperform, say, the 10x smaller T5 checkpoint that was released by Google, or the other smaller language models others have released? If it doesn't, it's hard to argue that it's very useful at all.

I think that's the spirit of why the original commenter said what they did, but I still do applaud the efforts of this team (and hope that their implementation is, in fact, highly performant!)

3 hours ago by m00x

It's the model, not the architecture, but you could say the model contains the architecture.

42 minutes ago by nl

Almost all the challenges with GPT-sized models are engineering and training challenges, not architectural.

How do you train a model too big to fit in a single GPU? It's doable, but not simple. How do you update weights across your cluster? etc etc
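A rough back-of-the-envelope sketch of the "too big to fit in a single GPU" problem, in Python. The 16 bytes per parameter figure is an assumption (a common estimate for mixed-precision training with Adam), the 40 GB GPU is a hypothetical device, and activation memory is ignored entirely, so real requirements are higher.

```python
import math

BYTES_PER_PARAM = 16  # assumption: fp16 weights + fp16 grads + fp32 master copy + two fp32 Adam moments
GPU_MEMORY_GB = 40    # hypothetical single accelerator

def training_state_gb(n_params: float) -> float:
    # Memory for weights, gradients and optimizer state only; activations are extra.
    return n_params * BYTES_PER_PARAM / 1e9

def min_gpus(n_params: float) -> int:
    # Lower bound on devices needed just to hold that state, i.e. it has to be
    # sharded across GPUs with model/pipeline or optimizer-state parallelism.
    return math.ceil(training_state_gb(n_params) / GPU_MEMORY_GB)

for name, n in [("2.7B model", 2.7e9), ("175B model", 175e9)]:
    print(f"{name}: ~{training_state_gb(n):,.0f} GB of training state, at least {min_gpus(n)} GPUs")
```

Under these assumptions even the 2.7B model's training state (~43 GB) slightly exceeds one 40 GB card, and the 175B model needs at least 70 such cards before activations are counted, which is where the engineering difficulty comes from.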

6 minutes ago by vincentmarle

Does anyone know if there's a hosted version of this kind of GPT model somewhere? All I want to do is just call a GPT-2 API and get a response back, I'm not interested in setting up the entire infrastructure by myself.

8 hours ago by ve55

This is a nice release, but the title is a bit misleading as the released sizes (1.3B and 2.7B parameters) do not yet compare to the size of GPT-3 (175B), but rather GPT-2 (1.5B) instead (although future releases may have significantly more!).

Edit: title improved, thank you!

8 hours ago by pizza

Fixed title to reflect that, thanks

7 hours ago by ve55

I would perhaps change 'GPT-3' to just say 'GPT' instead, as a more salient fix.

6 hours ago by stellaathena

GPT-3 isn't a single model. It's a model architecture that is very closely followed by GPT-Neo. The 2.7B model is the exact same size as something OpenAI sells under the label "GPT-3"

6 hours ago by nl

This is incorrect. It's the GPT-3 model architecture and optimisations, and uses training techniques similar to GPT-3.

8 hours ago by nl

Yeah. They say they are doing a 10B release soon[1].

I suspect they have run into training issues since they are moving to a new repo[2]

[1] https://twitter.com/arankomatsuzaki/status/13737326468119674...

[2] https://github.com/EleutherAI/gpt-neox/

7 hours ago by chillee

It's more about hardware - these models were trained on TPUs, while GPT-NeoX is being trained on GPUs graciously provided by Coreweave.

7 hours ago by orra

Any idea what the required GPU time would cost (if not donated)? Will GPT-3 soon be just a commodity?
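One common way to ballpark this is the rule of thumb that training costs about 6 FLOPs per parameter per token. The Python sketch below applies it to GPT-3 175B with the roughly 300B training tokens reported in the GPT-3 paper; the sustained throughput per GPU and the price per GPU-hour are hypothetical placeholders, not quotes from any provider, so swap in real numbers before trusting the result.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per token for the forward pass
    # and ~4 for the backward pass, so ~6 in total.
    return 6.0 * n_params * n_tokens

def gpu_hours(total_flops: float, sustained_flops_per_gpu: float) -> float:
    return total_flops / sustained_flops_per_gpu / 3600.0

flops = training_flops(175e9, 300e9)  # ~3.15e23 FLOPs
# Hypothetical placeholders: 50 TFLOP/s actually sustained per GPU, $2 per GPU-hour.
hours = gpu_hours(flops, 50e12)
print(f"~{flops:.2e} FLOPs, ~{hours:,.0f} GPU-hours, ~${hours * 2.0:,.0f}")
```

With those placeholder numbers this comes out around 1.75 million GPU-hours. It covers a single successful run only, not the failed and debugging runs described elsewhere in this thread.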

8 hours ago by graiz

A great start for a truly open approach. It's ironic that OpenAI isn't particularly open about its tech.

6 hours ago by victor9000

It was disappointing to see just how quickly ClosedAI changed its tune once they produced something of value.

4 hours ago by catillac

Is ClosedAI some counterpart to OpenAI with a clever name?

2 hours ago by qPM9l3XJrF

Many people (correctly, in my view) criticized OpenAI for the name, saying that openness should be evaluated on a case by case basis. Glad they listened to critics instead of trying to maintain consistency for its own sake.

8 hours ago by pjfin123

GPT-3 paper:

"Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters"

README:

"1T or bust my dudes [...] An implementation of model & data parallel GPT2 & GPT3 -like models, with the ability to scale up to full GPT3 sizes (and possibly more!)"

It seems the largest model they released is 2.7 billion parameters or ~0.01 the size of GPT-3. The most interesting part about GPT-3 was its size and it seems this is only "GPT-3-like" in architecture.

I also have a translation library with ~100 million (0.001 GPT-3) parameters:

https://github.com/argosopentech/argos-translate

5 hours ago by stellaathena

GPT-3 is a model architecture, not a model. While the largest GPT-3 model is 175B, that very paper has a table that includes "GPT-3 XL" (1.3B) and "GPT-3 2.7B" as models in the GPT-3 architecture. The 2.7B model is the same size as Ada, a model that OpenAI currently sells API access to under the moniker "GPT-3"
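The point that these are all the same architecture at different sizes can be checked with a rough parameter count. The Python sketch below uses the standard ~12·n_layers·d_model² approximation for decoder-only transformers plus the token embedding matrix; the layer counts and widths are taken from the GPT-3 paper's model-size table and are worth double-checking against it.

```python
# Layer count and model width for three sizes from the GPT-3 paper's table.
CONFIGS = {
    "GPT-3 XL": {"n_layers": 24, "d_model": 2048},
    "GPT-3 2.7B": {"n_layers": 32, "d_model": 2560},
    "GPT-3 175B": {"n_layers": 96, "d_model": 12288},
}
VOCAB_SIZE = 50257  # GPT-2/GPT-3 BPE vocabulary

def approx_params(n_layers: int, d_model: int, vocab: int = VOCAB_SIZE) -> int:
    # Each block has ~4*d^2 attention weights (Q, K, V, output projection)
    # plus ~8*d^2 MLP weights (d -> 4d -> d), i.e. ~12*d^2 per layer.
    # Add the token embeddings; biases, layer norms and position
    # embeddings are ignored because they are comparatively small.
    return 12 * n_layers * d_model**2 + vocab * d_model

for name, cfg in CONFIGS.items():
    print(f"{name}: ~{approx_params(**cfg) / 1e9:.2f}B parameters")
```

This prints about 1.31B, 2.65B and 174.56B, close to the nominal 1.3B, 2.7B and 175B: only depth and width change between sizes.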

5 hours ago by Dylan16807

None of the other models are even close to the big one, and the paper also suggests calling the big one "GPT-3". And people do that very often in practice. So it's often ambiguous but saying the term only means the architecture isn't right either.

8 hours ago by f430

What does he mean when he says "1T or bust"? Is he referring to 1 trillion parameters? Are you saying that GPT-3 has 2.7 trillion parameters? Does it mean that to get to GPT-3 level it needs 100x more data?

7 hours ago by jon_tau

The saying comes from a slide by Noam Shazeer (see: https://www.youtube.com/watch?v=HgGyWS40g-g&ab_channel=Tenso...). It just means the current goal should be to have models with 1 trillion parameters.

6 hours ago by Voloskaya

GPT-3 has 175 billion parameters, so they need to scale up by roughly 64x. They already have an amount of data comparable to what OpenAI used, so it's about scaling the number of GPUs.

5 hours ago by f430

I see, so does that mean GPT Neo is 64x less powerful?

7 hours ago by pjfin123

I interpreted it as aspiring to a trillion parameters, but I'm not sure.

3 hours ago by PufPufPuf

Did anyone manage to successfully run inference in the provided Google Colab (https://colab.research.google.com/github/EleutherAI/GPTNeo/b...)? I can run training, but can't manage to make the inference (even from a pre-trained model) work.

an hour ago by stellaathena

Hi! Thanks for trying it out. There was a bug that should now be fixed. When I run the example unicorn prompt, I get the following. Don't hesitate to open an issue if you're still having trouble.

"In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

Bebek Uranzoglu, another member of the research team from the University of British Columbia, was working on a project the Latino-Canadian rodeo competition equipos to document a rare and remarkable ecosystem in the Andes Mountains.

His curiosity was piqued when he spotted an adolescent herd of about 10 unicorns foraging in a forest near the valley of the Jumbo Flu Group. The unicorns — whose numbers once swelled to 46,000 — were perched on the forest floor and watched the researchers work.

Urizoglu grew excited when he spotted another group that seemed to be thriving in an area below the herd. The team hoped the apparent population growth would indicate a human presence.

But when a team of researchers set up a camera trap, they were surprised to find the unicorns in the first place, and in a forest near a lake — in fact the forest was almost entirely made up of the animals. Despite their own magical presence, the team could not see the herd was populated by humans.

“The whole place almost smelled like animals,” says Bebek. “We were never able to find human footprints at any of the points we stood at. The trees were so large, you wouldn’t have been able to walk 40 meters through them. We assumed that the truth of the matter was, ‘Well the deer didn’t like this forest at all.’”

3 hours ago by MasterScrat

Same here. I managed to make it "work" in the sense that it wouldn't crash during inference, but then it generated gibberish. Has anyone managed to make it work reliably?

an hour ago by aeroscripts

The problem in my case was "train_steps" in the model json file. Default is 0. The notebook sets it to 401000 which works.
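If you want to apply the same fix outside the notebook, here is a small Python sketch. Only the `train_steps` key and the 401000 value come from the comment above; the function name `ensure_train_steps` and the config path are hypothetical.

```python
import json
from pathlib import Path

def ensure_train_steps(config_path: str, steps: int = 401000) -> dict:
    """Set "train_steps" in a model's JSON config when it is missing or 0."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    if not config.get("train_steps"):  # a missing key and 0 are both treated as unset
        config["train_steps"] = steps
        path.write_text(json.dumps(config, indent=2))
    return config

# Usage (hypothetical path to your own model config):
# ensure_train_steps("path/to/your_model.json")
```

A non-zero value already in the file is left alone, so running it twice is harmless.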

7 hours ago by choxi

Is there anything a non-AI researcher can do to help support this project? Is there a way to donate money? Or could a software engineer help with testing, tooling, or other kinds of infrastructure?

I was really excited about OpenAI's original plan and still believe that an open source solution is the best way to prevent the potential negative impacts AI might have on society. I can sort of appreciate why OpenAI went the route of going private and trying to monetize their work instead: it might prevent people from using their work nefariously, and it will probably provide them with far more capital to continue their efforts. But I trust humanity as a collective more than any particular group of people in the long run. I'm sure there are many others like me who would be eager to help out if they could.

Edit: EleutherAI has a whole page on their site about how others can contribute: https://www.eleuther.ai/get-involved/. I didn't see anything about accepting donations though, if anyone involved with the project was interested in setting up a crowdfunding account somewhere I'd be eager to donate.

6 hours ago by zmix

You can indirectly support the project by supporting the host of their data, https://the-eye.eu

Right on the front page they write:

    > Hey there fans! We are currently looking for help funding large storage upgrades, 
    > if you want to help us serve more data see our donation options (crypto, etc) 
    > Thanks for reading, happy downloading!

6 hours ago by stellaathena

The Eye has been a phenomenal partner and enables a lot of what we do. In addition to providing terabytes of storage for free, they also help us out with CPU from time to time.

6 hours ago by punnerud

Indirectly they say you can donate money, in the form of computation that can be rented: “As an independent organization, we are dependent upon donations for our computing costs. We are always interested in speaking with people who can donate compute times.”

8 hours ago by FL33TW00D

Whilst BERT is obviously not the same as GPT-3 in architecture, Amazon's recent paper discussing architecture optimizations for BERT (https://arxiv.org/pdf/2010.10499.pdf) seems pretty relevant here, given the chance to improve upon GPT-3's architecture (because it surely isn't the best we can get). Has the Eleuther.ai team been exploring this?