Hacker News

4 years ago by sillysaurusx

Please test their models before you take them at face value.

Eleuther has a history of claiming to replicate projects when they haven't. For example, they shipped a DALL-E repo a few days after OpenAI announced it (https://twitter.com/theshawwn/status/1348017515897659392) which was broken, and they've walked back their GPT-3 replication claims to replicating 1.5B because their architecture doesn't scale.

As far as I can tell, they're generating a large amount of hype with grandiose claims that they can't deliver on.

All I care about is whether you like their models and actually use them in practice. If you do, please let me know and I'll pipe down. But so far, I haven't heard of anyone who uses anything they've produced, and that worries me. Has anyone?

One specific claim they made: https://twitter.com/BlancheMinerva/status/134727697554780980...

"DALL-E is quite straight forward and already coded. We just need data to train it."

No, DALL-E is neither straightforward nor was it successfully coded, especially back on January 7th.

Anyway, carry on. I really don't like speaking badly of AI projects, and I hope that they succeed. The model release today is a good step forward, assuming it works. But it might be better to have the expectation of "the models don't work" until proven otherwise.

I'd also like to point out that there are some capable people doing work at Eleuther. Sid in particular is one of the best TPU hackers in the scene. I just wish they would scale down their claims, release more models, and not claim that they've done X until actually doing X. For example, the readme says they have "the ability to scale up to full GPT3 sizes (and possibly more!), using the mesh-tensorflow library," which they don't.

4 years ago by loxias

Geez, that's really harsh.

I don't think any single thing you've claimed is factually wrong, and I don't speak for Eleuther nor am I attempting to justify their claims.

But.

As I understand it (mostly from lurking on their discord and reading publicly available materials) this is a group of volunteer academic types trying to replicate something great and awesome, with the only goal of giving it to the world. You could cut them some slack.

I can't speak for you, but as a "for free, weekend project" what they've done certainly makes me feel I need to up my game.

4 years ago by OgAstorga

This has nothing to do with their good work, their awesome intentions, or the fact that they have no financial incentives behind this.

Claiming something that is not true is in itself wrong.

4 years ago by stellaathena

I am sorry that I was misinformed about the state of our DALL-E replication when I made that tweet. It was not malicious - I was reporting what I had been told by someone else.

Yes, I was wrong. That said, I had hoped that maybe after two and a half months Shawn would stop holding it over my head.

4 years ago by loxias

> Claiming something that is not true is in itself wrong.

I 100% agree with this.

I also think that one catches more flies with honey than vinegar, and the criticism in the parent comment, while possibly valid, could be phrased more encouragingly and less combatively. It's easy to criticize, it's hard to create, and it's even harder to release.

4 years ago by wiz21c

> Claiming something that is not true is in itself wrong.

Yup. In any project, and especially one done for the community, where the only reward you get is satisfaction and fame, success is tightly tied to communication. Good, honest communication is what builds trust.

4 years ago by undefined

[deleted]

4 years ago by nullc

Wouldn't it be nice if OpenAI were like .. actually open? :P

4 years ago by ryanackley

Not just that. To even get access to their API, you need to apply. That, I'm afraid, is the future of AI without projects like this: elites controlling AI and deciding who is "worthy" to use it.

I'm sure they have the best of intentions but "worthiness" is subjective.

4 years ago by TremendousJudge

Depends on who you mean by "they". If you mean the researchers, then sure, they probably actually believe whatever's written in their ethics statement.

Now, the actual owners? I don't believe it for a second

4 years ago by brunoluiz

Considering the social consequences of the last decade's easy access to APIs and data, I am quite happy that these initiatives are cautious about opening up software that can have a huge impact on society.

4 years ago by hesdeadjim

Unfortunately, the cat is out of the bag. Their methods are documented and the results exciting, so to a bad actor (especially state-sponsored) it's completely justified to spend millions attempting to replicate their results from what is publicly available.

4 years ago by You-Are-Right

Freedom is Slavery.

4 years ago by natch

Good perspective but I would like to hear the response from the developers before concluding too much.

This is not meant as a goad to you, but as more info for everyone: my understanding is that it's an open-source, community-of-like-minded-people type of project (as opposed to a bigco) and it actively solicits contributions (by which I mean code and data), so from what I can tell, anyone seeing room for improvement is welcome to step in.

I did find your comment helpful and informative; just adding another angle here.

4 years ago by stellaathena

It's literally a couple of people hanging out in a Discord channel and doing this as a way to procrastinate from their jobs.

4 years ago by mirekrusin

Peak laziness: build AI to do your job so you have more time to build AI.

4 years ago by ShamelessC

I think it would help your PR efforts to let people know that more often. People hear "we are <org_name>" and assume you're, you know, an organization. That comes with some amount of expected bug fixes, documentation, verifying results _before_ you release, etc.

I'm not really sure how much you gain by attracting tons of people to the discord if you finally release and everyone has unreasonable expectations due to the way you advertise the group as a whole.

4 years ago by 6gvONxR4sf7o

People are always claiming to release replicated models after replicating the architecture (or the main parts of it) without testing whether it produces the same level of results. It's maddening, especially when the level of results is so directly measurable: just measure what the paper measured. Not that it's easy, but it is concrete.
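To make that concrete, here's a toy sketch of the kind of check I mean: compare a replication's measured scores against the paper's reported scores on the same benchmarks and flag the gaps. The benchmark names appear in the GPT-3 paper, but the scores below are hypothetical, just for illustration.

```python
# Hypothetical scores: the paper's reported numbers vs. a replication's
# measured numbers on the same benchmarks.
reported = {"LAMBADA acc": 0.762, "WikiText-103 ppl": 10.8}
measured = {"LAMBADA acc": 0.571, "WikiText-103 ppl": 13.9}

def replication_gaps(reported, measured, rel_tol=0.05):
    """Return the benchmarks whose measured score differs from the
    reported score by more than rel_tol (relative)."""
    gaps = {}
    for name, ref in reported.items():
        if abs(measured[name] - ref) / ref > rel_tol:
            gaps[name] = (ref, measured[name])
    return gaps

# Both benchmarks are flagged here: the replication is well off the paper.
print(replication_gaps(reported, measured))
```

Nothing fancy, but running something like this before saying "replicated" would settle the question.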

4 years ago by stellaathena

Our README has comparisons with GPT-2 and GPT-3.

4 years ago by 6gvONxR4sf7o

Is there anything on its few shot learning performance? I took few shot learning as the main point of GPT-3. Sorry if I just overlooked it, but I don’t see anything on few shot learning in the readme.

4 years ago by ve55

This is a nice release, but the title is a bit misleading as the released sizes (1.3B and 2.7B parameters) do not yet compare to the size of GPT-3 (175B), but rather GPT-2 (1.5B) instead (although future releases may have significantly more!).

Edit: title improved, thank you!

4 years ago by nl

Yeah. They say they are doing a 10B release soon[1].

I suspect they have run into training issues since they are moving to a new repo[2]

[1] https://twitter.com/arankomatsuzaki/status/13737326468119674...

[2] https://github.com/EleutherAI/gpt-neox/

4 years ago by chillee

It's more about hardware: these models were trained on TPUs, while GPT-NeoX is being trained on GPUs graciously provided by CoreWeave.

4 years ago by orra

Any idea what the required GPU time would cost (if not donated)? Will GPT-3 just become a commodity soon?

4 years ago by pizza

Fixed title to reflect that, thanks

4 years ago by ve55

I would perhaps change 'GPT-3' to just say 'GPT' instead, as a more salient fix.

4 years ago by stellaathena

GPT-3 isn't a single model. It's a model architecture that is very closely followed by GPT-Neo. The 2.7B model is the exact same size as something OpenAI sells under the label "GPT-3".

4 years ago by nl

This is incorrect. It's the GPT-3 model architecture and optimisations, and uses training techniques similar to GPT-3.

4 years ago by graiz

A great start for a truly open approach. It's ironic that OpenAI isn't particularly open about its tech.

4 years ago by victor9000

It was disappointing to see just how quickly ClosedAI changed its tune once it produced something of value.

4 years ago by catillac

Is ClosedAI some counterpart to OpenAI with a clever name?

4 years ago by qPM9l3XJrF

Many people (correctly, in my view) criticized OpenAI for the name, saying that openness should be evaluated on a case by case basis. Glad they listened to critics instead of trying to maintain consistency for its own sake.

4 years ago by choxi

Is there anything a non-AI researcher can do to help support this project? Is there a way to donate money? Or could a software engineer help with testing, tooling, or other kinds of infrastructure?

I was really excited about OpenAI's original plan and still believe that an open source solution is the best way to prevent the potential negative impacts AI might have on society. I can sort of appreciate why OpenAI went private and is trying to monetize its work instead: it might prevent people from using that work nefariously, and it will probably provide them with far more capital to continue their efforts. But I trust humanity as a collective more than any particular group of people in the long run. I'm sure there are many others like me who would be eager to help out if they could.

Edit: EleutherAI has a whole page on their site about how others can contribute: https://www.eleuther.ai/get-involved/. I didn't see anything about accepting donations though, if anyone involved with the project was interested in setting up a crowdfunding account somewhere I'd be eager to donate.

4 years ago by zmix

You may indirectly support the project by supporting the host that hosts their data, https://the-eye.eu

Right on the front page they write:

    > Hey there fans! We are currently looking for help funding large storage upgrades, 
    > if you want to help us serve more data see our donation options (crypto, etc) 
    > Thanks for reading, happy downloading!

4 years ago by stellaathena

The Eye has been a phenomenal partner and enables a lot of what we do. In addition to providing terabytes of storage for free, they also help us out with CPU from time to time.

4 years ago by dannyw

The Eye stores amazing and important archives. Drives the data hoarding community.

4 years ago by punnerud

Indirectly they say you can donate money, in the form of computation that can be rented: “As an independent organization, we are dependent upon donations for our computing costs. We are always interested in speaking with people who can donate compute times.”

4 years ago by pjfin123

GPT Paper:

"Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters"

README:

"1T or bust my dudes [...] An implementation of model & data parallel GPT2 & GPT3 -like models, with the ability to scale up to full GPT3 sizes (and possibly more!)"

It seems the largest model they released is 2.7 billion parameters, or about 1.5% the size of GPT-3. The most interesting part about GPT-3 was its size, and it seems this is only "GPT-3-like" in architecture.

I also have a translation library with ~100 million parameters (about 0.06% of GPT-3):

https://github.com/argosopentech/argos-translate

4 years ago by stellaathena

GPT-3 is a model architecture, not a model. While the largest GPT-3 model is 175B, that very paper has a table that includes "GPT-3 XL" (1.3B) and "GPT-3 2.7B" as models in the GPT-3 architecture. The 2.7B model is the same size as Ada, a model that OpenAI currently sells API access to under the moniker "GPT-3".

4 years ago by Dylan16807

None of the other models are even close to the big one, and the paper also suggests calling the big one "GPT-3". And people do that very often in practice. So it's often ambiguous but saying the term only means the architecture isn't right either.

4 years ago by f430

What does he mean when he says "1T or bust"? Is he referring to 1 trillion parameters? Are you saying that GPT-3 has 2.7 trillion parameters? Does it mean that to get to GPT-3's level it needs 100x more data?

4 years ago by jon_tau

The saying comes from a slide by Noam Shazeer (see: https://www.youtube.com/watch?v=HgGyWS40g-g&ab_channel=Tenso...). It just means the current goal should be to have models with 1 trillion parameters.

4 years ago by pjfin123

I interpreted it as aspiring to a trillion parameters, but I'm not sure.

4 years ago by Voloskaya

GPT-3 has 175 billion parameters, so they need to scale up by about 64x. They already have a comparable amount of data to what was used by OpenAI, so it's mostly about scaling the number of GPUs.
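The back-of-envelope arithmetic, using the parameter counts mentioned in this thread:

```python
# Rough scale gap between the released GPT-Neo model and the largest GPT-3.
gpt3_params = 175e9  # GPT-3 parameter count, from the paper
neo_params = 2.7e9   # largest GPT-Neo model released so far

scale_factor = gpt3_params / neo_params
print(f"GPT-3 is roughly {scale_factor:.0f}x larger")  # roughly 65x
```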

4 years ago by f430

I see, so that means this GPT-Neo is 64x less powerful?

4 years ago by PufPufPuf

Did anyone manage to successfully run inference in the provided Google Colab (https://colab.research.google.com/github/EleutherAI/GPTNeo/b...)? I can run training, but can't manage to make the inference (even from a pre-trained model) work.

4 years ago by stellaathena

Hi! Thanks for trying it out. There was a bug that should now be fixed. When I run the example unicorn prompt I get the following. Don't hesitate to open an issue if you're still having trouble.

"In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

Bebek Uranzoglu, another member of the research team from the University of British Columbia, was working on a project the Latino-Canadian rodeo competition equipos to document a rare and remarkable ecosystem in the Andes Mountains.

His curiosity was piqued when he spotted an adolescent herd of about 10 unicorns foraging in a forest near the valley of the Jumbo Flu Group. The unicorns — whose numbers once swelled to 46,000 — were perched on the forest floor and watched the researchers work.

Urizoglu grew excited when he spotted another group that seemed to be thriving in an area below the herd. The team hoped the apparent population growth would indicate a human presence.

But when a team of researchers set up a camera trap, they were surprised to find the unicorns in the first place, and in a forest near a lake — in fact the forest was almost entirely made up of the animals. Despite their own magical presence, the team could not see the herd was populated by humans.

“The whole place almost smelled like animals,” says Bebek. “We were never able to find human footprints at any of the points we stood at. The trees were so large, you wouldn’t have been able to walk 40 meters through them. We assumed that the truth of the matter was, ‘Well the deer didn’t like this forest at all.’”

4 years ago by MasterScrat

Same here. I managed to make it "work" in the sense that it wouldn't crash during inference, but then it generated gibberish. Has anyone managed to make it work reliably?

4 years ago by aeroscripts

The problem in my case was "train_steps" in the model json file. The default is 0; the notebook sets it to 401000, which works.
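For anyone else hitting this, a minimal sketch of the fix. The config keys besides `train_steps` are placeholders; edit whichever model json the notebook actually loads.

```python
import json

# Placeholder config; a real GPT-Neo model json has many more keys.
config = {"model": "gpt-neo-2.7B", "train_steps": 0}

# The default of 0 breaks inference; the notebook's value of 401000 works.
config["train_steps"] = 401000

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```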

4 years ago by FL33TW00D

Whilst obviously BERT is not the same as GPT-3 in architecture, Amazon's recent paper discussing architecture optimizations for BERT seems pretty relevant here (https://arxiv.org/pdf/2010.10499.pdf), given the chance to improve on GPT-3's architecture (because it surely isn't the best we can get). Have the Eleuther.ai team been exploring this?

4 years ago by leogao

Could the title of this post be changed to emphasize that the model sizes released were 1.3B and 2.7B? Something like "EleutherAI releases 1.3B and 2.7B parameter GPT-like language models". The current title implies that a full-sized GPT-3 model is currently available, which is not the case.

edit: the title has been changed, seems good enough
