Emerging risks and opportunities from large language models

Posted on May 22, 2022


We have seen big gains from language models pretrained on large internet corpora. The paradigm is now standard: take a large pretrained model and fine-tune it on a downstream task.
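This paradigm can be sketched in miniature. The snippet below is a toy illustration, not a real setup: the "pretrained encoder" is a frozen random embedding standing in for a large language model, and the token ids and labels are synthetic. Fine-tuning here takes its lightest form: training only a small task head on top of frozen pretrained features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen pretrained encoder: in practice these
# features would come from a large language model's hidden states.
def pretrained_features(token_ids, W_frozen):
    # Mean-pool a fixed (frozen) embedding table over the sequence.
    return W_frozen[token_ids].mean(axis=1)

W_frozen = rng.normal(size=(1000, 64))         # frozen "pretrained" embeddings
tokens = rng.integers(0, 1000, size=(32, 16))  # toy batch of token ids
labels = rng.integers(0, 2, size=32)           # toy binary task labels

# Fine-tune: train only a logistic-regression head on the frozen features.
w, b = np.zeros(64), 0.0
for step in range(200):
    x = pretrained_features(tokens, W_frozen)
    p = 1 / (1 + np.exp(-(x @ w + b)))   # sigmoid head
    grad = p - labels                    # dLoss/dLogit for cross-entropy
    w -= 0.1 * (x.T @ grad) / len(labels)
    b -= 0.1 * grad.mean()
```

In practice one would often unfreeze some or all of the pretrained weights as well; freezing them, as above, is the cheapest variant (a linear probe).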

Can we trust these models?

Language models are now being used to generate new synthetic data, and results even suggest the models are good at recovering facts. Since it is hard to quantify the quality of natural language, we have come up with a dream:

Train language models to evaluate natural language against high-quality human judgements. Then use these trained models as an evaluation metric.
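The idea can be sketched as follows. This is a hypothetical, minimal version: texts are represented by feature vectors (in practice, hidden states from a language model), human judgements are synthetic, and the "trained evaluator" is just a least-squares fit; real learned metrics are far more elaborate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: each candidate text is a feature vector (in practice,
# language-model hidden states) paired with a human quality score in (0, 1).
features = rng.normal(size=(100, 8))
human_scores = 1 / (1 + np.exp(-features @ rng.normal(size=8)))  # synthetic

# "Train" an evaluator to mimic human judgement (here, linear least squares).
w, *_ = np.linalg.lstsq(features, human_scores, rcond=None)

def learned_metric(x):
    """Score a text's feature vector with the trained evaluator."""
    return float(x @ w)

# Agreement with human judgement on the training data (Pearson correlation).
predicted = features @ w
agreement = np.corrcoef(predicted, human_scores)[0, 1]
```

Once trained, `learned_metric` can score new generations without further human labels, which is exactly what makes the dream attractive and also what makes its failure modes matter.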

We are getting very close to human-level evaluation! However, LLMs still break in surprising ways.

Does pretraining result in new harms?

Privacy concerns