A large multimodal model (accepting image and text inputs and producing text outputs) that, while less capable than humans in many real-world scenarios, demonstrates human-level performance on a range of professional and academic benchmarks. You can sign up for the API waitlist.