Five 5-minute reads/videos to keep you learning
1.Transformer Math 101
This article provides essential numbers and equations for working with large language models (LLMs). It covers topics such as compute requirements, computing optima, minimum dataset size, minimum hardware performance, and memory requirements for inference.
2. Why LLaVa-1.5 Is a Great Victory for Open-Source AI
LLaVa-1.5, a smaller yet powerful alternative to OpenAI’s GPT-4 Vision, proves the potential of open-source models for Large Multimodal Models (LMMs). It emphasizes the significance of understanding multimodality in AI, debunking doubts about the feasibility of open-source approaches.
3. GPT-4 Vision Prompt Injection
Vision Prompt Injection is a vulnerability that allows attackers to inject harmful data into prompts via images in OpenAI’s GPT-4. This risks system security, as attackers can execute unauthorized actions or extract data. Defending against this vulnerability is complex and may affect the model’s usability.
4. GPT-4 is Getting Faster
GPT-4 is rapidly improving its response speed, particularly in the 99th percentile, where latencies have decreased. GPT-4 and GPT-3.5 maintain a low latency-to-token ratio, indicating efficient performance.
5. Introducing The Foundation Model Transparency Index
A team of researchers from Stanford, MIT, and Princeton has developed a transparency index to evaluate the level of transparency in commercial foundation models. The index, known as the Foundation Model Transparency Index (FMTI), assesses 100 different aspects of transparency, and the results indicate that there is significant room for improvement among major foundation model companies.
Papers & Repositories
1.BitNet: Scaling 1-bit Transformers for Large Language Models
BitNet is a 1-bit Transformer architecture designed to improve memory efficiency and reduce energy consumption in large language models (LLMs). It outperforms 8-bit and FP16 quantization methods and shows potential for effectively scaling to even larger LLMs while maintaining efficiency and performance advantages.
2. HyperAttention: Long-context Attention in Near-Linear Time
HyperAttention is a novel solution that addresses the computational challenge of longer contexts in language models. It outperforms existing methods using Locality Sensitive Hashing (LSH), considerably improving speed. It excels on long-context datasets, making inference faster while maintaining a reasonable perplexity.
3. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
This paper introduces a new framework called Self-RAG. It is an enhanced model that improves Retrieval Augmented Generation (RAG) by allowing language models to reflect on passages using “reflection tokens.” This improvement leads to better responses in knowledge-intensive tasks like QA, reasoning, and fact verification.
4. PaLI-3 Vision Language Models: Smaller, Faster, Stronger
This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. It utilizes a ViT model trained with contrastive objectives, which allows it to excel in multimodal benchmarks.
5. DeepSparse: Enabling GPU-Level Inference on Your CPU
DeepSparse is a robust framework that enhances deep learning on CPUs by incorporating sparse kernels, quantization, pruning, and caching of attention keys/values. It achieves GPU-like performance on commonly used CPUs, enabling efficient and robust deployment of models without dedicated accelerators