Feb 26 2024
Google's open LLMs and the Nvidia CEO's thoughts on countries and AI
Gemma: Open LLMs by Google
Google released two open LLMs: Gemma 2B and 7B. They are available on the Hugging Face Hub as base and instruct LLMs:
Hugging Face Gemma Collection
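As a quick sketch (assuming the `google/gemma-7b-it` checkpoint ID and that you have accepted the model license on the Hub), loading the instruct variant with transformers looks like this:

```python
# Sketch: load and query Gemma 7B instruct from the Hugging Face Hub.
# Requires transformers and accelerate; adjust device settings to your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Write a one-line summary of what Gemma is.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```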
About Gemma
They have an architecture similar to Llama 2's but with an unusually large vocabulary of 256,000 tokens: 8x larger than Llama 2's 32,000-token vocabulary and roughly 100k tokens more than Qwen1.5's.
Gemma 7B seems to outperform all other 7B models, including Mistral 7B and Qwen1.5 7B. Gemma 2B also performs well in its category, but Microsoft's Phi-1.5 seems to remain better.
Gemma is ready to be deployed on Vertex AI.
COSMO 1B: A Tiny LLM by Hugging Face
Like the Phi models, COSMO 1B is trained entirely on synthetic data, generated with Mixtral-8x7B Instruct. Hugging Face released the datasets used to train the model.
It doesn't look good for Cosmo, which seems to underperform models of similar or smaller size, such as TinyLlama. However, note that Cosmo was trained for only 15 hours on 180B tokens, 16.6 times fewer than TinyLlama, and its training loss was still decreasing.
Jensen Huang: Every Country Needs Sovereign AI
Jensen Huang described the transformative potential of sovereign AI at the World Governments Summit.
"Every country needs to own the production of their own intelligence," NVIDIA founder and CEO Jensen Huang told attendees Monday at the World Governments Summit in Dubai.
“It codifies your culture, your society’s intelligence, your common sense, your history – you own your own data,” Huang told the UAE's Minister of State for Artificial Intelligence, Omar Al Olama, during their conversation, a highlight of an event attended by more than 4,000 delegates from 150 countries.
“We completely subscribe to that vision,” Al Olama said. “That’s why the UAE is moving aggressively on creating large language models and mobilizing compute.”
Deep Dive: Deploying LLMs in Production
Deploying Large Language Models (LLMs) in a production environment presents a unique set of challenges that go beyond the capabilities of the models themselves.
1. **No Access to OpenAI API**: Relying on a third-party API such as OpenAI's is often not an option in production, whether for data-privacy, compliance, or cost reasons.
2. Private LLMs like **Llama2-7B** or **Mistral-7B** may offer a solution to the dependency issue, but they come with their own set of challenges.
3. **Context Window Limitations**:
LLMs, by design, have limits on the size of the input they can process (known as the **context window**), so long documents must be split into chunks before summarization; see the sketch after this list.
4. **Scalability and Performance**: Serving models at production scale with acceptable latency and cost is a challenge in its own right.
5. **Quality of Summarization**:
The ultimate goal of using an LLM in production is to produce high-quality, relevant summaries. However, ensuring the consistency and accuracy of these summaries is challenging.
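On the context-window point, here is a minimal sketch of the common workaround, not the author's actual pipeline: split the document into overlapping chunks, summarize each, then summarize the summaries (map-reduce summarization).

```python
# Sketch: map-reduce summarization for documents that exceed the context window.
# Chunk sizes are in characters for simplicity; a real pipeline would count
# tokens with the model's tokenizer.
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def map_reduce_summarize(text: str, summarize) -> str:
    # `summarize` is any callable wrapping an LLM call (e.g., the prompt below).
    partial = [summarize(chunk) for chunk in chunk_text(text)]
    return partial[0] if len(partial) == 1 else summarize("\n".join(partial))
```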
Strategies for Ensuring Quality Summaries:
Prompt Techniques: Refining prompts to align outputs with the desired format and content.
From my personal experience, the **Chain-of-Thought (CoT)** prompt gives the best results for summarization tasks.
SUMMARY_PROMPT_TEMPLATE = (
"Task: Analyze and summarize the provided legal or financial text enclosed within triple backticks. "
"Your summary should be concise, aiming for around 150 words, while capturing all essential aspects of the document. "
"Focus on the following key elements: "
"Think step by step:"
"1. Accurately identify the document's nature (e.g., legal letter, invoice, power of attorney) and the involved parties, including specific names, addresses, and roles (sender, recipient, debtor, creditor, etc.). "
"2. Clearly state the main purpose or subject matter of the document, including any legal or financial context (e.g., debt collection, contract details, claim settlement). "
"3. Provide exact financial details as mentioned in the document. This includes total amounts, itemized costs, interest rates, and any other monetary figures. Be precise in interpreting terms like 'percentage points above the base interest rate' and avoid misinterpretations. "
"4. If applicable, note any specific requests, deadlines, or instructions mentioned in the document (e.g., repayment plans, settlement offers). "
"5. Correctly interpret and include any relevant legal or financial terminology. "
"6. Identify and include any additional details that provide context or clarity to the document's purpose and contents, such as case numbers, invoice details, or specific legal references. "
"7. Avoid introducing information not present in the original text. Ensure your summary corrects any inaccuracies identified in previous evaluations and does not repeat them. "
"8.Recheck for step 7: do not introduce details there are not present in the original texts "
"Don't do assumption about some information.Focus only on original text"
"Text: {text} "
"SUMMARY:"
)
This prompt template can be used with LangChain to build a `PromptTemplate` and run inference.
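A minimal sketch of that wiring, assuming a LangChain installation; the model binding is left abstract, since any LangChain-compatible LLM will do:

```python
# Sketch: turn the string above into a LangChain PromptTemplate.
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    template=SUMMARY_PROMPT_TEMPLATE,
    input_variables=["text"],
)

# With any LangChain-compatible model bound to `llm`, LCEL runs the chain:
# chain = prompt | llm
# summary = chain.invoke({"text": document_text})
print(prompt.format(text="...legal or financial document..."))
```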
Summary Metrics: Metrics to evaluate summary quality.
A metric based on Accuracy, Relevance, and Adherence (more details in the next section).
LLM Evaluator: A larger LLM, such as Llama2-13B or Mixtral-8x7B, instructed to evaluate the summaries based on Accuracy, Relevance, and Adherence.
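For illustration (this is not the author's exact prompt), an LLM-as-a-judge instruction could look like the following, using the 0-2 scale defined in the next section:

```python
# Sketch of an evaluator prompt; doubled braces escape str.format so the
# JSON example survives templating.
EVAL_PROMPT_TEMPLATE = (
    "Task: Evaluate the summary of a legal or financial document below. "
    "Score each dimension from 0 (worst) to 2 (best):\n"
    "- Accuracy: factual correctness with respect to the document.\n"
    "- Relevance: coverage of the document's main themes.\n"
    "- Adherence: fidelity to the document's structure and logical flow.\n"
    'Answer with JSON only: {{"accuracy": 0, "relevance": 0, "adherence": 0}}\n'
    "Document: {document}\n"
    "Summary: {summary}\n"
)
```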
Human Feedback: Integrating a human check for final validation.
A Streamlit evaluation interface connected to DynamoDB, where the summaries generated by our Llama2-7B/Mistral-7B models are stored. Human evaluators read the original document and analyze the generated summary.
The evaluation is based on the composed metric (accuracy, relevance, and adherence), with each dimension scored from 0 to 2.
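A minimal sketch of such an interface; the table name, key schema, and attribute names below are illustrative assumptions, not the author's actual setup:

```python
# Sketch: Streamlit review UI backed by a DynamoDB table of generated summaries.
# Assumes AWS credentials are configured and a table keyed on "doc_id".
import boto3
import streamlit as st

table = boto3.resource("dynamodb").Table("summaries")  # hypothetical table name

doc_id = st.text_input("Document ID")
if doc_id:
    item = table.get_item(Key={"doc_id": doc_id}).get("Item")
    if item:
        st.subheader("Original document")
        st.write(item["document"])
        st.subheader("Generated summary")
        st.write(item["summary"])
        scores = {
            dim: st.slider(dim.capitalize(), 0, 2, 1)
            for dim in ("accuracy", "relevance", "adherence")
        }
        if st.button("Save evaluation"):
            table.update_item(
                Key={"doc_id": doc_id},
                UpdateExpression="SET scores = :s",
                ExpressionAttributeValues={":s": scores},
            )
```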
3. Composed summary metric
In my professional journey, I've developed a composed summary metric focusing on three key aspects: Accuracy, Relevance, and Adherence.
🎯 Accuracy
Score 0: Significant inaccuracies or misleading information.
Score 1: Partially accurate; not reliable enough for information extraction.
Score 2: Highly accurate; captures the main points, with only minor inaccuracies that don't impede understanding.
🔍 Relevance
Score 0: Completely irrelevant; lacks meaningful information.
Score 1: Somewhat relevant but misses some of the document's main themes.
Score 2: Highly relevant; encapsulates all main themes with precision.
📐 Adherence
Score 0: Completely disregards the document's structure and content.
Score 1: Major adherence issues; the summary's structure is not coherent.
Score 2: Perfectly adheres to the document's structure, mirroring its logical flow.
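Putting the three dimensions together, here is a small sketch of the composed score; the equal weighting is my assumption, as the source does not specify one:

```python
# Sketch: combine the three 0-2 scores into one composed metric (equal weights
# are an assumption; adjust to your priorities).
def composed_score(accuracy: int, relevance: int, adherence: int) -> float:
    for score in (accuracy, relevance, adherence):
        if score not in (0, 1, 2):
            raise ValueError("each dimension is scored 0, 1, or 2")
    return (accuracy + relevance + adherence) / 3

print(composed_score(2, 1, 2))  # ≈ 1.67: a strong but imperfect summary
```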