- Proprietary LLMs
- OpenAI (GPT Series)
- Google
- Anthropic (Claude Series)
- Amazon
- AI21 Labs
- Cohere
- IBM
- Open-Source LLMs
- Meta (Llama Family)
- Mistral AI
- TII / Falcon
- Alibaba (Qwen Models)
- BigScience (BLOOM)
- EleutherAI
Proprietary LLMs
OpenAI (GPT Series)
- GPT-4o (Omni) Version 2024
- Capabilities: Multimodal reasoning (text, vision, audio), real-time voice conversations (speech-to-text, spoken responses), and advanced text generation & Q&A.
- Input Types: Text, Images (vision), Audio (voice).
- Usage Limit: ChatGPT Plus offers approximately 80 messages per 3 hours (for GPT-4o), with a free GPT-4o mini version available with lower limits.
- API Availability: Yes.
- Prompt Engineering Tips: Few-shot examples can improve performance on specific tasks, but GPT-4o often excels zero-shot. Note that compared to GPT-4.1, which is trained to follow instructions more closely and literally, GPT-4o (and other predecessors) tended to more liberally infer intent.
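To make the few-shot tip concrete, here is a minimal sketch using the OpenAI Python SDK; the sentiment-classification task and its labels are illustrative assumptions, not from this guide.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Classify sentiment as positive, negative, or neutral."},
        # Few-shot examples steer the label set and output format:
        {"role": "user", "content": "The onboarding flow was painless."},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "Support never answered my ticket."},
        {"role": "assistant", "content": "negative"},
        # The actual query:
        {"role": "user", "content": "The update changed the menu layout."},
    ],
)
print(response.choices[0].message.content)
```

GPT-4o often handles such tasks zero-shot; the examples mainly pin down the exact label vocabulary and format.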
- GPT-4 Turbo Version 2023 (8K/32K)
- Capabilities: Complex reasoning and creative writing, code generation and debugging, and Vision (GPT-4V) for image understanding.
- Input Types: Text; images supported via the GPT-4V vision variant.
- Usage Limit: ChatGPT Plus offers approximately 50 messages per 3 hours (for the 8K model); API usage has rate limits by token quota.
- API Availability: Yes.
- Prompt Engineering Tips: Use system/context messages to set roles (e.g., “You are a helpful analyst…”). Take advantage of multimodal input by including relevant images or diagrams. For code, prompting with comments describing the task improves accuracy. Utilize the model’s “thought summary” feature (if enabled) to have it explain its reasoning.
- GPT-4.1 Family
- Capabilities: Represents a significant step forward from GPT-4o in capabilities across coding, instruction following, and long context. It is highly steerable and responsive to well-specified prompts. It's noted as a great place to build agentic workflows and achieves state-of-the-art performance for non-reasoning models on SWE-bench Verified. GPT-4.1 has undergone more training on effectively utilizing tools passed as arguments in an OpenAI API request. It demonstrates outstanding instruction-following performance.
- Prompt Engineering Tips: Many typical best practices still apply, such as providing context examples, making instructions as specific and clear as possible, and inducing planning via prompting to maximize model intelligence. Getting the most out of this model may require some prompt migration as it is trained to follow instructions more closely and literally than predecessors; a single sentence unequivocally clarifying desired behavior is almost always sufficient to steer the model.
- For agentic workflows, include three key types of reminders: Persistence (keep going until the query is completely resolved), Reflectivity (MUST plan extensively before each function call, and reflect extensively on outcomes), and Adherence (reminders to follow instructions).
- Exclusively use the API `tools` field to pass tools, rather than manually injecting tool descriptions into the prompt, to minimize errors. Name tools clearly and add a detailed description in the "description" field; for tool usage examples, use an `# Examples` section in the system prompt (see the sketch at the end of this list).
- Start with a basic chain-of-thought (CoT) instruction like "First, think carefully step by step about what documents are needed...". Improve CoT by auditing failures and addressing systematic errors (misunderstanding user intent, insufficient context/analysis, incorrect step-by-step thinking) with more explicit instructions.
- For output formatting, Markdown is recommended, using titles, inline backticks, and lists. XML also performs well, convenient for wrapping sections with start/end tags and metadata. JSON is highly structured but can be more verbose. In long context testing, XML performed well for structuring documents, while JSON performed particularly poorly. Use judgment to provide clear information that "stands out" to the model; if retrieving content with lots of XML, an XML-based delimiter may be less effective.
- Start prompt development with an overall "Response Rules" or "Instructions" section with high-level guidance. Add specific sections for details (e.g., `# Sample Phrases`). For specific steps, add an ordered list and instruct the model to follow them.
- If behavior isn't as expected, check for conflicting, underspecified, or wrong instructions/examples; GPT-4.1 tends to follow the one closer to the end of the prompt. Add examples that demonstrate desired behavior, ensuring rules cite the example behavior.
- All-caps or incentives are generally not necessary; start without them. If providing sample phrases, instruct the model to vary them to avoid repetitiveness. Provide instructions and potential examples to mitigate unwanted extra prose or formatting.
- When providing factual information from retrieved context, always include citations immediately after the relevant statement(s), using the format `[NAME](ID)` or `[NAME](ID), [NAME](ID)`.
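The sketch below ties several of these tips together with the OpenAI Python SDK: the three agentic reminder types in the system prompt, and a tool passed via the API `tools` field rather than pasted into the prompt. The `search_documents` tool and its schema are hypothetical, for illustration only.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The three agentic reminder types: persistence, reflectivity, adherence.
SYSTEM_PROMPT = (
    "You are an agent: keep going until the user's query is completely "
    "resolved before ending your turn (persistence). You MUST plan "
    "extensively before each function call, and reflect extensively on the "
    "outcome of previous calls (reflectivity). Follow these instructions "
    "closely and literally (adherence)."
)

# Tools go in the API `tools` field, never injected into the prompt text.
# `search_documents` is a hypothetical tool for this sketch.
tools = [{
    "type": "function",
    "function": {
        "name": "search_documents",
        "description": "Search the internal document store and return the "
                       "passages most relevant to the query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query."}
            },
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Summarize our refund policy changes."},
    ],
    tools=tools,
)
print(response.choices[0].message)  # may contain tool_calls to execute
```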
Google
- PaLM 2 (Legacy)
- Capabilities: General-purpose LLM (multilingual, reasoning, coding). Specialty fine-tunes available.
- Input Types: Text (some vision capabilities via Bard’s integration of Google Lens).
- Usage Limit: Still available via Vertex AI (until 2024 deprecation).
- API Availability: Yes (Vertex AI PaLM API, migrating to Gemini API).
- Prompt Engineering Tips: Similar prompting to Gemini: straightforward instructions and few-shot examples for best results. Migration to Gemini is recommended for better multimodal support.
- Gemini Family (for Google Workspace)
- Capabilities: Can be used to improve writing, organize data, create original images, summarize information and surface insights, have better meetings with automatic note taking, research unfamiliar topics, and spot trends/synthesize information/identify business opportunities. In Google Workspace, you can personalize output with information from your own files in Google Drive. Available across Docs, Sheets, Slides, Gmail, Meet, Drive, and Gemini Advanced.
- Input Types: Primarily text interaction within Workspace applications; Gemini Advanced allows chat-based interaction and file uploads. Can reference images in Drive documents.
- Prompt Engineering Tips:
- Use natural language, writing as if speaking to another person with complete thoughts in full sentences.
- Be specific and iterate. Tell Gemini exactly what you need it to do (summarize, write, change the tone, create) and provide as much context as possible. Make it a conversation; refine prompts if results aren't satisfactory.
- Be concise and avoid complexity. State requests in brief but specific language; avoid jargon.
- Use your documents by referencing files in Google Drive using `@file name` within the prompt. This allows Gemini to use information from those files to generate personalized responses.
- Prompting often involves combining Persona, Task, Context, and Format elements, but you don't need to use all four in every prompt. Always include a verb or command as part of your task (see the sketch after this list).
- Review generated output for clarity, relevance, and accuracy before using it, as the final output is yours.
- For complex or related tasks, break them into separate prompts.
- Provide constraints for specific results, such as character count limits or the desired number of options.
- Examples provided in the guide cover various roles (Program Manager, PR Manager, Analyst, Communications Manager, Customer Service, Executive, Marketing, Project Management, Sales, Business Owner, Head of Operations, Head of Product) and tasks, demonstrating how to apply these tips in practice.
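As a quick illustration of the Persona/Task/Context/Format pattern, here is a minimal sketch that assembles the four elements into one prompt string; the role and the `@Q3 campaign brief` Drive file are hypothetical.

```python
# Composing a Workspace prompt from Persona, Task, Context, and Format.
# The file reference "@Q3 campaign brief" is a hypothetical Drive document.
persona = "You are a communications manager."
task = "Draft a short status update for leadership"
context = "using @Q3 campaign brief"
fmt = "Limit it to 100 words and keep a confident, upbeat tone."

prompt = f"{persona} {task} {context}. {fmt}"
print(prompt)
```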
Anthropic (Claude Series)
- Claude 3 (Opus) 2024 flagship
- Capabilities: Deep reasoning and complex problem solving (outperforms GPT-4 on some evaluations). Strong coding ability and knowledge integration. Trained with principles to be helpful & harmless (Constitutional AI model). More precise instruction following than previous generations of Claude models. Offers thinking capabilities helpful for tasks involving reflection after tool use or complex multi-step reasoning. Excels at parallel tool execution.
- Input Types: Text (up to ~75,000 words or 100k tokens).
- Usage Limit: Claude Pro (paid) offers substantially higher message limits and the full 100k-token context (usage caps still apply).
- API Availability: Yes – Anthropic API (Claude-v3), and via AWS Bedrock & Google Vertex AI.
- Prompt Engineering Tips:
- Be Explicit: Claude 4 models respond well to clear, explicit instructions. Be specific about desired output. More explicit requests might be needed for "above and beyond" behavior seen in previous models. Explain the context or motivation behind instructions.
- Be Vigilant with Examples & Details: Claude 4 pays attention to details and examples. Ensure examples align with desired behaviors and minimize undesired ones. Use 3-5 diverse, relevant examples; wrap them in `<example>` tags (nested within `<examples>` if multiple). More examples generally lead to better performance, especially for complex tasks. Examples are your "secret weapon shortcut" for accuracy, consistency, and quality.
- Use XML Tags: Use XML format indicators like `<smoothly_flowing_prose_paragraphs>` tags to control the format. XML tags can also structure your entire prompt.
- Match Prompt Style: The formatting style used in your prompt can influence Claude’s response style. Match your prompt style to the desired output style as much as possible.
- Leverage Thinking (CoT): Guide Claude’s initial or interleaved thinking. Using CoT (letting Claude think step-by-step) increases performance. Claude often performs better with high-level instructions to just think deeply rather than step-by-step prescriptive guidance, though it can follow complex structured steps. Start general, then iterate with specific instructions based on Claude's thinking output.
- Extended Thinking: Maximizes instruction following. Be clear and specific; use numbered steps for complex instructions. Allow enough budget. Don't pass Claude's thinking back in the user block. If Claude repeats thinking in output, instruct it not to.
- Control Output Format: Tell Claude what to do instead of what not to do.
- Role Prompting: Use the `system` parameter to give Claude a role (e.g., domain expert). This is the most powerful way to use system prompts. Put task-specific instructions in the `user` turn.
- Prefill: Prefill Claude's response for greater output control (see the sketch at the end of this section).
- Chain Prompts: For complex tasks with multiple distinct steps, chain prompts rather than handling everything in a single prompt. This is different from CoT, which is for in-depth thought within a step. Examples include multi-step analysis, content creation pipelines, data processing. Self-correction chains are possible.
- Long Context Tips (100K+ tokens): Queries at the end can improve response quality. Put longform data (20K+ tokens) near the top of your prompt. Structure document content/metadata with XML tags (`<document>`, `<document_content>`, `<source>`). Ground responses in quotes from the documents for long document tasks.
- Enhance Code Generation: For frontend code, provide explicit encouragement and modifiers like "Include as many relevant features..." or "Add thoughtful details...". Instruct Claude to clean up temporary files in agentic coding if preferred.
- Prompt Improvement Tools: Anthropic offers a Prompt Generator which creates templates following best practices and a Prompt Improver that analyzes and enhances prompts through automated steps like chain-of-thought refinement and example enhancement. The Test Case Generator helps create examples.
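A minimal sketch combining three of the tips above (role via the `system` parameter, XML-tagged documents, and an assistant-turn prefill), using the Anthropic Python SDK; the document name and analyst role are illustrative assumptions.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    system="You are a veteran financial analyst.",  # role via `system`
    messages=[
        {
            "role": "user",
            "content": (
                "<document>\n"
                "<source>q3_report.txt</source>\n"  # hypothetical file
                "<document_content>...long report text...</document_content>\n"
                "</document>\n\n"
                "Quote the passages relevant to revenue, then summarize them."
            ),
        },
        # Prefilling the assistant turn steers the output format:
        {"role": "assistant", "content": "<relevant_quotes>"},
    ],
)
print(response.content[0].text)
```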
Amazon
- Amazon Titan Text Premier G1 (2024)
- Capabilities: Enterprise-focused, excels at long-form text generation, summarization, classification. Optimized for Retrieval-Augmented Generation (RAG) and tool/agent integration (function calling). Emphasizes reliability and low hallucination.
- Input Types: Text.
- Usage Limit: Available via Amazon Bedrock (enterprise-grade, pay-per-use).
- API Availability: Yes – AWS Bedrock API.
- Prompt Engineering Tips: Structured prompts: Titan responds well to clear, bullet-point instructions (it’s tuned for business communication). For RAG, provide documents or knowledge base context. For function calling, supply a JSON schema or action list.
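Below is a minimal sketch of a structured, bullet-point prompt sent to Titan Text Premier through the Bedrock runtime with boto3; the model ID and request/response shape follow the Titan text format as commonly documented, so verify them against current AWS docs.

```python
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan responds well to clear, bullet-point business instructions.
prompt = """Summarize the meeting notes below for an executive audience.
Requirements:
- At most 3 bullet points
- Formal business tone
- End with one recommended next step

Meeting notes:
...pasted notes..."""

body = {
    "inputText": prompt,
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.2},
}
response = client.invoke_model(
    modelId="amazon.titan-text-premier-v1:0",  # verify the current model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])
```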
- Amazon Titan Text Express G1 (2024)
- Capabilities: Multilingual generation (100+ languages). Versatile: does code, “rich text” formatting, and API orchestration. Balanced between Lite and Premier. Supports conversation history.
- Input Types: Text.
- Usage Limit: Amazon Bedrock (GA). Geared for higher throughput.
- API Availability: Yes – AWS Bedrock.
- Prompt Engineering Tips: Similar to Premier: be concise and explicit. For multilingual output, specify the target language (Express will follow language instructions well). When doing step-by-step tasks, break the instructions into numbered steps – Express reliably follows ordered lists.
- Amazon Titan Text Lite G1 (2024)
- Capabilities: Basic text and code generation in English. Good for simple tasks and short prompts (e.g., brief emails, outlines). Fastest model, but less detailed outputs.
- Input Types: Text.
- Usage Limit: Amazon Bedrock (GA). Lowest cost tier.
- API Availability: Yes – AWS Bedrock.
- Prompt Engineering Tips: Few-shot examples can help. Provide a short example of the desired output format (for instance, show a sample Q&A pair). Lite is more likely to get off-track on open-ended prompts, so keep queries narrowly scoped.
AI21 Labs
- Jurassic-2 Jumbo (J2, 2023)
- Capabilities: Long-form text generation (stories, articles). Fluent in multiple languages. Strong knowledge and commonsense reasoning. Customizable: supports fine-tuning and task-specific variants.
- Input Types: Text.
- Usage Limit: AI21 Studio API (pay-per-call; free trial). ~100K token context in "Jamba" version.
- API Availability: Yes – AI21 API (Studio) and via AWS Bedrock.
- Prompt Engineering Tips: Particularly effective if you specify the role (e.g., “As an expert translator, ...”); the same guidance applies across the Jurassic-2 family.
- Jurassic-2 Grande
- Capabilities: Not explicitly listed in the excerpt, but noted as used when lower latency is needed.
- Arch/Performance Notes: ~30B parameter range. 8k context.
- Prompt Engineering Tips: Similar tips as Jumbo. Particularly effective if you specify the role (e.g., “As an expert translator, ...”).
- Jurassic-2 Light
- Capabilities: Suitable for simple tasks, short responses.
- Input Types: Text.
- Arch/Performance Notes: Smallest AI21 model (<10B). Faster inference, but may require more guidance.
- Prompt Engineering Tips: Similar tips as Jumbo. Particularly effective if you specify the role (e.g., “As an expert translator, ...”).
Cohere
- Cohere Command (latest “Command A”, 2024)
- Capabilities: Dialog and instruction following (trained for chat/instructions). Tool use and RAG integration. Strong multilingual support (>100 languages) and reasoning. Good coding support.
- Input Types: Text.
- Usage Limit: Commercial API (subscription/cloud credits). Supports very long inputs (up to 256k tokens in Command-A).
- API Availability: Yes – Cohere API.
- Prompt Engineering Tips: No specific prompting tips were provided for Command in the provided source excerpt.
IBM
- IBM Granite Series (e.g., Granite-13B-chat, Granite 3.1)
- Capabilities: Business-focused LLMs. Emphasize transparency and data governance. Solid at business dialogue, summarization, and specialized domains. Newer versions add 100k token context and multilingual support.
- Usage Limit: IBM cloud (with fine-tuning options).
- API Availability: Available via the watsonx platform.
- Arch/Performance Notes: ~13B and 20B parameter decoder models.
- Prompt Engineering Tips: Keep a formal tone; these models are tuned for enterprise compliance. They respond well to structured, factual queries and will follow any provided corporate policy guidelines in the prompt.
Open-Source LLMs
Meta (Llama Family)
- LLaMA 3 (70B) Meta AI, 2024
- Capabilities: General-purpose AI: strong chat and reasoning (rivals proprietary models). Multimodal trained – can interpret images, though outputs text. Excels at coding, math, multilingual tasks (improved over LLaMA 2).
- Input Types: Text (image inputs possible via fine-tunes; base model outputs text only).
- Usage Limit: Unlimited self-hosted (open model download under community license).
- API/Access: Downloadable weights (70B & 8B) after license acceptance. Available via API on HuggingFace Inference and Azure.
- Arch/Performance Notes: Transformer decoder; 8B and 70B param versions released. The 70B was the state-of-the-art open model until late 2024. Trained on 7x more data than Llama 2 for higher knowledge and reduced hallucinations. Context window 8k tokens (longer in some fine-tunes).
- Prompt Engineering Tips: Use Meta’s recommended chat format: prompt with roles, e.g., `[System: …] \n[User: …] \n[Assistant: …]`, to leverage fine-tuning (trained on examples of this format; a chat-template sketch follows at the end of this section). Few-shot works well – showing one or two QA examples will guide style and correctness. Asking it to “think step by step” can produce chain-of-thought explanations. Avoid extremely long prompts on the 8B model (limited capacity); the 70B can handle more nuance in instructions.
- LLaMA 2 (70B) Meta AI, 2023 [deprecated]
- Capabilities: Strong chat performance (especially fine-tuned “Chat” version). Coding and reasoning abilities slightly behind L3, but still competitive. Multilingual understanding (trained on 20+ languages).
- Input Types: Text.
- Usage Limit: Unlimited (self-hosted).
- API/Access: Open weights (7B, 13B, 70B) downloadable (research & business license). Also on HuggingFace and Azure endpoints.
- Arch/Performance Notes: 70B param model was top open model of 2023. 4k token context. Requires more prompting effort compared to newer models (less RLHF).
- Prompt Engineering Tips: Use the official template `<s>[INST] Your instruction [/INST]` for best results, as described in Meta’s LLaMA 2 documentation. Be explicit – LLaMA 2 will follow clear step-by-step asks but is more likely than LLaMA 3 to go off-track if instructions are ambiguous.
- Llama 3.1 (405B) Meta AI, 2024
- Capabilities: Massive-scale model, very high knowledge retention and reasoning (aimed to rival GPT-4). Handles complex queries with greater accuracy.
- Input Types: Text (multimodal training, but primarily text output).
- Usage Limit: Unlimited (open model, heavy compute required).
- API/Access: Open release (checkpoint available to researchers). Not easily deployable without significant hardware.
- Arch/Performance Notes: 405 billion parameters – largest openly available LLM as of 2024. Trained on a vast corpus; expensive to run. Context length 128k tokens.
- Prompt Engineering Tips: Similar approach as Llama 3 (70B) – use structured prompts and role identifiers. Even with the long context window, summarizing or chunking very long inputs helps keep responses focused.
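Rather than hand-writing role markup, one way to reproduce the exact chat format a Llama checkpoint was fine-tuned on is the tokenizer's built-in chat template, sketched below; it assumes you have accepted the gated meta-llama license on Hugging Face.

```python
from transformers import AutoTokenizer

# Loads the tokenizer only (no model weights), so this runs on any machine.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {"role": "system", "content": "You are a concise coding assistant."},
    {"role": "user", "content": "Think step by step: is 1001 prime?"},
]

# Renders the messages into the exact role markup used during fine-tuning.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```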
Mistral AI
- Mistral 7B v0.1 (2023)
- Capabilities: Surprisingly strong performance at 7B params (comparable to larger models). Good at concise answers, basic reasoning, and moderate-length dialogues. Spawned instruct versions for better prompt following.
- Input Types: Text.
- Usage Limit: Unlimited (open source under Apache 2.0). Lightweight – runs on a single GPU.
- API/Access: Download weights; easy to run locally. Also offered via APIs (e.g. JumpStart on AWS).
- Arch/Performance Notes: 7.3B parameters, decoder-only. Notable for its efficient training. 32k token context via sliding window attention.
- Prompt Engineering Tips: For best results on the base 7B, provide a clear role or persona (it wasn’t heavily RLHF-tuned), e.g., “You are a helpful assistant.”, then the user query. The instruct variant responds well to `<|im_start|>`-style system prompts or simple direct instructions. Keep requests relatively short – the 7B can lose track in very long prompts. Few-shot examples can help it with format.
- Mixtral 8x7B (MoE 56B) 2024
- Capabilities: Mixture-of-Experts model (8 experts) yields performance comparable to a 30B+ dense model at lower compute cost. Excels at reasoning and math for its size (MoE helps specialize).
- Input Types: Text.
- Usage Limit: Unlimited (open, Apache 2.0).
- API/Access: Open source weights for research. Requires custom MoE inference support. Limited – mainly research use.
- Arch/Performance Notes: 8 × 7B experts (~47B total parameters after shared layers; ~13B active per token). 32k context. Described at release as the “best open model to date.”
- Prompt Engineering Tips: Similar prompting as Mistral 7B. MoE models can be sensitive to the distribution of the prompt; it can help to explicitly structure complex questions (e.g., separate logical sub-questions with bullet points) so the MoE router can send parts to the right expert.
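As an illustrative sketch of that structuring tip, the prompt below separates the logical sub-questions into bullets; the loan figures are made up, and any Mixtral-capable runtime can serve it.

```python
# Explicit sub-questions make it easier for the MoE router to dispatch
# each part to an appropriate expert. The figures are illustrative.
prompt = """Answer each sub-question separately, then combine the results:

- What is the monthly payment on a $300,000 loan at 6% over 30 years?
- How does the payment change at 5%?
- Which rate saves more over the full term, and by how much?"""
print(prompt)
```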
- Mistral v3.1 (Small & Medium) 2025 updates
- Capabilities: Small (~7–13B class) is a new leader in the small-model category with some image understanding. Medium (~30B class) is multimodal capable (e.g. Pixtral for images) with 128k context. Domain-specific models like Codestral (code generation specialist, 256k context) and Saba (Middle East languages) broaden capabilities.
- Input Types: Text (plus images for Pixtral models, code for Codestral etc.).
- Usage Limit: Unlimited (open for latest versions; some “premier” versions under research license).
- API/Access: Yes – Mistral API (for latest premier); many models released openly.
- Arch/Performance Notes: Aggressively optimized architecture; 128k context on many models, FlashAttention and other efficiency tricks. Continual improvement. Edge-optimized versions run on mobile devices.
- Prompt Engineering Tips: No specific prompting tips were provided for these models in the provided source excerpt.
TII / Falcon
- Falcon 180B TII, 2023
- Capabilities: Very high text generation quality – at release, top of many open benchmarks. Strong general knowledge and reasoning. Capable in coding and multilingual tasks.
- Input Types: Text.
- Usage Limit: Unlimited (Apache 2.0 license – fully open). Requires high-end hardware (≥4×A100 GPUs for full run).
- API/Access: Open weights, APIs (HuggingFace, Bedrock).
- Arch/Performance Notes: 180B params (decoder-only). 8k context.
- Prompt Engineering Tips: No specific prompting tips were provided for this model in the provided source excerpt.
Alibaba (Qwen Models)
- Qwen-2.5 72B-Instruct Alibaba, 2024
- Capabilities: Top-tier open-chat model: ranked #1 on OpenCompass benchmark (outperforming Claude 3.5 and GPT-4o on coding/math). Excellent coding skills. Strong math and logic reasoning. Multilingual (supports 29+ languages) and instruction-following.
- Input Types: Text.
- Usage Limit: Unlimited (Apache 2.0 license, open weights). Requires ~8×32GB GPUs for full inference.
- API/Access: Open weights, APIs (HuggingFace, Azure, others).
- Arch/Performance Notes: 72B params. 128k context.
- Prompt Engineering Tips: Similar to Qwen 14B. The instruct version is straightforward.
- Qwen 14B
- Arch/Performance Notes: 14.3B parameters. 8k context. Released with permissive license.
- Prompt Engineering Tips: Instruct version is straightforward. For best results with 14B, you can slightly prime it by mentioning the context (e.g., “I have a question about finance: …”). It helps to focus the smaller model.
BigScience (BLOOM)
- BLOOM 176B BigScience, 2022
- Capabilities: Multilingual text generation (supports 46 languages + 13 programming languages). Knowledgeable but tends to require careful prompting (less RLHF fine-tuning). Can handle long outputs; good at translation and basic dialogue.
- Input Types: Text.
- Usage Limit: Unlimited (open RAIL license). Very resource-intensive (176B params).
- API/Access: Weights freely downloadable. Hosted versions on HuggingFace API. Not widely in production.
- Arch/Performance Notes: 176 billion parameters, transformer decoder. Trained on massive multilingual dataset by international collaboration. Has an Instruct variant (BLOOMZ) fine-tuned on prompts in multiple languages. 2048-token context.
- Prompt Engineering Tips: Requires clear instruction since it’s not inherently tuned to follow human prompts. For best results, frame your query as an explicit command or a completion pattern, e.g., “Question: ... Answer:”, so the model completes the answer. Because it produces completion-style output (continuing your prompt), include the question in the prompt itself. Fine-tuned versions (like BLOOMZ) perform much better for Q&A-style prompting.
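Here is a minimal completion-style sketch with the Hugging Face `transformers` pipeline; it uses the small `bigscience/bloom-560m` checkpoint so it actually runs on modest hardware, but the same “Question: ... Answer:” framing applies to the 176B model and to other base (non-instruct) models such as GPT-NeoX below.

```python
from transformers import pipeline

# Base BLOOM models are plain language models, so phrase the query as a
# pattern for the model to complete rather than an instruction to follow.
generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = "Question: What are the three primary colors?\nAnswer:"
output = generator(prompt, max_new_tokens=40, do_sample=False)
print(output[0]["generated_text"])
```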
EleutherAI
- GPT-NeoX 20B EleutherAI, 2022
- Capabilities: General text generation with decent coherence. Good for experimentation and as a baseline model for fine-tuning. Handles creative tasks reasonably, but weaker at complex reasoning compared to newer models.
- Input Types: Text.
- Usage Limit: Unlimited (MIT license, open).
- API/Access: Weights on HuggingFace; easy to deploy on a single high-end GPU. Often used in downstream fine-tunes.
- Arch/Performance Notes: 20B parameters, trained on the Pile dataset. No instruction tuning by default (pure LM). 2048 token context.
- Prompt Engineering Tips: Requires clear instruction since it’s not inherently tuned to follow human prompts. For best results, frame your query as an explicit command or a completion pattern, e.g., “Question: ... Answer:”, so the model completes the answer (the BLOOM sketch above applies here as well). Because it produces completion-style output, include the question in the prompt itself. Fine-tuned versions (like Dolly 2.0) perform much better for Q&A-style prompting.