What Are AI Agents — and Why Multi-Agent Systems?
An AI agent is an LLM that can take actions: searching the web, reading files, writing code, calling APIs. Unlike a single prompt-response exchange, an agent can plan a sequence of steps, use tools, observe the results, and adjust its approach until a goal is achieved.
Multi-agent systems take this further by assigning specialized roles to different agents. Just as a company has a researcher, a writer, and an editor — each expert in their domain — a CrewAI crew has agents with defined personas, goals, and tool access. The agents collaborate, passing outputs between them, to produce results no single agent could achieve alone.
Common patterns: a Researcher gathers information → an Analyst synthesizes it → a Writer drafts the output → a Reviewer checks quality. CrewAI makes this orchestration simple.
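To make the handoff concrete, here is a toy plain-Python sketch of that sequential pattern, with ordinary functions standing in for LLM-backed agents (no LLMs involved; the function names are illustrative only). CrewAI's job is to replace each stage with a real agent and manage the passing of outputs for you:

```python
# Each "agent" here is just a function; in CrewAI each would be an
# LLM-backed Agent with its own role, goal, and tools.

def researcher(topic: str) -> str:
    return f"notes on {topic}"

def analyst(notes: str) -> str:
    return f"synthesis of ({notes})"

def writer(synthesis: str) -> str:
    return f"draft based on {synthesis}"

def reviewer(draft: str) -> str:
    return f"approved: {draft}"

def run_pipeline(topic: str) -> str:
    # Sequential handoff: each stage's output is the next stage's input.
    output = topic
    for stage in (researcher, analyst, writer, reviewer):
        output = stage(output)
    return output
```

The orchestration value of a framework shows up exactly here: once stages are real agents with retries, tool calls, and token budgets, the "just pass the output along" loop stops being trivial.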
Installing CrewAI
pip install crewai crewai-tools
# Verify installation
python -c "import crewai; print(crewai.__version__)"
You'll also need an OpenAI API key (or you can configure CrewAI to use other LLM providers like Anthropic, Groq, or local Ollama models):
export OPENAI_API_KEY="sk-..."
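As a hedged example of switching providers (recent CrewAI versions route model calls through LiteLLM, so this is usually just a different API key plus a provider-prefixed model string; the exact model IDs below are assumptions that may be outdated for your version):

```shell
# Anthropic: set the key, then use a provider-prefixed model string per agent,
# e.g. llm="anthropic/claude-3-5-sonnet-20240620"
export ANTHROPIC_API_KEY="sk-ant-..."

# Local Ollama: no API key needed, assumes an Ollama server on the default port,
# e.g. llm="ollama/llama3.1"
```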
Core Concepts
- Agent: An LLM with a role, goal, backstory, and optional tools. The backstory shapes its behavior — "You are a meticulous fact-checker who never assumes..." produces very different behavior than a generic agent.
- Task: A specific unit of work assigned to an agent. Has a description, expected output, and optionally an output file or structured output format.
- Crew: The orchestrator that manages multiple agents and tasks. Supports sequential (one after another) and hierarchical (manager delegates to workers) processes.
- Tools: Functions agents can call — web search, file operations, code execution, API calls. CrewAI ships with many built-in tools.
Creating Your First Agent: The Researcher
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool
# Tool for web search (requires SERPER_API_KEY env var, free tier available)
search_tool = SerperDevTool()
researcher = Agent(
    role="Senior Research Analyst",
    goal="Uncover cutting-edge developments in AI and summarize them accurately",
    backstory=(
        "You are a veteran technology researcher with 15 years of experience. "
        "You're known for your ability to cut through hype and identify what's "
        "genuinely important. You always verify claims with multiple sources."
    ),
    tools=[search_tool],
    verbose=True,            # logs agent's thinking
    allow_delegation=False,  # this agent won't delegate to others
    llm="gpt-4o",            # you can specify per-agent LLMs
)
Creating a Second Agent: The Writer
writer = Agent(
    role="Tech Content Writer",
    goal=(
        "Create engaging, accurate articles that make complex AI topics "
        "accessible to a general tech audience"
    ),
    backstory=(
        "You are an experienced technology journalist who has written for "
        "publications like Wired and MIT Technology Review. You translate "
        "technical findings into compelling narratives without dumbing them down."
    ),
    tools=[],  # writer doesn't need search tools
    verbose=True,
    allow_delegation=False,
    llm="gpt-4o",
)
Defining Tasks
research_task = Task(
    description=(
        "Research the latest developments in AI agents published in the last 30 days. "
        "Focus on: new frameworks, benchmark results, and real-world deployments. "
        "Identify the 3 most significant developments and explain why each matters."
    ),
    expected_output=(
        "A structured report with 3 sections, each covering one major development. "
        "Include: what it is, why it matters, and a direct source URL."
    ),
    agent=researcher,
    output_file="research_report.md",  # saves output to file
)
write_task = Task(
    description=(
        "Using the research report provided, write a 600-word blog article "
        "titled 'This Week in AI Agents'. Maintain an informative but conversational tone. "
        "Include an intro, one section per development, and a conclusion with your take."
    ),
    expected_output=(
        "A complete, publication-ready blog article in markdown format. "
        "The article should flow naturally and not feel like a list of facts."
    ),
    agent=writer,
    context=[research_task],  # tells CrewAI this task depends on research_task's output
    output_file="blog_article.md",
)
Assembling and Running the Crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,  # tasks run in order
    verbose=True,
)
result = crew.kickoff()
print("=== FINAL OUTPUT ===")
print(result)
When you run this, you'll see each agent's thinking process in the terminal — what they search for, what they find, how they formulate their output. CrewAI handles the orchestration so you don't need to manually pass outputs between agents.
Using Tools
CrewAI provides many built-in tools. Here are the most useful:
from crewai_tools import (
    SerperDevTool,        # web search via Serper API
    FileReadTool,         # read local files
    FileWriterTool,       # write files
    DirectoryReadTool,    # list directory contents
    WebsiteSearchTool,    # search a specific website
    PDFSearchTool,        # RAG over PDF files
    CodeInterpreterTool,  # execute Python code
)
# Give an agent multiple tools
data_analyst = Agent(
    role="Data Analyst",
    goal="Analyze data files and produce insights",
    backstory="Expert data analyst who codes in Python",
    tools=[FileReadTool(), FileWriterTool(), CodeInterpreterTool()],
)
Debugging and Monitoring Your Crew
Set verbose=True on both agents and the crew to see the full thought process. In production, CrewAI integrates with AgentOps for observability: add agentops.init() before your crew runs to get a dashboard with timings, token usage, and full trace logs.
Common debugging tips:
- If an agent loops without progress, check that its tools are returning useful data
- If output quality is low, strengthen the `backstory` and `expected_output` fields
- Use `Process.hierarchical` with a manager LLM for complex workflows where a manager agent should coordinate workers
Real Use Cases
Content Pipeline
Researcher → Outline Writer → Article Writer → SEO Reviewer → Editor. Each agent specializes. The crew produces a complete, SEO-optimized article from a single topic keyword input.
Research Automation
Provide a list of companies. A Researcher agent searches each one, a Data Extractor pulls specific fields (revenue, headcount, funding), and a Report Writer formats a comparison table.
Code Review Crew
Feed a pull request diff. A Security Auditor checks for vulnerabilities, a Performance Analyst looks for bottlenecks, and a Code Quality Reviewer checks style — each files their findings, which a Summary Agent consolidates into a final review comment.
Limitations and Best Practices
- Costs add up fast: Multi-agent systems make many LLM calls. Always test with GPT-4o mini before switching to GPT-4o
- Non-determinism: Agents don't always produce the same output. Build workflows that are tolerant of variation
- Long tasks need checkpoints: Use `output_file` on intermediate tasks so you can resume from a checkpoint if something fails
- Avoid circular dependencies: Tasks should have a clear dependency order; circular task dependencies will cause infinite loops
- Keep backstories focused and specific — vague backstories lead to generic outputs
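On the circular-dependency point: the `context=[...]` links between tasks form a graph, and that graph must be acyclic. A quick plain-Python sanity check (no CrewAI required; the task names below are hypothetical) using depth-first cycle detection:

```python
def find_cycle(deps: dict[str, list[str]]) -> bool:
    """Return True if the task-dependency graph contains a cycle.

    `deps` maps each task name to the tasks it depends on,
    mirroring CrewAI's context=[...] links.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / done
    color = {task: WHITE for task in deps}

    def visit(task: str) -> bool:
        color[task] = GRAY
        for dep in deps.get(task, []):
            if color.get(dep, WHITE) == GRAY:
                return True  # back edge: dependency cycle found
            if color.get(dep, WHITE) == WHITE and dep in deps and visit(dep):
                return True
        color[task] = BLACK
        return False

    return any(visit(t) for t in deps if color[t] == WHITE)

# Linear chain like research -> write is fine:
assert not find_cycle({"write": ["research"], "research": []})
# write depends on review, which depends on write: infinite-loop risk
assert find_cycle({"write": ["review"], "review": ["write"]})
```

Running a check like this before `crew.kickoff()` is cheap insurance when task graphs are built dynamically.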