LangChain and the DIY LLM Tool Problem

LangChain has been impossible to avoid for the last six months. Every tutorial on building LLM applications uses it. Every GitHub repo that starts with "AI-powered" depends on it. I've been using it, fighting it, and forming opinions worth sharing.

The honest version: LangChain solves real problems. It also introduces real costs. And the decision about whether to use it for a given project is more nuanced than either the "use it for everything LLM-related" or the "just use the API directly" camps would have you believe.

What LangChain Actually Gives You

Three things that are genuinely useful, stated precisely:

Chain composition. LangChain makes it easy to wire together a sequence of LLM calls where the output of one feeds the input of the next, with prompt templates that parameterize the handoffs. Without LangChain, you write this wiring by hand — not hard, but boilerplate. LangChain makes the composition explicit and readable.
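To give a sense of the boilerplate involved, here is a minimal sketch of hand-written chaining. It assumes a hypothetical `call_llm` function that takes a prompt string and returns the model's text; the template strings and the `echo_llm` stub are illustrative only, not part of any library.

```python
from typing import Callable


def run_chain(call_llm: Callable[[str], str], templates: list[str], first_input: str) -> str:
    """Run the templates in order, feeding each output into the next {input} slot."""
    value = first_input
    for template in templates:
        prompt = template.format(input=value)  # parameterize the handoff
        value = call_llm(prompt)               # output becomes the next step's input
    return value


# Illustrative two-step chain; the wording of the prompts is made up.
TEMPLATES = [
    "List the columns mentioned in this request: {input}",
    "Write a one-line summary of these columns: {input}",
]


def echo_llm(prompt: str) -> str:
    """Hypothetical stub that wraps the prompt instead of calling a model."""
    return f"<{prompt}>"


if __name__ == "__main__":
    print(run_chain(echo_llm, TEMPLATES, "orders"))
```

Swapping `echo_llm` for a real API wrapper is the only change needed to run this against a model, which is roughly the amount of wiring LangChain saves per chain.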

Tool definitions and calling. LangChain has a structured way to define tools (functions the LLM can call), describe them to the model, and handle the model's tool call requests. The alternative — implementing the OpenAI function calling spec by hand — is straightforward but takes an hour you don't need to spend if LangChain already has it.
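Much of that hour goes into writing the JSON description of each function. As a rough sketch of what a framework automates here, the hypothetical helper below derives an OpenAI-style tool description from a Python function's signature and docstring. The `tool_spec` name and the simplified type mapping are assumptions for illustration; real functions with nested or optional types would need more care.

```python
import inspect

# Simplified mapping from Python annotations to JSON Schema types (an assumption;
# anything not listed falls back to "string").
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_spec(fn):
    """Build an OpenAI-style function tool description from a Python function."""
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the model must supply it
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_table_schema(table_name: str) -> str:
    """Look up the schema for a table in the data catalog."""
    return f"schema for {table_name}"  # placeholder body for the sketch


if __name__ == "__main__":
    print(tool_spec(get_table_schema))
```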

Memory management. LangChain has abstractions for conversation memory: buffered, summary-based, vector-stored. Useful for multi-turn applications where you need to control how much context gets passed to each call. Rolling your own is doable; LangChain's implementations are reasonable defaults.
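For comparison, a hand-rolled buffer memory can be quite small. The sketch below keeps the first system message plus as many of the most recent turns as fit in a budget. It uses character count as a crude stand-in for tokens, which is an assumption made to keep the example dependency-free; `trim_history` is a hypothetical name.

```python
def trim_history(messages, max_chars):
    """Keep the first system message plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_chars - sum(len(m["content"]) for m in system)

    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = len(message["content"])
        if cost > budget:
            break  # stop at the first turn that does not fit
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))  # restore chronological order


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "sys"},
        {"role": "user", "content": "aaaa"},
        {"role": "assistant", "content": "bbbb"},
        {"role": "user", "content": "cc"},
    ]
    print(trim_history(history, max_chars=10))
```

Summary-based or vector-stored memory takes noticeably more code than this, which is where the library's defaults start to pay for themselves.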

What LangChain Costs You

The abstraction overhead. LangChain puts several layers between you and the API call. When something goes wrong — wrong output format, unexpected model behavior, a tool call that doesn't fire — debugging means tracing through LangChain's chain execution, not your code. The stack traces are long and the error messages are often less informative than what you'd get from the raw API response.

A concrete example: I was building a tool that calls a data catalog API to look up table schemas before generating a pipeline config. The LangChain tool definition looked clean. The tool wasn't getting called consistently — the model would sometimes try to answer from its training data instead of calling the tool. Debugging required instrumenting LangChain's internals to see what prompt was actually being sent. The same issue with direct API calls would have been visible in one print statement.
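The "one print statement" can be as simple as the sketch below: a thin wrapper that dumps the exact request before forwarding it. The `logged_create` name is hypothetical; with direct API calls, `create` would be the same `openai.chat.completions.create` function used in the comparison that follows.

```python
import json


def logged_create(create, **request):
    """Print the exact request payload, then forward it unchanged."""
    # default=str keeps the dump from failing on values json cannot serialize
    print(json.dumps(request, indent=2, default=str))
    return create(**request)


if __name__ == "__main__":
    # Stand-in for the real API call so the sketch runs without credentials.
    def fake_create(**request):
        return request

    logged_create(
        fake_create,
        model="gpt-4",
        messages=[{"role": "user", "content": "Get the schema for the orders table"}],
    )
```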

A Direct Comparison

The same schema lookup tool, implemented both ways:

# LangChain version
from langchain.agents import tool
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType

@tool
def get_table_schema(table_name: str) -> str:
    """Look up the schema for a table in the data catalog."""
    # catalog_client is the data catalog client, assumed to be defined elsewhere
    return catalog_client.get_schema(table_name)

llm = ChatOpenAI(model="gpt-4", temperature=0)
agent = initialize_agent(
    tools=[get_table_schema],
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
)
result = agent.run("Get the schema for the orders table and summarize the key fields")

# Direct API version
import json

import openai

def get_table_schema(table_name: str) -> str:
    # catalog_client is the data catalog client, assumed to be defined elsewhere
    return catalog_client.get_schema(table_name)

tools = [{
    "type": "function",
    "function": {
        "name": "get_table_schema",
        "description": "Look up the schema for a table in the data catalog.",
        "parameters": {
            "type": "object",
            "properties": {"table_name": {"type": "string"}},
            "required": ["table_name"]
        }
    }
}]

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Get the schema for the orders table and summarize the key fields"}],
    tools=tools,
)

if response.choices[0].message.tool_calls:
    # Handle only the first requested tool call; its arguments arrive as a JSON string
    call = response.choices[0].message.tool_calls[0]
    result = get_table_schema(**json.loads(call.function.arguments))
    # second call with result...

The LangChain version is fewer lines. The direct version is fewer abstractions — when it breaks, you see exactly where and why.
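The direct version above stops before sending the tool result back to the model. Below is a hedged sketch of one full round trip, assuming the chat completions message format in which the tool result is returned as a `tool` role message carrying the matching `tool_call_id`. The `run_with_tools` helper, the `registry` dict, and the fake client are all hypothetical names introduced for illustration; with the real SDK, `create` would be `openai.chat.completions.create`.

```python
import json
from types import SimpleNamespace


def run_with_tools(create, model, messages, tools, registry):
    """One tool round trip: ask, run any requested tools, then ask again."""
    response = create(model=model, messages=messages, tools=tools)
    message = response.choices[0].message
    if not message.tool_calls:
        return message.content

    # Echo the assistant's tool request back (copying so the caller's list is untouched).
    messages = messages + [{
        "role": "assistant",
        "content": message.content,
        "tool_calls": [
            {"id": c.id, "type": "function",
             "function": {"name": c.function.name, "arguments": c.function.arguments}}
            for c in message.tool_calls
        ],
    }]
    # Run each requested tool and attach its result under the matching call id.
    for call in message.tool_calls:
        fn = registry[call.function.name]
        result = fn(**json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})

    followup = create(model=model, messages=messages, tools=tools)
    return followup.choices[0].message.content


def fake_create_factory():
    """Hypothetical stand-in for the API: first reply requests the tool, second answers."""
    calls = []

    def create(**request):
        calls.append(request)
        if len(calls) == 1:
            call = SimpleNamespace(
                id="call_1",
                function=SimpleNamespace(
                    name="get_table_schema", arguments='{"table_name": "orders"}'
                ),
            )
            msg = SimpleNamespace(content=None, tool_calls=[call])
        else:
            msg = SimpleNamespace(content="orders has id and total", tool_calls=None)
        return SimpleNamespace(choices=[SimpleNamespace(message=msg)])

    return create, calls
```

Every message that goes over the wire is built in plain sight here, so a tool that fails to fire can be diagnosed by inspecting `messages` and `tools` directly.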

The Guideline I've Landed On

Use LangChain when you're building something with multiple chains, complex memory management, or enough tools that wiring them together by hand becomes a real burden. Don't use it when you have one or two API calls, a simple tool definition, or any situation where debuggability matters more than setup speed.

For the kind of data engineering tooling I've been building (schema-aware pipeline config generation, catalog-integrated code review), the direct API approach has been more maintainable. LangChain may earn its place as the use cases get more complex.
