The new SDK of Google Gemini - a step backward?

Google introduced a new library for Gemini

Last month, Google released a new SDK for Gemini. It brings a lot of improvements, including “thinking mode” for Gemini 2.5. However, adopting it is more than a simple “upgrade”: although Google provides a detailed migration document, moving to the new SDK raises backward-compatibility challenges.

When you’re building with LLMs, you live and die by traceability: what prompt did you send, with what settings, and what came back? OpenAI and Google Gemini follow different philosophies here, and only one of them makes things easier for developers.

What OpenAI Does (The Reference Experience)

OpenAI’s Chat Completions API expects a request like this:

{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "If the earth is flat, respond with True."}
  ],
  "temperature": 0.7,
  "top_p": 1,
  "max_tokens": 32
}

What you get back:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1730911111,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "message": {"role": "assistant", "content": "False"},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 2,
    "total_tokens": 19
  }
}
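
To make the traceability point concrete, here is a minimal Python sketch that merges the sample request and response above into a single log record. The function name build_log_record and the record layout are my own assumptions for illustration, not part of any OpenAI library.

```python
import json


def build_log_record(request: dict, response: dict) -> dict:
    """Merge an OpenAI-style request and response into one flat log record."""
    choice = response["choices"][0]
    return {
        "response_id": response["id"],
        "model": response["model"],
        # The sampling settings are taken from the request we sent,
        # because the response does not repeat them.
        "temperature": request.get("temperature"),
        "top_p": request.get("top_p"),
        "max_tokens": request.get("max_tokens"),
        "prompt": request["messages"][-1]["content"],
        "output": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


# The sample request and response from above, as Python dicts.
request = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "If the earth is flat, respond with True."}
    ],
    "temperature": 0.7,
    "top_p": 1,
    "max_tokens": 32,
}
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1730911111,
    "model": "gpt-4o-mini",
    "choices": [
        {
            "message": {"role": "assistant", "content": "False"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 17, "completion_tokens": 2, "total_tokens": 19},
}

record = build_log_record(request, response)
print(json.dumps(record, indent=2))
```

Because both payloads are flat, the record is little more than a handful of dictionary lookups.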

What Gemini Does (And Where It Breaks)

Gemini’s request is more complicated, thanks to the nested contents.parts structure and all the extra knobs (topK, etc.).

Sample Gemini request:

{
  "model": "gemini-2.5-flash",
  "contents": [
    {
      "parts": [{ "text": "If the earth is flat, respond with True." }]
    }
  ],
  "generationConfig": {
    "temperature": 0.7,
    "topP": 0.8,
    "topK": 40,
    "maxOutputTokens": 64,
    "presencePenalty": 0.0,
    "frequencyPenalty": 0.0,
    "stopSequences": ["\n"]
  }
}

Key takeaways:

contents is mandatory and supports flexible input types (text, inline data, files, multimodal).
generationConfig holds all the standard sampling knobs: temperature, topP, topK, and more.
Optional fields include tools, a structured output schema, and safety filters.

But here’s the kicker: none of your sampling configuration comes back in the response.

Sample Gemini response:

{
  "candidates": [
    {
      "content": {
        "parts": [{ "text": "False" }],
        "role": "model"
      },
      "finishReason": "STOP"
    }
  ],
  "modelVersion": "gemini-2.5-flash-001",
  "usageMetadata": {
    "promptTokenCount": 9,
    "candidatesTokenCount": 6,
    "totalTokenCount": 15,
    "thoughtsTokenCount": 5
  }
}

Observations:

No echo of temperature, topP, topK, or even the actual prompt.
Only gives you the output, finish reason, and token usage.
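
One way to work around this is to echo the settings yourself. The sketch below, in Python, copies a Gemini-style response and attaches the request settings to it before it gets stored. The function attach_request_context and the requestContext key are hypothetical names I made up for this post; they are not part of the Gemini API or SDK.

```python
import copy


def attach_request_context(request: dict, response: dict) -> dict:
    """Return a copy of the response with the request settings stored alongside it."""
    traced = copy.deepcopy(response)
    # Collect every text part of the prompt; non-text parts are skipped in this sketch.
    prompt_texts = [
        part["text"]
        for content in request.get("contents", [])
        for part in content.get("parts", [])
        if "text" in part
    ]
    # "requestContext" is a made-up key that only exists in our own logs.
    traced["requestContext"] = {
        "model": request.get("model"),
        "generationConfig": copy.deepcopy(request.get("generationConfig", {})),
        "promptTexts": prompt_texts,
    }
    return traced


gemini_request = {
    "model": "gemini-2.5-flash",
    "contents": [
        {"parts": [{"text": "If the earth is flat, respond with True."}]}
    ],
    "generationConfig": {"temperature": 0.7, "topP": 0.8, "topK": 40},
}
gemini_response = {
    "candidates": [
        {"content": {"parts": [{"text": "False"}], "role": "model"}}
    ],
    "usageMetadata": {"promptTokenCount": 9, "candidatesTokenCount": 6},
}

traced_response = attach_request_context(gemini_request, gemini_response)
print(traced_response["requestContext"]["generationConfig"])
```

The original response dict is left untouched, so the raw payload can still be logged separately if needed.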

Why This Matters (And Why OpenAI’s Approach Is Still Better)

OpenAI’s flat schema and ecosystem make logging trivial.
When I build on OpenAI, it’s easy to log both request and response in one go. Every tool, wrapper, and dashboard expects this, and most cloud logs give me the full picture by default.
Gemini’s extra controls require extra care—but they disappear after you hit “send.”
The more you tune, the more you need transparency. With Gemini, you’re responsible for logging all your parameters yourself—otherwise, you can’t audit, debug, or reproduce anything later.
No context for model behavior.
If output changes unexpectedly, with OpenAI you can quickly cross-reference logs to see if temperature or another parameter was changed. With Gemini, that’s on you, or it’s lost.
Vendor lock-in through inconvenience.
Gemini’s “stateless” responses force you to build your own logging infra for every experiment. With OpenAI, you just save the API payload and response and you’re covered.
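
Whichever provider you use, cross-referencing logs for a changed parameter gets easier if each logged call carries a fingerprint of its sampling settings. The Python sketch below shows one possible approach using only the standard library; config_fingerprint and changed_settings are hypothetical helpers of my own, not functions from either SDK.

```python
import hashlib
import json


def config_fingerprint(generation_config: dict) -> str:
    """Return a short, stable hash of a sampling configuration."""
    # Sorting the keys makes the hash independent of key order.
    canonical = json.dumps(generation_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def changed_settings(old: dict, new: dict) -> dict:
    """Map each setting that differs to its (old, new) pair of values."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


baseline = {"temperature": 0.7, "topP": 0.8, "topK": 40}
reordered = {"topK": 40, "temperature": 0.7, "topP": 0.8}
tweaked = {"temperature": 0.2, "topP": 0.8, "topK": 40}

# Same settings in a different order produce the same fingerprint.
print(config_fingerprint(baseline) == config_fingerprint(reordered))
# A changed temperature shows up as a single differing key.
print(changed_settings(baseline, tweaked))
```

Storing the fingerprint next to every response makes it cheap to group runs by configuration, and the diff helper tells you which knob moved when two fingerprints disagree.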

My Take

OpenAI’s API is not perfect—they could also echo sampling parameters in the response for complete traceability. But their ecosystem, flat schema, and general developer culture make it easy to never lose context.

Google Gemini, in its push for “advanced controls” and stateless API design, actually makes things more fragile for serious builders. The more settings you tweak, the more bookkeeping you have to do—because Google refuses to echo your own config back to you.

Until Gemini changes this, my workflow has to include a custom bridge and careful request logging.
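
For illustration, here is a rough Python sketch of what such a bridge could look like: it translates an OpenAI-style request into the Gemini-style payload shown earlier, so that one request shape can be logged for both providers. The key mapping covers only the fields that appear in this post, the function name to_gemini_request is my own, and anything beyond plain user and assistant text messages is deliberately rejected to keep the sketch honest about what it handles.

```python
# OpenAI-style sampling keys mapped to the Gemini-style names used in this post.
SAMPLING_KEY_MAP = {
    "temperature": "temperature",
    "top_p": "topP",
    "max_tokens": "maxOutputTokens",
}

# Gemini uses the role "model" where OpenAI uses "assistant".
ROLE_MAP = {"user": "user", "assistant": "model"}


def to_gemini_request(openai_request: dict, gemini_model: str) -> dict:
    """Translate an OpenAI-style chat request into a Gemini-style payload."""
    contents = []
    for message in openai_request["messages"]:
        role = message["role"]
        if role not in ROLE_MAP:
            # Simplification: other roles (e.g. system) are out of scope here.
            raise ValueError(f"unsupported role in this sketch: {role!r}")
        contents.append(
            {"role": ROLE_MAP[role], "parts": [{"text": message["content"]}]}
        )
    # Copy over only the sampling settings that were actually set.
    generation_config = {
        gemini_key: openai_request[openai_key]
        for openai_key, gemini_key in SAMPLING_KEY_MAP.items()
        if openai_key in openai_request
    }
    return {
        "model": gemini_model,
        "contents": contents,
        "generationConfig": generation_config,
    }


openai_request = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "If the earth is flat, respond with True."}
    ],
    "temperature": 0.7,
    "top_p": 1,
    "max_tokens": 32,
}

bridged = to_gemini_request(openai_request, "gemini-2.5-flash")
print(bridged)
```

Logging the bridged payload together with the response gives me the same request-plus-response record for both providers.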

(This is my personal opinion as someone who builds with both every week.)
