feat: Add native async authentication for Vertex AI with aiohttp #15888

dharamendrak · 2025-10-24T07:22:31Z

Title

feat: Add native async authentication for Vertex AI with aiohttp

Relevant issues

Addresses scalability and resource utilization issues with Vertex AI authentication in high-concurrency async environments.

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
I have added a screenshot of my new test passing locally
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature
✅ Test

Changes

Summary

Implement truly async token retrieval for Vertex AI credentials using aiohttp instead of running sync code in thread pools via asyncify. This provides better scalability and resource utilization under high concurrent load.

Implementation Details

New Async Methods:

refresh_auth_async() - Uses google.auth.transport._aiohttp_requests.Request with aiohttp for non-blocking token refresh
load_auth_async() - Async version of credential loading supporting all credential types (service accounts, authorized users, identity pools)
get_access_token_async() - Async token retrieval with proper credential caching
_handle_reauthentication_async() - Handles "Reauthentication is needed" errors in async context

Feature Flag:

Added LITELLM_USE_ASYNC_VERTEX_AUTH environment variable (default: false)
Can also be set programmatically via litellm.use_async_vertex_auth = True
Defaults to existing behavior for backward compatibility

Files Modified:

litellm/__init__.py - Added feature flag declaration
litellm/llms/vertex_ai/vertex_llm_base.py - Added all async authentication methods
tests/test_litellm/llms/vertex_ai/test_vertex_llm_base.py - Added 8 comprehensive test cases

Benefits

Performance:

True async I/O instead of blocking thread pool workers during network calls
Better resource utilization: handles thousands of concurrent requests without exhausting thread pool
Reduced memory footprint (1 event loop vs. N threads)

Reliability:

Explicit aiohttp session management with async with context manager
Eliminates potential "unclosed session" warnings
Proper cleanup guaranteed (not relying on garbage collection)

Scalability:

Can handle high concurrent load without thread pool saturation
Event loop efficiently manages waiting requests
No thread context switching overhead

Compatibility:

Fully backward compatible (feature flag defaults to false)
Shared credential cache between sync and async paths
No breaking changes to existing code

Testing

New Tests Added (8 comprehensive test cases):

test_async_auth_with_feature_flag_enabled - Verifies async methods are used when flag is enabled
test_async_auth_with_feature_flag_disabled - Verifies fallback to asyncify when flag is disabled
test_refresh_auth_async_with_aiohttp - Tests async token refresh
test_load_auth_async_service_account - Tests async credential loading for service accounts
test_async_token_refresh_when_expired - Tests expired token refresh in async path
test_async_caching_with_new_implementation - Verifies credential caching works correctly
test_async_and_sync_share_same_cache - Confirms sync and async share credential cache
test_load_auth_async_authorized_user - Tests async loading for authorized user credentials

Test Results:

✅ All 47 tests passing (8 new + 39 existing)
✅ No regressions
✅ Feature flag behavior verified
✅ Caching functionality confirmed
✅ Reauthentication error handling tested

Usage

Enable via environment variable:

export LITELLM_USE_ASYNC_VERTEX_AUTH=true

Enable programmatically:

import litellm
litellm.use_async_vertex_auth = True

# Then use acompletion as normal
response = await litellm.acompletion(
    model="vertex_ai/gemini-pro",
    messages=[{"role": "user", "content": "Hello"}],
    vertex_credentials="/path/to/credentials.json",
    vertex_project="my-project"
)

Technical Notes

Why aiohttp?

The old approach used asyncify which runs sync requests library in a thread pool
During network I/O (token refresh), threads are blocked waiting for response
New approach uses aiohttp for true async I/O - event loop is not blocked during network calls
Significantly better for high-concurrency scenarios

Session Management:

# Properly managed with async context manager
async with aiohttp.ClientSession(auto_decompress=False) as session:
    request = Request(session)
    await asyncio.get_event_loop().run_in_executor(
        None, credentials.refresh, request
    )
# Session automatically closed here

Credential Types Supported:

✅ Service accounts
✅ Authorized users (gcloud auth)
✅ Identity pools (Workload Identity Federation)
✅ AWS identity pools
✅ Default application credentials

Backward Compatibility

Default behavior unchanged (LITELLM_USE_ASYNC_VERTEX_AUTH=false)
Existing code continues to work without modifications
Opt-in feature flag allows gradual rollout
Both sync and async paths share same credential cache

Implement truly async token retrieval for Vertex AI credentials using aiohttp instead of running sync code in thread pools via asyncify. Changes: - Add refresh_auth_async() using aiohttp for non-blocking token refresh - Add load_auth_async() for async credential loading - Add get_access_token_async() for async token retrieval with caching - Add _handle_reauthentication_async() for proper async error handling - Add LITELLM_USE_ASYNC_VERTEX_AUTH feature flag (default: false) Benefits: - True async I/O instead of blocking thread pool workers - Better resource utilization under high concurrent load - Explicit session management (no unclosed session warnings) - Improved scalability (handles thousands of concurrent requests) - Backward compatible (defaults to existing asyncify behavior) Testing: - Added 8 comprehensive test cases covering all scenarios - All 47 existing tests pass (no regressions) - Tests verify feature flag behavior, caching, and reauthentication

vercel · 2025-10-24T07:22:36Z

@dharamendrak is attempting to deploy a commit to the CLERKIEAI Team on Vercel.

A member of the Team first needs to authorize it.

litellm/__init__.py

krrishdholakia · 2025-10-29T02:25:55Z

litellm/llms/vertex_ai/vertex_llm_base.py

+            return
+
+        # Create an aiohttp session for the token request
+        async with aiohttp.ClientSession(auto_decompress=False) as session:


instead of using aiohttp directly, can you use our http handler -

litellm/litellm/llms/custom_httpx/http_handler.py

Line 975 in 33371d1

def get_async_httpx_client(

this will prevent creating a client on each request and ensure this works with any system settings the user sets

@krrishdholakia The problem in using http_handler it doesn't have auto_decompress=False . Google auth only uses session with auto_decompress=False. I can introduce one in http_handler, with this property.

@krrishdholakia Due to Google Auth library limitation, we need session with auto_decompress=False. I created method that will make session as cls attribute.

krrishdholakia · 2025-10-31T01:14:11Z

Hi @dharamendrak changes look fine, can you share the perf impact you see with the changes?

dharamendrak · 2025-10-31T09:02:33Z

Hi @dharamendrak changes look fine, can you share the perf impact you see with the changes?

@krrishdholakia Here performance test:

Vertex AI Async Authentication - Real Test Results

Test Summary

Date: October 31, 2025
Status: ✅ ALL TESTS PASSED
Performance Improvement: 65.4% faster (2.89x speedup)

Test Configuration

Credentials: Service Account JSON
Credential Type: google.oauth2._service_account_async.Credentials
Transport: google.auth.transport._aiohttp_requests.Request (OLD async-compatible)

Test Results

TEST 1: Load Async Credentials ✅

Time: 422.74ms
Type: google.oauth2._service_account_async.Credentials
Async refresh: True

Verification: TRUE ASYNC CREDENTIALS confirmed!

TEST 2: Async Token Refresh ✅

Refresh 1: 91.02ms - Token: ya29.c.c0ASRK0GaZp0c...
Refresh 2: 97.10ms - Token: ya29.c.c0ASRK0GbpODH...
Refresh 3: 95.46ms - Token: ya29.c.c0ASRK0GYRBAv...

Average refresh time: 94.53ms

Verification: Multiple refreshes working correctly, generating new tokens each time.

TEST 2B: Force Token Expiration & Auto-Refresh ✅

Key Finding: ✅ Direct expiry assignment works!

creds.expiry = datetime.datetime.utcnow() - datetime.timedelta(seconds=1)

Results:

Direct Expiry Manipulation:

✅ Successfully set expiry to past time
Credentials expired: True

Manual Refresh After Expiration:

New token: ya29.c.c0ASRK0GaYQYTNb2W21lF-m...
Refresh took: 93.65ms
✅ Token refreshed successfully!

Auto-Refresh via get_access_token_async():

✅ Auto-refresh worked! Got token in 140.25ms
Token: ya29.c.c0ASRK0GYzyHXXXjFP_fagI...

Verification: Token expiration detection and automatic refresh working perfectly!

TEST 3: Persistent Session Verification ✅

Session: aiohttp.ClientSession
Session ID: 4657367440
Auto decompress: False
Closed: False
✅ Session reused correctly!

Verification: Same aiohttp session is reused across multiple refreshes for efficiency.

TEST 4: Cache Behavior with Expired Tokens ✅

Cache Performance:

First call (cache miss):  138.72ms
Second call (cache hit):    0.02ms
Cache speedup: 6926.8x faster

Expiration Handling:

✅ Set expiry on cached credentials
Third call (expired, auto-refresh): 96.84ms
Token: ya29.c.c0ASRK0GaDjlnHV6riaPdRL...
✅ Auto-refresh detected and handled!

Verification:

Credential caching working correctly
Expired cached credentials automatically refreshed
Cache invalidation on expiration working as expected

TEST 5: Concurrent Async Refreshes ✅

10 concurrent refreshes completed in 280.89ms
Average per refresh: 28.09ms

Verification:

Concurrent refreshes handled efficiently
Significant performance benefit from async (28ms vs 94ms for sequential)
No race conditions or blocking

TEST 6: Get Access Token (Full Flow) ✅

Time: 0.03ms (cache hit)
Token: ya29.c.c0ASRK0GaDjlnHV6riaPdRL...

Verification: End-to-end authentication flow working with caching.

Performance Comparison: Sync vs Async

Sequential Refresh Performance:

Method	Average Time	Performance
SYNC	280.90ms	Baseline
ASYNC	97.23ms	2.89x faster

Key Findings:

✅ 65.4% performance improvement with async
✅ Async eliminates blocking I/O during token refresh
✅ Persistent aiohttp session reduces overhead
✅ Concurrent refreshes benefit even more from async (28ms average)

Key Technical Achievements

1. True Async Implementation ✅

Using google.oauth2._service_account_async.Credentials
Async refresh() method with await
No blocking run_in_executor() calls in the happy path

2. Compatible Transport ✅

Using google.auth.transport._aiohttp_requests.Request (OLD transport)
Compatible with OLD async credentials
Supports persistent session reuse

3. Token Expiration Handling ✅

Direct expiry assignment works (no need for mocking in production)
Automatic refresh detection via credentials.expired property
Cache invalidation on expiration

4. Persistent Session Management ✅

Single aiohttp.ClientSession reused across all refreshes
Proper cleanup with close_token_refresh_session()
Significant performance benefit (6926x faster cache hits)

5. Credential Caching ✅

Credentials cached by (credentials_json, project_id) key
Cache hits are extremely fast (0.02-0.03ms)
Expired credentials automatically refreshed

Recommendations

✅ Ready for Production

This async implementation is ready for production use:

Performance: 2.89x faster than sync, 65.4% improvement
Correctness: All token refresh and expiration scenarios handled
Efficiency: Persistent sessions and caching working correctly
Concurrency: Handles concurrent refreshes without blocking
Reliability: True async I/O, no executor fallbacks needed

Migration Path

Existing code using sync methods will continue to work:

load_auth() → uses sync credentials
refresh_auth() → uses sync transport

New async code should use:

load_auth_async() → uses async credentials
refresh_auth_async() → uses async transport with persistent session
get_access_token_async() → full async flow with caching

Test Environment

Python: 3.11
Platform: macOS (darwin 25.0.0)
google-auth: Latest version with async support
aiohttp: Latest version
Repository: LiteLLM (Vertex AI integration)

Conclusion

✅ The async Vertex AI authentication implementation is production-ready with:

Verified 2.89x performance improvement
True async I/O without blocking
Proper token expiration and refresh handling
Efficient caching and session management
Full backward compatibility with sync methods

The implementation successfully uses the OLD async credentials (google.oauth2._service_account_async) with their compatible OLD transport (google.auth.transport._aiohttp_requests.Request), avoiding the incompatibility issues with the NEW transport API while maintaining true async behavior.

dharamendrak · 2025-11-02T14:01:48Z

@krrishdholakia Let me know if we are good to merge.

krrishdholakia reviewed Oct 29, 2025

View reviewed changes

litellm/__init__.py Outdated Show resolved Hide resolved

krrishdholakia reviewed Oct 29, 2025

View reviewed changes

Reuse aiohttp session

0c2d9db

Fix code and run performance

743c87f

dharamendrak added 3 commits October 31, 2025 02:22

NIT

5846af3

NIT removed

3ed3cb6

Merge branch 'main' into feat/async-vertex-ai-auth

375d864

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat: Add native async authentication for Vertex AI with aiohttp #15888

feat: Add native async authentication for Vertex AI with aiohttp #15888

dharamendrak commented Oct 24, 2025

Uh oh!

vercel bot commented Oct 24, 2025

Uh oh!

Uh oh!

krrishdholakia Oct 29, 2025

Uh oh!

dharamendrak Oct 29, 2025

Uh oh!

dharamendrak Oct 30, 2025

Uh oh!

krrishdholakia commented Oct 31, 2025

Uh oh!

dharamendrak commented Oct 31, 2025 •

edited

Loading

Uh oh!

dharamendrak commented Nov 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

feat: Add native async authentication for Vertex AI with aiohttp #15888

Are you sure you want to change the base?

feat: Add native async authentication for Vertex AI with aiohttp #15888

Conversation

dharamendrak commented Oct 24, 2025

Title

Relevant issues

Pre-Submission checklist

Type

Changes

Summary

Implementation Details

Benefits

Testing

Usage

Technical Notes

Backward Compatibility

Uh oh!

vercel bot commented Oct 24, 2025

Uh oh!

Uh oh!

krrishdholakia Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

dharamendrak Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

dharamendrak Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

krrishdholakia commented Oct 31, 2025

Uh oh!

dharamendrak commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Vertex AI Async Authentication - Real Test Results

Test Summary

Test Configuration

Test Results

TEST 1: Load Async Credentials ✅

TEST 2: Async Token Refresh ✅

TEST 2B: Force Token Expiration & Auto-Refresh ✅

Results:

TEST 3: Persistent Session Verification ✅

TEST 4: Cache Behavior with Expired Tokens ✅

Cache Performance:

Expiration Handling:

TEST 5: Concurrent Async Refreshes ✅

TEST 6: Get Access Token (Full Flow) ✅

Performance Comparison: Sync vs Async

Sequential Refresh Performance:

Key Findings:

Key Technical Achievements

1. True Async Implementation ✅

2. Compatible Transport ✅

3. Token Expiration Handling ✅

4. Persistent Session Management ✅

5. Credential Caching ✅

Recommendations

✅ Ready for Production

Migration Path

Test Environment

Conclusion

Uh oh!

dharamendrak commented Nov 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dharamendrak commented Oct 31, 2025 •

edited

Loading