feat: Add native async authentication for Vertex AI with aiohttp #15888
Implement truly async token retrieval for Vertex AI credentials using aiohttp instead of running sync code in thread pools via asyncify.

Changes:
- Add refresh_auth_async() using aiohttp for non-blocking token refresh
- Add load_auth_async() for async credential loading
- Add get_access_token_async() for async token retrieval with caching
- Add _handle_reauthentication_async() for proper async error handling
- Add LITELLM_USE_ASYNC_VERTEX_AUTH feature flag (default: false)

Benefits:
- True async I/O instead of blocking thread pool workers
- Better resource utilization under high concurrent load
- Explicit session management (no unclosed session warnings)
- Improved scalability (handles thousands of concurrent requests)
- Backward compatible (defaults to existing asyncify behavior)

Testing:
- Added 8 comprehensive test cases covering all scenarios
- All 47 existing tests pass (no regressions)
- Tests verify feature flag behavior, caching, and reauthentication
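The caching behavior listed for get_access_token_async() can be pictured roughly as follows. This is a sketch under assumptions, not the PR's code: AsyncTokenCache, its fetch callback, and the per-key lock that collapses concurrent refreshes into a single request are all hypothetical names and design choices.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Tuple


@dataclass
class CachedToken:
    value: str
    expires_at: float  # deadline on the time.monotonic() clock


class AsyncTokenCache:
    """Hypothetical helper: one cached token per key, refreshed on expiry."""

    def __init__(self, fetch: Callable[[str], Awaitable[Tuple[str, float]]]):
        self._fetch = fetch  # assumed to return (token, lifetime_in_seconds)
        self._tokens: Dict[str, CachedToken] = {}
        self._locks: Dict[str, asyncio.Lock] = {}

    def _fresh(self, key: str):
        cached = self._tokens.get(key)
        if cached is not None and time.monotonic() < cached.expires_at:
            return cached
        return None

    async def get(self, key: str) -> str:
        cached = self._fresh(key)
        if cached is not None:
            return cached.value
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check: another coroutine may have refreshed while we waited.
            cached = self._fresh(key)
            if cached is not None:
                return cached.value
            value, lifetime = await self._fetch(key)
            self._tokens[key] = CachedToken(value, time.monotonic() + lifetime)
            return value


async def demo() -> Tuple[int, int]:
    calls = 0

    async def fake_fetch(key: str) -> Tuple[str, float]:
        # Stand-in for a real token request; counts how often it is hit.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return f"token-{calls}", 0.2

    cache = AsyncTokenCache(fake_fetch)
    await asyncio.gather(*(cache.get("project-a") for _ in range(10)))
    after_burst = calls
    await asyncio.sleep(0.25)  # let the cached token expire
    await cache.get("project-a")
    return after_burst, calls
```

In demo(), ten concurrent callers share one fetch, and a call made after the token's lifetime has elapsed triggers a second fetch.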
@dharamendrak is attempting to deploy a commit to the CLERKIEAI Team on Vercel. A member of the Team first needs to authorize it.
    return

    # Create an aiohttp session for the token request
    async with aiohttp.ClientSession(auto_decompress=False) as session:
Instead of using aiohttp directly, can you use our HTTP handler?
    def get_async_httpx_client(
This will prevent creating a client on each request and ensure this works with any system settings the user sets.
@krrishdholakia The problem with using http_handler is that it doesn't support auto_decompress=False. Google auth only works with a session created with auto_decompress=False. I can add one to http_handler with this property.
@krrishdholakia Due to a limitation in the Google Auth library, we need a session with auto_decompress=False. I created a method that creates the session once and stores it as a class attribute.
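A class-level session of the kind described in this thread might look like the sketch below. The names SharedAuthSession and _new_aiohttp_session are hypothetical and this is not the method added in the PR. The only library call relied on is aiohttp.ClientSession(auto_decompress=False), which is imported lazily so the factory can be swapped out.

```python
from typing import Any, Callable, Optional


def _new_aiohttp_session() -> Any:
    # Imported lazily so the holder itself has no hard aiohttp dependency.
    import aiohttp

    # auto_decompress=False is the setting the thread says Google auth needs.
    return aiohttp.ClientSession(auto_decompress=False)


class SharedAuthSession:
    """Hypothetical holder that keeps one session on the class itself."""

    _session: Optional[Any] = None
    _factory: Callable[[], Any] = staticmethod(_new_aiohttp_session)

    @classmethod
    def get(cls) -> Any:
        # Create on first use, or again if the previous session was closed.
        # With the aiohttp factory, call this from inside a running event loop.
        if cls._session is None or getattr(cls._session, "closed", False):
            cls._session = cls._factory()
        return cls._session

    @classmethod
    async def close(cls) -> None:
        # Explicit shutdown avoids unclosed-session warnings at exit.
        if cls._session is not None and not cls._session.closed:
            await cls._session.close()
        cls._session = None
```

Keeping the session on the class means every token refresh reuses the same connection pool instead of opening a new client per request, which is the concern raised in the review.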
Hi @dharamendrak, the changes look fine. Can you share the perf impact you see with the changes?
@krrishdholakia Here is the performance test:

Vertex AI Async Authentication - Real Test Results

Test Summary
Date: October 31, 2025

Test Configuration

Test Results

TEST 1: Load Async Credentials ✅
Verification: TRUE ASYNC CREDENTIALS confirmed!

TEST 2: Async Token Refresh ✅
Verification: Multiple refreshes working correctly, generating new tokens each time.

TEST 2B: Force Token Expiration & Auto-Refresh ✅
Key Finding: ✅ Direct creds.expiry = datetime.datetime.utcnow() - datetime.timedelta(seconds=1)
Results:
Verification: Token expiration detection and automatic refresh working perfectly!

TEST 3: Persistent Session Verification ✅
Verification: Same aiohttp session is reused across multiple refreshes for efficiency.

TEST 4: Cache Behavior with Expired Tokens ✅
Cache Performance:
Expiration Handling:
Verification:

TEST 5: Concurrent Async Refreshes ✅
Verification:

TEST 6: Get Access Token (Full Flow) ✅
Verification: End-to-end authentication flow working with caching.

Performance Comparison: Sync vs Async
Sequential Refresh Performance:
Key Findings:

Key Technical Achievements
1. True Async Implementation ✅
2. Compatible Transport ✅
3. Token Expiration Handling ✅
4. Persistent Session Management ✅
5. Credential Caching ✅

Recommendations
✅ Ready for Production
This async implementation is ready for production use:

Migration Path
Existing code using sync methods will continue to work:
New async code should use:

Test Environment

Conclusion
✅ The async Vertex AI authentication implementation is production-ready with:
The implementation successfully uses the OLD async credentials (
@krrishdholakia Let me know if we are good to merge.
Title
feat: Add native async authentication for Vertex AI with aiohttp
Relevant issues
Addresses scalability and resource utilization issues with Vertex AI authentication in high-concurrency async environments.
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
- PR passes all unit tests on make test-unit
Type
🆕 New Feature
✅ Test
Changes
Summary
Implement truly async token retrieval for Vertex AI credentials using aiohttp instead of running sync code in thread pools via asyncify. This provides better scalability and resource utilization under high concurrent load.
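The difference between the two approaches can be illustrated with a toy comparison. This is a minimal sketch and not a benchmark of LiteLLM: blocking_refresh and native_refresh are stand-ins that only sleep, and the pool size of 4 is an arbitrary assumption chosen to make the queueing visible.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def blocking_refresh(delay: float) -> str:
    """Stand-in for a synchronous token refresh that blocks its thread."""
    time.sleep(delay)
    return "token"


async def native_refresh(delay: float) -> str:
    """Stand-in for a non-blocking refresh that yields to the event loop."""
    await asyncio.sleep(delay)
    return "token"


async def run_in_threads(n: int, delay: float, workers: int) -> float:
    # Each call occupies one worker; calls beyond `workers` wait in a queue.
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_refresh, delay) for _ in range(n))
        )
    return time.perf_counter() - start


async def run_native(n: int, delay: float) -> float:
    # All calls wait concurrently on the event loop; no worker threads needed.
    start = time.perf_counter()
    await asyncio.gather(*(native_refresh(delay) for _ in range(n)))
    return time.perf_counter() - start


async def compare(n: int = 20, delay: float = 0.05, workers: int = 4) -> Tuple[float, float]:
    threaded = await run_in_threads(n, delay, workers)
    native = await run_native(n, delay)
    return threaded, native
```

With more concurrent refreshes than workers, the threaded variant finishes in several batches while the native variant overlaps all waits, which is the resource-utilization argument made above.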
Implementation Details
New Async Methods:
- refresh_auth_async() - Uses google.auth.transport._aiohttp_requests.Request with aiohttp for non-blocking token refresh
- load_auth_async() - Async version of credential loading supporting all credential types (service accounts, authorized users, identity pools)
- get_access_token_async() - Async token retrieval with proper credential caching
- _handle_reauthentication_async() - Handles "Reauthentication is needed" errors in async context
Feature Flag:
- LITELLM_USE_ASYNC_VERTEX_AUTH environment variable (default: false)
- litellm.use_async_vertex_auth = True
Files Modified:
- litellm/__init__.py - Added feature flag declaration
- litellm/llms/vertex_ai/vertex_llm_base.py - Added all async authentication methods
- tests/test_litellm/llms/vertex_ai/test_vertex_llm_base.py - Added 8 comprehensive test cases
Benefits
Performance:
Reliability:
async with context manager
Scalability:
Compatibility:
false)
Testing
New Tests Added (8 comprehensive test cases):
- test_async_auth_with_feature_flag_enabled - Verifies async methods are used when flag is enabled
- test_async_auth_with_feature_flag_disabled - Verifies fallback to asyncify when flag is disabled
- test_refresh_auth_async_with_aiohttp - Tests async token refresh
- test_load_auth_async_service_account - Tests async credential loading for service accounts
- test_async_token_refresh_when_expired - Tests expired token refresh in async path
- test_async_caching_with_new_implementation - Verifies credential caching works correctly
- test_async_and_sync_share_same_cache - Confirms sync and async share credential cache
- test_load_auth_async_authorized_user - Tests async loading for authorized user credentials
Test Results:
Usage
Enable via environment variable:
export LITELLM_USE_ASYNC_VERTEX_AUTH=true
Enable programmatically:
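The programmatic switch named under Feature Flag is litellm.use_async_vertex_auth = True. A minimal sketch of how such a flag might be resolved against the environment variable follows. The helper name, the accepted truthy strings, and the precedence of the explicit setting over the environment are all assumptions, not taken from the PR.

```python
import os
from typing import Mapping, Optional

# Assumed set of truthy spellings; the PR only documents "true" and "false".
_TRUE_VALUES = {"1", "true", "yes", "on"}


def use_async_vertex_auth(
    module_flag: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical resolver: explicit setting wins, then env var, else False."""
    if module_flag is not None:
        return bool(module_flag)
    env = os.environ if env is None else env
    raw = env.get("LITELLM_USE_ASYNC_VERTEX_AUTH", "false")
    return raw.strip().lower() in _TRUE_VALUES
```

Defaulting to False when neither setting is present matches the stated backward-compatible default of the existing asyncify path.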
Technical Notes
Why aiohttp?
asyncify which runs sync requests library in a thread pool
Session Management:
Credential Types Supported:
Backward Compatibility
LITELLM_USE_ASYNC_VERTEX_AUTH=false)