-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add helper for serving static files with cache headers + fewer disk reads #11571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a helper function serve_static_file to standardize serving static files with appropriate caching headers and reduce disk reads through content caching. The change aims to improve performance by reducing repeated file reads from the same IPs.
- Added
serve_static_filehelper function with HTTP caching headers (Cache-Control, Last-Modified, Expires) - Added
read_file_contents_cachedfunction usingfunctools.lru_cacheto cache file contents in memory - Refactored three existing static file endpoints to use the new helper
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| openlibrary/plugins/upstream/utils.py | Adds the new serve_static_file helper function and supporting read_file_contents_cached function with caching logic |
| openlibrary/plugins/openlibrary/code.py | Updates three endpoint classes (serviceworker, assetlinks, opensearchxml) to use the new helper instead of inline file serving |
Comments suppressed due to low confidence (1)
openlibrary/plugins/upstream/utils.py:1689
- File is opened but is not closed.
return open(file_path).read()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| @functools.lru_cache | ||
| def read_file_contents_cached(file_path: str) -> str: | ||
| return open(file_path).read() |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file handle opened in this function is never closed, which will cause a resource leak. Each call to this cached function will keep a file descriptor open indefinitely. Use a context manager to ensure the file is properly closed after reading.
| return open(file_path).read() | |
| with open(file_path) as f: | |
| return f.read() |
| :param file_path: Path to the file to serve | ||
| :param content_type: The Content-Type header value | ||
| :param max_age: Cache duration in seconds (default: 1 year) |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache_contents parameter is documented in the function signature but missing from the docstring. Add a parameter description to explain when to set this to False.
| :param max_age: Cache duration in seconds (default: 1 year) | |
| :param max_age: Cache duration in seconds (default: 1 year) | |
| :param cache_contents: Whether to cache the file contents in memory (default: True). Set to False to read the file from disk on each request, which is useful if the file may change while the server is running. |
| web.header("Cache-Control", f"max-age={max_age}") | ||
|
|
||
| path = Path(file_path) | ||
| last_modified = path.stat().st_mtime |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the file doesn't exist or is inaccessible, path.stat() will raise an exception that's not handled. This will result in an unhelpful error response to the user. Consider adding proper error handling to return a 404 or appropriate error response when the file is not found.
| last_modified = path.stat().st_mtime | |
| try: | |
| last_modified = path.stat().st_mtime | |
| except (FileNotFoundError, OSError): | |
| return web.notfound(f"File not found: {file_path}") |
| return f'/subjects/{prefix}{subject.lower().replace(' ', '_').replace(',', '').replace('/', '')}' | ||
|
|
||
|
|
||
| @functools.lru_cache |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using lru_cache without a maxsize parameter means the cache can grow unbounded in memory. If this function is called with many different file paths over time (e.g., if file paths include dynamic components), it could lead to excessive memory usage. Consider setting a reasonable maxsize (e.g., @functools.lru_cache(maxsize=128)) or ensuring the number of unique file paths is limited.
| @functools.lru_cache | |
| @functools.lru_cache(maxsize=128) |
| path = Path(file_path) | ||
| last_modified = path.stat().st_mtime | ||
| web.lastmodified(datetime.datetime.fromtimestamp(last_modified, tz=datetime.UTC)) | ||
| web.expires(max_age) | ||
|
|
||
| if cache_contents: | ||
| contents = read_file_contents_cached(file_path) |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cached file contents are never invalidated if the underlying file is modified. The function checks st_mtime for the Last-Modified header but doesn't use it to invalidate the cache. This means users may receive stale content if files are updated. Consider using the modification time as part of the cache key, or document that files should not be modified without restarting the server.
| def GET(self): | ||
| web.header('Content-Type', 'text/plain') | ||
| return web.ok(open('static/opensearch.xml').read()) | ||
| return serve_static_file('static/opensearch.xml', 'text/plain') |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Content-Type 'text/plain' is incorrect for an OpenSearch XML file. According to the OpenSearch specification and the file's self-reference on line 9, it should be 'application/opensearchdescription+xml' or at minimum 'application/xml'.
| return serve_static_file('static/opensearch.xml', 'text/plain') | |
| return serve_static_file('static/opensearch.xml', 'application/opensearchdescription+xml') |
| @functools.lru_cache | ||
| def read_file_contents_cached(file_path: str) -> str: | ||
| return open(file_path).read() | ||
|
|
||
|
|
||
| def serve_static_file( | ||
| file_path: str, | ||
| content_type: str, | ||
| max_age: int = 365 * 24 * 3600, | ||
| cache_contents: bool = True, | ||
| ): | ||
| """ | ||
| Serve a static file with appropriate caching headers. | ||
| :param file_path: Path to the file to serve | ||
| :param content_type: The Content-Type header value | ||
| :param max_age: Cache duration in seconds (default: 1 year) | ||
| :return: web.ok response with file contents | ||
| """ | ||
| web.header("Content-Type", content_type) | ||
| web.header("Cache-Control", f"max-age={max_age}") | ||
|
|
||
| path = Path(file_path) | ||
| last_modified = path.stat().st_mtime | ||
| web.lastmodified(datetime.datetime.fromtimestamp(last_modified, tz=datetime.UTC)) | ||
| web.expires(max_age) | ||
|
|
||
| if cache_contents: | ||
| contents = read_file_contents_cached(file_path) | ||
| else: | ||
| contents = read_file_contents_cached.__wrapped__(file_path) | ||
|
|
||
| return web.ok(contents) | ||
|
|
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new serve_static_file and read_file_contents_cached functions lack test coverage. Given that a comprehensive test file exists for utils.py functions at openlibrary/plugins/upstream/tests/test_utils.py, these new functions should have tests covering: file reading, caching behavior, cache invalidation, error handling for missing files, and proper header setting.
Small perf improvement due to noticing these appearing frequently in the logs, even from the same IP! It's a common use case, so a helper should make this easy for us to use.
Technical
Testing
Screenshot
Stakeholders