Skip to content

Conversation

@cdrini
Copy link
Collaborator

@cdrini cdrini commented Dec 9, 2025

Small perf improvement due to noticing these appearing frequently in the logs, even from the same IP! It's a common use case, so a helper should make this easy for us to use.

Technical

Testing

Screenshot

Stakeholders

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a helper function serve_static_file to standardize serving static files with appropriate caching headers and reduce disk reads through content caching. The change aims to improve performance by reducing repeated file reads from the same IPs.

  • Added serve_static_file helper function with HTTP caching headers (Cache-Control, Last-Modified, Expires)
  • Added read_file_contents_cached function using functools.lru_cache to cache file contents in memory
  • Refactored three existing static file endpoints to use the new helper

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
openlibrary/plugins/upstream/utils.py Adds the new serve_static_file helper function and supporting read_file_contents_cached function with caching logic
openlibrary/plugins/openlibrary/code.py Updates three endpoint classes (serviceworker, assetlinks, opensearchxml) to use the new helper instead of inline file serving
Comments suppressed due to low confidence (1)

openlibrary/plugins/upstream/utils.py:1689

  • File is opened but is not closed.
    return open(file_path).read()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


@functools.lru_cache
def read_file_contents_cached(file_path: str) -> str:
return open(file_path).read()
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file handle opened in this function is never closed, which will cause a resource leak. Each call to this cached function will keep a file descriptor open indefinitely. Use a context manager to ensure the file is properly closed after reading.

Suggested change
return open(file_path).read()
with open(file_path) as f:
return f.read()

Copilot uses AI. Check for mistakes.
:param file_path: Path to the file to serve
:param content_type: The Content-Type header value
:param max_age: Cache duration in seconds (default: 1 year)
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache_contents parameter is documented in the function signature but missing from the docstring. Add a parameter description to explain when to set this to False.

Suggested change
:param max_age: Cache duration in seconds (default: 1 year)
:param max_age: Cache duration in seconds (default: 1 year)
:param cache_contents: Whether to cache the file contents in memory (default: True). Set to False to read the file from disk on each request, which is useful if the file may change while the server is running.

Copilot uses AI. Check for mistakes.
web.header("Cache-Control", f"max-age={max_age}")

path = Path(file_path)
last_modified = path.stat().st_mtime
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file doesn't exist or is inaccessible, path.stat() will raise an exception that's not handled. This will result in an unhelpful error response to the user. Consider adding proper error handling to return a 404 or appropriate error response when the file is not found.

Suggested change
last_modified = path.stat().st_mtime
try:
last_modified = path.stat().st_mtime
except (FileNotFoundError, OSError):
return web.notfound(f"File not found: {file_path}")

Copilot uses AI. Check for mistakes.
return f'/subjects/{prefix}{subject.lower().replace(' ', '_').replace(',', '').replace('/', '')}'


@functools.lru_cache
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using lru_cache without a maxsize parameter means the cache can grow unbounded in memory. If this function is called with many different file paths over time (e.g., if file paths include dynamic components), it could lead to excessive memory usage. Consider setting a reasonable maxsize (e.g., @functools.lru_cache(maxsize=128)) or ensuring the number of unique file paths is limited.

Suggested change
@functools.lru_cache
@functools.lru_cache(maxsize=128)

Copilot uses AI. Check for mistakes.
Comment on lines +1709 to +1715
path = Path(file_path)
last_modified = path.stat().st_mtime
web.lastmodified(datetime.datetime.fromtimestamp(last_modified, tz=datetime.UTC))
web.expires(max_age)

if cache_contents:
contents = read_file_contents_cached(file_path)
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cached file contents are never invalidated if the underlying file is modified. The function checks st_mtime for the Last-Modified header but doesn't use it to invalidate the cache. This means users may receive stale content if files are updated. Consider using the modification time as part of the cache key, or document that files should not be modified without restarting the server.

Copilot uses AI. Check for mistakes.
def GET(self):
web.header('Content-Type', 'text/plain')
return web.ok(open('static/opensearch.xml').read())
return serve_static_file('static/opensearch.xml', 'text/plain')
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Content-Type 'text/plain' is incorrect for an OpenSearch XML file. According to the OpenSearch specification and the file's self-reference on line 9, it should be 'application/opensearchdescription+xml' or at minimum 'application/xml'.

Suggested change
return serve_static_file('static/opensearch.xml', 'text/plain')
return serve_static_file('static/opensearch.xml', 'application/opensearchdescription+xml')

Copilot uses AI. Check for mistakes.
Comment on lines +1687 to +1720
@functools.lru_cache
def read_file_contents_cached(file_path: str) -> str:
return open(file_path).read()


def serve_static_file(
file_path: str,
content_type: str,
max_age: int = 365 * 24 * 3600,
cache_contents: bool = True,
):
"""
Serve a static file with appropriate caching headers.
:param file_path: Path to the file to serve
:param content_type: The Content-Type header value
:param max_age: Cache duration in seconds (default: 1 year)
:return: web.ok response with file contents
"""
web.header("Content-Type", content_type)
web.header("Cache-Control", f"max-age={max_age}")

path = Path(file_path)
last_modified = path.stat().st_mtime
web.lastmodified(datetime.datetime.fromtimestamp(last_modified, tz=datetime.UTC))
web.expires(max_age)

if cache_contents:
contents = read_file_contents_cached(file_path)
else:
contents = read_file_contents_cached.__wrapped__(file_path)

return web.ok(contents)

Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new serve_static_file and read_file_contents_cached functions lack test coverage. Given that a comprehensive test file exists for utils.py functions at openlibrary/plugins/upstream/tests/test_utils.py, these new functions should have tests covering: file reading, caching behavior, cache invalidation, error handling for missing files, and proper header setting.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant