Cache Filtering#

In many cases, you will want to cache only some responses instead of caching everything. By default, all read-only (GET and HEAD) requests with a 200 response code are cached. A few options are available to modify this behavior.

Note

When using CachedSession, any requests that you don’t want to cache can also be made with a regular requests.Session object, or wrapper functions like requests.get(), etc.

Filter by HTTP Methods#

To cache additional HTTP methods, specify them with allowable_methods:

>>> from requests_cache import CachedSession
>>> session = CachedSession(allowable_methods=('GET', 'POST'))
>>> session.post('https://httpbin.org/post', json={'param': 'value'})

For example, some APIs use the POST method to request data via a JSON-formatted request body, because the request may exceed the maximum size of a GET request. You may also want to cache POST requests to avoid sending the exact same data multiple times.

Filter by Status Codes#

To cache additional status codes, specify them with allowable_codes:

>>> session = CachedSession(allowable_codes=(200, 418))
>>> session.get('https://httpbin.org/teapot')
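
To make the method and status code filters concrete, here is a minimal standalone sketch of the decision they describe. It is a simplified model rather than the library's actual implementation; FakeResponse and passes_basic_filters are hypothetical names used only for illustration, and the defaults are the ones stated at the top of this page.

```python
from dataclasses import dataclass

# Defaults described on this page: read-only methods and 200 responses
DEFAULT_METHODS = ('GET', 'HEAD')
DEFAULT_CODES = (200,)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response (not a requests-cache class)"""

    method: str
    status_code: int


def passes_basic_filters(
    response,
    allowable_methods=DEFAULT_METHODS,
    allowable_codes=DEFAULT_CODES,
):
    """Return True only if both the method and the status code are allowed"""
    return (
        response.method in allowable_methods
        and response.status_code in allowable_codes
    )


# With the defaults, a POST request or a 418 response would be skipped
default_get = passes_basic_filters(FakeResponse('GET', 200))
default_post = passes_basic_filters(FakeResponse('POST', 200))
default_teapot = passes_basic_filters(FakeResponse('GET', 418))

# Widening either tuple admits the additional method or status code
allowed_post = passes_basic_filters(
    FakeResponse('POST', 200), allowable_methods=('GET', 'POST')
)
allowed_teapot = passes_basic_filters(
    FakeResponse('GET', 418), allowable_codes=(200, 418)
)
```

In this model, both conditions must hold, so adding 'POST' to the allowed methods does not by itself cause non-200 POST responses to be cached.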

Filter by URLs#

You can use URL patterns to define an allowlist for selective caching, by using an expiration value of requests_cache.DO_NOT_CACHE for non-matching request URLs:

>>> from requests_cache import DO_NOT_CACHE, NEVER_EXPIRE, CachedSession
>>> urls_expire_after = {
...     '*.site_1.com': 30,
...     'site_2.com/static': NEVER_EXPIRE,
...     '*': DO_NOT_CACHE,
... }
>>> session = CachedSession(urls_expire_after=urls_expire_after)

Note that the catch-all rule above ('*') will behave the same as setting the session-level expiration to 0:

>>> urls_expire_after = {'*.site_1.com': 30, 'site_2.com/static': -1}
>>> session = CachedSession(urls_expire_after=urls_expire_after, expire_after=0)
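
The allowlist idea above can be illustrated with a rough standalone sketch. It assumes that patterns are glob-style, are compared against the URL without its scheme, match any path beneath them, and are checked in order with the first match winning. These are assumptions of the sketch, not a description of the library's matching code. The DO_NOT_CACHE value here is a placeholder sentinel, and lookup_expiration is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Placeholder sentinels for this sketch only
DO_NOT_CACHE = 'DO_NOT_CACHE'
NEVER_EXPIRE = -1

urls_expire_after = {
    '*.site_1.com': 30,
    'site_2.com/static': NEVER_EXPIRE,
    '*': DO_NOT_CACHE,
}


def lookup_expiration(url, patterns, default=None):
    """Return the expiration value of the first pattern matching the URL"""
    # Compare against the URL without its scheme (assumption of this sketch)
    target = url.split('://', 1)[-1]
    for pattern, expire_after in patterns.items():
        # A trailing '*' lets each pattern match any path beneath it
        if fnmatchcase(target, pattern.rstrip('*') + '*'):
            return expire_after
    return default


api_result = lookup_expiration('https://api.site_1.com/data', urls_expire_after)
static_result = lookup_expiration('https://site_2.com/static/logo.png', urls_expire_after)
other_result = lookup_expiration('https://example.com/page', urls_expire_after)
```

Because dicts preserve insertion order, the catch-all '*' entry goes last so that it only applies to URLs that no earlier pattern matched.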

Custom Cache Filtering#

If you need more advanced behavior for choosing what to cache, you can provide a custom filtering function via the filter_fn param. This can be any function that takes a requests.Response object and returns a boolean indicating whether or not that response should be cached. It will be applied to both new responses (on write) and previously cached responses (on read):

>>> from sys import getsizeof
>>> from requests import Response
>>> from requests_cache import CachedSession

>>> def filter_by_size(response: Response) -> bool:
...     """Don't cache responses with a body over 1 MB"""
...     return getsizeof(response.content) <= 1024 * 1024

>>> session = CachedSession(filter_fn=filter_by_size)

Note

filter_fn() will be used in addition to other filtering options.
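
As a rough illustration of how a custom filter combines with the other options, the sketch below models the decision with a stand-in response object so that it runs without any network access. FakeResponse, filter_json_only, and should_cache are hypothetical names, and the combination logic is a simplified assumption rather than the library's own code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing a few attributes of a response"""

    status_code: int
    headers: dict = field(default_factory=dict)
    content: bytes = b''


def filter_json_only(response) -> bool:
    """Example custom filter: only cache JSON responses"""
    # A plain dict is case-sensitive, unlike the headers of a real response
    content_type = response.headers.get('Content-Type', '')
    return content_type.startswith('application/json')


def should_cache(response, allowable_codes=(200,), filter_fn=None):
    """Model of 'in addition to': every filter must pass"""
    if response.status_code not in allowable_codes:
        return False
    return filter_fn(response) if filter_fn else True


json_ok = FakeResponse(200, {'Content-Type': 'application/json'})
html_ok = FakeResponse(200, {'Content-Type': 'text/html'})
json_error = FakeResponse(500, {'Content-Type': 'application/json'})

cache_json = should_cache(json_ok, filter_fn=filter_json_only)
cache_html = should_cache(html_ok, filter_fn=filter_json_only)
cache_error = should_cache(json_error, filter_fn=filter_json_only)
```

In this model, a response that satisfies the custom filter is still skipped if its status code is not allowed, which mirrors the note above that filter_fn supplements the other options instead of replacing them.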