Advanced Usage
This section covers some more advanced and use-case-specific features.
Custom Response Filtering
If you need more advanced behavior for determining what to cache, you can provide a custom filtering
function via the filter_fn param. This can be any function that takes a requests.Response object and
returns a boolean indicating whether or not that response should be cached. It will be applied to both
new responses (on write) and previously cached responses (on read). Example:
>>> from requests_cache import CachedSession
>>>
>>> def filter_by_size(response):
...     """Don't cache responses with a body over 1 MB"""
...     # len() counts the bytes in the body; getsizeof() would also include object overhead
...     return len(response.content) <= 1024 * 1024
>>>
>>> session = CachedSession(filter_fn=filter_by_size)
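Because a filter only needs the response object, its logic can be checked in isolation before it is passed to a session. The following is a minimal sketch of a second, hypothetical filter, filter_json_only, that would cache only JSON responses up to a size limit. The function name, the MAX_BODY_BYTES limit, and the use of types.SimpleNamespace as a stand-in for requests.Response are all assumptions made for illustration.

```python
from types import SimpleNamespace

MAX_BODY_BYTES = 1024 * 1024  # hypothetical limit: 1 MB


def filter_json_only(response):
    """Cache only JSON responses whose body is at most MAX_BODY_BYTES."""
    content_type = response.headers.get('Content-Type', '')
    # Ignore parameters such as "; charset=utf-8" when comparing the media type
    is_json = content_type.split(';')[0].strip().lower() == 'application/json'
    return is_json and len(response.content) <= MAX_BODY_BYTES


# Stand-in objects exposing only the attributes the filter reads
json_response = SimpleNamespace(
    headers={'Content-Type': 'application/json; charset=utf-8'},
    content=b'{"ok": true}',
)
html_response = SimpleNamespace(
    headers={'Content-Type': 'text/html'},
    content=b'<html></html>',
)

print(filter_json_only(json_response))  # True
print(filter_json_only(html_response))  # False
```

Such a function would be passed to the session in the same way as filter_by_size, via the filter_fn param.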
Cache Inspection
Here are some ways to get additional information out of the cache session, backend, and responses:
Response Attributes
The following attributes are available on responses:
* from_cache: indicates if the response came from the cache
* created_at: datetime of when the cached response was created or last updated
* expires: datetime after which the cached response will expire
* is_expired: indicates if the cached response is expired (if an old response was returned due to a request error)
Examples:
>>> from datetime import timedelta
>>> from requests_cache import CachedSession
>>> session = CachedSession(expire_after=timedelta(days=1))
>>> # Placeholders are added for non-cached responses
>>> r = session.get('http://httpbin.org/get')
>>> print(r.from_cache, r.created_at, r.expires, r.is_expired)
False None None None
>>> # Values will be populated for cached responses
>>> r = session.get('http://httpbin.org/get')
>>> print(r.from_cache, r.created_at, r.expires, r.is_expired)
True 2021-01-01 18:00:00 2021-01-02 18:00:00 False
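These attributes can be combined in small helpers. As a sketch, the hypothetical function below computes how long a cached response has left before it expires. The helper name, the explicit now argument, and the SimpleNamespace stand-ins are assumptions for illustration; the stand-ins reuse the timestamps from the example output above, and timezone handling is deliberately left out.

```python
from datetime import datetime
from types import SimpleNamespace


def seconds_until_expiry(response, now):
    """Return seconds left before a cached response expires, or None if unknown."""
    expires = getattr(response, 'expires', None)
    if expires is None:
        # Non-cached responses only carry placeholder values
        return None
    # Clamp at zero so already-expired responses do not yield negative values
    return max((expires - now).total_seconds(), 0.0)


# Stand-ins exposing the attributes listed above
cached = SimpleNamespace(
    from_cache=True,
    created_at=datetime(2021, 1, 1, 18, 0, 0),
    expires=datetime(2021, 1, 2, 18, 0, 0),
)
fresh = SimpleNamespace(from_cache=False, created_at=None, expires=None)

now = datetime(2021, 1, 2, 12, 0, 0)
print(seconds_until_expiry(cached, now))  # 21600.0
print(seconds_until_expiry(fresh, now))   # None
```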
Cache Contents
You can use CachedSession.cache.urls to see all URLs currently in the cache:
>>> session = CachedSession()
>>> print(session.cache.urls)
['https://httpbin.org/get', 'https://httpbin.org/stream/100']
If needed, you can get more details on cached responses via CachedSession.cache.responses, which
is a dict-like interface to the cache backend. See CachedResponse for a full list of available
attributes.
For example, if you wanted to see all URLs requested with a specific method:
>>> post_urls = [
...     response.url for response in session.cache.responses.values()
...     if response.request.method == 'POST'
... ]
You can also inspect CachedSession.cache.redirects, which maps redirect URLs to keys of the
responses they redirect to.
Custom Backends
If the built-in Cache Backends don’t suit your needs, you can create your own by subclassing
BaseCache and BaseStorage:
>>> from requests_cache import CachedSession
>>> from requests_cache.backends import BaseCache, BaseStorage
>>>
>>> class CustomCache(BaseCache):
...     """Wrapper for higher-level cache operations. In most cases, the only thing you need
...     to specify here is which storage class(es) to use.
...     """
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...         self.redirects = CustomStorage(**kwargs)
...         self.responses = CustomStorage(**kwargs)
>>>
>>> class CustomStorage(BaseStorage):
...     """Dict-like interface for lower-level backend storage operations"""
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...
...     def __getitem__(self, key):
...         pass
...
...     def __setitem__(self, key, value):
...         pass
...
...     def __delitem__(self, key):
...         pass
...
...     def __iter__(self):
...         pass
...
...     def __len__(self):
...         pass
...
...     def clear(self):
...         pass
You can then use your custom backend in a CachedSession with the backend parameter:
>>> session = CachedSession(backend=CustomCache())
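To give an idea of what the storage method bodies might contain, here is a minimal dict-backed sketch. It subclasses collections.abc.MutableMapping as a stand-in for BaseStorage so that it runs without requests-cache installed; the class name InMemoryStorage and the _data attribute are hypothetical. A real backend would replace the dict operations with calls to its own storage system.

```python
from collections.abc import MutableMapping


class InMemoryStorage(MutableMapping):
    """Hypothetical dict-backed storage implementing the same method set
    as the CustomStorage skeleton (MutableMapping stands in for BaseStorage).
    """

    def __init__(self, **kwargs):
        # Backend-specific options would be consumed here; this sketch ignores them
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]  # raises KeyError for a missing key

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def clear(self):
        self._data.clear()


storage = InMemoryStorage()
storage['key_1'] = 'response_1'
storage['key_2'] = 'response_2'
print(len(storage), list(storage))  # 2 ['key_1', 'key_2']
del storage['key_1']
storage.clear()
print(len(storage))  # 0
```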
Usage with other requests features
Request Hooks
Requests has an Event Hook system that can be used to add custom behavior into different parts of the request process. It can be used, for example, for request throttling:
>>> import time
>>> import requests
>>> from requests_cache import CachedSession
>>>
>>> def make_throttle_hook(timeout=1.0):
...     """Make a request hook function that adds a custom delay for non-cached requests"""
...     def hook(response, *args, **kwargs):
...         if not getattr(response, 'from_cache', False):
...             print('sleeping')
...             time.sleep(timeout)
...         return response
...     return hook
>>>
>>> session = CachedSession()
>>> session.hooks['response'].append(make_throttle_hook(0.1))
>>> # The first (real) request will have an added delay
>>> session.get('http://httpbin.org/get')
>>> # The second (cached) request will not
>>> session.get('http://httpbin.org/get')
Streaming Requests
If you use streaming requests, you can use the same code to iterate over both cached and non-cached responses. A cached response will, of course, have already been read, but it will use a file-like object containing the content. Example:
>>> from requests_cache import CachedSession
>>>
>>> session = CachedSession()
>>> for i in range(2):
...     r = session.get('https://httpbin.org/stream/20', stream=True)
...     for chunk in r.iter_lines():
...         print(chunk.decode('utf-8'))
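The idea of replaying an already-read body through a file-like object can be illustrated with the standard library alone. This is only a sketch using io.BytesIO and hypothetical content, not a description of how requests-cache stores or replays responses internally.

```python
from io import BytesIO

# Hypothetical body that was fully read and stored on the first request
stored_content = b'{"id": 0}\n{"id": 1}\n{"id": 2}\n'

# Wrapping the stored bytes in a file-like object lets consumers iterate
# over them line by line, much as they would over a live stream
raw = BytesIO(stored_content)
lines = [line.rstrip(b'\n').decode('utf-8') for line in raw]
print(lines)
```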
Usage with other requests-based libraries
This library works by patching and/or extending requests.Session. Many other libraries do the same
thing, which can make them difficult to combine. For that scenario, a mixin class is provided, so
you can create a custom class with behavior from multiple Session-modifying libraries:
>>> from requests import Session
>>> from requests_cache import CacheMixin
>>> from some_other_lib import SomeOtherMixin
>>>
>>> class CustomSession(CacheMixin, SomeOtherMixin, Session):
...     """Session class with features from both some_other_lib and requests-cache"""
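The order of the base classes matters because Python resolves methods from left to right, and each mixin hands off to the next class with super(). The sketch below demonstrates this cooperative pattern with stand-in classes only. Session, CachingMixin, and LoggingMixin here are hypothetical and do not reflect how CacheMixin is actually implemented.

```python
class Session:
    """Stand-in for requests.Session (hypothetical, for illustration only)."""

    def request(self, method, url):
        return f'{method} {url}'


class CachingMixin:
    """Hypothetical mixin that returns a stored result when one exists."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cache = {}

    def request(self, method, url):
        key = (method, url)
        if key not in self._cache:
            # Delegate to the next class in the method resolution order
            self._cache[key] = super().request(method, url)
        return self._cache[key]


class LoggingMixin:
    """Hypothetical second mixin that records every call that reaches it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.log = []

    def request(self, method, url):
        self.log.append(url)
        return super().request(method, url)


class CustomSession(CachingMixin, LoggingMixin, Session):
    """Mixins are listed first, the base session class last."""


session = CustomSession()
session.request('GET', 'https://example.com')
session.request('GET', 'https://example.com')

# Resolution order: CustomSession, CachingMixin, LoggingMixin, Session, object
print([cls.__name__ for cls in CustomSession.__mro__])
# Only the first call got past the caching mixin, so one URL is logged
print(session.log)
```

In this sketch the caching mixin is listed first so that it can short-circuit repeated calls before any other layer runs.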
Requests-HTML
Example with requests-html:
>>> import requests
>>> from requests_cache import CacheMixin, install_cache
>>> from requests_html import HTMLSession
>>>
>>> class CachedHTMLSession(CacheMixin, HTMLSession):
...     """Session with features from both CachedSession and HTMLSession"""
>>>
>>> session = CachedHTMLSession()
>>> r = session.get('https://github.com/')
>>> print(r.from_cache, r.html.links)
Or, using the monkey-patch method:
>>> install_cache(session_factory=CachedHTMLSession)
>>> r = requests.get('https://github.com/')
>>> print(r.from_cache, r.html.links)
The same approach can be used with other libraries that subclass requests.Session.
Requests-futures
Example with requests-futures:
Some libraries, including requests-futures, support wrapping an existing session object:
>>> from requests_cache import CachedSession
>>> from requests_futures.sessions import FuturesSession
>>>
>>> session = FuturesSession(session=CachedSession())
In this case, FuturesSession must wrap CachedSession rather than the other way around, since
FuturesSession returns (as you might expect) futures rather than response objects.
See issue #135 for more notes on this.
Requests-mock
Example with requests-mock:
Requests-mock works a bit differently. It has multiple methods of mocking requests, and the method most compatible with requests-cache is attaching its adapter to a CachedSession:
>>> import requests
>>> from requests_mock import Adapter
>>> from requests_cache import CachedSession
>>>
>>> # Set up a CachedSession that will make mock requests where it would normally make real requests
>>> adapter = Adapter()
>>> adapter.register_uri(
...     'GET',
...     'mock://some_test_url',
...     headers={'Content-Type': 'text/plain'},
...     text='mock response',
...     status_code=200,
... )
>>> session = CachedSession()
>>> session.mount('mock://', adapter)
>>>
>>> # Make the request twice; the second response should come from the cache
>>> session.get('mock://some_test_url')
>>> response = session.get('mock://some_test_url')
>>> print(response.text)
Internet Archive
Example with internetarchive:
Usage is the same as other libraries that subclass requests.Session:
>>> from requests_cache import CacheMixin
>>> from internetarchive.session import ArchiveSession
>>>
>>> class CachedArchiveSession(CacheMixin, ArchiveSession):
...     """Session with features from both CachedSession and ArchiveSession"""