Advanced Usage
This section covers some more advanced and use-case-specific features.
Cache Inspection
Here are some ways to get additional information out of the cache session, backend, and responses:
Response Details
The following attributes are available on responses:
from_cache: indicates whether the response came from the cache
created_at: datetime of when the cached response was created or last updated
expires: datetime after which the cached response will expire
is_expired: indicates whether the cached response is expired (if an old response was returned due to a request error)
Examples:
>>> from datetime import timedelta
>>> from requests_cache import CachedSession
>>> session = CachedSession(expire_after=timedelta(days=1))
>>> # Placeholders are added for non-cached responses
>>> response = session.get('http://httpbin.org/get')
>>> print(response.from_cache, response.created_at, response.expires, response.is_expired)
False None None None
>>> # Values will be populated for cached responses
>>> response = session.get('http://httpbin.org/get')
>>> print(response.from_cache, response.created_at, response.expires, response.is_expired)
True 2021-01-01 18:00:00 2021-01-02 18:00:00 False
>>> # Print a response object to get general information about it
>>> print(response)
'request: GET https://httpbin.org/get, response: 200 (308 bytes), created: 2021-01-01 22:45:00 IST, expires: 2021-01-02 18:45:00 IST (fresh)'
Cache Contents
You can use CachedSession.cache.urls to see all URLs currently in the cache:
>>> session = CachedSession()
>>> print(session.cache.urls)
['https://httpbin.org/get', 'https://httpbin.org/stream/100']
If needed, you can get more details on cached responses via CachedSession.cache.responses, which is a dict-like interface to the cache backend. See CachedResponse for a full list of available attributes.
For example, if you wanted to see all URLs requested with a specific method:
>>> post_urls = [
... response.url for response in session.cache.responses.values()
... if response.request.method == 'POST'
... ]
You can also inspect CachedSession.cache.redirects, which maps redirect URLs to keys of the responses they redirect to.
Additional keys() and values() wrapper methods are available on BaseCache to get combined keys and responses.
>>> print('All responses:')
>>> for response in session.cache.values():
...     print(response)
>>> print('All cache keys for redirects and responses combined:')
>>> print(list(session.cache.keys()))
Both methods also take a check_expiry argument to exclude expired responses:
>>> print('All unexpired responses:')
>>> for response in session.cache.values(check_expiry=True):
...     print(response)
Similarly, you can get a count of responses with BaseCache.response_count(), and optionally exclude expired responses:
>>> print(f'Total responses: {session.cache.response_count()}')
>>> print(f'Unexpired responses: {session.cache.response_count(check_expiry=True)}')
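As a rough model of what check_expiry means (not requests-cache's actual implementation), counting reduces to skipping entries whose expiration time has passed. FakeCachedResponse and count_responses below are hypothetical names, and the sketch assumes that an expires value of None means the response never expires and that expiration times are timezone-aware UTC datetimes:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class FakeCachedResponse:
    """Hypothetical stand-in for a cached response; only `expires` is used here."""

    url: str
    expires: Optional[datetime] = None  # assumption: None means "never expires"

    @property
    def is_expired(self) -> bool:
        return self.expires is not None and datetime.now(timezone.utc) >= self.expires


def count_responses(responses: Iterable[FakeCachedResponse], check_expiry: bool = False) -> int:
    """Count responses, skipping expired ones only when check_expiry is True."""
    return sum(1 for r in responses if not (check_expiry and r.is_expired))


now = datetime.now(timezone.utc)
responses = [
    FakeCachedResponse('https://httpbin.org/get'),
    FakeCachedResponse('https://httpbin.org/json', expires=now - timedelta(hours=1)),
    FakeCachedResponse('https://httpbin.org/html', expires=now + timedelta(hours=1)),
]
print(count_responses(responses), count_responses(responses, check_expiry=True))  # 3 2
```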
Custom Response Filtering
If you need more advanced behavior for determining what to cache, you can provide a custom filtering function via the filter_fn param. This can be any function that takes a requests.Response object and returns a boolean indicating whether or not that response should be cached. It will be applied to both new responses (on write) and previously cached responses (on read):
>>> from requests_cache import CachedSession
>>> def filter_by_size(response):
...     """Don't cache responses with a body over 1 MB"""
...     return len(response.content) <= 1024 * 1024
>>> session = CachedSession(filter_fn=filter_by_size)
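Filters can also be built from a small factory so that thresholds are configurable. This is a sketch: make_filter and its allowed_types parameter are hypothetical, and the function only relies on the response having headers and content attributes (as requests.Response does), so SimpleNamespace stubs stand in for real responses here:

```python
from types import SimpleNamespace


def make_filter(max_bytes=1024 * 1024, allowed_types=('application/json', 'text/')):
    """Return a filter function that only allows small responses with a matching Content-Type."""

    def filter_fn(response):
        content_type = response.headers.get('Content-Type', '')
        small_enough = len(response.content) <= max_bytes
        type_ok = content_type.startswith(allowed_types)  # str.startswith accepts a tuple
        return small_enough and type_ok

    return filter_fn


filter_fn = make_filter(max_bytes=10)
json_response = SimpleNamespace(headers={'Content-Type': 'application/json'}, content=b'{}')
image_response = SimpleNamespace(headers={'Content-Type': 'image/png'}, content=b'\x89PNG')
print(filter_fn(json_response), filter_fn(image_response))  # True False
```

With requests-cache installed, the result would be passed the same way as above, e.g. CachedSession(filter_fn=make_filter()).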
Custom Backends
If the built-in Cache Backends don’t suit your needs, you can create your own by making subclasses of BaseCache and BaseStorage:
>>> from requests_cache import CachedSession
>>> from requests_cache.backends import BaseCache, BaseStorage
>>> class CustomCache(BaseCache):
...     """Wrapper for higher-level cache operations. In most cases, the only thing you need
...     to specify here is which storage class(es) to use.
...     """
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...         self.redirects = CustomStorage(**kwargs)
...         self.responses = CustomStorage(**kwargs)
>>> class CustomStorage(BaseStorage):
...     """Dict-like interface for lower-level backend storage operations"""
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...
...     def __getitem__(self, key):
...         pass
...
...     def __setitem__(self, key, value):
...         pass
...
...     def __delitem__(self, key):
...         pass
...
...     def __iter__(self):
...         pass
...
...     def __len__(self):
...         pass
...
...     def clear(self):
...         pass
You can then use your custom backend in a CachedSession with the backend parameter:
>>> session = CachedSession(backend=CustomCache())
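For a sense of what the CustomStorage methods need to do, here is a minimal in-memory version backed by a plain dict. To keep it runnable without requests-cache, it subclasses collections.abc.MutableMapping instead of BaseStorage (an assumption that the two share the same dict-like contract); DictStorage is a hypothetical name:

```python
from collections.abc import MutableMapping


class DictStorage(MutableMapping):
    """Hypothetical in-memory storage with the same methods as the CustomStorage skeleton."""

    def __init__(self, **kwargs):
        self._data = {}  # kwargs (e.g. connection settings) are ignored in this sketch

    def __getitem__(self, key):
        return self._data[key]  # raises KeyError for missing keys, like a dict

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def clear(self):
        self._data.clear()  # faster than MutableMapping's default popitem() loop


storage = DictStorage()
storage['key'] = 'value'
print(storage['key'], len(storage), 'key' in storage, storage.get('missing'))  # value 1 True None
```

Raising KeyError from __getitem__ matters: MutableMapping builds get(), pop(), and the `in` operator on top of it.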
Custom Serializers
If the built-in Serializers don’t suit your needs, you can create your own. For example, if you had an imaginary custom_pickle module that provides dumps and loads functions:
>>> import custom_pickle
>>> from requests_cache import CachedSession
>>> session = CachedSession(serializer=custom_pickle)
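The serializer only needs the dumps/loads pair described above. As a stdlib-only sketch, a hypothetical Base64Pickle class provides both as static methods and produces a str, which could suit a text-based backend. Whether the serializer parameter accepts a plain class like this, rather than a module, is an assumption to check against your requests-cache version:

```python
import base64
import pickle


class Base64Pickle:
    """Hypothetical serializer: pickle to bytes, then base64-encode to an ASCII str."""

    @staticmethod
    def dumps(obj) -> str:
        return base64.b64encode(pickle.dumps(obj)).decode('ascii')

    @staticmethod
    def loads(data: str):
        # Only unpickle data you trust: pickle can execute arbitrary code on load
        return pickle.loads(base64.b64decode(data))


payload = {'status_code': 200, 'content': b'hello'}
encoded = Base64Pickle.dumps(payload)
print(type(encoded).__name__, Base64Pickle.loads(encoded) == payload)  # str True
```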
Serializer Pipelines
More complex serialization can be done with SerializerPipeline. Use cases include text-based serialization, compression, encryption, and any other intermediate steps you might want to add.
Any combination of these can be composed with a SerializerPipeline, which starts with a CachedResponse and ends with a str or bytes object. Each stage of the pipeline can be any object or module with dumps and loads functions. If the object has similar methods with different names (e.g. compress / decompress), those can be aliased using Stage.
For example, a compressed pickle serializer can be built as:
>>> import pickle, gzip
>>> from requests_cache import CachedSession
>>> from requests_cache.serializers import SerializerPipeline, Stage
>>> compressed_serializer = SerializerPipeline([
...     pickle,
...     Stage(gzip, dumps='compress', loads='decompress'),
... ])
>>> session = CachedSession(serializer=compressed_serializer)
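To illustrate how a pipeline composes its stages, the sketch below reimplements the idea with hypothetical MiniStage and MiniPipeline classes (not requests-cache's own code). It assumes that dumps runs the stages in list order and loads undoes them in reverse order, which is what a pickle-then-gzip pipeline needs in order to round-trip:

```python
import gzip
import pickle


class MiniStage:
    """Hypothetical stand-in for Stage: aliases arbitrary method names to dumps/loads."""

    def __init__(self, obj, dumps='dumps', loads='loads'):
        self.dumps = getattr(obj, dumps)
        self.loads = getattr(obj, loads)


class MiniPipeline:
    """Hypothetical stand-in for SerializerPipeline."""

    def __init__(self, stages):
        self.stages = stages

    def dumps(self, value):
        for stage in self.stages:  # apply stages first-to-last
            value = stage.dumps(value)
        return value

    def loads(self, value):
        for stage in reversed(self.stages):  # undo them last-to-first
            value = stage.loads(value)
        return value


compressed = MiniPipeline([pickle, MiniStage(gzip, dumps='compress', loads='decompress')])
obj = {'url': 'https://httpbin.org/get', 'content': b'x' * 1000}
blob = compressed.dumps(obj)
print(type(blob).__name__, compressed.loads(blob) == obj)  # bytes True
```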
Text-based Serializers
If you’re using a text-based serialization format like JSON or YAML, some extra steps are needed to encode binary data and non-builtin types. The cattrs library can do the majority of the work here, and some pre-configured converters are included for several common formats in the preconf module.
For example, a compressed JSON pipeline could be built as follows:
>>> import json, gzip, codecs
>>> from requests_cache.serializers import SerializerPipeline, Stage, json_converter
>>> comp_json_serializer = SerializerPipeline([
...     json_converter,  # Serialize to a JSON string
...     Stage(codecs.utf_8, dumps='encode', loads='decode'),  # Encode to bytes
...     Stage(gzip, dumps='compress', loads='decompress'),  # Compress
... ])
Note
If you want to use a different format that isn’t included in preconf, you can use CattrStage as a starting point.
Note
If you want to convert a string representation to bytes (e.g. for compression), you must use a codec from codecs (typically codecs.utf_8).
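The reason for the encoding step is that compressors such as gzip only accept bytes, while a JSON serializer produces a str. With plain stdlib calls (using str.encode as the str-to-bytes step, rather than a codecs stage), the round trip looks like this:

```python
import gzip
import json

record = {'status_code': 200, 'url': 'https://httpbin.org/get'}

text = json.dumps(record)    # JSON serializer output: str
raw = text.encode('utf-8')   # str -> bytes
packed = gzip.compress(raw)  # gzip requires bytes-like input

# Loading reverses the steps in the opposite order
restored = json.loads(gzip.decompress(packed).decode('utf-8'))
print(restored == record)  # True

try:
    gzip.compress(text)  # passing the str directly fails
except TypeError as e:
    print('TypeError:', e)
```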
Additional Serialization Steps
Some other tools that could be used as a stage in a SerializerPipeline include:
class | dumps | loads |
---|---|---|
| encode | decode |
| compress | decompress |
| compress | decompress |
| compress | decompress |
| compress | decompress |
| dumps | loads |
| sign | unsign |
| loads | dumps |
| encrypt | decrypt |
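As an example of a sign/unsign stage, here is a stdlib-only signer built on hmac. HmacSigner is hypothetical (it is not the API of any particular signing library); following the Stage aliasing described above, it could presumably be added to a pipeline as Stage(HmacSigner(key), dumps='sign', loads='unsign') after a stage that produces bytes:

```python
import hashlib
import hmac


class HmacSigner:
    """Hypothetical signing stage: prepends an HMAC-SHA256 tag to the payload."""

    TAG_SIZE = 32  # SHA-256 digest length in bytes

    def __init__(self, key: bytes):
        self.key = key

    def sign(self, data: bytes) -> bytes:
        tag = hmac.new(self.key, data, hashlib.sha256).digest()
        return tag + data

    def unsign(self, signed: bytes) -> bytes:
        tag, data = signed[:self.TAG_SIZE], signed[self.TAG_SIZE:]
        expected = hmac.new(self.key, data, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):  # constant-time comparison
            raise ValueError('Invalid signature')
        return data


signer = HmacSigner(b'secret-key')
signed = signer.sign(b'payload')
print(signer.unsign(signed))  # b'payload'
```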
Usage with other requests features
Request Hooks
Requests has an Event Hook system that can be used to add custom behavior into different parts of the request process. It can be used, for example, for request throttling:
>>> import time
>>> import requests
>>> from requests_cache import CachedSession
>>>
>>> def make_throttle_hook(timeout=1.0):
...     """Make a request hook function that adds a custom delay for non-cached requests"""
...     def hook(response, *args, **kwargs):
...         if not getattr(response, 'from_cache', False):
...             print('sleeping')
...             time.sleep(timeout)
...         return response
...     return hook
>>>
>>> session = CachedSession()
>>> session.hooks['response'].append(make_throttle_hook(0.1))
>>> # The first (real) request will have an added delay
>>> session.get('http://httpbin.org/get')
>>> session.get('http://httpbin.org/get')
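A variation on the throttle hook above: instead of always sleeping after a real request, this hypothetical make_rate_limit_hook only waits as long as needed to keep non-cached responses at least min_interval seconds apart. Stub objects with a from_cache attribute stand in for responses so the sketch runs without a network:

```python
import time
from types import SimpleNamespace


def make_rate_limit_hook(min_interval=1.0):
    """Make a response hook that spaces out non-cached responses by at least min_interval seconds."""
    last_real = None  # time.monotonic() of the previous non-cached response

    def hook(response, *args, **kwargs):
        nonlocal last_real
        if getattr(response, 'from_cache', False):
            return response  # cached responses are never delayed
        now = time.monotonic()
        if last_real is not None:
            wait = min_interval - (now - last_real)
            if wait > 0:
                time.sleep(wait)
        last_real = time.monotonic()
        return response

    return hook


hook = make_rate_limit_hook(0.2)
start = time.monotonic()
hook(SimpleNamespace(from_cache=False))  # first real response: no wait
hook(SimpleNamespace(from_cache=True))   # cached response: no wait
fast_elapsed = time.monotonic() - start
hook(SimpleNamespace(from_cache=False))  # second real response: waits about 0.2 s
slow_elapsed = time.monotonic() - start
```

It would be registered the same way as the original hook, e.g. session.hooks['response'].append(make_rate_limit_hook(0.5)).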
Streaming Requests
Note
This feature requires requests >= 2.19
If you use streaming requests, you can use the same code to iterate over both cached and non-cached requests. A cached request will, of course, have already been read, but will use a file-like object containing the content:
>>> from requests_cache import CachedSession
>>>
>>> session = CachedSession()
>>> for i in range(2):
...     response = session.get('https://httpbin.org/stream/20', stream=True)
...     for chunk in response.iter_lines():
...         print(chunk.decode('utf-8'))
Usage with other requests-based libraries
This library works by patching and/or extending requests.Session. Many other libraries out there do the same thing, making it potentially difficult to combine them.
For that scenario, a mixin class is provided, so you can create a custom class with behavior from multiple Session-modifying libraries:
>>> from requests import Session
>>> from requests_cache import CacheMixin
>>> from some_other_lib import SomeOtherMixin
>>>
>>> class CustomSession(CacheMixin, SomeOtherMixin, Session):
...     """Session class with features from both some_other_lib and requests-cache"""
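The mixin approach relies on Python's method resolution order (MRO) and cooperative super() calls: each class handles its part and then hands off to the next class in the MRO, with the Session class last. A self-contained sketch with hypothetical classes (a stand-in BaseSession rather than requests.Session) shows the resulting call order, assuming each mixin calls super():

```python
class BaseSession:
    """Hypothetical stand-in for requests.Session; does not call super()."""

    def request(self, method, url):
        return f'{method} {url}'


class CacheLikeMixin:
    """Hypothetical mixin: wraps the result of the next class in the MRO."""

    def request(self, method, url):
        return 'cache(' + super().request(method, url) + ')'


class OtherMixin:
    """Another hypothetical mixin that cooperates via super()."""

    def request(self, method, url):
        return 'other(' + super().request(method, url) + ')'


class CustomSession(CacheLikeMixin, OtherMixin, BaseSession):
    pass


print(CustomSession().request('GET', 'https://httpbin.org/get'))
# cache(other(GET https://httpbin.org/get))
```

This is also why the Session class goes last in the list of bases: its request method does not call super(), so any mixin listed after it would never run.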
Requests-HTML
requests-html is one library that works with this method:
>>> import requests
>>> from requests_cache import CacheMixin, install_cache
>>> from requests_html import HTMLSession
>>>
>>> class CachedHTMLSession(CacheMixin, HTMLSession):
...     """Session with features from both CachedSession and HTMLSession"""
>>>
>>> session = CachedHTMLSession()
>>> response = session.get('https://github.com/')
>>> print(response.from_cache, response.html.links)
Or if you are using install_cache(), you can use the session_factory argument:
>>> install_cache(session_factory=CachedHTMLSession)
>>> response = requests.get('https://github.com/')
>>> print(response.from_cache, response.html.links)
The same approach can be used with other libraries that subclass requests.Session.
Requests-futures
Some libraries, including requests-futures, support wrapping an existing session object:
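>>> from requests_futures.sessions import FutureSession
>>> from requests_cache import CachedSession
>>> session = FutureSession(session=CachedSession())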
In this case, FutureSession must wrap CachedSession rather than the other way around, since FutureSession returns (as you might expect) futures rather than response objects.
See issue #135 for more notes on this.
Internet Archive
Usage with internetarchive is the same as other libraries that subclass requests.Session:
>>> from requests_cache import CacheMixin
>>> from internetarchive.session import ArchiveSession
>>>
>>> class CachedArchiveSession(CacheMixin, ArchiveSession):
...     """Session with features from both CachedSession and ArchiveSession"""
Requests-mock
requests-mock has multiple methods for mocking requests, including a context manager, decorator, fixture, and adapter. There are a few different options for using it with requests-cache, depending on how you want your tests to work.
Disabling requests-cache
If you have an application that uses requests-cache and you just want to use requests-mock in your tests, the easiest thing to do is to disable requests-cache.
For example, if you are using install_cache() in your application and the requests-mock pytest fixture in your tests, you could wrap it in another fixture that uses uninstall_cache() or disabled():
"""Example of using requests-cache with the requests-mock library"""
import pytest
import requests

import requests_cache


@pytest.fixture(scope='function')
def requests_cache_mock(requests_mock):
    with requests_cache.disabled():
        yield requests_mock


def test_requests_cache_mock(requests_cache_mock):
    """Within this test function, requests will be mocked and not cached"""
    url = 'https://example.com'
    requests_cache_mock.get(url, text='Mock response!')

    # Make sure the mocker is used
    response_1 = requests.get(url)
    assert response_1.text == 'Mock response!'

    # Make sure the cache is not used
    response_2 = requests.get(url)
    assert getattr(response_2, 'from_cache', False) is False
Or if you use a CachedSession object, you could replace it with a regular Session, for example:
from unittest.mock import patch

import pytest
import requests


@pytest.fixture(scope='function', autouse=True)
def disable_requests_cache():
    """Replace CachedSession with a regular Session for all test functions"""
    with patch('requests_cache.CachedSession', requests.Session):
        yield
Combining requests-cache with requests-mock
If you want both caching and mocking features at the same time, you can attach requests-mock’s adapter to a CachedSession:
"""Example of using requests-cache with the requests-mock library"""
import pytest
from requests_mock import Adapter

from requests_cache import CachedSession

URL = 'https://some_test_url'


@pytest.fixture(scope='function')
def mock_session():
    """Fixture that provides a CachedSession that will make mock requests where it would normally
    make real requests"""
    adapter = Adapter()
    adapter.register_uri(
        'GET',
        URL,
        headers={'Content-Type': 'text/plain'},
        text='Mock response!',
        status_code=200,
    )

    session = CachedSession(backend='memory')
    session.mount('https://', adapter)
    yield session


def test_mock_session(mock_session):
    """Test that the mock_session fixture is working as expected"""
    response_1 = mock_session.get(URL)
    assert response_1.text == 'Mock response!'
    assert getattr(response_1, 'from_cache', False) is False

    response_2 = mock_session.get(URL)
    assert response_2.text == 'Mock response!'
    assert response_2.from_cache is True
Building a mocker using requests-cache data
Another approach is to use cached data to dynamically define mock requests + responses. This has the advantage of only using requests-mock’s behavior for request matching.
@pytest.fixture(scope='session')
def mock_session():
    """Fixture that provides a session with mocked URLs and responses based on cache data"""
    adapter = Adapter()
    cache = CachedSession(TEST_DB).cache

    for response in cache.values():
        adapter.register_uri(
            response.request.method,
            response.request.url,
            content=response.content,
            headers=response.headers,
            status_code=response.status_code,
        )
        print(f'Added mock response: {response}')

    session = Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    yield session
To turn that into a complete example:
"""Example of using requests-cache with the requests-mock library"""
from os.path import dirname, join
from unittest.mock import patch

import pytest
import requests
from requests import Session
from requests_mock import Adapter, NoMockAddress

from requests_cache import CachedSession

TEST_DB = join(dirname(__file__), 'httpbin_sample.test-db')
TEST_URLS = [
    'https://httpbin.org/get',
    'https://httpbin.org/html',
    'https://httpbin.org/json',
]
UNMOCKED_URL = 'https://httpbin.org/ip'


@pytest.fixture(scope='session')
def mock_session():
    """Fixture that provides a session with mocked URLs and responses based on cache data"""
    adapter = Adapter()
    cache = CachedSession(TEST_DB).cache

    for response in cache.values():
        adapter.register_uri(
            response.request.method,
            response.request.url,
            content=response.content,
            headers=response.headers,
            status_code=response.status_code,
        )
        print(f'Added mock response: {response}')

    session = Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    yield session


@patch.object(requests.adapters.HTTPAdapter, 'send', side_effect=ValueError('Real request made!'))
def test_mock_session(mock_http_adapter, mock_session):
    """Test that the mock_session fixture is working as expected"""
    # An error will be raised if a real request is made
    with pytest.raises(ValueError):
        requests.get(TEST_URLS[0])

    # All mocked URLs will return a response based on requests-cache data
    for url in TEST_URLS:
        response = mock_session.get(url)
        assert getattr(response, 'from_cache', False) is False

    # requests-mock will raise an error for an unmocked URL, as usual
    with pytest.raises(NoMockAddress):
        mock_session.get(UNMOCKED_URL)


def save_test_data():
    """Run once to save data to reuse for tests, for demo purposes.
    In practice, you could just run your application or tests with requests-cache installed.
    """
    session = CachedSession(TEST_DB)
    for url in TEST_URLS:
        session.get(url)


if __name__ == '__main__':
    save_test_data()
Responses
Usage with the responses library is similar to the requests-mock examples above.
"""Example of using requests-cache with the responses library"""
from contextlib import contextmanager
from os.path import dirname, join
from unittest.mock import patch

import pytest
import requests
from requests.exceptions import ConnectionError
from responses import RequestsMock, Response

from requests_cache import CachedSession

TEST_DB = join(dirname(__file__), 'httpbin_sample.test-db')
TEST_URLS = [
    'https://httpbin.org/get',
    'https://httpbin.org/html',
    'https://httpbin.org/json',
]
PASSTHRU_URL = 'https://httpbin.org/gzip'
UNMOCKED_URL = 'https://httpbin.org/ip'


@contextmanager
def get_responses():
    """Context manager that provides a RequestsMock object with mocked URLs and responses
    based on cache data
    """
    with RequestsMock() as mocker:
        cache = CachedSession(TEST_DB).cache
        for response in cache.values():
            mocker.add(
                Response(
                    response.request.method,
                    response.request.url,
                    body=response.content,
                    headers=response.headers,
                    status=response.status_code,
                )
            )
        mocker.add_passthru(PASSTHRU_URL)
        yield mocker


# responses patches HTTPAdapter.send(), so we need to patch one level lower to verify request mocking
@patch.object(
    requests.adapters.HTTPAdapter, 'get_connection', side_effect=ValueError('Real request made!')
)
def test_mock_session(mock_http_adapter):
    """Test that the mock_session fixture is working as expected"""
    with get_responses():
        # An error will be raised if a real request is made
        with pytest.raises(ValueError):
            requests.get(PASSTHRU_URL)

        # All mocked URLs will return a response based on requests-cache data
        for url in TEST_URLS:
            response = requests.get(url)
            assert getattr(response, 'from_cache', False) is False

        # responses will raise an error for an unmocked URL, as usual
        with pytest.raises(ConnectionError):
            requests.get(UNMOCKED_URL)