Advanced Usage

This section covers some more advanced and use-case-specific features.

Cache Inspection

Here are some ways to get additional information out of the cache session, backend, and responses:

Response Details

The following attributes are available on responses:

  • from_cache: indicates if the response came from the cache

  • created_at: datetime of when the cached response was created or last updated

  • expires: datetime after which the cached response will expire

  • is_expired: indicates if the cached response is expired (for example, if a stale response was returned due to a request error)

Examples:

Example code

>>> from datetime import timedelta
>>> from requests_cache import CachedSession
>>> session = CachedSession(expire_after=timedelta(days=1))

>>> # Placeholders are added for non-cached responses
>>> response = session.get('http://httpbin.org/get')
>>> print(response.from_cache, response.created_at, response.expires, response.is_expired)
False None None None

>>> # Values will be populated for cached responses
>>> response = session.get('http://httpbin.org/get')
>>> print(response.from_cache, response.created_at, response.expires, response.is_expired)
True 2021-01-01 18:00:00 2021-01-02 18:00:00 False

>>> # Print a response object to get general information about it
>>> print(response)
'request: GET https://httpbin.org/get, response: 200 (308 bytes), created: 2021-01-01 22:45:00 IST, expires: 2021-01-02 18:45:00 IST (fresh)'

Cache Contents

You can use CachedSession.cache.urls to see all URLs currently in the cache:

>>> session = CachedSession()
>>> print(session.cache.urls)
['https://httpbin.org/get', 'https://httpbin.org/stream/100']

If needed, you can get more details on cached responses via CachedSession.cache.responses, which is a dict-like interface to the cache backend. See CachedResponse for a full list of attributes available.

For example, if you wanted to see all URLs requested with a specific method:

>>> post_urls = [
...     response.url for response in session.cache.responses.values()
...     if response.request.method == 'POST'
... ]

You can also inspect CachedSession.cache.redirects, which maps redirect URLs to keys of the responses they redirect to.
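
For example, here is a minimal sketch of looking up the responses that redirects point to (assuming redirects supports the standard dict methods):

>>> for redirect_key, response_key in session.cache.redirects.items():
...     print(redirect_key, '->', session.cache.responses[response_key].url)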

Additional keys() and values() wrapper methods are available on BaseCache to get combined keys and responses.

>>> print('All responses:')
>>> for response in session.cache.values():
...     print(response)

>>> print('All cache keys for redirects and responses combined:')
>>> print(list(session.cache.keys()))

Both methods also take a check_expiry argument to exclude expired responses:

>>> print('All unexpired responses:')
>>> for response in session.cache.values(check_expiry=True):
...     print(response)

Similarly, you can get a count of responses with BaseCache.response_count(), and optionally exclude expired responses:

>>> print(f'Total responses: {session.cache.response_count()}')
>>> print(f'Unexpired responses: {session.cache.response_count(check_expiry=True)}')

Custom Response Filtering

If you need more advanced behavior for determining what to cache, you can provide a custom filtering function via the filter_fn param. This can be any function that takes a requests.Response object and returns a boolean indicating whether or not that response should be cached. It will be applied to both new responses (on write) and previously cached responses (on read):

Example code

>>> from sys import getsizeof
>>> from requests_cache import CachedSession

>>> def filter_by_size(response):
...     """Don't cache responses with a body over 1 MB"""
...     return getsizeof(response.content) <= 1024 * 1024

>>> session = CachedSession(filter_fn=filter_by_size)

Custom Backends

If the built-in Cache Backends don’t suit your needs, you can create your own by making subclasses of BaseCache and BaseStorage:

Example code

>>> from requests_cache import CachedSession
>>> from requests_cache.backends import BaseCache, BaseStorage

>>> class CustomCache(BaseCache):
...     """Wrapper for higher-level cache operations. In most cases, the only thing you need
...     to specify here is which storage class(es) to use.
...     """
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...         self.redirects = CustomStorage(**kwargs)
...         self.responses = CustomStorage(**kwargs)

>>> class CustomStorage(BaseStorage):
...     """Dict-like interface for lower-level backend storage operations"""
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...
...     def __getitem__(self, key):
...         pass
...
...     def __setitem__(self, key, value):
...         pass
...
...     def __delitem__(self, key):
...         pass
...
...     def __iter__(self):
...         pass
...
...     def __len__(self):
...         pass
...
...     def clear(self):
...         pass

You can then use your custom backend in a CachedSession with the backend parameter:

>>> session = CachedSession(backend=CustomCache())

Custom Serializers

If the built-in Serializers don’t suit your needs, you can create your own. For example, if you had an imaginary custom_pickle module that provides dumps and loads functions:

>>> import custom_pickle
>>> from requests_cache import CachedSession
>>> session = CachedSession(serializer=custom_pickle)

Serializer Pipelines

More complex serialization can be done with SerializerPipeline. Use cases include text-based serialization, compression, encryption, and any other intermediate steps you might want to add.

Any combination of these can be composed with a SerializerPipeline, which starts with a CachedResponse and ends with a str or bytes object. Each stage of the pipeline can be any object or module with dumps and loads functions. If the object has similar methods with different names (e.g. compress / decompress), those can be aliased using Stage.

For example, a compressed pickle serializer can be built as:

Example code

>>> import pickle, gzip
>>> from requests_cache.serializers import SerializerPipeline, Stage
>>> compressed_serializer = SerializerPipeline([
...     pickle,
...     Stage(gzip, dumps='compress', loads='decompress'),
...])
>>> session = CachedSession(serializer=compressed_serializer)

Text-based Serializers

If you’re using a text-based serialization format like JSON or YAML, some extra steps are needed to encode binary data and non-builtin types. The cattrs library can do the majority of the work here, and some pre-configured converters are included for several common formats in the preconf module.

For example, a compressed JSON pipeline could be built as follows:

Example code

>>> import json, gzip, codecs
>>> from requests_cache.serializers import SerializerPipeline, Stage, json_converter
>>> comp_json_serializer = SerializerPipeline([
...     json_converter, # Serialize to a JSON string
...     Stage(codecs.utf_8, dumps='encode', loads='decode'), # Encode to bytes
...     Stage(gzip, dumps='compress', loads='decompress'), # Compress
...])

Note

If you want to use a different format that isn’t included in preconf, you can use CattrStage as a starting point.

Note

If you want to convert a string representation to bytes (e.g. for compression), you must use a codec from codecs (typically codecs.utf_8)
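
For example, here is a rough sketch of a pipeline for a format not covered by preconf. It assumes that CattrStage is importable from requests_cache.serializers and converts a CachedResponse to and from a dict of builtin types, and it uses an imaginary custom_toml module that provides dumps and loads functions:

Example code

>>> import custom_toml  # Imaginary module with dumps/loads for a text-based format
>>> from requests_cache import CachedSession
>>> from requests_cache.serializers import CattrStage, SerializerPipeline
>>> custom_serializer = SerializerPipeline([
...     CattrStage(),   # CachedResponse <-> dict of builtin types (assumed behavior)
...     custom_toml,    # dict <-> text
... ])
>>> session = CachedSession(serializer=custom_serializer)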

Additional Serialization Steps

Some other tools that could be used as a stage in a SerializerPipeline include:

Class                                   dumps       loads
--------------------------------------  ----------  ----------
codecs.*                                encode      decode
bz2                                     compress    decompress
gzip                                    compress    decompress
lzma                                    compress    decompress
zlib                                    compress    decompress
pickle                                  dumps       loads
itsdangerous.signer.Signer              sign        unsign
itsdangerous.serializer.Serializer      dumps       loads
cryptography.fernet.Fernet              encrypt     decrypt
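
For example, an encrypted pickle serializer could be sketched using Fernet from the table above. This is a minimal sketch; the key handling here is only illustrative, and in practice you would load the key from a secure location rather than generating a new one each run:

Example code

>>> import pickle
>>> from cryptography.fernet import Fernet
>>> from requests_cache import CachedSession
>>> from requests_cache.serializers import SerializerPipeline, Stage

>>> # NOTE: illustrative only; persist and protect the key in real usage
>>> key = Fernet.generate_key()
>>> encrypted_serializer = SerializerPipeline([
...     pickle,                                                 # CachedResponse <-> bytes
...     Stage(Fernet(key), dumps='encrypt', loads='decrypt'),   # Encrypt / decrypt those bytes
... ])
>>> session = CachedSession(serializer=encrypted_serializer)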

Usage with other requests features

Request Hooks

Requests has an Event Hook system that can be used to add custom behavior into different parts of the request process. It can be used, for example, for request throttling:

Example code

>>> import time
>>> import requests
>>> from requests_cache import CachedSession
>>>
>>> def make_throttle_hook(timeout=1.0):
...     """Make a request hook function that adds a custom delay for non-cached requests"""
...     def hook(response, *args, **kwargs):
...         if not getattr(response, 'from_cache', False):
...             print('sleeping')
...             time.sleep(timeout)
...         return response
...     return hook
>>>
>>> session = CachedSession()
>>> session.hooks['response'].append(make_throttle_hook(0.1))
>>> # The first (real) request will have an added delay
>>> session.get('http://httpbin.org/get')
>>> session.get('http://httpbin.org/get')

Streaming Requests

Note

This feature requires requests >= 2.19

If you use streaming requests, you can use the same code to iterate over both cached and non-cached requests. A cached request will, of course, have already been read, but will use a file-like object containing the content:

Example code

>>> from requests_cache import CachedSession
>>>
>>> session = CachedSession()
>>> for i in range(2):
...     response = session.get('https://httpbin.org/stream/20', stream=True)
...     for chunk in response.iter_lines():
...         print(chunk.decode('utf-8'))

Usage with other requests-based libraries

This library works by patching and/or extending requests.Session. Many other libraries out there do the same thing, making it potentially difficult to combine them.

For that scenario, a mixin class is provided, so you can create a custom class with behavior from multiple Session-modifying libraries:

>>> from requests import Session
>>> from requests_cache import CacheMixin
>>> from some_other_lib import SomeOtherMixin
>>>
>>> class CustomSession(CacheMixin, SomeOtherMixin, Session):
...     """Session class with features from both some_other_lib and requests-cache"""

Requests-HTML

requests-html is one library that works with this method:

Example code

>>> import requests
>>> from requests_cache import CacheMixin, install_cache
>>> from requests_html import HTMLSession
>>>
>>> class CachedHTMLSession(CacheMixin, HTMLSession):
...     """Session with features from both CachedSession and HTMLSession"""
>>>
>>> session = CachedHTMLSession()
>>> response = session.get('https://github.com/')
>>> print(response.from_cache, response.html.links)

Or if you are using install_cache(), you can use the session_factory argument:

Example code

>>> install_cache(session_factory=CachedHTMLSession)
>>> response = requests.get('https://github.com/')
>>> print(response.from_cache, response.html.links)

The same approach can be used with other libraries that subclass requests.Session.

Requests-futures

Some libraries, including requests-futures, support wrapping an existing session object:

>>> session = FuturesSession(session=CachedSession())

In this case, FuturesSession must wrap CachedSession rather than the other way around, since FuturesSession returns (as you might expect) futures rather than response objects. See issue #135 for more notes on this.
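
Here is a minimal usage sketch, assuming the standard requests-futures workflow of calling result() on the returned future:

>>> from requests_futures.sessions import FuturesSession
>>> from requests_cache import CachedSession
>>>
>>> session = FuturesSession(session=CachedSession())
>>> future = session.get('https://httpbin.org/get')
>>> response = future.result()  # Blocks until the (possibly cached) response is ready
>>> print(response.from_cache)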

Internet Archive

Usage with internetarchive is the same as other libraries that subclass requests.Session:

Example code

>>> from requests_cache import CacheMixin
>>> from internetarchive.session import ArchiveSession
>>>
>>> class CachedArchiveSession(CacheMixin, ArchiveSession):
...     """Session with features from both CachedSession and ArchiveSession"""

Requests-mock

requests-mock has multiple methods for mocking requests, including a contextmanager, decorator, fixture, and adapter. There are a few different options for using it with requests-cache, depending on how you want your tests to work.

Disabling requests-cache

If you have an application that uses requests-cache and you just want to use requests-mock in your tests, the easiest thing to do is to disable requests-cache.

For example, if you are using install_cache() in your application and the requests-mock pytest fixture in your tests, you could wrap it in another fixture that uses uninstall_cache() or disabled():

Example code

"""Example of using requests-cache with the requests-mock library"""
import pytest
import requests

import requests_cache


@pytest.fixture(scope='function')
def requests_cache_mock(requests_mock):
    with requests_cache.disabled():
        yield requests_mock


def test_requests_cache_mock(requests_cache_mock):
    """Within this test function, requests will be mocked and not cached"""
    url = 'https://example.com'
    requests_cache_mock.get(url, text='Mock response!')

    # Make sure the mocker is used
    response_1 = requests.get(url)
    assert response_1.text == 'Mock response!'

    # Make sure the cache is not used
    response_2 = requests.get(url)
    assert getattr(response_2, 'from_cache', False) is False

Or if you use a CachedSession object, you could replace it with a regular Session, for example:

Example code

import unittest.mock

import pytest
import requests


@pytest.fixture(scope='function', autouse=True)
def disable_requests_cache():
    """Replace CachedSession with a regular Session for all test functions"""
    with unittest.mock.patch('requests_cache.CachedSession', requests.Session):
        yield

Combining requests-cache with requests-mock

If you want both caching and mocking features at the same time, you can attach requests-mock’s adapter to a CachedSession:

Example code

"""Example of using requests-cache with the requests-mock library"""
import pytest
from requests_mock import Adapter

from requests_cache import CachedSession

URL = 'https://some_test_url'


@pytest.fixture(scope='function')
def mock_session():
    """Fixture that provides a CachedSession that will make mock requests where it would normally
    make real requests"""
    adapter = Adapter()
    adapter.register_uri(
        'GET',
        URL,
        headers={'Content-Type': 'text/plain'},
        text='Mock response!',
        status_code=200,
    )

    session = CachedSession(backend='memory')
    session.mount('https://', adapter)
    yield session


def test_mock_session(mock_session):
    """Test that the mock_session fixture is working as expected"""
    response_1 = mock_session.get(URL)
    assert response_1.text == 'Mock response!'
    assert getattr(response_1, 'from_cache', False) is False

    response_2 = mock_session.get(URL)
    assert response_2.text == 'Mock response!'
    assert response_2.from_cache is True

Building a mocker using requests-cache data

Another approach is to use cached data to dynamically define mock requests and responses. This has the advantage of only using requests-mock's behavior for request matching.

Example code

@pytest.fixture(scope='session')
def mock_session():
    """Fixture that provides a session with mocked URLs and responses based on cache data"""
    adapter = Adapter()
    cache = CachedSession(TEST_DB).cache

    for response in cache.values():
        adapter.register_uri(
            response.request.method,
            response.request.url,
            content=response.content,
            headers=response.headers,
            status_code=response.status_code,
        )
        print(f'Added mock response: {response}')

    session = Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    yield session

To turn that into a complete example:

Example code

"""Example of using requests-cache with the requests-mock library"""
from os.path import dirname, join
from unittest.mock import patch

import pytest
import requests
from requests import Session
from requests_mock import Adapter, NoMockAddress

from requests_cache import CachedSession

TEST_DB = join(dirname(__file__), 'httpbin_sample.test-db')
TEST_URLS = [
    'https://httpbin.org/get',
    'https://httpbin.org/html',
    'https://httpbin.org/json',
]
UNMOCKED_URL = 'https://httpbin.org/ip'


@pytest.fixture(scope='session')
def mock_session():
    """Fixture that provides a session with mocked URLs and responses based on cache data"""
    adapter = Adapter()
    cache = CachedSession(TEST_DB).cache

    for response in cache.values():
        adapter.register_uri(
            response.request.method,
            response.request.url,
            content=response.content,
            headers=response.headers,
            status_code=response.status_code,
        )
        print(f'Added mock response: {response}')

    session = Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    yield session


@patch.object(requests.adapters.HTTPAdapter, 'send', side_effect=ValueError('Real request made!'))
def test_mock_session(mock_http_adapter, mock_session):
    """Test that the mock_session fixture is working as expected"""
    # An error will be raised if a real request is made
    with pytest.raises(ValueError):
        requests.get(TEST_URLS[0])

    # All mocked URLs will return a response based on requests-cache data
    for url in TEST_URLS:
        response = mock_session.get(url)
        assert getattr(response, 'from_cache', False) is False

    # requests-mock will raise an error for an unmocked URL, as usual
    with pytest.raises(NoMockAddress):
        mock_session.get(UNMOCKED_URL)


def save_test_data():
    """Run once to save data to reuse for tests, for demo purposes.
    In practice, you could just run your application or tests with requests-cache installed.
    """
    session = CachedSession(TEST_DB)
    for url in TEST_URLS:
        session.get(url)


if __name__ == '__main__':
    save_test_data()

Responses

Usage with the responses library is similar to the requests-mock examples above.

Example code

"""Example of using requests-cache with the responses library"""
from contextlib import contextmanager
from os.path import dirname, join
from unittest.mock import patch

import pytest
import requests
from requests.exceptions import ConnectionError
from responses import RequestsMock, Response

from requests_cache import CachedSession

TEST_DB = join(dirname(__file__), 'httpbin_sample.test-db')
TEST_URLS = [
    'https://httpbin.org/get',
    'https://httpbin.org/html',
    'https://httpbin.org/json',
]
PASSTHRU_URL = 'https://httpbin.org/gzip'
UNMOCKED_URL = 'https://httpbin.org/ip'


@contextmanager
def get_responses():
    """Contextmanager that provides a RequestsMock object mocked URLs and responses
    based on cache data
    """
    with RequestsMock() as mocker:
        cache = CachedSession(TEST_DB).cache
        for response in cache.values():
            mocker.add(
                Response(
                    response.request.method,
                    response.request.url,
                    body=response.content,
                    headers=response.headers,
                    status=response.status_code,
                )
            )
        mocker.add_passthru(PASSTHRU_URL)
        yield mocker


# responses patches HTTPAdapter.send(), so we need to patch one level lower to verify request mocking
@patch.object(
    requests.adapters.HTTPAdapter, 'get_connection', side_effect=ValueError('Real request made!')
)
def test_mock_session(mock_http_adapter):
    """Test that the mock_session fixture is working as expected"""
    with get_responses():
        # An error will be raised if a real request is made
        with pytest.raises(ValueError):
            requests.get(PASSTHRU_URL)

        # All mocked URLs will return a response based on requests-cache data
        for url in TEST_URLS:
            response = requests.get(url)
            assert getattr(response, 'from_cache', False) is False

        # responses will raise an error for an unmocked URL, as usual
        with pytest.raises(ConnectionError):
            requests.get(UNMOCKED_URL)