User Guide
This section covers the main features of requests-cache.
Installation
Install with pip:
pip install requests-cache
Or with Conda, if you prefer:
conda install -c conda-forge requests-cache
Requirements
Requires Python 3.6+.
You may need additional dependencies depending on which backend you want to use. To install with extra dependencies for all supported Cache Backends:
pip install requests-cache[backends]
Optional Setup Steps
See Security for recommended setup steps for more secure cache serialization.
See Contributing Guide for setup steps for local development.
General Usage
There are two main ways of using requests-cache:
Sessions (recommended): Use CachedSession to send your requests
Patching: Globally patch requests using install_cache()
Sessions
CachedSession can be used as a drop-in replacement for requests.Session.
Basic usage looks like this:
>>> from requests_cache import CachedSession
>>>
>>> session = CachedSession()
>>> session.get('http://httpbin.org/get')
Any requests.Session method can be used (but see the HTTP Methods section below for config details):
>>> session.request('GET', 'http://httpbin.org/get')
>>> session.head('http://httpbin.org/get')
Caching can be temporarily disabled with CachedSession.cache_disabled():
>>> with session.cache_disabled():
...     session.get('http://httpbin.org/get')
The best way to clean up your cache is through Cache Expiration, but you can also clear out everything at once with BaseCache.clear():
>>> session.cache.clear()
Patching
In some situations, it may not be possible or convenient to manage your own session object. In those cases, you can use install_cache() to add caching to all requests functions:
>>> import requests
>>> import requests_cache
>>>
>>> requests_cache.install_cache()
>>> requests.get('http://httpbin.org/get')
This also applies to requests.Session methods:
>>> session = requests.Session()
>>> session.get('http://httpbin.org/get')
install_cache() accepts all the same parameters as CachedSession:
>>> requests_cache.install_cache(expire_after=360, allowable_methods=('GET', 'POST'))
The cache can also be temporarily enabled with enabled():
>>> with requests_cache.enabled():
...     requests.get('http://httpbin.org/get')  # Will be cached
Or temporarily disabled with disabled():
>>> requests_cache.install_cache()
>>> with requests_cache.disabled():
...     requests.get('http://httpbin.org/get')  # Will not be cached
Or completely removed with uninstall_cache():
>>> requests_cache.uninstall_cache()
>>> requests.get('http://httpbin.org/get')
You can also clear out all responses in the cache with clear(), and check if requests-cache is currently installed with is_installed().
Limitations
Like any other utility that uses global patching, there are some scenarios where you won’t want to use install_cache():
In a multi-threaded or multiprocess application
In an application that uses other packages that extend or modify requests.Session
In a package that will be used by other packages or applications
Cache Backends
Several cache backends are included, which can be selected with the backend parameter for either CachedSession or install_cache():
'sqlite': SQLite database (default)
'redis': Redis cache (requires redis)
'mongodb': MongoDB database (requires pymongo)
'gridfs': GridFS collections on a MongoDB database (requires pymongo)
'dynamodb': Amazon DynamoDB database (requires boto3)
'filesystem': Stores responses as files on the local filesystem
'memory': A non-persistent cache that just stores responses in memory
A backend can be specified either by name, class or instance:
>>> from requests_cache.backends import RedisCache
>>> from requests_cache import CachedSession
>>> # Backend name
>>> session = CachedSession(backend='redis', namespace='my-cache')
>>> # Backend class
>>> session = CachedSession(backend=RedisCache, namespace='my-cache')
>>> # Backend instance
>>> session = CachedSession(backend=RedisCache(namespace='my-cache'))
See requests_cache.backends for more backend-specific usage details, and see Custom Backends for details on creating your own implementation.
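A backend can be given as a name, a class, or an instance. One way to picture how that works is a small dispatch function. The sketch below uses a hypothetical registry (`BACKEND_CLASSES`) and a hypothetical backend class (`MemoryBackendSketch`). It is not requests-cache's own code. It only shows how the three forms could resolve to one configured object.

```python
class MemoryBackendSketch:
    """Hypothetical stand-in for a backend class such as RedisCache."""

    def __init__(self, namespace='http_cache'):
        self.namespace = namespace
        self.responses = {}


# Hypothetical mapping from backend names to backend classes
BACKEND_CLASSES = {'memory': MemoryBackendSketch}


def init_backend_sketch(backend, **kwargs):
    """Accept a backend name, class, or instance, and return an instance."""
    if isinstance(backend, str):
        return BACKEND_CLASSES[backend](**kwargs)  # name: look up the class, then instantiate
    if isinstance(backend, type):
        return backend(**kwargs)  # class: instantiate it with the given options
    return backend  # instance: already configured, so use it as-is


by_name = init_backend_sketch('memory', namespace='my-cache')
by_class = init_backend_sketch(MemoryBackendSketch, namespace='my-cache')
by_instance = init_backend_sketch(MemoryBackendSketch(namespace='my-cache'))
print(by_name.namespace, by_class.namespace, by_instance.namespace)  # my-cache my-cache my-cache
```

In this sketch, extra keyword arguments are forwarded only when the function creates the instance itself. That matches the doc's instance example, where `namespace` is passed to the backend constructor instead of the session.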
Cache Name
The cache_name parameter will be used as follows depending on the backend:
sqlite: Database path, e.g. ~/.cache/my_cache.sqlite
dynamodb: Table name
mongodb and gridfs: Database name
redis: Namespace, meaning all keys will be prefixed with '<cache_name>:'
filesystem: Cache directory
Cache Options
A number of options are available to modify which responses are cached and how they are cached.
HTTP Methods
By default, only GET and HEAD requests are cached. To cache additional HTTP methods, specify them with allowable_methods. For example, caching POST requests can be used to ensure you don’t send the same data multiple times:
>>> session = CachedSession(allowable_methods=('GET', 'POST'))
>>> session.post('http://httpbin.org/post', json={'param': 'value'})
Status Codes
By default, only responses with a 200 status code are cached. To cache additional status codes, specify them with allowable_codes:
>>> session = CachedSession(allowable_codes=(200, 418))
>>> session.get('http://httpbin.org/teapot')
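The allowable_methods and allowable_codes options can be thought of as one check applied before a response is saved. The function below is a hypothetical sketch of that check. It uses the defaults described above (GET and HEAD, status 200) and is not the library's internal code.

```python
def is_cacheable_sketch(method, status_code,
                        allowable_methods=('GET', 'HEAD'),
                        allowable_codes=(200,)):
    """Hypothetical filter: both the method and the status code must be allowed."""
    return method.upper() in allowable_methods and status_code in allowable_codes


print(is_cacheable_sketch('GET', 200))                              # True
print(is_cacheable_sketch('POST', 200))                             # False: POST isn't allowed by default
print(is_cacheable_sketch('GET', 418))                              # False: only 200 is allowed by default
print(is_cacheable_sketch('GET', 418, allowable_codes=(200, 418)))  # True
```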
Request Parameters
By default, all request parameters are taken into account when caching responses. In some cases, there may be request parameters that don’t affect the response data, for example authentication tokens or credentials. If you want to ignore specific parameters, specify them with ignored_parameters:
>>> session = CachedSession(ignored_parameters=['auth-token'])
>>> # Only the first request will be sent
>>> session.get('http://httpbin.org/get', params={'auth-token': '2F63E5DF4F44'})
>>> session.get('http://httpbin.org/get', params={'auth-token': 'D9FAEB3449D3'})
In addition to allowing the cache to ignore these parameters when fetching cached results, these parameters will also be removed from the cache data, including in the request headers. This makes ignored_parameters a good way to prevent key material or other secrets from being saved in the cache backend.
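Why do both `auth-token` requests above hit the same cache entry? One way to see it is to build the cache key from the request after the ignored parameters are dropped. The sketch below is a hypothetical key function (`create_key_sketch`) built on `hashlib` and `urllib.parse`. requests-cache computes its own keys differently, so the key values will not match the library's.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def create_key_sketch(method, url, params=None, ignored_parameters=()):
    """Hypothetical cache-key function; not the requests-cache implementation."""
    parts = urlsplit(url)
    # Combine parameters already in the query string with the params argument
    query = parse_qsl(parts.query) + [(k, str(v)) for k, v in (params or {}).items()]
    # Drop ignored parameters, then sort so parameter order doesn't affect the key
    kept = sorted((k, v) for k, v in query if k not in ignored_parameters)
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
    key_source = f'{method.upper()} {base_url}?{urlencode(kept)}'
    return hashlib.sha256(key_source.encode('utf-8')).hexdigest()


url = 'http://httpbin.org/get'
key_1 = create_key_sketch('GET', url, {'auth-token': '2F63E5DF4F44'}, ['auth-token'])
key_2 = create_key_sketch('GET', url, {'auth-token': 'D9FAEB3449D3'}, ['auth-token'])
print(key_1 == key_2)  # True: both requests map to the same cache entry
```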
Request Headers
In some cases, different headers may result in different response data, so you may want to cache them separately. To enable this, use include_get_headers:
>>> session = CachedSession(include_get_headers=True)
>>> # Both of these requests will be sent and cached separately
>>> session.get('http://httpbin.org/headers', headers={'Accept': 'text/plain'})
>>> session.get('http://httpbin.org/headers', headers={'Accept': 'application/json'})
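Conceptually, include_get_headers folds request headers into the cache key, so requests that differ only by a header such as Accept are stored separately. The function below is a hypothetical sketch (`create_key_with_headers_sketch`) that shows this effect with `hashlib`. It is not how requests-cache actually builds its keys.

```python
import hashlib


def create_key_with_headers_sketch(method, url, headers=None, include_get_headers=False):
    """Hypothetical: request headers only affect the key when the option is enabled."""
    key_source = f'{method.upper()} {url}'
    if include_get_headers and headers:
        # Lowercase names and sort them so header order and case don't change the key
        normalized = sorted((name.lower(), value) for name, value in headers.items())
        key_source += '|' + '&'.join(f'{name}={value}' for name, value in normalized)
    return hashlib.sha256(key_source.encode('utf-8')).hexdigest()


url = 'http://httpbin.org/headers'
plain = {'Accept': 'text/plain'}
json_accept = {'Accept': 'application/json'}

# Without the option, both requests share one cache entry
print(create_key_with_headers_sketch('GET', url, plain)
      == create_key_with_headers_sketch('GET', url, json_accept))  # True
# With the option, they are cached separately
print(create_key_with_headers_sketch('GET', url, plain, True)
      == create_key_with_headers_sketch('GET', url, json_accept, True))  # False
```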
Cache Expiration
By default, cached responses will be stored indefinitely. There are a number of options for specifying how long to store responses. The simplest option is to initialize the cache with an expire_after value:
>>> # Set expiration for the session using a value in seconds
>>> session = CachedSession(expire_after=360)
Expiration Precedence
Expiration can be set on a per-session, per-URL, or per-request basis, in addition to cache headers (see sections below for usage details). When there are multiple values provided for a given request, the following order of precedence is used, from highest to lowest:
Cache-Control request headers (if enabled)
Cache-Control response headers (if enabled)
Per-request expiration (expire_after argument for CachedSession.request())
Per-URL expiration (urls_expire_after argument for CachedSession)
Per-session expiration (expire_after argument for CachedSession)
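The precedence list above amounts to "use the first value that was actually provided." The hypothetical function below sketches that rule. It assumes `None` means "not provided" and falls back to `-1` (store indefinitely), the default described in Cache Expiration. It is an illustration, not the library's resolution code.

```python
def resolve_expire_after_sketch(request_cache_control=None,
                                response_cache_control=None,
                                per_request=None,
                                per_url=None,
                                per_session=None):
    """Return the highest-precedence expire_after value that was provided."""
    candidates = (request_cache_control, response_cache_control,
                  per_request, per_url, per_session)
    for value in candidates:
        # Compare against None rather than truthiness: 0 is a meaningful value
        if value is not None:
            return value
    return -1  # nothing set: store indefinitely


print(resolve_expire_after_sketch(per_url=60, per_session=360))           # 60
print(resolve_expire_after_sketch(per_request=0, per_url=60))             # 0
print(resolve_expire_after_sketch(response_cache_control=10, per_request=5))  # 10
print(resolve_expire_after_sketch())                                      # -1
```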
Expiration Values
expire_after can be any of the following:
-1 (to never expire)
0 (to “expire immediately,” i.e., bypass the cache)
A positive number (in seconds)
A timedelta
A datetime
Examples:
>>> # To specify a unit of time other than seconds, use a timedelta
>>> from datetime import timedelta
>>> session = CachedSession(expire_after=timedelta(days=30))
>>> # Update an existing session to disable expiration (i.e., store indefinitely)
>>> session.expire_after = -1
>>> # Disable caching by default, unless enabled by other settings
>>> session = CachedSession(expire_after=0)
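Each accepted expire_after type can be reduced to a single absolute expiration time. The hypothetical helper below (`get_expiration_sketch`) shows one way to do that with the standard `datetime` module. It returns `None` for "never expire" and treats `None` as "not set" (stored indefinitely). These conventions are assumptions of the sketch, not the library's documented behavior.

```python
from datetime import datetime, timedelta, timezone

NEVER_EXPIRE = -1  # mirrors the -1 convention listed above


def get_expiration_sketch(expire_after, now=None):
    """Convert any accepted expire_after value to an absolute time (None = never)."""
    if now is None:
        now = datetime.now(timezone.utc)
    if expire_after is None or expire_after == NEVER_EXPIRE:
        return None
    if isinstance(expire_after, datetime):
        return expire_after  # already an absolute time
    if not isinstance(expire_after, timedelta):
        expire_after = timedelta(seconds=expire_after)  # a plain number means seconds
    return now + expire_after  # 0 gives `now`, i.e. already expired


now = datetime(2021, 7, 21, tzinfo=timezone.utc)
print(get_expiration_sketch(-1, now))                   # None
print(get_expiration_sketch(360, now))                  # 2021-07-21 00:06:00+00:00
print(get_expiration_sketch(timedelta(days=30), now))   # 2021-08-20 00:00:00+00:00
```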
URL Patterns
You can use urls_expire_after to set different expiration values for different requests, based on URL glob patterns. This allows you to customize caching based on what you know about the resources you’re requesting. For example, you might request one resource that gets updated frequently, another that changes infrequently, and another that never changes. Example:
>>> urls_expire_after = {
... '*.site_1.com': 30,
... 'site_2.com/resource_1': 60 * 2,
... 'site_2.com/resource_2': 60 * 60 * 24,
... 'site_2.com/static': -1,
... }
>>> session = CachedSession(urls_expire_after=urls_expire_after)
You can also use this to define a cache whitelist, so only the patterns you define will be cached:
>>> urls_expire_after = {
...     '*.site_1.com': 30,
...     'site_2.com/static': -1,
...     '*': 0,  # Every other non-matching URL: do not cache
... }
>>> session = CachedSession(urls_expire_after=urls_expire_after)
Notes:
urls_expire_after should be a dict in the format {'pattern': expire_after}
expire_after accepts the same types as CachedSession.expire_after
Patterns will match request base URLs, so the pattern site.com/resource/ is equivalent to http*://site.com/resource/**
If there is more than one match, the first match will be used in the order they are defined
If no patterns match a request, CachedSession.expire_after will be used as a default.
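The matching rules in the notes above (implicit `http*://` prefix, implicit trailing `**`, first match wins, session default otherwise) can be sketched with the standard `fnmatch` module. The function below is hypothetical and matches the full URL string with `fnmatchcase`. The library's own matcher may differ in details such as how query strings are handled.

```python
from fnmatch import fnmatchcase


def url_expire_after_sketch(url, urls_expire_after, default=-1):
    """Return expire_after for the first matching pattern, else the session default."""
    for pattern, expire_after in urls_expire_after.items():  # dicts keep insertion order
        # Expand 'site.com/resource/' into 'http*://site.com/resource/**'
        full_pattern = pattern if '://' in pattern else f'http*://{pattern}'
        full_pattern = full_pattern.rstrip('*') + '**'
        if fnmatchcase(url, full_pattern):
            return expire_after
    return default


urls_expire_after = {
    '*.site_1.com': 30,
    'site_2.com/static': -1,
    '*': 0,  # Every other non-matching URL: do not cache
}
print(url_expire_after_sketch('https://api.site_1.com/data', urls_expire_after))       # 30
print(url_expire_after_sketch('https://site_2.com/static/logo.png', urls_expire_after))  # -1
print(url_expire_after_sketch('https://example.com/', urls_expire_after))             # 0
```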
Cache-Control
Warning
This is not intended to be a thorough or strict implementation of header-based HTTP caching, e.g. according to RFC 2616.
Optional support is included for a simplified subset of Cache-Control and other cache headers in both requests and responses. To enable this behavior, use the cache_control option:
>>> session = CachedSession(cache_control=True)
Supported request headers:
Cache-Control: max-age: Used as the expiration time in seconds
Cache-Control: no-cache: Skips reading response data from the cache
Cache-Control: no-store: Skips reading and writing response data from/to the cache
Supported response headers:
Cache-Control: max-age: Used as the expiration time in seconds
Cache-Control: no-store: Skips writing response data to the cache
Expires: Used as an absolute expiration time
Notes:
Unlike a browser or proxy cache, max-age=0 does not currently clear previously cached responses.
If enabled, Cache-Control directives will take priority over any other expire_after value. See Expiration Precedence for the full order of precedence.
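The supported response headers above can be turned into a "store or not" decision plus an expiration time. The sketch below uses `email.utils.parsedate_to_datetime` for the HTTP date in Expires. The function names are hypothetical. Following the usual HTTP convention, it assumes max-age wins over Expires when both are present. It also uses a plain dict, so header names are case-sensitive here, unlike the case-insensitive headers on a real requests response.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control_sketch(header_value):
    """Split 'max-age=60, no-cache' into {'max-age': '60', 'no-cache': True}."""
    directives = {}
    for part in header_value.split(','):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition('=')
        directives[name.strip().lower()] = value.strip() if sep else True
    return directives


def response_expiration_sketch(headers, now):
    """Return (should_store, expires_at); expires_at is None if no header sets it."""
    directives = parse_cache_control_sketch(headers.get('Cache-Control', ''))
    if 'no-store' in directives:
        return False, None  # skip writing this response to the cache
    if 'max-age' in directives:
        return True, now + timedelta(seconds=int(directives['max-age']))
    if 'Expires' in headers:
        return True, parsedate_to_datetime(headers['Expires'])  # absolute time
    return True, None


now = datetime(2021, 7, 21, 22, 0, tzinfo=timezone.utc)
print(response_expiration_sketch({'Cache-Control': 'max-age=60'}, now))
print(response_expiration_sketch({'Cache-Control': 'no-store'}, now))  # (False, None)
print(response_expiration_sketch({'Expires': 'Wed, 21 Jul 2021 22:34:50 GMT'}, now))
```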
Removing Expired Responses
For better performance, expired responses won’t be removed immediately, but will be removed (or replaced) the next time they are requested. To manually clear all expired responses, use CachedSession.remove_expired_responses():
>>> session.remove_expired_responses()
Or, when using patching:
>>> requests_cache.remove_expired_responses()
You can also apply a different expire_after to previously cached responses, which will revalidate the cache with the new expiration time:
>>> session.remove_expired_responses(expire_after=timedelta(days=30))
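A manual cleanup pass like remove_expired_responses() boils down to scanning stored entries and deleting any whose expiration time has passed. The sketch below does this over a plain dict. The layout (`{cache_key: {'expires': datetime or None}}`) is hypothetical and unrelated to how any real backend stores data.

```python
from datetime import datetime, timedelta, timezone


def remove_expired_sketch(cache, now):
    """Delete entries whose 'expires' time has passed; None means never expires."""
    # Collect keys first: deleting from a dict while iterating over it raises an error
    expired_keys = [
        key for key, entry in cache.items()
        if entry['expires'] is not None and entry['expires'] <= now
    ]
    for key in expired_keys:
        del cache[key]
    return expired_keys


now = datetime(2021, 7, 21, 22, 0, tzinfo=timezone.utc)
cache = {
    'old': {'expires': now - timedelta(minutes=5)},
    'fresh': {'expires': now + timedelta(days=30)},
    'forever': {'expires': None},
}
print(remove_expired_sketch(cache, now))  # ['old']
print(sorted(cache))                      # ['forever', 'fresh']
```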
Serializers
By default, responses are serialized using pickle. Some other options are also available:
Note
These features require Python 3.7+ and additional dependencies.
JSON Serializer
Storing responses as JSON gives you the benefit of making them human-readable and editable, in exchange for a slight reduction in performance. This can be especially useful in combination with the filesystem backend.
Example JSON-serialized Response
{
"url": "https://httpbin.org/get",
"status_code": 200,
"reason": "OK",
"_content": "dkP>RB4Ki8b0Rt*dwnb*3LqdNXk}q!WpZ;OIv{%rARr(hB0*zgWpH#NIv^q{FDfD|APOKLARr<^V`F7-bS*`0V{c?>Zf7DoAR=daX>cqcWMyV-VRU68EFcOXARr(jNN;m=B03-<XmoUNVrgzJZ*pfMEFcOXARr(jRdZ!>EkS2xZge6#AR=&ibZBpGEplaXb!BsOb1yP3GBz$SA}k;ZARr(hB3La!ZF+7kRB~ZsWi3f$B03-<Qg3f`JuxjdFlIPmF)(2*GB7hXGcjZ_Gh<>hGc{o{G%#g2VK_22A_^cNeJmgfARr=da%pF2ZX!A$A~82JE-^VSF*GqQGBq+HEFcOXAR={gY$7@!B4~7UaC15@FKBdhaAIk0E^l&YFK1<RA_{#9",
"cache_key": "4dc151d95200ec91fa77021989f5194e9be47e87f8f228306f3a8d5434b9e547",
"created_at": "2021-07-21T22:34:50.343095",
"elapsed": 0.242198,
"encoding": "utf-8",
"headers": {
"Date": "Wed, 21 Jul 2021 22:34:50 GMT",
"Content-Type": "application/json",
"Content-Length": "308",
"Connection": "keep-alive",
"Server": "gunicorn/19.9.0",
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Credentials": "true"
},
"request": {
"body": "PH%2y",
"headers": {
"User-Agent": "python-requests/2.26.0",
"Accept-Encoding": "gzip, deflate",
"Accept": "*/*",
"Connection": "keep-alive"
},
"method": "GET",
"url": "https://httpbin.org/get"
},
"raw": {
"decode_content": false,
"reason": "OK",
"status": 200,
"version": 11
}
}
You can install the extra dependencies for this serializer with:
pip install requests-cache[json]
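JSON cannot hold raw bytes, so a JSON-serialized response has to store binary content such as `_content` as text. The sketch below shows that round trip with the standard `json` and `base64` modules. The function names are hypothetical. Base64 is an assumption of the sketch and may not be the encoding requests-cache actually uses for these fields.

```python
import base64
import json


def to_json_sketch(status_code, headers, content):
    """Hypothetical: encode raw bytes as text so the whole record is valid JSON."""
    record = {
        'status_code': status_code,
        'headers': dict(headers),
        '_content': base64.b64encode(content).decode('ascii'),
    }
    return json.dumps(record, indent=2)


def from_json_sketch(text):
    """Reverse of to_json_sketch: decode the text field back into bytes."""
    record = json.loads(text)
    record['_content'] = base64.b64decode(record['_content'])
    return record


serialized = to_json_sketch(200, {'Content-Type': 'application/json'}, b'{"args": {}}')
restored = from_json_sketch(serialized)
print(restored['_content'])  # b'{"args": {}}'
```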
YAML Serializer
YAML is another option if you need a human-readable/editable format, with the same tradeoffs as JSON.
Example YAML-serialized Response
url: https://httpbin.org/get
status_code: 200
reason: OK
_content: !!binary |
  ewogICJhcmdzIjoge30sIAogICJoZWFkZXJzIjogewogICAgIkFjY2VwdCI6ICIqLyoiLCAKICAg
  ICJBY2NlcHQtRW5jb2RpbmciOiAiZ3ppcCwgZGVmbGF0ZSIsIAogICAgIkhvc3QiOiAiaHR0cGJp
  bi5vcmciLCAKICAgICJVc2VyLUFnZW50IjogInB5dGhvbi1yZXF1ZXN0cy8yLjI2LjAiLCAKICAg
  ICJYLUFtem4tVHJhY2UtSWQiOiAiUm9vdD0xLTYwZjhhMDcxLTBiN2JmN2VjNGMyZTdmNjA2YWI4
  ZDYyNCIKICB9LCAKICAib3JpZ2luIjogIjE3My4xOS4xNDEuMjUyIiwgCiAgInVybCI6ICJodHRw
  czovL2h0dHBiaW4ub3JnL2dldCIKfQo=
cache_key: 4dc151d95200ec91fa77021989f5194e9be47e87f8f228306f3a8d5434b9e547
created_at: '2021-07-21T22:32:17.592974'
elapsed: 0.187586
encoding: utf-8
headers:
  Access-Control-Allow-Credentials: 'true'
  Access-Control-Allow-Origin: '*'
  Connection: keep-alive
  Content-Length: '308'
  Content-Type: application/json
  Date: Wed, 21 Jul 2021 22:32:17 GMT
  Server: gunicorn/19.9.0
request:
  method: GET
  url: https://httpbin.org/get
  body: !!binary |
    Tm9uZQ==
  headers:
    Accept: '*/*'
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    User-Agent: python-requests/2.26.0
raw:
  decode_content: false
  reason: OK
  status: 200
  version: 11
You can install the extra dependencies for this serializer with:
pip install requests-cache[yaml]
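In YAML, the `!!binary` tag marks base64-encoded data. This means the binary fields in the example above can be inspected with just the standard library. The snippet below decodes the example's `_content` and request `body` values, copied verbatim, to show what they hold.

```python
import base64

# The _content value from the YAML example above (a !!binary, i.e. base64, scalar)
content_b64 = (
    'ewogICJhcmdzIjoge30sIAogICJoZWFkZXJzIjogewogICAgIkFjY2VwdCI6ICIqLyoiLCAKICAg'
    'ICJBY2NlcHQtRW5jb2RpbmciOiAiZ3ppcCwgZGVmbGF0ZSIsIAogICAgIkhvc3QiOiAiaHR0cGJp'
    'bi5vcmciLCAKICAgICJVc2VyLUFnZW50IjogInB5dGhvbi1yZXF1ZXN0cy8yLjI2LjAiLCAKICAg'
    'ICJYLUFtem4tVHJhY2UtSWQiOiAiUm9vdD0xLTYwZjhhMDcxLTBiN2JmN2VjNGMyZTdmNjA2YWI4'
    'ZDYyNCIKICB9LCAKICAib3JpZ2luIjogIjE3My4xOS4xNDEuMjUyIiwgCiAgInVybCI6ICJodHRw'
    'czovL2h0dHBiaW4ub3JnL2dldCIKfQo='
)
content = base64.b64decode(content_b64)
body = base64.b64decode('Tm9uZQ==')  # the request body value from the same example

print(len(content))  # 308, the same as the Content-Length header
print(body)          # b'None'
```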
BSON Serializer
BSON is a serialization format originally created for MongoDB, but it can also be used independently. Compared to JSON, it has better performance (although still not as fast as pickle), and adds support for additional data types. It is not human-readable, but some tools support reading and editing it directly (for example, bson-converter for Atom).
You can install the extra dependencies for this serializer with:
pip install requests-cache[mongo]
Or if you would like to use the standalone BSON codec for a different backend, without installing MongoDB dependencies:
pip install requests-cache[bson]
Error Handling
In some cases, you might cache a response, have it expire, but then encounter an error when retrieving a new response. If you would like to use expired response data in these cases, use the old_data_on_error option:
>>> import time
>>> from requests_cache import CachedSession
>>>
>>> # Cache a test response that will expire almost immediately
>>> session = CachedSession(old_data_on_error=True)
>>> session.get('https://httpbin.org/get', expire_after=0.001)
>>> time.sleep(0.001)
Afterward, let’s say the page has moved and you get a 404, or the site is experiencing downtime and you get a 500. You will then get the expired cache data instead:
>>> response = session.get('https://httpbin.org/get')
>>> print(response.from_cache, response.is_expired)
True True
In addition to error codes, old_data_on_error also applies to exceptions (typically a RequestException). See the requests documentation on Errors and Exceptions for more details on request errors in general.
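The fallback behavior described above can be sketched as: try a fresh request, and if it raises or returns an error status, return the stale cached response instead. The function below is hypothetical. It uses dicts in place of response objects, and it assumes that any status of 400 or above counts as an error. The library's exact rules for which codes trigger the fallback may differ.

```python
def get_with_fallback_sketch(fetch, cached_response, old_data_on_error=True):
    """Hypothetical: prefer a fresh response, but fall back to stale data on errors."""
    can_fall_back = old_data_on_error and cached_response is not None
    try:
        response = fetch()
    except Exception:  # in real use, typically a requests.RequestException
        if can_fall_back:
            return cached_response
        raise
    if response['status_code'] >= 400 and can_fall_back:
        return cached_response  # e.g. a 404 or 500 from a moved or failing site
    return response


def site_down():
    raise ConnectionError('site is experiencing downtime')


stale = {'status_code': 200, 'from_cache': True}
print(get_with_fallback_sketch(site_down, stale)['from_cache'])                       # True
print(get_with_fallback_sketch(lambda: {'status_code': 500}, stale)['from_cache'])    # True
print(get_with_fallback_sketch(lambda: {'status_code': 200, 'from_cache': False}, stale)['from_cache'])  # False
```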