Asynchronous HTTP client¶
Pulsar ships with a fully featured HttpClient
class for making multiple asynchronous HTTP requests. The client
has no dependencies and an API
very similar to the python requests library.
Getting Started¶
To get started, one builds a client for multiple sessions:
from pulsar.apps import http
sessions = http.HttpClient()
and then makes requests, in a coroutine:
async def mycoroutine():
    ...
    response = await sessions.get('http://www.bbc.co.uk')
    return response.text()
The response is an HttpResponse object which contains all the
information about the request and the result:
>>> request = response.request
>>> print(request.headers)
Connection: Keep-Alive
User-Agent: pulsar/0.8.2-beta.1
Accept-Encoding: deflate, gzip
Accept: */*
>>> response.status_code
200
>>> print(response.headers)
...
The request attribute of HttpResponse
is an instance of the original HttpRequest.
Passing Parameters In URLs¶
You can attach parameters to the url by passing the
params dictionary:
response = await sessions.get('http://bla.com',
                              params={'page': 2, 'key': 'foo'})
response.url  # 'http://bla.com?page=2&key=foo'
You can also pass a list of items as a value:
params = {'key1': 'value1', 'key2': ['value2', 'value3']}
response = await sessions.get('http://bla.com', params=params)
response.url  # 'http://bla.com?key1=value1&key2=value2&key2=value3'
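The encoding shown above, where a list value becomes a repeated key, is the same one the standard library produces with doseq=True. The sketch below uses only urllib.parse. build_url is a hypothetical helper, not part of pulsar, and it assumes the base URL carries no query string of its own.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Append params to base; list values become repeated keys."""
    # doseq=True expands each sequence into one key=value pair per item
    query = urlencode(params, doseq=True)
    return f'{base}?{query}' if query else base


print(build_url('http://bla.com',
                {'key1': 'value1', 'key2': ['value2', 'value3']}))
```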
Post data¶
Simple data¶
Posting data is as simple as passing the data parameter:
sessions.post(..., data={'entry1': 'bla', 'entry2': 'doo'})
JSON data¶
Posting JSON data is as simple as passing the json parameter:
sessions.post(..., json={'entry1': 'bla', 'entry2': 'doo'})
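To make the difference between data and json concrete, the sketch below builds the two request bodies with the standard library. It assumes the conventional encodings (form-urlencoded for data, application/json for json). encode_form and encode_json are hypothetical helpers that describe the usual wire format, not pulsar's internals.

```python
import json
from urllib.parse import urlencode


def encode_form(data):
    """Return (body, content_type) for a form-encoded payload."""
    return urlencode(data).encode('ascii'), 'application/x-www-form-urlencoded'


def encode_json(data):
    """Return (body, content_type) for a JSON payload."""
    return json.dumps(data).encode('utf-8'), 'application/json'


payload = {'entry1': 'bla', 'entry2': 'doo'}
print(encode_form(payload))
print(encode_json(payload))
```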
File data¶
Posting files is as simple as passing the files parameter:
files = {'file': open('report.xls', 'rb')}
sessions.post(..., files=files)
Streaming data¶
It is possible to post streaming data too. Streaming data can be a simple generator:
sessions.post(..., data=(b'blabla' for _ in range(10)))
or a coroutine:
sessions.post(..., data=(b'blabla' for _ in range(10)))
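A generator of byte chunks does not have to be a generator expression. As a minimal sketch, the hypothetical helper iter_chunks below reads a binary file object block by block. The result could presumably be passed as the data argument in the same way as the generator above.

```python
import io


def iter_chunks(fileobj, chunk_size=8192):
    """Yield successive blocks of bytes read from a binary file object."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # an empty read signals end of file
            break
        yield chunk


body = iter_chunks(io.BytesIO(b'abcdefgh'), chunk_size=3)
print(list(body))
```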
Cookie support¶
Cookies are handled by storing the cookies received with responses in a sessions
object. To disable cookie storage, pass store_cookies=False during
HttpClient initialisation.
If a response contains cookies, you can get quick access to them:
response = await sessions.get(...)
type(response.cookies)
<class 'dict'>
To send your own cookies to the server, you can use the cookies parameter:
response = await sessions.get(..., cookies={'sessionid': 'test'})
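For illustration, the two directions of cookie handling can be sketched with the standard library: parsing a Set-Cookie value into a dictionary, and serialising a dictionary into a Cookie request header. Both helpers are hypothetical and only show the header formats involved, not how pulsar stores cookies.

```python
from http.cookies import SimpleCookie


def cookies_to_dict(set_cookie_header):
    """Parse a Set-Cookie header value into a {name: value} dictionary."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}


def cookie_header(cookies):
    """Serialise a {name: value} dictionary into a Cookie header value."""
    return '; '.join(f'{name}={value}' for name, value in cookies.items())


print(cookies_to_dict('sessionid=test; Path=/'))
print(cookie_header({'sessionid': 'test'}))
```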
Authentication¶
Authentication, either basic or digest, can be added
by passing the auth parameter during a request. For basic authentication:
sessions.get(..., auth=('<username>','<password>'))
same as:
from pulsar.apps.http import HTTPBasicAuth
sessions.get(..., auth=HTTPBasicAuth('<username>','<password>'))
or digest:
from pulsar.apps.http import HTTPDigestAuth
sessions.get(..., auth=HTTPDigestAuth('<username>','<password>'))
In either case, authentication is handled by adding headers to your requests.
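As a sketch of what "adding headers" means for the basic scheme, the standard HTTP Basic format is the word Basic followed by the base64 encoding of username:password. basic_auth_header is a hypothetical helper. Digest authentication is a challenge-response exchange and is not shown here.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    credentials = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


print(basic_auth_header('Aladdin', 'open sesame'))
```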
TLS/SSL¶
Supported out of the box:
sessions.get('https://github.com/timeline.json')
The HttpClient can verify SSL certificates for HTTPS requests,
just like a web browser. To check a host’s SSL certificate, you can use the
verify argument:
sessions = HttpClient()
sessions.verify  # True
sessions = HttpClient(verify=False)
sessions.verify  # False
By default, verify is set to True.
You can override the verify argument during requests too:
sessions.get('https://github.com/timeline.json')
sessions.get('https://localhost:8020', verify=False)
You can also set verify to the path of a CA_BUNDLE file or of a directory with
certificates of trusted CAs:
sessions.get('https://localhost:8020', verify='/path/to/ca_bundle')
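The three kinds of verify value (True, False, or a path) map naturally onto an ssl.SSLContext. The sketch below shows one plausible mapping using only the standard library. make_ssl_context is a hypothetical helper, and whether pulsar builds its context this way is an assumption.

```python
import os
import ssl


def make_ssl_context(verify=True):
    """Map a verify value (True, False or a path) to an ssl.SSLContext."""
    if isinstance(verify, str):
        # a path may be a bundle file or a directory of certificates
        if os.path.isdir(verify):
            return ssl.create_default_context(capath=verify)
        return ssl.create_default_context(cafile=verify)
    context = ssl.create_default_context()
    if verify is False:
        # order matters: hostname checks must be off before CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


print(make_ssl_context(True).verify_mode)
print(make_ssl_context(False).verify_mode)
```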
Streaming¶
This is an event-driven client, therefore streaming support is native.
The raw stream¶
The easiest way to use streaming is to pass the stream=True parameter
during a request and access the HttpResponse.raw attribute.
For example:
async def body_coroutine(url):
    # wait for response headers
    response = await sessions.get(url, stream=True)
    #
    async for data in response.raw:
        # data is a chunk of bytes
        ...
The raw attribute is an asynchronous iterable over bytes and can be
iterated only once. Iterating over a raw attribute which has
already been iterated raises StreamConsumedError.
The attribute has the read method for reading the whole body at once:
await response.raw.read()
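The iterate-once behaviour can be modelled with a small asynchronous iterable. The sketch below is a stand-in written with asyncio only. OneShotStream and its local StreamConsumedError are hypothetical definitions that mimic the behaviour described above and are not pulsar's classes.

```python
import asyncio


class StreamConsumedError(Exception):
    """Local stand-in for the error named in the text."""


class OneShotStream:
    """Asynchronous iterable over byte chunks that can be consumed once."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._consumed = False

    def __aiter__(self):
        if self._consumed:
            raise StreamConsumedError('stream already consumed')
        self._consumed = True
        return self._iterate()

    async def _iterate(self):
        for chunk in self._chunks:
            # yield control, as a real network read would
            await asyncio.sleep(0)
            yield chunk

    async def read(self):
        """Read the whole body at once."""
        return b''.join([chunk async for chunk in self])


async def collect(stream):
    return [chunk async for chunk in stream]


print(asyncio.run(collect(OneShotStream([b'ab', b'cd']))))
```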
Data processed hook¶
Another approach to streaming is to use the data_processed event handler. For example:
def new_data(response, **kw):
    if response.status_code == 200:
        data = response.recv_body()
        # do something with this data
response = sessions.get(..., data_processed=new_data)
The response recv_body() method fetches the parsed body
of the response and at the same time it flushes it.
Check the proxy server example for an
application using the HttpClient streaming capabilities.
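The "fetch and flush" semantics described for recv_body() can be pictured as a buffer that hands back what it has accumulated and then empties itself. BodyBuffer below is a hypothetical model of that behaviour, not the HttpResponse implementation.

```python
class BodyBuffer:
    """Accumulates parsed body chunks until they are fetched."""

    def __init__(self):
        self._chunks = []

    def feed(self, data):
        """Store a newly parsed chunk."""
        self._chunks.append(data)

    def recv_body(self):
        """Return everything received so far and flush the buffer."""
        body = b''.join(self._chunks)
        self._chunks.clear()
        return body


buffer = BodyBuffer()
buffer.feed(b'ab')
buffer.feed(b'cd')
print(buffer.recv_body())
print(buffer.recv_body())
```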
WebSocket¶
The HTTP client supports websocket upgrades. First you need to have a
websocket handler, a class derived from WS:
from pulsar.apps import ws
class Echo(ws.WS):
    def on_message(self, websocket, message):
        websocket.write(message)
The websocket response is obtained by:
ws = await sessions.get('ws://...', websocket_handler=Echo())
Client Options¶
Several options are available to customise how the HTTP client works.
Pool size¶
The HTTP client maintains connection pools with remote hosts.
The parameter which controls the
pool size for each domain is pool_size, which is set
to 10 by default.
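The effect of a per-host limit can be sketched with one asyncio.Semaphore per host. HostLimiter is a hypothetical class: it caps concurrent operations instead of managing real connections, so it illustrates the idea of pool_size, not pulsar's pool.

```python
import asyncio


class HostLimiter:
    """Cap the number of concurrent operations per host."""

    def __init__(self, pool_size=10):
        self.pool_size = pool_size
        self._semaphores = {}

    async def run(self, host, job):
        # create the semaphore lazily, inside the running event loop
        if host not in self._semaphores:
            self._semaphores[host] = asyncio.Semaphore(self.pool_size)
        async with self._semaphores[host]:
            return await job()


async def peak_concurrency(pool_size, jobs):
    """Run several jobs against one host and report the peak overlap."""
    limiter = HostLimiter(pool_size)
    active = peak = 0

    async def job():
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1

    await asyncio.gather(
        *(limiter.run('example.com', job) for _ in range(jobs)))
    return peak


print(asyncio.run(peak_concurrency(pool_size=2, jobs=6)))
```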
Redirects¶
By default the HttpClient performs location redirection for all verbs except HEAD.
The HttpResponse.history list contains the Response objects that were
created in order to complete the request. For example:
response = await sessions.get('http://github.com')
response.status_code # 200
response.history # [<Response [301]>]
If you’re using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable
redirection handling with the allow_redirects parameter:
response = await sessions.get('http://github.com', allow_redirects=False)
response.status_code # 301
response.history # []
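The relationship between allow_redirects, the final status code and history can be modelled without any network access. In the sketch below, follow and its routes table are hypothetical: routes maps a URL to a (status, location) pair, and history collects the status codes of the intermediate responses.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow(url, routes, allow_redirects=True, max_redirects=10):
    """Resolve url against routes and return (status, history)."""
    history = []
    status, location = routes[url]
    while allow_redirects and status in REDIRECT_CODES:
        if len(history) >= max_redirects:
            raise RuntimeError('too many redirects')
        # remember the redirect response, then move to its target
        history.append(status)
        status, location = routes[location]
    return status, history


routes = {
    'http://github.com': (301, 'https://github.com/'),
    'https://github.com/': (200, None),
}
print(follow('http://github.com', routes))
print(follow('http://github.com', routes, allow_redirects=False))
```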
Decompression¶
Decompression of the response body is automatic.
To disable decompression pass the decompress parameter to a request:
response = await sessions.get('https://github.com', decompress=False)
response.status_code # 200
response.text() # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Alternatively, the decompress flag can be set at session level:
sessions = HttpClient(decompress=False)
response = await sessions.get('https://github.com')
response.status_code # 200
response.text() # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
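The error in the comments above is what one would expect from a gzip-compressed body: a gzip stream starts with the bytes 0x1f 0x8b, and 0x8b is not a valid UTF-8 start byte. The standard library sketch below reproduces this. try_decode is a hypothetical helper.

```python
import gzip


def try_decode(body):
    """Decode body as UTF-8, or describe the byte that prevents it."""
    try:
        return body.decode('utf-8')
    except UnicodeDecodeError as exc:
        bad = body[exc.start]
        return f'cannot decode byte {bad:#x} at position {exc.start}'


compressed = gzip.compress('hello'.encode('utf-8'))
print(compressed[:2])
print(try_decode(compressed))
print(try_decode(gzip.decompress(compressed)))
```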
Synchronous Mode¶
The client can be used in synchronous mode if the event loop has not started. Alternatively, it can be used in synchronous mode on a new thread:
sessions = HttpClient(loop=new_event_loop())
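The underlying asyncio mechanism can be sketched as follows: a private event loop runs a coroutine to completion, either on the calling thread or on a worker thread. fetch is a placeholder for a real request, and run_sync and run_in_thread are hypothetical helpers. How HttpClient uses its loop argument internally is not covered by this sketch.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fetch():
    # placeholder for an awaitable request
    await asyncio.sleep(0)
    return 'done'


def run_sync(coro_func):
    """Run a coroutine function to completion on a private event loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro_func())
    finally:
        loop.close()


def run_in_thread(coro_func):
    """Same as run_sync, but on a worker thread."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(run_sync, coro_func).result()


print(run_sync(fetch))
print(run_in_thread(fetch))
```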
Events¶
Events control the behaviour of the
HttpClient when certain conditions occur. They are useful for
handling standard HTTP events such as redirects,
websocket upgrades,
streaming or anything your application
requires.
One time events¶
There are three one time events associated with an
HttpResponse object:
- pre_request, fired before the request is sent to the server. Callbacks receive the response argument.
- on_headers, fired when response headers are available. Callbacks receive the response argument.
- post_request, fired when the response is done. Callbacks receive the response argument.
Adding event handlers can be done at sessions level:
def myheader_handler(response, exc=None):
    if not exc:
        print('got headers!')
sessions.bind_event('on_headers', myheader_handler)
or at request level:
sessions.get(..., on_headers=myheader_handler)
By default, the HttpClient has one pre_request callback for
handling HTTP tunneling, three on_headers callbacks for
handling 100 Continue, websocket upgrade and cookies,
and one post_request callback for handling redirects.
Many time events¶
In addition to the three one time events,
the HttpClient supports two additional
events which can occur several times while processing a given response:
- data_received is fired when new data has been received but not yet parsed.
- data_processed is fired just after the data has been parsed by the HttpResponse. This is the event one should bind to when performing HTTP streaming.
Both events support handlers with a signature:
def handler(response, data=None):
    ...
where response is the HttpResponse handling the request and
data is the raw data received.
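The difference between one time and many time events can be modelled with a small dispatcher. The Events class below is hypothetical and borrows only the bind_event name from the text. It sketches the firing rules and is not pulsar's event machinery.

```python
class Events:
    """Dispatch handlers for one time and many time events."""

    def __init__(self, one_time=(), many_times=()):
        self._handlers = {name: [] for name in (*one_time, *many_times)}
        self._one_time = set(one_time)
        self._fired = set()

    def bind_event(self, name, handler):
        self._handlers[name].append(handler)

    def fire(self, name, response, **kw):
        if name in self._one_time:
            if name in self._fired:
                # a one time event never fires twice
                return
            self._fired.add(name)
        for handler in self._handlers[name]:
            handler(response, **kw)


log = []
events = Events(one_time=['on_headers'], many_times=['data_processed'])
events.bind_event('on_headers', lambda response, **kw: log.append('headers'))
events.bind_event('data_processed',
                  lambda response, data=None: log.append(data))
events.fire('on_headers', None)
events.fire('on_headers', None)
events.fire('data_processed', None, data=b'a')
events.fire('data_processed', None, data=b'b')
print(log)
```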
API¶
The main classes here are the HttpClient, a subclass of
AbstractClient, the HttpResponse, returned by http
requests and the HttpRequest.
HTTP Client¶
class pulsar.apps.http.HttpClient(proxies=None, headers=None, verify=True, cookies=None, store_cookies=True, max_redirects=10, decompress=True, version=None, websocket_handler=None, parser=None, trust_env=True, loop=None, client_version=None, timeout=None, stream=False, pool_size=10, frame_parser=None, logger=None, close_connections=False, keep_alive=None)¶
A client for HTTP/HTTPS servers.
It handles a pool of asynchronous connections.
Parameters:
- pool_size – set the pool_size attribute.
- store_cookies – set the store_cookies attribute.

headers¶ Default headers for this HttpClient. Default: DEFAULT_HTTP_HEADERS.

cookies¶ Default cookies for this HttpClient.

store_cookies¶ If True it remembers response cookies and sends them back to servers. Default: True

timeout¶ Default timeout for requests. If None or 0, no timeout on requests.

proxies¶ Dictionary of proxy servers for this client.

pool_size¶ The size of a pool of connections for a given host.

connection_pools¶ Dictionary of connection pools for different hosts.

DEFAULT_HTTP_HEADERS¶ Default headers for this HttpClient.

connection_pool¶ alias of Pool

delete(url, **kwargs)¶ Sends a DELETE request and returns a HttpResponse object.
Parameters:
- url – URL for the new HttpRequest object.
- **kwargs – optional arguments for the request() method.

get(url, **kwargs)¶ Sends a GET request and returns a HttpResponse object.
Parameters:
- url – URL for the new HttpRequest object.
- **kwargs – optional arguments for the request() method.

head(url, **kwargs)¶ Sends a HEAD request and returns a HttpResponse object.
Parameters:
- url – URL for the new HttpRequest object.
- **kwargs – optional arguments for the request() method.

options(url, **kwargs)¶ Sends an OPTIONS request and returns a HttpResponse object.
Parameters:
- url – URL for the new HttpRequest object.
- **kwargs – optional arguments for the request() method.

patch(url, **kwargs)¶ Sends a PATCH request and returns a HttpResponse object.
Parameters:
- url – URL for the new HttpRequest object.
- **kwargs – optional arguments for the request() method.

post(url, **kwargs)¶ Sends a POST request and returns a HttpResponse object.
Parameters:
- url – URL for the new HttpRequest object.
- **kwargs – optional arguments for the request() method.

put(url, **kwargs)¶ Sends a PUT request and returns a HttpResponse object.
Parameters:
- url – URL for the new HttpRequest object.
- **kwargs – optional arguments for the request() method.

request(method, url, timeout=None, **params)¶ Constructs and sends a request to a remote server.
It returns a Future which results in a HttpResponse object.
Parameters:
- method – request method for the HttpRequest.
- url – URL for the HttpRequest.
- response – optional pre-existing HttpResponse which starts a new request (for redirects, digest authentication and so forth).
- params – optional parameters for the HttpRequest initialisation.
Return type: a Future
HTTP Request¶
class pulsar.apps.http.HttpRequest(client, url, method, inp_params=None, headers=None, data=None, files=None, json=None, history=None, auth=None, charset=None, max_redirects=10, source_address=None, allow_redirects=False, decompress=True, version=None, wait_continue=False, websocket_handler=None, cookies=None, params=None, stream=False, proxies=None, verify=True, **ignored)¶
An HttpClient request for an HTTP resource.
This class has a similar interface to urllib.request.Request.
Parameters:
- files – optional dictionary of name, file-like-objects.
- allow_redirects – allow the response to follow redirects.

method¶ The request method.

version¶ HTTP version for this request, usually HTTP/1.1.

history¶ List of past HttpResponse (collected during redirects).

wait_continue¶ If True, the HttpRequest includes the Expect: 100-Continue header.

stream¶ Allow for streaming body.

address¶ (host, port) tuple of the HTTP resource.

encode()¶ The bytes representation of this HttpRequest.
Called by HttpResponse when it needs to encode this HttpRequest before sending it to the HTTP resource.

proxy¶ Proxy server for this request.

ssl¶ Context for TLS connections.
If this is a tunneled request and the tunnel connection is not yet established, it returns None.

tunnel¶ Tunnel for this request.
HTTP Response¶
class pulsar.apps.http.HttpResponse(loop=None, one_time_events=None, many_times_events=None)¶
A ProtocolConsumer for the HTTP client protocol.
Initialised by a call to the HttpClient.request method.
There are two events you can yield in a coroutine:

on_headers¶ Fired once the response headers are received.

on_finished¶ Fired once the whole request has finished.

Public API:

content¶ Content of the response, in bytes.

content_string(charset=None, errors=None)¶ Decode content as a string.

cookies¶ Dictionary of cookies set by the server or None.

history¶ List of HttpResponse objects from the history of the request. Any redirect responses will end up here. The list is sorted from the oldest to the most recent request.

links¶ Returns the parsed header links of the response, if any.

raw¶ A raw asynchronous HTTP response.

status_code¶ Numeric status code such as 200, 404 and so forth.
Available once on_headers has fired.

url¶ The request full URL.
OAuth1¶
class pulsar.apps.http.oauth.OAuth1(client_id=None, client=None, **kw)¶
Add OAuth1 authentication to pulsar HttpClient.
OAuth2¶
class pulsar.apps.http.oauth.OAuth2(client_id=None, client=None, **kw)¶
Add OAuth2 authentication to pulsar HttpClient.