Draft
B. Fitzpatrick
B. Slatkin
Google, Inc
M. Atkins
Six Apart Ltd.
July 8, 2009


PubSubHubbub Core 0.1 -- Working Draft

Abstract

An open, simple, web-scale pubsub protocol, along with an open source reference implementation targeting Google App Engine. Notably, however, nothing in the protocol is centralized, or Google- or App Engine-specific. Anybody can play.

As opposed to more developed (and more complex) pubsub specs like Jabber Publish-Subscribe [XEP-0060], this spec's base profile (the barrier to entry to speak it) is dead simple. The fancy bits required for high-volume publishers and subscribers are optional. The base profile is HTTP-based, as opposed to XMPP (see more on this below).

To dramatically simplify this spec in several places where we had to choose between supporting A or B, we took it upon ourselves to say "only A", rather than making it an implementation decision.

We offer this spec in hopes that it fills a need or at least advances the state of the discussion in the pubsub space. Polling sucks. We think a decentralized pubsub layer is a fundamental, missing layer in the Internet architecture today, and its existence, more than just enabling the obvious lower-latency feed readers, would enable many cool applications, most of which we can't even imagine. But we're looking forward to decentralized social networking.



Table of Contents

1.  Notation and Conventions
2.  Definitions
3.  High-level protocol flow
4.  Atom Details
5.  Discovery
6.  Subscribing and Unsubscribing
    6.1.  Subscriber Sends Subscription Request
    6.2.  Hub Verifies Intent of the Subscriber
7.  Publishing
    7.1.  New Content Notification
    7.2.  Content Fetch
    7.3.  Content Distribution
    7.4.  Aggregated Content Distribution
8.  References
Appendix A.  Specification Feedback
§  Authors' Addresses





1.  Notation and Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. Domain name examples use [RFC2606].




2.  Definitions

Topic:
An Atom [RFC4287] feed URL [RFC3986]. The unit to which one can subscribe to changes. Further, this spec currently only addresses public (unauthenticated) Atom feed URLs. TODO: RSS [RSS20] is also supported and working in the reference hub, but not yet well-defined in this specification.
Hub ("the hub"):
The server (URL) which implements both sides of this protocol. We have implemented this and are running a server at http://pubsubhubbub.appspot.com that's at least for now open for anybody to use, as either a publisher or subscriber. Any hub is free to implement its own policies on who can use it.
Publisher:
An owner of a topic. Notifies the pubsub hub when the topic (Atom feed) has been updated. It just notifies that it has been updated, but not how. As in almost all pubsub systems, the publisher is unaware of the subscribers, if any. Other pubsub systems might call the publisher the "source".
Subscriber:
An entity (person or program) that wants to be notified of changes on a topic. The subscriber must be directly network-accessible and is identified by its Subscriber Callback URL.
Subscription:
A unique relation to a topic by a subscriber that indicates it should receive updates for that topic. A subscription's unique key is the tuple (Topic URL, Subscriber Callback URL). Subscriptions may (at the hub's decision) have expiration times akin to DHCP leases which must be periodically renewed.
Subscriber Callback URL:
The URL [RFC3986] at which a subscriber wishes to receive notifications.
Event:
An occurrence at the publisher that may be visible in multiple topics. For each event that happens (e.g. "Brad posted to the Linux Community."), multiple topics could be affected (e.g. "Brad posted." and "Linux community has new post"). Publisher events cause topics to be updated, and the hub looks up all subscriptions for affected topics, sending out notifications to subscribers.
Notification:
A payload describing how a topic's contents have changed. This difference (or "delta") is computed by the hub and sent to all subscribers. The format of the notification will be an Atom feed served by the publisher with only those entries present which are new or have changed. The notification can be the result of a publisher telling the hub of an update, or the hub proactively polling a topic feed, perhaps for a subscriber subscribing to a topic that's not pubsub-aware. Note also that a notification to a subscriber can be a payload consisting of updates for multiple topics. Hubs MAY choose to send multi-topic notifications as an optimization for heavy subscribers, but subscribers MUST understand them. See Section 7.3 (Content Distribution) and Section 7.4 (Aggregated Content Distribution) for format details.
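
To make the Subscription definition concrete, here is a minimal Python sketch of how a hub might key subscriptions by the (Topic URL, Subscriber Callback URL) tuple and expire them like leases. The class and method names (SubscriptionStore, subscribe, unsubscribe, active_callbacks) are hypothetical and not defined by this specification; the 30-day default mirrors the value suggested in Section 6.1.

```python
import time

DEFAULT_LEASE_SECONDS = 2592000  # 30 days, the default suggested in Section 6.1


class SubscriptionStore:
    """Hypothetical in-memory store; a real hub would persist this."""

    def __init__(self):
        # Maps (topic_url, callback_url) -> absolute expiry time in seconds.
        self._expiry = {}

    def subscribe(self, topic, callback,
                  lease_seconds=DEFAULT_LEASE_SECONDS, now=None):
        now = time.time() if now is None else now
        # Re-subscribing with the same key renews the lease instead of
        # creating a second subscription.
        self._expiry[(topic, callback)] = now + lease_seconds

    def unsubscribe(self, topic, callback):
        self._expiry.pop((topic, callback), None)

    def active_callbacks(self, topic, now=None):
        """Return the callbacks whose lease on `topic` has not expired."""
        now = time.time() if now is None else now
        return sorted(
            cb for (t, cb), expires in self._expiry.items()
            if t == topic and expires > now
        )
```

Because the key is the whole tuple, one callback URL can hold subscriptions to many topics, and a repeated request for the same pair only moves the expiry time forward.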



3.  High-level protocol flow

(This section is non-normative.)




4.  Atom Details

Notification and source formats will be Atom [RFC4287]. A detailed explanation follows this example.

<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- Normally here would be source, title, etc ... -->

  <link rel="hub" href="http://myhub.example.com/endpoint" />
  <link rel="self" href="http://publisher.example.com/happycats.xml" />
  <updated>2008-08-11T02:15:01Z</updated>

  <!-- Example of a full entry. -->
  <entry>
    <title>Heathcliff</title>
    <link href="http://publisher.example.com/happycat25.xml" />
    <id>http://publisher.example.com/happycat25.xml</id>
    <updated>2008-08-11T02:15:01Z</updated>
    <content>
      What a happy cat. Full content goes here.
    </content>
  </entry>

  <!-- Example of an entry that isn't full (is truncated). This is implied
       by the presence of a <summary> element instead of a <content>
       element. -->
  <entry>
    <title>Heathcliff</title>
    <link href="http://publisher.example.com/happycat25.xml" />
    <id>http://publisher.example.com/happycat25.xml</id>
    <updated>2008-08-11T02:15:01Z</updated>
    <summary>
      What a happy cat!
    </summary>
  </entry>

  <!-- Meta-data only; implied by the lack of <content> and
       <summary> elements. -->
  <entry>
    <title>Garfield</title>
    <link rel="alternate" href="http://publisher.example.com/happycat24.xml" />
    <id>http://publisher.example.com/happycat24.xml</id>
    <updated>2008-08-11T02:15:01Z</updated>
  </entry>

  <!-- Context entry that's meta-data only and not new. Implied because the
       update time on this entry is before the //atom:feed/updated time. -->
  <entry>
    <title>Nermal</title>
    <link rel="alternate" href="http://publisher.example.com/happycat23s.xml" />
    <id>http://publisher.example.com/happycat23s.xml</id>
    <updated>2008-07-10T12:28:13Z</updated>
  </entry>

</feed>

The publisher decides whether to include the full body, a truncated body, or only the metadata of the most recent event(s); the example above shows one entry of each kind.

Whether to include all content in outgoing notifications or to risk a thundering herd (of clients fetching the //atom:feed/entry/link in response to a notification) is a trade-off left to the publisher. Entries for the most recent events MAY be provided as context, so that recipients can tell whether they missed any recent items (much like TCP SACK). Context entries are implied by the difference between the //atom:feed/updated field and the //atom:feed/entry/updated fields.

The //atom:feed/link[@rel="self"] element MUST indicate the topic URL for the original event stream (with no truncation if available). Subscribers MUST use the self link when requesting a subscription from the hub. This is crucial for subscribers to detect feed moves.
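
As an illustration of the rules above, the following Python sketch classifies each entry of a feed as full, truncated, or metadata-only, and flags context entries whose update time precedes the feed's. It is a sketch under assumptions: the function name classify_entries is hypothetical, the sample feed is a trimmed variant of the example with one made-up entry (happycat26.xml), and timestamps are compared as strings, which only works when all of them are UTC values in the same layout.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

SAMPLE = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="http://publisher.example.com/happycats.xml" />
  <updated>2008-08-11T02:15:01Z</updated>
  <entry>
    <id>http://publisher.example.com/happycat25.xml</id>
    <updated>2008-08-11T02:15:01Z</updated>
    <content>What a happy cat. Full content goes here.</content>
  </entry>
  <entry>
    <id>http://publisher.example.com/happycat26.xml</id>
    <updated>2008-08-11T02:15:01Z</updated>
    <summary>What a happy cat!</summary>
  </entry>
  <entry>
    <id>http://publisher.example.com/happycat24.xml</id>
    <updated>2008-08-11T02:15:01Z</updated>
  </entry>
  <entry>
    <id>http://publisher.example.com/happycat23s.xml</id>
    <updated>2008-07-10T12:28:13Z</updated>
  </entry>
</feed>"""


def classify_entries(feed_xml):
    """Return a list of (entry id, body kind, is_context) tuples."""
    feed = ET.fromstring(feed_xml)
    feed_updated = feed.findtext(ATOM + "updated")
    result = []
    for entry in feed.findall(ATOM + "entry"):
        if entry.find(ATOM + "content") is not None:
            kind = "full"
        elif entry.find(ATOM + "summary") is not None:
            kind = "truncated"
        else:
            kind = "metadata"
        # Assumption: identical UTC timestamp layout, so string order equals
        # time order. General feeds need real date parsing.
        is_context = entry.findtext(ATOM + "updated") < feed_updated
        result.append((entry.findtext(ATOM + "id"), kind, is_context))
    return result
```

A subscriber could use the is_context flag to skip entries it has already processed while still checking that it did not miss any of them.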




5.  Discovery

A potential subscriber initiates discovery by retrieving the Atom feed to which it wants to subscribe. A feed that acts as a topic as per this specification MUST publish, as a child of atom:feed, an atom:link element whose rel attribute has the value hub and whose href attribute contains the hub's endpoint URL. Feeds MAY contain multiple atom:link[@rel="hub"] elements if the publisher wishes to notify multiple hubs. When a potential subscriber encounters one or more such links, that subscriber MAY subscribe to the feed using one or more hubs as described in Section 6 (Subscribing and Unsubscribing).

Example:

<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- Normally here would be source, title, etc ... -->
  <link rel="hub" href="http://myhub.example.com/endpoint" />
  <link rel="self" href="http://publisher.example.com/topic.xml" />
  ....
  <entry>
     ....
  </entry>
  <entry>
     ....
  </entry>
</feed>
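
A subscriber's discovery step could look roughly like the following Python sketch, which extracts the hub endpoint URLs and the self link from a feed that has already been retrieved. The function name discover is hypothetical, and the sketch assumes the feed declares the standard Atom namespace.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def discover(feed_xml):
    """Return (hub_urls, self_url) found in an Atom feed document."""
    feed = ET.fromstring(feed_xml)
    hubs, self_url = [], None
    # findall() with a plain tag only matches direct children, so links
    # nested inside atom:entry elements are not considered.
    for link in feed.findall(ATOM + "link"):
        rel = link.get("rel", "alternate")
        if rel == "hub":
            hubs.append(link.get("href"))
        elif rel == "self" and self_url is None:
            self_url = link.get("href")
    return hubs, self_url
```

The returned self URL is the topic URL to use in the subscription request, and any of the returned hub URLs may be used as the request target.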



6.  Subscribing and Unsubscribing

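Subscribing to a topic URL consists of two parts, the subscription request (Section 6.1) and the hub's verification of the subscriber's intent (Section 6.2), which may occur immediately in sequence or with a delay between them.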

Unsubscribing works in the same way, except with a single parameter changed to indicate the desire to unsubscribe.




6.1.  Subscriber Sends Subscription Request

Subscription is initiated by the subscriber making an HTTP [RFC2616] POST request to the hub URL. This request has a Content-Type of application/x-www-form-urlencoded (described in Section 17.13.4 of [W3C.REC-html401-19991224]) and the following parameters in its body:

hub.mode
REQUIRED. The literal string "subscribe" or "unsubscribe", depending on the goal of the request.
hub.callback
REQUIRED. The subscriber's callback URL. This URL should not contain query-string parameters or an anchor fragment.
hub.topic
REQUIRED. The topic URL that the subscriber wishes to subscribe to.
hub.verify
REQUIRED. Keyword describing verification modes supported by this subscriber, as described below. This parameter may be repeated to indicate multiple supported modes.
hub.verify_token
OPTIONAL. A subscriber-provided opaque token that will be echoed back in the verification request to assist the subscriber in identifying which subscription request is being verified. If this is not included, no token will be included in the verification request.
hub.lease_seconds
OPTIONAL. Number of seconds for which the subscriber would like to have the subscription active. Hubs SHOULD choose a default value of 30 days (2592000 seconds) if not supplied. Hubs MAY choose to respect this value or not, depending on their own policies. This parameter MAY be present for unsubscription requests and MUST be ignored by the hub in that case.

The following keywords are supported for hub.verify:

sync
The subscriber supports synchronous verification, where the verification request must occur before the subscription request's HTTP response is returned.
async
The subscriber supports asynchronous verification, where the verification request may occur at a later point after the subscription request has returned.

Where multiple keywords are used, their order indicates the subscriber's order of preference. Subscribers MUST use at least one of the modes indicated in the list above, but MAY include additional keywords defined by extension specifications. Hubs MUST ignore any verify mode keywords that they do not understand.

The hub MUST respond to a subscription request with an HTTP 204 "No Content" response to indicate that the request was already verified and that the subscription has been created. If the subscription has yet to be verified (i.e., the hub is using asynchronous verification), the hub MUST respond with a 202 "Accepted" code. Hubs MUST allow subscribers to re-request subscriptions that are already activated. This is required so subscribers can keep their subscriptions active before the lease seconds period is over.

In the case of any error, an appropriate HTTP error response code (4xx or 5xx) MUST be returned. In the event of an error, hubs SHOULD return a description of the error in the response body as plain text. Hubs MAY decide to reject some callback URLs or topic URLs based on their own policies (e.g., domain authorization, topic URL port numbers).

In synchronous mode, the verification (Section 6.2 (Hub Verifies Intent of the Subscriber)) MUST be completed before the hub returns a response. In asynchronous mode, the verification MAY be deferred until a later time. This is useful to enable hubs to defer work; this could allow them to alleviate servers under heavy load or do verification work in batches.
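
The request described in this section can be sketched in Python as follows. This is an illustration only: the function names build_subscription_body and interpret_subscription_status are hypothetical, and the labels returned by the second function ("verified", "pending", "error", "unexpected") are made up for this example.

```python
from urllib.parse import urlencode


def build_subscription_body(mode, callback, topic, verify=("sync", "async"),
                            verify_token=None, lease_seconds=None):
    """Build the form-encoded body of a subscribe or unsubscribe request."""
    if mode not in ("subscribe", "unsubscribe"):
        raise ValueError("hub.mode must be 'subscribe' or 'unsubscribe'")
    if not verify:
        raise ValueError("at least one hub.verify keyword is required")
    params = [
        ("hub.mode", mode),
        ("hub.callback", callback),
        ("hub.topic", topic),
    ]
    # hub.verify repeats once per supported mode, in order of preference.
    params += [("hub.verify", keyword) for keyword in verify]
    if verify_token is not None:
        params.append(("hub.verify_token", verify_token))
    if lease_seconds is not None:
        params.append(("hub.lease_seconds", str(lease_seconds)))
    return urlencode(params)


def interpret_subscription_status(status):
    """Map the hub's HTTP status code to an outcome label."""
    if status == 204:
        return "verified"    # already verified, subscription created
    if status == 202:
        return "pending"     # accepted, verification will happen later
    if 400 <= status < 600:
        return "error"
    return "unexpected"
```

The body would be sent in a POST to the hub URL with a Content-Type of application/x-www-form-urlencoded. Passing the parameters as a list of pairs rather than a dictionary is what allows hub.verify to appear more than once.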




6.2.  Hub Verifies Intent of the Subscriber

In order to prevent an attacker from creating unwanted subscriptions on behalf of a subscriber (or unsubscribing desired ones), a hub must ensure that the subscriber did indeed send the subscription request.

The hub verifies a subscription request by sending an HTTP [RFC2616] GET request to the subscriber's callback URL as given in the subscription request. This request has the following query string arguments appended (format described in Section 17.13.4 of [W3C.REC-html401-19991224]):

hub.mode
REQUIRED. The literal string "subscribe" or "unsubscribe", which matches the original request to the hub from the subscriber.
hub.topic
REQUIRED. The topic URL given in the corresponding subscription request.
hub.challenge
REQUIRED. A hub-generated, random string that MUST be echoed by the subscriber to verify the subscription.
hub.lease_seconds
REQUIRED/OPTIONAL. The hub-determined number of seconds that the subscription will stay active before expiring. Hubs SHOULD make this equal to whatever the subscriber passed in their subscription request but MAY change the value depending on the hub's policies. If the subscriber wishes to keep a persistent subscription, they MUST re-request the subscription before this many seconds have elapsed. Hubs MUST supply this parameter for subscription requests. This parameter MAY be present for unsubscribe requests and MUST be ignored by subscribers during unsubscription.
hub.verify_token
OPTIONAL. The subscriber-provided opaque token from the corresponding subscription request, if one was provided.

The subscriber MUST confirm that the topic and verify_token correspond to a pending subscription or unsubscription that it wishes to carry out. If so, the subscriber MUST respond with an HTTP success (2xx) code with a response body equal to the challenge parameter. If the subscriber does not agree with the action, the subscriber MUST respond with a 404 "Not Found" response. The hub MUST consider other client and server response codes (3xx, 4xx, and 5xx) to mean that the subscription is not verified, meaning the hub SHOULD retry verification until a definite acknowledgement (positive or negative) is received.
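
On the subscriber side, the verification step might be handled along the lines of the Python sketch below. It assumes the subscriber keeps a set of pending (mode, topic, verify_token) tuples for requests it has sent; that bookkeeping and the function name handle_verification are hypothetical, not part of this specification. Treating a request without hub.challenge as a refusal is also an assumption of the sketch.

```python
from urllib.parse import parse_qs


def handle_verification(query_string, pending):
    """Return (status_code, response_body) for a hub verification GET.

    `pending` is a set of (mode, topic, verify_token) tuples describing the
    subscription changes this subscriber actually asked for. verify_token
    is None when the subscriber did not supply one.
    """
    args = parse_qs(query_string)

    def first(name):
        # parse_qs returns a list per key; take the first value if present.
        return args.get(name, [None])[0]

    mode = first("hub.mode")
    topic = first("hub.topic")
    challenge = first("hub.challenge")
    token = first("hub.verify_token")

    if challenge is None or (mode, topic, token) not in pending:
        # Not something we asked for: refuse with 404 "Not Found".
        return 404, ""
    # Confirm by echoing the challenge with a 2xx status.
    return 200, challenge
```

A subscriber that supplies an unguessable hub.verify_token in its request gets a simple way to reject verification requests triggered by someone else.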




7.  Publishing

A publisher pings the hub with the topic URL(s) which have been updated and the hub schedules those topics to be fetched and delivered. Because it's just a ping to notify the hub of the topic URL (without a payload), no authentication from the publisher is required.




7.1.  New Content Notification

When new content is added to a feed, a notification is sent to the hub by the publisher. The hub MUST accept a POST request to the hub URL containing the notification. This request MUST have a Content-Type of application/x-www-form-urlencoded (described in Section 17.13.4 of [W3C.REC-html401-19991224]) and the following parameters in its body:

hub.mode
REQUIRED. The literal string "publish".
hub.url
REQUIRED. The topic URL of the topic that has been updated. This field may be repeated to indicate multiple topics that have been updated.

The new content notification is a signal to the hub that there is new content available. The hub SHOULD arrange for a content fetch request (Section 7.2 (Content Fetch)) to be performed in the near future to retrieve the new content. If the publication request was accepted, the hub MUST return a 204 No Content response. If the publication request is not accepted for some reason, the hub MUST return an appropriate HTTP error response code (4xx or 5xx). Hubs MUST return a 204 No Content response even when they have no subscribers for any of the specified topic URLs.
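
A publisher's ping could be prepared as in the following Python sketch, which uses only the standard library. The function name build_publish_request is hypothetical; the request is built but not sent, so the sketch makes no assumption about a live hub.

```python
import urllib.request
from urllib.parse import urlencode


def build_publish_request(hub_url, topic_urls):
    """Build the POST that tells a hub which topic URLs were updated."""
    topic_urls = list(topic_urls)
    if not topic_urls:
        raise ValueError("at least one topic URL is required")
    # hub.url repeats once per updated topic.
    params = [("hub.mode", "publish")]
    params += [("hub.url", url) for url in topic_urls]
    return urllib.request.Request(
        hub_url,
        data=urlencode(params).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send the ping; a 204 response means the hub accepted it, while a 4xx or 5xx response surfaces as urllib.error.HTTPError.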




7.2.  Content Fetch

When the hub wishes to retrieve new content for a topic, the hub sends an HTTP GET request to the topic URL. The request SHOULD include a header field X-Hub-Subscribers whose value is an integer number, possibly approximate, of subscribers on behalf of which the feed is being fetched.
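
The two ends of the X-Hub-Subscribers exchange might be sketched in Python as follows. Both function names are hypothetical, and returning None for a missing or malformed header is a choice made for this example rather than something the specification requires.

```python
import urllib.request


def build_fetch_request(topic_url, subscriber_count):
    """Hub side: a GET for the topic that advertises the subscriber count."""
    return urllib.request.Request(
        topic_url,
        headers={"X-Hub-Subscribers": str(int(subscriber_count))},
    )


def parse_subscriber_count(header_value):
    """Publisher side: read the (possibly approximate) count, or None."""
    try:
        return int(header_value)
    except (TypeError, ValueError):
        # Header absent (None) or not an integer.
        return None
```

A publisher could log the parsed value to estimate how many readers a single hub fetch stands for.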




7.3.  Content Distribution

If, after a content fetch, the hub determines that the topic feed content has changed, the hub sends information about the changes to each of the subscribers to the topic. A content distribution request is an HTTP [RFC2616] POST request from hub to the subscriber's callback URL. This request has a Content-Type of application/atom+xml and its request body is an Atom feed document with the list of new and changed items.

If the document represents a single feed being replicated for the subscriber, then the feed-level elements SHOULD be preserved aside from the atom:entry elements. However, the atom:id element MUST be reproduced exactly. The other atom:updated and atom:title elements required by the Atom specification SHOULD be present. Each atom:entry element in the feed contains the content from an entry in the single topic that the subscriber has an active subscription for. Essentially, in the single feed case the subscriber will receive an Atom document that looks like the original.
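
One way a hub might derive the single-feed notification document is sketched below in Python. This specification does not prescribe how changes are detected; the sketch assumes an entry counts as new or changed when its atom:id has not been seen before or its atom:updated value differs from the last one delivered. The function name build_notification and the `seen` dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM = "{%s}" % ATOM_NS
# Serialize Atom elements without a namespace prefix.
ET.register_namespace("", ATOM_NS)


def build_notification(feed_xml, seen):
    """Return the feed with only new or changed entries, or None.

    `seen` maps entry id -> last delivered atom:updated value and is
    updated in place. Feed-level elements (id, title, links) are kept.
    """
    feed = ET.fromstring(feed_xml)
    changed = 0
    # findall() returns a list, so removing entries while looping is safe.
    for entry in feed.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        updated = entry.findtext(ATOM + "updated")
        if seen.get(entry_id) == updated:
            feed.remove(entry)
        else:
            seen[entry_id] = updated
            changed += 1
    if changed == 0:
        return None
    return ET.tostring(feed, encoding="unicode")
```

The returned document would be the body of the POST to each callback URL, sent with a Content-Type of application/atom+xml.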

The response from the subscriber's callback URL MUST be an HTTP success (2xx) code. The hub MUST consider all other subscriber response codes as failures. The response body will be ignored. Hubs SHOULD retry notifications repeatedly until successful (up to some reasonable maximum).

The subscriber's callback response SHOULD include the header field X-Hub-On-Behalf-Of whose value is an integer number, possibly approximate, representing the number of subscribers on behalf of which this feed notification was delivered. This value SHOULD be aggregated by the hub across all subscribers and used to provide the X-Hub-Subscribers header to feed publishers.
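
The aggregation described above could be sketched in Python as follows. Several details are assumptions of this sketch rather than requirements of the specification: failed deliveries are left out of the total, a callback that omits the header or sends a non-integer value counts as one subscriber, and headers are given as a plain dictionary with the exact header name. The function names are hypothetical.

```python
def delivery_succeeded(status):
    """Only 2xx responses from a callback count as successful delivery."""
    return 200 <= status < 300


def total_subscribers(responses):
    """Sum X-Hub-On-Behalf-Of over (status, headers) callback responses."""
    total = 0
    for status, headers in responses:
        if not delivery_succeeded(status):
            continue  # assumption: failed deliveries are not counted
        raw = headers.get("X-Hub-On-Behalf-Of")
        try:
            count = int(raw)
        except (TypeError, ValueError):
            count = 1  # assumption: missing or malformed means one
        total += max(count, 1)
    return total
```

The resulting total is the kind of value a hub could report in X-Hub-Subscribers on its next content fetch.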




7.4.  Aggregated Content Distribution

When the notification document represents an aggregated set of feeds, the hub SHOULD reproduce all of the elements from the source feed inside the corresponding atom:entry in the content distribution request by using an atom:source element. However, the atom:id value MUST be reproduced exactly within the source element. If the source entry does not have an atom:source element, the hub MUST create an atom:source element containing the atom:id element. The hub SHOULD also include the atom:title element and an atom:link element with rel "self" with values that are functionally equivalent to the corresponding elements in the original topic feed.

Example aggregated feed:

<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Aggregated feed</title>
  <updated>2008-08-11T02:17:44Z</updated>
  <id>http://myhub.example.com/aggregated?1232427842-39823</id>

  <entry>
    <source>
      <id>http://www.example.com/foo</id>
      <link rel="self" href="http://publisher.example.com/foo.xml" />
    </source>
    <title>Testing Foo</title>
    <link href="http://publisher.example.com/foo24.xml" />
    <id>http://publisher.example.com/foo24.xml</id>
    <updated>2008-08-11T02:15:01Z</updated>
    <content>
      This is some content from the user named foo.
    </content>
  </entry>

  <entry>
    <source>
      <id>http://www.example.com/bar</id>
      <link rel="self" href="http://publisher.example.com/bar.xml" />
    </source>
    <title>Testing Bar</title>
    <link href="http://publisher.example.com/bar18.xml" />
    <id>http://publisher.example.com/bar18.xml</id>
    <updated>2008-08-11T02:17:44Z</updated>
    <content>
      Some data from the user named bar.
    </content>
  </entry>

</feed>
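
A hub assembling an aggregated document might attach the atom:source element to each entry roughly as in this Python sketch. The function name add_source is hypothetical, and leaving an existing atom:source element untouched is a simplification made for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM = "{%s}" % ATOM_NS
# Serialize Atom elements without a namespace prefix.
ET.register_namespace("", ATOM_NS)


def add_source(entry, topic_feed):
    """Ensure `entry` has an atom:source describing its original feed."""
    if entry.find(ATOM + "source") is not None:
        return entry  # simplification: trust a source that already exists
    source = ET.Element(ATOM + "source")
    # The topic feed's atom:id is copied exactly.
    ET.SubElement(source, ATOM + "id").text = topic_feed.findtext(ATOM + "id")
    title = topic_feed.findtext(ATOM + "title")
    if title is not None:
        ET.SubElement(source, ATOM + "title").text = title
    for link in topic_feed.findall(ATOM + "link"):
        if link.get("rel") == "self":
            ET.SubElement(source, ATOM + "link",
                          rel="self", href=link.get("href"))
    # Place the source first, as in the example above.
    entry.insert(0, source)
    return entry
```

Entries prepared this way can be appended to a new feed element that carries the hub's own atom:id, atom:title, and atom:updated values.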



8. References

[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” RFC 2119.
[RFC2606] Eastlake, D. and A. Panitz, “Reserved Top Level DNS Names,” RFC 2606.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” RFC 2616.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” RFC 3986.
[RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., “The Atom Syndication Format,” RFC 4287.
[RSS20] Winer, D., “RSS 2.0.”
[W3C.REC-html401-19991224] Raggett, D., Hors, A., and I. Jacobs, “HTML 4.01 Specification,” World Wide Web Consortium Recommendation REC-html401-19991224, December 1999.
[XEP-0060] Millard, P., Saint-Andre, P., and R. Meijer, “Publish-Subscribe,” XSF XEP 0060.



Appendix A.  Specification Feedback

Feedback on this specification is welcomed via the pubsubhubbub mailing list, pubsubhubbub@googlegroups.com. For more information, see the PubSubHubbub group on Google Groups. Also, check out the FAQ and other documentation.




Authors' Addresses

  Brad Fitzpatrick
  Google, Inc
  Email: brad@danga.com

  Brett Slatkin
  Google, Inc
  Email: bslatkin@gmail.com

  Martin Atkins
  Six Apart Ltd.
  Email: mart@degeneration.co.uk