Atom Feed Protocol for Metadata Harvesting 1.0

Draft 2012-11-23

This version: http://uq-eresearch-spec.github.com/atom-pmh/atom-pmh-20121123.html
Latest version: http://uq-eresearch-spec.github.com/atom-pmh
Editors: Hoylen Sue h.sue@uq.edu.au
Tim Dettrick t.dettrick@uq.edu.au

This work by The University of Queensland (ITEE eResearch Group) is licensed under a Creative Commons Attribution 3.0 Australia License.

Abstract

The Atom Feed Protocol for Metadata Harvesting (Atom-PMH) defines a protocol for publishing and harvesting metadata records. It is independent of the format used to represent the metadata records, and supports representations in multiple formats. The protocol allows servers to publish metadata records and clients to harvest the metadata records and changes to the metadata records. The protocol is based on Atom 1.0 and archived feeds. It is stateless and follows RESTful principles.

Atom-PMH uses Atom Feed Documents to publish Atom entries representing the metadata records and changes to those metadata records. The Atom entries contain one or more links to representations of the metadata record. Clients can retrieve all the metadata records by retrieving all the Atom entries and following those links.

The entire feed can be broken up into a chain of multiple Atom Feed Documents. This allows support for large and/or changing feeds. The Atom Feed Documents are ordered by time, so clients can harvest changes to the feed without needing to retrieve the entire feed. An updated metadata record appears as a newer Atom entry in the feed, and a deletion Atom entry is used to indicate a metadata record has been deleted. Clients can detect changes by retrieving and processing Atom Feed Documents until an Atom Feed Document older than their previous harvest is encountered.

A variant of the protocol is defined to support servers that do not keep track of deleted metadata records. In a “complete feed,” the entire feed represents all the currently available metadata records. In a complete feed, deletion Atom entries are not used to indicate deleted metadata records. Clients can detect deleted metadata records by their absence from the complete feed.

Introduction

General introduction

The Atom Feed Protocol for Metadata Harvesting (Atom-PMH) is a mechanism for a producer to communicate to consumers the set of metadata records it has. Changes to that set (additions, modifications and deletions) are also communicated.

The provider produces a set of Atom Feed Documents linked together as an archived feed. There is a subscription document which contains the most recent entries, followed by zero or more archive documents. The Atom Feed Documents are in reverse chronological order from the newest to the oldest. The set of entries from all the Atom Feed Documents makes up the logical feed.

When a new metadata record is created, an entry is added to the logical feed. When a metadata record is modified, a new entry is added to the logical feed. When a metadata record is deleted, a special deletion entry is added to the logical feed. With modifications and deletions, the previous entries can either be kept or removed from the logical feed, since the consumer knows to ignore them because they have an older timestamp than the newer entry.

Consumers start with the subscription document and process the entries in it, then continue with the previous archive document. Since the Atom Feed Documents are in reverse chronological order, the consumer can stop processing entries when it reaches a timestamp that it has previously processed. This allows consumers to efficiently harvest changes.
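
The harvesting loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names harvest_changes and parse_time, and the fetch callable (which takes a URI and returns the XML text of the Atom Feed Document), are hypothetical and are not defined by this specification. The consumer is assumed to remember the time of its previous harvest.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_time(text):
    # Atom timestamps are RFC 3339; rewrite a trailing "Z" as an explicit
    # offset so that datetime.fromisoformat accepts it on older Pythons.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def harvest_changes(fetch, subscription_uri, last_harvest):
    """Collect entries newer than last_harvest by following prev-archive links.

    fetch is a hypothetical callable: URI -> XML text of an Atom Feed Document.
    """
    changed = []
    uri = subscription_uri
    while uri is not None:
        feed = ET.fromstring(fetch(uri))
        if parse_time(feed.findtext(ATOM + "updated")) < last_harvest:
            break  # this document, and every older one, was already harvested
        for entry in feed.findall(ATOM + "entry"):
            if parse_time(entry.findtext(ATOM + "updated")) > last_harvest:
                changed.append(entry)
        # The oldest archive document has no prev-archive link, ending the walk.
        uri = next(
            (link.get("href") for link in feed.findall(ATOM + "link")
             if link.get("rel") == "prev-archive"),
            None,
        )
    return changed
```

A first harvest can pass a very early last_harvest value so that the whole chain is walked.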

The creation and update entries contain one or more alternate links to representations of the metadata record. The special deletion entry is an entry that contains no alternate links (but has an empty atom:content element to satisfy the rules of Atom 1.0).

General examples

Entries

This is an example of an Atom Feed Document representing two metadata records.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <updated>2012-11-01T14:00:00Z</updated>
  <id>urn:uuid:953d1150-ff9a-41c0-975b-d1fbe17c3dd8</id>
  <title>Basic example</title>

  <entry>
    <title>Test 1</title>
    <id>urn:uuid:0b116a23-9bfc-49b1-97f7-90fb012c60a4</id>
    <updated>2011-12-10T18:30:02Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/t1.atom-rdc"/>
  </entry>
  
  <entry>
    <title>Test 2</title>
    <id>urn:uuid:24870a63-01ae-4fed-878b-2ec8d498cfd0</id>
    <updated>2011-12-10T15:30:02Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/t2.atom-rdc"/>
  </entry>
</feed>

In this example, the URI in each of the two alternate links resolves to an Atom-RDC Atom Entry Document that represents the metadata record.

Representing updates and deletes

If the second metadata record is modified and the first metadata record deleted, the new entries are shown in this Atom Feed Document. The atom:updated timestamp tells the consumer which ones to process.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <updated>2012-11-01T14:00:00Z</updated>
  <id>urn:uuid:953d1150-ff9a-41c0-975b-d1fbe17c3dd8</id>
  <title>Basic example</title>

  <entry>
    <title>Test 1 (deleted)</title>
    <id>urn:uuid:0b116a23-9bfc-49b1-97f7-90fb012c60a4</id>
    <updated>2011-12-10T20:00:00Z</updated>
    <content/>
  </entry>
  
  <entry>
    <title>Test 2 (modified)</title>
    <id>urn:uuid:24870a63-01ae-4fed-878b-2ec8d498cfd0</id>
    <updated>2011-12-10T20:00:00Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/t2.atom-rdc"/>
  </entry>

  <entry>
    <title>Test 1</title>
    <id>urn:uuid:0b116a23-9bfc-49b1-97f7-90fb012c60a4</id>
    <updated>2011-12-10T18:30:02Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/t1.atom-rdc"/>
  </entry>
  
  <entry>
    <title>Test 2</title>
    <id>urn:uuid:24870a63-01ae-4fed-878b-2ec8d498cfd0</id>
    <updated>2011-12-10T15:30:02Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/t2.atom-rdc"/>
  </entry>
</feed>

Alternatively, one or both of the original entries can be omitted.
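
As an illustration of how a consumer might apply the atom:updated timestamps, the following Python sketch reduces the entries of one Atom Feed Document to the newest entry per atom:id. The function names latest_state and alternate_hrefs are hypothetical. The sketch assumes that an entry with no alternate links is a deletion entry, and it keeps the first entry seen when two entries for the same identifier share a timestamp.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def alternate_hrefs(entry):
    # In Atom 1.0 a link without a rel attribute is treated as "alternate".
    return [link.get("href") for link in entry.findall(ATOM + "link")
            if link.get("rel", "alternate") == "alternate"]


def latest_state(feed_xml):
    """Return {atom:id: [alternate hrefs]}; an empty list marks a deletion."""
    newest = {}
    for entry in ET.fromstring(feed_xml).findall(ATOM + "entry"):
        ident = entry.findtext(ATOM + "id")
        stamp = datetime.fromisoformat(
            entry.findtext(ATOM + "updated").strip().replace("Z", "+00:00"))
        # Only a strictly newer entry replaces the one already recorded.
        if ident not in newest or stamp > newest[ident][0]:
            newest[ident] = (stamp, alternate_hrefs(entry))
    return {ident: hrefs for ident, (stamp, hrefs) in newest.items()}
```

Applied to the feed above, the first metadata record would map to an empty list (deleted) and the second to its single alternate link. The result is the same whether or not the original entries are omitted.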

Archived feeds

When there are many entries, the producer can split the logical feed into multiple Atom Feed Documents. These form a linked list, using the prev-archive link.

For example, the start of the linked list is the subscription document:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="prev-archive" href="http://example.org/feed/L99"/>
  <updated>2012-11-01T14:00:00Z</updated>
  ...
</feed>

The prev-archive link resolves to the following archive document:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="prev-archive" href="http://example.org/feed/L98"/>
  <updated>2012-10-31T14:00:00Z</updated>
  ...
</feed>

The prev-archive link resolves to the following archive document:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="prev-archive" href="http://example.org/feed/L97"/>
  <updated>2012-10-30T14:00:00Z</updated>
  ...
</feed>

And so on, until the oldest archive document which does not have a prev-archive link.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <updated>2012-01-01T14:00:00Z</updated>
  ...
</feed>

Complete feeds introduction

An alternative method for representing deletions can be used by providers that do not track deleted metadata records. All entries for a deleted metadata record are simply removed from the logical feed and the Atom Feed Document is marked as complete with the fh:complete element.

This avoids the need for deletion entries. But it requires the entire logical feed to be represented by a single Atom Feed Document. Therefore, this approach is not recommended when there are many entries.

Consumers processing this Atom Feed Document can detect a deleted metadata record by the absence of an entry for it.

Example of complete feeds

The previous example showing a deleted metadata record can be represented as:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:fh="http://purl.org/syndication/history/1.0">
  <updated>2012-11-01T14:00:00Z</updated>
  <id>urn:uuid:953d1150-ff9a-41c0-975b-d1fbe17c3dd8</id>
  <title>Alternate delete example</title>
  
  <fh:complete/>
  
  <entry>
    <title>Test 2 (modified)</title>
    <id>urn:uuid:24870a63-01ae-4fed-878b-2ec8d498cfd0</id>
    <updated>2011-12-10T20:00:00Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/t2.atom-rdc"/>
  </entry>

  <entry>
    <title>Test 2</title>
    <id>urn:uuid:24870a63-01ae-4fed-878b-2ec8d498cfd0</id>
    <updated>2011-12-10T15:30:02Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/t2.atom-rdc"/>
  </entry>
</feed>

As before, the original entry for the updated metadata record can be omitted.
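
A consumer of a complete feed might detect deletions with a set difference, as in the following Python sketch. The function name ids_deleted and the known_ids argument (the identifiers the consumer holds from its previous harvest) are hypothetical. The sketch refuses to draw conclusions from a document that is not marked with fh:complete, since absence from a partial document does not indicate deletion.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
FH = "{http://purl.org/syndication/history/1.0}"


def ids_deleted(known_ids, complete_feed_xml):
    """Return the previously known identifiers that are absent from the feed."""
    feed = ET.fromstring(complete_feed_xml)
    if feed.find(FH + "complete") is None:
        # Without fh:complete, older entries may live in archive documents.
        raise ValueError("feed is not marked fh:complete")
    present = {entry.findtext(ATOM + "id")
               for entry in feed.findall(ATOM + "entry")}
    return set(known_ids) - present
```

With the example above, a consumer that previously knew both identifiers would find only the identifier of the first metadata record in the returned set.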

Concepts and terminology

Notational conventions

Conformance

The key words shall, shall not, should and should not in this specification are to be interpreted according to RFC 2119.

Namespaces

The following XML namespace prefixes are used in this specification:

atom: http://www.w3.org/2005/Atom
fh: http://purl.org/syndication/history/1.0

Layers of representation

This specification clearly distinguishes between the different layers of abstraction:

Model: A conceptual data model that is not tied to any representation. For example, the abstract model of a collection’s title.
Format: A mechanism for representing instances of the model. There can be different formats that represent the same model. For example, an instance of the title model can be represented in RIF-CS or RDF.
Protocol: A mechanism for transporting instances of a format. For example, OAI-PMH is a protocol.

The mapping between these layers is not always one to one. For example, a single Atom-RDC entry can represent more than one metadata record when it contains metadata about both a data collection and the people associated with that data collection.

Data and metadata

This specification uses the following terms to describe the different types of models:

Subject: The actual research collection data, person, activity or service.
Metadata record: Metadata that describes that subject.
Administrative metadata: Metadata that describes the metadata record. For example, when the metadata record was updated.

Metadata records

The profiles in this specification are based on a model where the provider has a set (i.e. unordered collection) of zero or more metadata records. This set will be referred to as the current pool.

The following events can occur on a metadata record:

A metadata record is added to the current pool when it is created.
The contents of a metadata record can be changed when it is modified.
A metadata record is removed from the current pool when it is deleted.

Implementation of modify and/or delete

Some providers might never modify any metadata records.

Some providers might never delete any metadata records. Most implementations that delete metadata records will keep track of them, but some basic implementations do not. Implementations that do not track deleted metadata records cannot distinguish between a record that had never been created vs one that was created and then deleted.

It is recommended that providers track deleted metadata records. This specification supports both types of implementation, but is only scalable when deletions are tracked (or deletions never occur).

A metadata record is considered deleted when all of its representations are no longer available. But it is up to the consumer to decide how to interpret when some, but not all, of the representations are no longer available. This is discussed in Processing deleted formats.

Identification of metadata records

Every metadata record is associated with a value that is a unique identifier for the purposes of this specification. An “identifier” means one value corresponds to no more than one metadata record. As a “unique identifier,” no other value (of this type) can correspond with that same metadata record. There can be other types of identifiers associated with the metadata record (both non-unique identifiers and unique identifiers), but they will not be discussed in this specification. When this specification uses the term identifier it is always referring to this particular unique identifier.

Last modified time

A last modified time is associated with every metadata record and is the time of the most recent creation, modification or deletion event that occurred on the metadata record.

Atom 1.0

Atom 1.0 is defined in RFC 4287, The Atom Syndication Format. It defines two models:

Logical feed is a model composed of feed metadata and zero or more entries. The term logical feed comes from RFC 5005, which introduces the concept of splitting up a logical feed that RFC 4287 did not have.
Entry is a model composed of entry metadata and optional entry content.

The following terms are not used in RFC 4287, but have been introduced in this specification for clarity.

Feed metadata is a model of the metadata that describes the logical feed.
Entry metadata is a model of the metadata that describes the entry.
Entry content refers to the actual data of the entry. It is represented by the atom:content element and/or referenced by a URI in the alternate link(s) in the entry metadata.

RFC 4287 defines two XML vocabularies:

The Atom Feed Document is a format to represent feed metadata and (the complete set or a subset of) entries from the logical feed. It is an XML document whose root element is the atom:feed element.
The Atom Entry Document is a format to represent a single entry. It is an XML document whose root element is the atom:entry element.

In this specification, the term alternate link refers to an atom:link element whose link relationship type (as indicated by the absence of a rel attribute or its presence and value) is alternate. The term self link refers to an atom:link element whose link relationship type (as indicated by the value of the rel attribute) is self. Other types of links are described in the same way.

Complete vs archive feeds

A complete feed is when the entire logical feed is entirely represented by a single Atom Feed Document.

An archived feed is when the entire logical feed is entirely represented by a set of one or more linked Atom Feed Documents as defined by RFC 5005 Feed Paging and Archiving.

With archived feeds there is always one subscription document and zero or more archive documents. The subscription document contains the latest entries and is the starting point for following the chain consisting of all the Atom Feed Documents making up the archived feed. The archive documents contain older entries.

The terms introduced in this section are defined by RFC 5005. The “paged feeds” defined by RFC 5005 are not used in this specification. This specification avoids the term “pages” to ensure that it is clear that it is always referring to archived feeds.

Do not confuse archived feeds with paged feeds. It is particularly confusing because the prev-archive links in archived feeds perform a similar function to next links in paged feeds; and next-archive links are similar to prev links. The prev links and next links are not prohibited by this specification; they are simply not used because paged feeds are not used.

Example metadata format: Atom-RDC

Atom-RDC is a format for representing metadata records. It is specified by Atom-RDC-spec and also available on Atom-RDC-Dataspace.

It uses an Atom entry to represent the metadata record. That Atom entry contains Atom elements and non-Atom elements to represent the data elements of the metadata record. Atom elements are used where it is appropriate to map a data element onto them. Non-Atom elements are introduced where such a mapping is not appropriate.

This specification treats Atom-RDC as just another format for representing metadata records.

If Atom-RDC is being used with this protocol, there are two different types of Atom entries being used. The Atom-RDC entries correspond to the metadata records and the entries in this protocol correspond to administrative metadata for those metadata records.

Roles

This specification defines requirements on the following roles:

Producer: Deployment of software that makes the current pool of metadata records available using the mechanisms defined in this specification.
Consumer: Deployment of software that obtains the current pool of metadata records from a producer using the mechanisms defined in this specification.

The protocol defined in this specification is designed to allow the contents of the current pool, and the events that subsequently occur on it, to be communicated from a provider to a consumer.

Protocol details

Overview

This specification defines the use of these mechanisms:

Using a single logical feed as a model of the current pool.
Atom Feed Documents as a format for representing the logical feed and changes to that logical feed.

The delivery of those Atom Feed Documents from the provider to the consumer is out of scope for this specification. But typically they will be delivered over HTTP/HTTPS.

Logical feed

Every entry in the logical feed shall correspond with exactly one metadata record.

This metadata record could be in the current pool or a deleted metadata record that is not in the current pool.

Atom Feed Documents

Producers shall provide a set of one or more Atom Feed Documents that represents the logical feed.

These Atom Feed Document(s) shall conform to RFC 4287.

These Atom Feed Document(s) shall conform to an archived feed as defined by RFC 5005.

Timestamps

Every Atom Feed Document shall have an updated timestamp later than or equal to the updated timestamp of every entry contained in it.

In all Atom Feed Documents with a prev-archive link (referrer), the prev-archive link shall resolve to an Atom Feed Document (referee).

The referee Atom Feed Document shall have an updated timestamp that is earlier than or equal to the timestamp of every entry in the referrer Atom Feed Document.

These requirements define an ordering of the Atom Feed Documents, but the individual entries within each Atom Feed Document can appear in any order.
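
The two timestamp requirements can be checked mechanically. The following Python sketch is one possible check, not a normative algorithm: the function name ordering_problems is hypothetical, and it takes the XML text of a referrer document and, optionally, the document its prev-archive link resolves to.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def _updated(element):
    # Read the atom:updated child of a feed or entry as an aware datetime.
    text = element.findtext(ATOM + "updated")
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def ordering_problems(referrer_xml, referee_xml=None):
    """List violations of the timestamp rules; an empty list means none found."""
    problems = []
    referrer = ET.fromstring(referrer_xml)
    entry_times = [_updated(e) for e in referrer.findall(ATOM + "entry")]
    # Rule 1: the feed timestamp is later than or equal to every entry in it.
    if any(t > _updated(referrer) for t in entry_times):
        problems.append("feed updated is earlier than one of its entries")
    # Rule 2: the referee timestamp is earlier than or equal to every
    # entry timestamp in the referrer.
    if referee_xml is not None:
        referee = ET.fromstring(referee_xml)
        if any(_updated(referee) > t for t in entry_times):
            problems.append("referee updated is later than a referrer entry")
    return problems
```

The order of entries inside each document is deliberately not examined, since it is unconstrained.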

Types of entries

The entries in the logical feed can be classified according to their purpose:

Active entries represent the value of a metadata record currently in the current pool.
Historical entries represent the value of a metadata record that has been modified and/or deleted. These are the result of an implementation not removing previously active entries when the metadata record is modified or deleted, which is permitted by this specification.
Deletion entries represent the deletion of a metadata record.

The entries in the logical feed function as administrative metadata. The entries do not represent the actual metadata records. The active entries contain a link to a representation of the metadata record.

Active entries

Every metadata record in the current pool shall correspond with exactly one active entry in the logical feed.

The logical feed shall not have any active entries that do not correspond to any metadata record in the current pool.

The active entries need to satisfy all of the following:

The value of the atom:id element in the active entry shall be the identifier for the metadata record.
The value of the atom:updated element in the active entry shall be the last modified time of the metadata record.
The value of the atom:updated element in the active entry shall not be earlier than any historical entry for the metadata record.
The value of the atom:updated element in the active entry should not be the same as any historical entry for the metadata record.
The active entry shall not contain an atom:content element.
The active entry shall contain at least one atom:link element that satisfies all of these criteria:
- the link relationship type is alternate;
- the URI value resolves to a representation of the metadata record; and
- the link type is an Internet Media Type of the metadata record’s representation format.
The active entry shall not contain any atom:link element whose link relationship type is alternate and whose URI value does not resolve to a representation of the metadata record.

RFC 4287 mandates exactly one atom:id, exactly one atom:updated and exactly one atom:title element in every entry. The entry may contain other elements, but these are not used by these profiles.

Atom 1.0 specifies that the link relationship type must be treated as alternate if the rel attribute is not present. Therefore, the atom:link element must either: have no rel attribute, or a rel attribute with a value of alternate.
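
The structural parts of these requirements can be checked without any network access. The following Python sketch is illustrative only: the names is_alternate and active_entry_problems are hypothetical, the entry is assumed to be supplied as a standalone atom:entry element, and the sketch does not verify that a URI actually resolves to a representation of the metadata record.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def is_alternate(link):
    # No rel attribute, or rel="alternate", both mean an alternate link.
    return link.get("rel", "alternate") == "alternate"


def active_entry_problems(entry_xml):
    """List structural problems in a candidate active entry."""
    entry = ET.fromstring(entry_xml)
    problems = []
    if entry.find(ATOM + "content") is not None:
        problems.append("active entry contains atom:content")
    alternates = [l for l in entry.findall(ATOM + "link") if is_alternate(l)]
    # Assumption: the media type requirement implies a type attribute.
    if not any(l.get("href") and l.get("type") for l in alternates):
        problems.append("no alternate link with both href and type")
    return problems
```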

No requirements are placed on the value of the mandatory atom:title element. It is suggested that a copy of an appropriate value from the metadata record be used to aid debugging. For example, the title from a collection record, or the person’s name from a party record.

Deletion entries

Deletion entries represent the deletion of a metadata record. That is, when it is removed from the current pool.

The provider shall treat every metadata record removed from the current pool in the same manner; satisfying at least one of the following deletion options for every deletion of a metadata record:

Add exactly one deletion entry corresponding to the deletion event to the logical feed; or
Remove all historical records corresponding to the metadata record from the logical feed.

The first deletion option is preferred, because it simplifies how consumers incrementally detect deletions and will not require modification of any archive documents. The second deletion option is useful for providers that do not track deleted metadata records.

If using the first deletion option, the provider should not include a fh:complete element in any of the Atom Feed Document(s).

If using the second deletion option, the provider shall ensure the subscription document contains all the active entries and no historical entries corresponding to any deleted metadata records from the logical feed, and it shall include a fh:complete element in the subscription document.

See Complete Feeds for additional requirements relating to the use of the fh:complete element. The second deletion option is not preferred, because it prohibits the use of archive documents and therefore can be inefficient when there are many metadata records.

If used, the deletion entries need to satisfy all of the following:

The value of the atom:id element in the deletion entry shall be the identifier for the deleted metadata record.
The value of the atom:updated element in the deletion entry shall be the last modified time of the metadata record.
The value of the atom:updated element in the deletion entry shall not be earlier than any historical entry for the metadata record.
The value of the atom:updated element in the deletion entry should not be the same as any historical entry for the metadata record.
The deletion entry shall not contain any atom:link element that has a link relationship type of alternate (either with or without an explicit rel attribute).
The deletion entry shall contain an atom:content element that has no src attribute and no content.
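
A consumer could recognise a deletion entry from the last two requirements in the list above. The following Python sketch shows one way to do so, with the hypothetical function name is_deletion_entry and assuming the entry is supplied as a standalone atom:entry element. Whitespace-only content is treated as empty, which is an assumption of this sketch rather than a rule of the specification.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def is_deletion_entry(entry_xml):
    """True if the entry has no alternate link and an empty atom:content."""
    entry = ET.fromstring(entry_xml)
    has_alternate = any(
        link.get("rel", "alternate") == "alternate"  # missing rel = alternate
        for link in entry.findall(ATOM + "link")
    )
    content = entry.find(ATOM + "content")
    empty_content = (
        content is not None
        and content.get("src") is None        # no out-of-line content
        and len(content) == 0                 # no child elements
        and not (content.text or "").strip()  # no text content
    )
    return not has_alternate and empty_content
```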

Historical entries

Historical entries represent the value of a metadata record that has been modified and/or deleted.

Implementations can choose whether to include historical entries in their logical feed or not. If a metadata record is modified, the implementation can either remove the old active entry and replace it with a new active entry; or keep the old active entry as a historical entry and add a new active entry. Similarly, when a metadata record is deleted, the old active entry can either be removed or kept as a historical entry.

Every modification or deletion of a metadata record should correspond with exactly one historical entry in the logical feed. In this section, the active entry for that metadata record immediately before the event will be referred to as the old entry.

The historical entries need to satisfy all of the following:

The value of the atom:id element in the historical entry shall be the identifier for the metadata record.
The value of the atom:updated element in the historical entry shall be the same as the value of the atom:updated element in the old entry.
The historical entry shall not contain an atom:content element.
The historical entry shall contain at least one atom:link element that has a link relationship type of alternate (either with or without an explicit rel attribute).

There are no formal requirements on the URI of the alternate link(s). If a metadata record is modified, the URI might resolve to a representation of the metadata record before it was modified or after modification. If a metadata record is deleted, the URI might resolve to a representation of the metadata record before it was deleted or be unresolvable.

Complete feeds

If an Atom Feed Document contains all the active entries and no historical entries corresponding to any deleted metadata records from the logical feed, it should include a fh:complete element.

If an Atom Feed Document does not contain all the active entries from the logical feed, or contains historical entries without corresponding deleted entries or active entries for that metadata record, it shall not include a fh:complete element.

The fh:complete element is defined in RFC 5005. Typically, it can only be used when there is only one Atom Feed Document (i.e. only the subscription document and no archive documents). There are other situations when it can appear. For example, when the subscription document is empty and all the active entries are in one of the archive documents. An edge case is when the current pool contains no metadata records, since all Atom Feed Documents with no entries or only deletion entries can claim they are complete, and there can be multiple such Atom Feed Documents.

Atom-RDC

For the active entry,

If using Atom-RDC as the representation format, the URI resolves to an Atom Entry Document that conforms to Atom-RDC.
If using Atom-RDC as the representation format, the Internet Media Type is application/atom+xml.

For the Atom-RDC Atom Entry Documents:

In the Atom-RDC representing the metadata record, the self link (if present) shall be the same URI that is in the entry metadata alternate link.
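
The self link requirement can be checked once the Atom-RDC Atom Entry Document has been retrieved. The following Python sketch is a minimal illustration with the hypothetical function name self_link_matches. Because the self link is optional, a document without one is treated as passing the check.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def self_link_matches(alternate_href, rdc_entry_xml):
    """True if every self link in the Atom-RDC entry equals alternate_href."""
    entry = ET.fromstring(rdc_entry_xml)
    self_hrefs = [link.get("href") for link in entry.findall(ATOM + "link")
                  if link.get("rel") == "self"]
    # all() over an empty list is True, which covers the "if present" case.
    return all(href == alternate_href for href in self_hrefs)
```

The comparison is a plain string comparison; URI normalisation is not attempted in this sketch.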

Examples

Example 1: feed with archive documents

In this example, the producer provides four undeleted metadata records. The producer provides four Atom Feed Documents (one subscription document and three archive documents) and one Atom Entry Document for each metadata record.

This is the subscription document.

It is suggested that all subscription documents include a current link, but this is not mandatory. In all subscription documents where there exists at least one archive document, the prev-archive link is mandatory. But it is prohibited when no archive documents exist.

In this example, the current link resolves to this subscription document (i.e. itself); and the prev-archive link resolves to the first archive document.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="current" href="http://example.org/feed/archived"/>
  <link rel="prev-archive" href="http://example.org/archived/2012/10/31"/>
  <updated>2012-11-01T14:00:00Z</updated>
  <id>urn:uuid:3ce05531-b9c0-4a7d-8966-4d9a9a3a0695</id>
  <title>Example 1</title>

  <entry>
    <title>Alpha data collection</title>
    <id>urn:uuid:177d5415-c443-410f-a5b6-44bf8433594f</id>
    <updated>2012-11-01T07:00:00Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/0001"/>
  </entry>

</feed>

This is the first archive document.

It is suggested that all most-recent archive documents include a current link and the fh:archive element; but neither of these is mandatory. In all most-recent archive documents, when there is more than one archive document, the prev-archive link is mandatory; but if there is only one archive document, the prev-archive link is prohibited. There is no next-archive link, because this is the most recent complete archive; although another archive may be under construction, it would be an error to link to it before completion.

In this example, the current link resolves to the subscription document; and the prev-archive link resolves to the second archive document.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <link rel="current" href="http://example.org/archived"/>
  <link rel="self" href="http://example.org/feed/archived/2012/10/31"/>
  <link rel="prev-archive" href="http://example.org/archived/2012/06/30"/>
  <updated>2012-10-31T14:00:00Z</updated>
  <id>urn:uuid:3ce05531-b9c0-4a7d-8966-4d9a9a3a0695</id>
  <title>Example 1</title>
  <fh:archive/>
  
  <entry>
    <title>Beta data collection</title>
    <id>urn:uuid:e7aca47e-76c5-4648-948b-583ffdaafa0d</id>
    <updated>2012-10-31T12:35:52Z</updated>
    <link type="application/atom+xml" href="http://example.org/entry/0002"/>
  </entry>

</feed>

This is the second archive document.

It is suggested that intermediate archive documents include a next-archive link, a current link and the fh:archive element; but none of these are mandatory. In every intermediate archive document, the prev-archive link is mandatory.

In this example, the optional current link resolves to the subscription document; the optional next-archive link resolves to the first archive document; and the mandatory prev-archive link resolves to the third archive document.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <link rel="current" href="http://example.org/archived"/>
  <link rel="next-archive" href="http://example.org/archived/2012/10/31"/>
  <link rel="self" href="http://example.org/feed/archived/2012/06/30"/>
  <link rel="prev-archive" href="http://example.org/archived/2011/12/31"/>
  <updated>2012-03-30T14:00:00Z</updated>
  <id>urn:uuid:3ce05531-b9c0-4a7d-8966-4d9a9a3a0695</id>
  <title>Example 1</title>
  <fh:archive/>
  
  <entry>
    <title>Gamma data collection</title>
    <id>urn:uuid:fca64ec1-4984-4d34-8f02-f14a58ec5e78</id>
    <updated>2012-02-29T14:00:00Z</updated>
    <link type="application/atom+xml" href="http://example.org/entry/0003.atom"/>
    <link type="application/atom+xml" href="http://example.org/entry/0003"/>
  </entry>
  
</feed>

This is the third and earliest archive document.

It is suggested that all earliest archive documents include a next-archive link, a current link and the fh:archive element; but none of these are mandatory. In every earliest archive document, the prev-archive link is prohibited.

In this example, the optional current link resolves to the subscription document; and the optional next-archive link resolves to the second archive document.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <link rel="current" href="http://example.org/archived"/>
  <link rel="next-archive" href="http://example.org/archived/2012/06/30"/>
  <link rel="self" href="http://example.org/feed/archived/2011/12/31"/>
  <updated>2012-02-29T14:00:00Z</updated>
  <id>urn:uuid:3ce05531-b9c0-4a7d-8966-4d9a9a3a0695</id>
  <title>Example 1</title>
  <fh:archive/>
  
  <entry>
    <title>Delta data collection</title>
    <id>urn:uuid:4cee3cd0-a7a7-42c8-a6ee-74df0bd04cc4</id>
    <updated>2011-12-10T18:30:02Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/0004"/>
  </entry>
  
</feed>

The URI in the entry in the subscription document resolves to this Atom-RDC conformant Atom Entry Document:

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:177d5415-c443-410f-a5b6-44bf8433594f</id>
  <link rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
        href="http://purl.org/dc/dcmitype/Collection" title="Collection"/>
  <title type="text">Location and temperature data for estuarine
  crocodiles on Cape York Peninsula measured using acoustic telemetry</title>
  <content type="text">Estuarine crocodile location and
    temperature observations recorded using acoustic telemetry
    since 2007 on Cape York Peninsula, Queensland, Australia. The
    data comes from sensors attached to more than sixty estuarine
    crocodiles. Variables measured include location, depth,
    environmental temperature and body temperature.</content>
  <link rel="http://xmlns.com/foaf/0.1/page" href="http://example.com/index.html?page=39442"/>
  <category term="Zoology"/>
  <category term="Ecology"/>
  <author>
    <name>Dr Hamish Campbell</name>
    <email>hamish.campbell@uq.edu.au</email>
  </author>

  <link rel="self" href="http://example.org/entry/0001"/>
  <updated>2012-10-30T07:00:00Z</updated>
  <source>
    <author>
      <name>OzTrack System</name>
      <uri>http://oztrack.uq.edu.au</uri>
    </author>
  </source>
</entry>

The URIs in the entries in all the archive documents also resolve to Atom-RDC conformant Atom Entry Documents.

The self links and fh:archive elements in the head sections are shown in this example, but they are optional.

Atom 1.0 allows other elements in the feed metadata. These are optional, but can be included to aid debugging: for example, atom:generator and atom:author.

Example 2: delete with deletion entries

In this example, the producer has deleted one of the metadata records from example 1.

This is the subscription document.

It contains a deletion entry for the metadata record.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <link rel="current" href="http://example.org/feed/archived"/>
  <link rel="prev-archive" href="http://example.org/archived/2012/11/01"/>
  <updated>2012-11-02T14:00:00Z</updated>
  <id>urn:uuid:3ce05531-b9c0-4a7d-8966-4d9a9a3a0695</id>
  <title>Example 2</title>

  <entry>
    <title>Alpha data collection (deletion entry)</title>
    <id>urn:uuid:177d5415-c443-410f-a5b6-44bf8433594f</id>
    <updated>2012-11-01T23:00:00Z</updated>
    <content/>
  </entry>

</feed>

The prev-archive link resolves to the first Atom Feed Document presented in example 1. It was previously the subscription document but is now one of the archive documents.
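
The way a consumer follows the chain of prev-archive links can be sketched in Python. This is only an illustration, not part of the protocol: the names `prev_archive_uri`, `walk_feed` and the `fetch` callable (which returns the bytes of the document at a URI) are hypothetical, and the sketch assumes that every href is an absolute URI.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

ATOM = "{http://www.w3.org/2005/Atom}"


def prev_archive_uri(feed: ET.Element) -> Optional[str]:
    """Return the href of the feed-level prev-archive link, if there is one."""
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == "prev-archive":
            return link.get("href")
    return None


def walk_feed(start_uri: str, fetch: Callable[[str], bytes]) -> Iterator[ET.Element]:
    """Yield the subscription document, then each older archive document.

    `fetch` is a hypothetical callable supplied by the caller; it returns
    the raw bytes of the Atom Feed Document found at the given URI.
    """
    seen = set()
    uri: Optional[str] = start_uri
    while uri is not None and uri not in seen:  # guard against link loops
        seen.add(uri)
        feed = ET.fromstring(fetch(uri))
        yield feed
        uri = prev_archive_uri(feed)
```

A consumer harvesting changes would normally stop iterating as soon as it reaches a document older than its previous harvest, rather than walking all the way to the earliest archive document.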

Example 3: feed with no archive documents

In this example, the producer provides four undeleted metadata records.

There is only the subscription document. It is a complete Atom Feed Document containing alternate links to all four metadata records.

This example contains an optional self link and optional fh:complete element in the head section.

The first entry shows an atom:link element with an explicit rel attribute.

The second entry shows an atom:link element without a rel attribute, since its absence means alternate.

The third entry shows that it is possible to have multiple alternate links. This is permitted, since the only requirement is there has to be at least one of them.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <updated>2012-11-01T14:00:00Z</updated>
  <id>urn:uuid:3ce05531-b9c0-4a7d-8966-4d9a9a3a0695</id>
  <title>Example 3</title>
  <link rel="self" href="http://example.org/feed/complete"/>

  <fh:complete/>
  
  <entry>
    <title>Alpha data collection</title>
    <id>urn:uuid:177d5415-c443-410f-a5b6-44bf8433594f</id>
    <updated>2012-11-01T07:00:00Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/0001"/>
  </entry>

  <entry>
    <title>Beta data collection</title>
    <id>urn:uuid:e7aca47e-76c5-4648-948b-583ffdaafa0d</id>
    <updated>2012-10-31T12:35:52Z</updated>
    <link type="application/atom+xml" href="http://example.org/entry/0002"/>
  </entry>

  <entry>
    <title>Gamma data collection</title>
    <id>urn:uuid:fca64ec1-4984-4d34-8f02-f14a58ec5e78</id>
    <updated>2012-02-29T14:30:00Z</updated>
    <link type="application/atom+xml" href="http://example.org/entry/0003"/>
    <link type="application/atom+xml" href="http://example.org/entry/0003.atom"/>
  </entry>

  <entry>
    <title>Delta data collection</title>
    <id>urn:uuid:4cee3cd0-a7a7-42c8-a6ee-74df0bd04cc4</id>
    <updated>2011-12-10T18:30:02Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/0004.atom"/>
    <link rel="alternate" type="application/rifcs+xml" href="http://example.org/entry/0004.rifcs"/>
    <link rel="alternate" type="application/rdf+xml" href="http://example.org/entry/0004.rdf"/>
    <link rel="alternate" type="application/xhtml+xml" href="http://example.org/entry/0004.html"/>
  </entry>

</feed>

The URI in the first entry resolves to the same Atom-RDC conformant Atom Entry Document that was presented in example 1.

The “application/atom+xml” typed URIs in the other entries also resolve to Atom-RDC conformant Atom Entry Documents.

Example 4: delete without deletion records

In this example, the producer has deleted one of the metadata records from example 3.

Instead of creating deletion entries, this example uses the option of removing all historical entries. This option is possible because the feed consists of one complete Atom Feed Document.

The fh:complete element in the head section is mandatory, because deletion entries are not being used.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <updated>2012-11-01T14:00:00Z</updated>
  <id>urn:uuid:3ce05531-b9c0-4a7d-8966-4d9a9a3a0695</id>
  <title>Example 4</title>
  <link rel="self" href="http://example.org/feed/complete"/>

  <fh:complete/>
  
  <entry>
    <title>Beta data collection</title>
    <id>urn:uuid:e7aca47e-76c5-4648-948b-583ffdaafa0d</id>
    <updated>2012-10-31T12:35:52Z</updated>
    <link type="application/atom+xml" href="http://example.org/entry/0002"/>
  </entry>

  <entry>
    <title>Gamma data collection</title>
    <id>urn:uuid:fca64ec1-4984-4d34-8f02-f14a58ec5e78</id>
    <updated>2012-02-29T14:30:00Z</updated>
    <link type="application/atom+xml" href="http://example.org/entry/0003"/>
    <link type="application/atom+xml" href="http://example.org/entry/0003.atom"/>
  </entry>

  <entry>
    <title>Delta data collection</title>
    <id>urn:uuid:4cee3cd0-a7a7-42c8-a6ee-74df0bd04cc4</id>
    <updated>2011-12-10T18:30:02Z</updated>
    <link rel="alternate" type="application/atom+xml" href="http://example.org/entry/0004.atom"/>
    <link rel="alternate" type="application/rifcs+xml" href="http://example.org/entry/0004.rifcs"/>
    <link rel="alternate" type="application/rdf+xml" href="http://example.org/entry/0004.rdf"/>
    <link rel="alternate" type="application/xhtml+xml" href="http://example.org/entry/0004.html"/>
  </entry>

</feed>

Example 5: update

In this example, the producer has a metadata record that has been modified.

The first entry is the historical entry and the second entry is the new active entry. They both have the same atom:id because they both correspond to the same metadata record. The entry with the latest atom:updated timestamp is the active entry and the other entry is the historical entry.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <updated>2012-11-01T14:00:00Z</updated>
  <id>urn:uuid:3ce05531-b9c0-4a7d-8966-4d9a9a3a0695</id>
  <title>Example 5</title>

  <entry>
    <title>Beta data collection</title>
    <id>urn:uuid:e7aca47e-76c5-4648-948b-583ffdaafa0d</id>
    <updated>2012-10-31T12:35:52Z</updated>
    <link type="application/atom+xml" href="http://example.org/entry/0002"/>
  </entry>

  <entry>
    <title>Beta data collection</title>
    <id>urn:uuid:e7aca47e-76c5-4648-948b-583ffdaafa0d</id>
    <updated>2012-11-02T07:30:00Z</updated>
    <link type="application/atom+xml" href="http://example.org/entry/0002"/>
  </entry>

</feed>

It is not mandatory to keep the historical entry, so this example would be equally valid if it did not include the first entry.
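
As an illustration of how a consumer might pick the latest entry for each atom:id, here is a minimal Python sketch. The function names `parse_timestamp` and `latest_entries` are hypothetical. It assumes every entry carries an atom:id and an atom:updated value that `datetime.fromisoformat` can parse once a trailing "Z" is rewritten as "+00:00".

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_timestamp(text: str) -> datetime:
    """Parse an atom:updated value, rewriting 'Z' for older Python versions."""
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def latest_entries(feed: ET.Element) -> Dict[str, ET.Element]:
    """Map each atom:id to the entry with the latest atom:updated timestamp."""
    latest: Dict[str, ET.Element] = {}
    for entry in feed.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        updated = parse_timestamp(entry.findtext(ATOM + "updated"))
        current = latest.get(entry_id)
        if current is None or updated > parse_timestamp(current.findtext(ATOM + "updated")):
            latest[entry_id] = entry
    return latest
```

The result is keyed by atom:id, so the historical entry in this example is simply dropped. The latest entry for a metadata record could also be a deletion entry, so the caller still has to check the entry type.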

Implementation notes

Detecting entry type

Consumers can detect which type of entry appears in an Atom Feed Document using the following approach:

If the entry has no alternate links and an empty atom:content element without a src attribute, it is a deletion entry.
If the entry has at least one alternate link and no atom:content element, it is either an active entry or a historical entry.
- If it is the only entry for that metadata record in the logical feed, or the one with the latest timestamp, then it is the active entry.
- If there are other entries for that metadata record in the logical feed with later timestamps, then it is a historical entry.

Other types of entries are not defined by this specification.
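
The first level of the approach above could be sketched as follows. The helper names `alternate_links` and `classify_entry` and the returned labels are assumptions made for this illustration. Telling an active entry from a historical one needs the timestamps of the other entries in the logical feed, so both are reported here as "record".

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def alternate_links(entry):
    """Return the atom:link children whose rel is 'alternate' or absent."""
    return [link for link in entry.findall(ATOM + "link")
            if link.get("rel", "alternate") == "alternate"]


def classify_entry(entry):
    """Return 'deletion', 'record' (active or historical) or 'undefined'."""
    content = entry.find(ATOM + "content")
    has_alternate = bool(alternate_links(entry))
    if not has_alternate and content is not None:
        # Empty means no child elements and no non-whitespace text.
        is_empty = len(content) == 0 and not (content.text or "").strip()
        if is_empty and content.get("src") is None:
            return "deletion"
    if has_alternate and content is None:
        return "record"
    return "undefined"  # entry types not defined by this specification
```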

Detecting deleted metadata records

If the consumer obtains an Atom Feed Document with the fh:complete element, it can treat any previously retrieved metadata record as deleted when that complete Atom Feed Document contains no active entry for it. In this situation, it cannot rely on deletion entries, because the producer might be using the second deletion option.

If the consumer obtains an Atom Feed Document without the fh:complete element, it can rely on deletion entries to detect deleted metadata records. It only needs to retrieve recent Atom Feed Documents, up to and including the time of its previously retrieved Atom Feed Document, to detect all the deleted metadata records – it does not have to retrieve the entire logical feed. In this situation, the producer is definitely not using the second deletion option.
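
For the complete feed case, detection by absence might look like the following sketch. The function names are hypothetical, and `known_ids` stands for whatever set of atom:id values the consumer stored during earlier harvests. The sketch assumes that every entry has an atom:id and a parseable atom:updated value.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"
FH = "{http://purl.org/syndication/history/1.0}"


def parse_timestamp(text: str) -> datetime:
    """Parse an atom:updated value, rewriting 'Z' for older Python versions."""
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def has_alternate_link(entry: ET.Element) -> bool:
    """True if the entry has a link whose rel is 'alternate' or absent."""
    return any(link.get("rel", "alternate") == "alternate"
               for link in entry.findall(ATOM + "link"))


def active_ids(feed: ET.Element) -> Set[str]:
    """Ids whose latest entry in the feed has at least one alternate link."""
    newest: Dict[str, Tuple[datetime, bool]] = {}
    for entry in feed.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        updated = parse_timestamp(entry.findtext(ATOM + "updated"))
        if entry_id not in newest or updated > newest[entry_id][0]:
            newest[entry_id] = (updated, has_alternate_link(entry))
    return {entry_id for entry_id, (_, active) in newest.items() if active}


def deleted_records(complete_feed: ET.Element, known_ids: Iterable[str]) -> Set[str]:
    """Return the known ids that have no active entry in a complete feed."""
    if complete_feed.find(FH + "complete") is None:
        raise ValueError("fh:complete is absent; use deletion entries instead")
    return set(known_ids) - active_ids(complete_feed)
```

Applied to the feed in example 4 with the four identifiers from example 3 as `known_ids`, this would report only the identifier of the Alpha data collection.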

Processing deleted formats

A metadata record can be represented by one or more formats, each represented by a different alternate link. It is possible for a modification to add or remove different formats.

The consumer can choose how to process a modification where all of the formats it needs are removed. Most consumers would treat this the same as the deletion of the metadata record.
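
A consumer that only understands certain formats could select a representation along these lines. The function name `pick_representation` and the `preferred_types` parameter are hypothetical. A return value of `None` corresponds to the case where all the formats the consumer needs have been removed.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

ATOM = "{http://www.w3.org/2005/Atom}"


def pick_representation(entry: ET.Element, preferred_types: Sequence[str]) -> Optional[str]:
    """Return the href of the first alternate link in a preferred format.

    `preferred_types` lists media types in order of preference. None is
    returned when the entry offers none of them.
    """
    links = [link for link in entry.findall(ATOM + "link")
             if link.get("rel", "alternate") == "alternate"]
    for wanted in preferred_types:
        for link in links:
            if link.get("type") == wanted:
                return link.get("href")
    return None
```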

Resolvability of URIs

URIs are only required to be resolvable when they appear in active entries.

Therefore, if a URI fails to resolve, the consumer can deduce that the entry is no longer an active entry and can assume the metadata record has been deleted or modified. This can occur when processing an entry for a metadata record that was deleted after the consumer obtained the Atom Feed Document.

A consumer needs to take into account delivery protocol semantics when determining if a URI is resolvable or not. For example, when using HTTP/HTTPS a response code of 403 (forbidden), 404 (not found) or 410 (gone) can be interpreted as a non-resolvable URI. But a response code of 401 (unauthorised) or 500 (internal server error) might not be.
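
One possible way to encode this interpretation of HTTP response codes is sketched below. The `Resolvability` enumeration and the `classify_status` function are assumptions made for illustration. Treating every 2xx code as resolved, and every other code as undetermined, is a choice of this sketch rather than a requirement of the specification.

```python
from enum import Enum


class Resolvability(Enum):
    RESOLVED = "resolved"
    NOT_RESOLVABLE = "not resolvable"
    UNDETERMINED = "undetermined"


# 403 (forbidden), 404 (not found) and 410 (gone)
NOT_RESOLVABLE_CODES = frozenset({403, 404, 410})


def classify_status(status: int) -> Resolvability:
    """Interpret an HTTP response code for a URI taken from an entry."""
    if 200 <= status < 300:
        return Resolvability.RESOLVED
    if status in NOT_RESOLVABLE_CODES:
        return Resolvability.NOT_RESOLVABLE
    # Codes such as 401 or 500 do not show that the record has gone.
    return Resolvability.UNDETERMINED
```

A consumer would typically retry later when the result is undetermined, rather than treating the metadata record as deleted.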

Changing archive documents and URI

It is not mandatory for the contents of the archive documents to never change, nor their published URI not to change. This is because those requirements are defined as should instead of must in RFC

This specification does not change those requirements.

Consumers cannot rely on the contents of the archive documents never changing. Nor can they rely on the URIs that resolve to those archive documents staying the same.

Size of Atom Feed Documents

There are no requirements about the number of entries in each Atom Feed Document. Therefore, an implementation is able to adjust the number as necessary: from having one entry per document to having a single document with all the entries. Typically, the Atom Feed Documents split up the logical feed according to fixed time periods, which leads to a variable number of entries depending on how many updates occurred in a given period.
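
As a rough illustration of splitting by fixed time periods, a producer might group its entries by calendar month. The function name `split_by_month` and the representation of an entry as an (identifier, timestamp) pair are assumptions of this sketch; a real producer would choose whatever period suits its update rate.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def split_by_month(entries: Iterable[Tuple[str, datetime]]) -> Dict[Tuple[int, int], List[str]]:
    """Group entry ids into one bucket per calendar month, newest first.

    Each bucket would become one Atom Feed Document, so the number of
    entries per document varies with how many updates fell in that month.
    """
    documents: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for entry_id, updated in sorted(entries, key=lambda item: item[1], reverse=True):
        documents[(updated.year, updated.month)].append(entry_id)
    return dict(documents)
```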

Acknowledgements

This specification was produced by a project supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.