The goal of the Subversion project is to write a version control system that takes over CVS's current and future user base 1. The first release has all the major features of CVS, plus certain new features that CVS users often wish they had. In general, Subversion works like CVS, except where there's a compelling reason to be different.
So what does Subversion have that CVS doesn't?
Some of these advantages are clear and require no further discussion. Others are not so obvious, and are explained in greater detail below.
Full rename support means you can trace through ancestry by name or by entity. For example, if you say "Give me revision 12 of foo.c", do you mean revision 12 of the file whose name is now foo.c (but perhaps it was named bar.c back at revision 12), or the file whose name was foo.c in revision 12 (perhaps that file no longer exists, or has a different name now)? In Subversion, both interpretations are available to the user.
(Note: we've not yet implemented this, but it wouldn't be too hard. People are advocating switches to 'svn log' that cause history to be traced backwards either by entity or by path.)
Historically, binary files have been problematic in CVS for two
unrelated reasons: keyword expansion, and line-end conversion.
Subversion recognizes the same two types. The question is, how does it determine a file's type? Experience with CVS suggests that assuming text unless told otherwise is a losing strategy - people frequently forget to mark images and other opaque formats as binary, then later they wonder why CVS mangled their data. So Subversion will not mangle data: when moving over the network, or when being stored in the repository, it treats all files as binary. In the working copy, a tweakable meta-data property indicates whether to treat the file as text or binary for purposes of whether or not to allow contextual merging during updates.
Users can turn line-end conversion on or off per file by tweaking meta-data. Files do not undergo keyword substitution by default, on the theory that if someone wants substitution and isn't getting it, they'll look in the manual; but if they are getting it and didn't want it, they might just be confused and not know what to do. Users can turn substitution on or off per file.
Both of these changes are done on the client side; the repository does not even know about them.
Subversion is internationalized - commands, user messages, and errors can be customized to the appropriate human language at build-time (or run time, if that's not much harder).
File names and contents may be multilingual; Subversion does not assume an ASCII-only universe. For purposes of keyword expansion and line-end conversion, Subversion also understands the UTF-* encodings (but not necessarily all of them by the first release).
Subversion supports branching and tagging with one efficient operation: `clone'. To clone a tree is to copy it, to create another tree exactly like it (except that the new tree knows its ancestry relationship to the old one).
At the moment of creation, a clone requires only a small, constant amount of space in the repository - most of its storage is shared with the original tree. If you never commit anything on the clone, then it's just like a CVS tag. If you start committing on it, then it's a branch. Voila! This also implies CVS's "vendor branching" feature, since Subversion has real rename and directory support.
Subversion has a flexible log message policy (a small matter, but one dear to our hearts).
Log messages should be a matter of project policy, not version control software policy. If a user commits with no log message, then Subversion defaults to an empty message. (CVS tries to require log messages, but fails: we've all seen empty log messages in CVS, where the user committed with deliberately empty quotes. Let's stop the madness now.)
Subversion supports client-side plug-in diff programs.
There is no need for Subversion to have every possible diff mechanism built in. It can invoke a user-specified client-side diff program on the two revisions of the file(s) locally.
(Note: This feature does not exist yet, but is planned for post-1.0.)
Subversion remembers what has already been merged in and what hasn't, thereby avoiding the problem, familiar to CVS users, of spurious conflicts on repeated merges.
(Note: This feature does not exist yet, but is planned for post-1.0.)
For details, See Merging and Ancestry.
For text files, Subversion resolves conflicts similarly to CVS, by folding repository changes into the working files with conflict markers. But, for both text and binary files, Subversion also always puts the old and new pristine repository revisions into temporary files, and the pristine working copy revision in another temporary file.
Thus, for any conflict, the user has four files readily at hand:
and in a binary file conflict, the user has all but the last.
When the conflict has been resolved and the working copy is committed, Subversion automatically removes the temporary pristine files.
A more general solution would allow plug-in merge resolution tools on the client side; but this is not scheduled for the first release). Note that users can use their own merge tools anyway, since all the original files are available.
This chapter explains the user's view of Subversion -- what "objects" you interact with, how they behave, and how they relate to each other.
Suppose you are using Subversion to manage a software project. There are two things you will interact with: your working directory, and the repository.
Your working directory is an ordinary directory tree, on your local system, containing your project's sources. You can edit these files and compile your program from them in the usual way. Your working directory is your own private work area: Subversion never changes the files in your working directory, or publishes the changes you make there, until you explicitly tell it to do so.
After you've made some changes to the files in your working directory, and verified that they work properly, Subversion provides commands to publish your changes to the other people working with you on your project. If they publish their own changes, Subversion provides commands to incorporate those changes into your working directory.
A working directory contains some extra files, created and maintained by Subversion, to help it carry out these commands. In particular, these files help Subversion recognize which files contain unpublished changes, and which files are out-of-date with respect to others' work.
While your working directory is for your use alone, the repository is the common public record you share with everyone else working on the project. To publish your changes, you use Subversion to put them in the repository. (What this means, exactly, we explain below.) Once your changes are in the repository, others can tell Subversion to incorporate your changes into their working directories. In a collaborative environment like this, each user will typically have their own working directory (or perhaps more than one), and all the working directories will be backed by a single repository, shared amongst all the users.
A Subversion repository holds a single directory tree, and records the history of changes to that tree. The repository retains enough information to recreate any prior state of the tree, compute the differences between any two prior trees, and report the relations between files in the tree -- which files are derived from which other files.
A Subversion repository can hold the source code for several projects; usually, each project is a subdirectory in the tree. In this arrangement, a working directory will usually correspond to a particular subtree of the repository.
For example, suppose you have a repository laid out like this:
/trunk/paint/Makefile canvas.c brush.c write/Makefile document.c search.c
In other words, the repository's root directory has a single
subdirectory named trunk
, which itself contains two
subdirectories: paint
and write
.
To get a working directory, you must check out some subtree of the
repository. If you check out /trunk/write
, you will get a working
directory like this:
write/Makefile document.c search.c .svn/This working directory is a copy of the repository's
/trunk/write
directory, with one additional entry -- .svn
-- which holds the
extra information needed by Subversion, as mentioned above.
Suppose you make changes to search.c
. Since the .svn
directory remembers the file's modification date and original contents,
Subversion can tell that you've changed the file. However, Subversion
does not make your changes public until you explicitly tell it to.
To publish your changes, you can use Subversion's commit
command:
$ pwd /home/jimb/write $ ls -a .svn/ Makefile document.c search.c $ svn commit search.c $
Now your changes to search.c
have been committed to the
repository; if another user checks out a working copy of
/trunk/write
, they will see your text.
Suppose you have a collaborator, Felix, who checked out a working
directory of /trunk/write
at the same time you did. When you
commit your change to search.c
, Felix's working copy is left
unchanged; Subversion only modifies working directories at the user's
request.
To bring his working directory up to date, Felix can use the Subversion
update
command. This will incorporate your changes into his
working directory, as well as any others that have been committed since
he checked it out.
$ pwd /home/felix/write $ ls -a .svn/ Makefile document.c search.c $ svn update U search.c $
The output from the svn update
command indicates that Subversion
updated the contents of search.c
. Note that Felix didn't need to
specify which files to update; Subversion uses the information in the
.svn
directory, and further information in the repository, to
decide which files need to be brought up to date.
We explain below what happens when both you and Felix make changes to the same file.
A Subversion commit
operation can publish changes to any number
of files and directories as a single atomic transaction. In your
working directory, you can change files' contents, create, delete,
rename and copy files and directories, and then commit the completed set
of changes as a unit.
In the repository, each commit is treated as an atomic transaction: either all the commit's changes take place, or none of them take place. Subversion tries to retain this atomicity in the face of program crashes, system crashes, network problems, and other users' actions. We may call a commit a transaction when we want to emphasize its indivisible nature.
Each time the repository accepts a transaction, this creates a new state of the tree, called a revision. Each revision is assigned a unique natural number, one greater than the number of the previous revision. The initial revision of a freshly created repository is numbered zero, and consists of an empty root directory.
Since each transaction creates a new revision, with its own number, we can also use these numbers to refer to transactions; transaction n is the transaction which created revision n. There is no transaction numbered zero.
Unlike those of many other systems, Subversion's revision numbers apply to an entire tree, not individual files. Each revision number selects an entire tree.
It's important to note that working directories do not always correspond
to any single revision in the repository; they may contain files from
several different revisions. For example, suppose you check out a
working directory from a repository whose most recent revision is 4:
write/Makefile:4 document.c:4 search.c:4
At the moment, this working directory corresponds exactly to revision 4
in the repository. However, suppose you make a change to
search.c
, and commit that change. Assuming no other commits have
taken place, your commit will create revision 5 of the repository, and
your working directory will look like this:
write/Makefile:4 document.c:4 search.c:5Suppose that, at this point, Felix commits a change to
document.c
, creating revision 6. If you use svn update
to
bring your working directory up to date, then it will look like this:
write/Makefile:6 document.c:6 search.c:6Felix's changes to
document.c
will appear in your working copy of
that file, and your change will still be present in search.c
. In
this example, the text of Makefile
is identical in revisions 4, 5,
and 6, but Subversion will mark your working copy with revision 6 to
indicate that it is still current. So, after you do a clean update at
the root of your working directory, your working directory will
generally correspond exactly to some revision in the repository.
For each file in a working directory, Subversion records two essential pieces of information:
Given this information, by talking to the repository, Subversion can tell which of the following four states a file is in:
Subversion does not prevent two users from making changes to the same
file at the same time. For example, if both you and Felix have checked
out working directories of /trunk/write
, Subversion will allow
both of you to change write/search.c
in your working directories.
Then, the following sequence of events will occur:
search.c
first. His
commit will succeed, and his text will appear in the latest revision in
the repository.
search.c
, Subversion
will reject your commit, and tell you that you must update
search.c
before you can commit it.
search.c
, Subversion will try to merge Felix's
changes from the repository with your local changes. By default,
Subversion merges as if it were applying a patch: if your local changes
do not overlap textually with Felix's, then all is well; otherwise,
Subversion leaves it to you to resolve the overlapping
changes. In either case,
Subversion carefully preserves a copy of the original pre-merge text.
search.c
,
which now contains everyone's changes.
Some version control systems provide "locks", which prevent others from changing a file once one person has begun working on it. In our experience, merging is preferable to locks, because:
Of course, the merge process needs to be under the users' control. Patch is not appropriate for files with rigid formats, like images or executables. Subversion allows users to customize its merging behavior on a per-file basis. You can direct Subversion to refuse to merge changes to certain files, and simply present you with the two original texts to choose from. (Or, someday, you can direct Subversion to merge using a tool which respects the semantics of the file format.)
Files generally have interesting attributes beyond their contents: mime-types, executable permissions, EOL styles, and so on. Subversion attempts to preserve these attributes, or at least record them, when doing so would be meaningful. However, different operating systems support very different sets of file attributes: Windows NT supports access control lists, while Linux provides only the simpler traditional Unix permission bits.
In order to interoperate well with clients on many different operating systems, Subversion supports property lists, a simple, general-purpose mechanism which clients can use to store arbitrary out-of-band information about files.
A property list is a set of name / value pairs. A property name is an arbitrary text string, expressed as a Unicode UTF-8 string, canonically decomposed and ordered. A property value is an arbitrary string of bytes. Property values may be of any size, but Subversion may not handle very large property values efficiently. No two properties in a given a property list may have the same name. Although the word `list' usually denotes an ordered sequence, there is no fixed order to the properties in a property list; the term `property list' is historical.
Each revision number, file, directory, and directory entry in the Subversion repository, has its own property list. Subversion puts these property lists to several uses:
svn-acl
property might hold an access control list
which the Subversion server uses to regulate access to repository
files.
Property lists are versioned, just like file contents. You can change properties in your working directory, but those changes are not visible in the repository until you commit your local changes. If you do commit a change to a property value, other users will see your change when they update their working directories.
[WARNING: this section was written in May 2000, at the very beginning of the Subversion project. This functionality probably will not exist in Subversion 1.0, but it's planned for post-1.0. The problem should be reasonably solvable by recording merge data in 'properties'.]
Subversion defines merges the same way CVS does: to merge means to take a set of previously committed changes and apply them, as a patch, to a working copy. This change can then be committed, like any other change. (In Subversion's case, the patch may include changes to directory trees, not just file contents.)
As defined thus far, merging is equivalent to hand-editing the working copy into the same state as would result from the patch application. In fact, in CVS there is no difference - it is equivalent to just editing the files, and there is no record of which ancestors these particular changes came from. Unfortunately, this leads to conflicts when users unintentionally merge the same changes again. (Experienced CVS users avoid this problem by using branch- and merge-point tags, but that involves a lot of unwieldy bookkeeping.)
In Subversion, merges are remembered by recording ancestry sets. A revision's ancestry set is the set of all changes "accounted for" in that revision. By maintaining ancestry sets, and consulting them when doing merges, Subversion can detect when it would apply the same patch twice, and spare users much bookkeeping. Ancestry sets are stored as properties.
In the examples below, bear in mind that revision numbers usually refer to changes, rather than the full contents of that revision. For example, "the change A:4" means "the delta that resulted in A:4", not "the full contents of A:4".
The simplest ancestor sets are associated with linear histories. For
example, here's the history of a file A:
_____ _____ _____ _____ _____ | | | | | | | | | | | A:1 |----->| A:2 |----->| A:3 |----->| A:4 |----->| A:5 | |_____| |_____| |_____| |_____| |_____|
The ancestor set of A:5 is:
{ A:1, A:2, A:3, A:4, A:5 }
That is, it includes the change that brought A from nothing to A:1, the
change from A:1 to A:2, and so on to A:5. From now on, ranges like this
will be represented with a more compact notation:
{ A:1-5 }
Now assume there's a branch B based, or "rooted", at A:2. (This
postulates an entirely different revision history, of course, and the
global revision numbers in the diagrams will change to reflect it.)
Here's what the project looks like with the branch:
_____ _____ _____ _____ _____ _____ | | | | | | | | | | | | | A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |----->| A:9 | |_____| |_____| |_____| |_____| |_____| |_____| \ \ \ _____ _____ _____ \| | | | | | | B:3 |----->| B:5 |----->| B:7 | |_____| |_____| |_____|
If we produce A:9 by merging the B branch back into the trunk
_____ _____ _____ _____ _____ _____ | | | | | | | | | | | | | A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |---.->| A:9 | |_____| |_____| |_____| |_____| |_____| / |_____| \ | \ | \ _____ _____ _____ / \| | | | | | / | B:3 |----->| B:5 |----->| B:7 |--->-' |_____| |_____| |_____|
then what will A:9's ancestor set be?
{ A:1, A:2, A:4, A:6, A:8, A:9, B:3, B:5, B:7}
or more compactly:
{ A:1-9, B:3-7 }
(It's all right that each file's ranges seem to include non-changes; this is just a notational convenience, and you can think of the non-changes as either not being included, or being included but being null deltas as far as that file is concerned).
All changes along the B line are accounted for (changes B:3-7), and so are all changes along the A line, including both the merge and any non-merge-related edits made before the commit.
Although this merge happened to include all the branch changes, that
needn't be the case. For example, the next time we merge the B line
_____ _____ _____ _____ _____ _____ _____ | | | | | | | | | | | | | | | A:1 |-->| A:2 |-->| A:4 |-->| A:6 |-->| A:8 |-.->| A:9 |-.->|A:11 | |_____| |_____| |_____| |_____| |_____| | |_____| | |_____| \ / | \ / | \ _____ _____ _____ / _____ | \| | | | | | / | | / | B:3 |-->| B:5 |-->| B:7 |-->|B:10 |->-' |_____| |_____| |_____| |_____|
Subversion will know that A's ancestry set already contains B:3-7, so
only the difference between B:7 and B:10 will be applied. A's new
ancestry will be
{ A:1-11, B:3-10 }
But why limit ourselves to contiguous ranges? An ancestry set is truly
a set - it can be any subset of the changes available:
_____ _____ _____ _____ _____ _____ | | | | | | | | | | | | | A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |--.-->|A:10 | |_____| |_____| |_____| |_____| |_____| / |_____| | / | ______________________.__/ | / | | / | \ __/_ _|__ \ { } { } \ _____ _____ _____ _____ \| | | | | | | | | B:3 |----->| B:5 |----->| B:7 |----->| B:9 |-----> |_____| |_____| |_____| |_____|
In this diagram, the change from B:3-5 and the change from B:7-9 are
merged into a working copy whose ancestry set (so far) is { A:1-8 } plus any local changes. After committing, A:10's ancestry set is
{ A:1-10, B:5, B:9 }
Clearly, saying "Let's merge branch B into A" is a little ambiguous. It usually means "Merge all the changes accounted for in B's tip into A", but it might mean "Merge the single change that resulted in B's tip into A".
Any merge, when viewed in detail, is an application of a particular set of changes - not necessarily adjacent ones - to a working copy. The user-level interface may allow some of these changes to be specified implicitly. For example, many merges involve a single, contiguous range of changes, with one or both ends of the range easily deducible from context (i.e., branch root to branch tip). These inference rules are not specified here, but it should be clear in most contexts how they work.
Because each node knows its ancestors, Subversion never merges the same
change twice (unless you force it to). For example, if after the above
merge, you tell Subversion to merge all B changes into A, Subversion
will notice that two of them have already been merged, and so merge only
the other two changes, resulting in a final ancestry set of:
{ A:1-10, B:3-9 }
This description of merging and ancestry applies to both intra- and inter-repository merges. However, inter-repository merging will probably not be implemented until a future release of Subversion.
Subversion is conceptually divided into a number of separable layers.
Assuming that the programmatic interface of each layer is well-defined, it is easy to customize the different parts of the system. Contributors can write new client apps, new network protocols, new server processes, new server features, and new storage back-ends.
The following diagram illustrates the "layered" architecture, and where
each particular interface lies.
+--------------------+ | commandline or GUI | | client app | +----------+--------------------+----------+ <=== Client interface | Client Library | | | | +----+ | | | | | +-------+--------+ +--------------+--+----------+ <=== Network interface | Working Copy | | Remote | | Local | | Management lib | | Repos Access | | Repos | +----------------+ +--------------+ | Access | | neon | | | +--------------+ | | ^ | | / | | DAV / | | / | | v | | +---------+ | | | | | | | Apache | | | | | | | +---------+ | | | mod_DAV | | | +-------------+ | | | mod_DAV_SVN | | | +----------+-------------+--------------+----------+ <=== Filesystem interface | | | Subversion Filesystem | | | +--------------------------------------------------+
The Subversion client, which may be either command-line or GUI, draws on three libraries.
The working copy library, libsvn_wc
, provides an API for
managing the client's working copy of a project. This includes
operations like renaming or removal of files, patching files,
extracting local diffs, and routines for maintaining administrative
files in the .svn/
directory.
The repository_access library, libsvn_ra
, provides an API for
exchanging information with a Subversion repository. This includes
the ability to read files, write new revisions of files, and ask the
repository to compare a working copy against its latest revision.
Note that there are two implementations of this interface: one
designed to talk to a repository over a network, and one designed to
work with a repository on local disk. Any number of interface
implementations can exist.
The client library, libsvn_client
provides general client
functions such as update()
and commit()
, which may
involve one or both of the other two client libraries.
libsvn_client
should, in theory, provide an API that allows
anyone to write a Subversion client application.
For details, See Client.
The network layer's job is to move the repository API requests over a wire.
On the client side, a network library (libneon
) translates
these requests into a set of HTTP WebDAV/DeltaV requests. The
information is sent over TCP/IP to an Apache server. Apache is used
for the following reasons:
Our rationale is that any attempt to write a dedicated "Subversion server" (with a "Subversion protocol") would inevitably end up evolving towards Apache's already-existing feature set. (However, Subversion's layered architecture certainly doesn't prevent anyone from writing a totally new network access implementation.)
An Apache module (mod_dav_svn
) translates the DAV requests into
API calls against a particular repository.
For details, See Protocol.
When the requests reach a particular repository, they are interpreted
by the Subversion Filesystem library, libsvn_fs
. The
Subversion Filesystem is a custom Unix-like filesystem, with a twist:
writes are revisioned and atomic, and no data is ever deleted! This
filesystem is currently implemented on top of a normal filesystem,
using Berkeley DB files.
For a more detailed explanation: See Server.
Subversion uses three kinds of deltas:
Tree deltas can also carry ancestry information, indicating how the files in one tree are related to files in the other tree. And deltas can describe changes to file meta-information, like permission bits, creation dates, and so on. The repository and working copy use deltas to communicate changes.
The term delta without qualification generally means a tree delta, unless some other meaning is clear from context.
In the examples below, deltas will be described in XML, which happens to be Subversion's (now mostly defunct) import/export patch format. However, note that deltas are an abstract data structure, of which the XML format is merely one representation. Later, we will describe other representations: for example, there is a serialized representation (useful for streaming protocols, among other things), and a db-style representation, used for repository storage. The various representations of a given delta are (in theory, anyway) perfectly isomorphic to one another, since they describe the same underlying structure.
A text delta describes the difference between two strings of bytes, the source string and the target string. Given a source string and a target string, we can compute a text delta; given a source string and a delta, we can reconstruct the target string. However, note that deltas are not invertible: you cannot always reconstruct the source string given the target string and delta.
The standard Unix "diff" format is one possible representation for text deltas; however, diffs are not ideal for internal use by a revision control system, for several reasons:
Instead, Subversion uses the VDelta binary-diffing algorithm, as described in Hunt, J. J., Vo, K.-P., and Tichy, W. F. An empirical study of delta algorithms. Lecture Notes in Computer Science 1167 (July 1996), 49-66. Currently, the output of this algorithm is stored in a custom data format called svndiff, invented by Greg Hudson <ghudson@mit.edu>, a Subversion developer.
The concrete form of a text delta is a well-formed XML element, having
the following form:
<text-delta>data</text-delta>Here, data is the raw svndiff data, encoded in the MIME Base64 format.
A property delta describes changes to a property list, of the sort associated with files, directories, and directory entries, and revision numbers (see Properties). A property delta can record creating, deleting, and changing the text of any number of properties.
A property delta is an unordered set of name/change pairs. No two pairs within a given property delta have the same name. A pair's name indicates the property affected, and the change indicates what happens to its value. There are two kinds of changes:
set value
delete
At the moment, the set
command can either create or change a
property value. However, this simplification means that the server
cannot distinguish between a client which believes it is creating a
value afresh, and a client which believes it is changing the value of an
existing property. It may simplify conflict detection to divide
set
into two separate add
and change
operations.
In the future, we may add a text-delta
change, which specifies a
change to an existing property's value as a text delta. This would give
us a compact way to describe small changes to large property values.
The concrete form of a property delta is a well-formed XML element,
having the following form:
<property-delta>change...</property-delta>Each change in a property delta has one of the following forms:
<set name='name'>value</set> <delete name='name'/>The name attribute of a
set
or delete
element gives
the name of the property to change. The value of a set
element gives the new value of the property.
If either the property name or the property value contains the
characters &
, <
, or '
, they should be replaced with
the sequences &
, <
, or '
, respectively.
A tree delta describes changes between two directory trees, the source tree and the target tree. Tree deltas can describe copies, renames, and deletions of files and directories, changes to file contents, and changes to property lists. A tree delta can also carry information about how the files in the target tree are derived from the files in the source tree, if this information is available.
The format for tree deltas described here is easy to compute from a Subversion working directory, and easy to apply to a Subversion repository. Furthermore, the size of a tree delta in this format is independent of the commands used to produce the target tree -- it depends only on the degree of difference between the source and target trees.
A tree delta is interpreted in the context of three parameters:
A tree delta is a list of changes of the form
<tree-delta>change...</tree-delta>which describe how to edit the contents of source-dir to yield target-dir. There are three kinds of changes:
<delete name='name'/>
<add name='name'>content</add>
<open name='name'>content</open>
tree-delta
element indicates that target-dir is identical
to source-dir.
In the change descriptions above, each content takes one of the following forms:
<file ancestor>prop-delta text-delta</file>
Prop-delta is a property delta describing how f's properties differ from that ancestor; it may be omitted, indicating that the properties are unchanged.
Text-delta is a text delta describing how to construct f
from that ancestor; it may also be omitted, indicating that f's
text is identical to its ancestor's.
<file ancestor/>
<file ancestor></file>
-- a file
element with no property or text delta, thus describing a file identical
to its ancestor.
<directory ancestor>prop-delta tree-delta</directory>
Prop-delta is a property delta describing how sub's properties differ from that ancestor; it may be omitted, indicating that the properties are unchanged.
Tree-delta describes how to construct sub from that
ancestor; it may be omitted, indicating that the directory is identical
to its ancestor. Tree-delta should be interpreted with a new
target-dir of target-dir/name
.
Since tree-delta is itself a complete tree delta structure, tree
deltas are themselves trees, whose structure is a subgraph of the target
tree.
<directory ancestor/>
<directory ancestor></directory>
-- a
directory element with no property or tree delta, thus describing a
directory identical to its ancestor.
The content of a add
or open
tag may also contain
a property delta, describing changes to the properties of that
directory entry.
In the file
and directory
elements described above, each
ancestor has one of the following forms:
ancestor='path'
source-root/path
, in revision. When this
appears as an attribute of a file
element, the element's text
delta should be applied to source-root/path
. When
this appears as an attribute of a directory
element,
source-root/path
should be the new source-dir
for interpreting that element's tree delta.
new='true'
nothing
ancestor
nor a new
attribute is given, this
is an abbreviation for ancestor='source-dir/name'
,
with the same revision number. This makes the common case -- files or
directories modified in place -- more compact.
If the ancestor spec is not new='true'
, it may also contain
the text revision='rev'
, indicating a new value for
revision, in which we should find the ancestor.
If a filename or path appearing as a name or path in the
description above contains the characters &
, <
, or
'
, they should be replaced with the sequences &
,
<
, or '
, respectively.
Suppose we have the following source tree:
/dir1/file1 file2 dir2/file3 file4 dir3/file5 file6
If we edit the contents of /dir1/file1
, we can describe the
effect on the tree with the following tree delta, to be applied to the
root:
<tree-delta> <open name='dir1'> <directory> <tree-delta> <open name='file1'> <file>text-delta</file> </open> </tree-delta> </directory> </open> </tree-delta>The outer
tree-delta
element describes the changes made to the root
directory. Within the root directory, there are changes in dir1
,
described by the nested tree-delta
. Within /dir1
, there are
changes in file1
, described by the text-delta.
If we had edited both /dir1/file1
and /dir1/file2
, then
there would simply be two open
elements in the inner
tree-delta
.
As another example, starting from the same source tree, suppose we
rename /dir1/file1
to /dir1/file8
:
<tree-delta> <open name='dir1'> <directory> <tree-delta> <delete name='file1'/> <add name='file8'> <file ancestor='/dir1/file1'/> </add> </tree-delta> </directory> </open> </tree-delta>As above, the inner
tdelta
describes how /dir1
has
changed: the entry for /dir1/file1
has disappeared, but there is
a new entry, /dir1/file8
, which is derived from and textually
identical to /dir1/file1
in the source directory. This is just
an indirect way of describing the rename.
Why is it necessary to be so indirect? Consider the delta representing the result of:
/dir1/file1
to /dir1/tmp
,
/dir1/file2
to /dir1/file1
, and
/dir1/tmp
to /dir1/file2
file1
and file2
):
<tree-delta> <open name='dir1'> <directory> <tree-delta> <open name='file1'> <file ancestor='/dir1/file2'/> </open> <open name='file2'> <file ancestor='/dir1/file1'/> </open> </tree-delta> </directory> </open> </tree-delta>The indirectness allows the tree delta to capture an arbitrary rearrangement without resorting to temporary filenames.
Another example, starting from the same source tree:
/dir1/dir2
to /dir1/dir4
,
/dir1/dir3
to /dir1/dir2
, and
file3
from /dir1/dir4 to /dir1/dir2.
file3
's path has remained the same, even though the
directories around it have changed. Here is the tree delta:
<tree-delta> <open name='dir1'> <directory> <tree-delta> <open name='dir2'> <directory ancestor='/dir1/dir3'> <tree-delta> <add name='file3'> <file ancestor='/dir1/dir2/file3'/> </add> </tree-delta> </directory> </open> <delete name='dir3'/> <add name='dir4'> <directory ancestor='/dir1/dir2'> <tree-delta> <delete name='file3'/> </tree-delta> </directory> </add> </tree-delta> </directory> </open> </tree-delta>In other words:
/dir1
has changed;
/dir1/dir2
is derived from the old
/dir1/dir3
, and contains a new entry file3
, derived from
the old /dir1/dir2/file3
;
/dir1/dir3
; and
/dir1/dir4
is derived from the old
/dir1/dir2
, except that its entry for file3
is now gone.
Some more possible maneuvers, left as exercises for the reader:
dir2
, and then create a file named dir2
.
/dir1/dir2
to /dir1/dir4
; move file2
into
/dir1/dir4
; and move file3
into /dir1/dir3.
dir2
into dir3
, and move dir3
into /
.
It is sometimes useful to represent a set of changes to a tree without providing text deltas in the middle of the stream. Text deltas are often large and expensive to compute, and tree deltas can be useful without them. For example, one can detect whether two changes might conflict -- whether they change the same file, for example -- without knowing exactly how the conflicting files changed.
For this reason, our XML representation of a tree delta allows the text
deltas to come after the </tree-delta> closure. This allows the
client to receive early notice of conflicts: during a svn commit
command, the client sends a tree-delta to the server, which can check
for skeletal conflicts and reject the commit, before the client takes the
time to transmit the (possibly large) textual changes. This potentially
saves quite a bit of network traffic.
In terms of XML, postfix text deltas are split into two parts. The
first part appears "in-line" and contains a reference ID. The second
part appears after the tree delta is complete. Here's an example:
<tree-delta> <open name="foo.c"> <file> <text-delta-ref id="123"> </file> </open> <add name="bar.c"> <file> <text-delta-ref id="456"> </file> </add> </tree-delta> <text-delta id="123">data</text-delta> <text-delta id="456">data</text-delta>
The static XML forms above are useful as an import/export format, and as
a visualization aid, but we also need a way to express a delta as a
series of operations, to implement directory tree diffing and
patching. Subversion defines a standard set of such operations in the
vtable svn_delta_edit_fns_t
, a set of function prototypes which
anyone may implement (see svn_delta.h
).
Each function in an instance of svn_delta_editor_t
(colloquially known as an editor) implements some distinct subtask
of editing a directory tree. In fact, if you compare the editor
function prototypes to the XML elements described previously, you'll
notice a fairly strict correspondence: there's one function for
replacing a directory, another function for replacing a file, one for
adding a directory, another for adding a file, a function for deleting,
and so on.
Although the editor interface was designed around the general idea of making changes to a directory tree, a specific implementation's behavior depends on its role. For example, the versioning filesystem library offers an editor that creates new revisions, while the working copy library offers an editor that updates working copies. And the network layer offers an editor that turns editing calls into wire protocol, which is then converted back into editing calls on the other side! All of these different tasks can share a single interface, because they are all fundamentally about the same thing: expressing and applying differences between directory trees.
Like the XML forms, a series of editor calls must follow certain nesting conventions; these conventions are implicit in the interface, in that some of the functions take arguments that can only be obtained from previous calls to other editor functions.
Editors can best be understood by watching one work on a real directory tree. For example:
Suppose that the user has made a number of local changes to her working
copy and wants to commit them to the repository. Let's represent her
changes with the same tree-delta from a previous example. Notice that
she has also made textual modifications to file3
; hence the
in-line <text-delta>
:
<tree-delta> <open name='dir1'> <directory> <tree-delta> <open name='dir2'> <directory ancestor='/dir1/dir3'> <tree-delta> <add name='file3'> <file ancestor='/dir1/dir2/file3'> <text-delta>data</text-delta> </file> </add> </tree-delta> </directory> </open> <delete name='dir3'/> <add name='dir4'> <directory ancestor='/dir1/dir2'> <tree-delta> <delete name='file3'/> </tree-delta> </directory> </add> </tree-delta> </directory> </open> </tree-delta>
So how does the client send this information to the server?
In a nutshell: the tree-delta is streamed over the network, as a series of individual commands given in depth-first order.
Let's be more specific. The server presents the client with an object
of type struct svn_delta_edit_fns_t
, colloquially known as an
editor. An editor is really just table of functions; each
function makes a change to a filesystem. Agent A (who has a private
filesystem) presents an editor to agent B. Agent B then calls the
editor's functions to change A's filesystem. B is said to be
driving the editor.
As Karl Fogel likes to describe the process, if one thinks of the tree-delta as a lion, the editor is a "hoop" that the lion jumps through - each portion of the lion being decomposed through time.
B cannot call the functions in any willy-nilly order; there are some logical restrictions. In particular, as B drives the editor, it receives opaque data structures which represent directories and files. It must use and pass these structures, known as batons, to make further function calls.
As an example, let's watch how the client would transmit the above
tree-delta to the repository. (The description below is slightly
simplified. For exact interface details, see
subversion/include/svn_delta.h
.)
[Note: in the examples below, and throughout Subversion's code base, you'll see references to 'baton' objects. This is simply a project convention, a name given to structures that define contexts for functions. Many APIs call these structures 'userdata'. In Subversion, we like the term 'baton', because it reminds us of one function "handing off" context to another function.]
root_baton = editor->open_root();
The client now has an opaque object, root_baton, which represents the root of the repository's filesystem.
dir1_baton = editor->open_dir("dir1", root_baton);
Notice that root_baton gives the client free license to make any
changes it wants in the repository's root directory - until, of course,
it calls editor->close_dir(root_baton)
.
The first change made was a replacement of dir1
. In return, the
client now has a new opaque data structure that can be used to change
dir1
.
dir2_baton = editor->open_dir("dir2", "/dir1/dir3", dir1_baton);
The dir1_baton is now used to open dir2
with a
directory whose ancestor is /dir1/dir3
.
file_baton = editor->add_file("file3", "/dir1/dir2/file3", dir2_baton);
Edits are now made to dir2
(using dir2_baton). In
particular, a new file is added to this directory whose ancestor is
/dir1/dir2/file3
.
window_handler = editor->apply_textdelta(file_baton);
Text-deltas themselves, for network efficiency, are streamed in "chunks". So instead of receiving a baton object, we now have a routine that is able to receive any number of small "windows" of text-delta data.
We won't go into the details of the svn_txdelta_*
functions right
here; but suffice it to say that these routines are used for sending
svndiff data to the window_handler routine.
editor->close_file(file_baton);
The client is done sending the file's text-delta, so it releases the file baton.
editor->close_dir(dir2_baton));
The client is done making changes to dir2
, so it releases its
baton as well.
dir1
, however; it makes two
more edits:
editor->delete_item("dir3", dir1_baton);
dir4_baton = editor->add_dir("dir4", "/dir1/dir2", dir1_baton);
(The function's name is delete_item
rather than
delete
to avoid gratuitous incompatibility with C++, where
delete
is a reserved keyword.)
dir4
(whose ancestry is /dir1/dir2
),
the client removes a file:
editor->delete_item("file3", dir4_baton);
dir4
, as well as its parent
dir1
:
editor->close_dir(dir4_baton);
editor->close_dir(dir1_baton);
editor->close_dir(root_baton);
Of course, at any point above, the repository may reject an edit. If this is the case, the client aborts the transmission and the repository hasn't changed a bit. (Thank goodness for transactions!)
Note, however, that this "editor interface" works in the other direction as well. When the repository wishes to update a client's working copy, it is the client's reponsibility to give a custom editor-object to the server, and the server is the editor-driver.
Here are the main advantages of this interface:
Whatever the case, it's easy to "swap" editors around, and make client and server do new and interesting things.
The Subversion client is built on three libraries. One operates
strictly on the working copy and does not talk to the repository.
Another talks to the repository but never changes the working copy. The
third library uses the first two to provide operations such as
commit
and update
- operations which need to both talk to
the repository and change the working copy.
The initial client is a Unix-style command-line tool (like standard CVS), but it should be easy to write a GUI client as well, based on the same libraries. The libraries capture the core Subversion functionality, segregating it from user interface concerns.
This chapter describes the libraries, and the physical layout of working copies.
Working copies are client-side directory trees containing both versioned data and Subversion administrative files. The functions in the working copy management library are the only functions in Subversion which operate on these trees.
This section gives an overview of how working copies are arranged physically, but is not a full specification of working copy layout.
As with CVS, Subversion working copies are simply directory trees with
special administrative subdirectories, in this case named ".svn" instead
of "CVS":
myproj / | \ _____________/ | \______________ / | \ .svn src doc ___/ | \___ /|\ ___/ \___ | | | / | \ | | base ... ... / | \ myproj.texi .svn / | \ ___/ | \___ ____/ | \____ | | | | | | base ... ... .svn foo.c bar.c | ___/ | \___ | | | | | base ... ... myproj.texi ___/ \___ | | foo.c bar.c
Each dir/.svn/
directory records the files in dir
, their
revision numbers and property lists, pristine revisions of all the files
(for client-side delta generation), the repository from which dir
came, and any local changes (such as uncommitted adds, deletes, and
renames) that affect dir
.
Although it would often be possible to deduce certain information (such as the original repository) by examining parent directories, this is avoided in favor of making each directory be as much a self-contained unit as possible.
For example, immediately after a checkout the administrative information for the entire working tree could be stored in one top-level file. But subdirectories instead keep track of their own revision information. This would be necessary anyway once the user starts committing new revisions for particular files, and it also makes it easier for the user to prune a big, complete tree into a small subtree and still have a valid working copy.
The .svn subdir contains:
format
file, which indicates which version of the working copy
adm format this is (so future clients can be backwards compatible
easily).
text-base
directory, containing the pristine repository
revisions of the files in the corresponding working directory
entries
file, which holds revision numbers and other
information for this directory and its files, and records the presence
of subdirs. It also contains the repository URLs that each file and
directory came from.
It may help to think of this file as the functional equivalent of the CVS/Entries file.
props
directory, containing property names and values for each
file in the working directory.
prop-base
directory, containing pristine property names and
values for each file in the working directory.
dir-props
file, recording properties for this directory.
dir-prop-base
file, recording pristine properties for this
directory.
lock
file, whose presence implies that some client is currently
operating on the administrative area.
tmp
directory, for holding scratch-work and helping make
working copy operations more crash-proof.
log
file. If present, indicates a list of actions that need to
be taken to complete a working-copy-operation that is still "in
progress".
You can read much more about these files in the file
subversion/libsvn_wc/README
.
This library performs "offline" operations on the working copy, and
lives in subversion/libsvn_wc/
.
The API for libsvn_wc is always evolving; please read the header
file for a detailed description: subversion/include/svn_wc.h
.
This library performs operations involving communication with the repository.
The interface defined in subversion/include/svn_ra.h
provides a
uniform interface to both local and remote repository access.
Specifically, libsvn_ra_dav will provide this interface and speak to repositories using DAV requests. At some future point, another library libsvn_ra_local will provide the same interface - but will link directly to the filesystem library for accessing local disk repositories.
These functions correspond to user-level client commands. In theory, any client interface (command-line, GUI, emacs, Python, etc.) should be able to link to libsvn_client and have the ability to act as a full-featured Subversion client.
Again, the detailed API can be found in
subversion/include/svn_client.h
.
The Subversion client library libsvn_ra_dav
uses the Neon
library to generate WebDAV DeltaV requests and sends them to a
"Subversion-aware" Apache server.
This Apache server is running mod_dav and mod_dav_svn, which translates the requests into Subversion filesystem calls.
For more info, See Network Layer.
For a detailed description of exactly how Greg Stein <gstein@lyra.org> is mapping the WebDAV DeltaV spec to Subversion, see his paper: http://svn.collab.net/repos/svn/trunk/www/webdav-usage.html
For more information on WebDAV and the DeltaV extensions, see http://www.webdav.org and http://www.webdav.org/deltav.
For more information on Neon, see http://www.webdav.org/neon.
The term "server" is ambiguous, because it has at least two different meanings: it can refer to a powerful computer which offers services to users on a network, or it can refer to a CPU process designed to receive network requests.
In Subversion, however, the server is just a set of libraries that implements repositories and makes them available to other programs. No networking is required.
There are two main libraries: the Subversion Filesystem library, and the Subversion Repository library.
This library implements a hierarchical filesystem which supports atomic changes to directory trees, and records a complete history of the changes. In addition to recording changes to file and directory contents, the Subversion Filesystem records changes to file meta-data (see discussion of properties in Model).
There are two main files that describe the Subversion filesystem.
First, read the section below (See Repository Structure.) for a general overview of how the filesystem works.
Once you've done this, read Jim Blandy's own structural overview, which
explains how nodes and revisions are organized (among other things) in
the filesystem implementation: subversion/libsvn_fs/structure
.
Finally, read the well-documented API in
subversion/include/svn_fs.h
.
To begin, please be sure that you're already casually familiar with Subversion's ideas of files, directories, and revision histories. If not, See Model. We can now offer precise, technical descriptions of the terms introduced there.
A text string is a string of Unicode characters which is canonically decomposed and ordered, according to the rules described in the Unicode standard.
A string of bytes is what you'd expect.
A property list is an unordered list of properties. A property is a pair
(name, value)
, where name is a text string, and value is a string of bytes. No two properties in a property list have the same name.A file is a property list and a string of bytes.
A node is either a file or a directory. (We define a directory below.) Nodes are distinguished unions -- you can always tell whether a node is a file or a directory.
A node table is an array mapping some set of positive integers, called node numbers, onto nodes. If a node table maps some number i to some node n, then i is a valid node number in that table, and node i is n. Otherwise, i is an invalid node number in that table.
A directory entry is a triple
(name, props, node)
, where name is a text string, props is a property list, and node is a node number.A directory is an unordered list of directory entries, and a property list.
A revision is a node number and a property list.
A history is an array of revisions, indexed by a contiguous range of non-negative integers containing 0.
A repository consists of node table and a history.
Now that we've explained the form of the data, we make some restrictions on that form.
Every revision has a root directory. Every revision's node number is a valid node number, and the node it refers to is always a directory. We call this the revision's root directory.
Revision 0 always contains an empty root directory. This baseline makes it easy to check out whole projects from the repository.
Directories contain only valid links. Every directory entry's node is a valid node number.
Directory entries can be identified by name. For any directory d, every directory entry in d has a distinct name.
There are no cycles of directories. No node is its own child.
Directories can have more than one parent. The Unix file system does not allow more than one hard link to a directory, but Subversion does allow the analogous situation. Thus, the directories in a Subversion repository form a directed acyclic graph (DAG), not a tree. However, it would be distracting and unhelpful to replace the familiar term "directory tree" with the unfamiliar term "directory DAG", so we still call it a "directory tree" here.
There are no dead nodes. Every node is a child of some revision's root directory.
This section provides a conversational explanation of how the repository actually stores and revisions file trees. It's not critical knowledge for a programmer using the Subversion Filesystem API, but most people probably still want to know what's going on "under the hood" of the repository.
Suppose we have a new project, at revision 1, looking like this (using
CVS syntax):
prompt$ svn checkout myproj U myproj/ U myproj/B U myproj/A U myproj/A/fish U myproj/A/fish/tuna prompt$
Only the file tuna
is a regular file, everything else in myproj is
a directory.
Let's see what this looks like as an abstract data structure in the repository, and how that structure works in various operations (such as update, commit, and branch).
In the diagrams that follow, lines represent parent-to-child connections in a directory hierarchy. Boxes are "nodes". A node is either a file or a directory - a letter in the upper left indicates which kind. A file node has a byte-string for its content, whereas directory nodes have a list of dir_entries, each pointing to another node.
Parent-child links go both ways (i.e., a child knows who all its parents are), but a node's name is stored only in its parent, because a node with multiple parents may have different names in different parents.
At the top of the repository is an array of revision numbers,
stretching off to infinity. Since the project is at revision 1, only
index 1 points to anything; it points to the root node of revision 1 of
the project:
( myproj's revision array ) ______________________________________________________ |___1_______2________3________4________5_________6_____... | | ___|_____ |D | | | | A | /* Two dir_entries, `A' and `B'. */ | \ | | B \ | |__/___\__| / \ | \ | \ ___|___ ___\____ |D | |D | | | | | | | | fish | /* One dir_entry, `fish'. */ |_______| |___\____| \ \ ___\____ |D | | | | tuna | /* One dir_entry, `tuna'. */ |___\____| \ \ ___\____ |F | | | | | /* (Contents of tuna not shown.) */ |________|
What happens when we modify tuna
and commit? First, we make a
new tuna
node, containing the latest text. The new node is not
connected to anything yet, it's just hanging out there in space:
________ |F | | | | | |________|
Next, we create a new revision of its parent directory:
________ |D | | | | tuna | |___\____| \ \ ___\____ |F | | | | | |________|
We continue up the line, creating a new revision of the next parent
directory:
________ |D | | | | fish | |___\____| \ \ ___\____ |D | | | | tuna | |___\____| \ \ ___\____ |F | | | | | |________|
Now it gets more tricky: we need to create a new revision of the root
directory. This new root directory needs an entry to point to the
"new" directory A, but directory B hasn't changed at all. Therefore,
our new root directory also has an entry that still points to the
old directory B node!
______________________________________________________ |___1_______2________3________4________5_________6_____... | | ___|_____ ________ |D | |D | | | | | | A | | A | | \ | | \ | | B \ | | B \ | |__/___\__| |__/___\_| / \ / \ | ___\_____________/ \ | / \ \ ___|__/ ___\____ ___\____ |D | |D | |D | | | | | | | | | | fish | | fish | |_______| |___\____| |___\____| \ \ \ \ ___\____ ___\____ |D | |D | | | | | | tuna | | tuna | |___\____| |___\____| \ \ \ \ ___\____ ___\____ |F | |F | | | | | | | | | |________| |________|
Finally, after all our new nodes are written, we finish the "bubble
up" process by linking this new tree to the next available revision in
the history array. In this case, the new tree becomes revision 2 in the
repository.
______________________________________________________ |___1_______2________3________4________5_________6_____... | \ | \__________ ___|_____ __\_____ |D | |D | | | | | | A | | A | | \ | | \ | | B \ | | B \ | |__/___\__| |__/___\_| / \ / \ | ___\_____________/ \ | / \ \ ___|__/ ___\____ ___\____ |D | |D | |D | | | | | | | | | | fish | | fish | |_______| |___\____| |___\____| \ \ \ \ ___\____ ___\____ |D | |D | | | | | | tuna | | tuna | |___\____| |___\____| \ \ \ \ ___\____ ___\____ |F | |F | | | | | | | | | |________| |________|
Generalizing on this example, you can now see that each "revision" in the repository history represents a root node of a unique tree (and an atomic commit to the whole filesystem.) There are many trees in the repository, and many of them share nodes.
Many nice behaviors come from this model:
foo.c
, it need only traverse the repository's
history, locate revision X's root node, then walk down the tree to
foo.c
.
Let's demonstrate the last point by renaming the tuna
to
book
.
We start by creating a new parent "fish" directory, except that this
parent directory has a different dir_entry, one which points the
same old file node, but has a different name:
______________________________________________________ |___1_______2________3________4________5_________6_____... | \ | \__________ ___|_____ __\_____ |D | |D | | | | | | A | | A | | \ | | \ | | B \ | | B \ | |__/___\__| |__/___\_| / \ / \ | ___\_____________/ \ | / \ \ ___|__/ ___\____ ___\____ |D | |D | |D | | | | | | | | | | fish | | fish | |_______| |___\____| |___\____| \ \ \ \ ___\____ ___\____ ________ |D | |D | |D | | | | | | | | tuna | | tuna | | book | |___\____| |___\____| |_/______| \ \ / \ \ / ___\____ ___\____ / |F | |F | | | | | | | | | |________| |________|
From here, we finish with the bubble-up process. We make new parent
directories up to the top, culminating in a new root directory with two
dir_entries (one points to the old "B" directory node we've had all
along, the other to the new revision of "A"), and finally link the new
tree to the history as revision 3:
______________________________________________________ |___1_______2________3________4________5_________6_____... | \ \_________________ | \__________ \ ___|_____ __\_____ __\_____ |D | |D | |D | | | | | | | | A | | A | | A | | \ | | \ | | \ | | B \ | | B \ | | B \ | |__/___\__| |__/___\_| |__/___\_| / ___________________/_____\_________/ \ | / ___\_____________/ \ \ | / / \ \ \ ___|/_/ ___\____ ___\____ _____\__ |D | |D | |D | |D | | | | | | | | | | | | fish | | fish | | fish | |_______| |___\____| |___\____| |___\____| \ \ \ \ \ \ ___\____ ___\____ ___\____ |D | |D | |D | | | | | | | | tuna | | tuna | | book | |___\____| |___\____| |_/______| \ \ / \ \ / ___\____ ___\____ / |F | |F | | | | | | | | | |________| |________|
For our last example, we'll demonstrate the way "tags" and "branches" are implemented in the repository.
In a nutshell, they're one and the same thing. Because nodes are so easily shared, we simply create a new directory entry that points to an existing directory node. It's an extremely cheap way of copying a tree; we call this new entry a clone, or more colloquially, a "cheap copy".
Let's go back to our original tree, assuming that we're at revision 6 to
begin with:
______________________________________________________ ...___6_______7________8________9________10_________11_____... | | ___|_____ |D | | | | A | | \ | | B \ | |__/___\__| / \ | \ | \ ___|___ ___\____ |D | |D | | | | | | | | fish | |_______| |___\____| \ \ ___\____ |D | | | | tuna | |___\____| \ \ ___\____ |F | | | | | |________|
Let's "tag" directory A. To make the clone, we create a new dir_entry
T in our root, pointing to A's node:
______________________________________________________ |___6_______7________8________9________10_________11_____... | \ | \ ___|_____ __\______ |D | |D | | | | | | A | | A | | \ | | | | | B \ | | B | T | |__/___\__| |_/__|__|_| / \ / | | | ___\__/ / / | / \ / / ___|__/ ___\__/_ / |D | |D | | | | | | | | fish | |_______| |___\____| \ \ ___\____ |D | | | | tuna | |___\____| \ \ ___\____ |F | | | | | |________|
Now we're all set. In the future, the contents of directories A and B may change quite a lot. However, assuming we never make any changes to directory T, it will always point to a particular pristine revision of directory A at some point in time. Thus, T is a tag.
(In theory, we can use some kind of authorization system to prevent anyone from writing to directory T. In practice, a well-laid out repository should encourage "tag directories" to live in one place, so that it's clear to all users that they're not meant to change.)
However, if we do decide to allow commits in directory T, and now our repository tree increments to revision 8, then T becomes a branch. Specifically, it's a branch of directory A which shares history with A up to a certain point, and then "broke off" from the main line at revision 8.
You may have been thinking, "Gee, this bubble up method seems nice, but it sure wastes a lot of space. Every commit to the repository creates an entire line of new directory nodes!"
Like many other revision control systems, Subversion stores changes as differences. It doesn't make complete copies of nodes; instead, it stores the latest revision as a full text, and previous revisions as a succession of reverse diffs (the word "diff" is used loosely here - for files, it means vdeltas, for directories, it means a format that expresses changes to directories).
For the initial release of Subversion,
A subversion repository is a directory that contains a number of components:
The Subversion filesystem is just that: a filesystem. But it's also
useful to provide an API that acts at the level of the repository.
The repository library (libsvn_repos
) does this.
In particular, it wraps a few libsvn_fs
routines, such as those
for beginning and ending commits, so that hook-scripts can run. A
pre-commit-hook script might check for a valid log message, and a
post-commit-hook script might send an email to a mailing list.
Additionally, the repository library provides convenience routines for examining and manipulating the filesystem. For example, a routine to generate a tree-delta by comparing two revisions, routines for constructing new transactions, routines for querying log messages, and routines for exporting and importing filesystem data.
Copyright © 2000-2003 Collab.Net. All rights reserved.
This software is licensed as described in the file COPYING, which you should have received as part of this distribution. The terms are also available at http://subversion.tigris.org/license-1.html. If newer versions of this license are posted there, you may use a newer version instead, at your option.
If you're not familiar with CVS or its shortcomings, then skip to Model
In the future, of course, contributors are free modify the Subversion filesystem to operate with more powerful SQL database.