AbiWord Documentation
【转】
Introduction
One of the major bits of AbiWord word processing code is
the Piece Table
PieceTable
- Todo:
- Add more class names / links to sources.
Introduction
A pt_PieceTable is
the data structure used to represent the document. It presents an interface to
access the document content as a sequence of (Unicode) characters. It includes
an interface to access document structure and formatting information. It
provides efficient editing operations, complete undo, and crash
recovery.
Overview
The PieceTable consists of the following classs:
- InitialBuffer -- This is a read-only character array consisting of the
entire character content of the document and initially read from the
disk. (All XML tags and other non-content items are omitted from
this buffer.)
- ChangeBuffer -- This is an append-only character array consisting of all
character content inserted into the document during the editing session.
- InitialAttrPropTable -- This is a read-only table of Attribute/Property
structures extracted from the original document.
- ChangeAttrPropTable -- This is an append-only table of Attribute/Property
structures that are created during the editing session.
在Abiword中以上信息主要在:pt_VarSet 类中定义。
- Piece -- This class represents a piece of the
sequence of the document; that is, a contiguous sub-sequence having
the same properties. Such as a span of text or an object (such as an in-line
image). It contains a links to the previous and next Pieces in the document.
Pieces are created in response to editing and formatting commands.
一个Piece相当于:pf_Frag 抽象类,Frag的类型可以是:文本、图片等。
- TextPiece -- This subclass represents a span of contiguous text in one
of the buffers. All text within the span has the same (CSS) properties. A
TextPiece is not necessarily the longest contiguous span; it is possible to
have adjacent (both in order and in buffer position)
TextPieces with the same properties. A TextPiece contains a buffer offset
and length for the location an size of the text and a flag to indicate
which buffer.
A TextPiece contains (or contains a link to) the text formatting
information. Note that the buffer offset
only gives the location of the content of the span in one of the buffers, it
does not specify the absolute position of the span in the document.
pf_Frag_Text 类。该类的所有文本必须具有相同的CSS属性。
- ObjectPiece -- This subclass represents an in-line object or image. It
has no references to the buffers, but does provide a place-holder in the
sequence.
pf_Frag_Object 类。可以是图片等。
- StructurePiece -- This subclass represents a section or paragraph. It
has no references to the buffers, but does provide (CSS) style information
and a place-holder in the sequence. There are no links between
StructurePieces or between other Pieces and their (containing)
StructurePieces.
pf_Frag_Strux
类。包括段落、章节等。
- PieceList -- This is doubly-linked list of Pieces. The are linked in
document order. A forward traversal of this list will reveal the entire
content of the document; in doing so, it may wildly jump around both of the
buffers, but that is not an issue.
pf_Fragments 类。是所有pf_Frag 类的容器,该容器是一个简单的双向链表的数据结构。
- PX_ChangeRecord --
Each editing and formatting change is represented as a ChangeRecord. A
ChangeRecord represents an atomic change that was made to one or more pieces.
This includes offset/length changes to a TextPiece and changes to the
PieceList.
- ChangeVector -- This is a vector of ChangeRecords. This is used like a
stack. ChangeRecords are appended to the vector (pushed onto the stack) as
they are created in response to editing and formatting commands.
The undo operation takes the last ChangeRecord in
the vector and un-does its effect. A redo operation
re-applies the ChangeRecord. The ChangeVector holds the complete information
to undo all editing back to the initial document. The index of the current
position in the ChangeVector is maintained. ChangeRecords are not removed from
the vector until the redo is invalidated. When a
ChangeRecord is removed from the vector, it is deleted.
px_ChangeHistory
类。该类可以执行undo 、redo 等操作。
Operations
- Insert(position,bAfter,c) --
To insert one or more
characters c into the document
(either before or after) the absolute
document position position, we do the following:
- Append the character(s) to the ChangeBuffer.
- Find the TextPiece that spans the document position.
- If the document position is in the middle of a TextPiece
(p1), we split it into two TextPieces
(p1a, p1c) and create a third TextPiece
(p1b). p1a and p1c contain
the left and right portions referenced
in p1. p1b spans the newly-inserted
character(s). The PieceList is updated so that the
sequence p1a,p1b,p1c replace p1 in
the list.
- If the document position is at the end of a TextPiece and the buffer position
in either buffer is
contiguous with the buffer and
position referenced in the TextPiece and the formatting is the same, we
may avoid the three part split and simply update the offset/length in the
TextPiece. This case is very likely when the user is composing text or is
undoing a delete.
- If the document position is between Pieces, a new TextPiece is created
and inserted into the PieceList.
- Create a ChangeRecord and append it to the ChangeVector. For
an insert, we construct a ChangeRecord of
type
InsertSpan
.
cr.span.m_documentOffset
contains the document
position of the insertion.
cr.span.m_span
marks the buffer position
of the text that was inserted.
cr.span.m_bAfter
remembers whether the insertion was
before or after the document position.
- Delete(position,bAfter,length) --
To delete one or more characters from the document
(either before or after) the absolute
document position position, we do the following:
- Find the TextPiece that spans the document position.
- If the length of characters is contained within the TextPiece
(p1), we split it into two TextPieces
(p1a and pl1b). The offsets and lengths are
set in the new TextPieces such that the deleted sequence is not in either
piece. (The deleted text is not actually deleted from the buffer;
there are just no references to it from the PieceList.)
- If the document position is at the beginning or end of a TextPiece, we
can just adjust the offset/length, rather than doing the split.
- If the deletion extends over multiple Pieces, we iterate over each
piece in the range and perform a delete on the sub-sequence. This will
result in a multi-step ChangeRecord.
- TODO what about non-TextPieces??
- Create a ChangeRecord and append it to the ChangeVector. For
a delete, we construct a ChangeRecord of
type
DeleteSpan
.
cr.span.m_documentOffset
contains the document
position of the deletion.
cr.span.m_span
marks the buffer position
of the text that was deleted.
cr.span.m_bAfter
remembers whether the insertion was
before or after the document position.
- InsertFormatting()
- ChangeFormatting()
- Undo -- This can be implemented using the information in the ChangeVector.
If the CurrentPosition in the ChangeVector is greater than zero, we
have undo information. The information in the
ChangeRecord prior to the CurrentPosition is used to undo the editing
operation. After an undo the CurrentPosition is
decremented.
- If the ChangeRecord is of type
InsertSpan
: we perform
a delete operation
using cr.span.m_documentOffset
, cr.span.m_span.m_length
and cr.span.m_bAfter
.
- If the ChangeRecord is of type
DeleteSpan
: we perform
an insert operation
using cr.span.m_documentOffset
, cr.span.m_span
,
and cr.span.m_bAfter
.
- If the ChangeRecord is of type
ChangeFormatting
:
- If the ChangeRecord is of
type
InsertFormatting
:
- Redo -- This can be implemented using the information in the ChangeVector.
If the CurrentPosition in the ChangeVector is less than the length of the
ChangeVector, the redo has not been invalidated and
may be applied. The information in the ChangeRecord at the CurrentPosition
provides complete information to describe the editing operation to be redone.
After a redo the CurrentPosition is advanced.
- Autosave -- This can be implemented by periodically writing the
ChangeBuffer, ChangeVector, and the ChangeAttrPropTable to temporary files.
After a crash, the original document and the temporary files could be used to
replay the editing operations and reconstruct the modified document.
Observations
- The content of the original file are never modified. Pieces in the
PieceList describe the current document; the original content is referenced in
a random access fashion. For systems with small memory or for very
large documents, it may be worth demand loading blocks of the original content
rather than loading it completly into the InitialBuffer.
- Document content data (in the two buffers) are never moved once
written. insert and delete operations
change the Pieces in the PieceList, but do not move or change the contents of
the two buffers.
- The result of an undo operation must produce
the identical document structure and content. Since consecutive Pieces in the
PieceList may have the same formatting properties and may refer to
congituous buffer locations
(there is no requirement to coalesce them),
an undo operation may produce a different PieceList
than we originally had prior to doing the operation that was undone.
- TODO Check this. Whether the PieceList should be identical or
equivalent.
or Issues
- TextPieces represent spans of text that are convenient for the structure
of the document and a result of the sequence of editing operations. They are
not optimized for layout or display.
- We can provide access methods to return a
const char
*
into the buffers along with a length, which the caller could
use in text drawing or measuring calls, but not c-style, zero-terminated
strings.
- Mapping an absolute document position to a Piece involves a linear search
of the PieceList to compute the absolute document position and find the
correct Piece. The number of Pieces in a document is a function of the number
of editing operations that have been performed in the session and of the
complexity of the structure and formatting of the original document. A linear
search might be painfully slow.
- TODO We will find a tree-structure to use instead
of the doubly-linked list that will give us O(log(n))
searching.
- TODO Consider caching the last few lookup results
so that we can avoid doing a search if possible. This should have a high
hit-rate when the user is composing text.
- We provide a complete, but
first-order undo with redo.
That is, we do not put the undo-operation in the undo (like emacs).
- TODO The before and after stuff
on insert and delete is
a bit of a hand-wave.
- TODO Need to add multi-step-undo so that delete operations which span
multiple pieces can be represented operation to the user.
Code
class PT_PieceTable
{
const UT_UCSChar * m_InitialBuffer;
const UT_UCSChar * m_ChangeBuffer;
pt_PieceList * m_pieceList;
pt_AttrPropTable m_InitialAttrPropTable;
pt_AttrPropTable m_ChangeAttrPropTable;
...
};
class pt_Piece
{
enum PieceType { TextPiece,
ObjectPiece,
StructurePiece };
PieceType m_pieceType;
<linked-list or tree pointers>
...
};
class pt_Span
{
UT_Bool m_bInInitialBuffer;
UT_uint32 m_offset;
UT_uint32 m_length;
};
class pt_TextPiece : public pt_Piece
{
pt_Span m_span;
pt_AttrPropReference m_apr;
...
};
class pt_ObjectPiece : public pt_Piece
{
...
};
class pt_StructurePiece : public pt_Piece
{
pt_AttrPropReference m_apr;
...
};
class pt_PieceList
{
<container for linked-list or tree structure>
...
};
class pt_AttrPropReference
{
UT_Bool m_bInInitialTable;
UT_uint32 m_index;
...
};
class pt_AttrProp
{
UT_HashTable * m_pAttributes;
UT_HashTable * m_pProperties;
...
};
class pt_AttrPropTable
{
UT_vector<pt_AttrProp *> m_Table;
...
};
class pt_ChangeRecord
{
UT_Bool m_bMultiStepStart;
UT_Bool m_bMultiStepEnd;
enum ChangeType { InsertSpan,
DeleteSpan,
ChangeFormatting,
InsertFormatting,
...
};
struct {
UT_uint32 m_documentOffset;
UT_Bool m_bAfter;
pt_Span m_span;
} span;
struct {
UT_uint32 m_documentOffset1;
UT_uint32 m_documentOffset2;
pt_AttrPropReference m_apr;
} fmt;
...
};
class pt_ChangeVector
{
UT_vector m_vecChangeRecords;
UT_uint32 m_undoPosition;
...
};
AbiWord 中Piece Table 数据结构的实现
原文:http://www.cnblogs.com/songtzu/p/3539739.html