
Summer 1992Volume 1, Issue 2
Go Back to DBAdvisor Main Page
In
This Issue:
By Bryan Holland
Alpha is DEC's new 64-bit RISC based architecture that
will run Open-VMS. Alpha is the future -- and VAX DBMS is
there. Engineers within Digital's Database Systems Group have
successfully ported DBMS to the ALPHA platform. Porting DBMS
to the Alpha architecture shows Digital's commitment to its
customers and to the future of DBMS.
Digital took the opportunity to show off DBMS running on
the Alpha platform during the recent DECUS and DECWorld
events.
The developers ported DBMS v4.3 to the Alpha platform. All
features, program and user interfaces and on-disk structures
remain unchanged from the VAX version of the product. DBMS is
DBMS, regardless of what platform it is running on.
Since the on-disk structures are consistent across
platforms, it will be possible to access the same VAX DBMS
database from either the VAX platform or the Alpha platform.
The Porting Effort
While "VMS is VMS", porting DBMS
to the Alpha platform was by no means a "recompile and
run" effort. Many portions of DBMS run in elevated
processor modes and make use of several low-level VMS
interfaces, thus requiring modification to run in the Alpha
environment.
All features, program and user
interfaces and on-disk structures remain unchanged from the
VAX version of the product.
One of the primary goals of the porting
effort was to maintain a common code-base (source code)
across platforms. To do this, it was necessary to eliminate
dependencies on certain VAX architectural features --
requiring a significant amount of code to be rewritten.
According to Digital engineers, over 99 percent of the DBMS
code is now common to both platforms.
Porting Your DBMS Applications to Alpha
According to engineers in the Database
System Group, porting customer DBMS applications from the VAX
to Alpha will generally require only a recompile. Customer
code generally does not make use of low-level VMS interfaces
or operate in elevated processor modes. Thus, most customer
ports will be very straight forward. However, to get the full
benefits of the Alpha architecture, customers will have to
naturally align their data structures.
Data Alignment Issues
To get the best performance from the Alpha
architecture, customers will want to naturally align the data
structures in their programs. Naturally aligned data
structures will yield performance improvements on both the
VAX and Alpha architecture's. While naturally aligning data
structures in the VAX environment will improve performance,
doing so in the Alpha environment significantly improves
performance.
Naturally Aligned Data
The port of DBMS to the Alpha
platform is proof of Digital's strong ongoing commitment to
DBMS and their customers.
Naturally aligned data, is data whose
lowest addressed byte resides at an address that is an even
multiple of the size of the item in bytes.
- a BYTE is naturally aligned at any
address
- a WORD is naturally aligned at an even
address
- a LONGWORD is naturally aligned at a
multiple of 4
- a QUADWORD is naturally aligned at a
multiple of 8
Many of the Alpha compilers will handle
data alignment issues, or issue warnings for unaligned data.
Moving customer applications to the Alpha platform will
generally be a straight forward path.
Translating VAX Images to Alpha with VEST
Digital is offering customers two migration
paths to the Alpha platform from the VAX. The first, and
usually the most straight forward, is the "recompile and
run" port. However, for customers that do not have
access to their application's source code, the VEST (Vax
Environment Software Translator) may be able to translate
their VAX images into Alpha images. Digital has been using
the VEST translator for over a year internally. VEST will
convert your VAX executables into Alpha executables. VEST is
essentially a compiler that uses VAX/VMS images as its source
code. There are two issues surrounding VEST translated images
that customers must be aware of:
- Translated images will execute on the
Alpha platform at roughly VAX speed -- They will not
be able to take advantage of the Alpha architecture.
- The ability to call privileged
sharable images (such as the DBCS) is questionable
for VEST translated images. The jury is still out on
whether VEST will provide a valid migration path for
porting your VAX DBMS applications to Alpha.
In general, customers will want to
"recompile and run" their applications to gain the
greatest benefit from the new Alpha architecture. However,
naturally aligning data will yield significant performance
improvements in the Alpha environment.
The port of DBMS to the Alpha platform is
proof of Digital's strong ongoing commitment to DBMS and
their customers. Alpha is Digital's architecture to move them
into the next century, and DBMS will be there to support your
high-performance application requirements!
Contents
By SCI Technical Staff
DBMS is a robust product with many features
and parameters. Which features should be implemented, and how
they should be implemented, varies from site to site.
Determining the optimal configuration for your environment
is not a trivial task!
This section will be reserved in future
issues of DBAdvisor for your technical
DBMS questions. SCI staff will provide answers to your
questions, and discuss alternative solutions.
SPAM Errors reported by DBO/VERIFY/ONLINE
Q: When running DBO/VERIFY/ONLINE, we receive the
following error messages:
"DBO-W-PGSPAMENT, area <str>,
page <int>; the <int>% fullness value for this
data page does not fall within the <int>-<int>%
range found on the space mgmt. page".
However, if we run DBO/VERIFY standalone on
the same database the error messages do not appear.
What are we doing wrong?
A: The problem
you are experiencing, is a timing problem. The
DBO/VERIFY/ONLINE operation uses batch retrieval (snapshot
files) to ensure a consistent view of the database during the
online verify. However, the snapshot files contain only
storage-area line entries (records), not the SPAM
pages or SPAM entries for each page.
Since the SPAM entries are updated as
records are being added, deleted, or modified during the
online verification operation, the updated SPAM values are not
being written to the snapshot files.
Although there are no perfect solutions to
this problem, there are a couple of ways that you can
eliminate or minimize the impact:
- Run DBO/VERIFY on a copy of the
database restored from the most recent backup. This
solution has the advantage that it not only verifies
the internal integrity of your database, but it also
validates the integrity of your DBO/BACKUP and media.
The obvious down-side to this solution is the time
and space required to implement it.
- To minimize the impact of this
problem in your environment, you should schedule your
DBO/VERIFY/ONLINE to run during low update
periods. This makes sense for all /ONLINE operations
since it will reduce the overhead of the /ONLINE
process.
File Access Errors
Q: When
accessing the database for CONCURRENT RETRIEVAL, we sometimes
receive the following fatal error:
DBM-F-FILACCERR, error opening storage area
file <str>; -SYSTEM-W-ACCONFLICT, file access conflict.
There were no other processes
accessing this database when the error occurs. This problem
seems to occur more frequently during the evening hours.
A: On the
basis of our analysis, the problem you are experiencing is a
conflict with the VMS system BACKUP. This would explain why
you are seeing the problem more frequently during the evening
(when your backups are scheduled).
First, you should make absolutely
certain that you are not using VMS backup to
backup your DBMS database (see Backup article in this issue).
Use the DBO/BACKUP utility to backup your DBMS database.
Next, you should ask your system manager to exclude all
database backups from the VMS system backup (/EXCLUDE=
*.DBS,*.SNP,*.ROO). In addition, SCI recommends that
customers use the DCL COMMAND SET FILE/NOBACKUP to ensure
that DBMS files are not accidentally backed (and restored) by
the system backup and restore procedures. If you attempt to
use VMS to restore a file that was marked
"/NOBACKUP", the error will be obvious -- an
empty file will be restored (VMS will backup the header
of files marked with /NOBACKUP). Always use
the DBO/BACKUP and DBO/RESTORE routines to backup and restore
your DBMS databases!
Extraneous AREABUSY exceptions
Q: Periodically, we receive the error message,
"DBM-F-AREABUSY, usage of storage area <str>
conflicts with a co-user", when we attempt to ready our
database for CONCURRENT RETRIEVAL -- An extensive review of
our code found no cases where any of the areas were
readied in an incompatible mode.
A: VAX DBMS can return the DBM-F-AREABUSY message on a
READY statement even though compatible ready modes
(CONCURRENT RETRIEVAL and CONCURRENT UPDATE) are being used.
This situation can occur when an attempt is made to ready an
area while database recovery is in progress. During database
recovery, all database activity is frozen until the DBR
recovery process has completed. When this situation arises,
VAX DBMS will not wait for the freeze to complete if the lock
request is a NOWAIT lock request. On READY statements, NOWAIT
is the default for area locks requests.
There are a couple of ways to handle this
situation:
1. Ready the database with WAIT (READY
CONCURRENT RETRIEVAL WAIT). This will make the process
issuing the ready wait for an area lock to be granted until
after a database freeze has completed.
2. Modify your programs to trap for this
exception, wait a few seconds and then try to READY the area
again.
Send
Your VAX DBMS related Questions to:
DBAdvisor
c/o Software Concepts International
Nashua, NH 03063, USA
Phone: (603) 879-9022
Internet Address:
sciinc@sciinc.com
Answers will be published in
VAX DBMS Hints, Kinks and Workarounds in
the next issue of the DBAdvisor
Contents
QuickDisconnect™
QuickDisconnect Enables
You to Reorganize Portions of Your
Database Easily and Efficiently by Removing
Cross-Area-Pointers!
Software Concepts International announced
plans to develop QuickDisconnect. According to Bryan
Holland, SCI owner, QuickDisconnect will offer
significant performance improvements over traditional 3GL/DML
(Data Manipulation Language) approaches, when disconnecting
all records from a set.
This is a perfect solution for any VAX DBMS
site that needs to reorganize only a portion of their
database. QuickDisconnect efficiently removes all
cross-area-pointers broken by an initialize after you have
unloaded the data in the "problem area".
Cross-Area-Pointers
Cross-Area-Pointers (CAPs) connect records
in one database storage area with records in a different
storage area. Most database designs include the use of CAPs,
since they offer greater flexibility and significant
performance benefits over single area database designs.
However, if you initialize a storage area with CAPs, the set
pointers in the remaining area no longer point to a valid
DBMS record -- your database is corrupt.
The Reload Dilemma
If you ever tried to reorganize a single
area in your database, you have no doubt been frustrated with
having to deal with cross-area-pointers. If the schema
definition permits, you can use the DML verb, DISCONNECT, to
remove these set relationships before initializing the
"problem area" . However, if the set is defined as
FIXED or MANDATORY, you are forced to reload and erase
additional records that are not stored within the
"problem area" -- possibly forcing you to
reload the entire database.
Traditional Methods Are
Inefficient
Using a traditional 3GL/DML
approach to this problem is grossly inefficient.
A DML DISCONNECT or ERASE verb must update at least
three records for every disconnect
performed...and every member occurrence must be
disconnected before you can initialize the problem
area! If you consider that each disconnect may result
in multiple I/Os to the database, you can see why the
DML DISCONNECT or ERASE option is very expensive.
|
|
The QuickDisconnect Solution
QuickDisconnect is a specialized
program, designed to efficiently remove all record
occurrences from a set. QuickDisconnect does this by
directly removing the pointer clusters and index nodes of the
set-type from all appropriate records in the database in a single
pass through the area. Database pages are read only once and
the affected records are updated only once.
QuickDisconnect further improves
performance by reading large portions of the database into
its buffers with each I/O, thus reducing the number of I/Os
to an absolute minimum.
Disconnects Fixed and Mandatory Sets!
QuickDisconnect disconnects sets by
directly removing the pointer clusters of the appropriate
set-types -- regardless of SCHEMA retention modes. Thus, even
Fixed or Mandatory sets can be disconnected with QuickDisconnect!
After unloading the data in the problem area and initializing
the area the CAPS are efficiently removed by QuickDisconnect.
The set-connections are reestablished using the DML CONNECT
verb -- even for set-types defined with an AUTOMATIC
insertion mode and a FIXED or MANDATORY retention mode.
Improves Availability
QuickDisconnect allows you to
partition your database reorganizations into more manageable
pieces. Being able to reorganize individual areas of your
database means less time reorganizing your database...and
that improves the availability of your database!
QuickDisconnect
is the ideal solution for any
situation that requires all members of a set to be
disconnected from their owners.
Licensing
QuickDisconnect will be licensed on
a per-node basis. Priced at only $9,995 for the distribution
license, and $4,995 for the right-to-copy license regardless
of node size.
Limited time, Special Offer!
SCI is offering a 25 percent discount for
customers who sign predevelopment POs. This offer provides
customers with significant savings with no risk.
*QuickDisconnect is a trademark of
Software Concepts International
*VAX DBMS is a trademark of
Digital Equipment Corporation
Call Software Concepts International at
(603) 879-9022 to order QuickDisconnect or ReloadExpert!
FOR
MORE INFORMATION
New England
Software Concepts
Nashua, NH 03063
(603) 879-9022
Contents
Implementing SPAMS
for Performance
By SCI Technical Staff
Space Area Management (SPAM) is a technique
used by VAX DBMS to improve the storage performance of nearly
full storage areas. The objective of SPAM technology is to
allow the DBCS (Database Control System) to quickly locate a
page with sufficient free space to store new records in your
database. SPAM pages maintain an inventory of available space
on database storage area pages. They enable databases with
SPAM pages to spend less time searching for free in your
database during STORE operations. Without SPAM
technology, the DBCS would have to search each page in the
storage area until it finds a page with sufficient free space
to store the new record.
This article examines the inner workings of
SPAM technology. By understanding the purpose of SPAMs and
how they work, you can make better decisions when
implementing SPAMs in your database design. You will also
learn why decreasing SPAM thresholds to a tiny number in
versions prior to V4.3 didn't result in a decreased level of
record fragmentation.
The optimal SPAM values for your database
storage areas depend on the size of the records stored in the
area, the desired record placement strategy, the level of
database contention in your database, as well as the
"typical" order in which records are stored on a
database page. For example, storing a large record followed
by several smaller records requires different strategy than
if the small records are stored first, followed by the large
records.
SPAM pages maintain a bit-mapped inventory
of the free space available on the storage pages in your
database. Two (2) bits are used to track the free space of
each storage page in your database.
SPAM
Bits |
Decimal |
Threshold |
Default |
| 0 0 |
0 |
0 |
0 |
| 0 1 |
1 |
1 |
70 |
| 1 0 |
2 |
2 |
85 |
| 1 1 |
3 |
3 |
95 |
Without SPAM technology, the DBCS
would have to search each page in the storage area until it
finds a page with sufficient free space to store the new
record.
Thus, the free space on each database page is represented
by a value between zero (0) and three (3), inclusive. These
values correspond to the threshold values you set when you
create the database (or when you restore the database using
DBO/RESTORE/THRESHOLD). The SPAM entry for a page is updated
whenever the amount of data on that page crosses a threshold
boundary.
It is important to note, that while the DBCS uses the SPAM
pages to locate pages with sufficient free space, the
threshold values of the SPAM pages are based on used space.
The SPAM algorithm
(pre v4.3):
The DBCS first checks the target page
for free space, regardless of SPAM thresholds. The
target page is the page in the database that the DBCS wants
to store the new record based on the PLACEMENT clause
in the storage schema. If there is not enough space available
on the target page for the new record, then the DBCS checks
the other pages in the current database buffer for a
page with sufficient free space to store the new record. The current
database buffer is the buffer that contains the target
page (as well as additional database pages based on the
buffer length). The DBCS only checks the SPAM pages to
look for free space when none of the pages in the current
buffer contain sufficient free space to store the new record.
Thus, with versions of DBMS prior to version 4.3, it is
possible to exceed the third threshold (T3) when storing new
records. That is, there was no way to define the maximum
fullness of the database pages for storing new records. This
could cause excessive fullness of the database pages,
resulting in severe record fragmentation when the DBCS adds
set pointer clusters to records during a CONNECT operation...regardless
of SPAM values!
The SPAM algorithm
(v4.3):
The SPAM algorithm has been enhanced
slightly for version 4.3 of DBMS. While earlier releases of
DBMS would store a new record on any page in the buffer
containing the target page regardless of SPAM values,
version 4.3 will not store a record on a page that is already
at the T3 threshold.
Sites that calculated SPAM thresholds based
on the old (pre v4.3) algorithm should adjust their SPAM
thresholds to reflect the new SPAM algorithm introduced with
version 4.3.
While this change offers some improvement
over the prior SPAM algorithm, what database designers really
need is the ability to specify how much free space to leave
on a page. Thus, by knowing how many records will be stored
on a page and how much each record is likely to grow (due to
set insertions and dynamic data items), it would be possible
to reserve this space on the page for future record
expansion.
SPAM parameters
There are two database parameters that
affect Space Management; the threshold values, and the
interval. Selecting optimal values for SPAM parameters
can dramatically improve database performance -- Using the default
values are guaranteed to be a poor choice.
The threshold values determine the percent
fullness at which the SPAM entries are updated, while the
interval determines how many database storage pages are managed
by each SPAM page.
SPAM Interval
Selecting optimal values for SPAM
parameters can dramatically improve
database performance
Using the default values are guaranteed
to be a poor choice.
The /INTERVAL qualifier determines how many
data pages are managed by each SPAM page in a database
storage area.
| |
Interval
|
|
Interval
|
| S
|
d |
d |
d |
... |
d |
d |
d |
S
|
d |
d |
d |
... |
d |
d |
d
|
| P
|
a |
a |
a |
... |
a |
a |
a |
P
|
a |
a |
a |
... |
a |
a |
a
|
| A
|
t |
t |
t |
... |
t |
t |
t |
A
|
t |
t |
t |
... |
t |
t |
t
|
| M
|
a |
a |
a |
... |
a |
a |
a |
M
|
a |
a |
a |
... |
a |
a |
a
|
| SPAM
Interval Diagram |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Increasing the SPAM interval may reduce I/O, but may cause
additional contention for SPAM pages in a high contention
update environment.
The default interval for SPAM pages is 256. This means
that, by default, 256 database storage pages are managed by
each SPAM page. This is also the minimum value that you can
specify for the SPAM interval. The maximum SPAM interval
value is determined by the following equation:
MAXIMUM_SPAM_INTERVAL = ((blocks-per-page * 512) - 22) * 4
The value of 512 is the number of bytes per block. The
value of 22 is the number of bytes in the SPAM page header,
and the value of 4 is the number of SPAM entries (two bits)
per byte (eight bits).
In general, the larger your storage area, the larger the
SPAM interval value should be. This methodology optimizes
SPAM effectiveness by reducing the free space search time,
while balancing the likelihood of SPAM page conflict. Low
contention environments should use the maximum interval
value, while high-contention update environments
should lower their interval values.
The Optimal
SPAM Values can significantly improve database performance!
Determining what the "optimal" values are is a difficult
task!
Software Concepts
International staff is the recognized authority in VAX DBMS
technical issues.
Call (603) 879-9022 for
VAX DBMS Technical Consulting!
The trade-off when setting SPAM intervals
is data page I/O (searching for free space in which to store
new records) versus SPAM page conflicts (which results in
additional synchronization and I/O to the SPAM page). DBMS
provides statistics to help you measure the performance
impact of both sides of the SPAM interval equation.
Interval Statistics
The best measure of SPAM page contention
can be seen on the PIO statistics page of DBO STATISTICS. The
statistic, "unmark buffer -- SPAM page", measures
how often SPAM pages are being written back to the database
(SPAM page write I/Os). However, this statistic is not a
direct measure of SPAM page contention, since SPAM pages are
written back to the database due COMMIT/ROLLBACK, buffer
overflows as well as contention.
PIO Statistics
To make sense of the "unmark buffer --
SPAM page" statistic, compare this statistic with the
other statistics in the "unmark buffer" group.
If the "unmark buffer -- SPAM
page" statistic is significantly higher than the
"unmark buffer -- transaction", this is good
indication of SPAM page conflicts.
Another measure of SPAM page contention can
be found by comparing the "unmark buffer -- SPAM
page" with the "unmark buffer -- lock
conflict" statistic. However, use caution when making
this comparison -- the "unmark-buffer -- lock
conflict" statistic includes all page
buffers -- not just the SPAM pages.
SPAM page contention also shows up in the
page locking statistics. Increased SPAM contention would
cause an increase in blocking ASTs for page locks. However,
since the page locks also manage the entire buffer
pool, it is difficult (if not impossible) to determine the
severity or cost of SPAM page conflicts from this statistic.
SPAM Thresholds
You can specify up to three threshold
values for the /THRESHOLD qualifier. It is possible to set
two or more of the threshold values to the same number, thus
effectively eliminating one of the threshold levels. If you
have only one or two record-types of different lengths in
your storage area, use only one or two threshold values to
manage space in that storage area. Reducing the number of
thresholds may reduce the frequency in which the SPAM
pages are updated, thus reducing SPAM page conflicts. By
default, DBMS sets three thresholds, with the values 70, 85,
and 95 percent.
SPAM threshold values are based on the
percentage of used space on a database page. However,
to effectively plan your threshold values, you must think of
threshold values in terms of how much free space is guaranteed
to exist on a database page for each threshold value. The
following table helps to illustrate this concept:
Threshold |
SPAM
Value |
Guaranteed Percent Free |
| 0 |
0 |
30% |
| 1 |
70 |
15% |
| 2 |
85 |
5% |
| 3 |
95 |
0% |
Thus, if you attempt to store a record that occupies 65%
of the database page, the DBCS will check every page that is
at the T0 threshold. The DBCS will not waste time checking
those pages whose threshold values exceed the T0 level, since
they do not guarantee sufficient free space for
the new record. Note, that in this example, it is still
possible that the DBCS will physically examine several
database pages since the lowest threshold value (70)
guarantees only 30% free space, while 65 percent is required
to store this new record. Thus, if most or all of the pages
in the storage area are above 35% fullness, the DBCS will
yield little or no benefit from the SPAM thresholds. An
approximation of the efficiency of this threshold value can
be measured by using the following equation:
SPAM_efficiency =
SPAM_threshold - (100 - pct_needed)
SPAM_threshold - SPAM _ threshold (-1)
= 70 - (100 - 65) = 50%
70 - 0
In this example, only 50 percent of the pages at the T0
threshold will have sufficient space to store the new record!
This equation is only approximate, since it assumes a
normal distribution of page percentage fullness. In practice,
the actual distribution page fullness tends not to
follow a normal distribution.
The objective when selecting
thresholds, is to choose values that will minimize the amount
of physical I/O necessary to store new records in the
database.
The objective when selecting thresholds, is to choose
values that will minimize the amount of physical I/O
necessary to store new records in the database. Since the
record size, placement strategies, and update activity is
likely to vary from area to area you must calculate and use
different threshold values for each area in your database.
SPAM Algorithms
To determine the optimal threshold values,
we will need to understand more about the free space search
algorithm used by VAX DBMS.
MAX_FREE_LEN...
The actual amount of space that SPAMS are based on is
called the "maximum free length". This value
represents the amount of space available on a database page
for storing user records (page size less page overhead). The
calculation for the max_free_len is different for areas with
snapshots and those areas without snapshots.
Pages with snapshots:
MAX_FREE_LEN =
(blocks-per-page * 512)
- (24 + 4 + 4 + 18 + 10)
or
(blocks-per-page * 512) - 60
Pages without snapshots:
MAX_FREE_LEN =
(blocks-per-page * 512)
- (24 + 4 + 10)
or
= (blocks-per-page * 512) - 38
The value of 512 is the number of bytes per
block. The value of 24 is the number of bytes in the database
page header. The first value of 4 is the number of bytes per
line-index entry, the second value of 4 (pages with snapshots
only) is the number of bytes per TSN (Transaction Sequence
Numbers are used by snapshot transactions) index entry. The
value of 18 (pages with snapshots only) is the number of
bytes in the page footer. The value of 10 is the minimum
number of bytes reserved for the SYSTEM record on each page.
Thus, any threshold value
greater than 90% (such as the default T3 value of 95%) is
useless!
SPAM thresholds are percentages of
MAX_FREE_LEN, and are always rounded down to the next even
byte. Thus, the MAX_FREE_LEN of a storage area with snapshots
and a page size of 2 blocks per page is (1024 - 60) or 964
bytes. Using the default thresholds of 70, 85, and 95, the
actual threshold byte-values are 674, 818, and 914
respectively. For an existing database, you can confirm these
values by issuing the following command:
$ DBO/DUMP root-file-spec/OPT=DEBUG
For each storage area, look for the
following symbols:
SPAM_T1 = 1st SPAM threshold in bytes
SPAM_T2 = 2nd SPAM threshold in bytes
SPAM_T3 = 3rd SPAM threshold in bytes
Determining required space...
Next, we need to understand how much space the DBCS will
look for when storing a new record. The DBCS uses the
following equation to determine the required space needed to
store the new record:
REQUIRED_LEN =
MAX(record-len + 5, 10)
+ PAGE_PAD_LEN
The value of record-len is the size of the static
data portion of the record, as defined by the storage
schema. It is easy to determine this value for each of the
record-types in your database by using
DBO/DUMP/STORAGE/OPTION = DEBUG and searching for the value
of the "Static data portion length" of each
record-type.
PAGE_PAD_LEN...
The DBCS actually reserves a portion of each database page
(based on the PAGE_PAD_LEN) for future record growth on the
page. The value of PAGE_PAD_LEN is currently calculated as 10
percent of the physical page size (blocks-per-page * 512) *
0.10 -- Thus, any threshold value greater than 90% (such as
the default T3 value of 95%) is useless!
The value of PAGE_PAD_LEN for each area can be verified by
dumping the database root file as described above. Search for
the value of PAGE_PAD_LEN for each storage area in your
database.
As stated previously, the value used by the
DBCS for the record length, is the length of the static
data portion of the record. This value can be verified using
DBO/DUMP/STORAGE/OPTION = DEBUG and looking for the value of
the "Static data portion length" for each
record-type in your database. This value does not include the
length of known (automatic) set pointer clusters, or the
length of dynamic storage items. In other words, the
DBCS does not use the actual record-length of the storage
segment -- the value used by the DBCS will always be
less than the actual record length.
When looking at the SPAM pages, the DBCS
looks for the largest SPAM threshold (in terms of bytes) that
is less than or equal to MAX_FREE_LEN - REQUIRED_LEN.
Pages at the T3 threshold are never
searched, since they cannot guarantee any free space.
Pages at the T0 level are always searched, since they
guarantee the "most" free space.
The following SPAM Selection Example Table
illustrates what threshold value DBCS will search for using
the default thresholds for a database area with snapshots and
a page size of two blocks per page, and assuming a page pad
length of ten percent (102 bytes):
Threshold Statistics
The objective when setting threshold
values, is to minimize the number of pages the DBCS searches
when storing new data. DBMS provides several statistics that
can help you measure the effectiveness of your SPAM
thresholds.
Record Statistics
The Record Statistic, Pages Checked, is the single
most useful statistic for determining the overall
effectiveness of your SPAM thresholds. This statistic shows
the number of data pages that were checked before the DBCS
found a page with sufficient space to store the record. By
comparing the Pages Checked statistic, with the Records
Stored statistic, you can get a very accurate picture of
the effectiveness of your SPAM thresholds. In a well-tuned
database, Pages Checked should be no more than 2 times
Records Stored.
PIO Statistics
The PIO Statistic, "SPAM Page Request", provides
one measure of the effectiveness of your SPAM strategy. A
high number of SPAM page requests relative to the database
activity may indicate that your SPAM interval is too low, and
that the DBCS is having a difficult time locating free space
for new records. This number will also increase when the
storage area becomes over utilized.
| SPAM Selection Example
|
Threshold
Level |
Max Free Bytes |
Max Free Len |
Guar
Free
Bytes |
Page
Pad
Len |
Max Record Size |
When Threshold is used |
| T3 |
914 |
964 |
0 |
102 |
= -107 |
Never Searched!
|
| T2 |
818 |
964 |
50 |
102 |
= -57 |
Never Searched!
|
| T1 |
674 |
964 |
146 |
102 |
= 39 |
Searched only if the
record-len <= 39 |
| T0 |
0 |
964 |
290 |
102 |
= 183 |
Always searched
|
| GUARANTEED_FREE_BYTES
= Max_Free_Len - Max_Free_Bytes(T-1) PAGE_PAD_LEN = 10% of Page size in bytes
MAX_RECORD_SIZE =
Guaranteed_Free_Bytes - Page_Pad_Len - 5
|
Specialized
VAX DBMS training Seminars
from the leading VAX DBMS support specialists!
- VAX DBMS Database Administration (5 days)
- VAX DBMS DBA Performance (3 days)
- VAX DBMS Internals & Data Structures (4 days)
- VAX DBMS Programming (5 days)
- VAX DBMS Advanced Programming (3 days)
Call Software Concepts International
today
for seminar registration and information!
(603) 879-9022
Contents
By Digital Equipment
Corporation's
Database Engineering and Documentation Group
VAX DBMS provides backup and restore
utilities that offer many different backup options. A
well-constructed database backup and recovery strategy will
include a combination of these functions selected to match
the needs of your particular environment. There is no single
backup strategy that works best for all configurations. The
wrong strategy could even cost more than it is worth.
There is no single backup strategy
that works best for all configurations. The wrong strategy
could even cost more than it is worth.
A proper database backup strategy should
strive to achieve a balance between minimizing operational
overhead and maximizing database integrity. Over the life of
a database, you will perform many more backups than restores.
Since you cannot predict when a disk device will fail, or
when a database will become corrupt, you must continually
backup your data as it changes. This article contains
suggestions and recommendations on making frequent database
backup operations easier and infrequent restore operations
more reliable.
Rules of a Strategy
A database backup strategy should be
customized for your environment. It should take into
consideration the system workload, usage schedule, importance
of data, and hardware environment of the database. However,
there are some guidelines that apply to all databases.
Incorporating these guidelines into your strategy will ensure
more reliable and more cost effective backups.
Digital strongly recommends the following
guidelines:
- Always use after-image journal (.AIJ)
FILES.
- Always maintain a log of your backup
schedule.
- Occasionally store backups in a
separate location.
- Do not use the VMS Backup
Utility on a database.
- Spread your database across several
disks.
- Always consider your existing hardware
configuration and system workload.
- Determine the volatility and value of
your data.
- Verify the integrity of your database
before each full backup.
- Verify the integrity of your backup
media after each backup.
Always use after-image (AIJ) files
Digital strongly recommends the use of
after-image journaling. As data modifications are made to
your database, an .AIJ file captures the operations each
transaction makes. This provides exceptional recovery
capabilities because you will be able to restore your
database to the same state it was in before the corruption or
hardware failure occurred. Considering the balance between
cost effectiveness and data reliability, .AIJ files typically
cost 5-10% (or less) in overhead but provide essentially a
continuous backup of your transactions.
You should not enable and disable
after-image journaling at different times. To be effective,
journaling must be used consistently. Leaving
"holes" in your journal file defeats the purpose of
this option. An .AIJ file should always be on a disk separate
from all associated database files. If this disk is lost, you
should restart the journaling and then immediately perform a
full database backup. If the disk with the root file or
storage area files is lost, you restore the lost files from
backup and then use the .AIJ file to recover the most recent
transactions. If either the root file and .AIJ files, or the
storage area and .AIJ files were on the same disk, you will
lose data if the disk failed.
NOTE:
After enabling after-image journaling, you should immediately
perform a full database backup.
Failure to do so may result in a non-recoverable database.
The database reference manuals contain
additional information on backing up and recovering .AIJ
files.
Always maintain a backup schedule and log
A well-designed database backup and
recovery strategy is useless if the strategy is not
publicized and practiced. Operational personnel need to be
aware of the importance of the backup methodology, and they
need to be able to quickly access backup media in the event
database recovery is necessary.
Accurate record-keeping and media storage
are essential to a proper backup and recovery strategy. The
backup schedule should include dates and times, type of
backup and tape labels.
Occasionally store backups in an offsite
location
Backups are performed to facilitate
database restoration, if your primary media fails, once the
problem is corrected. There are many possible sources of
failure of your primary media, and some of these sources must
be considered in your backup strategy. Also, always remember
that the data in your database is much more valuable than the
physical media it is stored on.
If a site catastrophe destroys your disks,
would your backup media still be safe? While it may be
convenient, leaving your backup tapes on top of the tape
drive is not a good idea. You should store you backup media
in a separate location -- whether offsite or just another
area of the building. Although offsite storage is the safest
method, access to an offsite archive will take longer than
one on-site, and there may be security issues about removing
confidential data from your site. A compromise might be to
send backups offsite only once a month. Traditionally, you
send the previous month's or week's backup offsite, keeping
the most recent backup in a safe location on-site.
Magnetic tapes are an inexpensive,
reliable, and reusable backup media. The environmental
parameters for tapes are often broad and sometimes can lead
to abuse. The audio tapes stored in your car can develop some
hiss, and the video tapes in your home can drop out some
colors, but if your backup tapes are damaged in this way,
they are useless. You must follow the manufacturer's
instructions on temperature, humidity, magnetic fields, and
tape life span. A specially constructed tape library or
archive is the safest storage strategy.
NOTE:
Tapes age and become less reliable over
time. A good backup strategy precludes your ever needing to
gamble on restoring a set of five-year-old tapes.
Do not use the VMS Backup Utility on a
database
In the earliest versions of VAX DBMS, the
VMS Backup Utility was a legitimate alternative to the
database Backup and Restore Utilities. Today, there are many
reasons not to use the VMS backup utility. For
example:
- VMS backup does not consider database
free space and wastes time and tape backing up unused
space in the database.
- VMS backup does not understand the
relationships between files. Databases spread across
several devices would need special attention to
ensure that the correct database root file, .AIJ, and
storage area files are backed up safely and that they
are synchronized.
- VMS backup does not provide the means
for a database to be backed up safely when the
database is in use. You would need to shut down your
database during backups to ensure that no
transactions occurred during the backup operation;
this is clearly unacceptable for 7x24 applications.
- VMS backup may lock database
files, thus preventing access to the database for
large periods of time.
In the earliest
versions of VAX DBMS, the VMS Backup Utility was a legitimate
alternative to the database Backup and Restore Utilities.
Today,
there are many reasons
not to use the VMS backup utility.
The multi-threaded database backup utility
is designed specifically for VAX DBMS and Rdb/VMS databases.
Digital recommends that you always use the multi-threaded
backup capabilities of the product when backing up a database
to tape because the utility:
- Operates faster than VMS Backup
because it has been specifically optimized for
databases.
- Uses less tape than VMS backup because
it does not backup free space.
- Understands file dependencies and
relationships.
- Can write to multiple tape devices
simultaneously, reducing the total amount of backup
time.
- Allows backups to be accomplished
without shutting down the database.
The ability of the database backup utility
to write to multiple tape drives simultaneously depends on
the degree of concurrency (i.e., the number of simultaneous
writes) the number of drives, TCUs (controllers), HSC STI
ports and how they have been cabled together.
Simultaneous writing to multiple tape
derives also depends on specific versions of VAX DBMS and
Rdb/VMS. The following table identifies what capabilities
exist in the various versions of the database products.
| DBMS |
Rdb |
Tape Drive Capability |
| V4.3 |
V4.1 |
Will give you the
option of overriding the auto-configuration done by
the backup utility. |
| V4.2 |
V4.0 |
Will treat all drives
that appear on the same HSC as not capable of
simultaneous writes. |
| V4.1 |
V3.1 |
Does not explicitly
support TA9x drives, although the drives may work to
some degree |
Never use VMS file operations, such as
COPY, RENAME and DELETE on database files.
Multi-threaded backup uses the same error recovery and
group redundancy as the VMS Backup utility. Considering all
factors in the database environment, it is significantly more
reliable than VMS Backup.
Never use VMS file operations such as COPY, RENAME and
DELETE on database files.
Always spread your database across several
disks
A backup strategy should provide guidelines
on backups and restores of your database. In addition, your
overall database strategy should include safeguards for
protecting your data from becoming corrupt and needing a
restoration. While you cannot predict hardware failures, you
can minimize their effects.
By spreading your database across several
disks, you can eliminate having a single point of failure.
Even if your database is not large enough to require multiple
disks, Digital strongly recommends putting the root file on a
disk separate from the storage are disks, and you should always
put the .AIJ files on a separate disk from the rest of the
database. In addition, balancing busy storage areas between
different disks is often a good tuning suggestion, but it
also makes sense from a safety point of view.
The VAX DBMS database reference manuals
provide additional information on backing up individual
storage areas of a database.
Whether you backup you database to tapes or
to disk, your backup schedule could overlap the routine VMS
backups of your system (especially if your database files
share their disks with other users.) Because VMS backups
often run at a high priority, you should consider waiting
until they are finished before starting a database backup.
There will be less chance of mixing up tapes, less contention
for the disks, and, if your system has multiple tape drives,
you can allocate all of them to a multi-threaded database
backup.
Always consider your existing hardware
configuration and system workload
The decision to backup your database either
to tape or to disk will be determined primarily by your
hardware environment and by personal preference.
There are a few suggestions to consider:
- Rather than backup to disk and then to
tape, it is often faster and more reliable to backup
directly to tape.
- Multi-threaded backup to tape is
usually faster than a single-threaded backup to disk.
If you can allocate two tape drives, the backups
complete in about half of the normal time. Using
three drives completes the task in about a third of
the time. In some configurations, the speed of
multi-threaded backups is limited more by the VAX CI
(Cluster Interconnect) port than by the speed of the
tape drives.
- Historically, disk media is more
reliable and faster than tapes. However,
multi-threaded backup uses redundancy blocks which
are more reliable than the single verify pass you get
with a disk.
- Tapes are cheaper and easier to store
than removable disk media. You can afford to keep
more old backup versions around before you recycle
the tapes. Try to maintain a minimum of three (3)
versions of full database backups, since a single
past version is not sufficient for a reliable backup
strategy.
- If you backup to non-removable disk
media, you can probably only keep one past set of
backups, which is also insufficient for a reliable
backup strategy.
Determine the volatility and value of your
data
The volatility of your database refers to
how often data changes. If you can determine what data is
changing, and how often, you can create a strategy based on
backing up the most frequently changed storage areas more
often. For example, the data in an inventory area could be
updated several times an hour, an employee field could be
updated at the end of every shift, and a list of suppliers
might only change every few months. Determining and mapping
this workload will help determine the needs of your site.
The time required to complete incremental
backups, and especially storage area incremental backups, is
a good indication of changes to a database. Consider that a
percentage of data in a database changes daily. Calculate
this percentage and perform a full backup when 10% of a
database has changed. While it is difficult to calculate this
figure, it is much simpler to observe that when incremental
backups take longer to complete each day, a greater
percentage of a database has been changed since the last full
backup. In fact, because of the extra calculations involved,
an incremental backup could take longer than a full backup. A
large incremental restore can take much longer than a full
restore. Through observation, you should be able to determine
the break-even point for performing a full backup rather than
an incremental backup.
For backup to tape, the
multi-threaded backup is optimized to proceed at
whatever speed the hardware configuration will allow.
Allocating more tape drives
decreases backup times proportionally.
In addition to the frequency of backups, a
backup strategy should also consider the differing daily
usage of a database. For example, a typical backup schedule
might include incremental backups every night and a full
backup on Friday night. However, if an examination of your
usage shows that most changes are made on Wednesdays, they
you should consider moving the full backup to Wednesday
night. Similarly, if your database is accessed by users in
multiple time zones, you might consider starting backups an
hour or two later when the activity level has quieted down.
Scheduling backups around user activities causes less impact
on the users and results in more efficient backups.
Verify the integrity of your database
before each full backup
A backup strategy is useless if the
database being backed up is corrupt to begin with. VAX DBMS
contains online utilities to verify the correctness of a
database; these tools should be used prior to starting the
database backup operation. No database backup operation
should be allowed to proceed if the database has not been
verified as consistent in all respects.
Verify the integrity of your backup media
after each backup
One of the most common mistakes made is not
verifying the integrity of the database backup media; at
least, not until the backup is needed to be restored and then
it is too late.
After a database is backed up, the backup
media should be verified. Verification ensures:
- that the backup strategy works as
intended.
- that the backup media is verifiable.
- that the backup media is restorable.
There are several methods available to
verify the integrity of the database backup media. However,
only one method is guaranteed to accurately reflect the
integrity of the backup media; The actual restoration or
recovery of a database. Anything else is merely verifying the
media, not the contents of the media.
NOTE
Never, ever, test the database
restoration procedures by first deleting the source database.
Restore or recover the database to a distinct file system,
preferably on a separate machine.
Speeding Up Backups
For backup to tape, the multi-threaded
backup is optimized to proceed at whatever speed the hardware
configuration will allow. Allocating more tape drives
decreases backup times proportionally. You can also save some
operator time by formatting all of your tapes ahead of time.
For backing up a database directly to disk,
use the single-threaded backup utility, and set up RMS to
have a large extend size. This reduces both fragmentation and
the need to repeatedly expand the backup file. For example,
RMS can allocate 50,000 blocks much faster than it can
allocate 5000 blocks 100 times. This extend time becomes
significant when compared to the amount of time for a simple
write operation. You want to minimize the number of times the
file extends while not setting it too large for your
available disk space. If you can accurately calculate the
final size of the resulting backup file, you can use the SET
RMS_DEFAULT command in DCL to optimize the default extend
size.
For example, assume that a backup file is
1,005 blocks. You cannot directly specify the initial size of
the file, but it is typically quite small (typically 3 blocks
in most VMS systems). If you set the default extend to 1,000
blocks, the backup utility starts writing, gets to the
end-of-file (after 3 blocks) and extends the file to 1,003
blocks. The backup utility again gets to the end-of-file and
expands the 1,003 block file by 1,000 blocks, giving 2,003
blocks. When the database backup finishes, the file is
truncated down to 1,005 blocks, but while the file is being
written, 998 wasted blocks are allocated.
It is sometimes possible to estimate the
backup file size by analyzing your database to determine how
full each storage area is. For example, use the space
utilization summary pages generated by VAX DBMS
DBO/ANALYZE/NOSET command.
NOTE
This method does not always provide a
valid estimate. For example, consider what happens if you
have a storage area with 100 pages, and there is one record
on each page: Every page will be backed up. If you take that
same area and put all 100 records on one page, only that one
page will be backed up. The storage area is just as full in
both cases.
A simpler, and more reliable, method is to
check the final size of the backup file the next time you
perform a full database backup. If you set up your backups to
run from a command procedure, be sure to consider future
growth in your calculations.
Contents
Is
Database Performance
Slowing You Down?
Software Concepts International
technical staff can help you identify and eliminate database
bottlenecks!
Haven't you waited
long enough?
Call (603) 879-9022
and stop the wait!
ReloadExpert™ v1.1
New England Software Concept's ReloadExpert
version 1.1
offers many new features at a new lower price!
SCI has enhanced its ReloadExpert
optimizing reload script generator with the introduction of
version 1.1. This new version offers enhanced functionality
at a new lower price!
The Need for Reloads...
Periodically reloading your VAX DBMS
database is needed to maintain optimal database performance.
Productivity Tool
ReloadExpert
automates the reload process and ensures that your reload
strategy is truly optimized and always up-to-date! Imagine creating a truly optimized VAX DBMS reload
routine in a matter of minutes!
Minimize Time to Reload
Reloading a production database can be a
time-consuming process. You cannot afford an inefficient
reload strategy.
ReloadExpert minimizes the time
required to reload your database by producing truly optimized
reload scripts and by making optimal use of system resources.
ReloadExpert recommends optimal SYSGEN and UAF values
for reloading your database based on your system's
resources.
Accurate Reload Scripts
Your business depends on the data in your
database. An erroneous reload script can seriously damage the
integrity of your data. With ReloadExpert you can rest
easy knowing that every set-type in your database has been
carefully analyzed to ensure that the proper relationships
will be maintained during the reload process.
Automatic Restarts
Imagine creating a truly
optimized VAX DBMS reload routine
in a matter of minutes!
Should your system fail during an unload or
load processes, ReloadExpert can automatically
restart the unload or load at the point of failure, thus
saving you from restarting the reload from the beginning!
Complete Set of Tools
ReloadExpert
generates everything you need to efficiently reload your
database, including:
Load and Unload Format Language File
(LFL/UFL); defines the record format of the unload files, and
all set-significant information needed to maintain proper set
relationships.
- Load Sequence Language files (LSL);
defines the order in which record-types will be
loaded into the database.
- Unload Sequence Language files (USL);
defines the order and method by which your database
will be unloaded. Moves the appropriate
set-significant information to the output records to
maintain proper set relationships during the reload.
- Reload Control file (CTL); defines the
location and order in which the unload and load
sequence files are executed.
- Optimized Reload Subschema (DDL); ReloadExpert
even produces a subschema that is optimized for the
reload process.
- Unload your source database with a
single command!
- Load you target database with a single
command!
Tape Support
ReloadExpert
now supports the use of tape for reloading your database. You
can unload portions of your database directly to tape, and
then use the same tape during the load process.
Parallel Unloads
An optimized reload routine means less
down-time. ReloadExpert allows you to unload portions
of your database in parallel, thus reducing the overall
reload time.
Prolog and Epilog routines
You can now write your own routines that ReloadExpert
will execute immediately before and/or after each unload and
load sequence to customize your reload. You can write
routines to mount tapes, backup portions of your database,
alter the placement of the unload RMS files, and many other
uses!
User Defined Keys
Some database schema definitions do not
guarantee unique keys for owner-record occurrences. ReloadExpert
allows you to augment the schema definition by specifying
User Defined Key (UDK) files that ReloadExpert uses to
generate the reload scripts. In fact, ReloadExpert
will even generate a template UDK file that can be easily
edited to meet the unique needs of your database.
DBO LOAD facility
ReloadExpert generates scripts for
the DBO/UNLOAD and DBO/LOAD facility.
This provides the following advantages:
- DIGITAL support of the underlying
reload facility
- Enters the DBCS at a lower level than
DML programs, thus is more efficient than traditional
DML programs.
Licensing
The ReloadExpert distribution
license is only $4,995. Additional nodes can be licensed with
the right-to-copy license for only $2,495. These are one-time
license fees, and are independent of node-size.
*VAX DBMS is a trademark of
Digital Equipment Corp.
*ReloadExpert is a trademark of
Software Concepts International
FOR
MORE INFORMATION
New England
Software Concepts
Nashua, NH 03063
(603) 879-9022
Contents
| Phone |
USA Toll-free (Sales) |
Fax |
e-mail |
| (603) 879-9022 |
1-888-CODASYL |
(603) 879-9023 |
Copyright © 1996 Software Concepts
International Inc. All rights reserved.