There are times when databases, particularly large ones, need cleaning up. One aspect which should be given attention is the deletion of duplicate records. These can cause problems with the integrity of the database. For example, information could be split so that parts of it appeared on different records which were ostensibly the same. Mail-outs could include more than one copy being sent to the same addressee.
Defining a Duplicate
But what are "duplicate records"? Well, that's for the database
manager to decide. For example, duplicate records may exist in a
simple contacts database if the first name, last name and telephone
number are the same in more than one record. Duplicate records may
exist in a medical database if the first name, last name and birth
date are the same. A motor mechanic's database may contain duplicate
records if the vehicle registration number is the same.
To check for duplicate records of names and addresses, for example,
the database must have a single field that contains enough
information to uniquely identify each person and address. Obviously,
a field containing the last name, or even the full name, is usually
not adequate, and the address field is too prone to inconsistent data
entry to be a good choice. However, a field containing the telephone
number would be satisfactory as a distinguishing identifier.
Setting up Fields
In one form or another, the duplicate criterion will be based on the
concatenation of a number of fields. Let's assume that we have a
database for which we've decided on the criterion. It's a simple
contacts database, with fields for names, addresses and telephone
numbers. Figure 1 shows the fields in our file.

Of particular importance is the field "Duplicate criterion",
because it's defined as a calculation (with "text" result) based on
the concatenation of three fields which we've decided will identify
duplicity. The field definition is:
First Name & " " & Last Name & " " & Home Phone
The concatenation includes spaces only for the purpose of making it
easier to read the contents of the field - they're not essential!
We'll also need the last two fields, "gDuplicate" and "Mark?", when
doing the search for duplicates.
Figure 2 shows one of the records in our file.

The fields "Duplicate criterion", "gDuplicate" and "Mark?" would
not normally be on our main layout - we'd usually hide them, or place
them on another layout, but in this example it's convenient to have
them visible.
Writing the Script
We now need to find a way of comparing the data in the "Duplicate
criterion" field to establish which records have that data more than
once. FileMaker Pro has its own in-built facility for finding
duplicates, but we'll proceed here to do it our way because it's a
useful exercise in the use of scripting. The routine we'll use is set
out in Script 1, which we'll now set up and examine in some
detail.
Script Step / Options / Script Parameters
1 Set Error Capture
On
2 Go to Layout Layout #1
3 Find All
4 Replace Mark?
No dialog, Replace data:
Calculation: "NO"
5 Sort Sort Order:
Restore sort order: Duplicate criterion (Ascending)
No dialog
6 Go to Record/Request/Page
First
7 Set Field gDuplicate
Calculation: Duplicate criterion
8 Loop
9 Go to Record/Request/Page
Next
Exit after last
10 If Calculation:
Duplicate criterion = gDuplicate
11 Go to Record/Request/Page
Previous
12 Set Field Mark?
Calculation: "YES"
13 Go to Record/Request/Page
Next
14 Set Field Mark?
Calculation: "YES"
15 Else
16 Set Field gDuplicate
Calculation: Duplicate criterion
17 End If
18 End Loop
19 Enter Find Mode
20 Paste Literal Mark?
Select entire contents: YES
21 Perform Find
22 Sort Sort Order:
Restore sort order: Duplicate criterion (Ascending)
No dialog
23 Go to Layout original layout
In the "Script" pop-up menu, choose "ScriptMaker...". In the
"Script Name:" box in the resulting dialogue box, type "Find
Duplicate Records" and click on the "Create" button. At the next
screen, click on the "Clear All" button, and start entering the
script steps shown in Script 1.
Adding script steps is easy - just find the one you need in the list
provided and double-click on it. Then choose the options associated
with the step you've chosen. For example, if you choose the step "Go
to Layout", you can choose the name of the layout from the "Specify:"
pop-up menu. If you choose the step "Set Field", you can choose the
field to be set, and what it is to be set to, by clicking on the two
buttons presented on the screen.
In our script, Line 1 suppresses normal FileMaker Pro alert messages
- I invariably start my scripts with this line. "Set Error Capture"
suppresses normal FileMaker Pro alert messages. Once a FileMaker Pro
script is running, FileMaker Pro sends error-handling information to
the "Status (CurrentError)" function. By checking the status of the
"Status (CurrentError)" function, you can test each important step in
your script. By using the "Status (CurrentError)" function with the
control script steps, you can make sure your script performs
correctly. When you decide to suppress alerts, it's important that
you anticipate as many problems as possible, and that you include
clear directions to the user as to what to do when an error condition
is encountered.
Line 2 makes sure we go to the layout on which the fields "Duplicate
criterion", "gDuplicate" and "Mark?" are located. Lines 3 and 4 cause
the data in the field "Mark?" to be changed to "NO" in all records.
The purpose of this is to ensure that all records are marked "NO" so
that there are none which may be inadvertently still set to "YES"
from a previous search for duplicates (or any other search we may
perform which makes use of the field "Mark?").
Figure 3 shows the dialogue box for this step.

Note the choice of the third radio button, which will automatically cause the screen in Figure 4 to be revealed.

Also note that "NO" is text, and therefore needs to be in
quotation marks.
Line 5 sorts all the records by the field "Duplicate criterion" so
that records with the same "Duplicate criterion" are grouped
together. The aim of the script is now to progressively compare the
data in the "Duplicate criterion" field with the data which is in
that field in the previous record.
Starting with the first record, the script copies the value from the
"Duplicate criterion" field into the global field "gDuplicate". The
script then goes to the next record and compares the value in
"Duplicate criterion" with the value in "gDuplicate":
- If the values match, the record is a duplicate, and the script puts
"YES" in the "Mark?" field .
- If the values don't match, the script copies the new "Duplicate
criterion" value into "gDuplicate", and doesn't change the "Mark?"
field.
The script repeats this procedure until it reaches the last record in
the file. This is accomplished in Lines 6 to 18 inclusive, making use
of an "If" step inside a "Loop" routine. On the screen, these steps
would appear as in Figure 5.

In Lines 19 to 22, the records marked "YES" in the preceding steps
are found and sorted, and Line 23 returns the user to the layout from
which the script was initiated.
Running the Script
Now we can run the script. It can be initiated from the "Script"
pop-up menu, or from a button on the data layout. Having found the
sorted records which satisfy the duplicate criterion, as in Figure 6,
it's possible to proceed to examine them one by one, and to determine
whether there are any true duplicates which can be deleted, or
whether information needs to be transferred from one to another
before doing so.

Remember that deletion is final - you can't recover deleted
records! Always have a current backup of your file available just in
case you make an mistake.
As a matter of interest, Script 1 takes about ninety seconds to run
through my contacts database of about one-thousand records on my
Performa 6200 which has a 75 Mhz processor. How long did it take for
you?
The Alternative Solution
I mentioned earlier in this article that FileMaker Pro has an
in-built facility for finding duplicates. It involves the use of the
exclamation mark ("!") as the "Find" criterion. We still need to use
the field "Duplicate criterion" as the field on which to sort and
search, but we don't need to use the field "gDuplicate".
To write the script, it's simply a matter of duplicating our previous
script and calling it, say, "Find using "!"". We can then edit the
copy by choosing unwanted steps, one at a time, with the mouse button
and hitting the "Delete" key. The one change we need to make is to
set the "Find" criterion to "!". The resulting script would look like
Figure 7.

Try running it - it's much quicker!
You can download a copy of this article in PDF format from the appropriate link below. To view the file you'll need Acrobat or Acrobat Reader from Adobe Systems Inc. Any FileMaker Pro files used in this article will also be downloaded.
|
|
|
|
|
| |
|
|
|
|
|
|