4GL Problem for Las Vegas Fall 1990 DECUS Symposium
           
           
                                  Table of Contents
           
           
          1. Introduction ......................................          1
          2. Distribution of data ..............................          1
          3. Description of data ...............................          2
          3.1 MASTER Record ....................................          2
          3.2 DETAIL Record ....................................          3
          3.2.1 DETAIL_BINARY Record ...........................          4
          3.2.2 Distribution of DETAIL Records .................          4
          3.3 FUNDCODE Table ...................................          6
          4. Description of Problem ............................          7
          5. How data was generated ............................         10


                       Instructions for Las Vegas (Fall 1990)
                                     4GL Problem
           
                                         by
                      Joe H. Gallagher, Ph. D., Managing Editor
                           Wombat Examiner & 4GL Dispatch
                                816-276-4068 days    
                                913-894-9550 evenings
           
          1. Introduction
           
          The 4GL  Problem  Panel  at  the  New  Orleans  1990 Spring DECUS
          Symposium was a very well  received  presentation.  A 4GL Problem
          Panel is scheduled for the  Las  Vegas 1990 Fall DECUS Symposium.
          The problem for Las Vegas builds  on and extends the problem from
          New Orleans.  The  problem  for  New  Orleans was to browse, both
          forward and backwards, through  a  set of "people oriented" data.
          The problem for Las Vegas  extends  the  browse to include both a
          MASTER set and a DETAIL set of  data.   The data is modelled on a
          MASTER set of donors with  the  DETAIL  records of the amount and
          date of their donations.  In  addition to browsing the data base,
          certain processing (ADD, MODIFY, and  DELETE) is required for the
          solution of the problem.  One  optional report is also part of of
          the solution.
           
          2. Distribution of data
           
          Data for the Las Vegas (Fall  1990) 4GL Problem is distributed on
          one 1600 bpi magtape in  VMS  Backup  format  or on two 5.25 inch
          high density (1.2  Meg)  IBM  diskettes.    Other  formats may be
          available by calling me at one of the numbers above.
           
          The VMS Backup  save  set  name  is  VEGASPROB.BCK;  the tape was
          created with the command:
           
                  BACKUP/VERIFY/LOG *.* MSA0:VEGASPROB.BCK/SAVE_SET
           
          The files in this distribution are:
           
                   Filename                   Description
           
              AAREADME.TXT        This   file   containing   the   complete
                                  description of the  data and the problem.
                                  This file was created using MASS11 (so it
                                  is actually in stream format).
              MASTER.DTR          The DATATRIEVE  record  definition of the
                                  master file.
              MASTER.DAT          The  master  record  data;  5518 records,
                                  each record  is  75  bytes  long,  in 820
                                  blocks.
              DETAIL.DTR          The DATATRIEVE  record  definition of the
                                  DETAIL file.


                                        - 1 -


              DETAIL_BINARY.DTR   The DATATRIEVE  record  definition of the
                                  DETAIL with DATE_OF_DONATION in the usual
                                  64-bit binary format.    This file is not
                                  on the IBM Diskette distribution.
              DETAIL.DAT          The detail record  data  - 36027 records,
                                  each record  is  21  bytes  long, in 1549
                                  blocks.       (In    the   IBM   diskette
                                  distribution, this file  is on the second
                                  diskette.)
              DETAIL_BINARY.DAT   The detail record  data  - 36027 records,
                                  each record  is  23  bytes  long, in 1689
                                  blocks.   This  file  is  not  on the IBM
                                  Diskette distribution.
              FUNDCODE.DTR        The  record  definition  of  the FUNDCODE
                                  table.
              FUNDCODE.DAT        A table which  maps  the FUND-CODE (a one
                                  digit number 0 to  9) to the abbreviation
                                  of the name of the FUND and the full name
                                  of the FUND.  Ten records, each record is
                                  56 bytes long, in 2 blocks.
           
          The order of the  files  on  a  particular media may be different
          from the order specified above.
           
          3. Description of data
           
          The data consists of three  files:  the MASTER record, the DETAIL
          record, and the FUNDCODE translation table.
           
          3.1 MASTER Record
           
          There are 5518 records  in  the  MASTER  file;  each record is 75
          bytes long.  The MASTER record contains only ASCII text; there is
          no binary data in the  file.    The data is, therefore, printable
          and transferable with KERMIT.  The record layout is:
           
          01 MASTER_REC.
             03 ID_NUMBER         PIC 9(7).
             03 LAST_NAME         PIC X(15). ! may contain " ", "'", or "-"
             03 FIRST_NAME        PIC X(10).
             03 MIDDLE_INITIAL    PIC X(1).
             03 TITLE_OF_ADDRESS  PIC X(4).
             03 SENIORITY         PIC X(4).
             03 ADDR              PIC X(15).
             03 CITY              PIC X(12).
             03 STATE             PIC X(2).
             03 ZIP_CODE          PIC 9(5).
          ;
          The MASTER file is in  the  order  of  the ID_NUMBER.  The record
          layout for the MASTER file is contained in the file MASTER.DTR.
           
          The values in ID_NUMBER are random but can span the full range of
          0000000 to 9999999.  That  is,  there  are 5518 unique numbers in
          the 7 digit range.
           
                                        - 2 -


          The LAST_NAME field may contain  a  space, single quote, or dash.
          The data in the  LAST_NAME  field  has  been revised from the New
          Orleans data.  The distribution of names more nearly approximates
          the  distribution  of  names  in  the  United  States.    In this
          distribution, the names  WILLIAMS,  SMITH, JOHNSON, JONES, BROWN,
          MILLER,  DAVIS,  and   WHITE   have   the  highest  frequency  of
          occurrence.  Names of Asian  and Scandinavian origin are probably
          under represented.    Names  of  English,  Irish, Scotch, German,
          Italian, and eastern European origin are well represented.  Names
          of  American  Blacks   and   Hispanic   are  also  proportionally
          represented.  Most other ethnic  groups  are represented by a few
          names.
           
          Other fields (TITLE_OF_ADDRESS, SENIORITY, ADDR, CITY, STATE, and
          ZIP_CODE) are unchanged from the New  Orleans data.  In a certain
          number of  randomly  selected  records,  two  fields (FIRST_NAME,
          MIDDLE_INITIAL) have been slightly changed.   In some records the
          middle initial has been set  to  a  blank.  In other records, the
          middle initial has been  move  into  the  first part of the first
          name and the  middle  initial  has  been  set  to  a blank.  (For
          example: if the first  name  were  "JOHN"  and the middle initial
          were "L", the new  first  name  is  not  "L. JOHN" and the middle
          initial is " ").
           
          3.2 DETAIL Record
           
          There are 36027 records  in  the  DETAIL  file; each record is 21
          bytes long.  The DETAIL  record contains only ASCII numeric data;
          there is no binary data  in  the  file.   The data is, therefore,
          printable and transferable with KERMIT.  The record layout is:
           
          01 DETAIL_REC.
             03 ID_NUMBER          PIC 9(7).
             03 DATE_OF_DONATION   PIC 9(6).       ! in MMDDYY order
             03 DATE REDEFINES DATE_OF_DONATION.
                05 MONTH           PIC 9(2).
                05 DAY             PIC 9(2).
                05 YEAR            PIC 9(2).
             03 FUND_CODE          PIC 9(1).
             03 AMOUNT_OF_DONATION PIC 9(5)V99 .
          ;
           
          The data in the DETAIL record  is  SORTED by date of the donation
          (ascending date order) and  then  by  ID_NUMBER if there are more
          than one donation on a given date.  
           
          The ID_NUMBER span  can  span  the  range  of 0000000 to 9999999.
          However, there are no orphan records.   Every DETAIL record has a
          master record.  However, there are many (1199 to be exact) MASTER
          records without a DETAIL record.
           
          The DATE_OF_DONATION is in MMDDYY format.  The dates are in 1987,
          1988, and 1989.
           
                                        - 3 -


          The FUND_CODE is a one  digit  number  between  0 and 9.  The ten
          different digits are approximately evenly distributed.  One digit
          was used in order to same space.  The FUND_CODE may be translated
          into the name of the fund using the FUND_CODE translation table.
           
          The format of the AMOUNT_OF_DONATION  is  5 digits to the left of
          the decimal and 2 digits to  the  right of the decimal.  However,
          the decimal point is implied and  does  not  use up a byte in the
          file.  For example, if the data in AMOUNT_OF_DONATION is 0012345,
          the amount of the donation is  $123.45.  The DETAIL file contains
          only digits; there are no alpha characters in the file.
           
          3.2.1 DETAIL_BINARY Record
           
          On the  VMS  Backup  version  of  the  distribution,  there is an
          alternate  version  of   the   data.      In  this  version,  the
          DETAIL_BINARY version, the  DATE_OF_DONATION  is expressed in the
          usual 64-bit binary  format.    There  are  36027  records in the
          DETAIL_BINARY  file;  each  record   is   23  bytes  long.    The
          DETAIL_BINARY record contains  binary  data; therefore, this file
          is not printable and nor transferable with KERMIT (with out using
          8-bit binary mode).  The record layout is:
           
          01 DETAIL_BINARY_REC.
             03 ID_NUMBER          PIC 9(7).
             03 DATE_OF_DONATION   USAGE IS DATE.
             03 FUND_CODE          PIC 9(1).
             03 AMOUNT_OF_DONATION PIC 9(5)V99 .
          ;
           
          3.2.2 Distribution of DETAIL Records
           
          The distribution of DETAIL records per MASTER records is given by
          the following table:
           
                        
                                        - 4 -


                        Number    Number     Total
                          of        of      number
                        detail    master      of
                        records   records   detail
                         per                records
                        master
                        
                           0         1199         0
                           1          996       996
                           2          364       728
                           3          487      1461
                           4          254      1016
                           5          187       935
                           6           96       576
                           7           73       511
                           8          229      1832
                           9          174      1566
                          10          135      1350
                          11          288      3168
                          12          418      5016
                          13            9       117
                          14           18       252
                          15           23       345
                          16           42       672
                          17           71      1207
                          18           97      1746
                          19            0         0
                          20            0         0
                          21            1        21
                          22            1        22
                          23            2        46
                          24            6       144
                          25            0         0
                          26            0         0
                          27            1        27
                          28            0         0
                          29            0         0
                          30            0         0
                          31            1        31
                          32            4       128
                          33           10       330
                          34           35      1190
                          35           98      3430
                          36          199      7164
                        
                        Totals       5518     36027
                        
           
                                        - 5 -


          3.3 FUNDCODE Table
           
          In order to save space in the DETAIL records, the fund to which a
          donation is given is indicated by  a  single digit in the range 0
          to   9.      This   FUND-CODE   may   be   translated   into  the
          FUND-ABBREVIATION or the  FUND-NAME  from  date from the FUNDCODE
          table.  The data in the table is:
           
          01 FUND_REC.
             03 FUND_CODE PIC 9.
             03 FUND_ABBREVIATION PIC X(5).
             03 FUND_NAME PIC X(50).
          ;
           
          FUND    FUND                            FUND
          CODE ABBREVIATION                       NAME
           
           0      UNWAY    United Way
           1      UNCA     United Negro College Appeal
           2      UNICF    UN International Children's Educational Fund
           3      AREDC    American Red Cross
           4      SALVA    Salvation Army
           5      INFAM    International Famine Relief
           6      UJA      United Jewish Appeal
           7      AMCHC    American Catholic Charities
           8      AEART    American Endowment for the Arts
           9      GMFND    The German-Marshall Fund of the United States
           
          The data is contained in  the  file  FUNDCODE.DAT.  There are ten
          records of 56 characters each.


                                        - 6 -


          4. Description of Problem
           
          -----------------------------------------------------------------
             The 4GL Problem for the Las Vegas 1990 Fall DECUS Symposium
                 Joe H. Gallagher, 4GL Solutions, Overland Park, KS
          -----------------------------------------------------------------
          The 4GL  Problem  Panel  at  the  New  Orleans  1990 Spring DECUS
          Symposium was a very well  received  presentation.  A 4GL Problem
          Panel is scheduled for the  Las  Vegas 1990 Fall DECUS Symposium.
          The problem for Las Vegas builds  on and extends the problem from
          New Orleans.  The  problem  for  New  Orleans was to browse, both
          forward and backwards, through  a  set of "people oriented" data.
          The problem for Las  Vegas  extends  the browse to include DETAIL
          records in addition to the  MASTER  record.  The data is modelled
          on a MASTER set of donors  with  the DETAIL records of the amount
          and date of their donations.    In  addition to browsing the data
          base, certain processing  (ADD,  MODIFY,  and DELETE) is required
          for the solution of the problem.   One report is an optional part
          of the solution.
           
          The DATA:
           
          The  MASTER  record  consists  of  ID_NUMBER  (7  digit  number),
          LAST_NAME,   FIRST_NAME,   MIDDLE_INITIAL   (may   be   missing),
          TITLE_OF_ADDRESS (MR, MS, MRS, MISS, etc), SENIORITY (JR, SR, II,
          III, etc), ADDRESS,  CITY,  STATE_CODE,  and ZIP_CODE (5 digits).
          The number of  DETAIL  records  and the total AMOUNT_OF_DONATIONs
          for a particular donor are  to  be  managed as part of the MASTER
          record.  There are 5518 MASTER records.
           
          The DETAIL records consists of ID_NUMBER, DATE_OF_DONATION (dates
          are in 1987, 1988, and 1989), FUND_CODE (1 digit number between 0
          and 9), and the AMOUNT_OF_DONATION.   The largest single donation
          is $99,999.999.  There are 36027 DETAIL records.
           
          A FUNDCODE table is provided  which  may be used to translate the
          FUND_CODE  to  the   FUND_ABBREVIATION   (5  characters)  or  the
          FUND_NAME (50 characters).  There are 10 FUNDCODE records.
           
          A complete and  detailed  description  of  the  MASTER and DETAIL
          records was published in September  1990 issue of the Newsletter,
          Volume 6, Number 1, Section DTR, pages 6-10.
           
          The Problem:
           
          Part 1. The Browse Phase
           
          The user may  enter  the  ID_NUMBER,  the  LAST_NAME, part of the
          LAST_NAME, or the  LAST_NAME  to  be  used  in  a SOUNDEX search.
          None, one, or  many  records  are  selected.    If no records are
          found, the user  is  notified  and  given  an opportunity to make
          another selection.  If one record  is found, control is passed to
          the second part of the problem.    If many records are found, the
          user may browse  forwards  and  backwards  through the records to
          choose one or none.    These  records  must  be "browsed" in sort

                                        - 7 -


          order by LAST_NAME,  FIRST_NAME,  MIDDLE_INITIAL,  ID_NUMBER.  If
          none is chosen,  the  user  may  try  again.    If one is chosen,
          control passes to the  second  part  of  the problem.  During the
          browse, the display shall  include  all information in the MASTER
          record (including the number of  DETAIL  records and the total of
          all donations) and all DETAIL  records or whatever subset of them
          will fit on the screen.   The  DETAIL records (or the subset that
          will fix on the screen) must be display by decreasing date.  That
          is, the most recent  donation  (if  any) must be displayed first.
          The abbreviation of the fund  name  should  be display.  The fund
          code may also be displayed.    At  no time should the contents of
          DETAIL records and MASTER records  be display on the screen which
          do not belong together.
           
          Part 2. The Processing Phase
           
          Once a MASTER record  has  been  chosen  by the browse phase, the
          user may then ADD a DETAIL record, MODIFY a DETAIL record, DELETE
          a DETAIL  record,  or  do  no  further  processing  of the record
          hierarchy.  The user may also delete the MASTER record and all of
          its DETAIL records.  No "orphan"  DETAIL records are allowed.  At
          the conclusion of an ADD, MODIFY,  or DELETE, the count of DETAIL
          records and the total amount of all donations must be UPDATED and
          the record hierarchy displayed  for  further processing (if any).
          When no  further  processing  is  required  on  the chosen record
          hierarchy, control is return to the  browse phase.  Note that the
          problem  DOES  NOT  require  that  MASTER  records  be  added  or
          modified.  The entry  of  the  FUND_CODE or the FUND_ABBREVIATION
          must be validated and help  should  be provided on possible valid
          entries.
           
          Part 3. Optional Reporting Phase
           
          An optional part of the problem  is  to  make a report of the top
          100 donors.   This  part  of  the  application  does  not have to
          integrate into the browse and  processing phases; in fact, it may
          be a totally separate application.
           
          The top 100 donors are to be reported (printed on a line printer,
          80 column wide) by  decreasing  total  amount of donations.  That
          is, the highest  donor  must  appear  first  in  the report.  The
          donors complete name should printed.  The punctuation of the name
          is critical.  Examples of the  format  of two detail lines in the
          report are:
           
                   MR. JOHN E. SMITH, JR.               $67,890.00
                   MISS MARY JONES                         $998.23
           
          Parts of the Problem Which are not Constrained:
           
          Because there are too many  details  in a complete application to
          describe in a fixed  amount  of  Symposia session time, there are
          many details which are not specified as part of the problem.  The
          problem solver may  choose  to  design  their application as best
          meets their 4GL in the following areas:

                                        - 8 -

           
            1. Loading the data is not  part  of  the problem.  Solvers may
               load the data any way  they  choose  and they do not need to
               report how they loaded their  data  since it is assumed that
               the application would have  been  developed in their 4GL and
               the data would have been entered one record at a time.
            2. The count of DETAIL records  and  the total of donations for
               each donor may be either  a  virtual  or a real field in the
               MASTER record.  The solver  must report how they have chosen
               to manage these two  fields.    If these fields are computed
               and actually  stored  in  the  MASTER  record, then database
               constraints must apply.
            3. The date of donations may be stored as MMDDYY or as a 64-bit
               binary VMS date data type.  This choice should be reported.
            4. The fund code may be  retained  or the DETAIL records may be
               converted to  the  abbreviation  of  the  name  of the fund.
               However, the abbreviation of  the  name  of the fund must be
               displayed during the browse phase.
            5. Adding and  modifying  MASTER  records  is  NOT  part of the
               problem.  Solvers may  develop  the application so that this
               is possible, but solvers should not report if their code has
               this capability.
            6. Solvers may  choose  which  fields  are  keys;  there are no
               restrictions on  the  choice.    However,  the solver should
               report which fields they have keyed.
            7. Solvers may choose to  have  the  user enter either the fund
               code or the abbreviation of  the  fund name when entering or
               modifying  DETAIL  records.    Either  may  be  entered, but
               entries must be validated.
           
          Reporting on Development of the Application:
           
          As part of  the  symposia  session,  the  problem solvers must be
          prepared to report the following:
           
            1. the experience of the developer with 3GL's and their 4GL
            2. what background knowledge of the user has been assumed
            3. what training is,  or  would  be  required,  for the user to
               operate the browse and processing phases of the application
            4. what  software  is  required  to  run  the  application (all
               licenses products)
            5. what part or parts of the  application is done in a 3GL (for
               example, the SOUNDEX algorithm could be done in MACRO)
            6. what part or parts of  the  application could not be done in
               their 4GL
            7. how long it took to develop each part of the application
            8. how long does it take  to  find  and display a single record
               which has 36 detail records for a system with a specific CPU
               and disk.
           

                                        - 9 -


          5. How data was generated
           
          -----------------------------------------------------------------
             Developing Applications in DATATRIEVE: Generating Test Data
                 Joe H. Gallagher, 4GL Solutions, Overland Park, KS
          -----------------------------------------------------------------
           
          Most of  our  efforts  in  developing  applications  are spent in
          thinking about and analyzing the problem or writing, testing, and
          debugging  procedures,  reports,  FMS  or  TDMS  screens, tables,
          domain definitions, and  record  definitions. When an application
          becomes  more  complex  (the   number  of  concurrently  accessed
          domains), when the  number  of  simultaneous users increases, and
          when remote data is  accessed  (data  accessed across a network),
          the ability to accurately  simulate the "real-world" with trivial
          data is no longer possible. In such situations it is necessary to
          create data to test  our  application  design and to evaluate the
          performance of our application.
           
          The best test data for  an  application, of course, is the "real"
          data. But there are many  instances  when the "real" data has not
          yet been acquired or may not  yet be available to the application
          developer. It is these instances  were one needs to generate test
          data that will be address by this article.
           
          The examples used  in  this  article  are  taken from the command
          files and procedures which  were  used  to generated the data for
          the  Las  Vegas  1990  Fall  Symposium  4GL  Problem.  A complete
          description of the Las Vegas 4GL  Problem will appear in a future
          issue of this newsletter.
           
          Types of Data
           
          With  respect  to  generation  of   data,  there  are  two  broad
          categories.   The   first   is   data   that   because   of   its
          non-mathematical nature or because there are a discrete number of
          occurrence among a very,  very  large number of possibilities can
          only be generated from lists of existing data. The second is data
          that is numeric, either continuous  or discrete, and which can be
          generated by mathematical methods.
           
          List Data
           
          The best example of first kind  of  data, list data, is first and
          last names. Mathematically there  a  billions more possible names
          than actually occur in  a  given language (English, for example);
          so there is no effective  way  to generate real names (as opposed
          to nonsense names) with a  random  number generator. The only way
          to get realistic names is to take names from a pre-existing list.
          One source of possible names  is the telephone book. John Babiarz
          used his local  telephone  book  to  generate  names for the 1990
          Spring New Orleans 4GL Problem. He used a little over 1100 unique
          names and used each one 4  times with four different first names.
          However, he did not make a complete or random sample of the names
          in the phone  book.  The  frequency  of  occurrence  of names was

                                        - 10 -


          peculiar, if  not  bazaar,  as  there  were  more  that 400 names
          beginning with "Z" in a set of 5518 names.
           
          I chose another approach. Since I work in a fairly large hospital
          and have some good contacts  in the Data Processing Department, I
          was able to secure  the  list  of  last  names of the most recent
          32000 hospital admissions.  This  list  of  names has some rather
          interesting properties. First there  were  about  20 names in the
          list whose length was more  that  15 characters. All but three of
          the names were due to  hyphenated names (the concatenation of two
          last names). All  three  of  the  "naturally"  long names were of
          Germanic origin. They were  16,  17,  and 19 characters long (and
          the 19 character one was  a  real mouth full). Because the master
          file in the problem set  had already specified that the LAST-NAME
          field was PIC X(15), I  discarded  all  last names longer than 15
          characters. Names  on  a  hospital  admitting  list  do  have one
          peculiar property. If a name  appears  on the list, then there is
          too high a probability that the next name on the list will be the
          same. The occurs  when  several  members  of  the same family are
          involved in a motor  vehicle  accident  and they are all admitted
          through the emergency  room  at  the  same  time. Because of this
          peculiarity,  I  did  remove  the  duplicates  if  they  occurred
          together. The set  of  names  did  have  a good representation of
          English, Irish,  Scotch,  German,  American  Black,  and Hispanic
          names with a few Scandinavian,  Asian, Middle Eastern, and Indian
          names. However, because the 4GL Problem was of project of the 4GL
          SIG, I check the names  against  the  list  of members of the SIG
          Steering Committee and if a  steering committee member's name was
          not present I added it.  Thus,  the  list  I used came from three
          sources:  more  than  1100   unique  names  from  John  Babiarz's
          telephone book, about 20  names  from  the SIG Steering Committee
          list who weren't already  on  the  list,  and more than 4000 more
          from the patient admissions'  list  at Research Medical Center to
          make a list  of  5518  names.  While  the  list  had  some of the
          properties of a typical list of American names (Smith, Jones, and
          Brown occurred 46,  37,  and  35  times),  there were still 67.4%
          (3718 of 5518) which  were  unique.  In  a typical list there are
          often only 33% to 40% unique names.
           
          Now with a list of 5518 last names in a domain NEWNAMES, I needed
          to load these names into the  MASTER domain. If the MASTER domain
          did not already have  values  in  the  other fields, I could just
          STORE the records. But to  move  the  first name from NEWNAMES to
          the first last-name in MASTER,  and  the second to the second and
          so on, there is a bit of a trick!
           
          The MASTER record is 
           

                                        - 11 -


          01 MASTER_REC.
             03 ID_NUMBER         PIC 9(7).
             03 LAST_NAME         PIC X(15). ! may contain " ", "'", or "-"
             03 FIRST_NAME        PIC X(10).
             03 MIDDLE_INITIAL    PIC X(1).
             03 TITLE_OF_ADDRESS  PIC X(4).
             03 SENIORITY         PIC X(4).
             03 ADDR              PIC X(15).
             03 CITY              PIC X(12).
             03 STATE             PIC X(2).
             03 ZIP_CODE          PIC 9(5).
          ;
           
          and the NEWNAMES record is
           
           
          01 NEWNAMES_REC.
             03 NAMES             PIC X(15).
          ;
           
          One would like to be able to have DATATRIEVE do something like
           
              ready MASTER modify
              ready NEWNAMES read
              for a in NEWNAMES
                  for next 1 b in MASTER
                      modify using b.last-name = a.names
           
          but, of course, the  "FOR  NEXT  1"  is  not legal syntax in DTR.
          Richard Copeland suggested something like
           
              ready MASTER write
              ready NEWNAMES read
              find MASTER
              declare size_of_master usage is integer.
              size_of_master = count of current
              declare x usage is integer.
              for a in NEWNAMES begin
                  x = running count
                  modify b in MASTER with 
                      FN$MOD(running count,size_of_master) = 
                      x using last-name = names
                  end
           
          which would work, but its performance would be poor.
           
          The solutions I used was:
           
              ready MASTER modify
              ready NEWNAMES read
              find c in MASTER
              for NEWNAMES begin
                  select next c
                  modify using LAST-NAME = NAMES
                  end 

                                        - 12 -

           
          which  apparently  violates  the  documented  restriction  that a
          SELECT may not appear within  a BEGIN-END block. However, in this
          case it works just fine and gives the desired result.
           
          Mathematically Generated Data
           
          The second type of data can  be,  but is not always, generated by
          some  mathematical  function  or   algorithm.   If  the  data  is
          continuous or almost continuous,  a  function with or without the
          use of a random number generator  is  often used. But if the data
          is discrete, many times  it  is  easier  to  develop a table or a
          frequency of occurrence for the values. 
           
          Discrete Values
           
          For the Las Vegas  4GL  Problem,  we  needed DETAIL records which
          would be associated  with  the  MASTER  records. The frequency of
          occurrence of  the  DETAIL  records  with  respect  to the MASTER
          records is given  by  the  values  in  the domain DISTRIBUTION. I
          actually used a spread sheet  to  fiddle  with the values so that
          the records would come out  to  exactly 5518. The three fields in
          the DISTRIBUTION domain are N,  the  number of DETAIL records per
          MASTER record, M, the number of MASTER records having this number
          of DETAIL records, and  Z,  total  number  of detail records (the
          product of N and M).
           

                                        - 13 -


                      N            M          Z
                   
                      0         1199          0
                      1          996        996
                      2          364        728
                      3          487       1461
                      4          254       1016
                      5          187        935
                      6           96        576
                      7           73        511
                      8          229       1832
                      9          174       1566
                     10          135       1350
                     11          288       3168
                     12          418       5016
                     13            9        117
                     14           18        252
                     15           23        345
                     16           42        672
                     17           71       1207
                     18           97       1746
                     19            0          0
                     20            0          0
                     21            1         21
                     22            1         22
                     23            2         46
                     24            6        144
                     25            0          0
                     26            0          0
                     27            1         27
                     28            0          0
                     29            0          0
                     30            0          0
                     31            1         31
                     32            4        128
                     33           10        330
                     34           35       1190
                     35           98       3430
                     36          199       7164
                   
                   Totals       5518      36027
                   
           
          To create  DETAIL  records  with  the  distribution  gives by the
          values in DISTRIBUTION,  the  following  skeleton set of commands
          and statements can be used:
           

                                        - 14 -


          ready DISTRIBUTION read
          ready MASTER modify
          define file for DETAIL;
          ready DETAIL write
          find a in MASTER
          for DISTRIBUTION begin
              repeat m begin
                  select next a
                  repeat n begin
                      store DETAIL using begin
                          id-number = a.id-number
                          . . 
                          end
                      end
                  end
              end
           
          and again the trick of  using  SELECT  NEXT  A in a collection is
          used to step through the MASTER domain.
           
          Continuous Uniform Random Numbers
           
          A  random   number   generator   makes   numbers  uniformly,  but
          pseudo-randomly, distributed on the  interval  0  <= x < 1. There
          are many kinds  of  random  number  generators  and many kinds of
          tests  for  them  as  well.  For  information  on  random  number
          generators  see  "Encyclopedia   of  Computer  Science",  Anthony
          Ralsont, Editor, Petrocelli/Charter, New  York, 1976, pp. 1192-7,
          or "The  Art  of  Computer  Programming"  Volume 2 (Seminumerical
          Algorithms). Addison Wesley,  Reading,  MA, 1969. The VAX-FORTRAN
          random  number   generator   has   properties   better  than  the
          additive-multiplicative generator and a  shift generator since it
          is apparently has properties of both types.
           
          A  self-seeding  routine  to   call  the  FORTRAN  random  number
          generator was described in Volume 4, Number 1, Section DTR, pages
          11-12 of the newsletter. The FORTRAN code is:
           
          !
          !       a self-initializing, FORTRAN interface for 
          !       DATATRIEVE to the VAX random number generator.
          !
                  real*4 function fn_random()
                  real*4 mth$random
                  integer*4 quad(2),seed
                  equivalence (quad,seed)
                  if (seed .eq. 0) then
                      call sys$gettim(quad)
                      end if
                  fn_random = mth$random(seed)
                  return
                  end
           
          The linkage to attach this function to DATATRIEVE is
           

                                        - 15 -


          ;
          ; FN$RANDOM- a pseudo-random number generator
          ;
          ; output is an F-floating pseudo-random number 0<= x < 1
          ;
          $DTR$FUN_DEF FN$RANDOM, FN_RANDOM, 0
              $DTR$FUN_OUT_ARG  TYPE = FUN$K_VALUE, DTYPE = DSC$K_DTYPE_F
              $DTR$FUN_NOOPTIMIZE
          $DTR$FUN_END_DEF
           
          One can then use  the  random  number  generator to create number
          distributed between LOW_VALUE and HIGH_VALUE by the expression
           
                  LOW_VALUE + fn$random * (HIGH_VALUE - LOW_VALUE)
           
          However if one needs numbers exponentially distributed between 10
          and 100,000, once could use the expression
           
                      FN$EXP(FN$LN(10.0)*(1.0 + 4.0*FN$RANDOM))
           
           
          Discrete Random Numbers
           
          To make integers (discrete numbers) rather than real (continuous)
          number, judicious use  of  FN$FLOOR  or  FN$NINT will convert the
          continues variable to a discrete one. However, if you need unique
          integer, you have to  keep  track  of which integers have already
          been used. For that, you need a domain and a table like
           
          DEFINE DOMAIN INDEX USING INDEX_RECORD ON INDEX.DAT;
          DEFINE RECORD INDEX_RECORD USING
          01 INDEX_REC.
             03 ID PIC 9(7).
          ;
          DEFINE TABLE INDEX_TABLE FROM DOMAIN INDEX
          ID : ID
          END_TABLE
           
          and code like
           
                  ready INDEX write
                  declare temp-id pic 9(7).
                  . . .
                  temp-id = 9999999 * fn$random
                  while (temp-id in index-table) begin
                      if temp-id eq 9999999 then begin
                          temp-id = 0
                          end
                      temp-id = temp-id + 1
                      end
                  store INDEX using id = temp-id
                  . . .
           
          to keep track of which numbers have already been used.
           

                                        - 16 -


          Normally Distributed (Gaussian) Numbers
           
          Normally  distributed  numbers  can  be  created  from  uniformly
          distributed  numbers   by   inverting   the   Gaussian  function.
          Unfortunately, there is no  exact closed-form expression for this
          inverse distribution. There is,  however, a good approximation to
          the needed function. The  FORTRAN  routine  FN_ZPROB gives such a
          result.
           
          !       A real function to invert the probability function.
          !       See Handbook of Mathematical Functions with Formulas, Graphs,
          !       and Mathematical Tables, Edited by Milton Abramowitz and 
          !       Irene A. Stegun, National Bureau of Standards, Applied
          !       Mathematics Series, 55, page 933.
          !
          !       input                0 =< rin  =< 1.0
          !       output          -9.262 =< rout =< +9.262
          !       
          !       If the input is uniformly distributed between 0 and 1,
          !       the output is normally distributed about zero with a
          !       standard deviation of 1 (except for the very ends of
          !       the distribution).
          !
          !       Because of the singularity at rin = 0 and rin = 1
          !       this function sets
          !
          !               0 <= rin <= 1.0E-20     rout = -9.262
          !            1 - 1.0E-20 <= rin <= 1.0  rout = +9.262
          !
          !       Thus, the tails of the distribution are "chopped" off
          !       at the very, very ends.
          !
                  real function fn_zprob(rin)
                  real*4 rin, rout, p, t
                  logical side    ! 0 to 0.5 -> TRUE; 0.5 to 1 -> FALSE
           
                  side = .TRUE.
                  p = rin
                  if ( p .gt. 0.5 ) then
                      side = .FALSE.      ! data is 0.5 to 1
                      p = 1.0 - p         ! invert the data
                      end if
                  if ( p .lt. 1.0E-20) then       ! is data too close to 0?
                      p = 1.0E-20                 ! yes, set to a small value
                      end if
                  t = sqrt( 2.0 * alog ( 1.0/p))
                  rout = t - (2.515517 + t * (0.802853 + t * 0.010328))/
                  1 (1.0 + t * (1.432788 + t * (0.189269 + t*0.001308)))
          !
          !       Size of error in this formula is
          !           | error(p) | < 0.0004.5
          !       However, near input of 0.5, output of formula can go
          !       slightly negative.
          !       May sure value does not go negative, here.
          !

                                        - 17 -


                  if (rout .lt. 0.0) then
                      rout = 0.0
                      end if
                  if (side) then          ! which side are we on?
                      rout = -1.0 * rout  ! 0 to 0.5 maps to -9.262 to 0.0
                      end if
                  fn_zprob = rout         ! return value
                  return
                  end
           
          The linkage to DATATRIEVE is
           
          ;
          ; FN$ZPROB- a normally distributed function
          ;
          ; input is an F-floating number   0<= z <= 1
          ; output is an F-floating number normally distributed
          ;
          $DTR$FUN_DEF FN$ZPROB, FN_ZPROB, 1
              $DTR$FUN_OUT_ARG TYPE = FUN$K_VALUE, DTYPE = DSC$K_DTYPE_F
              $DTR$FUN_IN_ARG  TYPE = FUN$K_REF,   DTYPE = DSC$K_DTYPE_F, ORDER = 1
              $DTR$FUN_NOOPTIMIZE
          $DTR$FUN_END_DEF
           
          The  function  combination  FN$ZPROB(FN$RANDOM)  can  be  used to
          normally and randomly distribute values  with a mean, MEAN, and a
          standard deviation, STD_DEV, with the expression
           
                        MEAN + STD_DEV * FN$ZPROB(FN$RANDOM)
           
          Generation of Test Data for the DETAIL Record
           
          All of  the  techniques  described  above  are  used  to generate
          records for the DETAIL domain. The record definition is:
           
           
          01 DETAIL_REC.
             03 ID_NUMBER          PIC 9(7).
             03 DATE_OF_DONATION   USAGE IS DATE.
             03 FUND_CODE          PIC 9(1).
             03 AMOUNT_OF_DONATION PIC 9(5)V99 .
          ;
           
          The variable ID_NUMBER is uniformly  distributed over the range 0
          to 9999999, but  each  value  is  unique. The DATE_OF_DONATION is
          normally distributed about a target  date which is one nth (where
          n is the number of  donations)  of the time between the beginning
          and ending donation dates. These  start  and stop dates depend on
          the number of donations given.  The  start date is determine by a
          complex CHOICE  OF  expression  which  depends  on  the number of
          donations. The standard  deviation  about  these  target dates is
          arithmetically distributed between one half  and one fifth of the
          mean time between donations. The donations are in 1987, 1988, and
          1989. Donation dates are  also  restricted  to be week days; that
          is, Saturdays  and  Sundays  are  not  allowed.  The FUND_CODE is

                                        - 18 -


          uniformly distributed between 0  and 9. The AMOUNT_OF_DONATION is
          exponentially distributed between $10.00  and $1,000.00 or $10.00
          and $100,000.00 depending  on  the  number  of donations. Ninety-
          eight percent of  the  donations  are  truncated to 3 significant
          figures; 2% of the donations have all digits.
           
          The DATATRIEVE code to generate the data in DETAIL is:
           
          !
          !   a command file to create new id numbers and to
          !   create DETAIL records for the Las Vegas 4GL Problem
          !
          declare start-date usage is date.
          declare stop-date usage is date.
          declare start-quad usage is quad.
          declare stop-quad usage is quad.
          declare delta usage is quad.
          declare std usage is quad.
          declare index-quad usage is quad.
          declare temp-quad usage is quad.
          declare temp-date usage is date.
          declare ln_of_10 usage is real.
          declare trunk usage is integer.
          ln_of_10 = fn$ln(10.0)
          !
          declare dollar-base usage is real.
          declare dollar-range usage is real.
          declare t usage is real.
          declare amount pic 9(6)v99 .
          declare tamount pic 9(6)v99 .
          declare fund pic 9.
          declare tfund pic 9.
          !
          stop-date = "12/31/89"
          stop-quad = stop-date
          !
          ready DISTRIBUTION read
          ready MASTER modify
          define file for DETAIL;
          ready DETAIL write
          define file for INDEX key=id;
          ready INDEX write
          declare temp-id pic 9(7).
          find a in MASTER
          !
          for DISTRIBUTION begin
              repeat m begin
                  temp-id = 9999999 * fn$random
                  while temp-id in INDEX-table begin
                      if temp-id eq 9999999 then begin
                          temp-id = 0
                          end
                      temp-id = temp-id + 1
                      end

                                        - 19 -


                  store INDEX using id = temp-id
                  select next a
                  modify using id-number = temp-id
                  if (n gt 0) begin
                      start_date = choice of
                      n bt 1 and 6 then "1/1/87"
                      n bt 7 and 12 then "1/1/89"
                      n bt 13 and 18 then "7/1/88"
                      n bt 19 and 24 then "1/1/88"
                      else "1/1/87"
                      end_choice
           
                      start-quad = start-date
                      delta = (stop-quad - start-quad)/n
                      index-quad = start-quad + delta/2
                      std = delta/(2.0 + 3.0*fn$random)
                      if ( n bt 1 and 6) then begin
                          dollar-base = 1.0
                          dollar-range = 4.0
                          end else begin
                          dollar-base = 1.0
                          dollar-range = 2.0
                          end
                      t = fn$random
                      t = t * t
                      tamount=fn$exp((dollar-base+t*dollar-range)* ln_of_10)
                      tfund = fn$floor(10.0 * fn$random)
                      repeat n begin
                          amount = tamount
                          if (fn$random le 0.05) then begin
                              t = fn$random
                              amount = fn$exp( (dollar-base +  t * dollar-range) *      
                              ln_of_10)
                              end
                          if (fn$random lt 0.98) then begin
                              trunk = fn$floor(amount/100.0)
                              if (trunk lt 1) then trunk = 1
                              amount = trunk * fn$floor(amount/trunk)
                              end
                          fund = tfund
                          if (fn$random le 0.15) then begin
                              fund = fn$floor(10.0 * fn$random)
                              end
                          temp-quad = index-quad + std * fn$zprob(fn$random)
                          temp-date = temp-quad
                          if (temp-date before "1/1/87") then begin
                              temp-date = "1/2/87"
                              end
                          if (temp-date after "12/29/89") then begin
                              temp-date = "12/29/89"
                              end


                                        - 20 -


                          temp-date = fn$date(fn$upcase(format 
                              temp-date using dd-mmm-yyyy |" 00:00:00.00"))
                          while (format temp-date using w(3) eq "Sat","Sun") begin
                              temp-date = temp-date + 1
                              end
                          store DETAIL using begin
                              id-number = temp-id
                              date-of-donation = temp-date
                              fund-code = fund
                              amount-of-donation = amount
                              end
                          index-quad = index-quad + delta
                          end
                      end
                  end
              end
           
          Perhaps  the  techniques  in  this  article  will  give  you  the
          knowledge and tools to generate data to test your application.
           

                                        - 21 -