3 Format Structure

3.1 General Structure

UNIMARC is a specific implementation of ISO 2709, an international standard that specifies the structure of records containing bibliographic data. It specifies that every bibliographic record prepared for exchange conforming to the standard must consist of:
- a RECORD LABEL consisting of 24 characters,
- a DIRECTORY consisting of the 3-digit tag of each data field, along with its length and its starting character position relative to the first data field,
- DATA FIELDS of variable length, each separated by a field separator,

with the following layout:

RECORD LABEL | DIRECTORY | DATA FIELDS | end of record character
ISO 2709 further specifies that the data in fields may optionally be preceded by indicators and subdivided into subfields. UNIMARC, as an implementation, uses the following specific options allowed under ISO 2709.

3.2 Record Label

ISO 2709 prescribes that each record start with a 24-character Record Label. This contains data relating to the structure of the record, which are defined within the standard ISO 2709, and several data elements that are defined for this particular implementation of ISO 2709. These implementation-defined data elements relate to the type of record, its bibliographic level and position in a hierarchy of levels, the degree of completeness of the record and the use or otherwise of ISBD or ISBD-based rules in the preparation of the record. The data elements in the Record Label are required primarily to process the record and are intended only indirectly for use in identifying the bibliographic item itself.

3.3 Directory

Following the Record Label is the Directory. Each entry in the Directory consists of three parts: a 3-digit numeric tag, a 4-digit number indicating the length of the data field and a 5-digit number indicating the starting character position. No further characters are permitted in a Directory entry. The Directory layout is as follows:

Tag (3 characters) | Length of field (4 characters) | Starting character position (5 characters) | further 12-character entries | end of field marker
The second segment of the Directory entry gives the number of characters in that field. This includes all characters: indicators, subfield identifiers, textual or coded data and the end of field marker. The length of field is followed by the starting character position of the field relative to the first character position of the variable field portion of the record. The first character of the first variable field is character position 0. The position of character position 0 within the whole record is given in character positions 12-16 of the Record Label.

The tag is 3 characters long, the 'length of the data' fills 4 characters and the 'starting character position' fills 5 characters. After all of the 12-character directory entries corresponding to each data field in the record, the directory is terminated by the end of field marker IS2 of ISO 646 (1/14 on the 7-bit code table). For an example of a directory illustrating its position in relation to data fields see the complete examples in Appendix L.

The directory entries should be ordered by the first digit of the tag, and it is recommended that order by complete tag be used where possible. The data fields themselves do not have a required order as their positions are completely specified through the directory.

3.4 Variable Fields

The variable length data fields follow the directory and generally contain bibliographic as opposed to processing data.

Data (Control) Field (00-) layout:

Data | End of field character

Data Field (01- to 999) layout:

Indicators | Subfield Identifier | Data | Other Subfields | End of field character
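As a rough illustration of how the Record Label, Directory and variable fields fit together, the following Python sketch walks a record using only the rules stated in this section: the base address in label positions 12-16, 12-character directory entries, IS2 as end of field marker, IS1 as subfield delimiter and IS3 as end of record character. The function names (`build_demo_record`, `read_fields`, `split_subfields`) are hypothetical, the demo label leaves every position except the record length and base address as blank filler, and the sketch assumes ISO 646 data only, so that one Python character corresponds to one character position.

```python
SUBFIELD_DELIM = "\x1f"  # IS1, position 1/15 of ISO 646
FIELD_END = "\x1e"       # IS2, position 1/14 of ISO 646
RECORD_END = "\x1d"      # IS3, position 1/13 of ISO 646
LABEL_LENGTH = 24
ENTRY_LENGTH = 12        # 3-digit tag + 4-digit length + 5-digit start


def build_demo_record(fields):
    """Hypothetical helper: assemble a test record from (tag, content) pairs."""
    directory, data = "", ""
    for tag, content in fields:
        body = content + FIELD_END          # field length includes the IS2 marker
        directory += f"{tag}{len(body):04d}{len(data):05d}"
        data += body
    directory += FIELD_END                  # directory is terminated by IS2
    base = LABEL_LENGTH + len(directory)    # where character position 0 sits
    total = base + len(data) + 1            # +1 for the IS3 end of record character
    # Positions 0-4 carry the record length (per ISO 2709, not stated in this
    # section); positions 12-16 carry the base address. The rest is filler.
    label = f"{total:05d}" + " " * 7 + f"{base:05d}" + " " * 7
    return label + directory + data + RECORD_END


def read_fields(record):
    """Return (tag, content) pairs located through the directory."""
    base = int(record[12:17])                        # label positions 12-16
    directory = record[LABEL_LENGTH:base - 1]        # drop the closing IS2
    fields = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        raw = record[base + start:base + start + length]
        fields.append((tag, raw[:-1]))               # strip the end of field marker
    return fields


def split_subfields(content):
    """Split a 01- to 999 field into its two indicators and (code, data) pairs."""
    indicators, rest = content[:2], content[2:]
    chunks = [c for c in rest.split(SUBFIELD_DELIM) if c]
    return indicators, [(c[0], c[1:]) for c in chunks]


demo = build_demo_record([
    ("001", "12345"),                                 # 00- field: data only
    ("200", "1#" + SUBFIELD_DELIM + "aTitle" + SUBFIELD_DELIM + "eSubtitle"),
])
for tag, content in read_fields(demo):
    if tag.startswith("00"):
        print(tag, content)
    else:
        print(tag, split_subfields(content))
```

Because the fields are located purely through the directory, the sketch does not care in which order the data fields are physically stored, which matches the statement that the data fields have no required order.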
Tags are not carried in the data fields but appear only in the directory, except for tags in embedded fields (see 4-- block). Fields with the tag value 00- (e.g. 001) consist only of the data and an end of field character. Other data fields consist of two indicators followed by any number of subfields. Each subfield begins with a subfield identifier that is composed of a subfield delimiter, IS1 (1/15 of ISO 646), and a subfield code (one alphabetic or numeric character) to identify the subfield. The subfield identifiers are followed by coded or textual data of any length unless stated otherwise in the description of the field. The final subfield in the field is terminated by the end of field character IS2 (1/14 of ISO 646). The last character of data in the record is followed as usual by the end of field character IS2, which in this instance is followed by the end of record character IS3 (1/13 of ISO 646).

3.5 Mandatory Fields

The following is a list of fields that must be present in the UNIMARC record:

001* RECORD IDENTIFIER

The fields marked by an asterisk (*) must be present in every record, without exception. However, when records are converted into UNIMARC, the remaining fields in the list above are not regarded as mandatory if meaningful fields cannot be produced directly or by computer algorithm. For example, 101 should be omitted if the record would otherwise contain nothing more than 101 |#$a|||. The documentation should inform the user of the omission (see also Appendix K).

3.6 Length of Records

The length of records, which is limited by the format to 99,999 characters, is a matter of agreement between parties to an exchange.

3.7 Record Linking

In practice there are situations when it may be desirable to make a link from one bibliographic entity to another.
To give two examples: when a record describes a translation, a link may be made to the record that describes the original; or a link may be made between records relating to different serial titles when a change of name occurs. A technique is provided in UNIMARC for making these links. A block of fields (the 4-- block) is reserved for this purpose and more information can be found at the description of those fields and in the introduction to the 4-- block.

A linking field will include descriptive information concerning the other item with or without information pointing to a separate record that describes the item. A linking field is composed of subfields, each of which contains a UNIMARC field made up of tag, indicators, and field content including subfield markers. Note that these embedded fields are not accessible through the Directory, since only the entire linking field has a directory entry. The tag of the linking field denotes the relationship of the item identified within it to the item for which the record is being made.

3.8 Character Sets

For data interchange in UNIMARC, ISO character set standards should be used. The record label, directory, indicators, subfield identifiers, and code values specified in this document should be encoded using the control functions and graphic characters of ISO 646 (IRV), which is considered the default set for the record. The code extension techniques specified in ISO 2022 are used when multiple sets are required in a record. Character positions 26-29 and 30-33 of subfield $a in field 100 are used to designate the default and additional graphic character sets used in the record. Character sets should be those established or registered by ISO but may also be the subject of agreement by parties to an exchange.

The control functions of ISO 646 are permitted in the UNIMARC record and the following are always used:

IS1 of ISO 646 (position 1/15 in the 7-bit code table): the first character of the two-character subfield identifier.
IS2 of ISO 646 (position 1/14 in the 7-bit code table): field separator, found at the end of the directory and each data field.
IS3 of ISO 646 (position 1/13 in the 7-bit code table): record separator, found at the end of each record.

When additional character sets are needed, the control function ESC of ISO 646 is frequently used. Two control functions from ISO 6630 used for sorting are also allowed in UNIMARC data. Appendix J gives more information on character sets used with UNIMARC.

3.9 Repetition of Data

There are four possible situations where data could be repeated in different forms:

- Data appear in both coded and textual, display and non-display forms. Where possible both forms of data should appear in the record even if the information is held only once in the source format.
- The document contains the same information in different languages. The International Standard Bibliographic Descriptions specify when and how parallel data should be transcribed from the item. This is catered for in UNIMARC by the use of different or repeated subfields. For examples, see field 200.
- There is more than one language of cataloguing for a multilingual audience. The use of more than one language of cataloguing in, say, notes fields, is useful and in some cases mandatory within a domestic format. For international exchange purposes this facility is less acceptable: unless a receiving agency caters for the same languages as those of the source format it will need to strip out all languages except one. For that reason each record on a UNIMARC exchange tape should have only one language of cataloguing, other languages being catered for by separate records or even separate exchange tapes.
- The same information is repeated in different scripts to cater for variations of sophistication of output. Ideally a catalogue entry should record a document using the script of the document. This is not always possible. For that reason, agencies with the facilities should be able to record both original and transliterated versions in the same catalogue entry to allow the selection of the best possible option by receiving agencies. The mechanism is described in paragraph 3.10 below.

3.10 Treatment of Different Scripts

Record alternative graphic representations/scripts in fields 001-099 and 200-899 using content designators appropriate to the data being recorded. All UNIMARC fields will be considered repeatable for recording alternative graphic representations or scripts whether or not so listed in the body of the text. Those fields listed as not repeatable should be used no more than once per alternative graphic representation/script included in the record.

This technique is intended to provide a mechanism for recording romanizations, transliterations and alternative scripts or orthographies prepared by the cataloguing agency according to standard tables, rules, guidelines etc. In each field repeated for the purpose of recording an alternative graphic representation/script, include both subfield $6 (Interfield Linking Data) and, if appropriate, subfield $7 (Alphabet/Script of Field). Specific instructions for the use of $6 and $7 are as follows.

$6 Interfield Linking Data
Data entered in subfield $6 is recorded as follows:
$6/0 Linking explanation code
a = alternative graphic representation/script
z = other reason for linking
$6/1-2 Linking number
$6/3-5 Tag of linked field
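The fixed positions of subfield $6 listed above lend themselves to simple slicing. The following Python sketch is one possible reading, not a definitive implementation: the names `LinkingData` and `parse_subfield_6` are hypothetical, and treating a 3-character value (no tag of linked field) as valid is an assumption of this sketch rather than something stated in the text above.

```python
from typing import NamedTuple, Optional

# Codes listed for $6/0 in this section.
LINKING_EXPLANATIONS = {
    "a": "alternative graphic representation/script",
    "z": "other reason for linking",
}


class LinkingData(NamedTuple):
    explanation_code: str        # $6/0
    linking_number: str          # $6/1-2
    linked_tag: Optional[str]    # $6/3-5, None when absent (assumption)


def parse_subfield_6(value: str) -> LinkingData:
    """Slice a $6 value into its positional elements."""
    if len(value) not in (3, 6):
        raise ValueError(f"unexpected $6 length: {len(value)}")
    code = value[0]
    if code not in LINKING_EXPLANATIONS:
        raise ValueError(f"unknown linking explanation code: {code!r}")
    linked_tag = value[3:6] if len(value) == 6 else None
    return LinkingData(code, value[1:3], linked_tag)


# Illustrative values only; they are not taken from the examples of this manual.
print(parse_subfield_6("a01200"))
print(parse_subfield_6("z02"))
```

Keeping the linking number as a string preserves any leading zero, so the parsed value can be compared directly with the $6 of another field.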
$7 Alphabet/Script of Field
This subfield contains the code for the alphabet and/or script for the chief contents of the field. Code values are those defined for field 100 character positions 34-35 Script of title. This subfield would usually be omitted in those fields with the same alphabet/script as that coded in 100 character positions 34-35. This subfield should be placed directly before the first data subfield (e.g. $a) of the field in which it is carried. It will usually follow a subfield $6 unless no parallel field exists, in which case there will be no $6.

Following the provisions of ISO 2022 Section 1, which states that "The [character set] codes ... are designed to be used for data that is processed sequentially in a forward direction", it is assumed that characters are input in logical order. Where data, such as Arabic or Hebrew, is input in an order that supposes that it will be read right-to-left, this is indicated by '/r' after the code. ISO 2022 Section 1 also states that "Use of these codes in strings of data which are processed in some other way, or which are included in data formatted for fixed-length record processing, may have undesired results or may require additional special treatment to ensure correct interpretation". (EX 4).

Optional. Not repeatable.

Examples

EX 1 EX 2 EX 3 EX 4
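The '/r' convention for subfield $7 can be handled with a very small helper. This is a minimal Python sketch under stated assumptions: `parse_subfield_7` is a hypothetical name, and the sample values "ha/r" and "ba" are illustrative inputs only; the actual code values are those defined for field 100 character positions 34-35 and are not reproduced here.

```python
from typing import Tuple


def parse_subfield_7(value: str) -> Tuple[str, bool]:
    """Return (script_code, right_to_left) for a $7 value.

    A trailing '/r' marks data input in an order that supposes
    right-to-left reading; otherwise logical order is assumed.
    """
    right_to_left = value.endswith("/r")
    script_code = value[:-2] if right_to_left else value
    return script_code, right_to_left


# Illustrative values; see field 100 positions 34-35 for the real code list.
print(parse_subfield_7("ha/r"))
print(parse_subfield_7("ba"))
```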