Introduction

The numerous nationwide registers in Denmark provide valuable resources for epidemiological research [1]. A prerequisite for utilizing the registers to their full potential is the ability to link them and to track study participants over time with accurate accounting for censoring due to emigration or death. The Danish Civil Registration System (CRS) has this ability [2]. As a result, the entire Danish population can be considered a cohort for epidemiological research [2]. Initially, the CRS was only used to link registers, but the methodological development in epidemiology has facilitated its use in other ways. Of particular interest, the CRS facilitates sampling of comparison groups for use in different types of study design. These advanced ways of using the CRS in epidemiological studies have not previously been described systematically. Here we review the CRS and its use in epidemiology.

Setting

The Danish National Health Service provides tax-supported health care for the entire Danish population [3]. Tax revenues finance approximately 85 % of all health care expenses, including free access to general practitioners, hospitals, and outpatient specialty clinics, and partial reimbursement of prescribed medications [3]. Patients’ out-of-pocket expenditures cover the remaining costs (15 %) of medication and dental care [3].

The Danish health care system has three administrative levels [3]: (1) the state, which is responsible for legislation, national guidelines, surveillance, and health financing through the Ministry of Health and Prevention; (2) the five regions, which are responsible for the delivery of primary and secondary care; and (3) the 98 municipalities, which are responsible for school health, child dental care, home nursing, public health, prevention, and rehabilitation.

Register overview

History

Denmark has a total of 2,152 parishes, each being the ecclesiastical unit of an area committed to one pastor. For centuries, births, marriages, and deaths in Denmark were recorded in parish registers [4–6]. In 1924, the first registration of the entire Danish population was undertaken through an Act of Parliament [4–6]. As part of the Act, local municipal administrative registers were set up to facilitate tax collection from the whole population [4–6]. In 1965, the Central Office of Civil Registration was established to oversee replacement of hard-copy records in municipal registers with electronic records in a centralized civil register [4–6]. The CRS was launched on April 2, 1968 following the passing of the National Registration Act of 1968 [4]. The establishment of the CRS led to further changes in the Danish taxation system with implementation of a “pay-as-you-earn” tax in 1970.

Today the Central Office of Civil Registration remains responsible for maintaining and developing the CRS as the main source of individual-level information for public authorities and the private sector [7]. The CRS promotes efficiency by providing a uniform data platform that frees authorities and private companies from needing to contact residents to collect and verify general personal data [7].

Purposes

The CRS was established for two main reasons [7]: (1) a growing need for general personal data such as addresses; and (2) the need for identification of individuals for public administration purposes.

Registration

The CRS registers all persons who (1) are born alive of a mother already registered in the CRS; (2) have their birth or baptism registered in a Danish electronic church register; or (3) reside legally in Denmark for 3 months or more [8]. While newborns are registered at birth, even if they live only for a short time, stillborn children are not [8]. Persons, including newborns, who are entitled to Danish citizenship, but live abroad, are not registered in the CRS unless they move to Denmark.

Upon registration in the CRS, each person receives a Civil Personal Register (CPR) number (described in detail below). Persons who do not fulfil the above criteria for registration, but who become members of the Danish Labour Market Supplementary Pension Fund (ATP) or are required to pay tax in Denmark also receive a CPR number [8]. Moreover, residents of Greenland (an autonomous country within the Kingdom of Denmark) have been included in the CRS since May 1, 1972 [5]. As of January 2, 2014, almost 9.5 million unique CPR numbers have been assigned to residents of Denmark and Greenland, among whom 5.6 million are currently alive and living in Denmark (Fig. 1) [9].

Fig. 1 The Danish cohort as registered in the Danish Civil Registration System (April 2, 1968 to January 2, 2014). Active status reflects persons alive and with residence in Denmark, Greenland, or with no current residence registered (e.g., homeless persons or persons in prison; n = 14,953). In addition to persons who died, emigrated, or disappeared, non-active status includes all administrative, annulled, deleted, and changed CPR numbers over time (n = 424,585). Note that emigration numbers reflect status on January 2, 2014 (both permanent and temporary emigrants). When temporary emigrants return to Denmark or Greenland, they shift from emigrated to active status.

Frequency of register updates

Hard disks replaced magnetic tapes for data storage in 1989, thus permitting daily (Monday through Friday) rather than weekly updates, which improved data accuracy [5].

Variables

The CPR number

The Danish system of CPR numbers is similar to those used in other Nordic countries [10]. The Danish CPR number is a unique ten-digit personal identifier that constitutes an essential part of the personal information stored in the CRS (Table 1) [11]. Other terms referring to the CPR number include the civil registration number and the central personal registry number. The format of the CPR number is DDMMYY-SSSS, where DDMMYY is the date of birth (day-month-year) and SSSS is a serial number that makes it possible to distinguish between persons born on the same day (Fig. 2) [11]. Digits 5–7 encode the century and year of birth [11]. The last digit of the serial number encodes the person’s sex (odd for males and even for females) [11]. Previously, the last digit always functioned as a check digit for the entire CPR number and was used to guard against errors caused by the incorrect transcription of a CPR number when used by public authorities [11]. This control system based on check digits is referred to as modulus 11 control [11]. However, the check digit permits use of only approximately 540 of the numbers available for each date [11]. To increase the capacity of the CPR number system, it became possible in October 2007 to assign a “new” type of CPR number without a check digit [11]. The check digit is, however, still assigned whenever possible. Between October 2007 and April 2014, only 34 “new” CPR numbers had been assigned [9]. The CPR number is assigned to a person for life, except under special circumstances. Such special circumstances include errors in the encoded information on date of birth, change of sex, witness protection programs, and as of April 1, 2014 also severe cases of identity theft [9, 12]. When a person is assigned another CPR number, the CRS keeps a link to the old number so it is possible to track the person over time [9]. A CPR number is never reassigned to another person [11].
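The structure described above can be illustrated with a minimal Python sketch. It splits a CPR number into its parts, reads sex from the last digit, and applies a modulus 11 control. The weights (4, 3, 2, 7, 6, 5, 4, 3, 2, 1) are not given in this article; they are the weights commonly documented for the Danish check and are stated here as an assumption. The function names are hypothetical, and the number "310299-0101" is deliberately synthetic (it encodes February 31).

```python
# Assumption: weights commonly documented for the CPR modulus 11 control;
# they are not stated in this article.
MOD11_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def _ten_digits(cpr: str) -> str:
    """Return the ten digits of a CPR number written as DDMMYY-SSSS."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("a CPR number must contain exactly ten digits")
    return digits


def parse_cpr(cpr: str) -> dict:
    """Split a CPR number into date parts, serial number, and sex."""
    digits = _ten_digits(cpr)
    return {
        "day": int(digits[0:2]),
        "month": int(digits[2:4]),
        "year_two_digit": int(digits[4:6]),
        "serial": digits[6:10],
        # Odd last digit: male; even last digit: female.
        "sex": "male" if int(digits[9]) % 2 == 1 else "female",
    }


def passes_modulus_11(cpr: str) -> bool:
    """True if the weighted digit sum is divisible by 11."""
    digits = _ten_digits(cpr)
    total = sum(int(d) * w for d, w in zip(digits, MOD11_WEIGHTS))
    return total % 11 == 0


synthetic = parse_cpr("310299-0101")  # synthetic number, not a real person
```

Because numbers without a check digit have been issued since October 2007, a failed check in this sketch does not by itself prove that a number is invalid.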

Table 1 Variables in the Danish Civil Registration System

Fig. 2 The Civil Personal Register (CPR) number

Migration and vital status

The CRS maintains information on migration and vital status [4]. For living persons, the CRS records whether the person resides in Denmark or has disappeared (i.e., when place of residence is unknown to Danish authorities) [4]. Under the National Registration Act, all Danish residents are required to notify the Danish authorities of changes in their address, both when remaining within the country’s borders and when moving to or returning from another country [4]. Authorities in the municipality of a person’s new residence must be notified within 5 days following the change in address [4]. Mortality information for persons who have disappeared or emigrated is available only if the death occurred in Denmark or if the Danish authorities were notified [4]. By January 2, 2014, the CRS had registered the deaths of approximately 2.6 million persons (27 % of the total number of persons registered), the emigration of 750,000 persons (8 %), and the disappearance of 25,000 persons (0.3 %) (Fig. 1) [9].

Other variables

The remaining variables registered in the CRS are listed in Table 1. They cover personal information such as full name, address, date and place of birth, citizenship, church membership (used to collect church tax), CPR numbers of parents and children, and civil status. Between July 27, 1995 and April 1, 2014, residents also had the option to actively make their data unavailable to researchers (research protection) [12]. By January 2, 2014, 795,873 persons (13 %) had research protection [12]. In practice this meant that researchers were not allowed to send out questionnaires or otherwise contact these individuals for research purposes [12]. Persons with recorded research protection were, however, still available for register-based studies. To ensure representative sampling in surveys, registration with research protection in the CRS was discontinued on April 1, 2014, and all existing registrations were annulled [12]. Although registration with research protection is no longer possible, it remains a person’s own decision whether to participate when asked to fill out a questionnaire [12].

CRS as a research tool

Record linkage

The CPR number is the key component of register linkage in Denmark, as it is used in all Danish administrative and medical registers and databases (Fig. 3). By using the CPR number to link data sources at the individual level, it is straightforward to obtain information on exposures, outcomes, and covariables (Table 2).
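As a minimal sketch of individual-level linkage, the Python example below joins a hypothetical extract from a patient register to a hypothetical extract from the CRS using the CPR number as the key. All record layouts, field names, and CPR numbers are invented for illustration and do not reflect the actual register formats.

```python
# Hypothetical CRS extract keyed by (synthetic) CPR number.
civil_register = {
    "310299-0101": {"sex": "male", "status": "active"},
    "310299-0004": {"sex": "female", "status": "emigrated"},
}

# Hypothetical patient register extract: one row per hospital contact.
patient_register = [
    {"cpr": "310299-0101", "diagnosis": "stroke", "admitted": "2010-03-01"},
    {"cpr": "310299-0101", "diagnosis": "atrial fibrillation", "admitted": "2012-07-15"},
    {"cpr": "999999-9999", "diagnosis": "stroke", "admitted": "2011-01-10"},
]


def link_on_cpr(person_records, event_records):
    """Attach person-level data to each event row that shares a CPR number.

    Returns (linked_rows, unmatched_rows) so that linkage failures can be
    inspected rather than silently dropped.
    """
    linked, unmatched = [], []
    for event in event_records:
        person = person_records.get(event["cpr"])
        if person is None:
            unmatched.append(event)
        else:
            linked.append({**event, **person})
    return linked, unmatched


linked, unmatched = link_on_cpr(civil_register, patient_register)
```

Returning the unmatched rows separately is a design choice of this sketch; it makes it easy to check how many records could not be linked.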

Fig. 3 Examples of Danish data sources linkable at the individual level using the Civil Personal Register (CPR) number

Table 2 Use of the Danish Civil Registration System as a tool in epidemiology

While electronic medical records have been implemented nationwide, linkage of other data sources to these records is currently only possible to a limited extent. However, the CPR number facilitates electronic searches of individual medical records, allowing validation of hospital register data through medical record review [13]. The availability of similar systems in other Nordic countries makes it possible to combine nationwide registers from several countries to create Nordic cohorts [14–16], or to make cross-national comparisons of drug utilization [15] or disease incidence [17]. As an example, one cohort study included women and their infants born in all Nordic countries (Denmark, Finland, Iceland, Norway, and Sweden) between 1996 and 2007 to examine the risk of stillbirth and infant mortality associated with use of selective serotonin reuptake inhibitors during pregnancy [14]. Another study followed children born in Denmark (1973–2007), Sweden (1973–2006) and Finland (1987–2007) to examine whether delivery by caesarean section was associated with childhood cancer [18].

Complete follow-up

Long-term follow-up of patients in Danish cohort studies is possible using CRS information on emigration and vital status for censoring (Table 2). For instance, the CRS has been used to conduct a 36-year follow-up study of the body mass index-associated risk of atrial fibrillation in 12,850 men [19]. In a larger cohort, the CRS ensured complete 5-year follow-up on emigration and death of 219,354 patients with first-time hospitalization for stroke [20].

Use of population-based health care databases to detect events in randomized clinical trials (RCTs) is another example of the unique opportunities provided by the CRS [21]. In general, RCTs focus on efficacy and safety outcomes in controlled environments, which may limit the applicability of their results to routine clinical practice [22]. Also, funding constraints often dictate short follow-up times in RCTs. In contrast, event detection using population-based healthcare databases allows for large, low-cost RCTs that reflect daily clinical practice, cover a broad range of patients and endpoints, and include lifelong follow-up [21]. This design has been used with notable success in interventional cardiology [21]. Clinical outcomes associated with different coronary stents have been compared head-to-head by the SORT OUT trials [23]. Other examples of utilizing the CRS for long-term follow-up of RCT participants include the DANAMI 2 Trial (comparing angioplasty and fibrinolysis in patients with acute myocardial infarction) [24], the CONDI 1 trial (examining the effectiveness of remote ischaemic conditioning as an adjunct to primary percutaneous coronary intervention) [25], and the CLARICOR trial (examining the effect of clarithromycin on mortality and cardiovascular morbidity in patients with stable coronary heart disease) [26].

Daily updates of information on migration and vital status also allow accurate calculation of person-time at risk. At the beginning of 2014, the CRS included more than 400 million person-years of follow-up for Danish residents, accounting for periods of temporary emigration of individuals. In addition, the CRS permits accurate estimation of person-time in cohort studies with time-dependent exposures [27]. The ability to calculate person-time at risk based on CRS data is independent of whether the cohort is fixed (at the time of a defining entry event) or dynamic (with individuals entering or leaving the cohort at different times). Entry into cohorts defined by a specific disease (e.g., myocardial infarction) is often fixed at the time of diagnosis [28], while in studies of disease incidence, participation is often dynamic with entry at a specified age (e.g., birth), immigration, or a move to the study area, and exit at the time of death, emigration, a move away from the study area, or occurrence of the disease of interest [28].
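The person-time calculation described above can be sketched in Python as follows. Each person contributes time from cohort entry until the earliest of death, emigration, or end of study. The field names and the cohort are hypothetical, and the sketch treats emigration as a single censoring date; it does not model re-entry after temporary emigration.

```python
from datetime import date


def person_years(entry, end_of_study, death=None, emigration=None):
    """Person-years from entry to the earliest of death, emigration, or study end."""
    exit_date = min(d for d in (end_of_study, death, emigration) if d is not None)
    if exit_date <= entry:
        return 0.0
    # 365.25 days per year is a common approximation, not an exact calendar rule.
    return (exit_date - entry).days / 365.25


# Hypothetical cohort: one person followed to study end, one who died,
# and one who entered later and then emigrated.
END_OF_STUDY = date(2005, 1, 1)
cohort = [
    {"entry": date(2000, 1, 1), "death": None, "emigration": None},
    {"entry": date(2000, 1, 1), "death": date(2002, 1, 1), "emigration": None},
    {"entry": date(2001, 1, 1), "death": None, "emigration": date(2001, 7, 1)},
]

total_person_years = sum(
    person_years(p["entry"], END_OF_STUDY, p["death"], p["emigration"])
    for p in cohort
)
```

The same function serves fixed and dynamic cohorts, because only the entry date differs between persons.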

All-cause mortality

All-cause mortality can be obtained from the CRS as an individual outcome or as part of a composite outcome. Long-term nationwide mortality trends can also be estimated [20, 28]. Moreover, the CRS can be used to obtain information on all-cause mortality as a competing risk when studying the risk of non-fatal outcomes (Table 2) [19].

Comparison cohorts for cohort studies

The CRS can be used to sample large comparison cohorts from the general population. Comparison cohort members are typically matched to patients on age, sex, county of residence, or calendar period, but otherwise are randomly selected within the Danish population as registered in the CRS. Using a population comparison cohort identified from the CRS, a nationwide cohort study examined whether primary total hip replacement increased the risk of venous thromboembolism [29]. The comparison cohort (n = 257,895) was created by matching each of the 85,965 patients on age and sex to three individuals from the general population who were alive on the date of the patient’s surgery [29].
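A minimal Python sketch of such matched sampling is given below. For each patient it draws up to three persons of the same sex and birth year who were alive on the patient's index date. The data layout, the function name, and the use of birth year as the age-matching variable are assumptions made for illustration; the cited study's exact procedure is not described here.

```python
import random
from datetime import date


def sample_comparison_cohort(patients, population, per_patient=3, seed=0):
    """Draw matched comparators for each patient from a population list.

    A person is eligible if sex and birth year match and the person had not
    died by the patient's index date. The same person may be drawn for more
    than one patient (a simplification of this sketch).
    """
    rng = random.Random(seed)
    matches = {}
    for patient in patients:
        eligible = [
            person for person in population
            if person["sex"] == patient["sex"]
            and person["birth_year"] == patient["birth_year"]
            and (person["death"] is None or person["death"] > patient["index_date"])
        ]
        matches[patient["id"]] = rng.sample(eligible, min(per_patient, len(eligible)))
    return matches


# Hypothetical population of 60 persons and one hypothetical patient.
population = [
    {
        "id": i,
        "sex": "female" if i % 2 == 0 else "male",
        "birth_year": 1940 + i % 3,
        "death": date(2005, 1, 1) if i % 5 == 0 else None,
    }
    for i in range(60)
]
patients = [{"id": 1000, "sex": "female", "birth_year": 1941, "index_date": date(2010, 6, 1)}]

comparison = sample_comparison_cohort(patients, population)
```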

The CPR number can also be used to sample patient comparison cohorts. Linking the CRS, the Danish National Patient Registry, and the Danish Cancer Registry, one study examined the survival of patients who received a diagnosis of cancer at the same time as or after an episode of venous thromboembolism [30]. The survival was then compared with that of cancer patients without venous thromboembolism, who were matched on age, sex, year of diagnosis, and type of cancer [30].

Controls in case–control studies

In case–control studies, controls can be sampled from the cohort at start of follow-up (case-cohort study), among non-cases at end of follow-up (case-noncase study), or from the risk set during follow-up (density case–control study) [31]. As examples of risk-set sampling, the CRS has been used to match up to ten controls to each case on age and sex in studies associating use of glucocorticoids and non-steroidal anti-inflammatory drugs with risk of atrial fibrillation [32, 33].
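Risk-set sampling can be sketched in Python as follows. For each case, controls are drawn among persons of the same sex and birth year who are still under follow-up at the case's event time, so a person who later becomes a case remains eligible as a control until then. The field names, the toy cohort, and the handling of ties (persons exiting exactly at the event time are excluded) are assumptions of this sketch.

```python
import random


def risk_set_controls(cohort, controls_per_case=10, seed=0):
    """Return {case_id: [control_ids]} sampled from each case's risk set."""
    rng = random.Random(seed)
    sampled = {}
    for case in (p for p in cohort if p["is_case"]):
        event_time = case["exit_time"]
        risk_set = [
            p["id"] for p in cohort
            if p["id"] != case["id"]
            and p["sex"] == case["sex"]
            and p["birth_year"] == case["birth_year"]
            # Still under follow-up after the case's event time.
            and p["exit_time"] > event_time
        ]
        sampled[case["id"]] = rng.sample(risk_set, min(controls_per_case, len(risk_set)))
    return sampled


# Hypothetical cohort; exit_time is years since start of follow-up.
toy_cohort = [
    {"id": 1, "sex": "male", "birth_year": 1950, "exit_time": 2.0, "is_case": True},
    {"id": 2, "sex": "male", "birth_year": 1950, "exit_time": 5.0, "is_case": True},
    {"id": 3, "sex": "male", "birth_year": 1950, "exit_time": 1.0, "is_case": False},
    {"id": 4, "sex": "male", "birth_year": 1950, "exit_time": 6.0, "is_case": False},
    {"id": 5, "sex": "male", "birth_year": 1950, "exit_time": 3.0, "is_case": False},
]

controls = risk_set_controls(toy_cohort, controls_per_case=2)
```

In this toy cohort, person 3 left follow-up before either event and is never eligible, while person 2 can serve as a control for case 1 before becoming a case.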

Family cohorts

Family cohorts can be constructed using the parent–child link recorded in the CRS [34]. This link also forms the basis for the Danish Family Relations Database, which contains pedigree information allowing for the identification of family members of individuals residing in Denmark [35].
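How relatives can be derived from the parent–child link is illustrated by the Python sketch below, which identifies full and half-siblings from a mapping of each child to its registered mother and father. The identifiers and the data layout are hypothetical. When one parent is unknown, siblings found through the other parent are classified as half-siblings here, which is a simplification.

```python
from collections import defaultdict


def find_siblings(parent_links):
    """parent_links maps child_id -> (mother_id, father_id); either may be None.

    Returns two dicts: full siblings (share both registered parents) and
    half-siblings (share exactly one registered parent).
    """
    children_of = defaultdict(set)
    for child, parents in parent_links.items():
        for parent in parents:
            if parent is not None:
                children_of[parent].add(child)

    full, half = {}, {}
    for child, (mother, father) in parent_links.items():
        via_mother = children_of[mother] - {child} if mother is not None else set()
        via_father = children_of[father] - {child} if father is not None else set()
        full[child] = via_mother & via_father
        half[child] = (via_mother | via_father) - full[child]
    return full, half


# Hypothetical links: A and B share both parents, C shares only the mother,
# and D has no registered father and no siblings.
links = {
    "A": ("M1", "F1"),
    "B": ("M1", "F1"),
    "C": ("M1", "F2"),
    "D": ("M2", None),
}
full_siblings, half_siblings = find_siblings(links)
```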

Special considerations apply to family studies [34]. Left truncation before 1968 is a limitation of the CRS in general, but it is particularly important to take this limitation into account when constructing family cohorts [34]. For most individuals born in 1950 or later, first-degree relatives (parents, children, and siblings) are identifiable. Second-degree relatives (grandparents, grandchildren, half-siblings, aunts/uncles, and nieces/nephews) can be identified for approximately 90 % of individuals born after 1984 [36]. Biological and adoptive relatives cannot be distinguished in the CRS because adopted children are recorded under their adoptive parents. However, the magnitude of this misclassification is small owing to the small number of adopted children [37]. Donor semen insemination also may lead to misclassification, as fatherhood in these cases cannot be identified using the parent–child link in the CRS. In studies of disease occurrence among parents, it should be noted that parents may be misclassified as disease-free if their diagnosis occurred before the establishment of the Danish National Patient Register in 1977 [34]. The magnitude of this problem depends on the age of the relatives and the age-specific risk of disease, and may in part be quantified [34].

Family cohorts can be used to study, inter alia, family history as a risk factor or prognostic factor for disease [34]. One study compared the proportion of infants with facial-cleft defects who had older siblings with the same defect and found that having more than one child with this defect was not linked to the mother’s residence [37]. However, having a different partner reduced a woman’s risk of having a second child with this defect [37]. Another study compared the incidence of cardiovascular disease in the background population with that among relatives of patients with sudden cardiac death. It found that cardiovascular disease co-aggregated with sudden cardiac death in families, with young first-degree relatives at greatest risk [36]. Using a similar approach, pyloric stenosis in Danish children has been shown to have strong familial aggregation and heritability [38].

Surveys

Using the CRS, it is possible to randomly select samples of the Danish population for use in surveys. An example is a public health questionnaire called “How Are You?”, which was distributed by mail to randomly selected inhabitants of the Central Denmark Region in 2001 (n = 5,221), 2006 (n = 31,500), 2010 (n = 52,400), and 2013 (n = 54,300) [39]. Study administrators sought to maximize participation in these voluntary surveys by, e.g., sending reminders by mail to non-respondents and offering assistance with filling out the questionnaire over the telephone [39]. The questionnaire was completed by 76 % of potential responders in 2001, 69 % in 2006, 65 % in 2010, and 61 % in 2013 [39]. Profiles of participants, including medication use, have been examined using the “How Are You?” database, providing, for example, evidence that statin use in Denmark is not associated with a healthy lifestyle [40].

The Danish Health Examination Survey (DANHES) is another example of a cross-sectional study using the CRS to randomly select participants [41, 42]. In 2010, a national sample of 298,550 individuals (16 years or older) was invited to participate in DANHES [41]. The information was collected using both paper and web questionnaires. A total of 177,639 individuals (59.5 %) completed the questionnaire. The response rate was particularly low among young men, unmarried persons, and individuals of non-Danish ethnic origin [41].

Bias stemming from differential non-response (non-response bias) may limit the use of such survey data [43, 44]. However, individual-level linkage of study participants as well as non-participants to other registers using the CPR number allows researchers to compute calibrated weights that can be used in data analyses to reduce non-response bias [43, 44]. The weights are based on a range of variables for all individuals who were invited to the survey. For surveys conducted between July 27, 1995 and April 1, 2014, such weights unfortunately do not include information from individuals who were registered with research protection and therefore not eligible for sampling in the first place [43]. For the DANHES 2010 survey, calibrated weights were computed on age, sex, municipality of residence, highest completed educational level, occupational status, income, marital status, ethnic origin, number of visits to the general practitioner in 2007, a hospitalization in 2007 (yes/no), owner/tenant status, and research protection status for all individuals living in Denmark on January 1, 2010 [41].
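The idea behind such weights can be illustrated with a deliberately simplified Python sketch. It uses plain post-stratification (invited divided by responded within each stratum), which is a simpler stand-in for the calibration used in the cited surveys and not their actual method. The strata, field names, and counts are hypothetical.

```python
from collections import Counter


def poststratification_weights(invited, stratum_key):
    """Weight per stratum = number invited / number responded.

    Register data are available for respondents and non-respondents alike,
    so both counts can be computed for every stratum. Strata without any
    respondent get no weight and would need to be merged with others.
    """
    invited_n = Counter(stratum_key(p) for p in invited)
    responded_n = Counter(stratum_key(p) for p in invited if p["responded"])
    return {s: invited_n[s] / responded_n[s] for s in responded_n}


# Hypothetical invited sample: 4 young men (1 responded), 4 older women (2 responded).
invited = (
    [{"sex": "male", "age_group": "16-24", "responded": i == 0} for i in range(4)]
    + [{"sex": "female", "age_group": "65+", "responded": i < 2} for i in range(4)]
)

weights = poststratification_weights(invited, lambda p: (p["sex"], p["age_group"]))

# Weighted respondents should add up to the number of persons invited.
weighted_total = sum(
    weights[(p["sex"], p["age_group"])] for p in invited if p["responded"]
)
```

Groups with lower response receive larger weights, so that the weighted respondents resemble the invited sample on the chosen variables.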

Data quality

With a prevalence of disappeared persons around 0.3 %, the CRS is virtually complete. Data accuracy is ensured by control mechanisms at several levels:

1. Registration in the CRS is required by law [46].
2. All administrative systems in Denmark use CRS data continuously, which increases the likelihood that errors will be noticed and corrected [46].
3. CRS data have been checked systematically several times and errors have been corrected [46].
4. The modulus 11 control can, with few exceptions, be used to validate electronically the accuracy of a CPR number [45].
5. Residents can do an online check of the accuracy of the data appearing on their National Health Service Medical Card (Fig. 4) and on the validity of their CPR number [46].

Fig. 4 Danish National Health Service Medical Card. The National Health Service Medical Card includes information about the Civil Personal Register (CPR) number (123456-7890), name (Hans Andersen), and address (Strandvejen 100, Bakkerne, 9999 Vejstrand) of a Danish resident together with the name (Finn Hansen), address (Strandvejen 99), and phone number (88-88-88-88) of the person’s general practitioner. This anonymized example is provided with permission from the Danish Regions.

Data access

The Danish Act on Processing of Personal Data provides the legal basis for public institutions, including universities, to retain individually identifiable health data for research purposes [47]. This Act protects against abuse of personal information and thus balances the privacy of Danish residents with society’s need for research [47]. The Act specifies that use of the CRS for research purposes does not require informed consent from study subjects [47]. There may be several reasons why Danes accept having their personal information registered by public authorities and used for research: they are accustomed to being registered in public systems (Danish law since 1924), they generally have a high degree of confidence in the authorities, and there have been virtually no incidents involving misuse of CRS data.

In order to access CRS data, researchers must first seek project approval from the Data Protection Agency [48] and, if relevant, from the National Committee on Health Research Ethics [49]. The Data Protection Agency establishes safety precautions for data processing and sets cancellation deadlines, ensuring that data traceable to individuals will not be stored any longer than required to complete a project [48]. Together with the approval from the Data Protection Agency, the project protocol, and a description of the requested data, an application can be submitted online to the Research Service at the Danish Serum Institute, which also administers access to many other nationwide registers in Denmark [50].

Conclusions

The CRS is a nationwide register of personal information on Danish residents. It has recorded information on migration and vital status for the entire Danish population since April 2, 1968, with daily updates. This registration allows for nationwide cohort studies with virtually complete long-term follow-up on emigration and death. The CPR number permits cost-effective and accurate individual-level linkage between Danish registers. Using the CRS, it is possible to sample general population comparison cohorts, controls for case–control studies, family cohorts, and target groups for population surveys. The CRS is therefore a key tool for epidemiological research in Denmark.