Data Collection

Data collection refers to systematically gathering information about a subject or question of interest. Data may be collected from a variety of sources including surveys, interviews, focus groups, and existing records and databases.

FAQs About Data Collection

How reliable are different sources of physician supply data?

An article by DesRoches et al (2015) compared the National Plan and Provider Enumeration System (NPPES), the American Medical Association (AMA) Masterfile, and the SK&A physician file to evaluate data accuracy. The authors performed this analysis in the context of using these datasets as sampling frames and for counting physicians in a given area. They found that while none of the files was perfect, the NPPES had broader coverage, and the NPPES and SK&A files had reasonably accurate and current address information. The AMA Masterfile had lower rates of correct address information.

State licensure data are another matter. Some state medical boards require only basic information, such as a mailing address for licensing correspondence. Other states collect more robust data through licensure, including multiple practice addresses as well as demographic, education, and practice characteristics, and some conduct regular surveys. States may or may not systematically verify the licensure or survey data.

Different data sources have different limitations. Before using any dataset as a sampling frame or for research, it is essential to understand the data’s purpose and how they are collected, verified, and updated.


Why don’t different data sources match?

There are multiple approaches to collecting data, and data are often collected for different purposes. As a result, it is important to understand the methodology behind each dataset and its intended use in order to make valid comparisons. For example:

  • The Bureau of Labor Statistics (BLS) Occupational Employment Statistics program collects data through employer surveys; the data count jobs, not workers, and they count wage-and-salary positions, not the self-employed.
  • Professional association masterfiles (eg, AMA, ADA) are based on membership surveys and other sources, and the data may not accurately account for professionals who are licensed and practicing in more than one state.
  • State licensure data are self-reported through license applications and renewals, and they depend on the licensees’ accuracy and timeliness.
  • The National Plan and Provider Enumeration System (NPPES) is a registry of providers who hold a National Provider Identifier (NPI), which is used to bill Medicare, Medicaid, and other payers; because it is an administrative database, the address a provider lists may be a billing address rather than the practice location.

Health professionals are mobile, some more than others, and change jobs and locations; these moves may not be reflected accurately or in a timely manner.
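To see how these differences surface in practice, an analyst often links two extracts and tabulates where they disagree. The sketch below is a minimal, hypothetical illustration: it assumes both extracts share a provider identifier (labeled `npi` here, although many state licensure files may not carry one), uses made-up records, and applies a deliberately crude address normalization. Real record linkage usually needs fuzzier matching on names, license numbers, and addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderRecord:
    """One provider as listed in a single source (hypothetical layout)."""
    npi: str      # shared identifier, assumed to exist in both sources
    address: str  # whatever address the source holds (practice, mailing, or billing)


def normalize(address: str) -> str:
    """Crude normalization: uppercase, drop periods and commas, collapse whitespace."""
    cleaned = address.upper().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def compare_sources(source_a, source_b):
    """Return IDs found in only one source, plus shared IDs whose addresses differ."""
    a_by_id = {r.npi: r for r in source_a}
    b_by_id = {r.npi: r for r in source_b}
    shared = a_by_id.keys() & b_by_id.keys()
    return {
        "only_in_a": sorted(a_by_id.keys() - b_by_id.keys()),
        "only_in_b": sorted(b_by_id.keys() - a_by_id.keys()),
        "address_mismatch": sorted(
            npi for npi in shared
            if normalize(a_by_id[npi].address) != normalize(b_by_id[npi].address)
        ),
    }


# Made-up example: a state licensure extract vs an administrative registry extract.
licensure = [
    ProviderRecord("0000000001", "100 Main St."),
    ProviderRecord("0000000002", "5 Oak Ave"),
    ProviderRecord("0000000003", "9 Elm Rd"),
]
registry = [
    ProviderRecord("0000000001", "100 main st"),
    ProviderRecord("0000000002", "PO Box 77"),  # eg, a billing address
    ProviderRecord("0000000004", "1 Pine St"),
]

result = compare_sources(licensure, registry)
print(result)
```

In this toy data, one provider appears only in each source and one shared provider lists different addresses, the kinds of discrepancies described above.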


There are many sources of health workforce data. Some sources have known and documented limitations. It is important to understand the data’s purpose and how they are collected, verified, and updated.

There are 2 reports that describe multiple data sources:

Selected federal sources:

The Bureau of Labor Statistics (BLS) is a commonly used federal data source included in the compendium. The BLS tracks employment by industry and occupation, projects future employment, houses the Current Population Survey, and provides other employment statistics. A known limitation is that the BLS Occupational Employment Statistics data count jobs, not workers, and exclude workers who are self-employed, unemployed, or in certain industries. Professions with a large number of self-employed workers, such as physicians and dentists, may be undercounted, while professions whose workers often hold 2 or more part-time jobs, such as dental hygienists, may be overcounted. The Current Population Survey, which surveys households, is another commonly used BLS dataset for estimating health workforce statistics.
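The jobs-versus-workers distinction can be illustrated with a small Python sketch. It reduces the Occupational Employment Statistics approach to two simplified rules (one count per wage-and-salary position; self-employed positions excluded). The record layout and the data are invented for illustration and are not BLS microdata or BLS methodology in full.

```python
from typing import NamedTuple


class Job(NamedTuple):
    """One position held by one person (hypothetical layout)."""
    worker_id: str
    occupation: str
    self_employed: bool


def count_jobs(jobs, occupation):
    """Simplified job-based count: one per wage-and-salary position, self-employed excluded."""
    return sum(1 for j in jobs if j.occupation == occupation and not j.self_employed)


def count_workers(jobs, occupation):
    """Person-based count: each distinct individual once, including the self-employed."""
    return len({j.worker_id for j in jobs if j.occupation == occupation})


jobs = [
    Job("h1", "dental hygienist", False),  # h1 holds two part-time jobs
    Job("h1", "dental hygienist", False),
    Job("h2", "dental hygienist", False),
    Job("d1", "dentist", True),            # practice owners are self-employed
    Job("d2", "dentist", True),
    Job("d3", "dentist", False),
]

for occ in ("dental hygienist", "dentist"):
    print(f"{occ}: {count_jobs(jobs, occ)} jobs, {count_workers(jobs, occ)} workers")
```

With these made-up records, the job-based count overstates dental hygienists (3 jobs held by 2 people) and understates dentists (1 wage-and-salary job among 3 dentists), mirroring the direction of bias described above.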

The Area Health Resource File (AHRF) is a publicly available dataset that aggregates data from disparate data sources. It contains county-level and state-level data on healthcare workers and other demographic and health-related variables. Some variables based on data from the American Dental Association, the American Hospital Association and the American Medical Association are subject to copyright restrictions.

Selected nonfederal sources:

Professional associations, such as the American Medical Association (AMA), the American Dental Association (ADA), and the American Hospital Association (AHA), conduct their own surveys and maintain databases (eg, “Masterfiles”) for administrative and analytic uses. These data sources are often proprietary and available for purchase under strict data use agreements.

SK&A maintains databases on physicians and other healthcare practitioners. The company states that its lists are verified every 6 months and updated monthly, and that its mailing lists are guaranteed 100% deliverable.

Another nonfederal data source is the Kaiser Family Foundation’s State Health Facts. Like the AHRF, this source reports aggregated data on providers, service use, and other useful health-related variables obtained from outside sources.

State sources:

States may or may not collect their own workforce-related data. The State Health Workforce Data Collection Inventory lists states that collect data on health workforce supply, demand, and/or education.


What staff and resources are needed to undertake health workforce data collection and analysis?

This depends on many different factors, such as how many health professionals you want to track, the method used to collect data (licensure, survey, continuous monitoring, secondary data), the types of deliverables for which you’re accountable, and organizational structure. If the data system is embedded within a larger organization, such as a university or state government office, it is likely that some administration, finance, and infrastructure resources are already available for basic operation. If the data system is a stand-alone organization, you will need to secure funding for these functions as well.

In terms of staff, you may consider having a director to guide the work, make decisions, present results and acquire funding; one or more project managers/researchers to analyze data, write reports and present results; and a data manager to collect, clean and analyze data. Other positions may include communications specialist, visualization specialist, research assistant, administrative assistant, grants manager, and financial manager.

Additional resources needed include computer hardware and software for data management, statistical analysis, GIS, and graphic design.


How do you measure demand for health workers?

Demand for health services can be difficult to measure, and data availability varies. Broadly speaking, demand for health services can be split into 2 categories:

  • Utilization
    • Those who utilize health care services, which includes people who need and receive services, and people who receive but may not need services (eg, elective procedures, the “worried well”)
  • Unmet Need
    • Those who need services but choose not to seek them
    • Those who need services but cannot access them because of limiting factors such as cost, insurance coverage, time, transportation, availability of healthcare providers, or other reasons

Utilization can be measured with claims data and sample surveys such as the Medical Expenditure Panel Survey (MEPS), but utilization alone underestimates demand because it omits unmet need. Data measuring unmet need are not systematically collected and thus must be estimated or captured through individual surveys.
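The two categories above can be summarized in a simple identity. This is an illustrative formalization rather than a standard estimating equation: let D be total demand, U the number of people who use services (whether or not they need them), and M the number of people with unmet need (those who need services but do not receive them, by choice or because of barriers).

```latex
D = U + M, \qquad M \ge 0 \;\Longrightarrow\; U \le D
```

Claims data and surveys such as MEPS capture U. Because M is not systematically observed, an estimate built on utilization alone equals D only when M = 0 and otherwise falls short by M.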

Labor market indicators such as job vacancies, turnover, recruiting bonuses, and employment projections also signal demand for health care services and workers. The Bureau of Labor Statistics tracks changes in employment and projects future employment. Job vacancy data can be tracked through job boards or proprietary data sources such as Burning Glass Technologies. Other vacancy, turnover, and bonus data can be tracked through hospital and other industry surveys. An example of state-level demand tracking is the Washington Health Workforce Sentinel Network, which links health care employers with educators, policymakers, and workforce planners to identify and respond to new and changing demand for health care workers, skills, and roles.

Patient population factors, such as aging of the population, and policy changes that affect insurance coverage and disease burden, also influence future estimates of demand.


Do you have examples of questions that we could ask?

Yes. The National Forum of State Nursing Workforce Centers and the Federation of State Boards of Physical Therapy (FSBPT) have developed Minimum Data Set questions for their professions. Additionally, HRSA has developed MDS standards, and the WWAMI Center for Health Workforce Studies at the University of Washington has archived a questionnaire library containing data collection instruments volunteered by several states. HWTAC is also including selected data collection instruments in the State Health Workforce Data Collection Inventory.


How easy is it to get licensure boards to add or change questions?

This will vary from state to state. It is important to remain cognizant of a) the financial cost to the board of changing online renewal questions; b) the time it takes respondents to complete their licensure renewal form; and c) the need for comparability over time. Request changes or additions only when absolutely necessary.

Some states mandate the collection of data through legislation, which affects how easy it is to add or change questions. For example, Florida’s data collection is legislated, and any question must go through a lengthy public comment period to be added or changed. This process has the potential to subject questions to bias from the public and special interest groups.


How do you work with licensure boards to collect and share data?

Relationships are key. Licensure boards are important partners in health workforce data collection, but their main priority is regulation to protect patient safety. They often don’t have resources (ie, funding, staff, time) to collect additional data, and in some states, current legislation restricts their ability to share data.

Show the boards the value of collecting additional workforce data as it relates to evidence-based regulation, and look for ways to minimize their burden, especially during the initial development period. Treat them as a valued partner and bring them into the conversation very early to build trust.

Collaborating With Licensing Bodies in Support of Health Workforce Data Collection: Issues and Strategies


What are some different ways to collect health workforce data?

There are generally 4 methods to collect health workforce data:

  1. Licensure Process. Data are collected as part of the licensure process when health professionals apply for their initial license and when they renew, capturing the entire licensed workforce in that profession. This is one of the most efficient and cost-effective methods to collect data. Some questions on the licensure forms may be mandatory, while others are optional. The organizational structure of the licensing boards will present different opportunities and barriers to data collection. Examples: North Carolina, South Carolina, Virginia
  2. Surveys. Data are collected through surveys, either in conjunction with the licensure process or as a separate effort. This method requires more staff time and money. Response rates may vary, but this is a good option if health workforce questions cannot be included directly on the licensure forms. Examples: New York, Wisconsin
  3. Continuous Monitoring. Data collection begins with a list of all licensees in one or more professions. From there, states track individuals through surveys, news clipping services, and other methods to determine practice status, practice setting, and other characteristics. This method can be costly, but it may provide more up-to-date information. Examples: Iowa, Nebraska
  4. Secondary Data Sources. Secondary data sources can also be used to enumerate the workforce in a specific state. These data sources include the National Provider Identifier (NPI) file, the American Medical Association (AMA) Physician Masterfile, the US Bureau of Labor Statistics, and the Census Bureau’s American Community Survey, as well as state professional associations. Additionally, all-payer claims databases can be used to enumerate the health workforce in select states, but they have significant limitations.

The Minimum Data Set (MDS) provides guidelines for collecting basic, minimum, and consistent data on health professionals. These guidelines are not requirements, but they do provide suggestions so that data are collected in a way that is useful for research purposes and comparable across professions and states. Some states ask questions that go beyond the MDS so they can better understand their workforce and answer questions from their policymakers.

The following resources provide information on basic MDS guidelines and going beyond the MDS to ask additional questions, plus examples of data collection instruments from various states.


What states have implemented the MDS?

Many states are already collecting health workforce data, with a customized MDS in place to collect any additional data they need for health workforce planning. Some examples of states that are already collecting an MDS include North Carolina, Virginia, New York, Indiana, and Minnesota.

For more information on which states are collecting data, visit our State Health Workforce Data Collection Inventory, or contact HWTAC.


What is the MDS?

The Minimum Data Set, or MDS, provides basic, consistent guidelines for fundamental health workforce questionnaires. These questions can be used by anyone who wants to collect data on the supply of health workers, whether through the licensure process or surveys, and can be adapted for additional professions. MDS questions focus on essential demographic, education, and practice characteristics.



How do you fund health workforce data collection and analysis?

Data systems can be funded through state appropriations, private foundations, grants and contracts, and on a cost-recovery basis. Each funding mechanism has its challenges. State appropriations are tenuous; administrations and priorities change, and budgets get cut. Foundations are often geared to fund initiatives that show more tangible results. Grants are often time-limited. Cost recovery depends on demand for data and services and limits the types of analyses and reports you can do. Stakeholders who require data may be persuaded to fund the analysis costs to meet their specific needs, but they frequently are not willing or able to fund the fixed infrastructure costs. Choose the funding source that best fits the specifics of your data collection effort and the value the project offers its stakeholders.


I’m interested in allied health and administrative support workers. They’re not always licensed. How do you count them?

For those professions, it may be necessary to conduct surveys, or rely on other data sources such as professional associations or the BLS, noting limitations as appropriate.

