Tag Archives: Big Data

Innovating for the Future

Erica L. Groshen was the 14th Commissioner of Labor Statistics. She served from January 2013 to January 2017. This is her final post for Commissioner’s Corner.


It didn’t take long after I became Commissioner of Labor Statistics in January 2013 for me to appreciate the skill, dedication, and innovation of the staff who work here. Whether they’re doing sampling, data collection, estimation, or dissemination; whether they’re IT professionals, statisticians, or HR staff; whether they’re the newest, tech-savvy employees or the more senior employees who hold a wealth of institutional knowledge, to a person they are phenomenal. I am honored to have had the pleasure of leading them (and letting them lead me) during the past 4 years.


I have had many opportunities to observe and encourage innovation during my tenure at the U.S. Bureau of Labor Statistics, from listening tours to senior staff conferences to regional office visits to discussions with a wide variety of stakeholders. From these efforts, we have identified several activities that will help us develop and implement the next generation of labor statistics. These days, we call these efforts a variety of names, such as “modernization” and “reengineering.” But, in truth, they just continue the impressive progress that has been the hallmark of BLS for the past 133 years.

In my final Commissioner’s Corner post, I want to tell you a little about some of our current reengineering efforts.

One of the things we do best at BLS is data collection, largely because we are always looking for ways to improve. Recent efforts include identifying alternative data sources, expanding electronic collection, and “scraping” information directly from the Internet. These efforts can expand the information we provide, lessen the burden we place on employers and households that provide data, and maybe even save some money to provide taxpayers the best value for their data dollar.

These efforts are not new. One source of alternative data we’ve used for many years comes from state unemployment insurance filings, which identify nearly every employer in the country. We tabulate these data but also use them as the source of our sample of employers for certain surveys and as a benchmark of detailed employment by industry. We also use information from private sources and from administrative sources, like vital statistics. Our latest efforts involve examining techniques to combine data across multiple sources, including mixing survey and nonsurvey data.

We want to give employers the opportunity to leverage the electronic data they already keep so it’s easier to respond to our surveys. These efforts include allowing employers to provide electronic information in multiple formats; identifying a single source of electronic data within each organization, which reduces the number of locations we contact and the number of requests we make to multiple sites of the same organization; and working with employers to allow BLS to access their data directly from the Internet. We rely on good corporate citizens to supply the information that we use to produce important economic data. Making data collection easier is a win-win.

The innovation doesn’t stop at collection. We are using electronic text analysis systems extensively to streamline some of our data-processing activities. Much of the information we collect is in the form of text, such as a description of an industry or occupation, details about a workplace injury, or summaries of employee benefit plans. Assigning that text to the codes of a classification system for tabulation and publication used to be a manual task. BLS has begun to automate this task with machine-learning techniques, in which computers learn to classify new text by reviewing large and growing amounts of previously classified information. As we expand our skills in this area and find more uses for these techniques, the benefits include more accurate and consistent data and greater opportunities for our staff to focus their brainpower on new, unique, and unusual situations.
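To make the idea of learning from previously classified text concrete, here is a minimal sketch of a text classifier. It is not the BLS system: it uses multinomial naive Bayes with add-one smoothing, a standard textbook technique chosen only because it fits in a few lines. The training phrases and the codes "CUT" and "FALL" are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, codes):
        self.code_counts = Counter(codes)        # training examples per code
        self.word_counts = defaultdict(Counter)  # word frequencies per code
        self.vocab = set()
        for text, code in zip(texts, codes):
            words = tokenize(text)
            self.word_counts[code].update(words)
            self.vocab.update(words)
        self.total_docs = len(codes)
        return self

    def predict(self, text):
        words = tokenize(text)
        best_code, best_score = None, -math.inf
        for code, n_docs in self.code_counts.items():
            # Log prior: share of training examples carrying this code.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[code].values()) + len(self.vocab)
            for w in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((self.word_counts[code][w] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical, already-coded injury descriptions used as training data.
training = [
    ("worker cut hand on box cutter blade", "CUT"),
    ("laceration to finger from knife", "CUT"),
    ("slipped on wet floor and fell", "FALL"),
    ("fell from ladder while painting", "FALL"),
]
coder = NaiveBayesCoder().fit([t for t, _ in training], [c for _, c in training])
print(coder.predict("employee fell off ladder"))  # prints FALL
```

The more coded examples the model sees, the better its word statistics become, which is the sense in which such systems improve by reviewing more information. Production systems typically use far larger training sets and more sophisticated models.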

We are also modernizing our outputs, producing more with the information we have. For example, we have begun several matching projects, combining data from two or more sources to produce new information. One example is new information on nonprofit organizations. By linking our employment data with nonprofit status obtained from the Internal Revenue Service, we now have employment data separately for the for-profit and nonprofit sectors. And we took that effort one step further and produced compensation information for these sectors as well. Look for more output from these matching efforts in the future.
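A minimal sketch of the matching idea, assuming both sources share a common employer identifier. The field names, identifiers, and employment counts below are hypothetical, and real record linkage must also handle missing, inconsistent, or reused identifiers. For simplicity the sketch treats any record not found on the nonprofit list as for-profit.

```python
from collections import defaultdict

# Hypothetical employment records keyed by an employer identifier.
employment_records = [
    {"employer_id": "A1", "industry": "health care", "employment": 120},
    {"employer_id": "B2", "industry": "health care", "employment": 80},
    {"employer_id": "C3", "industry": "education", "employment": 50},
    {"employer_id": "D4", "industry": "education", "employment": 30},
]

# Hypothetical identifiers flagged as nonprofit in a second, separate source.
nonprofit_ids = {"A1", "C3"}


def employment_by_sector(records, nonprofit_ids):
    """Link each record to the nonprofit list and total employment by (industry, sector)."""
    totals = defaultdict(int)
    for rec in records:
        sector = "nonprofit" if rec["employer_id"] in nonprofit_ids else "for-profit"
        totals[(rec["industry"], sector)] += rec["employment"]
    return dict(totals)


result = employment_by_sector(employment_records, nonprofit_ids)
print(result)
```

Neither source alone can produce employment by sector; the join on a shared key is what creates the new information.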

Finally, we’ve made great strides in how we present our information, including expanded graphics and video. And we are not stopping there. Each year we are expanding the number of data releases that include a companion graphics package. We are developing prototypes of a new generation of data releases, with more graphics and links to data series. And we have more videos to come.

My 4 years as Commissioner of Labor Statistics have flown by. I’m excited to see so many innovations begin, thrive, and foster additional innovations. I have no doubt that the culture of innovation at BLS will continue. As my term comes to an end, I know now more than ever that the skill, dedication, and creativity of the BLS staff will lead this agency to even greater advances in the years to come.

BLS Microdata Now More Easily Accessible to Researchers across the Country

I am pleased to announce that BLS is now part of the Federal Statistical Research Data Center Network.

Researchers at universities, nonprofits, and government agencies can now go to 24 secure research data centers across the United States to analyze microdata from our National Longitudinal Surveys of Youth and our Survey of Occupational Injuries and Illnesses. Before, researchers had to visit our headquarters in Washington, D.C., to use these data.

Making our underlying data more accessible for researchers from coast to coast is a huge step forward, and I hope it will lead to a surge in research using BLS data. I believe that having more researchers use BLS data will not only showcase new uses of the data but also improve our products by encouraging researchers from BLS and other organizations to collaborate. It also supports transparency because external researchers can analyze the inputs to our published statistics.

Another key benefit of having BLS data alongside datasets from the U.S. Census Bureau and the National Center for Health Statistics is that researchers can combine data from two or more agencies. Matching data across datasets lets researchers answer new questions without adding to the burden on our respondents. Put simply, more data = better research = better decisions that rely on research.

Researchers are enthusiastic about adding BLS data to the research data center network.

“We at the Federal Reserve Bank of Atlanta are excited that more BLS microdata are available to researchers. Policy questions are usually complicated. Matched data from different sources can give researchers a much better understanding of economic relationships. That will help us provide more informed policy advice,” said John Robertson, senior policy adviser at the Federal Reserve Bank of Atlanta.

Over the next year, we will add more BLS data to the research data centers based on user demand.

Researchers can also still visit us at our D.C. headquarters to access our full suite of microdata. To learn more and to apply, see our BLS Restricted Data Access page.

Entrepreneurship Facts: Announcing New Research Data on Job Creation and Destruction by Firm Age and Size

I’m delighted to announce that we now have new research data on job gains and losses by firm age and size across industries and states.

For many years, policymakers, economists, and others have debated whether small or large firms create more jobs. Our Business Employment Dynamics program, which measures gross job gains and losses to help us understand net employment changes, informs that debate with data on firm size. A related question is whether startups or older establishments create more jobs. Again, BLS has a stat for that. We have data on employment and business survival rates by the age of the establishment.

While it’s useful to know the age of an establishment—that is, a single location of a business—for some questions, we need to know the age of the firm. A firm may include several or even many establishments. To understand entrepreneurship in particular, we want to know how both the age and size of firms affect job gains, job losses, and employment growth.

With these new data we can answer many interesting questions, including:

  • How much do older firms contribute to job growth? Firms 10 years or older created 800,000 jobs, or 29 percent of the total 2.7 million net employment gain in the year ending March 2015. See the chart below.
  • How much do startup firms contribute to job growth? In the year ending March 2015, startup firms (firms less than 1 year old) created 1.7 million jobs, or 60 percent of total employment growth. More than half of these jobs were from firms with fewer than 10 employees.
  • How does the age or size of the firm affect the rate of business closures? In the year ending March 2015, 788,000 establishments closed. Of these, 55 percent were from firms 10 years or older; 16 percent were from firms 5 to 9 years old; and 28 percent were from firms less than 5 years old. Of these closures, 91,000, or 12 percent of the total, were establishments of firms with 500 or more employees.
  • Which firm-age group accounted for most job losses during the last two recessions? Firms 10 years or older lost the most jobs during both recessions. Again, see the chart below.


The new research data measure annual gross job gains and gross job losses by firm age and size from March of one year to March of the next. We get the data on firms from the Quarterly Census of Employment and Wages by linking individual establishments over time. Besides firm age and size, we also measure establishment age and size. We have two methods to examine size. One method assigns firms or establishments to a size class based on their size at the beginning of the year (the base-sizing method). The other method assigns them based on their average size over the year (the average-sizing method).
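A minimal sketch of how gross job gains and losses can be tallied by size class under the two sizing methods. Assumptions: the size-class boundaries are illustrative rather than the published BLS classes, and "average size" is taken here as the mean of the two March employment levels. Each unit is a (start-of-year, end-of-year) employment pair.

```python
from collections import defaultdict

# Illustrative size classes (label, minimum employment), checked largest first.
SIZE_CLASSES = [("500+", 500), ("50-499", 50), ("1-49", 0)]


def size_class(employment):
    """Return the label of the first size class whose minimum is met."""
    for label, minimum in SIZE_CLASSES:
        if employment >= minimum:
            return label
    raise ValueError("employment must be non-negative")


def gains_and_losses(units, method="base"):
    """Sum gross job gains and gross job losses by size class.

    units:  list of (employment in March of year t-1, employment in March of year t)
    method: "base" sizes each unit by its start-of-year employment;
            "average" sizes it by the mean of the two March levels (an assumption).
    """
    gains = defaultdict(int)
    losses = defaultdict(int)
    for start, end in units:
        size = start if method == "base" else (start + end) / 2
        label = size_class(size)
        change = end - start
        if change > 0:
            gains[label] += change    # expanding or opening units add to gross gains
        elif change < 0:
            losses[label] += -change  # contracting or closing units add to gross losses
    return dict(gains), dict(losses)


# Hypothetical units: two small expanders, one large contractor, one startup.
units = [(40, 60), (480, 530), (600, 550), (0, 10)]
base_gains, base_losses = gains_and_losses(units, "base")
avg_gains, avg_losses = gains_and_losses(units, "average")
print(base_gains, base_losses)
print(avg_gains, avg_losses)
```

Net employment change (gross gains minus gross losses) is the same under both methods; only the allocation of jobs across size classes differs. In this example, the unit growing from 480 to 530 counts as a mid-size unit under base sizing but as a 500+ unit under average sizing.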

I really want to know how you like these new data and what we can do to make them more useful. I invite you to explore the data and share your comments. Your feedback will help us develop the dataset and possibly move it into our regular production. Please write your comments below, or you can email the Business Employment Dynamics staff.

How Does BLS Deal with Uncertainty in Our Measures?

I recently spoke in Pittsburgh at the 2015 Policy Summit on Housing, Human Capital, and Inequality. The Federal Reserve Banks of Cleveland, Philadelphia, and Richmond sponsored this event. I spoke on a panel with Professor Charles Manski of Northwestern University and Jeffrey Kling of the Congressional Budget Office about measuring uncertainty in federal statistics. You can watch the full discussion below.

When I speak to groups around the country or write in the Commissioner’s Corner, I always discuss the importance of having good information to make good decisions. Federal, state, and local policymakers use information from BLS, and so do private businesses, nonprofit organizations, and households. But how do the users of our data and analyses know they can rely on BLS information? Our users shouldn’t simply have blind faith. After all, households, businesses, and governments make decisions based on our data, and those decisions can involve a lot of money. Users of statistics need to understand that all measures have limitations. Data are a tool. Just like screwdrivers or spatulas, data have specific uses and different levels of precision. Data users need to choose the right tools for their purpose and use them correctly. Our goal is to measure the true state of the economy, but data users must recognize that all measures of the truth come with some uncertainty.

So what are the sources of uncertainty in our measures? One source is what we call sampling error. Most statistics we publish at BLS come from sample surveys. Sampling error is the uncertainty that results by chance because we collect the information from a sample instead of the full population. Even though we select our samples carefully using scientific methods, the characteristics of a sample still may differ from those of the population. We rely on sample surveys because it is far too expensive to ask questions of all workers or all businesses every time we need new information about the labor market and economy. Fortunately, statisticians have developed tools to measure sampling error. We publish these measures on our website. For example, you can see whether the most recent monthly changes in our measures of the labor force, employment, and unemployment are statistically significant. If we want to reduce sampling error, we can increase the size of our samples. Larger samples cost more money, but our measures of sampling error can help us decide whether the benefit of reducing that source of uncertainty is worth the cost.
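A minimal sketch of sampling error for an estimated proportion, such as an unemployment rate, assuming simple random sampling. BLS surveys use more complex designs (stratification, clustering, overlapping samples from month to month) and compute their published standard errors with other methods, so the sample sizes and rates here are purely illustrative. The significance check treats the two monthly samples as independent and uses z = 1.645, which corresponds to a 90 percent confidence level in a two-sided test.

```python
import math


def standard_error(p, n):
    """Standard error of an estimated proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


def change_is_significant(p1, n1, p2, n2, z=1.645):
    """Check whether the change from p1 to p2 exceeds z standard errors of the difference.

    Assumes the two samples are independent, which real survey samples often are not.
    """
    se_diff = math.sqrt(standard_error(p1, n1) ** 2 + standard_error(p2, n2) ** 2)
    return abs(p2 - p1) > z * se_diff


se_small = standard_error(0.05, 1_000)
se_large = standard_error(0.05, 4_000)
print(round(se_small / se_large, 6))  # 2.0: quadrupling the sample halves the error

# Hypothetical monthly estimates from samples of 60,000 each.
print(change_is_significant(0.050, 60_000, 0.051, 60_000))
print(change_is_significant(0.050, 60_000, 0.055, 60_000))
```

Because the standard error shrinks with the square root of the sample size, halving sampling error requires roughly four times the sample, which is why the cost of larger samples has to be weighed against the benefit.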

Other types of uncertainty are harder to measure. For example, some people and businesses choose not to respond to our surveys. If those who don’t respond have different characteristics from those who respond, it could bias our measures. Even when people and businesses agree to participate in a survey, they might not answer every question, or their answers might not be accurate. It’s hard to measure the effects of these challenges in collecting information about the economy. We do, however, work to minimize these sources of uncertainty. For example, we try to design our surveys to make it easier for people and businesses to respond. We show people and businesses how they benefit from responding. We test our survey questionnaires carefully to make sure they are clear and easy to answer. We seek out other sources of information to supplement our surveys, using what many people call “big data.”
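A minimal sketch of why nonresponse can bias an estimate. If the full population is a mix of respondents and nonrespondents, the population mean is a weighted average of the two groups, so the respondent-only mean is off by (1 − response rate) × (respondent mean − nonrespondent mean). The response rate and group means below are hypothetical; in practice the nonrespondent mean is unknown, which is exactly why this kind of uncertainty is hard to measure.

```python
def respondent_mean_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent-only mean relative to the full-population mean.

    The full-population mean weights each group by its share of the population,
    so the bias equals (1 - response_rate) * (mean_respondents - mean_nonrespondents).
    """
    full_mean = (response_rate * mean_respondents
                 + (1 - response_rate) * mean_nonrespondents)
    return mean_respondents - full_mean


# Hypothetical: 60 percent respond; respondents average 10, nonrespondents 14.
bias = respondent_mean_bias(0.6, 10.0, 14.0)
print(round(bias, 2))
```

The bias disappears when nonrespondents resemble respondents, no matter how low the response rate is, and it grows with both the nonresponse rate and the difference between the two groups.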

Most of all, we communicate with our data users about the strengths and limitations of our data and the methods we use to compile them. We’re always looking for better, clearer ways to explain our data, and I welcome you to share your ideas.

Government Statistics in a World of Big Data

“Big data” is a buzzword you hear often these days. Long before the term existed, BLS and other federal statistical agencies were using alternative data sources, which today would be labeled “big data,” to revolutionize the way we do business.

Last week I participated in a panel, sponsored by the American Enterprise Institute, to discuss the current and future role of federal statistical agencies in this era of big data. (See the video of the discussion.)

My fellow panelists and I agreed on one point early on: our dislike for the term “big data”!

Former U.S. Census Bureau Director Robert Groves prefers the term “organic data,” while Burning Glass CEO Matthew Sigelman refers to big data as “open market” data sources. Billion Prices Project cofounder Alberto Cavallo defines big data as “new technologies for data collection.”

Whatever term we use, we all agreed that government and private-sector data should be viewed as complementary or mutually reinforcing.

During my presentation, I discussed how big data can complement government surveys. I talked about how the Billion Prices Project, which Cavallo cofounded at the Massachusetts Institute of Technology, relates to the BLS Consumer Price Index.

The Billion Prices Project offers the extreme timeliness of a daily price index and large sample sizes, which serve the almost instant needs of some data users, particularly investors.

The Consumer Price Index measures changes in the cost of living for a representative consumer buying a representative market basket. This comprehensive approach is critical to serving policymakers, Social Security recipients, and many others who use the Consumer Price Index in government programs and private contracts.
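A minimal sketch of the fixed-basket idea behind a price index like the Consumer Price Index: price the same basket of goods and services in two periods and compare the costs. This is a simplified Laspeyres-type calculation with hypothetical items, quantities, and prices. The actual CPI is far more elaborate, with many item and area strata, geometric means at the elementary level for most items, and periodic updates to the basket weights.

```python
def fixed_basket_index(base_prices, current_prices, quantities):
    """Cost of the base-period basket at current prices relative to base prices, times 100."""
    base_cost = sum(base_prices[item] * q for item, q in quantities.items())
    current_cost = sum(current_prices[item] * q for item, q in quantities.items())
    return 100 * current_cost / base_cost


# Hypothetical "representative" basket and prices in two periods.
basket = {"bread": 10, "gasoline": 20, "rent": 1}
base = {"bread": 2.00, "gasoline": 3.00, "rent": 1000.00}
now = {"bread": 2.20, "gasoline": 2.70, "rent": 1030.00}

index = fixed_basket_index(base, now, basket)
print(round(index, 2))  # 102.41
```

Because the basket weights each item by how much consumers buy, a 10 percent drop in gasoline prices matters less here than a 3 percent rise in rent. That weighting is what distinguishes a cost-of-living measure for a representative consumer from a simple average of many observed prices.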

Far from being a competition, these two approaches provide important, though different, ways to measure and track the economy. Or, as I like to say, two lenses are always better than one.

I was happy to learn the panelists appreciate the key role that federal statistical agencies must play in the emerging world of big data. All parties need to work together to better use all the information we have, whether survey data or big data. Indeed, blending these two types of data creatively will produce new and better ways to inform sound decision making by our nation’s businesses, families, and policymakers. That’s a win-win for everyone.