Package 'epidatatools'

Title: Data tools used at the Economic Policy Institute
Description: Tools used by the Economic Policy Institute.
Authors: Ben Zipperer [aut, cre], Jori Kandra [ctb], Economic Policy Institute [cph, fnd]
Maintainer: Ben Zipperer <[email protected]>
License: MIT + file LICENSE
Version: 1.0.5
Built: 2026-05-15 09:04:27 UTC
Source: https://github.com/economic/epidatatools

Help Index


Calculate the averaged (smoothed) median

Description

Calculate the averaged (smoothed) median

Usage

averaged_median(
  x,
  w = NULL,
  na.rm = TRUE,
  quantiles_n = 9L,
  quantiles_w = c(1:4, 5, 4:1)
)

Arguments

x

numeric vector or an R object

w

numeric vector of sample weights the same length as x giving the weights to use for elements of x

na.rm

logical; if true, any NA or NaN's are removed from x before computation

quantiles_n

integer number of quantiles used for averaging; must be odd

quantiles_w

weights used for average quantiles; length must equal quantiles_n

Value

a scalar

Examples

averaged_median(x = mtcars$mpg)

Calculate the averaged (smoothed) quantile

Description

Calculate the averaged (smoothed) quantile

Usage

averaged_quantile(
  x,
  w = NULL,
  probs = 0.5,
  na.rm = TRUE,
  quantiles_n = 9L,
  quantiles_w = c(1:4, 5, 4:1)
)

Arguments

x

numeric vector or an R object

w

numeric vector of sample weights the same length as x giving the weights to use for elements of x

probs

numeric vector of percentiles with values in [0, 1]

na.rm

logical; if true, any NA or NaN's are removed from x before computation

quantiles_n

integer number of quantiles used for averaging; must be odd

quantiles_w

weights used for average quantiles; length must equal quantiles_n

Value

a numeric vector the same length as probs

Examples

averaged_quantile(x = mtcars$mpg, probs = c(0.25, 0.5, 0.75))
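
To illustrate how quantiles_n and quantiles_w might interact, here is a minimal base R sketch of one reading of an "averaged" quantile: compute quantiles_n quantiles centered on the requested percentile and take their weighted mean using quantiles_w. The helper name sketch_averaged_quantile, the step argument, and the one-percentage-point spacing between neighboring quantiles are all assumptions made for illustration. The sketch ignores sample weights and is not the package's implementation.

```r
# Hypothetical helper, NOT part of epidatatools.
# Assumption: neighboring quantiles are spaced `step` apart around each target.
sketch_averaged_quantile <- function(x, probs = 0.5, quantiles_n = 9L,
                                     quantiles_w = c(1:4, 5, 4:1),
                                     step = 0.01) {
  stopifnot(quantiles_n %% 2 == 1, length(quantiles_w) == quantiles_n)
  half <- (quantiles_n - 1) / 2
  offsets <- seq(-half, half) * step
  vapply(probs, function(p) {
    # Keep the neighboring probabilities inside [0, 1]
    nearby <- pmin(pmax(p + offsets, 0), 1)
    q <- stats::quantile(x, probs = nearby, na.rm = TRUE, names = FALSE)
    # Weighted mean of the neighboring quantiles
    sum(q * quantiles_w) / sum(quantiles_w)
  }, numeric(1))
}

sketch_averaged_quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))
```

With the default weights the central quantile counts most and the outermost quantiles count least, which smooths the estimate when the data are lumpy around the target percentile.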

Summarize a data frame as binned interpolated percentiles

Description

[Superseded]

This function has been superseded; EPI now uses averaged_quantile() to interpolate percentiles.

Usage

binipolate(data, x, probs = 0.5, bin_size, .by = NULL, w = NULL)

Arguments

data

data frame

x

column for which to compute percentiles

probs

numeric vector of percentiles with values in [0, 1]

bin_size

size of binning

.by

optional, a tidy-selection of columns for single-operation grouping

w

numeric vector of weights the same length as x giving the weights to use for elements of x

Value

a tibble or data frame

Examples

binipolate(mtcars, mpg, bin_size = 0.25)
binipolate(mtcars, mpg, probs = c(0.25, 0.5, 0.75), bin_size = 0.25)
binipolate(mtcars, mpg, probs = c(0.25, 0.5, 0.75), bin_size = 0.25, .by = cyl, w = wt)

Cross-tabulate one or two variables

Description

Cross-tabulate one or two variables

Usage

crosstab(data, ..., w = NULL, percent = NULL)

Arguments

data

a data frame

...

one or two variables, for a one- or two-way cross-tabulation

w

optional weighting variable (a column of data)

percent

for a two-way cross-tabulation, replace counts with row or column percentages

  • NULL, the default, shows counts rather than percentages

  • "row" will replace counts with row percentages, summing to 100% across columns within each row

  • "column" will replace counts with column percentages, adding to 100% across rows within each column

Value

a tibble

Examples

crosstab(mtcars, cyl)
crosstab(mtcars, cyl, gear)
crosstab(mtcars, cyl, gear, w = mpg, percent = "column")
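
As a rough illustration of what row and column percentages mean, the same idea can be sketched with base R's xtabs() and prop.table(). This is only an analogy under the assumption that weights are summed within each cell; it returns a base R table rather than a tibble and is not how crosstab() is implemented.

```r
# Weighted two-way table: each cell sums mpg (the weight) for that cyl/gear pair
tab <- xtabs(mpg ~ cyl + gear, data = mtcars)

# Column percentages: each column (gear) sums to 100 across rows (cyl)
col_pct <- 100 * prop.table(tab, margin = 2)

# Row percentages: each row (cyl) sums to 100 across columns (gear)
row_pct <- 100 * prop.table(tab, margin = 1)

colSums(col_pct)
rowSums(row_pct)
```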

Download a selection of IPUMS microdata extracts

Description

Convenience functions for downloading samples and variables from IPUMS microdata using their API and package ipumsr.

Usage

dl_ipums_micro(extract)

dl_ipums_acs1(years = NULL, variables, description = NULL, ...)

dl_ipums_asec(years = NULL, variables, description = NULL, ...)

dl_ipums_cps(months = NULL, variables, description = NULL, ...)

Arguments

extract

an IPUMS microdata extract as defined by ipumsr::define_extract_micro()

years

a vector of years

variables

a vector of variable names, or a list of detailed variable specifications as created by ipumsr::var_spec()

description

description for the extract

...

arguments passed to ipumsr::define_extract_micro() other than collection, description, samples, variables

months

a vector of dates representing months of CPS samples.

Details

These functions are simply wrappers around ipumsr and require you to have an IPUMS API key saved in the IPUMS_API_KEY environment variable.
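
Before requesting an extract, it can help to confirm that the key is visible to the R session. The following is a small base R sketch; the helper name has_api_key is hypothetical and is not part of epidatatools or ipumsr.

```r
# Hypothetical helper, NOT part of epidatatools or ipumsr:
# TRUE when the named environment variable is set to a non-empty string.
has_api_key <- function(var_name) {
  nzchar(Sys.getenv(var_name))
}

if (!has_api_key("IPUMS_API_KEY")) {
  message("IPUMS_API_KEY is not set; add it to ~/.Renviron or call Sys.setenv() first.")
}
```

Storing the key in ~/.Renviron keeps it out of scripts; a key set with Sys.setenv() lasts only for the current session.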

Value

a tibble of microdata from the IPUMS API

Functions

  • dl_ipums_micro(): base function; downloads a custom extract

  • dl_ipums_acs1(): Download IPUMS ACS 1-year files

  • dl_ipums_asec(): Download IPUMS CPS ASEC

  • dl_ipums_cps(): Download IPUMS Monthly CPS

Examples

# example ASEC download
dl_ipums_asec(2021:2023, c("YEAR", "OFFPOV", "ASECWT"))

# example monthly CPS download
begin_month = lubridate::ym("2022 September")
end_month = lubridate::ym("2024 August")
cps_months = seq(begin_month, end_month, by = "month")
dl_ipums_cps(cps_months, c("EARNWT", "HOURWAGE2"))

# use dl_ipums_micro with a custom extract
extract = ipumsr::define_extract_micro(
  collection = "cps",
  description = "CPS ASEC extract",
  samples = c("cps2021_03s", "cps2022_03s", "cps2023_03s"),
  variables = c("YEAR", "OFFPOV", "ASECWT")
)
dl_ipums_micro(extract)

Find BLS Series by Search String

Description

Searches the BLS Data Finder for series matching a search string.

Usage

find_bls(
  search_string,
  max_results = 20,
  metadata = FALSE,
  survey = NULL,
  seasonality = NULL
)

Arguments

search_string

Character string to search for in series titles

max_results

Maximum number of results to return (default: 20)

metadata

Logical flag to retrieve catalog metadata (default: FALSE). When FALSE, only series_id and series_title are returned for faster performance.

survey

Character string specifying a BLS survey code to restrict the search (default: NULL). When NULL, searches across all surveys. See BLS Data Finder for available survey codes (e.g., "cw" for CPI Urban Wage Earners, "ln" for Labor Force Statistics, "ce" for Current Employment Statistics).

seasonality

Character string to filter by seasonal adjustment (default: NULL). When NULL, no filter is applied. Use "SA" to restrict to seasonally adjusted series, or "NSA" to restrict to not seasonally adjusted series.

Value

A tibble with columns series_id, series_title, and optionally metadata (a list column containing catalog metadata as one-row tibbles when metadata = TRUE)

Examples

find_bls("unemployment rate")

find_bls("wage", survey = "cw")

Find FRED Series by Search String

Description

Searches the FRED database for series matching a search string.

Usage

find_fred(
  search_string,
  max_results = 20,
  metadata = FALSE,
  seasonality = NULL,
  fred_api_key = Sys.getenv("FRED_API_KEY")
)

Arguments

search_string

Character string to search for in series titles

max_results

Maximum number of results to return (default: 20)

metadata

Logical flag to retrieve additional metadata (default: FALSE). When FALSE, only series_id and series_title are returned.

seasonality

Optional filter for seasonal adjustment: NULL (no filter, default), "NSA" (Not Seasonally Adjusted), or "SA" (Seasonally Adjusted)

fred_api_key

FRED API key (defaults to FRED_API_KEY environment variable)

Details

This function is a wrapper around fredr::fredr_series_search_text() and requires you to have a FRED API key.

Value

A tibble with columns series_id, series_title, and optionally metadata (a list column containing additional metadata when metadata = TRUE)

Examples

find_fred("unemployment rate")

find_fred("GDP", max_results = 10)

find_fred("inflation", metadata = TRUE)

find_fred("employment population ratio", seasonality = "SA")

Retrieve NIPA data from the BEA API

Description

Retrieves National Income and Product Accounts data from the Bureau of Economic Analysis API. Requires a BEA API key saved in the BEA_API_KEY environment variable.

Usage

get_bea_nipa(
  tables,
  years,
  frequency = c("year", "quarter", "month"),
  underlying = FALSE,
  metadata = FALSE,
  bea_api_key = Sys.getenv("BEA_API_KEY")
)

Arguments

tables

Character vector of NIPA table names (e.g., "T10101", "T20305"). Can be a named vector to add custom names.

years

Numeric vector of years or "ALL" for all available years

frequency

Character string: "year" (annual), "quarter" (quarterly), or "month" (monthly). Multiple frequencies can be specified as a comma-separated string.

underlying

Logical flag to use NIUnderlyingDetail dataset instead of NIPA (default: FALSE)

metadata

Logical flag to return additional metadata columns (default: FALSE)

bea_api_key

BEA API key (defaults to BEA_API_KEY environment variable)

Value

A tibble with columns: table_name, table_description, line_number, line_description, date_frequency, date, value, and date variables (year, quarter, month) appropriate to the frequency of data returned. Annual data includes only year; quarterly data includes year and quarter; monthly data includes year, quarter, and month. If metadata = TRUE, also includes unit_mult, metric_name, cl_unit, series_code, and note_text (list column). If tables is a named vector, includes a "name" column as the first column.

Examples

get_bea_nipa("T10101", years = 2020:2024, frequency = "quarter")

get_bea_nipa(
  c("gdp" = "T10101", "personal_income" = "T20305"),
  years = 2023:2024,
  frequency = "year"
)

get_bea_nipa("T10101", years = 2023, frequency = "year", metadata = TRUE)

Retrieve Regional data from the BEA API

Description

Retrieves regional economic data from the Bureau of Economic Analysis API. Requires a BEA API key saved in the BEA_API_KEY environment variable.

Usage

get_bea_regional(
  geo_fips,
  table_name,
  line_code,
  year = NULL,
  metadata = FALSE,
  bea_api_key = Sys.getenv("BEA_API_KEY")
)

Arguments

geo_fips

Character vector of geographic FIPS codes, or special values: "COUNTY" for all counties, "STATE" for all states, "MSA" for all MSAs. A single value can also be a state abbreviation (e.g., "NY", "CA").

table_name

Single table name (e.g., "CAINC1" for personal income)

line_code

Single line code specifying the statistic to retrieve, or "ALL" for all line codes

year

Optional numeric vector of years or "ALL" for all available years. Defaults to "LAST5".

metadata

Logical flag to return additional metadata columns (default: FALSE). When TRUE, also includes cl_unit, mult_unit, and note_text (list column).

bea_api_key

BEA API key (defaults to BEA_API_KEY environment variable)

Value

A tibble with columns: geo_fips, geo_name, table_name, table_description, line_number, line_description, date_frequency, date, year, value. If metadata = TRUE, also includes cl_unit, mult_unit, and note_text (list column containing all notes from the API response).

Examples

get_bea_regional(geo_fips = "STATE", table_name = "SAINC1", line_code = 1)

get_bea_regional(geo_fips = c("36000", "06000"), table_name = "SAINC1", line_code = 1)

Retrieve data from the BLS API

Description

This function is simply a wrapper around blsR and requires you to have a BLS API key saved in the BLS_API_KEY environment variable.

Usage

get_bls(
  series,
  start,
  end,
  metadata = FALSE,
  bls_api_key = Sys.getenv("BLS_API_KEY")
)

Arguments

series

BLS series code

start

Start year (numeric)

end

End year (numeric)

metadata

Flag for additional metadata

bls_api_key

BLS API key (defaults to BLS_API_KEY environment variable)

Value

a tibble

Examples

get_bls("LNU02300060", start = 2020, end = 2024)

bls_series_ids = c(
  emp_fb_2534 = "LNU02073399",
  epop_asianmen_2554 = "LNU02332330Q",
  cpi_semi = "CUUS0000SA0",
  "LNU02300060"
)
get_bls(bls_series_ids, start = 2024, end = 2024)

complete_results = get_bls(bls_series_ids, start = 2024, end = 2024, metadata = TRUE)
complete_results

complete_results |>
  tidyr::unnest(metadata)

Retrieve data from the FRED API

Description

This function is simply a wrapper around fredr and requires you to have a FRED API key.

Usage

get_fred(
  series,
  start = NULL,
  end = NULL,
  metadata = FALSE,
  fred_api_key = Sys.getenv("FRED_API_KEY")
)

Arguments

series

FRED series code

start

Start year or date (numeric year or Date object)

end

End year or date (numeric year or Date object)

metadata

Flag for additional metadata

fred_api_key

FRED API key (defaults to FRED_API_KEY environment variable)

Value

A tibble

Examples

get_fred("UNRATE")

series = c(
  gdp = "GDP",
  urate = "UNRATE"
)
get_fred(series, start = as.Date("2024-07-01"), end = 2024)

complete_results = get_fred(series, start = 2020, end = 2025, metadata = TRUE)
complete_results

complete_results |>
  tidyr::unnest(metadata)

Calculate the binned interpolated median

Description

[Superseded]

Usage

interpolated_median(x, bin_size, w = NULL, na.rm = TRUE)

Arguments

x

numeric vector or an R object

bin_size

size used for binning

w

numeric vector of weights the same length as x giving the weights to use for elements of x

na.rm

logical; if true, any NA or NaN's are removed from x before computation

Details

This function is superseded by EPI's current preferred method for interpolating medians, averaged_median().

Value

numeric vector

Examples

interpolated_median(x = mtcars$mpg, bin_size = 0.50)
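
For intuition about what "binned interpolated" can mean, here is a minimal base R sketch of the textbook grouped-data median: bin the values, find the bin where the cumulative count reaches half the sample, and interpolate linearly within that bin. The helper name sketch_binned_median is hypothetical, bins are assumed to start at multiples of bin_size, and weights are ignored; the package's own binning convention may differ.

```r
# Hypothetical helper, NOT part of epidatatools.
# Assumption: bins are [k * bin_size, (k + 1) * bin_size) and data are unweighted.
sketch_binned_median <- function(x, bin_size) {
  x <- x[!is.na(x)]
  lower <- floor(x / bin_size) * bin_size            # lower edge of each value's bin
  edges <- sort(unique(lower))
  counts <- tabulate(match(lower, edges), nbins = length(edges))
  cum <- cumsum(counts)
  half <- length(x) / 2
  i <- which(cum >= half)[1]                          # bin containing the median
  below <- if (i > 1) cum[i - 1] else 0               # observations in earlier bins
  # Linear interpolation within the median bin
  edges[i] + (half - below) / counts[i] * bin_size
}

sketch_binned_median(mtcars$mpg, bin_size = 0.50)
```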

Calculate the binned interpolated quantile

Description

[Superseded]

This function is superseded by EPI's current preferred method for interpolating quantiles, averaged_quantile().

Usage

interpolated_quantile(x, bin_size, probs = 0.5, w = NULL, na.rm = TRUE)

Arguments

x

numeric vector or an R object

bin_size

size used for binning

probs

numeric vector of percentiles with values in [0, 1]

w

numeric vector of weights the same length as x giving the weights to use for elements of x

na.rm

logical; if true, any NA or NaN's are removed from x before computation

Value

a numeric vector

Examples

interpolated_quantile(x = mtcars$mpg, bin_size = 0.50, probs = c(0.25, 0.5, 0.75))

Join data frames and create a merge indicator

Description

Join data frames and create a merge indicator

Usage

merge_status(x, y, ...)

## S3 method for class 'data.frame'
merge_status(x, y, ...)

Arguments

x, y

data frames

...

passed to dplyr::full_join()

Value

a merged data frame from dplyr::full_join() with an extra column _merge

Examples

library(dplyr)
merge_status(band_members, band_instruments, by = "name")
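
The idea of a merge indicator can be sketched in base R: flag the rows of each input, do a full outer join with merge(), then label each row by which flags survived. The toy data, the column name status, and the labels "left_only", "right_only", and "both" are assumptions for illustration; the values merge_status() actually stores in _merge may differ.

```r
# Toy inputs, invented for illustration only
x <- data.frame(name = c("Mick", "John", "Paul"),
                band = c("Stones", "Beatles", "Beatles"))
y <- data.frame(name = c("John", "Paul", "Keith"),
                plays = c("guitar", "bass", "guitar"))

# Flag the origin of each row before joining
x$.in_x <- TRUE
y$.in_y <- TRUE

# Full outer join: keep unmatched rows from both sides
joined <- merge(x, y, by = "name", all = TRUE)

# A missing flag means the row had no match on that side
joined$status <- ifelse(is.na(joined$.in_y), "left_only",
                 ifelse(is.na(joined$.in_x), "right_only", "both"))
joined$.in_x <- NULL
joined$.in_y <- NULL
joined
```

Tabulating the indicator, for example with table(joined$status), is a quick way to check how many rows matched.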

State Geographic Codes

Description

A dataset containing geographic codes for US states including FIPS codes, Census codes, and region/division classifications.

Usage

state_geocodes

Format

A data frame with 51 rows and 8 variables:

state_name

Full state name

state_abb

Two-letter state abbreviation

state_fips

State FIPS code

state_census

State Census code

division

Census division number (1-9)

division_name

Census division name (e.g., "New England", "Pacific")

region

Census region number (1-4)

region_name

Census region name (Northeast, Midwest, South, West)

Source

https://github.com/Economic/state_geocodes


Summarize distinct groups

Description

Summarize distinct groups

Usage

summarize_groups(.data, .groups, ...)

Arguments

.data

a data frame

.groups

grouping variables as a tidy selection specification of columns, as used in dplyr::select()

...

name-value pairs passed to dplyr::summarize()

Value

a tibble

Examples

summarize_groups(mtcars, cyl|gear|carb, median(mpg), mean(hp))
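
One way to read "distinct groups" is that each grouping variable is summarized on its own and the results are stacked, rather than summarizing every combination of cyl, gear, and carb. The base R sketch below follows that reading; the output column names group_name, group_value, median_mpg, and mean_hp are assumptions for illustration and may not match what summarize_groups() returns.

```r
group_vars <- c("cyl", "gear", "carb")

# Summarize separately by each grouping variable, then stack the pieces
pieces <- lapply(group_vars, function(g) {
  med <- tapply(mtcars$mpg, mtcars[[g]], median)
  avg <- tapply(mtcars$hp, mtcars[[g]], mean)
  data.frame(
    group_name = g,                          # which variable defined the group
    group_value = as.numeric(names(med)),    # the value of that variable
    median_mpg = as.numeric(med),
    mean_hp = as.numeric(avg)
  )
})
stacked <- do.call(rbind, pieces)
stacked
```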