| Title: | Data tools used at the Economic Policy Institute |
|---|---|
| Description: | Tools used by the Economic Policy Institute. |
| Authors: | Ben Zipperer [aut, cre], Jori Kandra [ctb], Economic Policy Institute [cph, fnd] |
| Maintainer: | Ben Zipperer <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.5 |
| Built: | 2026-05-15 09:04:27 UTC |
| Source: | https://github.com/economic/epidatatools |
Calculate the averaged (smoothed) median
averaged_median(x, w = NULL, na.rm = TRUE, quantiles_n = 9L, quantiles_w = c(1:4, 5, 4:1))
| Argument | Description |
|---|---|
| x | numeric vector or an R object |
| w | numeric vector of sample weights, the same length as x, giving the weight for each element of x |
| na.rm | logical; if TRUE, any NA or NaN values are removed from x before computation |
| quantiles_n | integer number of quantiles used for averaging; must be odd |
| quantiles_w | weights used to average the quantiles; length must equal quantiles_n |
Value: a scalar
Examples:
averaged_median(x = mtcars$mpg)
Calculate the averaged (smoothed) quantile
averaged_quantile(x, w = NULL, probs = 0.5, na.rm = TRUE, quantiles_n = 9L, quantiles_w = c(1:4, 5, 4:1))
| Argument | Description |
|---|---|
| x | numeric vector or an R object |
| w | numeric vector of sample weights, the same length as x, giving the weight for each element of x |
| probs | numeric vector of percentiles, with values between 0 and 1 |
| na.rm | logical; if TRUE, any NA or NaN values are removed from x before computation |
| quantiles_n | integer number of quantiles used for averaging; must be odd |
| quantiles_w | weights used to average the quantiles; length must equal quantiles_n |
Value: a numeric vector with the same length as probs
Examples:
averaged_quantile(x = mtcars$mpg, probs = c(0.25, 0.5, 0.75))
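The smoothing behind averaged_median() and averaged_quantile() can be illustrated with a minimal base-R sketch: compute quantiles_n neighboring quantiles around each target probability and take their weighted average using quantiles_w. This is not the package's code. It assumes, hypothetically, that the neighboring quantiles are spaced one percentile point apart (the step argument) and that each one is a simple weighted inverse-CDF quantile; epidatatools may use different spacing or a different quantile definition. The helper names weighted_quantile_sketch() and averaged_quantile_sketch() are illustrative only.

```r
# Hypothetical sketch of an averaged (smoothed) quantile; not the epidatatools code.
# Assumption: the component quantiles are one percentile point apart, centered on p.

# Weighted inverse-CDF quantile: the smallest x whose cumulative weight share >= p
weighted_quantile_sketch <- function(x, w, p) {
  o <- order(x)
  x <- x[o]
  cum_share <- cumsum(w[o]) / sum(w)
  cum_share[length(cum_share)] <- 1  # guard against floating-point shortfall
  x[which(cum_share >= p)[1]]
}

averaged_quantile_sketch <- function(x, w = NULL, probs = 0.5, na.rm = TRUE,
                                     quantiles_n = 9L,
                                     quantiles_w = c(1:4, 5, 4:1),
                                     step = 0.01) {
  stopifnot(quantiles_n %% 2 == 1, length(quantiles_w) == quantiles_n)
  if (is.null(w)) w <- rep(1, length(x))
  if (na.rm) {
    keep <- !is.na(x) & !is.na(w)
    x <- x[keep]
    w <- w[keep]
  }
  # Offsets around the target: -0.04, -0.03, ..., 0.04 for 9 quantiles
  offsets <- (seq_len(quantiles_n) - (quantiles_n + 1) / 2) * step
  vapply(probs, function(p) {
    grid <- pmin(pmax(p + offsets, 0), 1)  # keep probabilities inside [0, 1]
    q <- vapply(grid, function(pp) weighted_quantile_sketch(x, w, pp), numeric(1))
    sum(q * quantiles_w) / sum(quantiles_w)  # weighted average of the neighbors
  }, numeric(1))
}

averaged_quantile_sketch(mtcars$mpg, probs = c(0.25, 0.5, 0.75))
```

The triangular default weights c(1:4, 5, 4:1) put the most weight on the target percentile and progressively less on its neighbors, which damps the jumps a single sample quantile shows when the data are lumpy (for example, wages reported in round dollar amounts).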
This function has been superseded; EPI instead uses
averaged_quantile() to interpolate percentiles.
binipolate(data, x, probs = 0.5, bin_size, .by = NULL, w = NULL)
| Argument | Description |
|---|---|
| data | a data frame |
| x | column for which to compute percentiles |
| probs | numeric vector of percentiles, with values between 0 and 1 |
| bin_size | width of the bins used to group x |
| .by | optional; a tidy-selection of columns for single-operation grouping |
| w | column of weights, the same length as x, giving the weight for each element of x |
Value: a tibble or data frame
Examples:
binipolate(mtcars, mpg, bin_size = 0.25)
binipolate(mtcars, mpg, probs = c(0.25, 0.5, 0.75), bin_size = 0.25)
binipolate(mtcars, mpg, probs = c(0.25, 0.5, 0.75), bin_size = 0.25, .by = cyl, w = wt)
Cross-tabulate one or two variables
crosstab(data, ..., w = NULL, percent = NULL)
| Argument | Description |
|---|---|
| data | a data frame |
| ... | one or two variables, for a one- or two-way cross-tabulation |
| w | optional column of weights |
| percent | for a two-way cross-tabulation, replace counts with row or column percentages ("row" or "column") |
Value: a tibble
Examples:
crosstab(mtcars, cyl)
crosstab(mtcars, cyl, gear)
crosstab(mtcars, cyl, gear, w = mpg, percent = "column")
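What a weighted two-way cross-tabulation computes can be shown with a small base-R sketch: sum the weights in each row-by-column cell, then optionally divide by row or column totals. This is not how crosstab() is implemented, and it returns a plain matrix rather than a tibble. The helper name crosstab_sketch(), its string-based arguments, and the choice to express percentages as shares between 0 and 1 are all assumptions made for illustration.

```r
# Hypothetical base-R sketch of a weighted two-way cross-tabulation;
# not the epidatatools implementation. Percentages are returned as shares (0-1).
crosstab_sketch <- function(data, row, col, w = NULL, percent = NULL) {
  wt <- if (is.null(w)) rep(1, nrow(data)) else data[[w]]
  # Sum the weights within each row-by-column cell
  tab <- tapply(wt, list(data[[row]], data[[col]]), sum)
  tab[is.na(tab)] <- 0  # cells with no observations
  if (identical(percent, "row")) tab <- prop.table(tab, 1)     # rows sum to 1
  if (identical(percent, "column")) tab <- prop.table(tab, 2)  # columns sum to 1
  tab
}

crosstab_sketch(mtcars, "cyl", "gear")
crosstab_sketch(mtcars, "cyl", "gear", w = "mpg", percent = "column")
```

With no weights, each cell is a simple count; with w = "mpg", as in the crosstab() example, each car contributes its mpg value to its cell instead of 1.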
Convenience functions for downloading samples and variables from IPUMS microdata using the IPUMS API and the ipumsr package.
dl_ipums_micro(extract)
dl_ipums_acs1(years = NULL, variables, description = NULL, ...)
dl_ipums_asec(years = NULL, variables, description = NULL, ...)
dl_ipums_cps(months = NULL, variables, description = NULL, ...)
| Argument | Description |
|---|---|
| extract | an IPUMS microdata extract as defined by |
| years | a vector of years |
| variables | a vector of variable names, or a list of detailed variable specifications as created by |
| description | description for the extract |
| ... | arguments passed to |
| months | a vector of dates representing months of CPS samples |
These functions are simply wrappers around ipumsr and require you to have an IPUMS API key saved in the IPUMS_API_KEY environment variable.
Value: a tibble of microdata from the IPUMS API
dl_ipums_micro(): Base function; downloads a custom extract
dl_ipums_acs1(): Download IPUMS ACS 1-year files
dl_ipums_asec(): Download IPUMS CPS ASEC
dl_ipums_cps(): Download IPUMS Monthly CPS
Examples:
# example ASEC download
dl_ipums_asec(2021:2023, c("YEAR", "OFFPOV", "ASECWT"))

# example monthly CPS download
begin_month = lubridate::ym("2022 September")
end_month = lubridate::ym("2024 August")
cps_months = seq(begin_month, end_month, by = "month")
dl_ipums_cps(cps_months, c("EARNWT", "HOURWAGE2"))

# use dl_ipums_micro with a custom extract
extract = ipumsr::define_extract_micro(
  collection = "cps",
  description = "CPS ASEC extract",
  samples = c("cps2021_03s", "cps2022_03s", "cps2023_03s"),
  variables = c("YEAR", "OFFPOV", "ASECWT")
)
dl_ipums_micro(extract)
Searches the BLS Data Finder for series matching a search string.
find_bls(search_string, max_results = 20, metadata = FALSE, survey = NULL, seasonality = NULL)
| Argument | Description |
|---|---|
| search_string | Character string to search for in series titles |
| max_results | Maximum number of results to return (default: 20) |
| metadata | Logical flag to retrieve catalog metadata (default: FALSE). When FALSE, only series_id and series_title are returned, for faster performance. |
| survey | Character string specifying a BLS survey code to restrict the search (default: NULL). When NULL, all surveys are searched. See the BLS Data Finder for available survey codes (e.g., "cw" for CPI Urban Wage Earners, "ln" for Labor Force Statistics, "ce" for Current Employment Statistics). |
| seasonality | Character string to filter by seasonal adjustment (default: NULL). When NULL, no filter is applied. Use "SA" to restrict to seasonally adjusted series, or "NSA" to restrict to not seasonally adjusted series. |
Value: A tibble with columns series_id, series_title, and optionally metadata (a list column containing catalog metadata as one-row tibbles when metadata = TRUE)
Examples:
find_bls("unemployment rate")
find_bls("wage", survey = "cw")
Searches the FRED database for series matching a search string.
find_fred(search_string, max_results = 20, metadata = FALSE, seasonality = NULL, fred_api_key = Sys.getenv("FRED_API_KEY"))
| Argument | Description |
|---|---|
| search_string | Character string to search for in series titles |
| max_results | Maximum number of results to return (default: 20) |
| metadata | Logical flag to retrieve additional metadata (default: FALSE). When FALSE, only series_id and series_title are returned. |
| seasonality | Optional filter for seasonal adjustment: NULL (no filter, default), "NSA" (Not Seasonally Adjusted), or "SA" (Seasonally Adjusted) |
| fred_api_key | FRED API key (defaults to the FRED_API_KEY environment variable) |
This function is a wrapper around fredr::fredr_series_search_text()
and requires a FRED API key.
Value: A tibble with columns series_id, series_title, and optionally metadata (a list column containing additional metadata when metadata = TRUE)
Examples:
find_fred("unemployment rate")
find_fred("GDP", max_results = 10)
find_fred("inflation", metadata = TRUE)
find_fred("employment population ratio", seasonality = "SA")
Retrieves National Income and Product Accounts data from the
Bureau of Economic Analysis API.
Requires a BEA API key saved in the BEA_API_KEY environment variable.
get_bea_nipa(tables, years, frequency = c("year", "quarter", "month"), underlying = FALSE, metadata = FALSE, bea_api_key = Sys.getenv("BEA_API_KEY"))
| Argument | Description |
|---|---|
| tables | Character vector of NIPA table names (e.g., "T10101", "T20305"). Can be a named vector to add custom names. |
| years | Numeric vector of years, or "ALL" for all available years |
| frequency | Character string: "year" (annual), "quarter" (quarterly), or "month" (monthly). Multiple frequencies can be specified as a comma-separated string. |
| underlying | Logical flag to use the NIUnderlyingDetail dataset instead of NIPA (default: FALSE) |
| metadata | Logical flag to return additional metadata columns (default: FALSE) |
| bea_api_key | BEA API key (defaults to the BEA_API_KEY environment variable) |
Value: A tibble with columns table_name, table_description, line_number, line_description, date_frequency, date, value, and date variables (year, quarter, month) appropriate to the frequency of the data returned: annual data include only year; quarterly data include year and quarter; monthly data include year, quarter, and month. If metadata = TRUE, the tibble also includes unit_mult, metric_name, cl_unit, series_code, and note_text (a list column). If tables is a named vector, a "name" column is included as the first column.
Examples:
get_bea_nipa("T10101", years = 2020:2024, frequency = "quarter")
get_bea_nipa(
  c("gdp" = "T10101", "personal_income" = "T20305"),
  years = 2023:2024,
  frequency = "year"
)
get_bea_nipa("T10101", years = 2023, frequency = "year", metadata = TRUE)
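The rule for which date variables accompany each frequency (year only for annual data; year and quarter for quarterly; year, quarter, and month for monthly) can be expressed as a short base-R sketch. It only mirrors the documented output shape and is not taken from get_bea_nipa(); the helper name add_date_parts_sketch() is hypothetical, and it assumes each date marks the first day of its period.

```r
# Hypothetical sketch: attach the date variables appropriate to a data frequency,
# mirroring the documented output columns (not the package's code).
add_date_parts_sketch <- function(dates, frequency = c("year", "quarter", "month")) {
  frequency <- match.arg(frequency)
  out <- data.frame(date = dates, year = as.integer(format(dates, "%Y")))
  month <- as.integer(format(dates, "%m"))
  # Quarters 1-4 derived from calendar months 1-12
  if (frequency %in% c("quarter", "month")) out$quarter <- (month - 1L) %/% 3L + 1L
  if (frequency == "month") out$month <- month
  out
}

add_date_parts_sketch(as.Date(c("2024-01-01", "2024-04-01", "2024-07-01")), "quarter")
```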
Retrieves regional economic data from the
Bureau of Economic Analysis API.
Requires a BEA API key saved in the BEA_API_KEY environment variable.
get_bea_regional(geo_fips, table_name, line_code, year = NULL, metadata = FALSE, bea_api_key = Sys.getenv("BEA_API_KEY"))
| Argument | Description |
|---|---|
| geo_fips | Character vector of geographic FIPS codes, or special values: "COUNTY" for all counties, "STATE" for all states, "MSA" for all MSAs. A single value can also be a state abbreviation (e.g., "NY", "CA"). |
| table_name | Single table name (e.g., "CAINC1" for personal income) |
| line_code | Single line code specifying the statistic to retrieve, or "ALL" for all line codes |
| year | Optional numeric vector of years, or "ALL" for all available years. When NULL (the default), the last five available years ("LAST5") are returned. |
| metadata | Logical flag to return additional metadata columns (default: FALSE). When TRUE, the result also includes cl_unit, mult_unit, and note_text (a list column). |
| bea_api_key | BEA API key (defaults to the BEA_API_KEY environment variable) |
Value: A tibble with columns geo_fips, geo_name, table_name, table_description, line_number, line_description, date_frequency, date, year, and value. If metadata = TRUE, it also includes cl_unit, mult_unit, and note_text (a list column containing all notes from the API response).
Examples:
get_bea_regional(geo_fips = "STATE", table_name = "SAINC1", line_code = 1)
get_bea_regional(geo_fips = c("36000", "06000"), table_name = "SAINC1", line_code = 1)
This function is a wrapper around blsR and requires a BLS API key saved in the BLS_API_KEY environment variable.
get_bls(series, start, end, metadata = FALSE, bls_api_key = Sys.getenv("BLS_API_KEY"))
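| Argument | Description |
|---|---|
| series | BLS series code, or a character vector of codes (optionally named) |
| start | Start year (numeric) |
| end | End year (numeric) |
| metadata | Logical flag to return additional metadata (default: FALSE) |
| bls_api_key | BLS API key (defaults to the BLS_API_KEY environment variable) |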
Value: a tibble
Examples:
get_bls("LNU02300060", start = 2020, end = 2024)
bls_series_ids = c(
  emp_fb_2534 = "LNU02073399",
  epop_asianmen_2554 = "LNU02332330Q",
  cpi_semi = "CUUS0000SA0",
  "LNU02300060"
)
get_bls(bls_series_ids, start = 2024, end = 2024)
complete_results = get_bls(bls_series_ids, start = 2024, end = 2024, metadata = TRUE)
complete_results
complete_results |> tidyr::unnest(metadata)
This function is a wrapper around fredr and requires a FRED API key.
get_fred(series, start = NULL, end = NULL, metadata = FALSE, fred_api_key = Sys.getenv("FRED_API_KEY"))
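| Argument | Description |
|---|---|
| series | FRED series code, or a character vector of codes (optionally named) |
| start | Start year or date (numeric year or Date object) |
| end | End year or date (numeric year or Date object) |
| metadata | Logical flag to return additional metadata (default: FALSE) |
| fred_api_key | FRED API key (defaults to the FRED_API_KEY environment variable) |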
Value: A tibble
Examples:
get_fred("UNRATE")
series = c(
  gdp = "GDP",
  urate = "UNRATE"
)
get_fred(series, start = as.Date("2024-07-01"), end = 2024)
complete_results = get_fred(series, start = 2020, end = 2025, metadata = TRUE)
complete_results
complete_results |> tidyr::unnest(metadata)
interpolated_median(x, bin_size, w = NULL, na.rm = TRUE)
| Argument | Description |
|---|---|
| x | numeric vector or an R object |
| bin_size | width of the bins used to group x |
| w | numeric vector of weights, the same length as x, giving the weight for each element of x |
| na.rm | logical; if TRUE, any NA or NaN values are removed from x before computation |
This function has been superseded by averaged_median(), EPI's
current preferred method for interpolating medians.
Value: a numeric vector
Examples:
interpolated_median(x = mtcars$mpg, bin_size = 0.50)
This function has been superseded by averaged_quantile(), EPI's
current preferred method for interpolating quantiles.
interpolated_quantile(x, bin_size, probs = 0.5, w = NULL, na.rm = TRUE)
| Argument | Description |
|---|---|
| x | numeric vector or an R object |
| bin_size | width of the bins used to group x |
| probs | numeric vector of percentiles, with values between 0 and 1 |
| w | numeric vector of weights, the same length as x, giving the weight for each element of x |
| na.rm | logical; if TRUE, any NA or NaN values are removed from x before computation |
Value: a numeric vector
Examples:
interpolated_quantile(x = mtcars$mpg, bin_size = 0.50, probs = c(0.25, 0.5, 0.75))
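The superseded binned approach (interpolated_median(), interpolated_quantile(), and binipolate()) can be approximated by the textbook grouped-data formula: find the bin containing the target percentile, then interpolate linearly within it, assuming the weight is spread evenly across the bin. The base-R sketch below is an illustration under stated assumptions, not the package's code: it assumes bins of width bin_size centered on multiples of bin_size, and the helper name interpolated_quantile_sketch() is hypothetical.

```r
# Hypothetical sketch of binned (grouped-data) linear interpolation of a quantile.
# Assumption: bins of width bin_size are centered on multiples of bin_size;
# the superseded epidatatools functions may bin differently.
interpolated_quantile_sketch <- function(x, bin_size, probs = 0.5, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  # Total weight in each bin, ordered by bin midpoint
  bin_mid <- round(x / bin_size) * bin_size
  bin_w <- tapply(w, bin_mid, sum)
  mids <- as.numeric(names(bin_w))
  share <- as.numeric(bin_w) / sum(w)
  cum <- cumsum(share)
  cum[length(cum)] <- 1  # guard against floating-point shortfall
  vapply(probs, function(p) {
    k <- which(cum >= p)[1]                 # bin containing the p-th quantile
    below <- if (k == 1) 0 else cum[k - 1]  # weight share below that bin
    lower <- mids[k] - bin_size / 2         # lower edge of that bin
    # Linear interpolation within the bin
    lower + (p - below) / share[k] * bin_size
  }, numeric(1))
}

interpolated_quantile_sketch(mtcars$mpg, bin_size = 0.5, probs = c(0.25, 0.5, 0.75))
```

For example, with x = c(1, 2, 3, 4) and bin_size = 1, half of the weight lies below 2.5 (the upper edge of the bin around 2), so the sketch's median is 2.5.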
Join data frames and create a merge indicator
merge_status(x, y, ...)

## S3 method for class 'data.frame'
merge_status(x, y, ...)
| Argument | Description |
|---|---|
| x, y | data frames |
| ... | passed to dplyr::full_join() |
Value: the data frame produced by dplyr::full_join(), with an additional column, _merge, indicating each row's merge status
Examples:
library(dplyr)
merge_status(band_members, band_instruments, by = "name")
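The idea of a merge indicator can be sketched in base R: tag each input with a marker column, perform a full outer join, and classify every row by which markers survived. This is not how merge_status() is written (it passes its arguments to dplyr::full_join()). The helper name merge_status_sketch() and the status labels "both", "left_only", and "right_only" are assumptions; the labels merge_status() actually writes to _merge may differ.

```r
# Hypothetical base-R sketch of a full join with a merge indicator column;
# the status labels are assumptions, not necessarily those used by merge_status().
merge_status_sketch <- function(x, y, by) {
  x$.in_x <- TRUE  # marker: row came from x
  y$.in_y <- TRUE  # marker: row came from y
  out <- merge(x, y, by = by, all = TRUE)  # full outer join
  in_x <- !is.na(out$.in_x)
  in_y <- !is.na(out$.in_y)
  out$`_merge` <- ifelse(in_x & in_y, "both",
                         ifelse(in_x, "left_only", "right_only"))
  out$.in_x <- NULL
  out$.in_y <- NULL
  out
}

# Small data frames mirroring dplyr's band_members and band_instruments
members <- data.frame(name = c("Mick", "John", "Paul"),
                      band = c("Stones", "Beatles", "Beatles"))
instruments <- data.frame(name = c("John", "Paul", "Keith"),
                          plays = c("guitar", "bass", "guitar"))
merge_status_sketch(members, instruments, by = "name")
```

Marker columns are used instead of checking for missing values in the data columns, so rows whose own data contain NA are still classified correctly.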
A dataset containing geographic codes for US states including FIPS codes, Census codes, and region/division classifications.
state_geocodes
A data frame with 51 rows and 8 variables:
Full state name
Two-letter state abbreviation
State FIPS code
State Census code
Census division number (1-9)
Census division name (e.g., "New England", "Pacific")
Census region number (1-4)
Census region name (Northeast, Midwest, South, West)
https://github.com/Economic/state_geocodes
Summarize distinct groups
summarize_groups(.data, .groups, ...)
| Argument | Description |
|---|---|
| .data | a data frame |
| .groups | grouping variables as a tidy selection specification of columns, as used in |
| ... | name-value pairs passed to dplyr::summarize() |
Value: a tibble
Examples:
summarize_groups(mtcars, cyl|gear|carb, median(mpg), mean(hp))
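One reading of the example above is that each grouping variable (cyl, then gear, then carb) is summarized separately and the results are stacked, rather than grouping by every combination of the three. Under that assumption, a base-R sketch for a single summary might look like the following. The helper name summarize_groups_sketch() and its output columns (group_var, group_value, value) are hypothetical and need not match the columns summarize_groups() returns.

```r
# Hypothetical sketch: summarize one variable separately by each grouping
# variable and stack the results (not the epidatatools implementation).
summarize_groups_sketch <- function(data, groups, var, fun) {
  pieces <- lapply(groups, function(g) {
    vals <- tapply(data[[var]], data[[g]], fun)  # one summary per distinct value of g
    data.frame(group_var = g, group_value = names(vals),
               value = as.numeric(vals))
  })
  do.call(rbind, pieces)  # stack the per-variable results
}

summarize_groups_sketch(mtcars, c("cyl", "gear", "carb"), "mpg", median)
```

With mtcars this yields 12 rows: three for cyl, three for gear, and six for carb.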