---
title: "Getting started with swadlr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with swadlr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

The swadlr package provides access to the [EPI State of Working America Data Library](https://data.epi.org) (SWADL), a comprehensive resource for data on wages, employment, and the labor market in the United States.

```{r setup}
library(swadlr)
```

## Exploring available data

Before fetching data, you can explore what's available in the SWADL API using `swadl_id_names()`.

### Topics

Topics are broad categories that group related indicators:

```{r}
swadl_id_names("topics")
```

### Indicators

Indicators are specific data series. You can list all indicators or filter by topic:

```{r}
# List all indicators
swadl_id_names("indicators")

# List indicators for a specific topic
swadl_id_names("indicators", topic = "wages")
```

### Measures

Measures are specific ways of presenting indicator data. For example, wage data might be available in nominal dollars, real (inflation-adjusted) dollars, or as a percentage:

```{r}
# List all measures
swadl_id_names("measures")

# List measures for a specific indicator
swadl_id_names("measures", indicator = "hourly_wage_percentiles")
```

### Dimensions

Dimensions allow subsetting data by demographic or other categories (e.g., gender, race, education).
Each dimension has multiple values:

```{r}
# List all dimensions and their values
swadl_id_names("dimensions")

# List dimensions for a specific indicator
swadl_id_names("dimensions", indicator = "hourly_wage_percentiles")
```

### Geographies

The package supports national, regional, divisional, and state-level data:

```{r}
# List all geographies
swadl_id_names("geographies")

# Filter to just states
geographies <- swadl_id_names("geographies")
geographies[geographies$level == "state", ]
```

## Getting indicator information

Before fetching data, use `swadl_indicator()` to get detailed information about an indicator, including available measures, dimensions, date ranges, and geographic availability:

```{r}
info <- swadl_indicator("hourly_wage_percentiles")
print(info)
```

You can also access specific components of the info object:

```{r}
# Available measures
info$measures

# Availability by date interval, measure, and geography
info$availability
```

## Fetching time series data

The main function for fetching data is `get_swadl()`. It returns a tibble with columns for date, value, geography, and any dimensions you request.
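
To make that long, one-row-per-observation shape concrete, here is a minimal sketch that pivots such a result to one column per dimension value using base R. It does not call the API: `mock` is a hand-built stand-in, its column names (`date`, `value`, `geography`, `wage_percentile`) are assumptions based on the description above, and its numbers are placeholders rather than SWADL data.

```{r}
# Hand-built stand-in for a get_swadl() result.
# Column names are assumed; values are placeholders, not SWADL data.
mock <- data.frame(
  date = as.Date(rep(c("2020-01-01", "2021-01-01"), times = 2)),
  value = c(1, 2, 10, 20),
  geography = "National",
  wage_percentile = rep(c("wage_p10", "wage_p90"), each = 2),
  stringsAsFactors = FALSE
)

# Pivot from long to wide: one row per date, one column per percentile
wide <- reshape(
  mock[, c("date", "wage_percentile", "value")],
  idvar = "date",
  timevar = "wage_percentile",
  direction = "wide"
)
wide
```

If the real result uses different column names, adjust `idvar` and `timevar` to match.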

### Basic usage

Fetch the median hourly wage over time:

```{r}
wages <- get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50")
)
wages
```

### Dimension syntax

The `dimension` argument supports several formats:

**Overall (aggregate data):** Use `"overall"` to get aggregate data without demographic breakdown:

```{r}
get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  dimension = "overall"
)
```

**Single dimension (all values):** Pass a dimension ID to get all values for that dimension:

```{r}
# All wage percentiles
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = "wage_percentile"
)
```

**Single dimension (specific value):** Use a named list to filter to specific dimension values:

```{r}
# Only the 90th percentile
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p90")
)
```

**Multiple dimensions (cross-tabulated):** Combine dimensions using a list. Named elements filter to specific values, while unnamed elements include all values:

```{r}
# Employment rate for males, by all age groups
get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "month",
  dimension = list("gender" = "gender_male", "age_group")
)
```

### Date intervals

Most indicators support both annual and monthly data.
Use the `date_interval` argument:

```{r}
# Annual data (default)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  date_interval = "year",
  dimension = list("wage_percentile" = "wage_p50")
)

# Monthly data
get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "month",
  dimension = "overall"
)
```

### Geographic levels

Fetch data for different geographic levels:

```{r}
# National data (default)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "national",
  dimension = list("wage_percentile" = "wage_p50")
)

# State data (by name)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "California",
  dimension = list("wage_percentile" = "wage_p50")
)

# State data (by abbreviation)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "NY",
  dimension = list("wage_percentile" = "wage_p50")
)

# Census region
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "Midwest",
  dimension = list("wage_percentile" = "wage_p50")
)
```

### Date filtering

Filter to specific dates or date ranges:

```{r}
# Single date
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50"),
  date = "2023-01-01"
)

# Date range
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50"),
  date = c("2010-01-01", "2023-01-01")
)
```

## Example: Wage percentiles over time

Here's a complete example that fetches all wage percentiles from 2000 to 2023 and summarizes how each one changed:

```{r}
# Fetch all wage percentiles
wages <- get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = "wage_percentile",
  date = c("2000-01-01", "2023-01-01")
)

# View the data
head(wages)

# Summary by percentile: first value, last value, and the change between them
# (assumes rows are in date order within each percentile)
aggregate(value ~ wage_percentile, data = wages, FUN = function(x) {
  c(start = x[1], end = x[length(x)], change = x[length(x)] - x[1])
})
```

## Example: State-level employment

Check which geographic levels have data for an indicator, then fetch the annual employment rate for a single state:

```{r}
# Get info to see which geographic levels have data
info <- swadl_indicator("labor_force_emp")
info$availability

# Fetch data for California
ca_emp <- get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "year",
  geography = "California",
  dimension = "overall"
)
ca_emp
```

## Cache management

The package caches metadata (topics, indicators, measures, dimensions, sources) within your R session to minimize API calls. If you need to refresh this cache, clear it with `clear_swadlr_cache()`:

```{r}
clear_swadlr_cache()
```
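
The idea behind session-level caching can be pictured with a small sketch. This is not swadlr's implementation: `.cache`, `cached_fetch()`, `clear_cache()`, and `fake_fetcher()` are hypothetical names, and the fetcher is a stand-in function rather than a real API call.

```{r}
# Hypothetical session cache: an environment keyed by metadata type
.cache <- new.env(parent = emptyenv())

cached_fetch <- function(key, fetcher) {
  # Call fetcher() only on a cache miss, then reuse the stored result
  if (!exists(key, envir = .cache, inherits = FALSE)) {
    assign(key, fetcher(), envir = .cache)
  }
  get(key, envir = .cache, inherits = FALSE)
}

clear_cache <- function() {
  # Drop every stored entry so the next lookup fetches again
  rm(list = ls(envir = .cache, all.names = TRUE), envir = .cache)
  invisible(NULL)
}

# Count how often the stand-in fetcher actually runs
n_calls <- 0
fake_fetcher <- function() {
  n_calls <<- n_calls + 1
  c("wages", "employment")
}

cached_fetch("topics", fake_fetcher)  # miss: runs the fetcher
cached_fetch("topics", fake_fetcher)  # hit: served from the cache
clear_cache()
cached_fetch("topics", fake_fetcher)  # miss again after clearing
n_calls
```

In this sketch the fetcher runs only on the first lookup and on the first lookup after clearing, which is the trade-off a session cache makes: fewer requests, at the cost of possibly stale metadata until the cache is cleared or the session restarts.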