---
title: "Getting started with swadlr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with swadlr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

The swadlr package provides access to the [EPI State of Working America Data
Library](https://data.epi.org) (SWADL), a comprehensive resource for data on
wages, employment, and the labor market in the United States.

```{r setup}
library(swadlr)
```

## Exploring available data

Before fetching data, you can explore what's available in the SWADL API using
`swadl_id_names()`.

### Topics

Topics are broad categories that group related indicators:

```{r}
swadl_id_names("topics")
```

### Indicators

Indicators are specific data series. You can list all indicators or filter by
topic:

```{r}
# List all indicators
swadl_id_names("indicators")

# List indicators for a specific topic
swadl_id_names("indicators", topic = "wages")
```

### Measures

Measures are specific ways of presenting indicator data. For example, wage data
might be available in nominal dollars, real (inflation-adjusted) dollars, or as
a percentage:

```{r}
# List all measures
swadl_id_names("measures")

# List measures for a specific indicator
swadl_id_names("measures", indicator = "hourly_wage_percentiles")
```

### Dimensions

Dimensions allow subsetting data by demographic or other categories (e.g.,
gender, race, education). Each dimension has multiple values:

```{r}
# List all dimensions and their values
swadl_id_names("dimensions")

# List dimensions for a specific indicator
swadl_id_names("dimensions", indicator = "hourly_wage_percentiles")
```

### Geographies

The package supports national, regional, divisional, and state-level data:

```{r}
# List all geographies
swadl_id_names("geographies")

# Filter to just states
geographies <- swadl_id_names("geographies")
geographies[geographies$level == "state", ]
```

## Getting indicator information

Use `swadl_indicator()` to get detailed information about an indicator before
you fetch it, including its available measures, dimensions, date ranges, and
geographic availability:

```{r}
info <- swadl_indicator("hourly_wage_percentiles")
print(info)
```

You can also access specific components of the info object:

```{r}
# Available measures
info$measures

# Availability by date interval, measure, and geography
info$availability
```

## Fetching time series data

The main function for fetching data is `get_swadl()`. It returns a tibble
with columns for date, value, geography, and any dimensions you request.

### Basic usage

Fetch the median hourly wage over time:

```{r}
wages <- get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50")
)
wages
```

### Dimension syntax

The `dimension` argument supports several formats:

**Overall (aggregate data):**

Use `"overall"` to get aggregate data without demographic breakdown:

```{r}
get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  dimension = "overall"
)
```

**Single dimension (all values):**

Pass a dimension ID to get all values for that dimension:

```{r}
# All wage percentiles
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = "wage_percentile"
)
```

**Single dimension (specific value):**

Use a named list to filter to specific dimension values:

```{r}
# Only the 90th percentile
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p90")
)
```

**Multiple dimensions (cross-tabulated):**

Combine dimensions using a list. Named elements filter to specific values,
while unnamed elements include all values:

```{r}
# Employment rate for males, by all age groups
get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "month",
  dimension = list("gender" = "gender_male", "age_group")
)
```
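
To see why named and unnamed elements can be told apart, it may help to look at
how base R stores a partially named list. This is only an illustration of the
list structure. How swadlr parses the `dimension` argument internally is not
shown in this vignette, and the variable names below are made up for the
example:

```{r}
# A partially named list: one named element, one unnamed
dims <- list("gender" = "gender_male", "age_group")

# Unnamed elements get an empty string as their name
names(dims)

# Split the list into filters (named) and full dimensions (unnamed)
is_named <- names(dims) != ""
filters <- dims[is_named]
all_values <- unlist(dims[!is_named], use.names = FALSE)
```

Note that a list with no names at all has `names()` equal to `NULL` rather than
a vector of empty strings, so code that inspects names has to handle that case
separately.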

### Date intervals

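Most indicators support both annual and monthly data. Select the interval with
the `date_interval` argument, and check the `availability` component returned by
`swadl_indicator()` to see which intervals a given indicator supports: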

```{r}
# Annual data (default)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  date_interval = "year",
  dimension = list("wage_percentile" = "wage_p50")
)

# Monthly data
get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "month",
  dimension = "overall"
)
```

### Geographic levels

Fetch data for different geographic levels:

```{r}
# National data (default)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "national",
  dimension = list("wage_percentile" = "wage_p50")
)

# State data (by name)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "California",
  dimension = list("wage_percentile" = "wage_p50")
)

# State data (by abbreviation)
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "NY",
  dimension = list("wage_percentile" = "wage_p50")
)

# Census region
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  geography = "Midwest",
  dimension = list("wage_percentile" = "wage_p50")
)
```

### Date filtering

Filter to specific dates or date ranges:

```{r}
# Single date
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50"),
  date = "2023-01-01"
)

# Date range
get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = list("wage_percentile" = "wage_p50"),
  date = c("2010-01-01", "2023-01-01")
)
```
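
If you already have a longer series in memory, you can also narrow it locally
instead of calling the API again. The sketch below uses a small stand-in data
frame with invented values, and assumes the `date` column is of class `Date`
(or has been converted with `as.Date()`):

```{r}
# Stand-in for a get_swadl() result; the values are made up for illustration
wages <- data.frame(
  date = as.Date(c("2005-01-01", "2010-01-01", "2015-01-01", "2023-01-01")),
  value = c(10, 12, 14, 18)
)

# Keep rows whose date falls inside the range (both endpoints included)
date_range <- as.Date(c("2010-01-01", "2020-01-01"))
subset_wages <- wages[
  wages$date >= date_range[1] & wages$date <= date_range[2],
]
subset_wages
```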

## Example: Wage percentiles over time

Here's a complete example that fetches all wage percentiles from 2000 through
2023 and summarizes how each one changed:

```{r}
# Fetch all wage percentiles
wages <- get_swadl(
  indicator = "hourly_wage_percentiles",
  measure = "nominal_wage",
  dimension = "wage_percentile",
  date = c("2000-01-01", "2023-01-01")
)

# View the data
head(wages)

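# Summary by percentile: first value, last value, and the change between them
# (sort by date first so "first" and "last" are chronological)
wages <- wages[order(wages$date), ]
aggregate(value ~ wage_percentile, data = wages, FUN = function(x) {
  c(start = x[1], end = x[length(x)], change = x[length(x)] - x[1])
})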
```
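
A common follow-up is to compare percentiles directly, for example the ratio of
the 90th to the 50th percentile wage. The sketch below works on a stand-in data
frame with invented values. It assumes the `wage_percentile` column holds IDs
such as `"wage_p50"` and `"wage_p90"`; the actual column contents may differ,
so inspect your result before adapting it:

```{r}
# Stand-in for get_swadl() output in long format; values are invented
wages <- data.frame(
  date = as.Date(rep(c("2000-01-01", "2023-01-01"), each = 2)),
  wage_percentile = rep(c("wage_p50", "wage_p90"), times = 2),
  value = c(15, 30, 25, 55)
)

# One row per date, one column per percentile
wide <- reshape(
  wages,
  idvar = "date",
  timevar = "wage_percentile",
  direction = "wide"
)

# Ratio of the 90th to the 50th percentile wage
wide$ratio_90_50 <- wide$value.wage_p90 / wide$value.wage_p50
wide
```

`reshape()` is designed for plain data frames, so converting a tibble with
`as.data.frame()` first is a safe choice.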

## Example: State-level employment

Check which geographic levels have data for an indicator, then fetch annual
employment rates for a single state:

```{r}
# Get info to see which geographic levels have data
info <- swadl_indicator("labor_force_emp")
info$availability

# Fetch data for California
ca_emp <- get_swadl(
  indicator = "labor_force_emp",
  measure = "percent_emp",
  date_interval = "year",
  geography = "California",
  dimension = "overall"
)
ca_emp
```
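
To compare several states, one option is to loop over them and stack the
results. This vignette only shows one geography per call, so the sketch assumes
that and calls the fetcher once per state. `fetch_states()` and `fake_fetch()`
are hypothetical helpers written for this example, not part of swadlr, and the
values returned by `fake_fetch()` are invented:

```{r}
# Hypothetical helper: fetch one series per state and stack the results.
# `fetch` defaults to get_swadl(); it is a parameter only so the helper can
# be tried out with a stand-in function.
fetch_states <- function(states, fetch = swadlr::get_swadl) {
  results <- lapply(states, function(state) {
    fetch(
      indicator = "labor_force_emp",
      measure = "percent_emp",
      date_interval = "year",
      geography = state,
      dimension = "overall"
    )
  })
  do.call(rbind, results)
}

# Stand-in fetcher that returns made-up values, used here instead of the API
fake_fetch <- function(indicator, measure, date_interval, geography,
                       dimension) {
  data.frame(
    date = as.Date(c("2022-01-01", "2023-01-01")),
    geography = geography,
    value = c(60, 61)
  )
}

state_emp <- fetch_states(c("California", "NY"), fetch = fake_fetch)
state_emp
```

In real use you would leave `fetch` at its default so that each state is
requested from the API.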

## Cache management

The package caches metadata (topics, indicators, measures, dimensions, sources)
within your R session to minimize API calls. If you need to refresh this cache:

```{r}
clear_swadlr_cache()
```
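
If session-level caching is unfamiliar, the general idea can be sketched with a
plain environment. This is an illustration of the technique only, not the
package's actual implementation, and every name in it (`cache`, `cached()`,
`clear_cache()`, `slow_lookup()`) is made up for the example:

```{r}
# Illustrative session cache; not the package's actual implementation
cache <- new.env(parent = emptyenv())

# Return the stored value for `key`, computing and storing it on first use
cached <- function(key, compute) {
  if (!exists(key, envir = cache, inherits = FALSE)) {
    assign(key, compute(), envir = cache)
  }
  get(key, envir = cache, inherits = FALSE)
}

# Remove everything from the cache
clear_cache <- function() {
  rm(list = ls(cache, all.names = TRUE), envir = cache)
}

# Count how often the expensive step actually runs
calls <- 0
slow_lookup <- function() {
  calls <<- calls + 1
  c("wages", "employment")
}

cached("topics", slow_lookup)  # computes and stores
cached("topics", slow_lookup)  # served from the cache
calls

clear_cache()
cached("topics", slow_lookup)  # computed again after clearing
calls
```

Because the environment lives only in memory, a cache like this disappears
when the R session ends.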