Project task #1

Topic

For my topic I want to look into an issue of environmental justice involving low-income housing and flood plains, which could be connected through social vulnerability.

Interest

For one of my classes last semester I wrote an op-ed on a proposal in Raleigh to develop low-income housing in response to the Dix Edge survey. The area was highly gentrified, and development was displacing many affordable neighborhoods. It happened that the proposed spot for the low-income housing was undeveloped land likely to be at risk of flooding. It made me think, because low-income communities are often in places disproportionately impacted by natural hazards, and I wanted to explore the issue further.

Data

https://hazards.fema.gov/nri/map

This source contains information I could use to make a choropleth of communities in poverty; I would then use flood data to see whether there is a relationship between the value of land and where impoverished communities are located. For this, I would use the “% of persons living below the federal poverty rate” dataset. That said, I don’t see a place to download the poverty data, so if I cannot get it, there is backup information on social vulnerability that could work for making a similar point. I saw online that RStudio can create a choropleth by joining the different areas to a base shapefile, which I could certainly find.

I think the goal is to use CSV data that I can join to a shapefile showing the breakdown of a social factor related to low-income communities, then put a vector layer on top representing flood risk, in order to comment on the quality and risk of land and the communities surrounding it in Wake County, if there is a point to be made.

The resolution I aim to look at is the census-tract level, as I want to see the data within Wake County and am interested in examining it at a smaller scale.
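The planned workflow could be sketched in R along these lines; the file names and the poverty-rate column below are placeholders for data I have not yet downloaded, not actual datasets:

```r
library(sf)
library(tidyverse)
library(tmap)

# Hypothetical inputs: a tract shapefile and a CSV of poverty rates
tracts  <- read_sf("wake_tracts.shp")
poverty <- read_csv("poverty_by_tract.csv")

# Join the CSV to the shapefile on a shared tract identifier
tracts_pov <- left_join(tracts, poverty, by = "GEOID")

# Choropleth of poverty with a flood-risk vector layer drawn on top
tm_shape(tracts_pov) +
  tm_polygons("PCT_BELOW_POVERTY") +
  tm_shape(read_sf("flood_zones.shp")) +
  tm_borders(col = "blue")
```

The key idea is that the CSV carries the social variable, the shapefile carries the geometry, and the flood layer sits on top as a separate `tm_shape()`.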

Project task #2

Data Summary

Data wrangling

One thing I did in the wrangling section was filter out unneeded information. I kept only Wake County data because I was uninterested in any other county; beyond that, I selected only the variables I would potentially use. Another thing I wanted to look at was the density in each of these tracts, so I created a new variable describing people per unit of area (though I was unsure of the unit of measurement); the values are all proportional to each other. The last thing I did was rename the columns so it was easier for me to understand what each one describes.

Project task 3

## read in libraries
library(tigris)
library(sf)
library(tidyverse)
library(ggplot2)
library(tmap)
library(psych)
library(kableExtra)
library(dplyr)

Data Preparation

# Get data from tigris (saved earlier as a GeoPackage)
nc_tracts <- read_sf("../raw_data/nc_tracts.gpkg")

# Read in FEMA data
femahaz <- read_csv("../raw_data/FEMA/NRI_Table_CensusTracts_NorthCarolina.csv")

## Keep only the data I would be interested in looking at
femahaz <- femahaz |> 
  select(TRACTFIPS, TRACT, RISK_SCORE, SOVI_SCORE, POPULATION, AREA, COUNTY)


## Filter data to only include Wake county for simplicity
femahaz <- femahaz |> 
  filter(COUNTY == "Wake")

## Create a new column with population normalized by land area (density)
femahaz <- femahaz |> 
  mutate(DENSITY = POPULATION/AREA)

## Rename variables for clarity
femahaz <- femahaz |>
  rename( SCORE = RISK_SCORE,  SOCI_VULN = SOVI_SCORE, FIPS = TRACTFIPS)

# Keep only the columns I can use: county (to filter), GEOID (to join), lat and long (in case I want to create central points), land and water area (to compute the amount of water), and the geometry
nc_tracts <- nc_tracts |> 
  select(COUNTYFP, GEOID, INTPTLAT, INTPTLON, ALAND, AWATER)

# Filter to Wake County only; 183 is the county FIPS code for Wake County
nc_tracts <- nc_tracts |>
  filter(COUNTYFP == "183")

# Compute the amount of water relative to land by dividing the water area by the land area
nc_tracts <- nc_tracts |>
  mutate(PROP_OF_WATER = AWATER/ALAND)

# Convert the FIPS code to a number so it can join with the other data
nc_tracts <- nc_tracts |>
  mutate(FIPS = as.numeric(GEOID))

# Rename variables to make them more understandable for me
nc_tracts <- nc_tracts |>
  rename( COUNTY = COUNTYFP, LAT = INTPTLAT, LONG =INTPTLON, LAND = ALAND, WATER = AWATER)

Table Join

## join data together with the key field FIPS which is a numeric join between my spatial and non spatial data. 
femhaz_wake <- left_join(nc_tracts,
                    femahaz,
                    by = c("FIPS" = "FIPS"))

One way I know the join was completed correctly is that there are the same number of observations in femahaz, femhaz_wake, and nc_tracts. Because I trimmed both datasets down to only Wake County, it was much easier to compare them and see that they were joined correctly. At first I was getting an error message because one of my FIPS codes was a character value, so I experienced an incorrect join; with the error message I was able to make both values numeric. The numeric join also prevents confusion, since there is no possibility of different spellings or capitalization affecting the match. I also verified that the join looked consistent throughout the data.
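A minimal sketch of these checks in R, using the object names from the code above (the `anti_join()` check for unmatched rows is an extra verification beyond the row-count comparison):

```r
# Row counts should match if the join was one-to-one
nrow(nc_tracts) == nrow(femhaz_wake)
nrow(femahaz)   == nrow(femhaz_wake)

# anti_join returns rows with no match; zero rows means every tract joined
nrow(anti_join(st_drop_geometry(nc_tracts), femahaz, by = "FIPS")) == 0

# Spot-check that no joined columns came back entirely as NA
sum(is.na(femhaz_wake$SCORE))
```

Dropping the geometry before `anti_join()` keeps the check on the attribute table only, which is all the key comparison needs.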

Summary

## Create a summary table and remove the fields I am not interested in. I kept one extra variable so I can also see the risk scores along with the vulnerability scores, because I feel they go hand in hand.
summary_table <- femahaz |> 
  st_drop_geometry() |>                   
  select(-FIPS, -TRACT, -AREA, -COUNTY, -POPULATION, -DENSITY) |>  
  na.omit() |>                           
  describe(fast = TRUE)


## Remove the summary statistics that I don't want to display
summary_table <- summary_table |>
  select(-vars, -se, -range, -sd, -n)

## format the table
kable(summary_table, 
      digits = 1,
      format.args = list(big.mark = ",",
                         scientific = FALSE,
                         drop0trailing = TRUE),
      caption = "SCORE and SOCI_VULN") |> 
  kable_styling(bootstrap_options = c("striped", 
                                      "hover", 
                                      "condensed", 
                                      "responsive"), 
                full_width = F)
SCORE and SOCI_VULN

            mean  median  min   max   skew  kurtosis
SCORE       28.8  25.6    0.1   73.2  0.5   -0.4
SOCI_VULN   36.7  30.2    0.1   98.9  0.5   -0.9
## Create values for inline text, rounded so that the data is easier to understand
score_mean <-  mean(femahaz$SCORE, na.rm=TRUE) |> round(digits = 3)
soci_vuln_mean <-  mean(femahaz$SOCI_VULN, na.rm=TRUE) |> round(digits = 3)
soci_vuln_max <-  max(femahaz$SOCI_VULN, na.rm=TRUE) |> round(digits = 3)
soci_vuln_min <-  min(femahaz$SOCI_VULN, na.rm=TRUE) |> round(digits = 3)
soci_vuln_med <- median(femahaz$SOCI_VULN, na.rm=TRUE)|> round(digits = 3)

The mean risk score is 28.753, and that corresponds with my target data, which is the social vulnerability rating in Wake County. For social vulnerability, the maximum score is 98.89 and the minimum value is 0.14. For a display of center we can look to the mean score, which is 36.653, but for a more robust measure of center it is also useful to look at the median, which is 30.16.

Histogram

## Create a histogram displaying the social vulnerability scores in Wake County, with the score and frequency displayed
ggplot(femhaz_wake, 
       aes(x = SOCI_VULN)) + 
  geom_histogram(binwidth = 5, 
                 na.rm = TRUE, 
                 color = "black",
                 fill = "pink") +
  labs(x = "Social vulnerability scores",
       y = "Count",
       title = "Histogram of Social Vulnerability in Wake County") +
  theme_minimal()

In this histogram the data is right skewed, with the peak between 0 and 25. It has a broad distribution, and at first glance there appear to be no outliers, though statistical methods would need to be applied to confirm this assumption.
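As a sketch, one such method is the 1.5 × IQR rule (one possible check among several, not the only one):

```r
# Flag values more than 1.5 * IQR beyond the first and third quartiles
v   <- femhaz_wake$SOCI_VULN
q   <- quantile(v, c(0.25, 0.75), na.rm = TRUE)
iqr <- q[2] - q[1]
out <- v[v < q[1] - 1.5 * iqr | v > q[2] + 1.5 * iqr]
length(na.omit(out))  # number of flagged values
```

A count of zero flagged values would support the first-glance reading that there are no outliers.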

Choropleth

tmap_mode("view")

tm_shape(femhaz_wake) + 
  tm_polygons("SOCI_VULN", 
              style = "jenks", 
              palette = "YlOrRd",
              lwd = 0.25,
              border.col = "black",
              border.alpha = 0.5,
              title = "Social Vulnerability by census tract") + 
  tm_layout(frame = FALSE,
            main.title = "Social Vulnerability in Wake County")

In this distribution I see that there is more social vulnerability towards the east of the map. Some cities that appear darker, representing higher social vulnerability scores, are Raleigh, Zebulon, and Knightdale. There is also, interestingly, moderate risk in Umstead Park, meaning that social risk might not be totally dependent on density, though density could be a factor going into social vulnerability.