---
title: 2018 Property Assessment Analysis
author: Prepared by reports@taxreformyyc.com
geometry: margin=1.5cm
---

```{r loadLibraries, echo=FALSE, message=FALSE}
# Load the required libraries
library(knitr)
library(scales)
library(formattable)
library(ggplot2)
```

```{r defineConstants, echo=FALSE}
# Constants
address <- "<%= address %>"
myAssessedValue <- <%= assessed_value %>
csvFile <- "<%= csv_file %>"

# Get the house number and street name
#myHouseNumber <- gsub("[^\\d]+", "", address, perl=TRUE)
m <- regexpr("^\\d+", address, perl=TRUE)
myHouseNumber <- regmatches(address, m)
# Strip only the leading house number so numbered streets keep their digits
myStreetName <- sub("^\\d+\\s*", "", address, perl=TRUE)

# How do these factors impact an assessment?
unknownImpact <- c('Taxation.Status', 'Assessment.Class',
                   'Property.Type', 'Property.Use', 'Valuation.Approach',
                   'Market.Adjustment', 'Community', 'Market.Area', 'Sub.Neighbourhood.Code..SNC.',
                   'Sub.Market.Area', 'Influences', 'Land.Use.Designation', 'Building.Count',
                   'Building.Type.Structure', 'Year.of.Construction', 'Quality', 'Basement.Suite',
                   'Walkout.Basement', 'Garage.Type', 'Fireplace.Count', 'Renovation',
                   'Constructed.On.Original.Foundation', 'Modified.For.Disabled',
                   'Old.House.On.New.Foundation', 'Basementless', 'Penthouse')

# These factors are informational and don't affect the assessment
noImpact <- c('Roll.Number', 'Location.Address')
```

```{r loadData, echo=FALSE}
# Load data
data <- read.csv(csvFile, header=TRUE)
```

```{r getMetaData, echo=FALSE}
streetAddresses <- data[,noImpact[2]]

# Remove the street names from the addresses
#houseNumbers <- as.numeric(gsub("[^\\d]+", "", streetAddresses, perl=TRUE))
m <- regexpr("^\\d+", streetAddresses, perl=TRUE)
houseNumbers <- as.numeric(regmatches(streetAddresses, m))

```

\center
`r address`

\flushleft

# Synopsis

This analysis pertains to the property located at **`r address`**.
It documents the treatment of the assessment data provided by the City of
Calgary for the pertinent property and those deemed comparable:

`r kable(data[order(data$Location.Address),]$Location.Address,
         col.names=c('Comparable Properties'), align=c('l'))`

The data under investigation was obtained from
[assessmentsearch.calgary.ca](https://assessmentsearch.calgary.ca) and
is presented alongside this document in a consolidated CSV file.

# Approach Overview

The data investigated in this analysis consists of properties chosen for their
similar features and proximity to one another. The number of properties chosen
was maximized to ensure a fair representation of valuations and to support the
integrity of this report and its conclusion.

The data contains many factors whose precise impact on assessed values is
unknown. Factors that are common to every property are identified and omitted
from consideration, as each should have an identical impact on the final
assessment.

Factors that vary are identified and presented for consideration, as
transparency and integrity are of the utmost importance. Again, the precise
impact of these factors on the final assessment is unknown, as the weights
assigned by the City of Calgary are not divulged in the assessment data it
provides.

Having acknowledged the factors that cannot easily be quantified, the focus
turns to the properties' lot sizes, total developed area, and assessed values.
By visualizing the relationship between these quantifiable factors, the 
pertinent property's assessed value is contrasted with those of the selected
properties. The conclusion of this analysis is drawn from the underlying data.

# Identify common factors

Many of the factors that impact the assessments are identical. This data can
safely be removed from consideration, as the impact on the assessed values
should be the same for all the properties under investigation.
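
As a minimal sketch of how such columns can be detected, the following uses a
small made-up data frame (`toy`, with hypothetical addresses and values that do
not come from the City's data) and flags every column that holds a single
distinct value. It is shown for illustration only and is not evaluated as part
of this report.

```r
# Toy data standing in for the consolidated CSV file (all values are made up)
toy <- data.frame(
  Location.Address = c("10 EXAMPLE ST", "12 EXAMPLE ST", "14 EXAMPLE ST"),
  Property.Type    = c("A", "A", "A"),
  Assessed.Value   = c(400000, 420000, 455000),
  stringsAsFactors = FALSE
)

# A column is a common factor when it contains exactly one distinct value
isCommon <- vapply(toy, function(column) length(unique(column)) == 1, logical(1))
commonNames <- names(toy)[isCommon]

# drop=FALSE keeps the result a data frame even if a single column remains
remaining <- toy[, !isCommon, drop = FALSE]
```

In this toy case only `Property.Type` is constant across the rows, so it is the
only column set aside.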

```{r commonFactors, echo=FALSE}
# Identify common factors
headers <- c()
identical <- c()
displayHeaders <- c()
for (col in colnames(data)) {
  values <- unique(data[,col])
  if (length(values) == 1) {
    headers <- append(headers, col)
    displayHeaders <- append(displayHeaders, gsub("\\.", " ", col, perl=TRUE))
    identical <- append(identical, toString(values))
  }
}
commonFactors <- data.frame(headers, displayHeaders, identical)
```

Here, **`r length(commonFactors$headers)`** common factors can safely be 
removed from the data set:

`r kable(data.frame(commonFactors$displayHeaders, commonFactors$identical),
         col.names=c('Factors', 'Identical Values'), align=c('l', 'r'))`

```{r removeCommonFactors, echo=FALSE}
# Remove common factors
data <- data[,!(names(data) %in% commonFactors$headers)]
```


# Identify unknown and non-impacting factors

Of the **`r length(colnames(data))`** remaining columns, some cannot be
quantified. Others certainly impact the assessed value of a property, but the
assessment data provided by the City of Calgary does not reveal to what extent.

## Non-impacting factors

```{r noImpactDisplayHeaders, echo=FALSE}
# Remove the dots from the header name
noImpactDisplayHeaders <- gsub("\\.", " ", noImpact, perl=TRUE)
```

These factors cannot be quantified and are administrative in purpose:

`r kable(noImpactDisplayHeaders, col.names=c('Non-Impacting Factors'))`

```{r removeIrrelevantFactors, echo=FALSE}
data <- data[,!(names(data) %in% noImpact)]
```

These are removed and the remaining **`r length(colnames(data))`** columns are
carried forward.

## Unknown factors

The impact these remaining columns have on assessment values is unknown:

```{r identifyUnknowns, echo=FALSE}
# Identify unknown factors
unknownFactors <- names(data)[names(data) %in% unknownImpact]
```

`r kable(gsub("\\.", " ", unknownFactors), col.names=c('Unknown Factors'))`

The variability within these unknown columns is presented here in the interest
of transparency:

```{r consolidateUnknownFactors, echo=FALSE}
# drop=FALSE keeps a data frame even when only one unknown factor remains
consolidatedUnknownFactors <- data[,(names(data) %in% unknownFactors), drop=FALSE]
rownames(consolidatedUnknownFactors) <- houseNumbers
```

`r kable(consolidatedUnknownFactors[order(as.numeric(row.names(consolidatedUnknownFactors))),],
         align=c(rep('c', length(unknownFactors))),
         row.names=TRUE,
         col.names=gsub("\\.", " ", unknownFactors))`

These undoubtedly have an impact on the valuation, but their precise weighting
and significance are not presented in the assessment data provided by the City
of Calgary. As such, they are removed from the dataset.

```{r removeUnknownFactors, echo=FALSE}
# Remove unknown factors from data set
data <- data[,!(names(data) %in% unknownFactors), drop=FALSE]
```

The remaining **`r length(colnames(data))`** columns contain the following data:

```{r addRowNamesToData, echo=FALSE}
# Add row names to data
rownames(data) <- houseNumbers
```

`r kable(data[order(as.numeric(row.names(data))),], row.names=TRUE, col.names=gsub("\\.", " ", colnames(data)))`

# Visualization

The raw data presented above is summarized in Figure 1. It
illustrates the disparity between the assessed property values. The pertinent
property is coloured red.

The blue line running through the graph is the line of _best fit_ for the
visualized data. It serves as a predictor, or indicator, of where the
properties in question should be positioned.
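
Assuming the line is an ordinary least-squares fit, it can be written as
follows. The symbols are notation introduced here for illustration: $x_i$ is
the combined lot size and developed area of property $i$, $y_i$ is its assessed
value, and $\hat{y}_i$ is the value the line predicts for it.

$$
\hat{y}_i = \beta_0 + \beta_1 x_i, \qquad e_i = y_i - \hat{y}_i
$$

The coefficients $\beta_0$ and $\beta_1$ are chosen to minimize
$\sum_i e_i^2$. A positive $e_i$ places a property above the line, and a
negative $e_i$ places it below.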

The pertinent property's overassessment is determined by measuring the vertical
distance between the red point and the blue line.
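
The measurement can be sketched in R as follows. The figures are made up for
illustration and the house numbers are hypothetical; only `lm`, `predict`, and
`residuals` from base R are used. The block is not evaluated as part of this
report.

```r
# Made-up figures, for illustration only
houseNumbers   <- c(10, 12, 14, 16)
areaTotals     <- c(1000, 2000, 3000, 4000)   # lot size + developed area (sq. ft.)
assessedValues <- c(310000, 390000, 530000, 570000)

# Fit the best-fit line: assessed value as a linear function of total area
fit <- lm(assessedValues ~ areaTotals)

# Value the line predicts for each property
predicted <- predict(fit)

# Vertical distance from each point to the line (assessed minus predicted)
differences <- residuals(fit)

result <- data.frame(houseNumbers, assessedValues, predicted, differences)

# Look up a single property, here the hypothetical house number 14
result[result$houseNumbers == 14, ]
```

A positive difference means the property sits above the line, and subtracting
that difference from the assessed value gives the value the line predicts.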


```{r adjustValues, echo=FALSE}
# Sum each property's lot size and total developed space
areaTotals <- rowSums(data[,-1])

# Isolate all assessed values
assessedValues <- data[,1]

# Fit the best fit regression line
reg <- lm(assessedValues~areaTotals)

# Compute the distances between the points and the regression line,
# along with the values the line predicts
assessedDifferences <- residuals(reg)
adjustedValues <- predict(reg)

# Reconcile adjusted property values
adjustedProperties <- data.frame(houseNumbers, adjustedValues, assessedDifferences)
```

```{r generatePlot, fig.cap=paste(address, "Overassessment", sep=" "), fig.width=12, echo=FALSE}
plot.title <- paste(address, "and Comparable Properties", sep=" ")
plot.subtitle <- 'Current Assessed Property Values'
ggplot(data, aes(x=areaTotals, y=assessedValues)) +
    # Plot title
    ggtitle(bquote(atop(bold(.(plot.title)), atop(italic(.(plot.subtitle)), "")))) +
    theme(plot.title=element_text(size=24, hjust = 0.5)) +

    # Axis labels
    ylab("Assessed House Values on your Street") +
    xlab("Total House and Lot Size (Sq. Feet)") +
    theme(axis.title.x=element_text(size=18, face="bold", margin=margin(20,0,0,0)),
          axis.title.y=element_text(size=18, face="bold", margin=margin(0,20,0,0))) +

    # Axis tick labels
    scale_x_continuous(labels=comma) +
<% if opts.ylimit? %>
    scale_y_continuous(labels=dollar, breaks=pretty_breaks(n=10),
                       limits=c(min(assessedValues)-<%= opts.ylimit %>, max(assessedValues)+<%= opts.ylimit %>)) +
<% else %>
    scale_y_continuous(labels=dollar, breaks=pretty_breaks(n=10)) +
<% end %>
    # Scatter plot points
    geom_point(shape=ifelse(assessedValues==myAssessedValue, 16, 1),
               size=ifelse(assessedValues==myAssessedValue, 5, 4),
               colour=ifelse(assessedValues==myAssessedValue, "red", "blue")) +

    # Point labels
    geom_text(aes(label=houseNumbers), hjust=0.5, vjust=-2, size=5) +
    geom_text(aes(label=paste("$", accounting(assessedValues, format="d"), sep="")),
              hjust=0.5, vjust=-1.2) +

    # Best fit line
    geom_smooth(method=lm)
```

# Conclusion 

The data investigated in this analysis describes the factors considered in
assessing the property located at `r address`.
It was collected and provided by the City of Calgary. This report set out to
quantify the disparity between the pertinent property and similar properties
in the neighbourhood.

Common factors were identified and eliminated from the analysis. Similarly,
varying factors were identified, catalogued, and removed from consideration. The
City of Calgary's property reports do not describe how these characteristic
features are quantified and weighted in determining a property's assessed
value. As such, they could not be included.

The conclusions that follow are drawn from comparing lot sizes, total developed
area, and assessed values. The underlying data and the overall approach have
been presented in full.

## Overvaluation: `r currency(adjustedProperties[houseNumbers==myHouseNumber,]$assessedDifferences)`

This investigation compared the assessed values with lot sizes and developed
square footage. It has revealed that the pertinent property is overvalued
by
**`r currency(adjustedProperties[houseNumbers==myHouseNumber,]$assessedDifferences)`**.

## Corrected Assessed Value: `r currency(adjustedProperties[houseNumbers==myHouseNumber,]$adjustedValues)`

In order to bring the pertinent property's assessed value in line with those of
similar properties in the neighbourhood, it must be reassessed at
**`r currency(adjustedProperties[houseNumbers==myHouseNumber,]$adjustedValues)`**.