Predicting Dartmouth's Total Daily Energy Usage
Tev'n Powers and Henry Stewart
Problem
Utility infrastructure is essential to modern living. Homeowners, and sometimes apartment renters, become aware of the cost of their utilities when they open their monthly energy bills. Most college students, by contrast, pay only one bill per semester or quarter, so they are largely insulated from the cost of their daily or monthly kW usage. At Dartmouth, the GreenLite project attempts to make students more aware of their hourly electricity consumption.
Given the rising cost of energy, we would like to be able to predict how much energy the entire Dartmouth campus will use on a daily basis and a monthly basis. Knowing this information could be useful to the Dartmouth Power Plant (Facilities Operations & Management) in offsetting fuel costs or finding ways to reduce utility costs.
Based on the historical daily total kilowatt, water, and paper usage of the entire Dartmouth campus, our goals for this project are the following:
1. predict tomorrow's energy usage (kW, water, paper) based on our training data
2. compare tomorrow's actual energy usage with the prediction
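One simple way to carry out the comparison in goal 2 is a percentage error per day, averaged over several days. The sketch below is only an illustration: the function names and the sample numbers are hypothetical and are not GreenLite measurements.

```python
def percent_error(predicted, actual):
    """Absolute error of one prediction as a percentage of the actual value."""
    if actual == 0:
        raise ValueError("actual usage must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0


def mean_absolute_percent_error(predictions, actuals):
    """Average percent error over paired daily predictions and measurements."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [percent_error(p, a) for p, a in zip(predictions, actuals)]
    return sum(errors) / len(errors)


# Hypothetical daily totals in arbitrary units, for illustration only.
predicted = [110.0, 95.0, 100.0]
actual = [100.0, 100.0, 100.0]
mape = mean_absolute_percent_error(predicted, actual)
```

A percentage is used here so that kW, water, and paper usage, which have different units, could be compared on the same scale.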
Methods
Our initial plan is to distinguish three independent sets of features that will affect our estimate:
1. Non-Human Factors provided by the GreenLite Project data set: Power (kW), Temperature (deg), Relative Humidity (%rh), etc.
2. Human Factors: Classes in session vs. breaks, campus holidays (Homecoming, Winter Carnival, Green Key, graduation), finals period, concerts/events, etc.
3. Anomalies that may cause fluctuation in power usage: individual birthdays, small cluster events, parties, etc.
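The first two feature sets could be combined into one numeric vector per day. The following is a minimal sketch under stated assumptions: the calendar dates are made up for illustration (real ones would come from the Registrar), and the names `human_features` and `feature_vector` are hypothetical.

```python
from datetime import date

# Hypothetical calendar entries, for illustration only.
BREAK_RANGES = [(date(2001, 3, 10), date(2001, 3, 20))]
FINALS_RANGES = [(date(2001, 3, 5), date(2001, 3, 9))]
HOLIDAYS = {date(2001, 2, 10)}


def in_any_range(day, ranges):
    """Return True if day falls inside any inclusive (start, end) range."""
    return any(start <= day <= end for start, end in ranges)


def human_features(day):
    """Encode the human factors for one calendar day as 0/1 flags."""
    return [
        1 if day.weekday() >= 5 else 0,                # weekend (Sat or Sun)
        1 if in_any_range(day, BREAK_RANGES) else 0,   # break in progress
        1 if in_any_range(day, FINALS_RANGES) else 0,  # finals period
        1 if day in HOLIDAYS else 0,                   # campus holiday
    ]


def feature_vector(day, temperature_deg, relative_humidity_pct):
    """Concatenate non-human measurements with the human-factor flags."""
    return [temperature_deg, relative_humidity_pct] + human_features(day)
```

Binary flags keep the human factors on a common scale; the continuous measurements would likely need rescaling before being passed to a learner.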
We will use Support Vector Machines (SVMs) to model the relationship between the first two sets of features and the total energy consumption of the campus. We expect these features to correlate strongly with total campus energy usage.
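Before fitting any model, the expected correlation can be checked one feature at a time. The sketch below computes the Pearson correlation coefficient; choosing Pearson is an assumption made here for illustration, and this check is separate from the SVM itself.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between one feature and the usage values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 or -1 indicates a strong linear relationship between the feature and usage; a value near 0 suggests the feature carries little linear information on its own.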
For the third set of features, we will not use an SVM. Instead, we expect these anomalies to be absorbed by our error function, since isolated events are unlikely to correlate strongly with total campus energy consumption. (If we were predicting the energy usage of a single building or cluster, we would expect a high correlation between these small events and consumption.)
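One possible reading of this idea is to inspect the residuals (actual minus predicted) and flag days whose residual is unusually large. The sketch below is an assumption about how that might look, not a settled part of the method; the two-standard-deviation threshold and the function name are hypothetical.

```python
import statistics


def flag_anomalous_days(predicted, actual, threshold=2.0):
    """Return indices of days whose residual is far from the mean residual."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [a - p for p, a in zip(predicted, actual)]
    mean = statistics.fmean(residuals)
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []  # every day has the same residual, so nothing stands out
    return [
        i for i, r in enumerate(residuals)
        if abs(r - mean) > threshold * spread
    ]
```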
Lastly, given the large number of features that our sets may contain, we will look into using kernel functions with our SVM to handle the dimensionality of each feature set without constructing every feature combination explicitly. Although the sets are independent of one another, features within the same set are not necessarily independent of each other, so a kernel function may let us capture correlations between features of the same set.
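A kernel function takes two feature vectors and returns a similarity score. As a concrete illustration, the sketch below implements the Gaussian (RBF) kernel, k(x, z) = exp(-gamma * ||x - z||^2). The proposal does not name a specific kernel, so both the choice of RBF and the default `gamma` value are assumptions.

```python
import math


def rbf_kernel(x, z, gamma=0.1):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)
```

Identical vectors score 1.0, and the score decays toward 0 as the vectors move apart; `gamma` controls how quickly that decay happens.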
Datasets
Our training data will come from the data aggregated by the GreenLite project (started at Dartmouth College). The data is polled from digital meters and stored in a database. The database includes kW usage, water consumption (hot and cold), heat generated, printer usage (a form of carbon output), and energy generated by any renewable energy devices used on campus.
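Since the timeline calls for 15-minute kW data, the readings would need to be rolled up into daily totals before training. The sketch below assumes each reading is the average power in kW over its 15-minute interval, so it contributes kW x 0.25 h of energy in kWh. The record layout (ISO timestamp, kW value) is hypothetical and does not reflect the actual GreenLite schema.

```python
from collections import defaultdict
from datetime import datetime

INTERVAL_HOURS = 0.25  # 15-minute polling interval, in hours


def daily_energy_kwh(readings):
    """Sum (timestamp, kW) readings into kWh totals keyed by calendar day."""
    totals = defaultdict(float)
    for timestamp, kw in readings:
        day = datetime.fromisoformat(timestamp).date()
        totals[day] += kw * INTERVAL_HOURS
    return dict(totals)


# Hypothetical readings, for illustration only.
sample = [
    ("2001-01-01T00:00:00", 400.0),
    ("2001-01-01T00:15:00", 800.0),
    ("2001-01-02T00:00:00", 1000.0),
]
daily = daily_energy_kwh(sample)
```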
The database is found at:
GreenLite Data (authorization requires a valid username and password combination)
We will also use data from the Office of the Registrar to account for features related to the Dartmouth calendar and schedule (classes, breaks, holidays, exams, etc.).
Timeline
- April 17 - gather 15-minute kW data from the database into a txt file
- April 30 - program regression model and classify break days vs. non-break days (weekdays vs. weekends)
- May 1-6 - compare daily data with the model's predictions
- May 8 - Milestone presentation in class
References
http://dev.greenlite.cs.dartmouth.edu
Lecture on linear regression by Lorenzo Torresani
Interview with Tim Tregubov
Interview with Laurie Loeb
Meeting with Weifu Wang