Training Conditional Random Fields using Virtual Evidence Boosting
1> Settling on the class label.
It was infeasible to use the data to categorize detailed human activity. In the ontology there are 8 big classes (Cleaning, Yardwork, Laundry, Dishwashing, Meal Preparation, Hygiene, Grooming, Personal, Information/Leisure) divided into more than 100 activities (using the telephone, taking medication, washing hands, applying makeup, and so on). The dataset is not large or indicative enough to classify down to that level, so for the purpose of more illustrative results I chose the location of the person as the class label. The nodes of the CRF model then denote where the person is at a certain time. There are 8 possible locations of the person: Living Room, Dining Room, Kitchen, Office, Bedroom, Bathroom, Powder Room, and Hallway. The data seem to provide enough information to predict the location of the person.
2> Preprocessing of feature data.
The workload of preprocessing the data was underestimated in the project proposal. The algorithm itself is straightforward to implement once the formatted data are ready, but there were several challenges in processing the data:
First of all, the data were all recorded at different rates. For example, there are exactly 4 data points per second in the "video difference" data, while there may not be a single data point in a whole minute of the "water flow" data (it only has a value when the water is running; most of the time there was only a small number indicating dripping water). What I did was to look at the time range covered by the current class label (where the person is at that time; this can be a range as long as 10 minutes when the person stays in a room for a while, or just several seconds when the person is walking around), process the corresponding sensor data over that same range, and take the most frequently occurring value in that time slot as the value of the formatted data. A more detailed description is given for each feature below.
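Below is a minimal sketch of this alignment step, assuming each raw file has been parsed into a list of (timestamp, value) pairs; the function names are my own, not the project's actual code:

from collections import Counter

def mode_in_range(readings, t_start, t_end):
    # Most frequently occurring sensor value in [t_start, t_end),
    # or None if the sensor produced no readings in that range
    # (e.g., water flow while no water is running).
    values = [v for (t, v) in readings if t_start <= t < t_end]
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]

def format_example(label_interval, sensor_streams):
    # Build one formatted training example: the class label (the
    # person's location) followed by one mode value per feature.
    t_start, t_end, location = label_interval
    row = [location]
    for readings in sensor_streams:  # one (timestamp, value) list per feature
        row.append(mode_in_range(readings, t_start, t_end))
    return row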
Second of all, the values of the data are not in a unified frame at all. For some of the data, what is really useful is the number of the TINI board the sensor is attached to, since this tells us the possible place the person is at. For example, the switch sensor on board 68 sending a value of 2, indicating that the kitchen light was switched on, suggests the person was in the kitchen at that time. So for 4 of the 8 features I am going to use, the value of the data is the number of the TINI board. For the other 4 features, the meaning of the value is described below.
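For the board-number features, the mapping from TINI board to location can be kept as a simple lookup table. The two entries below are the ones mentioned in this report; the rest would be filled in from the PlaceLab sensor layout:

# Board 68 (kitchen switch) and board 56 (hallway light) are taken
# from the examples in this report; the remaining boards would come
# from the PlaceLab documentation.
BOARD_LOCATION = {
    68: "Kitchen",
    56: "Hallway",
}

def board_to_location(board):
    return BOARD_LOCATION.get(board, "Unknown")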
Last but not least, some features are obviously useless, so I discarded them without doing any computation on them. For example, looking through the pressure data one can see that the pressure did not change at all throughout the whole time; and since there is only one pressure sensor, it is not going to give any useful information about where the person is. Some other features were discarded for similar reasons.
The features in use and the knowledge learned about them are listed later. Note that this knowledge can be used to predict roughly what the result should look like and to validate whether the result looks right.
3> Implementation achieved so far.
I have written the code for computing the likelihood, weight, and working response. I am still debugging the messages sent after running BP, so the results will have to wait until I finish that. A brief description of the algorithm is given later.
Description of feature data and their characteristics:
1. Wired switch
Detects on/off and open/closed events, such as doors being opened/closed and knobs being turned using switches built into the infrastructure.
1WireSwitch.dat
Pre-processing shows this feature won't be very accurate at predicting where the person is. At any given time many lights can be on even when the person is not in the corresponding room, and that is the main source of error in this data. A lot of the data values are 56, which indicates the hallway: the hallway light was on for a long time, but that does not mean the person was in the hallway.
2. Wired humidity
Measures relative humidity in various locations using a wired sensor.
1WireHumidity.dat
The humidity data is not very informative either. The pre-processed data show more humidity change in the kitchen and bathroom, which could be the result of human activity there and thus could mean the person is in one of those rooms. There is not much humidity change elsewhere in the PlaceLab, so a person at another location might be categorized wrongly.
3. Wired temperature
Measures ambient temperature at floor and ceiling around the apartment.
1WireTemperature.dat
The pre-processed data actually show noticeable temperature changes when the person is in the kitchen, bathroom, dining room, bedroom, and sometimes other places. This might be due to cooking, eating (the temperature of food), taking a bath, and so on.
4. Wired light
Measures degree of illumination in several areas (especially those without cameras) in order to detect if lights are on.
1WireIllumination.dat
This feature is not very informative, for the same kind of reason as the wired switch: a light being on doesn't necessarily mean the person is there. Also, looking at the data one can tell that a sensor in a nearby place may give out more data than the place the person really is at. For example, when the person is in the dining area the data suggest the kitchen, and when the person is in the kitchen the data may suggest the hallway. It might work better to take the nearby sensors into consideration, defining each location by an overlapping rather than mutually independent set of boards (a small sketch of this idea appears after this feature list); this should give a better prediction. Another thing to note is that, because some lights (bathroom, hallway) were on for a long time during the 4 hours, data from the other locations (office, powder room, kitchen, dining area) are quite rare, which makes predicting those places very unlikely.
5. Wired gas
Measures amount of gas flow of the hot water heater and the stovetop burners.
1WireGasFlow.dat
This data only provides information that helps predict whether the person is in the kitchen, and even then it is not totally accurate, because the gas being on does not mean the person is in the kitchen the whole time. For example, there are times when the person is cooking and the gas is on for a long time, but the person walks out to the hallway and back several times; this data cannot capture that. (The value 5 indicates large gas flow in the kitchen; no data means very small or no gas flow.)
6. Wired water
Measures amount of water flow for hot or cold faucets and toilets.
1WireWaterFlow.dat
This data provides information only on whether the person is in the kitchen or the bathroom. The data are pretty sparse and, for the same reason as the gas flow, not very informative or accurate. (A value of 6 indicates a large amount of water flowing in the kitchen, 1 indicates water flowing in the bathroom, and no data means a very small water flow, which might just be dripping.)
7. Wireless static
Measures movement of the object to which it is attached and wirelessly sends data to receivers scattered around the apartment.
MITesStatic.dat
This data is very informative and accurate: cross-checking with the location of the board sending the signal, it has an almost 100% accuracy rate, except for the hallway (there is no board in the hallway and no item there with a sensor attached to it). This is because almost any movement of an item means the person is doing something with it, which in turn means the person is there. In all, it is a very "strong" feature, which should show after the training.
8. Areas of visual motion from cameras
Indicates the amount of motion detected in all 18 camera views. Can be useful for determining the location of a PlaceLab resident.
VideoDifferencer.dat
This is also a very informative and accurate feature. The data are recorded constantly (4 data points every second), and a large value indicates a large change between frames during that time, which is very useful for detecting human activity at that location, since the person is the only thing that moves by itself in the lab. It is almost 100% accurate everywhere except the bathroom and powder room, because for privacy reasons there are no cameras in these two places. In all, this is a very "strong" feature, which should show after the training.
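As a small sketch of the overlapping-board idea mentioned under the wired light feature: each location is defined by a set of nearby boards, and the sets are allowed to overlap so that one board can vote for several locations. Apart from boards 68 (kitchen) and 56 (hallway), the numbers here are purely illustrative:

# Overlapping board sets: a board near the boundary of two areas
# appears in both sets. Board numbers other than 68 and 56 are
# illustrative placeholders, not the real PlaceLab layout.
LOCATION_BOARDS = {
    "Kitchen": {68, 70},
    "Dining Room": {70},
    "Hallway": {56},
}

def candidate_locations(board):
    # All locations whose board set contains this board.
    return {loc for loc, boards in LOCATION_BOARDS.items() if board in boards}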
Below is a snapshot of the processed and usable data derived from the raw data. The first column holds the labels (where the person actually is at a certain time); columns 2 to 9 are the 8 features described above. The numbers are either TINI board numbers or have the meanings described above (e.g., kitchen or bathroom). The video difference and MITes static features give values that should turn out to be accurate for certain classes.
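Assuming the formatted snapshot is saved as whitespace-separated numeric columns (the file name below is hypothetical), it can be loaded and split like this:

import numpy as np

data = np.loadtxt("formatted_data.dat")  # hypothetical file name
labels = data[:, 0].astype(int)  # column 1: true location label
features = data[:, 1:9]          # columns 2 to 9: the 8 features
# 226 training instances are expected, as in the algorithm below.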
Brief description of the algorithm:
inputs: structure of the CRF and training data (xi, yi), with yi in {1, 2, ..., 8}
outputs: learned F
for m = 1, 2, ..., 8 do
  Run BP using F to get virtual evidences ve(xi, n(yi));
  for k = 1, 2, ..., 8 do
    for i = 1, 2, ..., 226 do
      compute likelihood p(yi | ve(xi, n(yi)));
      compute weight wki = p(yi | ve(xi, n(yi))) * (1 - p(yi | ve(xi, n(yi))));
      compute working response zki;
    end
  end
  Obtain "best" fm;
  Update F = F + fm;
end
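The weight and working response in the inner loop follow the LogitBoost-style update used by virtual evidence boosting. Below is a sketch of that step, with p standing for the per-class likelihoods p(yi | ve(xi, n(yi))) produced by BP; the array shapes and the function name are my own assumptions:

import numpy as np

EPS = 1e-10  # guard against division by zero when p is 0 or 1

def weights_and_responses(p, y):
    # p: (N, K) array, p[i, k] = p(yi = k | ve(xi, n(yi))) from BP.
    # y: (N,) array of true labels in {0, ..., K-1}.
    # Returns the (N, K) weight and working-response arrays.
    N = p.shape[0]
    y_star = np.zeros_like(p)   # one-vs-rest indicator of yi
    y_star[np.arange(N), y] = 1.0
    w = p * (1.0 - p)           # wki = p (1 - p)
    z = (y_star - p) / np.maximum(w, EPS)  # working response zki
    return w, z

Each boosting iteration would then obtain the "best" weak learner fm by weighted least-squares fitting of z, as in LogitBoost, and add it to F.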
Summary and Comments:
My project is going roughly on time, although I underestimated the data preprocessing workload. I should be able to meet the goal set for the end of this week, which is to almost finish the implementation and get results. I have already done most of the analysis and interpretation of the data, which leaves me more time to polish the results and finish up the project.
Rong Yang May 12th 2009