CS 34 - Final Write-up
Ray Di Ciaccio
Yoon-ki Park
Classifying Inrush Currents for Device Identification

Introduction
Raw Data
Sources of Data
Problems with Data
Pre-processing
Algorithms
Peak Extraction
Steady-state sampling
LMNN
Extracted Features
Results
References


Introduction:

Current inrush is a phenomenon that occurs when certain devices are switched on. A spike in current occurs at switch-on because components in these devices (e.g. heating elements, transformers, motors) have very low impedance until they reach their normal operating conditions. An incandescent light bulb, for example, has very low impedance when its tungsten element is cold; once the bulb is switched on, the element heats up in a matter of milliseconds and the impedance increases dramatically. The way in which the impedance settles to its steady state is unique to the light, transformer, or motor, and can provide a fingerprint for identifying the device.

In our project, the goal was to classify different inrush current decay curves in order to determine the type of device. We approached the problem experimentally, looking at different aspects of the raw data (the current curves) until we were able to identify some useful features. We obtained the decay curves by extracting the peaks of the current curves (see the section on "Peak Extraction"). Noting that these features alone were inadequate to effectively classify the 173 test examples across our seven classes of electrical devices, we added a new set of features (the steady-state samples) that allowed us to capture the shapes of the steady-state current curves. We then used large margin nearest neighbor (LMNN) to classify the devices based on a learned distance metric, as opposed to a simple Euclidean distance metric. In the end, with these additions to our methodology, we were able to accomplish our goals with a high degree of accuracy.



Raw Data:

Sources of Data

We recorded data from 7 different types of devices:

These are single examples of the absolute values of the current curves for each device type:



Problems with Data

The main difficulty arises from the effect that AC (alternating current) has on the data we collect from the oscilloscope. The inrush peak for a device like a light bulb is proportional to the instantaneous current available when it is switched on. With alternating current, the magnitude of the current available from the socket is sinusoidal with a frequency of 60 Hz. Therefore, 120 times a second there is a moment of peak current and a moment of zero current, each 90 degrees out of phase with the other. When the device is turned on at a peak, the inrush current is greatest; when it is turned on at a moment of zero current, or "zero crossing," the inrush is barely measurable.

This has two consequences: (1) the amount of time between the initial inrush peak and the second peak varies (for an example, see the image above), and (2) the initial peaks can have varying magnitudes (see the same image). We felt that both of these effects would make extracting meaningful features via regression problematic. Instead we devised a two-fold feature extraction scheme, which is outlined in the "Peak Extraction" and "Steady-state sampling" sections below.

Another problem with the data that affected our feature extraction algorithm was noise. Here is a section of a curve showing the amount of noise:

This forced us to do some pre-processing of the data that would help alleviate this issue.


Pre-processing

First, to remove some of the noise from the raw data, we employed the MATLAB smoothing function smooth(V, 15, 'lowess'), where V is the vector containing the current values for a given example curve. This function performs local regression using weighted linear least squares and a 1st-degree polynomial model over a span of 15 points. Here we see how well it smoothed the noisy data shown in the image above (the red line is the result of the smoothing function):

After smoothing, we took the absolute value of the current values in order to make the process of finding peaks easier -- instead of searching for local maxima and local minima, we only need to search for local maxima.
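A minimal sketch of this pre-processing in MATLAB (the variable names are ours; smooth comes from the Curve Fitting Toolbox):

    % V: raw current samples for one example curve (column vector)
    % Local regression smoothing: weighted linear least squares with a
    % 1st-degree polynomial model and a 15-point span ('lowess').
    Vsmooth = smooth(V, 15, 'lowess');

    % Rectify the signal so that every half-cycle produces a local
    % maximum; we then only need to search for maxima, not minima.
    Vabs = abs(Vsmooth);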


Algorithms:

Peak extraction

Less Successful Method
For each example curve:

  1. We begin with some preprocessing during which we find the global maximum of the example curve; we remove all data from before this point because this is the inrush current peak. We then take the absolute value of all current values (because AC gives us both positive and negative currents).
  2. Our algorithm for feature extraction calculates the peaks of the current signal as follows:
    1. We examine the curve incrementally in 3-millisecond segments (this is our 'step size'). On either edge of the current segment there is a 1-millisecond-wide 'gray area,' or boundary. The peaks we look for must be at least 0.0065 seconds apart: since the frequency of the current is 60 Hz, the current has a period of 1/60 second, and because we are looking at the absolute value of the current, we get two peaks every period (i.e. the effective period is 1/120 second, or approximately 0.0083 seconds). To allow for some fluctuation in the data, we require a minimum time difference of only 0.0065 seconds. This constraint is an attempt to ensure we do not pick out false peaks (smaller peaks due to noise that occur too soon); we know by the regularity of the current that the true peaks should occur only after this amount of time has passed.
    2. We attempt to find the local maximum within the current segment; noise, however, becomes a problem. We only accept a local maximum when the maximum within the segment appears somewhere in the middle of the segment (i.e. not at either of the edge points). Unfortunately, within a segment there might be an apparent local maximum that is due only to noise (see the image below). This is where the 'gray areas' come into play: if the apparent local maximum occurs within a gray area, we wait to verify it when we move to the next segment, whose left edge is placed at the left edge of that gray area. If the local maximum of the new segment does not appear in that gray area, then we know that the candidate was only noise and that the true local maximum is elsewhere. (A rough sketch of this segmented search appears after this list.)

    3. We keep track of all the local maxima that we verify and use these as features for classification.
  3. Unfortunately, due to the variable position of the initial peak (as mentioned above), this method does not always pick out all the peaks. The time-interval constraint described above is thrown off when the initial peak occurs too close to the second peak; subsequent peaks are then missed because they do not appear to satisfy the constraint. The improved method described below takes care of this problem.
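Below is a rough MATLAB sketch of the forward segment scan described in step 2. The variable names, the exact edge handling, and the assumption of densely and uniformly sampled time stamps t are ours; it illustrates the idea rather than reproducing our exact code.

    % Vabs: smoothed, rectified current, cropped to start at the inrush peak
    % t:    matching time stamps in seconds
    stepSize   = 0.003;    % 3 ms segment width
    grayWidth  = 0.001;    % 1 ms 'gray area' at the segment edges
    minPeakGap = 0.0065;   % minimum spacing between accepted peaks

    peakTimes = [];
    peakVals  = [];
    segStart  = t(1);

    while segStart + stepSize <= t(end)
        segEnd = segStart + stepSize;
        inSeg  = find(t >= segStart & t < segEnd);
        if isempty(inSeg)
            segStart = segEnd;
            continue;
        end
        [~, iRel] = max(Vabs(inSeg));
        iMax = inSeg(iRel);

        if t(iMax) >= segEnd - grayWidth
            % Candidate maximum lies in the right gray area: defer judgement.
            % The next segment starts at the left edge of the gray area; if its
            % maximum stays inside that gray area, the candidate is a real local
            % maximum, otherwise it was only noise.
            segStart = segEnd - grayWidth;
        else
            farEnough = isempty(peakTimes) || t(iMax) - peakTimes(end) >= minPeakGap;
            interior  = iMax > inSeg(1);   % not sitting on the segment's left edge
            if interior && farEnough
                peakTimes(end+1) = t(iMax);
                peakVals(end+1)  = Vabs(iMax);
            end
            segStart = segEnd;             % move on to the next 3 ms segment
        end
    end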

More Successful Method
This method behaves exactly like the method above, except that we search backwards in time, starting from a point where the peaks are consistently spaced.

This method performed significantly better than our first attempt. For some example curves the first method would only extract 3 peaks, whereas this method never extracted fewer than 8.
Here is an example of the results of our algorithm (the raw data is green and the extracted peaks are blue):


Steady-state sampling

This algorithm builds on the peak data extracted by the algorithm above. Here is an example of the data extracted by this process (as above, the raw data is green and the extracted peaks are blue):
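The sampling rule itself is not spelled out above; as a very rough sketch, under our own assumption that the 10 steady-state features are simply the last peak magnitudes verified by the peak-extraction step (i.e. peaks measured after the inrush transient has settled), one could take:

    % Assumption (ours): the steady-state samples are the last nSS peak
    % magnitudes found by the peak-extraction step, i.e. peaks measured
    % well after the inrush decay has settled.
    nSS = 10;
    ssSamples = peakVals(end-nSS+1:end);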


LMNN

Large margin nearest neighbor (LMNN) learns a Mahalanobis distance metric for kNN. The goal is a metric under which the "k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin" [1].

For our project, we used the implementation of LMNN written by Kilian Q. Weinberger (2006), available at http://www.weinbergerweb.net/Downloads/LMNN.html.
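To make concrete what a learned metric changes, here is a generic MATLAB sketch of kNN classification under a Mahalanobis metric parameterized by a linear transform L, so that d(x, y) = ||L(x - y)||^2. The function and variable names are ours and do not reflect the interface of the downloaded toolbox.

    function label = knnWithMetric(L, Xtrain, ytrain, x, k)
    % L:      d-by-d linear transform defining the metric d(x,y) = ||L*(x-y)||^2
    % Xtrain: d-by-n matrix of training feature vectors (one column per example)
    % ytrain: 1-by-n vector of numeric class labels
    % x:      d-by-1 feature vector to classify
    % k:      number of neighbours (we used k = 3)
        diffs = L * (Xtrain - repmat(x, 1, size(Xtrain, 2)));
        dists = sum(diffs.^2, 1);            % squared distances under the metric
        [~, order] = sort(dists, 'ascend');
        label = mode(ytrain(order(1:k)));    % majority vote among the k nearest
    end

Euclidean kNN is recovered by setting L to the identity matrix; LMNN instead chooses L so that each training example's k nearest neighbours under the metric share its label while differently labelled examples are pushed at least a margin further away [1].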


Extracted Features

We extracted three groups of features from the raw data: 8 peak features, 10 steady-state sample features, and 1 time feature. We used these 19 features to classify the devices with LMNN.
Before training our model, we scaled all of the peak values by dividing them by the value of the initial peak. This was intended to allow us to classify devices based on the shapes of their current curves, rather than on the amplitude, which differs from device to device.
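As a small sketch of this normalization (the variable names are ours, with peakVals(1) holding the initial inrush peak):

    % Divide every peak magnitude by the initial inrush peak so the
    % classifier sees the shape of the decay rather than its amplitude.
    scaledPeaks = peakVals / peakVals(1);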

Results

We ran a 'rough ablative analysis' to see the effects of our choice of features. It was not a true ablative analysis because we did not remove each feature one by one; instead, we looked at our classification results after removing one of the sets of features described in the section above. For these tests, k = 3.

We began with the normal case, including all 19 features (peak features, time feature, and steady-state sample features).

Here we immediately see the impact of the improved distance metric learned through LMNN. We had a test error of 5.2% with kNN using Euclidean distance, but when we learned an optimal distance metric with LMNN, we ended up with a test error of only 0.58%. Furthermore, it took only 11.33 seconds to train this model, giving us both accuracy and speed.

Now we will look at what happens when we remove the 8 peak features -- we're classifying with 11 features: the 10 steady-state sample features and the 1 time feature.

This gives us some strange results in the case of normal kNN: the training error is actually higher than the test error. We get a good test error here for normal kNN with Euclidean distance (around 1.5%); it actually beats the LMNN test error, which is quite high compared to the previous test. Another thing to note is that training this time took almost 10 times as long as in the previous example, at just under 104 seconds. One might expect that reducing the number of features would reduce the training time, but with some of the features withheld LMNN has to work much longer to find an optimal distance metric. This certainly shows the importance of the peak features as a group.

Next, we will explore the effect of withholding the steady-state sample set. Now we are using 9 features to classify: the 8 peak features and the 1 time feature.

Here LMNN reduces the error slightly relative to Euclidean kNN. However, both errors are very high -- over 16% for Euclidean kNN and around 15% for LMNN. This very clearly identifies the steady-state sample features as the most important set of features for our classification system: for LMNN, the test error drops by around 14.5 percentage points when we add the steady-state sample features back into the full set. The training time here was 34.82 seconds, which is still significantly higher than the training time with the full set of features (11.33 seconds). This again tells us that the features used here are insufficient, so LMNN takes longer trying to find an optimal metric, because an optimal metric is harder to come by with this set of features.

Removing the time feature from the set had no effect whatsoever on the training or test error for either Euclidean kNN or LMNN, so we could safely remove this feature without any repercussions.

Lastly we will look at all of the LMNN errors together:

It becomes quite clear that our full set of features (minus the time feature) is the best set to use for classification with LMNN. As we remove features (moving right along the chart), we see that our model underfits: the reduced set of features is inadequate for describing the complexity of the problem. On the other hand, the full set of features combined with the distance metric learned through LMNN allows for a high degree of accuracy while maintaining an impressive training speed.



References

[1] K. Weinberger, J. Blitzer, and L. Saul. Distance metric learning for large margin nearest neighbor classification. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1473-1480, Cambridge, MA, 2006. MIT Press.
[2] L. Yang. An Overview of Distance Metric Learning. http://www.cs.cmu.edu/~liuy/distlearn.htm.
[3] M. H. J. Bollen and I. Y.-H. Gu. Signal Processing of Power Quality Disturbances, Chapter 9. IEEE Press Series on Power Engineering. Piscataway, NJ: IEEE Press, 2006.