Algorithms

Parsing a Time String with JavaScript

Wed, 31 Jul 2013 14:56:00 GMT

Let’s say you are building a UI where you’d like a user to enter a time into a text box. This time value they might enter needs to handle a lot of different formats, but we’d like it to end up as the same value either way:

1pm
1:00pm
1:00p
13:00

Here’s a little JavaScript function to do this work. Most of the interesting bits are the regular expression which helps handle of these various scenarios. It takes a string representation of a time and attempts to parse it, setting the time on the specified date object sent to the function as the second parameter. In the case where you do not provide a second parameter, the current date will be used.

function parseTime(timeStr, dt) {
    if (!dt) {
        dt = new Date();
    }

    var time = timeStr.match(/(\d+)(?::(\d\d))?\s*(p?)/i);
    if (!time) {
        return NaN;
    }
    var hours = parseInt(time[1], 10);
    if (hours == 12 && !time[3]) {
        hours = 0;
    }
    else {
        hours += (hours < 12 && time[3]) ? 12 : 0;
    }

    dt.setHours(hours);
    dt.setMinutes(parseInt(time[2], 10) || 0);
    dt.setSeconds(0, 0);
    return dt;
}

This function will return NaN if it can’t parse the input at all. The logic immediately following the match() call is to handle the noon/midnight case correctly.

Here’s a jsFiddle of this in action:

Fun with Statistics Calculations

Fri, 26 Jul 2013 17:47:00 GMT

A while back I was working on a system where we were going to score work items to measure risk of auditing. Higher numbers would most likely result in an audit while lower numbers would pass. The exact mechanism of measuring the risk is immaterial for this post, so we’ll treat it as a black box number. Furthermore, we calculate the risk on all work items but only update our statistics (as described below) on work items that actually did get audited.

I wanted to know whether the audit score for a particular work item was very far away from the mean or very under the mean. If it was low, the audit risk should be low and vice versa. What we are looking for here is a “sigma” level – or a number that indicates how far away from the mean something is. If something has a sigma level of zero, it means it is equal to the mean. If it has a sigma level of 1, it means it is 1 standard deviation above the mean. -1 means that it is one standard deviation below the mean. Lower levels of sigma are generally better than higher ones in this system. In normalized data, we’d expect over two-thirds of the work items to score within +/- 1 sigma. A sigma number of 6 or higher means that the score would be a very large outlier.

To calculate this sigma value, we need two primary pieces of data – the mean and the standard deviation of the population or sample (i.e. the audit risk scores). I did not want to calculate these values over the entire set of data each time I wanted to compute the sigma level – I just wanted to add it to the previous mean and standard deviation to make calculations really fast.

Let’s start with the mean. If we save the number of data points used (n) and the previous mean calculated (ca), we can derive the new mean given a new data point (x) with the following formula:

new_mean = (x + n * ca) / (n + 1)

Or in C#:

public static double CalculateNewCumulativeAverage(int n, int x, double ca)
{
    return (x + n * ca) / (n + 1);
}

The standard deviation calculation is a little harder. The Wikipedia article at http://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods describes a method for rapid calculation that requires you to only provide the following variables to compute a new standard deviation given a new data point (x): n – the number of previous data points used, s1 – sum of all previous x’s, and s2 – sum of all previous x^2 (squared). Here’s the formula in C#:

public static double CalculateNewStandardDeviation(int n, int x, int s1, long s2)
{
    if (n == 0)
        return double.NaN;
    s1 += x;
    s2 += x * x;
    double num = (n + 1) * s2 - (s1 * s1);
    double denom = (n + 1) * n;
    return Math.Sqrt(num / denom);
}

This will be a very fast way of calculating standard deviation because you simply don’t have to go over all data points (which also means not reading values out of a database).

The sigma value I talked about earlier can then be calculated given the data point (x), cumulative mean (ca) and standard deviation (s):

public static double CalculateSigma(int x, double ca, double s)
{
    return (x - ca) / s;
}

So all you will need to store in your database is the following scalar values to calculate these stats:

Total number of data points (n).
Sum of all data points (s1).
Sum of the squares of all data points (s2).
Cumulative average or mean (ca).
Current standard deviation (s).

To add a new data point (x) and update all variables to new values:

public static void AddDataPoint(int x, ref int n, ref int s1, ref int s2, ref double ca, ref double s)
{
    ca = CalculateNewCumulativeAverage(n, x, ca);
    s = CalculateNewStandardDeviation(n, x, s1, s2);
    n += 1;
    s1 += x;
    s2 += x * x;
}

Distributing Monetary Amounts in C#

Tue, 16 Jul 2013 17:42:00 GMT

Many times in financial applications, you will be tasked with distributing or allocating monetary amounts across a set of ratios (say, a 50/50 split or maybe a 30/70 split). This can be tricky getting the rounding right so that no pennies are lost in the allocation.

Here’s an example: split $0.05 in a 30/70 ratio. The 30% amount becomes $0.015 and the 70% will be $0.035. Now in this case, they both add up to the original amount (which is an absolute requirement) but they must have a half penny each to accomplish this. We can’t have half pennies in this situation, so something has to be done with the extra penny.

Now the specifics on where you allocate the extra penny are up to your business requirements, so the solution I present below may not be exactly what you need. It should, however, give you an idea on how to do this split without losing the extra penny. Here’s a static method that will take a single monetary amount (as a C# decimal) and an array of ratios. These ratios are your allocation percentages. They could be the set [0.30, 0.70] or even [30, 70]. They don’t even need to sum to 1 or 100, it doesn’t matter:

public static decimal[] DistributeAmount(decimal amount, decimal[] ratios)
{
    decimal total = 0;
    decimal[] results = new decimal[ratios.Length];

    // find the total of the ratios
    for (int index = 0; index < ratios.Length; index++)
        total += ratios[index];

    // convert amount to a fixed point value (no mantissa portion)
    amount = amount * 100;
    decimal remainder = amount;
    for (int index = 0; index < results.Length; index++)
    {
        results[index] = Math.Floor(amount * ratios[index] / total);
        remainder -= results[index];
    }

    // allocate remainder across all amounts
    for (int index = 0; index < remainder; index++)
        results[index]++;

    // convert back to decimal portion
    for (int index = 0; index < results.Length; index++)
        results[index] = results[index] / 100;

    return results;
}

(Another thing here is that I am assuming pennies and dollars, or at least a currency system that divides their unit of currency into hundredths – notice the multiply by 100. You can change that for universal currency systems to support any currency).

This code works by converting amounts to a fixed point value with no mantissa. So in our example, $0.05 turns into 5. We then iterate over each ratio and compute the amount to distribute using a simple division. The trick here is the Math.Floor(). We round all half pennies down. The half-penny will stay in the remainder variable.

At the end of the distribution, we then distribute all of the fractional pennies that have built up evenly across the distributed amounts. If there are no remaining pennies to distribute, it simply ends. So in this implementation, the first ratios in the set tend to get the extra pennies and the last one loses out. You can change this behavior to be almost anything you like (such as random, even-odd, or something else).

At the very end, we convert back to a decimal portion by dividing each amount by 100.

The final results for $0.05 would be { 0.02, 0.03 }, which adds up to 0.05.