# Fun with Statistics Calculations

A while back I was working on a system where we were going to score work items to measure the risk of being audited.  Higher scores would most likely result in an audit while lower scores would pass.  The exact mechanism of measuring the risk is immaterial for this post, so we’ll treat it as a black-box number.  Furthermore, we calculate the risk on all work items but only update our statistics (as described below) on work items that actually did get audited.

I wanted to know whether the audit score for a particular work item was far above the mean or far below it.  If it was low, the audit risk should be low and vice versa.  What we are looking for here is a “sigma” level – a number that indicates how far away from the mean something is.  If something has a sigma level of zero, it is equal to the mean.  A sigma level of 1 means it is 1 standard deviation above the mean; -1 means it is one standard deviation below the mean.  Lower sigma levels are generally better than higher ones in this system.  In normally distributed data, we’d expect over two-thirds of the work items to score within +/- 1 sigma.  A sigma level of 6 or higher means the score is a very large outlier.

To calculate this sigma value, we need two primary pieces of data – the mean and the standard deviation of the population or sample (i.e. the audit risk scores).  I did not want to recalculate these values over the entire set of data each time I computed a sigma level – I just wanted to fold each new data point into the previous mean and standard deviation to make the calculations really fast.

Let’s start with the mean.  If we save the number of data points used (n) and the previous mean calculated (ca), we can derive the new mean given a new data point (x) with the following formula:

new_mean = (x + n * ca) / (n + 1)

Or in C#:

```
public static double CalculateNewCumulativeAverage(int n, int x, double ca)
{
    return (x + n * ca) / (n + 1);
}
```

The standard deviation calculation is a little harder.  The Wikipedia article at http://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods describes a rapid calculation method that requires only the following variables to compute a new standard deviation given a new data point (x): n – the number of previous data points, s1 – the sum of all previous x’s, and s2 – the sum of all previous x’s squared.  Here’s the formula in C#:

```
public static double CalculateNewStandardDeviation(int n, int x, int s1, long s2)
{
    if (n == 0)
        return double.NaN;
    s1 += x;
    s2 += (long)x * x;   // cast to avoid int overflow in the multiply
    double num = (n + 1) * (double)s2 - (double)s1 * s1;   // compute in double to avoid overflowing s1 * s1
    double denom = (double)(n + 1) * n;
    return Math.Sqrt(num / denom);
}
```

This will be a very fast way of calculating standard deviation because you simply don’t have to go over all data points (which also means not reading values out of a database).

The sigma value I talked about earlier can then be calculated given the data point (x), cumulative mean (ca) and standard deviation (s):

```
public static double CalculateSigma(int x, double ca, double s)
{
    return (x - ca) / s;
}
```

So all you need to store in your database are the following scalar values:

1. Total number of data points (n).
2. Sum of all data points (s1).
3. Sum of the squares of all data points (s2).
4. Cumulative average or mean (ca).
5. Current standard deviation (s).

To add a new data point (x) and update all variables to new values:

```
public static void AddDataPoint(int x, ref int n, ref int s1, ref int s2, ref double ca, ref double s)
{
    // pass the *previous* n, s1 and s2 -- both calculations fold the new point in themselves
    ca = CalculateNewCumulativeAverage(n, x, ca);
    s = CalculateNewStandardDeviation(n, x, s1, s2);
    n += 1;
    s1 += x;
    s2 += x * x;   // consider a long for s1 and s2 if your data set is large
}
```
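Here’s a quick walk-through of the whole flow (a sketch – it assumes the methods above are all in scope, and the sample scores are made up):

```
int n = 0, s1 = 0, s2 = 0;
double ca = 0.0, s = 0.0;

// feed in five audit risk scores one at a time
foreach (int x in new[] { 10, 12, 8, 11, 9 })
    AddDataPoint(x, ref n, ref s1, ref s2, ref ca, ref s);

// ca is now 10.0 and s is sqrt(2.5), about 1.58
// a new score of 16 sits roughly 3.8 sigma above the mean
double sigma = CalculateSigma(16, ca, s);
```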

Categories: C#

Tags: Algorithms

# Using C# Implicit Type Conversion

I was recently required to connect to a SOAP-based web service that had a very nasty WSDL.  It contained tons of complex type definitions for objects that really didn’t need to exist (strings and decimals and integers would have been sufficient).  For example, here’s a snippet of a complex type defined in the WSDL that I needed to consume:

```
public partial class TOTAL_AMT_TypeShape : object, INotifyPropertyChanged
{
    private decimal valueField;

    public decimal Value
    {
        get { return this.valueField; }
        set
        {
            this.valueField = value;
            this.RaisePropertyChanged("Value");
        }
    }

    // snip...
}
```

This class is created by Visual Studio and is the proxy for the TOTAL_AMT_TypeShape object.  As you can probably surmise, this is simply a complex type wrapping a number (a C# decimal to be exact).  The name is awful, and the whole premise of requiring a complex type for a simple number amount (a dollar amount in this case) makes the use of this type really awkward:

```
decimal amount = 100000.0m;
TOTAL_AMT_TypeShape totalAmt = new TOTAL_AMT_TypeShape() { Value = amount };
```

Now imagine this, times 100.  It can get really ugly, really fast.

My solution was to rely on partial classes and implicit type conversion.  Implicit type conversion is a C# feature that allows you to convert between data types automatically by informing the compiler how to perform the conversion.  The conversion happens automatically and should only be used where it cannot result in data loss or throw an exception (if either of those scenarios exists, you should use an explicit cast conversion instead).  An example of an implicit conversion built into C# is the int to long conversion.  Since an int always fits inside a long, we can assign an int value to a long variable without any special effort.  The opposite is not true, however, and we’d need to cast explicitly.
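That built-in widening conversion looks like this:

```
int i = 42;
long l = i;        // implicit: an int always fits in a long
int j = (int)l;    // the reverse requires an explicit cast
```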

Here’s my partial class with the implicit conversion operator added:

```
public partial class TOTAL_AMT_TypeShape
{
    public static implicit operator TOTAL_AMT_TypeShape(decimal value)
    {
        return new TOTAL_AMT_TypeShape() { Value = value };
    }
}
```

The implicit conversion operator overload is defined for converting a decimal to a TOTAL_AMT_TypeShape (the target type is always the name of the operator method).  We could also go the other way (convert a TOTAL_AMT_TypeShape into a decimal), but I didn’t need to in my case.  And because C# allows partial class definitions, if our proxy object definition changes because of a WSDL refresh, our partial stays intact and the code for the implicit conversion won’t be overwritten.
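For completeness, here’s what that reverse conversion would look like (a sketch – I didn’t actually need this one):

```
public partial class TOTAL_AMT_TypeShape
{
    // converts the proxy type back into a plain decimal
    public static implicit operator decimal(TOTAL_AMT_TypeShape amt)
    {
        return amt.Value;
    }
}
```

With both operators defined, the proxy type flows in and out of decimal expressions transparently (just be aware that converting a null reference this way will throw).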

Here’s how we’d use it now:

```
TOTAL_AMT_TypeShape totalAmt = 100000.0m;
```

Nice and neat.

Categories: C#

# Distributing Monetary Amounts in C#

Many times in financial applications, you will be tasked with distributing or allocating monetary amounts across a set of ratios (say, a 50/50 split or maybe a 30/70 split).  It can be tricky to get the rounding right so that no pennies are lost in the allocation.

Here’s an example: split $0.05 in a 30/70 ratio.  The 30% amount becomes $0.015 and the 70% amount becomes $0.035.  In this case they both add up to the original amount (which is an absolute requirement), but each needs a half penny to do so.  We can’t have half pennies, so something has to be done with the extra penny.

Now the specifics of where you allocate the extra penny are up to your business requirements, so the solution I present below may not be exactly what you need.  It should, however, give you an idea of how to do this split without losing the extra penny.  Here’s a static method that takes a single monetary amount (as a C# decimal) and an array of ratios.  These ratios are your allocation percentages.  They could be the set [0.30, 0.70] or even [30, 70].  They don’t even need to sum to 1 or 100; it doesn’t matter:

```
public static decimal[] DistributeAmount(decimal amount, decimal[] ratios)
{
    decimal total = 0;
    decimal[] results = new decimal[ratios.Length];

    // find the total of the ratios
    for (int index = 0; index < ratios.Length; index++)
        total += ratios[index];

    // convert the amount to whole pennies (no fractional portion)
    amount = amount * 100;
    decimal remainder = amount;
    for (int index = 0; index < results.Length; index++)
    {
        results[index] = Math.Floor(amount * ratios[index] / total);
        remainder -= results[index];
    }

    // allocate the leftover pennies, one per amount, starting from the first
    for (int index = 0; index < remainder; index++)
        results[index]++;

    // convert back to a decimal amount
    for (int index = 0; index < results.Length; index++)
        results[index] = results[index] / 100;

    return results;
}
```

(Another thing here is that I am assuming pennies and dollars, or at least a currency system that divides its unit of currency into hundredths – notice the multiply by 100.  You can change that to support other currency systems.)

This code works by converting the amount to whole pennies with no fractional portion.  So in our example, $0.05 turns into 5.  We then iterate over each ratio and compute the amount to distribute using a simple division.  The trick here is the Math.Floor(): we round all fractional pennies down, so they stay in the remainder variable.

At the end of the distribution, we hand out the whole pennies that have built up in the remainder, one per amount, starting from the first.  If there are no remaining pennies to distribute, the loop simply ends.  So in this implementation, the first ratios in the set tend to get the extra pennies and the last ones lose out.  You can change this behavior to almost anything you like (such as random, even-odd, or something else).

At the very end, we convert back to a decimal portion by dividing each amount by 100.

The final results for $0.05 would be { 0.02, 0.03 }, which adds up to 0.05.
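As another quick sanity check, here’s the method splitting $1.00 three ways (assuming the DistributeAmount method above is in scope):

```
decimal[] thirds = DistributeAmount(1.00m, new decimal[] { 1, 1, 1 });
// thirds is { 0.34, 0.33, 0.33 } -- the leftover penny lands in the first slot
```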

Categories: C#

Tags: Algorithms

# Revisiting War of Words

Back in March 2010, I released a game called War of Words on the XBox 360 platform (Indie Games).  This game was a hybrid RPG/word game that used word spelling as the principal combat mechanism in encounters.  It was very similar to Puzzle Quest in spirit, although the core gameplay mechanisms were quite different.

I had a lot of fun working on that game (and a lot of frustrations too).  I had hoped it would do better in the marketplace than it did, but Microsoft did not promote Indie Games much, and I think the game also lacked some of the polish that would have made it feel as professional as an Arcade title.  For example, some graphics weren’t so great (as they were created by me) and the storyline was not very interesting (my fault again, as I am not a good writer either).  I do think the game was better than the average Indie game.  It is currently rated 3.5/5 stars on the XBox marketplace with 232 votes (hardly any Indie games score over 3 stars, and many Arcade and AAA titles struggle to get over 4 stars).

## Economics

This game cost me about $600 USD to make.  About $250 of it was for a few hours of an artist’s time to draw the majority of the graphics.  Another $200 was spent on audio/music licensing.  I also had a domain name and website (which has been taken down) which cost $100 for a year.  I bought a few miscellaneous things like video capture software to record footage for promotions.

As far as revenue, I can’t say exactly how much it made because the payment history is long gone (most of the profits came in the first 3 months after release, and Microsoft does not keep more than about 18 months of history).  I started out selling the game for 400 points ($5 USD) but later dropped it to 240 points ($3 USD).  I made about 70% of that amount per sale.  I do know that dropping the price increased the purchase-to-trial ratio to almost 25%, which is quite excellent.  I think before the price drop, the ratio was between 10-12%, which is pretty good too.

I also could not translate the game into multiple languages (the fact that it is an English word game makes this far more challenging than it would be for, say, a simple shooter).  I sold it in all XBox markets in the hope that English speakers there would play it.  Foreign sales were actually stronger than I expected.

## Sequel Plans

I originally was very optimistic about a sequel even with low sales numbers.  I knew that I could take all of the money I made and the code/content I had and make a better sequel that would probably have made more money and taken less time to create.  But I lost interest.

I basically quit Indie Games because I felt that the peer review concept was not optimal.  In fact, I was pretty much correct, as Microsoft has basically dumped the technology behind Indie Games (XNA) and focused instead on DirectX 11 for Windows 8 and whatever the XBox One has.  Their stance on Indie publishing on the XBox One makes it likely that Indie Games won’t even run on it, let alone Game Studio being ported to it.  You could tell something was up when guys like Shawn Hargreaves started to leave the team.

I have a lot of design documents and ideas in my head for a sequel (including turning it into a turn-based game), but it will never happen with XNA, which is a real bummer to be honest.  I liked the platform and Game Studio was really cool.

I toyed with putting it on Windows Phone 7 for a while and even had a very small prototype, but in the end it just didn’t feel right on a phone (without major changes to its design), and WP7 has very few users and thus a very small market.  I could try iOS, but that market is flooded, I don’t like Apple, and I’m not versed in any of their programming languages or platforms.

I think turning it into an HTML5 game would be interesting.  This type of game works better with touch or mouse input than with a controller, I think.  But I’m too busy doing real paying work and being with family to take on this kind of thing.

## More to Come

I’m planning on blogging about this game in more detail, especially with technical stuff like building directed acyclic word graphs (DAWGs), searching word graphs, building AIs that play games, character and RPG stats systems, etc.

Categories: C#, Games

Tags: War of Words

# Another Technique for Tree Traversals

In the previous post, I talked about doing a DFS on a rooted tree using the C# foreach loop/iterator concept.  In this post, I will discuss a different technique that is more function-based.

A common task on a tree is to traverse it in some defined order – pre-order, level-order, post-order, etc.  When we visit a node, we’d like to perform some action on it.  We can use delegates in C# (references to functions) and lambdas to make this really compact and elegant:

```
public static void VisitDfs(TreeNode node, Action<TreeNode> func)
{
    if (func != null)
        func(node);
    foreach (TreeNode child in node.Children)
    {
        VisitDfs(child, func);
    }
}
```

If you pass the root node to this method, the function you supply will be called on each node in depth-first order.  You can apply this concept to other traversal patterns too:

```
public static void VisitBfs(TreeNode node, Action<TreeNode> func)
{
    Queue<TreeNode> q = new Queue<TreeNode>();
    q.Enqueue(node);
    while (q.Count > 0)
    {
        node = q.Dequeue();
        if (func != null)
            func(node);
        foreach (TreeNode child in node.Children)
        {
            q.Enqueue(child);
        }
    }
}
```

This performs a breadth-first traversal (also called a ‘level order’ traversal with trees), calling the specified function at each node visit.
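For example, here’s how VisitBfs could be used to flatten a tree into a level-order list (a sketch, assuming the same TreeNode class with a Children collection):

```
public static List<TreeNode> ToLevelOrderList(TreeNode root)
{
    List<TreeNode> nodes = new List<TreeNode>();
    VisitBfs(root, node => nodes.Add(node));   // the lambda just collects each visited node
    return nodes;
}
```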

Here’s a sample usage of the DFS to count the number of leaves in a tree:

```
public static int CountLeaves(TreeNode root)
{
    int leafCount = 0;
    VisitDfs(root, node =>
    {
        if (node.Children.Count == 0)
        {
            leafCount++;
        }
    });

    return leafCount;
}
```

Categories: C#

# Performing a DFS over a Rooted Tree with a C# foreach Loop

I’ve been working with trees a lot lately, and one thing that occurs quite frequently in my problem space is performing a depth-first search (DFS), or a “pre-order traversal”.  In a DFS, you visit the root first and then search deeper into the tree, visiting each node followed by the node’s children.

Since this is so common, I’d like a routine that makes it trivial.  What I’d really like is to foreach over the tree and, with each iteration, pull out the next node that satisfies a DFS.  This is where the C# yield keyword comes into play.

The following snippet presents a TreeNode class which is just a generic node in a tree that has N children:

```
public class TreeNode
{
    public TreeNode()
    {
        Children = new List<TreeNode>();
    }

    public TreeNode Parent { get; set; }
    public List<TreeNode> Children { get; private set; }
    public bool IsLeaf { get { return Children.Count == 0; } }
    public bool IsRoot { get { return Parent == null; } }

    public IEnumerator<TreeNode> GetEnumerator()
    {
        yield return this;
        foreach (TreeNode child in Children)
        {
            IEnumerator<TreeNode> childEnumerator = child.GetEnumerator();
            while (childEnumerator.MoveNext())
                yield return childEnumerator.Current;
        }
    }
}
```

Most of the functionality is in the GetEnumerator() method.  When a class in C# has a method called GetEnumerator() that returns IEnumerator or IEnumerator<T>, you’ll be able to use that type in a foreach loop.  Also notice that the yield return statements return plain TreeNode values, not IEnumerator<TreeNode> – the C# compiler generates the enumerator state machine automatically.

In order to do a DFS, we first visit the node itself by returning it (this) in a yield statement.  When the caller asks for the next element, the GetEnumerator() method remembers the execution state of the method and picks up right where it left off.  At that point, we enter a loop over the Children collection.  We repeat the enumerator process over each child manually with a while loop that exits when no more elements remain, and in its body we yield return each descendant node.

To use this enumerator to perform a DFS, you’d foreach over the root node of your tree (I’m assuming a singly rooted-tree here):

```
foreach (TreeNode node in root)
{
    // perform action on node in DFS order here
}
```
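To see it in action, here’s a tiny four-node tree and a count of the nodes the iterator hands back (using the TreeNode class above):

```
TreeNode root = new TreeNode();
TreeNode left = new TreeNode();
TreeNode right = new TreeNode();
root.Children.Add(left);
root.Children.Add(right);
left.Children.Add(new TreeNode());

int count = 0;
foreach (TreeNode node in root)
    count++;
// count is 4; nodes come out in pre-order: root, left, left's child, right
```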

Categories: C#
