Building a Spreadsheet with Formulas in Javascript

May 26, 2017

Typescript/Javascript Spreadsheet and Formula - https://github.com/vogtb/spreadsheet

I started this project a couple of months ago with the idea of reverse engineering a spreadsheet in order to better understand them. I started by guessing my way through Google Spreadsheets, using trial and error to figure out the types, errors, and ways that data is stored an formatted. I could have looked all this information up, but I found that figuring it out yourself is more fun. With that as my starting point, I decided not just to blindly fumbly my way through the features of a spreadsheet, but design and build some of those features myself. With a few open-source projects (RuleJS, and FormulaJS) as my starting points, I was eventually able to implement a spreadsheet in Typescript/Javascript that has over 120 formulas, and matches many of the basic features of GS.

Here are some more complex examples of what the library can do.

var Sheet = require("js-spreadsheet").Sheet;
var sheet = new Sheet();
sheet.setCell("A1", 27.13);
sheet.setCell("A2", 44.48);
sheet.setCell("A3", 17.32);
sheet.setCell("A4", 46.27);
sheet.setCell("A5", 68.37);
sheet.setCell("A6", 90.59);
sheet.setCell("A7", 88.95);
sheet.setCell("A8", 55.30);
sheet.setCell("A9", 86.82);
sheet.setCell("A10", 38.68);
sheet.setCell("A11", 35.53);
sheet.setCell("A12", 12.45);
// average of column
sheet.setCell("C1", '=AVERAGE(A1:A12)');
sheet.getCell("C1"); // returns: 50.990833333333335
// average of of rows over 40
sheet.setCell("C2", '=AVERAGEIF(A1:A12, ">=40")');
sheet.getCell("C2"); // returns 68.68285714285715
// sum of rows
sheet.setCell("C3", '=SUM(A1:A12)');
sheet.getCell("C3"); // returns: 611.89

This is a blog post about how I built a spreadsheet library in Typescript/Javascript, and some of the things I learned along the way.

The Open Source Springboard

Writing processing logic and language parsers is really difficult work. I wanted to understand the core of how spreadsheets function, but writing an entire virtual machine in javascript sounded like a nightmare. Luckily, I didn't have to. I was able to take advantage of the work that Handsontable has already done with their Javascript implementation of a spreadsheet, RuleJS. By using their lexer/parser, I was able to get a leg up on understanding how we process tokens, fetch cells, and call functions. More importantly, I was able to focus on defining some higher-level features that I wanted in my spreadsheet.

Javascript In-Memory Storage

While Handsontable's implementation required a browser, I wasn't sure the spreadsheet library I was writing was targeted for the browser. Plus, using JS memory as storage would allow me to write tests for node and debug without a browser. I also wanted a clean JS interface for creating, updating and deleting cells. If someone, maybe my future self, wants to extend this project into an actual application, they would want to save cell values to a storage medium, and having a clean interface for this would enable extendability.

I started by stripping down the parser so that it only interacted with a set of helper and utility functions, and then by adding on a Sheet model for storage. All updates to cells would go to the Sheet model, which would be swapped out for a more durable storage medium later on. Once I had the basic lexer and sheet storage layers function according to my manual tests, I started writing a basic set of tests to ensure that it could parse formulas, and save both the raw text, and computed values to cells.

Dependency diagram.

Diagram of dependency refactoring.

Custom Formulas

I also wanted the formulas library to be flexible, and extensible. If a JS function could be written that threw the same errors as a spreadsheet, it should be able to be called from a cell. But at this point my spreadsheet was still relying on FormulaJS formulas. One by one I started replacing the formulas, implementing and adding in my own layer of errors as I went. Many of the formulas I wrote were directly influenced by FormulaJS, but used my own custom implementation of the errors that Google Spreadsheets or Excel throws.

Error Parity

In Google Spreadsheets, errors are very specific, and often come with a message detailing their origin and cause. I wanted my formulas to be the same. The FormulaJS functions threw many, but not all, of the same errors. I wanted my spreadsheet library to show you not just that an error occurred, but where. As I added more formulas I found that there were a number of rules that I could depend on, and write helper classes for. For example, all formulas had a minimum required number of parameters, some with a maximum. Calling a function outside of this lower, or possibly upper bound, would throw an !N/A error. Below you can see how my spreadsheet library, js-spreadsheet formats and throws !N/A errors, and how Google Spreadsheets throws the same errors.

JS Spreadsheet NA Error.

Example of `js-spreadsheet` throwing NA error.


Google Sheets NA Error.

Example of Google Spreadsheets throwing NA error.


To make it easier to check and throw NA errors inside of the formula functions, I wrote an ArgsChecker class.

class ArgsChecker {
  static checkAtLeastLength(args: any, length: number, caller?: string) {
    if (args.length < length) {
      var functionName = caller !== undefined ? " to " + caller : "";
      throw new NAError("Wrong number of arguments" + functionName + ". Expected " + length
        + " arguments, but got " + args.length + " arguments.");
    }
  }
}
...
var SUM = function (...values) : number {
  ArgsChecker.checkAtLeastLength(values, 1, "SUM");
  ...
};

Example usage of ArgsChecker.

I also found that almost every function would throw an error if it was passed an empty range or array, nested inside of an argument. When parsing arguments passed into a function, I added empty array checks to ensure that we were throwing REF errors as well.

var SUMSQ = function (...values) {
  ArgsChecker.checkAtLeastLength(values, 1, "SUMSQ");
  var result = 0;
  for (var i = 0; i < values.length; i++) {
    if (values[i] instanceof Array) {
      if (values[i].length === 0) {
        throw new RefError("Reference does not exist.");
      }
      result = result + SUMSQ.apply(this, Filter.filterOutNonNumberValues(values[i]));
    } else {
      var n = TypeConverter.valueToNumber(values[i]);
      result = result + (n * n);
    }
  }
  return result;
};

Example usage of RefError being thrown.

All formulas in GS and MSE would throw errors if you attempted to divide by zero, unlike javascript. For these cases, I figured out that the easiest way to handle them is to breakdown the math sections into computing the numerator and denominator first, and checking them with a simple checkDivideByZero function.

Formula Parity

Although there are significant differences between Microsoft Excel and Google Spreadsheets, I wanted my functions to match the two spreadsheets as much as possible. Unfortunately, this meant that I found myself facing something almost no programmer wants to do: write a date-time parser. Excel and Google Spreadsheets parse a dozen different formats automatically, and while both applications have every thorough documentation, neither fully specifies what constitutes a valid date string. Through a long process of trial and error in a Google Spreadsheet, I was able to figure it out though. Rather than try to find a single regular expression that could parse all dates, I broke it down into 8 regular expressions that I could check against a string in a row. I should note here that, even though this is not the most efficient way to parse date strings, it was the easiest to understand, and to modify in the future.

After about a couple weeks of working through it here are just a few of the types of dates the spreadsheet was able to parse:

6/24/92
06/24/1992
6-24-92
6 24 92
6.24.92
6, 24, 92
Sunday 6/24/92
6-24-92 10am
1999/1/01 200000000:999999999:923231312
1992/1/13 12:1989198298am
1992/6/24
1992-6-24 10am
Sun 09 Feb 2017
01 Jan 2017
24/June/1992 10am
2017/01
01-2017 10:10:10
Jul 2017

The process of ensuring that all formulas had feature parity with one of the two big-name spreadsheets involved a lot of manual checking. It also included writing tests directly against output values from GS and MSE. In the end I was able to write and test formulas for even the most difficult formulas, such as WEEKEND, WEEKNUM, and NETWORKDAY.ITL.

Type Parity

All formulas throw errors like Excel, and all formulas use primitive types like Excel. Date formulas return numbers that can be used in mathematical operations. For example =DATE(1999, 1, 1) + 10. But it looks like cells, in addition to having dependencies, have an array of dependency types, and types have hierarchies. If a cell has a formula that fetches the value from a second cell, and if the second cell has a type of a higher order that the first, the first cell will display its resulting value in the format of the second cell's type.

A1=DATE(1999, 1, 1)
A2=SUM(A1, 3)

A1 and A2 will display in the format of a Date, since the type hierarchy seems to be [Date, DateTime, Time, number, boolean, string]. Cells have a typed value, and a format type. I ended up not building this feature, mostly because it's closely tied to the View layer of a spreadsheet, and is more of a UI concern, rather than a performance concern.

Formulas Written

After a couple months of work, here are the formulas I was able to write or re-write.

DATE
DATEVALUE
EDATE
EOMONTH
DAY
DAYS
DAYS360
MONTH
YEAR
WEEKDAY
WEEKNUM
DATEDIF
YEARFRAC
TIMEVALUE
HOUR
MINUTE
SECOND
TWORKDAYS
INTL
NOW
TODAY
TIME
WORKDAY
INTL
BIN2DEC
BIN2HEX
BIN2OCT
DEC2OCT
DEC2HEX
DEC2BIN
DELTA
DDB
DB
DOLLAR
DOLLARDE
DOLLARFR
EFFECT
PMT
CUMPRINC
CUMIPMT
ACCRINT
AND
COUNTIF
COUNTIFS
ROUND
ROUNDDOWN
ROUNDUP
SUMIF
SUMSQ
TRUNC
RADIANS
DEGREES
ERFC
ERF
SUMX2PY2
SUMX2MY2
UNTUNIQUE
UMPRODUCT
COMBIN
DEVSQ
MEDIAN
AVERAGE
AVEDEV
AVERAGEA
CORREL
PEARSON
EXPONDIST
EFTTAILED
FINV
FISHER
FISHERINV
MAX
MAXA
MIN
MINA
AVERAGEIF
COUNT
COUNTA
ARABIC
CHAR
CODE
SPLIT
NCATENATE
CONVERT
EXACT
TRUE
FALSE
NOT
OR
XOR
ABS
ACOS
ACOSH
ACOTH
ASIN
ASINH
ATAN
ATAN2
ATANH
EVEN
MOD
ODD
POWER
SUM
SQRT
SQRTPI
COS
COSH
COT
COTH
INT
ISEVEN
ISODD
SIN
SINH
PI
LOG10
LOG
LN
TAN
TANH
CEILING
FLOOR
IF

For documentation see https://github.com/vogtb/spreadsheet/blob/master/DOCS.md.

Code Organization

Since I started this repo based on code from RuleJS, I wanted to limit the other libraries and packages I used. It's so easy to start a node project, and before you know it, you've got a dozen dependencies, some of which do something you could have written yourself. So I started with a firm rule of no dependencies. I was forced to break this to include momentjs, because it would have been nearly impossible to deal with dates and times without it.

Another reason I wanted to avoid pulling in other packages is that I wanted as much of the logic of the formulas to remain inside the formulas. At the risk of breaking the DRY rule, I wanted anyone (including my future-self) to be able to look through the formulas and be able to see exactly what that formula was doing, rather than just seeing something like jStat.erf(lower_bound); and not quite knowing what was going on. I wanted to emphasize code-comprehension over encapsulation.

The testing framework is another extension of the no-dependencies-rule. Rather than using a node testing package, I wrote a simple set of functions to compare equality, and log to the console when you run tests, allowing me to organize the tests in a way that made sense for me and what I was trying to accomplish. The result is a simple set of tests that are easy to understand and debug.

I also wanted to keep my documentation system simple. With a couple lines of bash in doc.sh I was able to have a documentation build system that had no dependencies, and wrote doc-strings to a simple Markdown document of all Formulas inside the project.


If you'd like to take a look at the code you can find at https://github.com/vogtb/spreadsheet and https://www.npmjs.com/package/js-spreadsheet, or you can use it in a node project with npm (npm install js-spreadsheet). Once again, my acknowledgments to Handsontable and Formula.JS for their code that has helped me with this project.


This blog post was edited for spelling on May 29, 2017.

Thanks for reading! If you liked this, you can sign up for my email list-->