**Typescript/Javascript Spreadsheet and Formula - https://github.com/vogtb/spreadsheet**

I started this project a couple of months ago with the idea of reverse engineering a spreadsheet in order to better understand them. I started by guessing my way through Google Spreadsheets, using trial and error to figure out the types, errors, and ways that data is stored an formatted. I could have looked all this information up, but I found that figuring it out yourself is more fun. With that as my starting point, I decided not just to blindly fumbly my way through the features of a spreadsheet, but design and build some of those features myself. With a few open-source projects (RuleJS, and FormulaJS) as my starting points, I was eventually able to implement a spreadsheet in Typescript/Javascript that has over 120 formulas, and matches many of the basic features of GS.

Here are some more complex examples of what the library can do.

var Sheet = require("js-spreadsheet").Sheet; var sheet = new Sheet(); sheet.setCell("A1", 27.13); sheet.setCell("A2", 44.48); sheet.setCell("A3", 17.32); sheet.setCell("A4", 46.27); sheet.setCell("A5", 68.37); sheet.setCell("A6", 90.59); sheet.setCell("A7", 88.95); sheet.setCell("A8", 55.30); sheet.setCell("A9", 86.82); sheet.setCell("A10", 38.68); sheet.setCell("A11", 35.53); sheet.setCell("A12", 12.45); // average of column sheet.setCell("C1", '=AVERAGE(A1:A12)'); sheet.getCell("C1"); // returns: 50.990833333333335 // average of of rows over 40 sheet.setCell("C2", '=AVERAGEIF(A1:A12, ">=40")'); sheet.getCell("C2"); // returns 68.68285714285715 // sum of rows sheet.setCell("C3", '=SUM(A1:A12)'); sheet.getCell("C3"); // returns: 611.89

This is a blog post about how I built a spreadsheet library in Typescript/Javascript, and some of the things I learned along the way.

Writing processing logic and language parsers is really difficult work. I wanted to understand the core of how spreadsheets function, but writing an entire virtual machine in javascript sounded like a nightmare. Luckily, I didn't have to. I was able to take advantage of the work that Handsontable has already done with their Javascript implementation of a spreadsheet, RuleJS. By using their lexer/parser, I was able to get a leg up on understanding how we process tokens, fetch cells, and call functions. More importantly, I was able to focus on defining some higher-level features that I wanted in my spreadsheet.

While Handsontable's implementation required a browser, I wasn't sure the spreadsheet library I was writing was targeted for the browser. Plus, using JS memory as storage would allow me to write tests for node and debug without a browser. I also wanted a clean JS interface for creating, updating and deleting cells. If someone, maybe my future self, wants to extend this project into an actual application, they would want to save cell values to a storage medium, and having a clean interface for this would enable extendability.

I started by stripping down the parser so that it only interacted with a set of helper and utility functions, and then by adding on a Sheet model for storage. All updates to cells would go to the Sheet model, which would be swapped out for a more durable storage medium later on. Once I had the basic lexer and sheet storage layers function according to my manual tests, I started writing a basic set of tests to ensure that it could parse formulas, and save both the raw text, and computed values to cells.

Diagram of dependency refactoring.

I also wanted the formulas library to be flexible, and extensible. If a JS function could be written that threw the same errors as a spreadsheet, it should be able to be called from a cell. But at this point my spreadsheet was still relying on FormulaJS formulas. One by one I started replacing the formulas, implementing and adding in my own layer of errors as I went. Many of the formulas I wrote were directly influenced by FormulaJS, but used my own custom implementation of the errors that Google Spreadsheets or Excel throws.

In Google Spreadsheets, errors are very specific, and often come with a message detailing their origin and cause. I wanted my formulas to be the same. The FormulaJS functions threw many, but not all, of the same errors. I wanted my spreadsheet library to show you not just that an error occurred, but where. As I added more formulas I found that there were a number of rules that I could depend on, and write helper classes for. For example, all formulas had a minimum required number of parameters, some with a maximum. Calling a function outside of this lower, or possibly upper bound, would throw an `!N/A`

error. Below you can see how my spreadsheet library, `js-spreadsheet`

formats and throws !N/A errors, and how Google Spreadsheets throws the same errors.

Example of `js-spreadsheet` throwing NA error.

Example of Google Spreadsheets throwing NA error.

To make it easier to check and throw NA errors inside of the formula functions, I wrote an `ArgsChecker`

class.

class ArgsChecker { static checkAtLeastLength(args: any, length: number, caller?: string) { if (args.length < length) { var functionName = caller !== undefined ? " to " + caller : ""; throw new NAError("Wrong number of arguments" + functionName + ". Expected " + length + " arguments, but got " + args.length + " arguments."); } } } ... var SUM = function (...values) : number { ArgsChecker.checkAtLeastLength(values, 1, "SUM"); ... };

Example usage of

`ArgsChecker`

.
I also found that almost every function would throw an error if it was passed an empty range or array, nested inside of an argument. When parsing arguments passed into a function, I added empty array checks to ensure that we were throwing `REF`

errors as well.

var SUMSQ = function (...values) { ArgsChecker.checkAtLeastLength(values, 1, "SUMSQ"); var result = 0; for (var i = 0; i < values.length; i++) { if (values[i] instanceof Array) { if (values[i].length === 0) { throw new RefError("Reference does not exist."); } result = result + SUMSQ.apply(this, Filter.filterOutNonNumberValues(values[i])); } else { var n = TypeConverter.valueToNumber(values[i]); result = result + (n * n); } } return result; };

Example usage of

`RefError`

being thrown.
All formulas in GS and MSE would throw errors if you attempted to divide by zero, unlike javascript. For these cases, I figured out that the easiest way to handle them is to breakdown the math sections into computing the numerator and denominator first, and checking them with a simple `checkDivideByZero`

function.

Although there are significant differences between Microsoft Excel and Google Spreadsheets, I wanted my functions to match the two spreadsheets as much as possible. Unfortunately, this meant that I found myself facing something almost no programmer wants to do: write a date-time parser. Excel and Google Spreadsheets parse a dozen different formats automatically, and while both applications have every thorough documentation, neither fully specifies what constitutes a valid date string. Through a long process of trial and error in a Google Spreadsheet, I was able to figure it out though. Rather than try to find a single regular expression that could parse all dates, I broke it down into 8 regular expressions that I could check against a string in a row. I should note here that, even though this is not the most efficient way to parse date strings, it was the easiest to understand, and to modify in the future.

After about a couple weeks of working through it here are just a few of the types of dates the spreadsheet was able to parse:

6/24/92 06/24/1992 6-24-92 6 24 92 6.24.92 6, 24, 92 Sunday 6/24/92 6-24-92 10am 1999/1/01 200000000:999999999:923231312 1992/1/13 12:1989198298am 1992/6/24 1992-6-24 10am Sun 09 Feb 2017 01 Jan 2017 24/June/1992 10am 2017/01 01-2017 10:10:10 Jul 2017

The process of ensuring that all formulas had feature parity with one of the two big-name spreadsheets involved a lot of manual checking. It also included writing tests directly against output values from GS and MSE. In the end I was able to write and test formulas for even the most difficult formulas, such as `WEEKEND`

, `WEEKNUM`

, and `NETWORKDAY.ITL`

.

All formulas throw errors like Excel, and all formulas use primitive types like Excel. Date formulas return numbers that can be used in mathematical operations. For example `=DATE(1999, 1, 1) + 10`

. But it looks like cells, in addition to having dependencies, have an array of dependency types, and types have hierarchies. If a cell has a formula that fetches the value from a second cell, and if the second cell has a type of a higher order that the first, the first cell will display its resulting value in the format of the second cell's type.

A1=DATE(1999, 1, 1) A2=SUM(A1, 3)

A1 and A2 will display in the format of a Date, since the type hierarchy seems to be [Date, DateTime, Time, number, boolean, string]. Cells have a typed value, and a format type. I ended up not building this feature, mostly because it's closely tied to the View layer of a spreadsheet, and is more of a UI concern, rather than a performance concern.

After a couple months of work, here are the formulas I was able to write or re-write.

DATE DATEVALUE EDATE EOMONTH DAY DAYS DAYS360 MONTH YEAR WEEKDAY WEEKNUM DATEDIF YEARFRAC TIMEVALUE HOUR MINUTE SECOND TWORKDAYS INTL NOW TODAY TIME WORKDAY INTL BIN2DEC BIN2HEX BIN2OCT DEC2OCT DEC2HEX DEC2BIN DELTA DDB DB DOLLAR DOLLARDE DOLLARFR EFFECT PMT CUMPRINC CUMIPMT ACCRINT AND

COUNTIF COUNTIFS ROUND ROUNDDOWN ROUNDUP SUMIF SUMSQ TRUNC RADIANS DEGREES ERFC ERF SUMX2PY2 SUMX2MY2 UNTUNIQUE UMPRODUCT COMBIN DEVSQ MEDIAN AVERAGE AVEDEV AVERAGEA CORREL PEARSON EXPONDIST EFTTAILED FINV FISHER FISHERINV MAX MAXA MIN MINA AVERAGEIF COUNT COUNTA ARABIC CHAR CODE SPLIT NCATENATE CONVERT

EXACT TRUE FALSE NOT OR XOR ABS ACOS ACOSH ACOTH ASIN ASINH ATAN ATAN2 ATANH EVEN MOD ODD POWER SUM SQRT SQRTPI COS COSH COT COTH INT ISEVEN ISODD SIN SINH PI LOG10 LOG LN TAN TANH CEILING FLOOR IF

For documentation see https://github.com/vogtb/spreadsheet/blob/master/DOCS.md.

Since I started this repo based on code from RuleJS, I wanted to limit the other libraries and packages I used. It's so easy to start a node project, and before you know it, you've got a dozen dependencies, some of which do something you could have written yourself. So I started with a firm rule of no dependencies. I was forced to break this to include momentjs, because it would have been nearly impossible to deal with dates and times without it.

Another reason I wanted to avoid pulling in other packages is that I wanted as much of the logic of the formulas to remain inside the formulas. At the risk of breaking the DRY rule, I wanted anyone (including my future-self) to be able to look through the formulas and be able to see exactly what that formula was doing, rather than just seeing something like `jStat.erf(lower_bound);`

and not quite knowing what was going on. I wanted to emphasize code-comprehension over encapsulation.

The testing framework is another extension of the no-dependencies-rule. Rather than using a node testing package, I wrote a simple set of functions to compare equality, and log to the console when you run tests, allowing me to organize the tests in a way that made sense for me and what I was trying to accomplish. The result is a simple set of tests that are easy to understand and debug.

I also wanted to keep my documentation system simple. With a couple lines of bash in `doc.sh`

I was able to have a documentation build system that had no dependencies, and wrote doc-strings to a simple Markdown document of all Formulas inside the project.

If you'd like to take a look at the code you can find at https://github.com/vogtb/spreadsheet and https://www.npmjs.com/package/js-spreadsheet, or you can use it in a node project with npm (`npm install js-spreadsheet`

). Once again, my acknowledgments to Handsontable and Formula.JS for their code that has helped me with this project.

This blog post was edited for spelling on May 29, 2017.