A Guesser's Guide to Google Spreadsheets

May 4, 2017

Microsoft Excel is a programming language, and arguably the most popular one in existence. Almost every Fortune 500 company uses it, in addition to a hunge number of small businesses. If you're using Google Spreadsheets - considering the overlap in features, formulas, and functionality - it wouldn't be far from the truth to say that you're using Excel. When I started reading up online about MSE and GS features, it dawned on me that the most popular programming language on the planet is one that I'm not familiar with. With that in mind, I started exploring the features of MSE and GS through the lens of a programmer; What are the data types? How are types cast? What are the errors and how are they handled? Through my trial-and-error experiments in Google Spreadsheets, I've figured out the answers to most of thee questions. So this is a post about an exploration of GS, and reverse engineering it.

Types

Before we jump into types, it's worth writing about how GS displays values. It seems that while the values inside cell are stored as primitives, the cells also store a display type which is used to determine what the formatted value looks like. For example, Dates are stored as number values representing the number of days since Jan 31st, 1899, but the cells keep track of the data type, so the number can be displayed in the cell as a more human readable string. Furthermore, it appears as if a cell will inherit the display type of child values. For example, =10 + $100 will result in a displayed value of $110.

So here are the types:

  • String: Many formulas call this type "text", possibly because it's easier to understand to a non-technical user.
  • Number: Numeric values only. Although I've not fully worked out the Regular Expression to parse numerical values, it seems that number strings with commas in them (e.g. "1,000,000") and strings with currency values (e.g. "$101.49") are parsed to numbers.
  • Boolean: Anything logically interpreted as true or false. Again, when interpreting raw text input inside a cell, GS will automatically convert anything matching the regex ^(?i)(true|false)$ to a boolean value. It will also cast 0 and 1 to false and true, respectively.
  • Date: Inherits from primitive number. Any operation that can be performed on a number can also be performed on a Date. Represents the number of days since January 31st, 1899, inclusively.
  • Time: Also inherits from the number. Represents the fraction of a full day, between 0 and 1, exclusive on the end.
  • Array or Range: Some parts of the documentation refers to lists of values as arrays, while parameter documentation for individual formulas call them "ranges." Ranges can be literals: {1, 2, 3, 4} or cell ranges in A1-Notation: A1:A4, but it seems that some formulas handle the two differently (more on that later.)

Ranges and Arrays are really just columns or rows. But surprisingly, GS doesn't specify the difference in documentation. When selecting a range you can select across both dimensions, but some functions return a ColumnArray, and some will return a RowArray. It's implicit, and, like type conversion/coercion, the GS documentation doesn't mention it at all. For instance, =SPLIT("1,2,3", ",", TRUE) will return values that fill the columns after it (e.g.: A2:D2, a RowArray). The formula =UNIQUE(F1:F10) will not. It will instead return values filling the rows after it (e.g.: A2:A4), unless you call it with a literal range, like this: =UNIQUE({1, 2, 3, 4}).

The difference between a ColumnArray and RowArray with the same function.

It is an important distinction to be aware of when working with data in GS, because depending on how you think of your data and how you think of functions/formulas "normally" operating, you may be surprised at the result. If you think of your columns as "fields" and your rows as "objects" you might be bewildered to find that the formula you're using thinks of columns as "objects", and rows as "fields". To be fair, many of the functions that project data out in one dimension instead of another aren't functions that you would really want to flip, and GS does offer the =TRANSPOSE() formula to convert columns to rows and visa versa.

Converting Types

Now that we know the types, the natural next step is type conversion, casting, or as GS calls it, coercion. While the GS documentation doesn't explicitly tell you how it casts types, I've found that GS will always attempt to convert the input types into the type the formula expects. It seems to use a type hierarchy to convert down or up until the value is the type the formula expects. Strings will be coerced down to numbers, and numbers down to booleans.

GS has a concept of null or undefined, but doesn't expose it directly to the user. Fetching a range of empty cell values returns something akin to a null or undefined value. Null and undefined values will really only occur on a cell fetch. So under the hood, it looks like encountering these values will cause a number cast to fail without throwing a user-facing error. An example of this is the SUM function.

The difference between fetching a range, and literal range on coercing values.

GS also handles ranges passed as single values gracefully. When calling a single parameter formula with a range as that single parameter, GS will take the first non-null or non-undefined value from the range, and use that as the parameter. For example, =ATAN(0.9) will return the same value as =ATAN({0.9}).

Errors

This brings me to errors, and error handling. Interestingly enough, GS doesn't have documentation for the different types of errors, instead opting for telling you the info you need to know when the error is thrown. So here are the different types of errors, and how they can occur.

#N/A: This is the simplest one, and results from calling a function with the wrong number of arguments. E.g. "Wrong number of arguments to ABS. Expected 1 arguments, but got 0 arguments."

#DIV/0!: Unlike some programming languages that will handle a 0-divisor by returning Infinity, GS appears to throw a specific error for this case. E.g. "Evaluation of function ATAN2 caused a divide by zero error."

#VALUE!: This is one of the coercion errors I mentioned before. When unable to coerce a value of one type to another, GS will throw this error. E.g. "Function ATAN parameter 1 expects number values. But 'invalid_string_here' is a text and cannot be coerced to a number."

#REF!: Reference error indicating that either the literal range passed to the formula is empty, or the cell range specified is inaccessible. E.g. For the formula =SUM({}) we will get "Reference does not exist." but for =SUM(Sheet4!ZZ10:ZZZ44) we will get "Unresolved sheet name 'Sheet4'."

#NAME?: Thrown when attempting to use a formula name that does not exist. E.g. =HELLO() results in "Unknown function: 'HELLO'."

#NUM!: Thrown when a numerical value is out of the accepted range for a formula. E.g. The formula =ACOS(2) will throw "Function ACOS parameter 1 value is 2. Valid values are between -1 and 1 inclusive."

Before this deep-dive into GS, I really didn't know that you could catch and handle errors yourself. But you can! You can cast errors as enums, and then check them before they propigate by using IF, ISERROR, and ERROR.TYPE.

=IF(ISERROR(A8/B8), IF(ERROR.TYPE(A8/B8)=2, "Missing a Quantity", "Check numbers"), A8/B8)

It's not exactly an elegant solution, but for dirty datasets it can be useful.

Through my experiments in GS, I've found that most of the components of a programming language are there, sometimes cleverly hidden, sometimes perplexing. However, when it comes to the implementation of MS Excel formulas inside GS, some of them return considerably different values. But that's a post for another day...