Blog

Validate user files with Google Apps Script

today January 6, 2023

Let's say that you have a Google Apps Script that processes files that are generated by other people. Maybe it's your clients, or your team members, or your channel partners who provide the files – whatever the case may be, you build your script to operate on other people's files. In such a scenario I would assume that as a good developer, you would reach out to these people and agree with them on how they should populate and handle those files. Maybe you even give them a file template and ask them not to change its structure, but merely to duplicate it and fill out the information and then submit it to your script.

Writing a script under the assumption that people would always abide by agreed-upon rules is asking for trouble. Even if they mean well, people are still human, and humans can and will – for whatever reason – break the rules. And when they break the rules, they can break down your script. And tracking down the cause of the script error because of a malformed input file can be difficult and time consuming.

What we need is a trust-and-verify approach to Google Apps Script that handles other people's files. We need to run validations on the files before we start processing them to surface any potential issues. If we find such issues then we log them, and we don't process the file. We provide the log back to the file submitters and ask them to fix the issues. They fix the issues, resubmit the file, and we re-run the validations. Now the file is valid, so we can proceed to process it. Problem solved.

I'll show you how I implement my trust-and-verify approach in Apps Script. Let's say that I have a Google Drive folder with a Google Sheet that serves a log and as a container of my script. I have a sub-folder where my clients drop their own Google Sheet files. I'll start with how to structure the validation, and then discuss how to implement a few tests.

Interested in customizing this script? Contact me

Open the Log spreadsheet (mine has one tab also named "Log"). Open Apps Script, and in Code.gs, enter the following:

const g = {
  inputFilesFolderId: '',
  messages: [],
};

function onOpen() {
  const ui = SpreadsheetApp.getUi();
  ui.createMenu('🟢 Automation')
    .addItem('Run automation', 'runAutomation')
    .addToUi();
    
function runAutomation() {
  validateInputs_();
  if (g.messages.length > 0) {
    return;
  }
  // Perform the actual automation...
}
}

The g global object will hold key information that will be used in several functions. The first item is the id of the folder where clients deposit their files. The messages array will contain the messages for the log tab.

The onOpen function will be used by Apps Script to incorporate a custom menu. When I select the Run automation option, the script will execute the runAutomation function. The latter makes a call to a validateInputs_() function, which will run the validation tests. When a test fails, it will add a message to the messages array. So if the array's length is bigger than zero, then the automation will not continue to work and avoid running into problems.

Here's the code for validateInputs_():

function validateInputs_() {
  const testSuite = [];
  let testData = [];
  for (let i = 0; i < testSuite.length; i++) {
    const stopTest = testSuite[i].call(null, testData[i]);
    if (stopTest) {
      break;
    }
  }
  const ss = SpreadsheetApp.getActive();
  const sh = ss.getSheetByName('Log');
  sh.clearContents();
  if (g.messages.length > 0) {
    sh.getRange(1, 1, g.messages.length, 1).setValues(
      g.messages.map((message) => [message])
    );
  }
}

The testSuite array references the name of test functions that we will define momentarily. testData is an array that references the data that each test function will receive in order to run its test. We iterate over the array of test functions. We use the Function prototype.call method to execute the function and pass it the test data. We store the function's output in stopTest. If the latter is true then we stop running the tests. This mechanism helps us with test dependency (e.g, don't run test 2 to check for file type if test 1 that checks for the existence of the file fails).

After we execute the tests, we clear out the Log tab. If there are messages to log then we convert them to a nested array and log them into column A in the "Log" tab.

So this is the approach I take to validate users' files: Before running the automation, I conduct a series of tests. If one or more tests fail then I log their output and do not carry on with the automation. Now we can look at a few examples of file-based testing.

  const testSuite = [hasGoogleSheet];
  let testData = [
    g.inputFilesFolderId,
  ];
            
function hasGoogleSheet(folderId) {
  const fileRefs = DriveApp.searchFiles(`"${folderId}" in parents`);
  if (!fileRefs.hasNext()) {
    g.messages.push('Input folder is empty');
    return true;
  }
  g.file = fileRefs.next();
  const mimeType = g.file.getMimeType();
  if (mimeType !== 'application/vnd.google-apps.spreadsheet') {
    g.messages.push(`File "${g.file.getName()}" isn't a Google Sheet`);
    return true;
  }
}

In hasGoogleSheet, we take in the id of the file folder, and use DriveApp to search for files in that folder. If the client didn't deposit a file then there won't be any file references, in which case we will message that folder is empty and return true, to stop the test suite. Otherwise, we grab the file and point to it in g.file for later use. We check its mime type. If it's not a Google Sheet then we message that and again stop the execution.

  const testSuite = [hasGoogleSheet, hasDataTab];
let testData = [
  g.inputFilesFolderId,
  'Timesheet',
];

function hasDataTab(tabName) {
  const ss = SpreadsheetApp.openById(g.file.getId());
  const sh = ss.getSheetByName(tabName);
  if (!sh) {
    g.messages.push(
      `File "${g.file.getName()}" doesn't contain the "${tabName}" tab`
    );
    return true;
  }
}

In hasDataTab, we ensure that the client-supplied Google Sheet contains a specific tab name (Timesheet) that we wish to run our automation on. We message if the tab doesn't exist and return true.

  const testSuite = [hasGoogleSheet, hasDataTab, hasHeaders];
let testData = [
  g.inputFilesFolderId,
  'Timesheet',
  ['Timesheet', ['Day', 'Task', 'Duration']],
];

function hasHeaders([tabName, required]) {
  const ss = SpreadsheetApp.openById(g.file.getId());
  const sh = ss.getSheetByName(tabName);
  const headers = sh
    .getRange(1, 1, 1, sh.getLastColumn())
    .getValues()
    .flat()
    .map((h) => h.toString().trim().toLowerCase());
  required.forEach((r) => {
    if (!headers.includes(r.toString().trim().toLowerCase())) {
      g.messages.push(`Header row doesn't contain a required column "${r}"`);
    }
  });
  return !headers.includes('duration');
}

In hasHeaders we check if the first row of the Timesheet tab contains specific headers. For this function we pass an array where the first item is the name of the tab and the second is an array of the required headers. We iterate over the first row of the data and message any time the row doesn't include one of our required headers. We return true if the first row doesn't contain 'duration', because we need to have that column present for our next test.

const testSuite = [hasGoogleSheet, hasDataTab, hasHeaders, hasMinHours];
let testData = [
  g.inputFilesFolderId,
  'Timesheet',
  ['Timesheet', ['Day', 'Task', 'Duration']],
  ['Timesheet', 20],
];

function hasMinHours([tabName, minHours]) {
  const ss = SpreadsheetApp.openById(g.file.getId());
  const sh = ss.getSheetByName(tabName);
  const [headers, ...data] = sh.getDataRange().getValues();
  const durationCol = headers.indexOf('Duration');
  const totalHours = data
    .map((row) => row[durationCol])
    .reduce((a, c) => (a += c), 0);
  if (totalHours < minHours) {
    g.messages.push(
      `Total hours of ${totalHours} is less than required ${minHours}`
    );
  }
}

In the last test function we check that the client supplied entries that total at least twenty hours. First, we find the location of the "Duration" column. Then we use map/reduce to pull out the duration value from every row and to sum up all the durations. We compare the total to the desired minimum and message if necessary.

That's it for now. I realize that these types of validations require a lot of code – some of which may be repetitive with the actual automation. But having supported dozens and dozens of such implementations, I can tell you that they are totally worth it. They will save you a lot of time trying to diagnose problems that stem from people doing creative things with the files that you process.

Interested in customizing this script? Contact me