Flox downtime due to glitch in Google App Engine
Holger Weissböck on March 11, 2015

As some of you may have noticed, Flox was down for approximately four hours yesterday afternoon (CET). In this blog post, we'd like to give you a quick update on what caused the downtime and how we fixed it. In a nutshell, there was a glitch in the way our service provider, Google App Engine, handles reading files. The most important news first: the service is back to normal!
Problem analysis
Flox uses FreeMarker templates to render its web interface. To noticeably speed up delivery of your Flox web interface, these templates are pre-rendered and cached whenever a new machine instance in the Flox cluster is started. Unfortunately, at 3:00pm CET yesterday, Google App Engine started preventing FreeMarker from loading templates, throwing exceptions like this one:
java.security.AccessControlException: access denied ("java.io.FilePermission" "/WEB-INF/freemarker/panel/root.ftl" "read")
at com.google.appengine.runtime.Request.process-e5a6df6e4f6e9c58(Request.java)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:382)
at java.security.AccessController.checkPermission(AccessController.java:572)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
at java.io.File.isFile(File.java:961)
at freemarker.cache.FileTemplateLoader$2.run(FileTemplateLoader.java:165)
at java.security.AccessController.doPrivileged(AccessController.java:63)
...
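To illustrate why one denied file read was enough to take an instance down, here is a simplified sketch of the "pre-render and cache at startup" pattern in plain Java. It is a hypothetical illustration, not Flox's actual code: the class `PrerenderingCache`, its `warmUp` method and the renderer function are invented names, and instead of calling FreeMarker, the simulated loader throws a plain `SecurityException` (the superclass of the `AccessControlException` in the trace above).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: render every template once when an instance starts
// and serve the cached output afterwards. Not Flox's real implementation.
public class PrerenderingCache {
    private final Map<String, String> rendered = new ConcurrentHashMap<>();
    // Stand-in for "load the template file and render it" (e.g. via FreeMarker).
    private final Function<String, String> renderer;

    public PrerenderingCache(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    // Called once during instance startup. Any exception thrown while loading
    // a template (such as a SecurityException) escapes and aborts startup.
    public void warmUp(List<String> templateNames) {
        for (String name : templateNames) {
            rendered.put(name, renderer.apply(name));
        }
    }

    // Returns pre-rendered output; templates that were never warmed up are an error.
    public String get(String name) {
        String html = rendered.get(name);
        if (html == null) {
            throw new IllegalStateException("template not pre-rendered: " + name);
        }
        return html;
    }

    public static void main(String[] args) {
        // Simulated loader whose read access is denied, mimicking the trace above.
        Function<String, String> deniedLoader = name -> {
            throw new SecurityException("access denied: " + name);
        };
        PrerenderingCache cache = new PrerenderingCache(deniedLoader);
        try {
            cache.warmUp(List.of("/WEB-INF/freemarker/panel/root.ftl"));
            System.out.println("instance started");
        } catch (SecurityException e) {
            // Startup fails, so nothing on this instance can serve requests.
            System.out.println("instance startup failed: " + e.getMessage());
        }
    }
}
```

Because `warmUp` runs eagerly and lets the exception escape, a single unreadable template is enough to fail the whole startup, which is consistent with the REST API going down together with the web interface.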
As you can see, the App Engine environment started denying access to the FreeMarker template files, making it impossible to render the Flox web interface. Since Flox pre-renders all templates when it starts a machine instance, this problem also caused the Flox REST API to fail. After investigating and identifying the problem, we therefore disabled template pre-rendering as a workaround to get the API back online. As a result, although the web interface was still unavailable, the Flox API was back up and running at about 7:30pm.
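A workaround of this kind can be sketched as a switch that skips pre-rendering at startup and renders each template lazily on first request instead. Again, this is a hypothetical sketch rather than the actual Flox change: `LazyTemplateCache`, the `prerender` flag and the renderer function are assumed names, and the renderer stands in for FreeMarker.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the workaround: with pre-rendering disabled, startup
// no longer touches template files; templates are rendered on first use.
public class LazyTemplateCache {
    private final Map<String, String> rendered = new ConcurrentHashMap<>();
    private final Function<String, String> renderer; // stand-in for FreeMarker
    private final boolean prerender;                 // assumed config switch

    public LazyTemplateCache(Function<String, String> renderer, boolean prerender) {
        this.renderer = renderer;
        this.prerender = prerender;
    }

    // Instance startup: only warms the cache when pre-rendering is enabled.
    public void start(List<String> templateNames) {
        if (!prerender) {
            return; // workaround: skip template access entirely at startup
        }
        for (String name : templateNames) {
            render(name);
        }
    }

    // Renders on first request and caches the result. If the renderer throws,
    // the exception reaches only this caller and no cache entry is stored.
    public String render(String name) {
        return rendered.computeIfAbsent(name, renderer);
    }

    public static void main(String[] args) {
        Function<String, String> deniedLoader = name -> {
            throw new SecurityException("access denied: " + name);
        };
        LazyTemplateCache cache = new LazyTemplateCache(deniedLoader, false);
        cache.start(List.of("/WEB-INF/freemarker/panel/root.ftl"));
        System.out.println("instance started; API can serve requests");
        try {
            cache.render("/WEB-INF/freemarker/panel/root.ftl");
        } catch (SecurityException e) {
            System.out.println("web interface request failed: " + e.getMessage());
        }
    }
}
```

`ConcurrentHashMap.computeIfAbsent` records no mapping when the mapping function throws, so in this sketch a later request simply retries the template once file access works again.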
Further info
At the same time, we reached out to the Google folks through every channel we could (e.g. Stack Overflow and Google Groups). The GAE maintenance team reacted quickly and whitelisted Flox, allowing FreeMarker to access its templates again until they can fix the root cause of the problem, as you can see here: code.google.com.
Conclusion
While it seems that the problem was not caused by any error on our side, we know that you rely on a stable backend at all times. We are sorry for this downtime and hope that you will accept our apologies!