Here at DreamFactory, we frequently get inquiries about the scalability and security of the DreamFactory platform. We're not surprised. There are many thousands of users running DreamFactory as a REST API backend for important real-world web, mobile, and IoT applications. To help answer the security question, I blogged about DreamFactory security a couple of weeks ago. Today, I want to address questions regarding DreamFactory scalability.
The simple answer is that DreamFactory is exceptionally scalable since it was architected that way from the start. The other part of the simple answer is that DreamFactory scalability has a lot more to do with the hardware you run it on than any inherent limitations of the software itself.
To dive into the question more fully, let's look at DreamFactory in terms of horizontal-, vertical-, and cloud-scaling capabilities.
The most important thing to know is that DreamFactory is just a standard LAMP stack running PHP. As far as the server is concerned, DreamFactory looks like a website written in WordPress or Drupal. Instead of delivering HTML pages, DreamFactory delivers JSON documents. But the scaling requirements are otherwise similar. For more demanding deployments, we suggest using the NGINX web server instead of Apache. More on that later.
This is important because all of the standard things that you and your team already know about scaling simple websites can be directly applied to scaling DreamFactory. This is not an accident. We did it on purpose to make DreamFactory easy to install on any server, and easy to scale for massive transactional deployments of mobile and Internet of Things (IoT) applications.
Vertical Scaling
DreamFactory can be scaled vertically on a single server through the addition of extra processing power, extra memory, better network connectivity, and more hard-disk space. This section discusses how vertical scaling and server configuration can impact DreamFactory performance.
By increasing server processor speed, number of processors, and RAM, you can improve the performance of the DreamFactory engine. Processor speed will especially improve round-trip response times. In our testing, DreamFactory can usually respond to a service request in about 100 to 200 milliseconds.
The other important characteristic is the number of simultaneous requests that DreamFactory can handle. On a single server with vertical scaling, this will depend on processor speed and available RAM to support multiple processes running at the same time. Network throughput is important for both round-trip time and handling a large number of simultaneous transactions.
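To make the relationship between response time and simultaneous requests concrete, here is a rough back-of-the-envelope sketch in Python. It assumes each worker process handles one request at a time, so throughput is bounded by the number of workers divided by the average response time. The function name and the worker count of 50 are hypothetical; only the 100 to 200 millisecond range comes from our testing.

```python
def estimated_requests_per_second(workers: int, avg_response_seconds: float) -> float:
    """Upper bound on throughput when each worker serves one request at a time."""
    if workers <= 0 or avg_response_seconds <= 0:
        raise ValueError("workers and avg_response_seconds must be positive")
    return workers / avg_response_seconds


# Hypothetical server that has enough RAM for 50 worker processes.
best_case = estimated_requests_per_second(50, 0.1)   # 100 ms per request
worst_case = estimated_requests_per_second(50, 0.2)  # 200 ms per request
print(f"between {worst_case:.0f} and {best_case:.0f} requests per second")
```

This is only an upper bound. Real throughput will also depend on network capacity and on how quickly the databases behind the API respond.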
Default SQL Database
Each DreamFactory instance has a default SQL database that is used to store information about users, roles, services, and other objects required to run the platform. The default Bitnami installation package includes a default SQL database, but you can also hook up any other database for this purpose. When this database is brought online, DreamFactory will create the additional system tables that are needed to manage the platform.
Developers can also create tables on the default database for their own projects. Based on application requirements, mobile projects can query this database in various ways, and this activity can impact performance. The platform user records are also stored in the default database. Anything that you do to boost the performance of this database will increase the speed of the DreamFactory Admin Console and your applications.
Local File Storage
Each DreamFactory instance also needs some file storage for HTML5 web application hosting. Each application is given a folder where the developer might store HTML files, graphic images, CSS style sheets, JavaScript files, etc. Native applications might store other documents or resources in local storage for simple download. The size and access speed of the local file system will impact application performance, just like a normal website.
DreamFactory also stores server-side scripts in the local file storage area. These files are written in JavaScript to customize either the request or response of the API calls running through the engine. DreamFactory uses the V8 engine to execute these scripts. This allows developers to use JavaScript both on the client and on the server, and call the API from either side of the stack. The V8 engine is included in the Bitnami installers and must exist for server-side scripting to work.
Issues with Persistent Storage
Many of the Platform as a Service (PaaS) systems such as Pivotal, Bluemix, Heroku, and OpenShift do not support persistent local storage. For these systems, developers should use the desktop version of DreamFactory to create an application and “push” the application image to the PaaS cloud. The database for PaaS needs to be a remote SQL database like ClearDB, or whatever the vendor recommends. If you use the local file system to create files at runtime, these will disappear when the PaaS image is restarted or when multiple instances are scaled. Working with PaaS is discussed in greater detail under the Cloud Scaling section, below.
External Data Sources
You can hook up any number of external data sources to DreamFactory. DreamFactory currently supports MySQL, Oracle, SQL Server, DB2, S3, MongoDB, Cloudant, and many more. Some of the NoSQL databases are specifically designed for massive scalability on clustered hardware. You can hook up any SQL database running in your data center in order to mobilize legacy data. You can also hook up cloud databases like DynamoDB and Azure Tables.
DreamFactory acts as a proxy for these external systems. DreamFactory will inherit the performance characteristics of the database, with some additional overhead for each REST API call. DreamFactory adds a security layer, a customization layer, normalizes the web services, and implements role-based access control for each service. The scalability of these external data sources will depend on service-level agreements with your cloud vendor, the hardware behind database clustering, and other factors.
DreamFactory vs. Node.js
I’m going to take a small detour here and discuss some of the differences between DreamFactory and Node.js. This is helpful background information in order to understand how DreamFactory can be scaled horizontally with multiple servers, load balancers, and clustered databases.
We considered using Node.js for the DreamFactory engine, but we were concerned that a single thread would be insufficient to support a massively scalable mobile deployment. The workload in a sophisticated REST API backend is quite comparable to an HTML website written in Drupal or WordPress, where multiple threads are required to process all the data.
Another issue was that we needed reliable connectors to a variety of SQL and NoSQL databases. This was a challenge with Node.js. Instead, we chose PHP because the language is in widespread use and has great frameworks like Laravel. The main thing we liked about Node.js was the V8 engine. This allows developers to write JavaScript on the client and on the server. DreamFactory harnesses the power of the V8 engine by using the V8JS extension for PHP, except we run it in parallel for scalability. The V8 engine is also sandboxed for security.
When DreamFactory runs on an Apache server, Apache uses the Prefork MPM to create a new child process with one thread for each connection. You need to be sure that the MaxClients configuration directive is big enough to handle as many simultaneous requests as you expect to receive, but small enough to ensure that there is enough physical RAM for all processes.
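A common way to pick a starting value for MaxClients (named MaxRequestWorkers in Apache 2.4 and later) is to divide the RAM you can spare by the memory footprint of one child process. The Python sketch below shows that arithmetic. The function name and all of the memory figures are hypothetical, so measure your own server before relying on the result.

```python
def max_clients(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Largest number of prefork child processes that fits in physical RAM.

    reserved_mb is memory set aside for the OS, the database, and anything
    else on the box; per_process_mb is the footprint of one Apache child.
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    return max(available_mb // per_process_mb, 0)


# Hypothetical 8 GB server: 2 GB reserved, 64 MB per Apache child process.
print(max_clients(total_ram_mb=8192, reserved_mb=2048, per_process_mb=64))
```

Treat the result as a ceiling rather than a target. Setting the directive higher than this risks swapping, which hurts response times far more than turning away a few requests.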
There is some risk here that you will have more incoming requests than the server can handle. In this case, DreamFactory will issue an exponential backoff message telling the client to try again later. But still, the total number of transactions will be limited. Node.js can potentially handle a very large number of simultaneous requests with event-based callbacks, but in that situation you are stuck with a single thread for all of the data processing.
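On the client side, handling that "try again later" message usually means retrying with exponential backoff. Here is a minimal Python sketch of the idea. I am assuming the overload signal arrives as an HTTP 503 status, which is a guess rather than documented DreamFactory behavior, and `send` is a hypothetical callable that performs the request and returns the status code.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it stops reporting overload or attempts run out.

    The wait doubles after each overloaded response (0.5 s, 1 s, 2 s, ...),
    is capped at max_delay, and gets up to 10% random jitter so that many
    clients do not all retry at the same instant.
    """
    overloaded = 503  # assumed overload status code, not confirmed by the source
    status = send()
    for attempt in range(1, max_attempts):
        if status != overloaded:
            break
        delay = min(base_delay * 2 ** (attempt - 1), max_delay)
        sleep(delay + random.uniform(0, delay / 10))
        status = send()
    return status
```

The `sleep` parameter exists so the function can be exercised without actually waiting. In a real client you would leave it at the default.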
If you are expecting a massive number of incoming requests, then you should consider running DreamFactory on an NGINX server with PHP-FPM instead of Apache. NGINX can maximize the requests per second that the hardware can handle while reducing the memory required for each connection. This is a “best of both worlds” scenario that allows a conventional web server to handle massive transaction volumes without the processing bottleneck of Node.js.
Horizontal Scaling
This section discusses ways to use multiple servers to increase performance. The simplest model is just to run DreamFactory on a single server. When you do a Bitnami install, DreamFactory runs in a LAMP stack with the default SQL database and some local file storage. The next step up is to configure a separate server for the default SQL database. There are also SQL databases that are available as a hosted cloud service.
Multiple Servers
You can use a load balancer to distribute REST API requests among multiple servers. A load balancer can also perform health checks and remove an unhealthy server from the pool automatically. Most large server architectures include load balancers at several points throughout the infrastructure. You can cluster load balancers to avoid a single point of failure. DreamFactory is specifically designed to work with load balancers and all of the various scheduling algorithms. A REST API request can be sent to any one of the web servers at any time and handled in a stateless manner.
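To illustrate why stateless request handling matters, here is a toy round-robin scheduler in Python that skips servers marked unhealthy. It is a sketch of the general technique, not of any particular load balancer product, and the class name and host names are hypothetical.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers: List[str] = list(servers)
        self._healthy: Set[str] = set(self._servers)
        self._next = 0

    def mark_unhealthy(self, server: str) -> None:
        # A failed health check removes the server from rotation.
        self._healthy.discard(server)

    def mark_healthy(self, server: str) -> None:
        # A passing health check puts a known server back in rotation.
        if server in self._servers:
            self._healthy.add(server)

    def pick(self) -> str:
        # Look at each server at most once, starting where we left off.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers in the pool")


# Hypothetical pool of three DreamFactory web servers.
pool = RoundRobinBalancer(["df-web-1", "df-web-2", "df-web-3"])
pool.mark_unhealthy("df-web-2")
print([pool.pick() for _ in range(4)])
```

Because no request depends on which server handled the previous one, dropping a server from the pool does not break any client session.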
Shared Local Storage
All of the web servers need to share access to the same local file storage system. In DreamFactory Version 1.9 and below, you will need a shared “storage” drive mounted with NFS or something similar. DreamFactory Version 2.0 and higher supports a more configurable local file system. The Laravel PHP Config File specifies a driver for retrieving files and this can be on a local drive, NFS, SSHFS, Dropbox, S3, etc. This simplifies multiple server setup and also PaaS delivery options.
Multiple Databases
The default SQL database can be enhanced in various ways. You can mirror the database, create database clusters for enhanced performance, and utilize failover clusters for high-availability installations. A full discussion of this topic is beyond the scope of this blog post but some additional resource links are provided below.
Performance Benchmarks
Here are some results from Apache Benchmark for DreamFactory running on an NGINX server. The REST API was called 500 times by 100 concurrent users. A large array of JSON objects was returned for each request. Three different Amazon Web Services EC2 instances were tested. The first one was m3.medium, and then m3.large, and finally m3.xlarge.
In these tests, doubling the number of processors doubled the requests per second that the platform could handle. Scaling DreamFactory for very large installations can be accomplished with additional processors on the same box or with additional servers and a load balancer.
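If throughput keeps scaling roughly linearly as servers are added, which is an assumption that will eventually break down at the database or the network, a first-pass capacity estimate is simple division. The Python sketch below uses made-up numbers for illustration. They are not benchmark results.

```python
import math


def servers_needed(target_rps: float, per_server_rps: float) -> int:
    """Servers required behind a load balancer, assuming linear scaling."""
    if target_rps < 0 or per_server_rps <= 0:
        raise ValueError("target_rps must be >= 0 and per_server_rps > 0")
    return max(1, math.ceil(target_rps / per_server_rps))


# Hypothetical: 20,000 requests per second, 1,500 per server.
print(servers_needed(20_000, 1_500))
```

In practice you would add headroom on top of this figure so the pool can absorb a failed server or a traffic spike.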
Cloud Scaling
Most of the Infrastructure as a Service (IaaS) vendors have systems that can scale web servers automatically. For example, Amazon Web Services can scale EC2 instances with Auto Scaling Groups and Elastic Load Balancers. Auto scaling is built into Microsoft Azure and Rackspace as well. If you want to deploy in the cloud, then please check with your vendor for the options they support.
We discussed Platform as a Service (PaaS) deployment options earlier. These systems do not support persistent local file storage, but the trade-off is that your application instance is highly scalable. You can simply specify the maximum number of instances that you would like to run. As traffic increases, additional instances are brought online. If a server stops responding then the instance is simply restarted.
Scaling Up and Out
Scalability is a key concern for many backend systems. And with large-scale mobile, web, and IoT applications potentially requiring many tens of thousands of transactions per second, we designed DreamFactory to be exceptionally scalable and to give users the freedom to control many of the scalability variables. As I mentioned, DreamFactory is designed to be scaled like a simple website. It supports standard practices for scaling up with additional server capabilities and out with additional servers. We also provide installers or installation instructions for all of the IaaS and PaaS clouds, and some of these vendors handle scaling for you automatically.
The resources below will help you get the most out of your DreamFactory implementation:
Scaling High Traffic Drupal Websites
Automatic Scaling on Microsoft Azure
Automatic Scaling on Amazon Web Services
Automatic Scaling on Rackspace
Installing DreamFactory on NGINX
Laravel Documentation for Shared File Storage
Database Mirroring and Failover Cluster Instances