AWS Redshift – SQL Functionality on Planet-Scale Hardware

The Problem

Your manager’s peers have been bragging a lot lately about their data warehouses, analytics, and charts, and now a steady stream of data-related questions are being sent your way.  Your department maintains several databases, and the data they contain has the potential to answer everything management is asking for. But the databases are needed for day-to-day operations, and can’t scale to answer these often highly specific questions such as, “How many asparaguses were consumed by men named Fonzie in Cleveland on Tuesdays in 2013?”. How to unlock the potential of this data?

You’ve probably heard of data warehouses, which are tailor-made for this sort of witchcraft. They make it possible to unlock every bit of value from data, and find answers wickedly fast. In the past, creating and maintaining data warehouses meant large, ongoing investments in hardware, software, and people to run them. This would be a hard sell – isn’t the company already spending enough?! Good news, however! In this day of cloud computing, it’s incredibly simple to create, load, and query data warehouses. They typically charge on a usage basis, meaning you don’t need the initial upfront capital investment to get off the ground. And they are super fast – far more powerful than anything you could run in-house.

This post will focus on Amazon Web Services Redshift (Amazon Web Services = AWS). And as a bonus, I’ll demonstrate the incredible Dreamfactory, which automatically builds a slick REST API interface over the top. From there, you’re a GUI away from giving management everything they could ask for, and wowing them with extras they hadn’t even thought of. They can now stand tall amongst their fellow executives, knowing you have their back.

AWS Redshift

AWS Redshift is built upon PostgreSQL, but has been dramatically enhanced to run at “cloud scale” within AWS. There are a few ingredients to this secret sauce:

Column-oriented storage

While you don’t need a deep understanding of what’s happening under the hood to use it, Redshift employs a fascinating approach to achieve it’s mind-boggling performance. Let’s say you have data that looks like the following:
ID NAME CREATED DESCRIPTION AMOUNT

1 Harold 2018/01/01 Membership 10.00

2 Susan 2017/11/15 Penalty 5.00

3 Thomas 2016/10/01 Membership 8.00
Most SQL databases you’ve probably used in the past are row-based, which means they store their data something like this:
1:Harold,2018/01/01,Membership,10.00;

2:Susan,2017/11/15,Penalty,5.00;

3:Thomas,2016/10/01,Membership,8.00;
This is the efficient way to maximize storage, and works well for retrieving data in the “traditional fashion” (rows at a time). But when you want to slice and dice this data, it doesn’t scale very well. If you’ve got large (business-scale) volumes of data, and a variety of ways you want to query it, you can really start to strain your database. Column-based databases, on the other hand, flip this idea on its head, and store the information in a column-based format, with the *data* serving as the *key*. So the above might look something like this:
Harold:1;Susan:2;Thomas:3;

2018/01/01:1;2017/11/15:2;2016/10/01:3;

Membership:1,3;Penalty:2;

10.00:1;5.00:2;8.00:3;
This drastically improves query performance. For example, when searching for “DESCRIPTION == ‘Membership'”, the query only needs to make one database call (“give me the items with a ‘DESCRIPTION’ of ‘Membership'”), instead of inspecting each row individually (as it would have to do in a traditional, row-based database). Very cool, very fast!

Massive Parallelization

When I picture what the AWS cloud must look like, I usually conjure something up from the Matrix (except it’s full of regular computers, rather than, well, humans). Or maybe Star Trek’s “Borg”, a ridiculous planet-cube flying through space, sucking up other civilizations. I guess both of those images are a little disturbing. A safer mental image is this – data centers spanning the globe, loaded with racks and racks of computers, all connected and working together. For most computing tasks, throwing more hardware at the problem doesn’t automatically increase performance. There are bottlenecks that remain in place no matter how many processors are churning away. In our “traditional database” example, this bottleneck is typically disk I/O – the processors are all trying to grab data from the same place. To overcome this, the architecture and storage have to be arranged in a way that can benefit from parallelization. Which is exactly the case with AWS Redshift. Due to the column-based design described above, Redshift is able to take full advantage of adding processors, and it’s almost linearly scalable. This means if you double the number of computers (“nodes”, in Redshift-speak), the performance doubles. And so on. Combine this scalability with the ridiculous number of computers AWS has at it’s disposal (specifically, several Borgs-worth), and it’s like staring out at a starry night. It goes on forever in all directions.

How this works for you

If you’re sold on the power of AWS Redshift, then you’ll be pleased to learn that setup is incredibly simple. AWS documentation is top notch, a crucial thing in this brave new world. When writing this post, I followed their tutorial, and it all went smoothly. Probably took me 15 minutes, and I had the example up and running. If you already have SQL expertise, you won’t have any problem picking up Redshift syntax. There are some differences and nuances, but the standard “things” (joins, where clauses, etc) all work as expected. I typically use Microsoft’s SQL Server Management Studio (SSMS), and was able to connect to Redshift with no problem (after setting it up as a linked server). Your favorite SQL client will presumably work here as well (anything that supports JDBC or ODBC drivers). Once you get your feet wet, there are myriad tools that will load your business data into Redshift. If you’ve got SQL chops in house, I’d start with the AWS documentation, and go from there. If you need a little (or a lot) of help, a whole ecosystem of companies and tools have sprung up around Redshift. A quick Google search will introduce you to them. When you’re up and running, and growing more comfortable demanding more from the system, AWS makes it incredibly simple to add capacity. Thanks to the brilliant Redshift architecture, you just add nodes, and AWS takes care of the rest. Their billing dashboard will show you what it’s costing in real time, with no hidden or creeping costs of data centers, hardware upgrades, things going bump in the night, etc. So much magic happening under the covers, and you get the credit. The joys of cloud computing!

My Humble Example

When writing this, I used the example AWS provides (it consists of a few tables containing some fake Sales data). With everything in place, I can query from SSMS (with a little bit of “linked server” glue syntax):
exec ('-- Find total sales on a given calendar date.

SELECT sum(qtysold)

FROM sales, date

WHERE sales.dateid = date.dateid

AND caldate = ''2008-01-05'';') at redshift

sum

--------------------

210

(1 row affected)
I get a thrill when a chain of systems, architectures, and networks all flow together nicely. Somewhere in a behemoth of a data center, a processor heard my cry, and spun out this result in response. Amazing.

DreamFactory

Now that the company has access to the data, and can gleefully ask any question, they are going to want the dashboards and pretty graphs. Typically you’d use a REST API to feed the data to some sort of UI, but how to do this with Redshift? While management is tickled with their new toy, they will cloud over with suspicion if you now propose a months-long project to make it shinier. In keeping with the theme of “easy, automatic, and powerful”, I’d propose using DreamFactory. In a matter of minutes (literally), it will connect to a data store (both SQL or NoSQL), intelligently parse all the schema, and spin up a REST API layer for doing all the things (complete with attractive documentation). What used to take a team of developers months can now happen in an afternoon! Here are some screenshots of my REST API, completely auto generated from the Redshift example above. It took me about 15 minutes (12 of those spent poking around the documentation) to get this done. For my simple example, I followed their Docker instructions, and in no time was playing with the REST API depicted below:
let’s get our rest on!
 
what pretty documentation you have!
  Powerful stuff!

To Infinity and Beyond!

Now that you’ve witnessed how easily you can warehouse all your data, and bootstrap it into a REST API, it’s time to bring this to your organization. Play with it a little, get comfortable with the tools, then turn up the dials.   Want to learn more about how DreamFactory and Redshift can work together (or how to put a REST API on any database)?  Schedule a demo with us. The next time management comes calling for data, you can give it to them with a fire hose!

Why You Shouldn’t Build Your Own REST API

 
BenBusseWhat’s the story behind the DreamFactory Services Platform? We make applications ourselves on cloud platforms like Salesforce, Windows Azure, and AWS. Every new application we created for our customers required the same manual steps:
  • Set up backend databases, schema, and file storage
  • Create a user management system with secure authentication
  • Design and create backend services for data, files, and external APIs
  • Write our own REST API to access all these services
  • Integrate the frontend application with these backend services
  • Test all of that integration end-to-end
Ouch! All that time spent creating the backend services and API took away valuable time creating the actual application that customers would be using everyday. We searched for an open source solution that could solve this problem. Alas, it didn’t exist. So we decided to build it. We realized that other app developers faced the exact same problems and could benefit from our work. So it made perfect sense to open source it. Many of our customers are large enterprises with sophisticated requirements, especially around security. The platform had to satisfy several goals:
  1. Dramatically simplify life for frontend developers. The platform should eliminate the need to write any server-side code.
  2. Support HTML5 and native mobile applications running on performance and bandwidth-constrained phones and tablets.
  3. Provide a comprehensive palette of backend services and a unified REST API to power sophisticated, data-driven applications at scale.
  4. Provide world-class security that large enterprises could adopt.
  5. Provide open source flexibility. A developer or sys admin should be able to install the DreamFactory software package in the cloud or on premise.
The unified REST API mentioned in goal 3 above is a key feature of the platform. Now you don’t have to write your own REST API. It’s automatically created for every backend service that your application needs. The API includes 123 standard GET, POST, PUT, and DELETE calls for:
  • /user – 11 API calls for user authentication, registration, profiles, and sessions
  • /system – 45 API calls for managing apps, app groups, email, roles, services, and users
  • /app – 16 API calls for application containers, files, and folders
  • /db – 8 API calls for database CRUD operations
  • /doc – 16 API calls for document containers, files, and folders
  • /email – 1 API call to send email
  • /lib – 16 API calls for lib containers, files, and folders
  • /schema – 10 API calls for managing schema
And every time you add a new service, the corresponding REST API for that service is automatically created and documented. For example, say you connect to a MongoDB database with DreamFactory. updateservice The new API ‘/mongo’ is automatically created and documented. Presto, now you have a REST API to access your remote MongoDB database from the client! mongodb Before you start building anything, spend 5 minutes browsing the API. The API is documented with an awesome tool called Swagger. Swagger lets you try out live API calls right in your browser.  It’s interactive, so you can quickly learn the capabilities of the API without writing a line of application code. Try it out! You can browse the API in two places: on our website and in the API Documentation tab of the admin console. apidocumentationtab Also check out Jason’s recent blog post on getting started with the DreamFactory API. Have fun with the API and let us know what you think!

Why We Like Swagger for API Docs

 
LeeHicksNothing makes a REST API easier to use than good documentation. Well, nothing except maybe a live test environment right there in the documentation. And help with generating client side code to use your API would be awesome too. Come to think of it, having the documentation and test environment dynamic enough to update as you on-board more services would be a major plus. Well, you get all that and more with the latest DreamFactory DSP REST API with Swagger. Portable network graphics Why Swagger We wanted our REST API, and that of any added web services, to be easy to understand, quick to test, and simple to use right out of the box. The Swagger framework solves our server, client, documentation and testing sandbox needs, all in a language-agnostic specification, with plenty of open-sourced server and client side resources already available to help with generation. It also comes with an open-sourced front end, the Swagger UI framework, which quickly allows developers to work with the API, giving them a clear picture of how the API responds to requests with various parameters and options. It helps us to provide what a developer needs to get an app up and running with powerful web services with little time and effort. How does DreamFactory use Swagger? Documenting static services of our API is as easy as annotating server-side code, just as you would use javadoc or phpdoc. You define the API services, including path variables, query parameters, and even members of the posted body data, along with models of any data types passed between client and server. Documentation is then easily generated in various formats (json, xml, etc) and cached for speedy access from clients. We then generate client-side code from the resulting documentation for our API. We use this generated code in our apps, as well as, presenting them as SDK jump-starters for app developers using our platform. At DreamFactory, we added a little twist by allowing web services to be added or removed dynamically. For any non-native service that is added to your platform’s API, a json or xml Swagger document meeting the specification can be added and/or modified to define that service, immediately making it testable from the included Swagger UI framework and usable by client-side generated code. Want to see our REST API in action using Swagger? Try it here. Or create your own free DreamFactory Services Platform and develop until your hearts content.

Shedding Development Pounds

 
BenBusseMuch has been written about how cloud computing, open source software, and programming frameworks have reduced software development costs exponentially in the last decade. As more than a few pundits have said, “anyone with a good idea and an Amazon Web Services account can now create the next Facebook!” Sure, the hyperbole is for dramatic effect, but the underlying point is true: anyone with programming skills can now create a web or mobile application faster and more cheaply than ever before. The benefits of these advances are clear in the world of startup companies and developers of consumer applications. But how do enterprises, with a far more complex set of requirements and stakeholders, benefit? How do enterprises leverage “agile” and “lean” software development practices, tools, and infrastructure to be more competitive and meet the increasingly demanding expectations of customers, business partners, and their own employees? And how does the technology shift to cloud infrastructure relate to the explosion of “bring your own device” mobile usage at work? There’s no single silver bullet answer to these questions, but several important trends are relevant to how developers of business apps should be thinking about the intersection of cloud technologies and mobile to make their projects successful.
  • HTML5 is ready for prime time as the best front-end standard for companies that need to build and deploy applications that work on desktop computers, tablets, and phones. HTML5 enables companies to write one application, instead of having to write entirely separate applications for the desktop, iOS, Android, Windows, and other mobile OS platforms.
  • Front-end mobile development toolkits like PhoneGap and Sencha and frameworks like jQuery Mobile and Angular.js will gain market influence since they complement platform- independent development across mobile and web. These toolkits enable developers to package a single application for installation on all mobile operating systems. Users can easily download and install these packaged apps onto their mobile devices just as they would a normal “native” mobile application from an app store.
  • Likewise, the mobile back-end stack for mobile applications will largely be open source,  standards-based, pre-configured, highly secure, and architected specifically for mobile usage. And the mobile back-end will not be subject to someone else’s control. You can install it on your own cloud infrastructure or data center. This is the area that we at DreamFactory are tackling. We’ve written more about this in a blog post about the evolution and design goals of the Dreamfactory Services Platform.
  • Developers will benefit enormously from both front-end and back-end simplification and standardization. Now an individual or small team of application developers can quickly create a prototype app, iterate on the design, create the application, and deliver it to end users in a production environment. Standardization also enables reusable design patterns (and reusable source code in many cases). We’ve written more about this in a blog post about the rise of the front-end developer.
  • The promise of these game changing trends is exciting but enterprises are by necessity risk averse and therefore reluctant to introduce new technologies that haven’t been battle tested in production environments. Enterprises need the flexibility and control to use the core back-end infrastructure that suits their application requirements, whether public cloud, hybrid public-private cloud, or on premise. Back-end service platforms like DreamFactory must be easily installed on the desired server infrastructure and must also integrate seamlessly with existing IT development, deployment, scaling, and security tools and practices. We’ll be writing a lot more about this topic in future posts. Stay tuned!

Getting Started: DSP Account Setup Screencast

 
BenBusseWe’ll be posting many screencasts to help you quickly learn how to use DreamFactory to build great apps. The first one to watch is how to sign up and get started building apps with your free trial of the DSP.

In our first ever Account Setup screencast, Todd Appleton, manager of software engineering, explains all you need to know to get up and running with DreamFactory. In just over 7 minutes you’ll learn how to:

  • Create your Dreamfactory account
  • Spin up a new DSP free trial instance
  • Launch applications on your free trial instance

describe the image

After watching Tom’s quick screencast, you’ll be ready to explore the platform on your own and begin building your first application.

Once you’ve signed up for a free trial, don’t forget to check out more screencasts, explore the REST API, and get familiar with our documentation.

We love feedback! Let us know what you think in the comments.

The Rise of the Front-End Developer Part II

 
BenBusseWhat is the DreamFactory Services Platform? In a nutshell it’s a pre-built, open source back-end that you can install on any cloud infrastructure (AWS, Azure, RackSpace, etc.) or in your own data center. Released under the Apache license, the DSP is designed with the power, security, and flexibility to deploy a world-class mobile application. It includes a palette of secure services that you can access with REST (or SOAP) to fetch and return JSON (or XML) documents to your client app. You get a fully dedicated SQL database, standard BLOB storage, an external services interface, and highly configurable user management functions. The DSP works with everything from lightweight CRUD applications to the most sophisticated enterprise mobile apps. And you still get to use your client-side technologies of choice — whether HTML5 in the browser, hybrid, or native, it’s all good. There’s no need to master a brand new language or framework. It’s a great time to be a front-end developer. And we’re super excited to be a part of a new wave of mobile technologies that enable front-end developers to build amazing apps without relying on a “roll your own” back-end approach. If this sounds interesting, take the DSP for a spin by signing up for a free trial. After you sign up, check out our screencast tutorials, REST API, and developer documentation. You can also import some sample apps on our GitHub page to start learning how it all works. Let us know what you think by posting on the DreamFactory website forums, message us on Twitter, or reach out to our support team. We’re here to help! And stay tuned for future blog posts and screencasts that will dive into a lot more technical detail about how to build great mobile apps with the Dreamfactory Services Platform. Happy coding!