In this blog series we’ll try to explain DBcloudbin for our non-technical audience.
We believe it is a great product and we want to open it and allow a reasonable understanding for those that are not familiar with the data management and database concepts, helping with this blog. In this first post, we will deal with the basics, structured in 5 topics. In the next one, we will introduce the basic problem of many applications that DBcloudbin comes to solve. Let’s start…
1.- How a typical enterprise application works.
For those used to work with applications running in their phone or laptop, enterprise applications are substantially different in most cases. They usually run in a centralized infrastructure where an application server is executing the application intelligence and the users are connecting to it remotely, in many cases through an web interface (so accessing a ‘well-known’ web page). You enter the URL, log in with your credentials and start using it for your daily duties. This application infrastructure behind the scenes can be just one computer with all installed in, or dozens of servers with different roles and external storage infrastructure and communications networks for providing a complex IT service.
2.- How data is stored for an application working properly.
Any non-trivial application deals with data and is the result of taking some data as input, executing a defined process with it, and generating some output data. This data has to be stored somewhere. There are multiple options but the most common situation is having a database, that is in fact an subsidiary application that provides this capability.The database is able to provide some very interesting services on top of safely storing data as is providing a way to structure and query that information, validate its formal representation (e.g. make sure that a customer record is only stored if it contains an attribute with “First Name” and “Last Name”), handle the simultaneous access of several applications to the same data in a unambiguous way, among many others.
The most used database type is what we call relational database where data is structured in tables as in an Excel spreadsheet. Databases provide a way for applications to read and write data; since those operations need high flexibility it is done providing a formal language; this way, an application can ‘talk’ to the database an tell exactly what it needs.
The most used language is called SQL. In computers, the language normally has more strict rules than in human language, both syntactically and semantically. If an application request the “contracts closed on last Monday” our application should have beforehand instructed our database that there is something called “contract”, they have an attribute that can hold the value “closed” and other attribute that indicates when the operation was performed; otherwise, the application receives an error.
3.- How is data protected in a database.
Any database software (remember a database is just another application) will physically store the information in media based on their own proprietary criteria. Nobody else than the database software needs to know it. However, we need a way to store a copy of that data anywhere else to be able to restore the database content in the case of any disaster. This will require the integration of the database software with another software (provided by the database manufacturer or not) able to extract a copy of that data for protection in an independent media (this is called backup software). When the database is large, this process takes time. Restoring the database in the case of problems, will take also significant time, with the additional trouble that in that case we will not being able to have our application running (so, no service provided to our users).
4.- What are the types of storage.
Keeping it simple, we have three basic types of storage from an access perspective (from a physical perspective there are other classifications but it is less relevant for our purpose):
- Block storage: This is the older and most common type. In this case a repository of data is somehow assigned to a computer and the operating system of this computer will create the physical structure to handle raw data in this repository (or ‘disk’). This is what happens with the disk that our laptop has installed for operating and storing our documents. In large enterprise environments those disks (or many of them) are not physically inserted in a computer, but assigned to a computer from a centralized pool of storage accessed through a special storage network. This is the most common way of serving storage to the server where the database application is running; so the database application consumes this type of storage for saving the data that its client applications ask to be persisted.
- Networked File storage: In this case, the storage tier provides an additional level of service and is able to provide a filesystem, so a higher level way of structuring our files in folders, that can be accessed from several servers. It is commonly used for providing a file storage service to users where they can save their documents and other stuff. It uses protocols with similar primitives but differences for Windows and Linux systems that generate some interoperability challenges.
- Object storage: Is the newest type of storage where we can store content (objects) in a common namespace where objects are identified by its name. It is based on a significantly different approach in the sense that is not the operating system of the server that deals with the storage, but the applications running on top of the operating system, dialoguing directly with the storage gear using a defined protocol. The most common and becoming a de-facto standard is the S3 protocol, implemented by the S3 service (Simple-Storage-Service) provided by Amazon Web Services. Now, there are many different storage manufacturers providing a similar service, most of them implementing the same protocol for interoperability.
5.- How is all this related with the Cloud.
Cloud computing is a way to consume computing, storage and networking services without having to deal with the actual infrastructure. In general, the Cloud provider can deliver IT services at a many levels of abstraction (pure infrastructure, application platforms, end-user ready to consume software, …). Moving to the Cloud is in general a complex task for non-trivial workloads when we need to provide a business continuity to our IT services as it is the normal scenario in enterprises. Large databases that support critical LoB (line-of-business) operations are probably the toughest scenario since any large datasets have specific challenges starting by the nature that transferring large amounts of data requires a significant amount of time. Information is also a key asset for any company so it opens additional aspects as security, privacy, protection, ….
If you want learn more things, there are more articles talking about this topic. Check HERE