SAP HANA: IN MEMORY COMPUTING
SAP High Performance Analytic Appliance (HANA), or in-memory computing, is the latest technology offered by SAP. SAP HANA allows the processing of massive amounts of data in real time. Many large businesses using SAP record millions of transactions every year. These transactions are stored in the SAP database and can be used by various SAP modules, such as the SAP BI module, to generate reports for senior management for decision-making purposes.
It often takes a great deal of time to extract and process the millions of transactions recorded every year in order to produce the KPIs and other figures that management teams use to make effective business decisions. This often results in a time lag for reporting.
This lag can impact the efficiency and effectiveness of decision-making processes. SAP HANA, or in-memory computing, aims to overcome this problem by providing real-time analysis of transactional data, giving management teams access to the most up-to-date picture of their business. For example, SAP HANA allows marketing managers to access real-time sales data even before the customer has left the retail store.
SAP HANA uses the latest hardware and software technologies to achieve the objective of processing huge amounts of data in real time. SAP HANA is a technology comprising a set of tools; it does not replace the existing SAP modules in any way. It consists of tools for storing and retrieving data from the in-memory database (IMDB), a new proprietary SAP database optimized for in-memory computing. SAP HANA therefore makes the use of ERP systems even more meaningful and important for businesses.
TRADITIONAL COMPUTING VERSUS IN MEMORY COMPUTING
In traditional computing, the operating system, applications and other data are stored on spinning hard disks. This data is retrieved from the hard disk and passed to random access memory (RAM) as and when required, and then processed by the central processing unit (CPU). The main disadvantage of this model is the bottleneck created in extracting data from the spinning hard disks.
The spinning hard disk forms the secondary memory, which is very slow compared to RAM. In a traditional computing system, a great deal of time is therefore wasted extracting data from the hard disk and moving it to the fast memory, i.e. RAM.
SAP HANA removes this bottleneck by keeping the entire database in RAM.
As the data is already present in RAM, the CPU can start processing it directly without having to wait for it to be extracted from slow spinning hard disks. In in-memory computing, the RAM is the only source of data for the CPU.
Data in RAM is updated in real time as transactions are recorded in the SAP system. Periodically, the updated data is copied to hard disks, which act as secondary storage and as a backup in case of a power failure. This is necessary because RAM is 'volatile' memory, which loses its contents whenever power is lost, whereas a hard disk is 'non-volatile' magnetic storage with the ability to retain data even through a power failure.
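The periodic copy-to-disk idea described above can be sketched in a few lines of Python. This is an illustration of the concept only, not SAP HANA's actual persistence mechanism: an in-memory dict stands in for RAM, a JSON file on disk stands in for the non-volatile snapshot, and the function names (`record_transaction`, `savepoint`, `recover`) are invented for the example.

```python
# Sketch: volatile in-memory data with periodic snapshots to disk.
# Conceptual illustration only -- not SAP HANA's real mechanism.
import json
import os
import tempfile

in_memory_db = {}          # fast, volatile working copy (stands in for RAM)
snapshot_path = os.path.join(tempfile.gettempdir(), "savepoint.json")

def record_transaction(key, value):
    """Transactions update the in-memory copy immediately."""
    in_memory_db[key] = value

def savepoint():
    """Periodically copy the in-memory state to non-volatile storage."""
    with open(snapshot_path, "w") as f:
        json.dump(in_memory_db, f)

def recover():
    """After a power failure, rebuild the state from the last snapshot."""
    with open(snapshot_path) as f:
        return json.load(f)

record_transaction("order-1", 250.0)
record_transaction("order-2", 99.5)
savepoint()                    # snapshot taken at this point in time
record_transaction("order-3", 10.0)   # recorded after the snapshot

in_memory_db.clear()           # simulate power loss: RAM contents vanish
restored = recover()           # only data up to the last savepoint survives
print(sorted(restored))        # ['order-1', 'order-2']
```

Note that 'order-3', recorded after the snapshot, is lost on recovery; real systems close this gap with transaction logs written alongside the snapshots.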
SAP HANA HARDWARE
SAP HANA uses advanced hardware and the latest CPUs to provide extremely fast processing and real-time analytic abilities. The data to be processed is divided into many sets, which are then processed in parallel by the system. Each CPU has multiple cores, and each core has the ability to process data independently.
Multiple CPUs are combined to form a blade, and multiple blades make up an SAP HANA system. An example system would consist of four blades, two active and two on standby. Each blade would contain eight CPUs, and each CPU eight cores, each capable of processing data independently. The two active blades would therefore provide a total of 2 x 8 x 8 = 128 cores crunching data in parallel.
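The divide-and-process-in-parallel idea can be made concrete with a small sketch. The blade/CPU/core counts below are the example figures from the paragraph above, not a fixed HANA layout, and the sequential `map` merely simulates what the hardware would do on separate cores:

```python
# Sketch of parallel aggregation across cores, using the example
# figures above: 2 active blades x 8 CPUs x 8 cores = 128 cores.
ACTIVE_BLADES, CPUS_PER_BLADE, CORES_PER_CPU = 2, 8, 8
total_cores = ACTIVE_BLADES * CPUS_PER_BLADE * CORES_PER_CPU  # 128

data = list(range(1_280))  # toy workload: 1,280 values to aggregate

# Split the workload into one chunk per core.
chunk_size = len(data) // total_cores
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Each chunk would be summed independently on its own core; the partial
# results are then combined into the final aggregate.
partials = list(map(sum, chunks))
total = sum(partials)

print(total_cores)          # 128
print(len(chunks))          # 128 chunks, one per core
print(total == sum(data))   # True: the partials combine to the same result
```

The point of the sketch is that the final result is identical to a single sequential pass; the gain is that the 128 partial sums can be computed at the same time.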
The faster the clock speed of a processor, the more data it can process. Multiple cores are used because heat dissipation limits the clock speed attainable by a single processor.
As SAP HANA stores the entire database in RAM, data retrieval is roughly 50 times faster than retrieval from a spinning magnetic hard disk, reducing processing times by a comparable factor.
Hardware for SAP HANA is provided by all of the major vendors, such as HP, IBM, Dell and Hitachi. A typical hardware configuration for an SAP HANA system would be a 40-core, 64-bit, 2 TB server.
SAP HANA SOFTWARE
SAP HANA uses a proprietary SAP database to store data. This database differs from common databases such as Oracle or MySQL and is optimized for in-memory computing.
HANA stores data in columns instead of the conventional row-oriented layout. This allows the data to be read from RAM sequentially, which is much faster than reading non-sequential data.
This technology also allows totals and aggregates to be computed extremely quickly. For example, consider an SAP customer table that stores data about the sales orders for a particular customer. In a simplistic example (ignoring table keys), assume the table has 20 columns and a total of 1,000 records.
Each row represents one complete sales order for the customer, and one of the columns, say the 15th, stores the amount of the sales order. To find the total value of the sales orders entered over a specific period, a conventional row-oriented database reads the data row by row: for each row, the system scans from column 1 until it reaches column 15.
The system must therefore read all 1,000 x 20 = 20,000 values simply to extract the sales-order amounts before calculating the aggregate. In SAP HANA, because tables are read column-wise, only the 15th column is read and the total value of the sales orders is calculated from it. Only 1,000 values need to be read, as opposed to the 20,000 values read in a conventional database for the same activity.
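The row-store versus column-store comparison above can be demonstrated with a toy example. The layout (20 columns, 1,000 rows, the amount in the 15th column) mirrors the sales-order table in the text; the tables here are plain Python lists filled with dummy values, and the "values touched" counters are an illustration of access cost, not a real database measurement:

```python
# Toy comparison: aggregating one column in a row store vs a column store.
# 20 columns, 1,000 rows; the amount sits in the 15th column (index 14).
NUM_ROWS, NUM_COLS, AMOUNT_COL = 1000, 20, 14

# Row store: a list of 1,000 rows, each row holding 20 values.
row_store = [[1.0] * NUM_COLS for _ in range(NUM_ROWS)]

# Column store: the same table as a list of 20 columns of 1,000 values.
col_store = [[1.0] * NUM_ROWS for _ in range(NUM_COLS)]

# Row-wise aggregation: every row is visited and all 20 values are touched,
# even though only one value per row is actually needed.
values_touched_rows = 0
total_rows = 0.0
for row in row_store:
    values_touched_rows += len(row)   # the whole row is read
    total_rows += row[AMOUNT_COL]     # only the amount is used

# Column-wise aggregation: only the amount column is read at all.
amount_column = col_store[AMOUNT_COL]
values_touched_cols = len(amount_column)
total_cols = sum(amount_column)

print(values_touched_rows)   # 20000 values read in the row store
print(values_touched_cols)   # 1000 values read in the column store
print(total_rows == total_cols)   # True: both layouts give the same total
```

Both layouts produce the same aggregate; the column store simply reads 20 times fewer values to get there, which is the source of the speed-up described above.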