Data pipelines in data-driven companies

In recent years, companies have started to make the most of their data assets. This lets them use their resources better, personalize their portfolio and improve the quality of service; in other words, make smart decisions based on data.

To achieve this aim, they require modern data pipelines to quickly extract information from any source and then transform it into a usable form, at a corporate scale.

But these data pipelines must overcome two main obstacles:

  • Cost increases due to architecture complexity: Current technology forces companies to create complex architectures. McKinsey indicates that a midsize institution with $5 billion of operating costs spends more than $250 million on it and, therefore, it recommends simplifying data architecture.
  • Lack of agility: Complex architectures mean that the process is executed more slowly. McKinsey affirms that the correct architecture provides IT cost savings, productivity improvements, reduced regulatory and operational risks, and allows new capabilities and services to be delivered.

LeanXcale

LeanXcale is a database vendor whose database combines the fast data ingestion of a key-value data store, the ease of querying with SQL, and linear horizontal scalability.

This versatility makes LeanXcale an optimal database for data pipelines, since its capabilities allow the use of a simple architecture with a single database engine:

  • The key-value interface provides the capacity to ingest data very quickly from multiple sources.
  • Since it is an operational SQL database, it can transform the information into a useful format in parallel to the data ingestion.
  • Thanks to its linear scalability, corporations can use it to handle any volume of information with the same cost per TB.

Relying on LeanXcale results in a much simpler architecture that increases the agility of the solution and reduces its TCO.

Purpose of the document

Cost optimization is a business-focused discipline for reducing spending and, more generally, costs, while still maximizing business value. In the following sections we will describe the impact of LeanXcale’s usage in the following aspects:

  1. Reduction in license cost
  2. Reduction in infrastructure cost
  3. Increase in team performance
  4. Reduction in operational issues
  5. Opportunity cost: new services, products and business impact

This document explains how both the Business and IT teams can cut costs and grow revenues by means of LeanXcale, in comparison to the market leader in a hypothetical scenario.

Example scenario description

The legacy scenario

After a €5M investment in development, one of Acme Corporation’s services provides a yearly benefit of €12M.

This service requires an 18-hour daily analytical process. It captures 6 billion rows from several operational databases and then processes them into an exploitable format. Five per cent of the time, the process cannot complete its execution because of the narrow margin (18 hours of execution vs the 24 hours of the day). The process runs on a 96-core server in a hybrid cloud, using the market-leader database. An SQL application interacts with the process outcome.

In summary, the numbers are as follows:

  • Investment in the development of the solution: €5M
  • Yearly benefit of the service: €12M
  • Process duration: 18h/day
  • 6 billion rows captured and processed
  • 5% of executions fail
  • 96-core server with market-leader database

The LeanXcale scenario

LeanXcale’s architecture provides two main advantages: the efficiency of its storage engine and its horizontal scalability. According to LeanXcale customers’ experience, LeanXcale performs 72 times faster than the market leader. In other words, LeanXcale needs 15 minutes (18 hours / 72) to execute the same workload on the same 96-core server.

LeanXcale scales linearly, so using four 24-core servers would provide the same output as using a 96-core server. 

In this scenario, we will use three 8-core servers (equivalent to one 24-core server). As stated above, LeanXcale would take 15 minutes to run the process on 96 cores. Since we are using 24 cores, the process will take four times longer (96/24 = 4), that is, one hour.
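The runtime arithmetic above can be reproduced with a minimal sketch. The constants are the figures stated in this document; the function name is hypothetical, and the calculation assumes perfectly linear scalability.

```python
# Sketch of the scenario's runtime arithmetic (figures taken from the text).
LEGACY_HOURS = 18.0       # legacy process duration on the 96-core server
SPEEDUP = 72.0            # performance ratio reported in the text
REFERENCE_CORES = 96      # size of the legacy server


def leanxcale_hours(cores: int) -> float:
    """Estimated duration on `cores` cores, assuming linear scalability."""
    hours_on_reference = LEGACY_HOURS / SPEEDUP          # 0.25 h = 15 minutes
    return hours_on_reference * REFERENCE_CORES / cores  # fewer cores, longer run


if __name__ == "__main__":
    for cores in (96, 24):
        print(f"{cores} cores: {leanxcale_hours(cores) * 60:.0f} minutes")
```

With these inputs the sketch gives 15 minutes on 96 cores and 60 minutes on 24 cores.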

In summary, the numbers are as follows:

  • With market leader:
    • Hardware: 96 cores
    • Process duration: 18 hours
  • With LeanXcale:
    • Hardware: 24 cores (three 8-core servers)
    • Process duration: 1 hour

License cost reduction

Database licensing is one of the most significant cost items in a data-driven application.

LeanXcale’s pricing is a competitive subscription model that includes licensing, 24×7 support and access to version updates.

LeanXcale is billed by the number of effective physical cores that the deployment uses. For example, if LeanXcale runs in a four-core virtual server on a 512-core server, the customer only pays for those four cores. If the four cores are virtual, they correspond to two physical cores with hyperthreading, so the customer only pays for the two physical cores.

List prices are:

  • 8×5 support: €1,500 per core and year
  • 24×7 support: €3,000 per core and year

Additionally, LeanXcale can also be acquired under a traditional perpetual license:

  • License: €7,500 per core
  • Support: €1,500 per core and year

Scenario cost analysis

In the tables below, you can see the cost of the legacy scenario compared to LeanXcale with the subscription model, and LeanXcale with the perpetual model.

Table 1: Legacy vs LeanXcale with subscription model
Table 2: Legacy vs LeanXcale with perpetual model

*Legacy prices can be found here: https://www.oracle.com/us/corporate/pricing/technology-price-list-070617.pdf

In both scenarios, LeanXcale costs about 4% of the equivalent legacy scenario (24.18 times less). This represents a saving of €8.3 million over five years.
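As a rough check of these figures, the sketch below applies the list prices to the scenario. It assumes that all 24 cores of the LeanXcale scenario are billed and that the subscription uses the 24×7 tier; the legacy cost is not taken from a price list but implied from the 24.18 ratio stated above. Function names are hypothetical.

```python
# Five-year license cost under both LeanXcale pricing models (EUR).
CORES = 24                  # assumption: every core in the scenario is billed
YEARS = 5

SUBSCRIPTION_24X7 = 3_000   # per core and year
PERPETUAL_LICENSE = 7_500   # per core, one-off
PERPETUAL_SUPPORT = 1_500   # per core and year
COST_RATIO = 24.18          # legacy cost / LeanXcale cost, as stated in the text


def subscription_cost(cores: int, years: int) -> int:
    return SUBSCRIPTION_24X7 * cores * years


def perpetual_cost(cores: int, years: int) -> int:
    return PERPETUAL_LICENSE * cores + PERPETUAL_SUPPORT * cores * years


def implied_saving(leanxcale_cost: float) -> float:
    """Saving implied by the stated ratio, not by a legacy price list."""
    return leanxcale_cost * COST_RATIO - leanxcale_cost


if __name__ == "__main__":
    sub = subscription_cost(CORES, YEARS)
    perp = perpetual_cost(CORES, YEARS)
    print(f"Subscription: {sub:,} EUR, perpetual: {perp:,} EUR")
    print(f"Implied five-year saving: {implied_saving(sub):,.0f} EUR")
```

Under these assumptions both models come to €360,000 over five years, which is why the two comparisons yield the same ratio, and the implied saving is roughly €8.3 million.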

Infrastructure reduction

When there are performance issues with applications, processes or databases, often the easiest solution is to throw more hardware at the problem. A lack of database efficiency makes companies overprovision their environment in order to support minimum performance levels. However, this strategy is very expensive.

LeanXcale can perform 72 times faster than the market leader on the same infrastructure. This efficiency greatly reduces the processing time, so the underlying hardware can be released for a different use. Alternatively, the process can be executed with less hardware.

Scenario cost analysis

Table 3: Legacy vs LeanXcale infrastructure cost

* Prices from AWS Frankfurt as of 30/3/2021

In cloud environments, LeanXcale’s performance ensures a proportional reduction in the infrastructure bill. In short, the LeanXcale scenario is 72 times more affordable.

An interesting consequence of LeanXcale’s linear scalability is that, for any combination of servers, the execution cost is the same as long as the hardware cost is proportional to the processing capacity. For instance, if you use six 8-core servers (twice as many as in this scenario), the execution time is halved and the run is again 72 times more affordable than the legacy one. So, with no extra cost impact, you can choose different combinations according to your economic interest (e.g., reusing old hardware or using small, cheap servers) and your time goal (e.g., having results in 30 minutes).
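This cost invariance can be illustrated with a short sketch. The price per core-hour is a hypothetical placeholder, not an AWS quote; the sketch assumes pay-per-use billing proportional to cores and perfectly linear scalability, and its function names are illustrative only.

```python
# Cost of one execution for different cluster sizes, assuming linear scaling.
PRICE_PER_CORE_HOUR = 0.06                 # hypothetical EUR price, NOT a real quote

LEGACY_CORES, LEGACY_HOURS = 96, 18.0      # legacy scenario from the text
LEANXCALE_CORE_HOURS = 24 * 1.0            # 24 cores for 1 hour, from the text


def leanxcale_run(servers: int, cores_per_server: int = 8) -> tuple[float, float]:
    """Return (hours, cost) of one execution on `servers` servers."""
    cores = servers * cores_per_server
    hours = LEANXCALE_CORE_HOURS / cores   # more cores, proportionally less time
    return hours, cores * hours * PRICE_PER_CORE_HOUR


def legacy_cost() -> float:
    return LEGACY_CORES * LEGACY_HOURS * PRICE_PER_CORE_HOUR


if __name__ == "__main__":
    for servers in (3, 6):
        hours, cost = leanxcale_run(servers)
        ratio = legacy_cost() / cost
        print(f"{servers} servers: {hours:.2f} h, {ratio:.0f}x cheaper than legacy")
```

Whatever price is plugged in, the ratio stays at 1,728 legacy core-hours to 24 LeanXcale core-hours, that is, 72.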

Team performance

Data scientists, data analysts or actuaries need to work with the output data to create models that represent reality. These models are created incrementally over several iterations: observe the data, improve the model, train the model, and test it. A big portion of this loop is performed by machines, and the team’s performance relies heavily on the length of these periods: the shorter the process is, the more productive the team becomes. Because LeanXcale’s execution is much faster than the legacy one, the team’s performance increases.

Scenario cost analysis

Team cost analysis
Table 4: estimated salaries (in euros)
Table 5: Performance impact analysis

Let’s assume that a data scientist spends 25% of their working time waiting for the legacy execution. The execution time reduction produced by LeanXcale’s performance can increase the team’s performance by about a third: the time spent waiting for results is 80 times lower with LeanXcale, so the total occupation time increases from 75% to 99.7%.
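The occupation figures follow from a simple calculation, sketched below with the two inputs given in the paragraph above (25% waiting time, 80 times less waiting). The function name is hypothetical.

```python
# Productive share of working time before and after the waiting time shrinks.
WAIT_SHARE = 0.25       # share of time spent waiting on the legacy process
WAIT_REDUCTION = 80     # factor by which waiting time drops, as stated in the text


def productive_share(wait_share: float, reduction: float = 1.0) -> float:
    """Share of working time not spent waiting for results."""
    return 1.0 - wait_share / reduction


if __name__ == "__main__":
    before = productive_share(WAIT_SHARE)
    after = productive_share(WAIT_SHARE, WAIT_REDUCTION)
    print(f"Occupation: {before:.1%} -> {after:.1%}")
    print(f"Relative gain: {after / before - 1:.1%}")
```

The relative gain is the ratio between the two occupation levels, which is where the increase of roughly one third comes from.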

Operational issues

Long-running processes create many operational problems, because companies have no capacity to respond to unexpected events. This often means that a small failure leads to a full day with no service, with an obvious impact on the revenue stream and/or the reputation of the company.

With the infrastructure described above, LeanXcale can execute the process in one hour. Therefore, a heavy, whole-day process becomes a light one that can be executed several times during the same day. If something unexpected happens, you can simply run it again the same day.

Scenario cost analysis

Table 6: LeanXcale savings in operational issues

In a profitable service, even a small percentage of days with problems, for example 5%, has a severe cost in terms of revenue. However, this can be easily overcome with LeanXcale, making your business much more reliable.
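A back-of-the-envelope sketch of this cost, using the scenario figures (€12M yearly benefit, 5% failed executions). It assumes, as a simplification, that the benefit is spread evenly over the year and that a failed execution loses that day's share entirely; the function name is hypothetical.

```python
# Benefit lost to failed daily executions (EUR), under a simplified model.
YEARLY_BENEFIT = 12_000_000   # from the scenario
FAILURE_RATE = 0.05           # share of daily executions that do not complete
YEARS = 5


def lost_benefit(yearly_benefit: float, failure_rate: float, years: int) -> float:
    """Assumes benefit accrues evenly per day and a failed day earns nothing."""
    return yearly_benefit * failure_rate * years


if __name__ == "__main__":
    loss = lost_benefit(YEARLY_BENEFIT, FAILURE_RATE, YEARS)
    print(f"Benefit at risk over {YEARS} years: {loss:,.0f} EUR")
```

Under this model the benefit at risk is €600,000 per year, or €3 million over five years.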

Future needs and architecture scalability

It is often said that more than half of all historical data has been stored during the last two years. Data-driven companies are storing more and more information. The existence of competitors forces companies to keep developing new capabilities in order to fulfill customer demand. This combination of new demand and increasing data quantities means that data-driven companies need to process increasingly bigger volumes of information.

Some architectures cannot scale out and require an entire redesign to provide the required level of service.

Scenario cost analysis

There are several options:

  1. Keeping the process execution time under 18 hours requires vertical scalability. However, even assuming linear vertical scalability, there is a hard limit: for example, the AWS EC2 R family does not offer a server with more than 96 cores.
  2. With the same server type, the workload can grow until the execution reaches 24 hours, which allows for 33% more data (24/18).
  3. Beyond that, the platform will require either:
    • Full replication or a shared-disk architecture, with logarithmic scalability. The cost of each extra parallel transaction grows exponentially.
    • A full redesign (the current €5M development will become useless).
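The headroom left on the legacy server can be estimated as follows. This is a sketch that assumes the execution time grows linearly with the number of rows; the function names are hypothetical.

```python
# Growth headroom of the legacy process before it no longer fits in a day.
CURRENT_ROWS = 6_000_000_000   # rows processed per day, from the scenario
CURRENT_HOURS = 18             # current daily execution time
DAY_HOURS = 24                 # hard limit for a daily process


def headroom(current_hours: int, limit_hours: int = DAY_HOURS) -> float:
    """Relative growth possible, assuming runtime is linear in data volume."""
    return limit_hours / current_hours - 1


def max_rows(rows: int, current_hours: int, limit_hours: int = DAY_HOURS) -> int:
    return rows * limit_hours // current_hours


if __name__ == "__main__":
    print(f"Headroom: {headroom(CURRENT_HOURS):.0%}")
    print(f"Maximum rows per day: {max_rows(CURRENT_ROWS, CURRENT_HOURS):,}")
```

With the scenario figures, the ceiling is 8 billion rows per day, and that leaves no margin at all for reruns.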

LeanXcale is able to scale up to hundreds of nodes in a linear manner, as the graph below displays. It represents a TPC-C benchmark on clusters of 1, 20, 100, and 200 LeanXcale nodes. The performance of the 200-node cluster is 200 times that of a single node.

Figure 1: LeanXcale linear horizontal scalability

Using the scenario example, and assuming we use two hundred 96-core servers, the same architecture can manage 86.4 trillion rows: 14,400 times more than the market leader (72 times the performance × 200 nodes). In other words, the development investment keeps producing returns as data volumes grow, with no need for a redesign.
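The capacity figure can be checked with a few lines. The inputs are the numbers quoted in this document; the calculation assumes linear scaling across nodes and a fixed 18-hour window, and the function name is hypothetical.

```python
# Daily row capacity of a 200-node LeanXcale cluster versus the legacy server.
LEGACY_ROWS = 6_000_000_000   # rows the legacy 96-core server handles per day
SPEEDUP = 72                  # per-server performance ratio from the text
NODES = 200                   # number of 96-core servers in the cluster


def cluster_capacity(rows: int, speedup: int, nodes: int) -> int:
    """Rows processed in the same time window, assuming linear scale-out."""
    return rows * speedup * nodes


if __name__ == "__main__":
    capacity = cluster_capacity(LEGACY_ROWS, SPEEDUP, NODES)
    print(f"Scale factor: {SPEEDUP * NODES:,}x")
    print(f"Capacity: {capacity / 1e12:.1f} trillion rows")
```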

Opportunity cost and business impact

LeanXcale allows much more information to be processed at much higher frequencies. LeanXcale’s huge data processing capabilities enable a wide range of new data-driven business cases that cannot be addressed today. LeanXcale thus becomes an important lever for differentiation and optimization, with a direct impact on the market share that these services can expect.

Real-time services that analyze more information are possible now with LeanXcale:

  • Real-time analysis: e.g., instant ratings or real-time billing
  • Real-time forecasting: e.g., next-hour electrical consumption forecasts
  • Real-time detection: e.g., fraud detection
  • Processing of bigger volumes of information: e.g., analysis of several full economic cycles

The opportunity cost of not developing this kind of solution is difficult to estimate. As an illustration, assume that 10% of the current market share can be impacted by releasing these functionalities:

Table 7: LeanXcale savings in opportunity cost (in euros)
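One way to read this assumption is sketched below. It treats the 10% market share impact as a 10% change in the service's €12M yearly benefit, which is an interpretation rather than something the scenario states explicitly; the function name is hypothetical.

```python
# Opportunity cost estimate (EUR), assuming benefit scales with market share.
YEARLY_BENEFIT = 12_000_000   # from the scenario
IMPACTED_SHARE = 0.10         # share of the market assumed to be affected
YEARS = 5


def opportunity_cost(yearly_benefit: float, share: float, years: int) -> float:
    """Benefit tied to the impacted market share over the given period."""
    return yearly_benefit * share * years


if __name__ == "__main__":
    cost = opportunity_cost(YEARLY_BENEFIT, IMPACTED_SHARE, YEARS)
    print(f"Opportunity cost over {YEARS} years: {cost:,.0f} EUR")
```

With these inputs the estimate is €1.2 million per year, or €6 million over five years.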

Conclusion

LeanXcale is a database technology that performs exceptionally well in data pipeline use cases and can play a vital role in banking, insurance and other data-intensive verticals.

As described above, LeanXcale provides a significant reduction in total cost of ownership and, at the same time, a strong increase in the platform’s capacities. This is achieved by allowing more data to be processed more frequently.

A quick summary:

A 72-times improvement in performance that yields:

  • A 72-times reduction in infrastructure cost (€162.5K in five years).
  • Up to a 24.18-times reduction in the cost of licenses and support (€8.3M savings in five years).
  • An increase of 32% in team performance (€235K in five years).
  • Avoiding outages due to the small execution window (€3M savings in five years).
  • Capacity to scale out to more than 14,400 times the market leader (€5M savings by preserving the original development investment).
  • Capacity to improve services thanks to the ability to process more data with greater frequency (€6M more benefits in five years).

About the author

Mr. Juan Mahillo is the CRO of LeanXcale and a former serial entrepreneur. After selling and integrating several monitoring tools for the biggest Spanish banks and telcos with HP and CA, he co-founded two APM companies: Lucierna and Vikinguard. The former was acquired by SmartBear in 2013 and named a Cool Vendor by Gartner.