Photo by Marek Piwnicki on Unsplash

Data business in applications: Java or SQL?

The Java Trail

--

Data business in applications usually involves access to persistent data, data calculation and flow processing. The persistent data in database can be calculated in SQL, and the loop/if statement of stored procedures can perform flow processing, and JDBC (including ODBC) can integrate SQL in applications. Therefore, complex SQL (including stored procedures) is often used for data business development.

However, since complex SQL is deeply bound to database, there are defects in framework, resulting in a failure to meet the requirements of modern applications.

The defects of complex SQL (and stored procedures)

Difficult to scale

When using complex SQL or stored procedures to implement data business, the database will be the hub of pressure. Due to the inability to integrate the database into a mature framework, it is impossible to implement high availability and easy scalability through framework. When scaling the database itself, the cost is high whether scaling up or scaling out, which does not conform to the construction concept of modern applications.

Poor code migratability

Simple SQL code is highly universal and easy to migrate between databases, whereas complex SQL code is not.Since complex SQL code often uses special syntax (including functions) that binds a database, the code is difficult to migrate. As for stored procedures, there is not even a uniform standard, and the differences between them are even greater, making migration more difficult.

High coupling

Data business is to provide service for applications, preferably within application. Stored procedures, however, exists in the database, resulting in a too high coupling between them. In addition, the database is often shared, which will cause coupling between applications.

In order to make up for the defects of complex SQL, many applications start to use Java + simple SQL to implement data business, and there are two main categories of technologies to choose from: ORM and Stream. ORM mainly includes Hibernate and JOOQ; Stream is a library provided since Java8, based on which the Kotlin was developed. After adopting this method, a large amount of data computation and processing pressure is borne by application, and Java programs can be easily integrated into mature frameworks for low-cost scale. ORM and Stream are responsible for data calculation, while basic Java language is responsible for flow processing. Since all codes are written in Java, the migration is very good. In addition, this method also makes coupling low because the database is only used for storage, and all data businesses are concentrated in the application, making it possible to maintain separately, and naturally isolating data business between different applications.

Compared with complex SQL, this method can get better framework advantages, but it brings new defects.

The defects of Java + Simple SQL

Difficult to develop due to weak computing ability

Hibernate’s computing ability is far inferior to that of SQL, and many simple calculations cannot be described in Hibernate, including from subquery, calculation involving row number and so on. For example, the from subquery that is easy to implement in SQL:

select orderId, m from (select orderId, month(orderDate) m from OrdersEntity) t1

For more complex calculations, it is more difficult to describe in Hibernate. For example, Oracle SQL uses window functions to calculate the top 3 of each group:

select * from (
select *, row_number() over (partition by Client order by Amount) rn from Orders) T where rn<=3

Hibernate does not support many basic date functions and string functions, including: get a date before or after a specified date, get the ordinal number of a date in the year, get which quarter the date belongs to, string replacement, left truncation, get corresponding ASCII code, etc. Let’s take “get a date before or after a specified date” as an example:

select date_add(OrderDate,INTERVAL 3 DAY) from OrdersEntity

To achieve similar functionalities, two methods are available only: introduce dialect SQL or hard-code in Java. The former method requires binding to database, and is difficult to migrate the code, deviating from the original intention of ORM, while the latter one requires writing a huge amount of code.

JOOQ requires programmers to design SQL code first, then translate SQL code into JOOQ code, and finally parse Java code to SQL through engine for execution. If we want to get the computing ability close to dialect SQL, we need to use a lot of JOOQ functions bound to database, but this way does not solve the framework defects; To make up for the defects in framework, we have to use the general JOOQ function as much as possible, yet the computing ability will be greatly reduced. Java syntax is not suitable for expressing SQL. In order to express it correctly, JOOQ often over-encapsulates functions, making the code more complex than SQL and the actual computing ability lower than SQL.

For example: calculate the top 3 of each group:

//Equivalent SQL code can be found above. The code written in JOOQ binding Oracle is as follows
WindowDefinition CA = name("CA").as(partitionBy(ORDERS.CLIENT).orderBy(ORDERS.AMOUNT));
context.select().from(select(ORDERS.ORDERID,ORDERS.CLIENT,ORDERS.SELLERID,ORDERS.AMOUNT,ORDERS.ORDERDATE,rowNumber().over(CA).as("rn")).from(ORDERS).window(CA) ).where(field("rn").lessOrEqual(3)).fetch();

This code is significantly more complicated.

Stream provides streaming programming style, Lambda syntax, and set functions, making it possible to implement simple calculations on sets of simple data types (numbers, strings, dates). However, Stream is a general-purpose base-layer tool, and not professional enough in terms of record set, and its computing ability is far lower than that of SQL. Many basic calculations are difficult to implement in Stream, for example, grouping and aggregating

//Equivalent SQL:
select year(OrderDate), sellerid, sum(Amount), count(1) from Orders group by year(OrderDate), sellerid
//Stream:
Calendar cal=Calendar.getInstance();
Map<Object, DoubleSummaryStatistics> c=Orders.collect(Collectors.groupingBy(
r->{
cal.setTime(r.OrderDate);
return cal.get(Calendar.YEAR)+"_"+r.SellerId;
},
Collectors.summarizingDouble(r->{
return r.Amount;
})
)
);
for(Object sellerid:c.keySet()){
DoubleSummaryStatistics r =c.get(sellerid);
String year_sellerid[]=((String)sellerid).split("_");
System.out.println("group is (year):"+year_sellerid[0]+"\t (sellerid):"+year_sellerid[1]+"\t sum is:"+r.getSum()+"\t count is:"+r.getCount());
}

Kotlin improves Stream, and has more concise Lambda expressions and richer set functions. In addition, Kotlin adds Eager Evaluation (as opposed to Stream’s Lazy evaluation). But Kotlin is also a general-purpose base-layer tool designed for the calculation of simple sets, and not professional enough in record set, and still far lower than SQL in computing ability. For example, the basic grouping and aggregating:

data class Grp(var OrderYear:Int,var SellerId:Int)
data class Agg(var sumAmount: Double,var rowCount:Int)
var result=Orders.groupingBy{Grp(it.OrderDate.year+1900,it.SellerId)}
.fold(Agg(0.0,0),{
acc, elem -> Agg(acc.sumAmount + elem.Amount,acc.rowCount+1)
})
.toSortedMap(compareBy<Grp> { it. OrderYear}.thenBy { it. SellerId})
result.forEach{println("group fields:${it.key.OrderYear}\t${it.key.SellerId}\t aggregate fields:${it.value.sumAmount}\t${it.value.rowCount}") }

The fundamental reason why libraries such as Hibernate, JOOQ, Stream and Kotlin lack sufficient computing ability is that their host languages are static compiled languages, which make it difficult to support dynamic data structure, and hence their expression ability is greatly limited. Libraries like JOOQ that barely support dynamic data structures must be written in a mixed form of static code and dynamic code (strings), such as T2.field(“continuousdays”). Once the business data becomes slightly more complex, the coding difficulty will increase sharply. The fundamental reason why SQL is so computationally powerful is that it is a dynamic interpreted language, and inherently supports dynamic data structures and has a high upper limit of expression ability.

Complicated to operate and maintain due to the difficulty in hot deployment

Due to the fact that compiled languages do not support hot deployment, it often needs to recompile after the code is modified, and relaunch the application, resulting in poorer system security and higher complexity on operation and maintenance.

esProc SPL solves everything

There is a better choice to implement data business: esProc SPL + simple SQL.

esProc SPL is an open-source data processing engine of Java, and its basic functions cover every stage of data business. SPL itself has data calculation and flow processing abilities. Simple SQL is responsible for reading and writing the database, and the front-end Java code is responsible for calling SPL through JDBC.

Read and write the database. SPL provides the query function to execute SQL to read the query on database as an internal table sequence (SPL’s structured data object).

T=db.query("select * from salesR where SellerID=?",10)

SPL provides the update function to save table sequence to database. SPL engine will compare the data before and after modification, and automatically parse to different SQL DML statements (insert, delete, update) and execute. For example, the original table sequence is T, and becomes NT after inserting, deleting and updating. The following code is to persist the changed results to database:

db.update(NT:T,sales;ORDERID)

Data calculation. Based on the table sequence, SPL provides a wealth of computing functions.

Filter: T.select(Amount>1000 && Amount<=3000 && like(Client,"*bro*"))
Sort: T.sort(-Client,Amount)
Distinct: T.id(Client)
Aggregate: T.max(Amount)
Associate: join(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept; sum(o.Amount))

Flow processing. Similar to Java’s for/while/if statement and stored procedure’s loop/if statement; SPL provides complete flow control capabilities. Branch judgment statement:

Loop statement:

JDBC driver. The data business code written in SPL can be saved as a script file, and Java invokes the script file name through JDBC driver, similar to calling a stored procedure.

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local:// ");
CallableStatement statement = conn.prepareCall("{call InsertSales(?, ?,?,?)}");
statement.setObject(1, d_OrderID);
statement.setObject(2, d_Client);
statement.setObject(3, d_SellerID);
statement.setObject(4, d_Amount);
statement.execute();

In addition to these basic abilities, SPL + simple SQL can also overcome various defects of complex SQL as well as Java + simple SQL, and has the advantages of low scale cost, good code migratability, low coupling, strong computing ability, and support for hot deployment.

Easy to scale

Using SPL to implement data business will put the pressure primarily on SPL. As a Java library, SPL can be seamlessly integrated in mature Java frameworks, making it easy to scale out.

Good code migratability

When using SPL + simple SQL to implement data business, the code focuses on data calculation and flow processing, which is implemented in SPL. SPL is database-independent, and its code can be seamlessly migrated between databases. Database read and write operations are implemented in simple SQL without involving dialect SQL, making it easy to migrate.

One of the original intentions of designing SPL is to facilitate migrating, and many tools are provided for this purpose. SPL encourages data fetching through data source name, so when migrating, we only need to modify the configuration file without modifying the code. SPL supports dynamic data source, and can switch different databases through parameters or macros, thus making migration more convenient. SPL also provides a standard SQL syntax that has nothing to do with specific database. Using the sqltranslate function can translate the standard SQL to mainstream dialect SQL, which is executed also through the query function.

Low coupling

The database is only responsible for storage, not for data business. Data business is implemented in SPL+ simple SQL and located in the same position as application. When the data business changes, we only need to modify the code in the application without having to maintain the database, so the coupling between them is low. SPL is a common Java library that can be deployed in different applications and is naturally isolated between applications.

Strong computing ability

SPL provides a wealth of computing functions, and can implement SQL-style calculations with intuitive and concise code.

Sub-query: Orders.new(OrderId,month(OrderDate):m).new(OrderId,m)
Group and aggregate: T.groups(year(OrderDate),Client; avg(Amount):amt)
Calculate the top 3 of each group: Orders.groups(Client;top(3,Amount))

SPL supports ordered computing, set computing, stepwise computing and association computing, and is suitable for simplifying complex data computation. The computing ability of SPL surpasses that of SQL, for example: calculate the maximum consecutive days that a stock keeps rising:

Another example, find out employees with the same birthday as someone else in a company:

SPL also offers richer date and string functions, far surpassing Java’s computing libraries and SQL in terms of number and functionality.

It is worth mentioning that in order to further improve development efficiency, SPL invents unique function syntax.

Support hot deployment

SPL is an interpreted language, and its code is placed outside the Java as the form of script file. SPL code can be executed without compiling, and takes effect immediately after modifying the script. In addition, SPL supports non-stop hot deployment, which is suitable for the changing business logic, and makes operation and maintenance complexity low.

SPL has other advantages: support full-featured debugging, including breakpoint, step in, entry, run to cursor, etc.; SPL code is usually as the form of script files, and can be stored in the directory of operating system to facilitate team code management; Since SPL code does not depend on Java, and the data business and the front-end code are physically separated, the code coupling is low.

--

--

The Java Trail

Scalable Distributed System, Backend Performance Optimization, Java Enthusiast. (mazumder.dip.auvi@gmail.com Or, +8801741240520)