Again, a misconception about ORM. A colleague of mine tried to query the database for a record based on a field of the same table. He used query by example (a feature in Toplink that finds the matching persistent object(s) when you provide a sample instance). I don't know whether this feature exists in other ORM technologies, e.g. Hibernate, but it's quite handy.
For example, let's say we have an Employee table and a Department table. The Employee has a DepartmentID as its FK field, referring to the PK field of the same name in the Department table.
To find a Department by its DepartmentID (PK field), all we need to do is as follows:
// initialize the query object
ReadObjectQuery query = new ReadObjectQuery();
// prepare the example instance
Department dept = new Department();
dept.setDepartmentID("someID");
// set the example
query.setExampleObject(dept);
// execute the query on the current DB session
dept = (Department) session.executeQuery(query);
It's quite simple and straightforward (as it should be). We retrieve a full Department object by performing such a query. Depending on our indirection settings, the Department object may or may not have resolved all of its foreign key fields.
Now, moving on to the next step. Let's try to find all Employees working in that Department (as specified by the DepartmentID). An obvious piece of code may look like what my colleague wrote:
// initialize the query object
ReadAllQuery q = new ReadAllQuery();
// prepare the example instance
Employee emp = new Employee();
emp.setDepartment(dept);
// set the example
q.setExampleObject(emp);
// execute the query on the current DB session
Vector employees = (Vector) session.executeQuery(q);
At first glance, there's nothing wrong with the code. The results show up correctly, too. So what's the problem? The problem appears when the Department table is big, with many fields and many foreign key fields. Behind the scenes, Toplink generates a SELECT statement for the query by joining all of the foreign tables and putting numerous WHERE clauses on all of the Department fields.
Why did this occur? Because we used the wrong approach here. Query by example reads all non-null values from the example object and creates at least as many WHERE clauses as there are non-null values in the example object. Query by example is a convenient and suitable approach for queries that involve the PK field of the table, or a small number of known field values. For a case like the one above, a different approach needs to be chosen.
My colleague performed the query by example on his table (which is more complex than the example above), and I had to wait 30 seconds to see the results. I changed the code to use query by criteria instead, and the wait dropped to 3 seconds.. :D
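For reference, the query-by-criteria version expresses the same search as a single criterion on the FK mapping. This is a sketch in the style of the fragments above, not runnable standalone: the mapping name "department" is assumed from the example, and TopLink's ReadAllQuery and ExpressionBuilder classes must be on the classpath:

```java
// query by criteria: one WHERE clause on the FK column only,
// instead of one clause per non-null field of the example object
ReadAllQuery q = new ReadAllQuery(Employee.class);
ExpressionBuilder emp = new ExpressionBuilder();
// "department" is the name of the one-to-one mapping on Employee (assumed)
q.setSelectionCriteria(emp.get("department").equal(dept));
// execute the query on the current DB session
Vector employees = (Vector) session.executeQuery(q);
```

The generated SQL now filters on a single FK column, regardless of how many fields the Department table has.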
I think this is a good example of working on/using something you don't know very well.. :D
Further Reading:
Hibernate in Action
Friday, December 24, 2004
Friday, November 05, 2004
optimizing EJB trans-attribute & ORM DB access
A colleague of mine recently had his transaction thwarted because the EJB container (OC4J) timed it out. The default timeout is 60 seconds, and he was still running the test on his own desktop, with only a single user exercising the transaction. It is quite unacceptable for a transaction to exceed a 60-second limit while still in the development stage.
The transaction includes many reads from the DB before actually updating one record and inserting many new records. Like most developers who are unaware of the EJB transaction attributes and their respective meanings, he had set all the methods in his SLSB to Required. This pulls every method in the call chain into the transaction, even though in many cases they don't need to be. The transaction itself spans a great length: it starts at the very first method in the SLSB, goes through many other methods in the same SLSB or in other Session Beans, and ends only when the entry-point method returns.
I tried to optimize this by learning the business algorithm behind the transaction, then changing all read accesses to NotSupported and leaving the write accesses as Required. This significantly reduced the transaction span. But I still had to raise the transaction timeout to 4 minutes, which is still unacceptable.
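Declaratively, that change lives in the ejb-jar.xml assembly descriptor. A sketch, assuming a bean named EmployeeService and a find*/update* method naming convention (both names are hypothetical):

```xml
<assembly-descriptor>
  <container-transaction>
    <method>
      <ejb-name>EmployeeService</ejb-name>
      <method-name>find*</method-name>
    </method>
    <!-- read-only methods run outside the transaction -->
    <trans-attribute>NotSupported</trans-attribute>
  </container-transaction>
  <container-transaction>
    <method>
      <ejb-name>EmployeeService</ejb-name>
      <method-name>update*</method-name>
    </method>
    <!-- write methods keep transactional behaviour -->
    <trans-attribute>Required</trans-attribute>
  </container-transaction>
</assembly-descriptor>
```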
After measuring the time each process took, I found that the transaction inserts 15 records into a database table. Each record requires an FK field to be set, each of these FK fields has a different value (across the 15 records), and each requires a database access to retrieve the parent object. Why? Because we're using an ORM, i.e. Toplink.
In Toplink (I don't know how the other ORM technologies do it), before you insert into a child table, you need to find the parent object and set it as the FK field's value. Only then do you insert the child.
This programming style leads to inefficiency in many developers' code. So what I did was create another method that queries all of these parent objects (based on whatever information I had) before inserting the 15 records into the database. I put the query results into a Map, which each of the 15 records then uses as the lookup table for its FK field's value.
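The refactoring can be sketched in plain Java (the class and method names here are hypothetical; indexBy stands in for the bulk parent query plus the Map it feeds): fetch all needed parents in one query, index them by key, and resolve each FK from memory.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the prefetch-into-a-Map refactoring:
// run ONE bulk query for the parents, index the result by key, and let
// each of the 15 inserts resolve its FK value from memory instead of
// issuing its own database access.
public class PrefetchExample {
    public static <K, V> Map<K, V> indexBy(List<V> bulkQueryResult,
                                           Function<V, K> key) {
        Map<K, V> index = new HashMap<>();
        for (V row : bulkQueryResult) {
            index.put(key.apply(row), row);
        }
        return index;
    }
}
```

Fifteen lookups against the Map replace fifteen round trips to the database; only the single bulk query touches the DB.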
This approach worked very well: it reduced a 20-second access to 1.5 seconds. There is still plenty of inefficient code lying around his package, but that's one refactoring done; there will be more, of course. The transaction timeout has been reduced back to 60 seconds, and all went well.
The question remains: does ORM tend to lead to inefficient DB access when used recklessly?
Further Reading:
Head First EJB
Expert One-on-One J2EE Development without EJB
Bitter EJB
Thursday, September 30, 2004
database commit in Toplink
I don't know how other O/R mapping technologies, such as JDO or Hibernate, commit changes to the database, but Toplink has an interesting behaviour when using a transaction from the EJB container.
First, Toplink remembers all database updates (inserts, updates, and/or deletes) performed within a single transaction. Then, only after the method in which the transaction started exits, and when the EJB Container is about to close the transaction, does Toplink generate the SQL statements required to perform those actions. Toplink then executes these statements in an order determined by its own dependency algorithm, which might differ from the order in which we issued them in our code.
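That deferred-write behaviour can be sketched in plain Java (a hypothetical miniature, not TopLink's actual implementation): changes are only recorded during the transaction, and the SQL is produced, and ordered, at commit time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical miniature of deferred writes: register() only records a
// pending change; commit() produces the SQL, ordered by a dependency rank
// (parents before children), which may differ from registration order.
public class DeferredWrites {
    private static class Change {
        final int dependencyRank;
        final String sql;
        Change(int dependencyRank, String sql) {
            this.dependencyRank = dependencyRank;
            this.sql = sql;
        }
    }

    private final List<Change> pending = new ArrayList<>();

    public void register(int dependencyRank, String sql) {
        pending.add(new Change(dependencyRank, sql)); // nothing hits the DB yet
    }

    public List<String> commit() {
        pending.sort(Comparator.comparingInt((Change c) -> c.dependencyRank));
        List<String> statements = new ArrayList<>();
        for (Change c : pending) {
            statements.add(c.sql);
        }
        return statements;
    }
}
```

Even if the child insert is registered first, the parent's statement comes out of commit() ahead of it, mirroring the reordering described above.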
If and only if the database accepts all these changes does Toplink update its caches and return control to the EJB Container, which then successfully closes the transaction and returns the needed values to the client. If, for example, a database exception is thrown during the commit phase, the catch block (or any other code) inside our EJB will not be invoked. Instead, the EJB client's code is the one that has to handle the database-related exception.
Why? Because the exception was thrown after the EJB method completed, but before the EJB Container returned control to the EJB client. This is troublesome sometimes.
Well, I haven't learned much about Toplink or any other O/R mapping technology, but is this the ideal approach?
I hardly believe it.. :D
Links:
Hibernate in Action
More Hibernate books
More JDO books