Coding with the enemy

Tuesday, February 26, 2008

TIOPF - New persistance layer walkthrough

TIOPF

I use tiOPF for most new Delphi database applications.
tiOPF is a Object Persistence Framework. That is, it is a framework based around saving your objects to, and loading them from, databases and/or flat files.

See overview for more details on tiOPF.

Persistence Layers (PL)
TIOPF uses persistence layers to save/load objects. If you want to use an unsupported database, you just write another PL, and the required unit tests. The following is a walkthrough the process of creating and testing a new PL.

Corelab - SDAC
I use Corelab's Sql Server Data Access Components for talking to MS Sql Server databases. This is faster than the BDE and ADO components I have used previously. However there is currently no PL for this...

Step 1 - Base point
Writing a PL involves the oldest form of code reuse - copy/paste.
Find an existing PL similar to what you want. In this case, I will use ADOSQLServer. The ado sql server layer is slightly more complicated than most as it uses an abstract class shared with ado access. However this shouldn't be a problem.

Because I am using a very similar persistance layer, the required changes are quite limited. In many cases, I would need to alter the Connection string handling as well, amd perhaps the multithreading support.

I need to copy the *ADOSQLServer files to *CrLabSDAC files, and then make the following substitutions:

TMSConnection replaces TADOConnection
TMSQuery replaces TADOQuery
CrSDAC replaces ADOSQLServer
CrAbs replaces ADOAbs

+ the various uses replacements

Using delphi's 'Find it Files', I searched for ADOSQLServer in the ..\tiOPF2\Trunk\ directory and subdirectories. It pulls up the following files:

tiQueryADOSQLServer (which uses tiQueryADOAbs)
TTestTIPersistenceLayersADOSQLServer
tiTestDependencies
tiConstants
tiOPFManager

tiQueryADOSQLServer.pas is saved as tiQueryCrSdac.pas
tiQueryADOAbs.pas is saved as tiQueryCrAbs.pas
Both of these are saved in the \Options\ directory

tiOPFADOSQLServer_TST.pas is saved as tiOPFCrSdac_TST.pas
This is saved in the \UnitTests\Tests directory.

Step 2 Persistence Layer Changes
In tiQueryCrAbs, I made the following changes:

In the uses I replace ,ADODb with ,DBAccess, MSAccess
replace TADOConnection with TMSConnection
replace FADOConnection with FMSConnection
replace TADOQuery with TMSQuery
replace FADOQuery with FMSQuery
replace TtiQueryADO with TtiQueryCrSdac
replace TtiDatabaseADOAbs with TtiDatabaseCrAbs
replace cErrorADOCoInitialize with cErrorCrCoInitialize
replace EADOError with EDAError
replace cTIPersistADOSQLServer with cTIPersistCrSdac
delete cDelphi5ADOErrorString = '...';

In tiQueryCrSdac, I made the following changes:

In the uses replace ,ADODb with ,DBAccess, MSAccess and tiQueryADOAbs with tiQueryCrAbs
replace ADOSQLServer with CrSdac
replace TtiDatabaseADOAbs with TtiDatabaseCrAbs
replace TADOTable with TMSTable
Delete the line TtiQueryCrSdac = class(TtiQueryADO);

In tiConstants, I added the line
cTIPersistCrSdac = 'CrSdac';

In tiOPFManager, I added the line
{$IFDEF LINK_CRSDAC} ,tiQueryCrSdac {$ENDIF}
below
$IFDEF LINK_BDEPARADOX} ,tiQueryBDEParadox {$ENDIF}

This means that the CR PL can be compiled in to applications by using LINK_CRSDAC in the conditional defines instead of adding the unit tiQueryCrSdac to the application. This makes running the unit tests much easier as my local copy of the standard tests just need the define added

Finally, I added the LINK_CRSDAC define to the DUNINTTIOPFGui appliciation and did a build.

Some errors turned up in constructor TtiQueryCrSdac.Create; so I commented them out for now. I found a few more compile errors as well due to the differences between SDAC and Ado components

replace ExecSQL with Execute
replace Parameters with Params
replace TParameter with TMSParam
replace CommitTrans with Commit
replace RollbackTrans with Rollback
replace BeginTrans with StartTransaction
replace ConnectionString with ConnectString

Finally it all compiles. I ran the unit tests to make sure I haven't broken anything yet. I shouldn't have (yet) but ...

8 minutes later, 1729 tests are run and passed.

Step 3 Unit Test changes

In tiOPFCrSdac_TST, I made the following changes:

replace ADOSQLServer with CrSdac

In tiTestDependencies, I made the following changes:

added ,tiOPFCrSdac_TST after ,tiOPFADOSQLServer_TST
added tiOPFCrSdac_TST.RegisterTests; after tiOPFADOSQLServer_TST.RegisterTests;

Update: Add the following define to the unit test properties: LINK_CRSDAC

This now adds unit tests for the CrSdac PL. It will run the same tests as the AdoSqlServer layer. If necessary, I could override and alter the tests to accommodate database changes.

Compile and run again.

This time there are numerous errors. In part, this is because this layer doesn't implement CreateDatabase and no default database has been defined.

I ran the tests again. In the Setup dialog, I clicked on the [Local Settings] button. I added the following lines to the ini.
[DB_CrSdac]
DBName=localhost:tiopf
UserName=NULL
Password=

This resolves most errors, leaving only 5. I won't step through the process of fixing them. They came down to:

SDAC truncates long strings to 8000. This needs further investigation
SDAC GetTableNames returns the owner as part of the name (eg 'dbo.MyTable') which needs to be removed.

Step 4 Real data test
In the unit tests for my real application, I added then tiQueryCrSdac unit. I can now swap persistence layers by altering the connection string used in my application.

Running the tests raises a couple more errors which are resolved by changing some of the SDAC query options. Once that is done, and the tests all pass, I ran my application against some test data, and real data. All works well.

Step 5 Build a patch file

Links
TIOPF http://tiopf.sourceforge.net/
Overview http://tiopf.sourceforge.net/Doc/overview/index.shtml
Corelabs http://crlab.com/

Friday, February 1, 2008

Remote access to computers

I spent a fair amount of time using one computer to look at/control another across the internet. Over the past 5 years I have evaluated a number of products. The following are some of my favourites.

Remote Desktop
This is the best option across a wan, giving good performance. However it is a pain in the proverbial to set up for use across the internet, as the ports are frequently blocked by firewalls.

LogMeIn.com
LogMeIn is the one I use must frequently, for controlling my personal computers. A free account will let you control a reasonable number of computers (5 I think). The paying version has additional features such as file access, sound and printing. I have used the free version for about 5 years.

Install is straight forward, go to the web site, log in, install software and go. The software takes care of firewall and NAT issues in nearly every case. In 5 years, I only found one location where I couldn't get connectivity.

There is software available for nearly everything, I have even used my windows mobile cellphone to control my pc.

It's account based, so it is good for computers I own. It is not so good for other computers where I don't have physical access beforehand as I obviously don't want to pass my account details around. Once installed, the client pc can be unattended.

GotoMyPc is an alternative, but they don't have a free version.

CrossLoop
I use crossloop (no relation) for remote user support. They download and install crossloop and I do the same. They click on the share button, send me the access code, and then I connect using the same access code. Performance is not as good as LogMeIn, but it is usually adequate.

It does require a user on the client machine to run the software, hit the share button and provide the access code. It's free to use.

CoPilot is an alternative which is probably better for Grandma and technically challenged users. It's free on weekends, and $5 a day during the week.

Links
LogMeIn
GotoMyPC
CrossLoop
Copilot

Monday, January 21, 2008

Garbage collection: Performance test

Following my initial GC post, I received feedback regarding my comment "A well written and tuned garbage collector can be faster than manual allocation.". These comments can be summarised as "show us the proof".

I asked on the Boehm GC mailing list (if in doubt, ask for help). The conversation starts here.

They provided the following (my summary):

One benchmark is here, showing that speeds are comparable given sufficient memory (a gc will require more memory) .
Another is here from Hans Boehm's presentation. See pages 50 onwards. He comments that it is a toy benchmark, on linux.
Malloc implementations have improved
Code that favours manual allocation
Simple create, do something, free
Large objects
Code that favours gc
Complicated lifetime management
try ... finally, free
multi threading

Well that kinda helps. But what about in delphi?

I have done some quick tests using my modified version of the delphi wrapper for the Boehm GC (Delphi GC for short). The modifications shouldn't make any major difference to the result.

Delphi benchmark 1:
This is a simple, trivial, benchmark. It creates 60,000,000 small objects and assigns a value.

The object is simply:


TTestObject = class
public
Lines: TStrings;
constructor Create();
destructor  Destroy; override;
end;

and the test is simply

for f := 1 to TestCount do
begin
testObj:= TTestObject.Create;
{$ifdef USE_GC}
testObj.Lines.Add('aaa');
{$else}
try
testObj.Lines.Add('aaa');
finally
testObj.Free;
end;
{$endif}

The try ... finally free section is not required by the GC version as we don't have to worry about memory leaks.

The GC tests were repeated with a range of initial heap sizes and on different computers. The FastMem test was also tried without the try finally. The source code is available if anyone wants it.

The results are

	Old laptop, 512mb	Core 2, 2gig	Single core 2 gig	Quad core 3 gig
FMM (no try finally)		approx 31.5
FMM try finally	81.281	33.306	37.875	48.046
GC 0mb		73.181	59.047	46.25
GC 5mb		39.499	32.906	29.656
GC 10mb	60.891	30.857	29.422	27.984
GC 20mb	58.328	26.926	27.437	27.062

Given a large enough initial heap, the gc version ends up faster than the FastMM version.

This is not a serious benchmark, but it does indicate that a gc can be faster than manual allocation.

Delphi benchmark 2

For this, I added the gc to 2 of my existing unit tests. It was a 2 line conversion, I just added the gc and set the initial heap size.

Enable is a work injury management system. It is heavy on database access and single threaded.
Envisage is a document management system. Database access is done via the tiopf object persistence framework. It reads pdf files, checks for bar codes and creates new ones. It is multi-threaded. It uses a large amount of memory.

Here are the results:

	Envisage, no threads	Envisage, threaded	Enable
FMM	70	114.4	16.4
GC 20	74	119.0
GC 40	71	117.5	14.1
GC 100	71	115.6

Conclusion
Based on these results, I would have to say that my comment "A well written and tuned garbage collector can be faster than manual allocation." is correct. Given sufficient heap space, the gc version is faster in some tests, and 1% slower in others.

Le me restate my conclusion as the initial one is not well worded in terms of what I intended to say. A better conclusion would be "It is possible for a garbage collected application to run at a speed similar to that of an application using manual deallocation". Or alternately, "adding a gc to an application doesn't automatically make it incredibly slow".

The gc performance could probably be improved further by surfacing the gc tuning options, improving the delphi wrapper and using a later version of the GC. The unit tests could also be sped up by removing the now redundant frees, destructors and try ... finally blocks

The Boehm GC used is an early version 6 (6.2 or so). Version 7 is available from cvs. V7.1 should be released soon.

There are downsides to using a gc, such as increased memory use. It is not appropriate for all applications, especially those with memory constraints. However speed does not appear to be one of those downsides.

Update

In response to a query, yes the garbage collector is running, and collecting the objects. After the initial run (which may increase the heap), the heap size remains static no matter how many times I repeat the test (any of the tests).

I also repeated the FastMM test removing the testObj.Free; line.

It "completed" in 35 seconds. By completed, I mean "used up 1.3gig of free mem, all my 4gig page file and then threw an "out of memory" exception.

Reference
GC Mailing list: Are there any benchmarks comparing the speed of gc v non gc
Garbage collector for Delphi
Boehm GC
Wikipedia article

Friday, January 18, 2008

Garbage collection: Follow up

Given some of the feedback on my previous post, I thought a follow up would be in order

Performance
One of the most contentious points was my comment that "A well written and tuned garbage collector can be faster than manual allocation". I will cover this in a separate post as it needs more than a couple of lines.

Why would you want to use a GC in delphi
I will cover this in a separate post as well. There is probably little gain in just adding a gc to an existing delphi app (unless it's leaky, but we don't write apps like that). If you are writing a new app based around having a gc in place, then you can do things differently.

Clarifications
Once of the original quotes referred to objects referencing each other not being released. I read this as talking about cyclic references. That is a problem with simple reference counting, but not with a tracing (ie mark and sweep) gc such as used by Boehm, .net and java (1).

A gc is not a silver bullet, nor will it catch all memory leaks. I am not suggesting that it will.

Corrections
One point I forgot to mention. Some gc algorithms will allocate extra memory for flags, counts etc (1). This can push up the memory use compared to manual allocation. However Fast mm 4 (2) also allocations a 32 bit flag ahead of every memory block so it is probably a wash.

Fast mm
Fast mm will not catch all memory leaks. It will catch memory that hasn't been freed when the application exits (2) which is not the same thing.

If you have poor testing coverage, then the untested code can have memory leaks.
Fast mm will not catch objects freed on application shutdown (ie forms owned by the application). A gc won't catch this either.

Fast mm will help with double frees, but it won't help with a/v errors (unless I am missing something, it certainly hasn't helped me). A gc will help with both of those (1).

References
1 The wikipedia article on Garbage collection provides a lot of the background.
2 Fast mm details are taken from http://dn.codegear.com/article/33416

"However most Delphi memory managers request large chucks of memory from windows and then parcel it out to the app on request," See (2) and Nexus MM

The quotes in the original article are taken from the newsgroup thread "Garbage collection"

Thursday, January 17, 2008

Garbage collection in Delphi Win 32: Why and how to do it

I got drawn into a newsgroup thread last month (Delphi and XCode) regarding Delphi and x code. I posted on the existence of a Garbage collector for Delphi (http://codecentral.codegear.com/Item/21646) and spent several days in ongoing discussion. Another recent thread is at Garbage Collection.

This posting is a summary of my views of the subject. I spent 3 years as a fulltime C# programmer (and about 14 as a Delphi programmer) so I have a reasonable amount of experience and without a GC. Based on some of the comments I have been reading, there are a lot of people with more opinion than experience in the newsgroups (no surprises there).

Note: When I refer to Delphi, I mean Delphi win 32. When referring to Delphi.net, I do explicitly.

In general, the debate over garbage collection is already over. I am unaware of any recent language that doesn’t have gc built in. There is even a proposal before the c++ standards board to have a gc in c++. In other words, get used to the idea.

There seems to be a certain amount of dislike/distrust of garbage collection amongst a number of people. So far I have seen comments about laziness and incompetence, spurious analogies to automatic cars and a fair amount of 'I know how to free memory (usually) therefore it must be a good thing'.

Most of the comments are more rhetorical than reasoned arguments:

"Garbage collection leads to sloppy, bloated, inefficient code", Any number of comments on laziness and incompentance: Garbage collection is not about incompetence or laziness. It is perfectly possible to be competent and still prefer a GC.
"destructor calls have to be replaced by strict nil'ing of references": You typically don't need to set objects to nil, although there are some cases where you would want to.
"There's also a problem with objects referencing each other which prevents the items releasing": Not with any recent garbage collector.
"It's inefficient, it slow, and its unacceptable.": Again, not with any recent GC. In many cases a GC app is faster than non-gced.

What is garbage collection?
Garbage collection (GC) is the automatic release of unused memory and objects.

E.g.

MyStrings:= TStringlist.Create;
// try
do stuff here
// finally
// MyStrings.Free; // no longer required
// end;

The try..finally and Free are no longer required as objects are released when (sometime after) they are no longer referenced.

See wikipedia for more info than you ever needed on the subject.

Garbage collection does not release resources (database connections, windows handles, etc). These should be released manually.

Strings, dynamic arrays and interfaces are already garbage collected in Delphi , and virtually everything is in Delphi.net.

Why use garbage collection?

Fewer memory leaks: By reducing the need for manually freeing memory, a gc significantly reduces the scope for memory leaks. It is still possible to leak memory, but much harder.
Less code: I performed a naive analysis on my most recent project by removing most .Free calls and the supporting destructors and try … finally blocks. The result was about 4% fewer lines of code.
Better code: In a non gc language, you end up with a number of idioms and practices to guard against memory leaks. Delphi has several of these.
Eg:
It is rare to return an object from a function. Typically you would create an object and then pass it to a procedure to be modified.
The use of assign rather than :=
The use of Owner, Owned and the like to solve object destruction problems)

There are some problems that it is difficult to solve without a GC. Linq is often given as an example.

What are the disadvantages of a garbage collector?

Memory use: A non gc app can free memory as soon as it is no longer required. A gc however will only free memory once it is satisfied that it is not being used, which could be some considerable time later. Typically the gc will only collect objects when it is feeling some memory pressure. Therefore a non gc app can use less memory than a gc app. However most Delphi memory managers request large chucks of memory from windows and then parcel it out to the app on request, so this disadvantage is largely theoretical.
Speed: A well written and tuned garbage collector can be faster than manual allocation. Unfortunately there is no gc tuned for Delphi. The only available gc is slightly slower than the default memory manager.
Reliance on a gc: Code that is written to take advantage of a gc cannot be readily run without a gc. If your code must work in standard Delphi, then it must be programmed accordingly and not rely on the ex.
Non deterministic finalisation: With a gc, you have little control over when the object is removed from memory. Even when it is removed, destroy is not called unless you have implemented a finalizer.

Why not use Delphi.net
I spent 3 years as a c# and c++ programmer. If I wanted to use .net, I would use c# and have access to all the latest toys such as linq.

The main drawback with .net is the need to distribute a very large runtime library with the application. This may not be a problem with web apps or internal applications, but it can be a large issue with shareware.

A second drawback specific to delphi.net is that your code may end up being used by Delphi win 32. As a result, the recommended approach is to program as if the gc did not exist. That is, to include the .Free calls, the try … finally blocks and the destructors. You end up with the all of the drawbacks of a gc, without the advantages.

A third drawback for me is that many of the libraries and controls I use are not available in Delphi.net.

What about other resources?

If your object holds other resources such as windows handles, database connections etc then you still need to release those. Depending on the object you can do this with .Free, .Close, .Disconnect or similar. .Free will always work the object will be disposed of although the memory won't be released until the gc gets around to it.

How do you use a Garbage Collector in Delphi win32?

The Delphi memory manager is designed to be easily replaceable. A garbage collected memory manger is available from http://codecentral.codegear.com/Item/21646. This is written by Barry Kelly and is a thin wrapper around the Boehm GC library. I have made some changes to make it work better. If there is any interest, I will post the code.