Tuesday, April 22, 2008

Database versioning part 2

In part 1, I discussed database versioning with MS SQL Server. SQL Server has robust tools available that simplify the process of differencing and creating upgrade scripts. DBISAM doesn't have these tools, but using datamodules and table components makes versioning close to painless.

Situation 2
Delphi app
DBISAM database
1 developer

Rental Property Manager uses a DBISAM database (DBISAM v3 in RPM 1, DBISAM v4 in RPM 2). In the 4 years since it was released, it has been through about 20 changes.

Unlike SQL Server, there is no database comparison utility to generate upgrade scripts. That means things have to be done a bit differently.

I still use the same general process as for sql server:
  1. Record the database version number in the database
  2. Define the expected db version in the application
  3. Keep a master db for comparison
  4. Only make one set of changes at a time
  5. Automate differencing as much as possible
  6. Unit test, test and test again. The tests should fail the moment a table is modified.
Because DBISAM doesn't support views, the version number is stored (along with a bunch of other info) in an ini file in the database directory.
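As an illustration, reading and writing that version number is just standard TIniFile work. This is a minimal sketch; the file, section and key names are assumptions rather than the actual RPM ones:

uses
  IniFiles, SysUtils;

const
  DB_VERSION = 20; // expected schema version, compiled into the application

function GetDatabaseVersion(const databasePath: string): integer;
var
  ini: TIniFile;
begin
  ini := TIniFile.Create(IncludeTrailingPathDelimiter(databasePath) + 'database.ini');
  try
    // 0 = no version recorded yet
    Result := ini.ReadInteger('Database', 'Version', 0);
  finally
    ini.Free;
  end;
end;

procedure SetDatabaseVersion(const databasePath: string; version: integer);
var
  ini: TIniFile;
begin
  ini := TIniFile.Create(IncludeTrailingPathDelimiter(databasePath) + 'database.ini');
  try
    ini.WriteInteger('Database', 'Version', version);
  finally
    ini.Free;
  end;
end;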

I have a datamodule, TdmodCheckDatabase. This has a TDBISAMTable component for every table in the database. Each table component contains all the fields in its table and is updated whenever the table is changed.

To make database changes, the following process was used:

  1. Increase the version number in the application
  2. Make and test DB changes.
  3. Update the affected tables in TdmodCheckDatabase
  4. If necessary (rarely) add further upgrade queries to TdmodCheckDatabase. E.g. to set the values of new fields, or to add new data rows.
  5. Generate a CreateDatabase unit using the supplied database tools.
  6. Update unit tests to suit the new db
When the application is run, it goes through the following process:
  1. If no database is found, run the CreateDatabase unit and then go to step 3
  2. Get the current version number from the database ini file
  3. If it is less than the expected version number then:
    Run CreateDatabase (to create any new tables)
    Check every table component in TdmodCheckDatabase
    Apply any table changes
    Run any manual upgrade scripts
  4. Update the version number in the database ini file.
In code this is:


class procedure TdmodCheckDatabase.UpgradeDatabase(databasePath: string;
  currentVersion, newVersion: integer);
var
  module: TdmodCheckDatabase;
  f: integer;
begin
  module := TdmodCheckDatabase.Create(nil);
  try
    module.OpenDatabase(databasePath);

    for f := 0 to module.ComponentCount - 1 do
    begin
      if module.Components[f] is TDBISAMTable then
      begin
        try
          // if we need to upgrade the table to DBISAM 4
          if currentVersion <= DB_VERSION_FOR_DBISAM4 then
            TDBISAMTable(module.Components[f]).UpgradeTable;

          module.UpgradeTable(TDBISAMTable(module.Components[f]));
        except
          // logging and error stuff removed
        end;
      end;
    end;

    for f := currentVersion + 1 to newVersion do
      module.RunUpgradeScripts(f);

    // have to create additional indexes manually
    module.sqlMakeIndexes.ExecSQL;
  finally
    module.DBISAMDatabase1.Close;
    module.Free;
  end;
end;

procedure TdmodCheckDatabase.UpgradeTable(table: TDBISAMTable);
var
  fieldIndex: integer;
  needsRestructure: boolean;
  canonical: TField;
begin
  needsRestructure := false;

  table.FieldDefs.Update;

  // add any new fields to the FieldDefs
  if table.FieldDefs.Count < table.FieldCount then
  begin
    for fieldIndex := table.FieldDefs.Count to table.Fields.Count - 1 do
    begin
      table.FieldDefs.Add(fieldIndex + 1, table.Fields[fieldIndex].FieldName,
        table.Fields[fieldIndex].DataType, table.Fields[fieldIndex].Size,
        table.Fields[fieldIndex].Required);
    end;
    needsRestructure := true;
  end;

  // make sure we have the correct size for string fields
  for fieldIndex := 0 to table.FieldDefs.Count - 1 do
  begin
    if (table.FieldDefs[fieldIndex].DataType = ftString) then
    begin
      canonical := table.FindField(table.FieldDefs[fieldIndex].Name);
      if Assigned(canonical) and (table.FieldDefs[fieldIndex].Size <> canonical.Size) then
      begin
        // field size has changed
        needsRestructure := true;
        table.FieldDefs[fieldIndex].Size := canonical.Size;
      end;
    end;
  end;

  if needsRestructure then
    table.AlterTable; // restructures the table using the new FieldDef values
end;

procedure TdmodCheckDatabase.RunUpgradeScripts(newVersion: integer);
begin
  case newVersion of
    3: sqlVersion3.ExecSQL;
    9: sqlVersion9.ExecSQL;
    11: begin // change to DBISAM 4
      sqlVersion11a.ExecSQL;
      sqlVersion11b.ExecSQL;
      sqlVersion11c.ExecSQL;
      sqlVersion11d.ExecSQL;
      sqlVersion11e.ExecSQL;
    end;
    19: sqlVersion19.ExecSQL;
    20: sqlVersion20.ExecSQL;
  end;
end;
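With those pieces in place, the startup check itself is tiny. A sketch, reusing the hypothetical ini helpers from above:

currentVersion := GetDatabaseVersion(databasePath);
if currentVersion < DB_VERSION then
begin
  TdmodCheckDatabase.UpgradeDatabase(databasePath, currentVersion, DB_VERSION);
  SetDatabaseVersion(databasePath, DB_VERSION);
end;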

Unit tests included:
  • Make sure the current version is correct
  • Make sure that every table and every field exists
  • Create a new blank database (for a number of different versions) and work through the upgrade process to make sure the final database is correct.
  • Restore an existing older database with data and upgrade to the latest version
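The version check, for example, is a one-liner in DUnit (a sketch only; the test class and helper names are illustrative, not the actual test code):

procedure TTestDatabase.TestVersionIsCurrent;
begin
  CheckEquals(DB_VERSION, GetDatabaseVersion(TestDatabasePath),
    'Database version does not match the version expected by the application');
end;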
With this process, altering the database structure is trivial for most changes. Adding fields and tables usually requires no more work than updating the table components and generating a new creation script.

The current implementation does have a couple of restrictions in that it won't remove tables or fields. However, if that is required, it won't take long to add.

Thursday, April 17, 2008

Database versioning part 1

Versioning databases is one of those ongoing problems that has no one-size-fits-all solution. There are 2 solutions I have developed and used successfully.

The general process I use each time is:
  1. Record the database version number in the database
  2. Define the expected db version in the application
  3. Keep a master db for comparison
  4. Only make one set of changes at a time
  5. Automate differencing as much as possible
  6. Unit test, test and test again. The tests should fail the moment a table is modified.

Situation 1
MSDE (= SQL Server) database
8 developers (pair programming)
C# application

Every developer had 2 databases, a Unit Test DB and an Application Test DB (UnitDb and AppDb from now on). With MSDE (now SQL Express) there are no licensing costs to worry about.
There was also a Master database stored on a central server that served as the canonical reference.

The database version number was stored in a view ('select xx as VersionNumber').

To make database changes, the following process was used:
  1. Check out the latest version of the app
  2. Increase the version number in the application
  3. Make and test DB changes. Usually this was done in UnitDB. AppDb was kept synchronised using SQL Compare
  4. Update the version number in UnitDB
  5. Generate an update script using Sql Compare. Scripts were named UpgradeXXX.sql where XXX is the version that was being upgraded from.
  6. Generate a CreateDatabase.sql script (for shipping with the app) and a CreateDatabaseXXX.sql (for unit tests only) script. In this case XXX is the version that will be created. The 2 scripts are the same except for the name.
  7. If necessary (rarely) append further queries to the scripts. E.g. to set the values of new fields, or to add new data rows.
  8. Update unit tests to suit the new db
  9. Check in changes
When the application is run, it goes through the following process
  1. If no database is found, then run CreateDatabase.sql
  2. Get the current version number from the database
  3. If it is less than the expected version number then
    run UpgradeXXX.sql and go to 2
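In outline, the startup logic looks like this. The application in this situation was actually C#, but I'll sketch it Delphi-style to match the rest of this series; DatabaseExists, RunScript, QueryVersionNumber and EXPECTED_DB_VERSION are assumed helpers and constants:

if not DatabaseExists then
  RunScript('CreateDatabase.sql');

version := QueryVersionNumber; // reads the version view
while version < EXPECTED_DB_VERSION do
begin
  // scripts are named for the version they upgrade *from*
  RunScript(Format('Upgrade%.3d.sql', [version]));
  version := QueryVersionNumber; // re-read: a script may jump several versions at once
end;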
When the unit tests are run, the first step is to upgrade UnitDb to the current version using the same process as the main application. That means that other people's changes are automatically applied. The unit tests included:
  • Make sure the current version is correct
  • Make sure that every table and every field exist (this doesn't always need to be explicit as the persistence unit tests should pick up any problems here).
  • Create a new blank database (for a number of different versions) and work through the upgrade process to make sure the final database is correct.
The secret weapon in all this was SQL Compare, which makes generating the scripts quite straightforward.
Also, the upgrade scripts can do more than one set of version changes, i.e. Upgrade001.sql could upgrade the database to v10 so that scripts 002 - 009 don't need to be run.

Situation 2 (delphi and dbisam) will follow in a later post.

Links
Sql Compare

Sunday, March 2, 2008

VMWare tips


Wireless networking

By default, VMware networks are set up as bridged. However, this doesn't work with Vista and wireless networks.

The easy answer is to change the network setup to NAT. However, if you are doing server development, this may not be much use. In that case, you can try network bridging as described here.

Resizing the VM image
The easy way is to download the VM Converter from here. Amongst other things (such as creating an image from an existing computer), it can resize an image.

Performance
If you have sufficient system memory, change the VMware settings to fit all virtual machine memory into reserved host RAM (Edit menu -> Preferences, Memory tab). This will give a performance boost.

Also, preallocate the hard drive space.


Links
VMware Knowledgebase article - http://kb.vmware.com/selfservice/dynamickc.do?cmd=show&forward=nonthreadedKC&docType=kc&externalId=1212&sliceId=2&stateId=0%200%209345265
Bridging Process http://mingstert.blogspot.com/2007/12/vmware-wireless-network-adapter-and.html

VM Converter http://vmware.com/products/converter/

Tuesday, February 26, 2008

TIOPF - New persistence layer walkthrough

TIOPF

I use tiOPF for most new Delphi database applications.
tiOPF is an Object Persistence Framework. That is, it is a framework based around saving your objects to, and loading them from, databases and/or flat files.

See overview for more details on tiOPF.

Persistence Layers (PL)
TIOPF uses persistence layers to save/load objects. If you want to use an unsupported database, you just write another PL, and the required unit tests. The following is a walkthrough of the process of creating and testing a new PL.

Corelab - SDAC
I use Corelab's SQL Server Data Access Components for talking to MS SQL Server databases. These are faster than the BDE and ADO components I have used previously. However, there is currently no PL for them...

Step 1 - Base point
Writing a PL involves the oldest form of code reuse - copy/paste.
Find an existing PL similar to what you want. In this case, I will use ADOSQLServer. The ADO SQL Server layer is slightly more complicated than most, as it uses an abstract class shared with ADO Access. However, this shouldn't be a problem.

Because I am using a very similar persistence layer, the required changes are quite limited. In many cases, I would need to alter the connection string handling as well, and perhaps the multithreading support.

I need to copy the *ADOSQLServer files to *CrLabSDAC files, and then make the following substitutions:

  • TMSConnection replaces TADOConnection
  • TMSQuery replaces TADOQuery
  • CrSDAC replaces ADOSQLServer
  • CrAbs replaces ADOAbs
+ the various uses replacements

Using Delphi's 'Find in Files', I searched for ADOSQLServer in the ..\tiOPF2\Trunk\ directory and subdirectories. It pulled up the following files:

  • tiQueryADOSQLServer (which uses tiQueryADOAbs)
  • TTestTIPersistenceLayersADOSQLServer
  • tiTestDependencies
  • tiConstants
  • tiOPFManager

tiQueryADOSQLServer.pas is saved as tiQueryCrSdac.pas
tiQueryADOAbs.pas is saved as tiQueryCrAbs.pas
Both of these are saved in the \Options\ directory

tiOPFADOSQLServer_TST.pas is saved as tiOPFCrSdac_TST.pas
This is saved in the \UnitTests\Tests directory.

Step 2 Persistence Layer Changes
In tiQueryCrAbs, I made the following changes:

  • In the uses I replace ,ADODb with ,DBAccess, MSAccess
  • replace TADOConnection with TMSConnection
  • replace FADOConnection with FMSConnection
  • replace TADOQuery with TMSQuery
  • replace FADOQuery with FMSQuery
  • replace TtiQueryADO with TtiQueryCrSdac
  • replace TtiDatabaseADOAbs with TtiDatabaseCrAbs
  • replace cErrorADOCoInitialize with cErrorCrCoInitialize
  • replace EADOError with EDAError
  • replace cTIPersistADOSQLServer with cTIPersistCrSdac
  • delete cDelphi5ADOErrorString = '...';



In tiQueryCrSdac, I made the following changes:

  • In the uses replace ,ADODb with ,DBAccess, MSAccess and tiQueryADOAbs with tiQueryCrAbs
  • replace ADOSQLServer with CrSdac
  • replace TtiDatabaseADOAbs with TtiDatabaseCrAbs
  • replace TADOTable with TMSTable
  • Delete the line TtiQueryCrSdac = class(TtiQueryADO);


In tiConstants, I added the line
cTIPersistCrSdac = 'CrSdac';

In tiOPFManager, I added the line
{$IFDEF LINK_CRSDAC} ,tiQueryCrSdac {$ENDIF}
below
{$IFDEF LINK_BDEPARADOX} ,tiQueryBDEParadox {$ENDIF}

This means that the CR PL can be compiled into applications by putting LINK_CRSDAC in the conditional defines instead of adding the unit tiQueryCrSdac to the application. This makes running the unit tests much easier, as my local copy of the standard tests just needs the define added.

Finally, I added the LINK_CRSDAC define to the DUnitTIOPFGui application and did a build.

Some errors turned up in constructor TtiQueryCrSdac.Create, so I commented them out for now. I found a few more compile errors as well, due to the differences between the SDAC and ADO components:

  • replace ExecSQL with Execute
  • replace Parameters with Params
  • replace TParameter with TMSParam
  • replace CommitTrans with Commit
  • replace RollbackTrans with Rollback
  • replace BeginTrans with StartTransaction
  • replace ConnectionString with ConnectString
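In context, the transaction handling ends up looking something like this (a sketch only, using the SDAC names from the list above):

// SDAC equivalents of the ADO transaction calls
FMSConnection.StartTransaction; // was BeginTrans
try
  FMSQuery.Execute;             // was ExecSQL
  FMSConnection.Commit;         // was CommitTrans
except
  FMSConnection.Rollback;       // was RollbackTrans
  raise;
end;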

Finally, it all compiles. I ran the unit tests to make sure I hadn't broken anything yet. I shouldn't have (yet), but...

8 minutes later, 1729 tests had run and passed.

Step 3 Unit Test changes

In tiOPFCrSdac_TST, I made the following changes:

  • replace ADOSQLServer with CrSdac

In tiTestDependencies, I made the following changes:
  • added ,tiOPFCrSdac_TST after ,tiOPFADOSQLServer_TST
  • added tiOPFCrSdac_TST.RegisterTests; after tiOPFADOSQLServer_TST.RegisterTests;

Update: Add the following define to the unit test properties: LINK_CRSDAC

This now adds unit tests for the CrSdac PL. It will run the same tests as the AdoSqlServer layer. If necessary, I could override and alter the tests to accommodate database changes.

Compile and run again.

This time there are numerous errors. In part, this is because this layer doesn't implement CreateDatabase and no default database has been defined.

I ran the tests again. In the Setup dialog, I clicked on the [Local Settings] button. I added the following lines to the ini.
[DB_CrSdac]
DBName=localhost:tiopf
UserName=NULL
Password=

This resolves most errors, leaving only 5. I won't step through the process of fixing them. They came down to:
  • SDAC truncates long strings to 8000 characters. This needs further investigation.
  • SDAC GetTableNames returns the owner as part of the name (eg 'dbo.MyTable') which needs to be removed.
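Stripping the owner prefix is simple enough. A sketch, assuming the owner is everything up to the first dot:

function StripOwner(const tableName: string): string;
var
  dotPos: integer;
begin
  dotPos := Pos('.', tableName);
  if dotPos > 0 then
    Result := Copy(tableName, dotPos + 1, MaxInt) // 'dbo.MyTable' -> 'MyTable'
  else
    Result := tableName;
end;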
Step 4 Real data test
In the unit tests for my real application, I added the tiQueryCrSdac unit. I can now swap persistence layers by altering the connection string used in my application.
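In outline, connecting through the new layer looks something like this. Treat it as a sketch: the exact manager calls and property names vary between tiOPF versions, so check your copy of tiOPFManager:

uses
  tiOPFManager, tiConstants, tiQueryCrSdac;

begin
  // cTIPersistCrSdac is the constant added to tiConstants earlier;
  // the property name below is an assumption
  GTIOPFManager.DefaultPersistenceLayerName := cTIPersistCrSdac;
  GTIOPFManager.ConnectDatabase('localhost:tiopf', 'username', 'password');
end;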

Running the tests raised a couple more errors, which were resolved by changing some of the SDAC query options. Once that was done and the tests all passed, I ran my application against some test data, and then real data. All works well.

Step 5 Build a patch file



Links
TIOPF http://tiopf.sourceforge.net/
Overview http://tiopf.sourceforge.net/Doc/overview/index.shtml
Corelabs http://crlab.com/

Friday, February 1, 2008

Remote access to computers

I spend a fair amount of time using one computer to look at/control another across the internet. Over the past 5 years I have evaluated a number of products. The following are some of my favourites.

Remote Desktop
This is the best option across a WAN, giving good performance. However, it is a pain in the proverbial to set up for use across the internet, as the ports are frequently blocked by firewalls.


LogMeIn.com
LogMeIn is the one I use most frequently, for controlling my personal computers. A free account will let you control a reasonable number of computers (5, I think). The paid version has additional features such as file access, sound and printing. I have used the free version for about 5 years.

Installation is straightforward: go to the web site, log in, install the software and go. The software takes care of firewall and NAT issues in nearly every case. In 5 years, I have only found one location where I couldn't get connectivity.

There is software available for nearly everything, I have even used my windows mobile cellphone to control my pc.

It's account based, so it is good for computers I own. It is not so good for other computers where I don't have physical access beforehand, as I obviously don't want to pass my account details around. Once installed, the client PC can be unattended.

GotoMyPc is an alternative, but they don't have a free version.

CrossLoop
I use CrossLoop (no relation) for remote user support. They download and install CrossLoop and I do the same. They click on the share button, send me the access code, and then I connect using the same access code. Performance is not as good as LogMeIn, but it is usually adequate.

It does require a user on the client machine to run the software, hit the share button and provide the access code. It's free to use.

CoPilot is an alternative which is probably better for Grandma and technically challenged users. It's free on weekends, and $5 a day during the week.

Links

LogMeIn
GotoMyPC
CrossLoop
Copilot

Monday, January 21, 2008

Garbage collection: Performance test

Following my initial GC post, I received feedback regarding my comment "A well written and tuned garbage collector can be faster than manual allocation". These comments can be summarised as "show us the proof".

I asked on the Boehm GC mailing list (if in doubt, ask for help). The conversation starts here.

They provided the following (my summary):
  • One benchmark is here, showing that speeds are comparable given sufficient memory (a GC will require more memory).
  • Another is here, from Hans Boehm's presentation. See pages 50 onwards. He comments that it is a toy benchmark, run on Linux.
  • Malloc implementations have improved
  • Code that favours manual allocation:
    Simple create, do something, free
    Large objects
  • Code that favours GC:
    Complicated lifetime management
    try ... finally, free
    Multithreading

Well, that kinda helps. But what about in Delphi?

I have done some quick tests using my modified version of the Delphi wrapper for the Boehm GC (Delphi GC for short). The modifications shouldn't make any major difference to the results.

Delphi benchmark 1:
This is a simple, trivial benchmark. It creates 60,000,000 small objects and assigns a value.

The object is simply:

TTestObject = class
public
  Lines: TStrings;
  constructor Create;
  destructor Destroy; override;
end;
and the test is simply:

for f := 1 to TestCount do
begin
  testObj := TTestObject.Create;
{$ifdef USE_GC}
  testObj.Lines.Add('aaa');
{$else}
  try
    testObj.Lines.Add('aaa');
  finally
    testObj.Free;
  end;
{$endif}
end;
The try ... finally free section is not required by the GC version as we don't have to worry about memory leaks.
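The constructor and destructor are omitted above. They do little more than create and free the Lines list, along these lines:

constructor TTestObject.Create;
begin
  inherited Create;
  Lines := TStringList.Create;
end;

destructor TTestObject.Destroy;
begin
  Lines.Free;
  inherited Destroy;
end;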

The GC tests were repeated with a range of initial heap sizes and on different computers. The FastMM test was also tried without the try ... finally. The source code is available if anyone wants it.

The results, in seconds, are:


                      Old laptop, 512MB   Core 2, 2GB   Single core, 2GB   Quad core, 3GB
FMM (no try finally)                      approx 31.5
FMM try finally       81.281              33.306        37.875             48.046
GC 0MB                                    73.181        59.047             46.25
GC 5MB                                    39.499        32.906             29.656
GC 10MB               60.891              30.857        29.422             27.984
GC 20MB               58.328              26.926        27.437             27.062



Given a large enough initial heap, the GC version ends up faster than the FastMM version.

This is not a serious benchmark, but it does indicate that a GC can be faster than manual allocation.

Delphi benchmark 2

For this, I added the GC to 2 of my existing unit test suites. It was a 2 line conversion: I just added the GC and set the initial heap size.
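Those two lines were essentially an extra unit in the uses clause and a heap call at startup, along these lines (the unit name here is hypothetical and the exact call depends on the wrapper; GC_expand_hp is the underlying Boehm API):

uses
  GCWrapper; // hypothetical name for the Delphi Boehm GC wrapper unit

initialization
  GC_expand_hp(20 * 1024 * 1024); // pre-grow the heap to 20MB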

Enable is a work injury management system. It is heavy on database access and single-threaded.
Envisage is a document management system. Database access is done via the tiOPF object persistence framework. It reads PDF files, checks for bar codes and creates new ones. It is multi-threaded. It uses a large amount of memory.

Here are the results:


           Envisage, no threads   Envisage, threaded   Enable
FMM        70                     114.4                16.4
GC 20MB    74                     119.0
GC 40MB    71                     117.5                14.1
GC 100MB   71                     115.6




Conclusion
Based on these results, I would have to say that my comment "A well written and tuned garbage collector can be faster than manual allocation" is correct. Given sufficient heap space, the GC version is faster in some tests and about 1% slower in others.

Let me restate my conclusion, as the initial one is not well worded in terms of what I intended to say. A better conclusion would be "It is possible for a garbage collected application to run at a speed similar to that of an application using manual deallocation". Or alternatively, "adding a GC to an application doesn't automatically make it incredibly slow".

The GC performance could probably be improved further by surfacing the GC tuning options, improving the Delphi wrapper and using a later version of the GC. The unit tests could also be sped up by removing the now redundant frees, destructors and try ... finally blocks.

The Boehm GC used is an early version 6 (6.2 or so). Version 7 is available from CVS. V7.1 should be released soon.

There are downsides to using a GC, such as increased memory use. It is not appropriate for all applications, especially those with memory constraints. However, speed does not appear to be one of those downsides.

Update

In response to a query: yes, the garbage collector is running and collecting the objects. After the initial run (which may increase the heap), the heap size remains static no matter how many times I repeat the test (any of the tests).

I also repeated the FastMM test removing the testObj.Free; line.

It "completed" in 35 seconds. By completed, I mean "used up 1.3gig of free mem, all my 4gig page file and then threw an "out of memory" exception.


Reference
GC Mailing list: Are there any benchmarks comparing the speed of gc v non gc
Garbage collector for Delphi
Boehm GC
Wikipedia article


Friday, January 18, 2008

Garbage collection: Follow up

Given some of the feedback on my previous post, I thought a follow-up would be in order.

Performance
One of the most contentious points was my comment that "A well written and tuned garbage collector can be faster than manual allocation". I will cover this in a separate post as it needs more than a couple of lines.

Why would you want to use a GC in Delphi?
I will cover this in a separate post as well. There is probably little gain in just adding a GC to an existing Delphi app (unless it's leaky, but we don't write apps like that). If you are writing a new app based around having a GC in place, then you can do things differently.

Clarifications
One of the original quotes referred to objects referencing each other not being released. I read this as talking about cyclic references. That is a problem with simple reference counting, but not with a tracing (i.e. mark and sweep) GC such as those used by Boehm, .NET and Java (1).

A gc is not a silver bullet, nor will it catch all memory leaks. I am not suggesting that it will.

Corrections
One point I forgot to mention: some GC algorithms will allocate extra memory for flags, counts, etc. (1). This can push up the memory use compared to manual allocation. However, FastMM 4 (2) also allocates a 32-bit flag ahead of every memory block, so it is probably a wash.

FastMM
FastMM will not catch all memory leaks. It will catch memory that hasn't been freed when the application exits (2), which is not the same thing.

If you have poor testing coverage, then the untested code can have memory leaks.
FastMM will not catch objects freed on application shutdown (i.e. forms owned by the application). A GC won't catch this either.

FastMM will help with double frees, but it won't help with access violation (A/V) errors (unless I am missing something, it certainly hasn't helped me). A GC will help with both of those (1).


References
1 The wikipedia article on Garbage collection provides a lot of the background.
2 Fast mm details are taken from http://dn.codegear.com/article/33416

"However most Delphi memory managers request large chucks of memory from windows and then parcel it out to the app on request," See (2) and Nexus MM

The quotes in the original article are taken from the newsgroup thread "Garbage collection"