Feature Wiki

Information about planned and released features

Tabs

Functions

Materialized Path

Until July 2012 this page was called "Performance Improvement of Tree operations".

1 Description

We want to improve the performance and throughput of all operations that act on the "tree" table of ILIAS.

1.1 Known Issues

Performance measurements at the Lucerne University of Applied Sciences and Arts (HSLU) have shown that the following operations on the "tree" table are causing performance and throughput issues:
  • "Insertion", "Removal" and "Move" operations on the "tree" table have an expensive worst-case scenario. In this scenario the "Nested Sets" data structure in all rows of the table must be updated. The cost of this operation is linear to the total number of repository nodes. On ILIAS repositories with large numbers of objects (200,000 or more objects) it may take several seconds to perform one of these operations. (The ILIAS repository at HSLU has ~500,000 objects).
  • "Insertion", "Removal" and "Move" operations always require a lock of the whole "tree" table, because the "Nested Sets" data structure can not be updated concurrently. Hence while a user performs any of these operations, nobody else can access the repository - not even reading is possible. On ILIAS repositories with many concurrent users (more than 200 concurrent users) this scenario can become the limiting factor for the overall troughput of the system. (HSLU has up to 400 concurrent users).
  • Retrieval of the path to an object takes linear time depending on the depth of the object in the tree. For this scenario a complex SQL-statement with nested JOIN-operations must be performed over the "Adjacency Map" data structure of the "tree" table. Path retrieval is performed very often in ILIAS. For example, if a user navigates through the repository, for each listed object, the path must be retrieved.
 

1.2 Improving Tree Operations

We suggest the following changes:

1.2.1 Materialized Path

Replace the "Adjacency Map/Nested Sets" data structure by an "Adjacency Map/Materialized Path" data structure. Specifically remove the fields "lft" and "rgt" from table "tree", and add a new field "path" of type "char(255)" to the table, and add an index for this field.
 
Benefit: The "Adjacency Map/Materialized Path" data structure can perform Insertion/Removal and Move operations with a cost of 1 per affected node. Path retrieval is also supported with a cost of 1 per affected node using the path index. Subtree retrieval is performed using the path index with a cost of 1 per retrieved node in the subtree. Locks are only needed either for single nodes or for the subtree starting at an affected node. This means that concurrent writes are possible in different areas of the repository.
 
Caveat: The depth of the tree is limited by the maximal length of data field which holds the materialized path.
With the current approach used in the Release_3_10_hslu_branch, we need 11 characters to store the object reference key (which is of type int(11)), plus 1 additional character for separating the keys. The key of the root object of ILIAS is known to use only 1 digit. With this approach, we need 253 characters to represent a path with 22 components (1+21*(11+1)=253).
With MySQL, an indexable field can have up to 65,535 characters, of which up to the first 255 characters can be indexed. This yields to a maximal depth of 5,462 levels for the repository on MySQL (1+5461*(11+1)=65,533).
Other database systems, for example Oracle, my have different limitations.
 
(The ILIAS repository of HSLU has currently a depth of 21 levels. We have tested it with up to 100 levels using a materialized path of 1189 characters, and a prefix index of 255 characters.)

1.2.2 InnoDB

Use the InnoDB table engine instead of the MyISAM table engine for the tables "tree" and "object_reference". Use explicit transactions for operations on the tree.
 
Benefit: This table engine support concurrent reads during writes. And - most importantly - concurrent writes are supported as long as they don't affect the same rows. Changing to the InnoDB table engine is necessary to take advantage of the "Adjacency Map/Materialized Path" data structure. If we kept the MyISAM table engine, MySQL would still lock the entire table even if we only need a lock for a specific table row or a specific range of table rows.
 
Caveat: This specific distinction of table engines only exists for MySQL. For other database systems another distinction may be necessary.
This feature (1.2.2) will be implemented as MyISAM/InnoDB switch for 4.3.

1.2.3 Option to Switch Between Nested Sets and Materialized Path

Optional feature: Provide an option which lets system administrators choose between the "Adjacency Map/Nested Sets" data structure and the "Adjacency Map/Materialized Path" data structure. If the administrator chooses one of these options, ILIAS alters the data structure of the "tree" table and fills in the required values to make the data structur work.
 
Benefit: Some ILIAS installations can opt out until the suggested solution is known to work stable. Or maybe, some ILIAS installations need to opt out, because their repository is more than 22 levels deep.
 
Caveat: This specific distinction of table engines only exists for MySQL. For other database systems another distinction may be necessary.

2 Status

3 Additional Information

  • If you want to know more about this feature, its implementation or funding, please contact: Werner Randelshofer werner.randelshofer@hslu.ch

4 Discussion

JF 5 May 2009:
  • A restrictions to 22 level does not seem acceptable when we already know of an installation that needs 21 levels. Would it be possible to encode the node ids not at decimals with the cost of one byte per decimal? Hex could already bring us 28 levels, but there may be even better solutions.
  • For the locking we would need an abstraction, not only on the DBMS level, but on the engine level, since we do not want to force anyone to use InnoDB. Means: a lock row should fall back to a lock table, if row locking is not supported. We need solutio for MyISAM, InnoDB, Oracle. PostgreSQL would be nice.
  • There are a lot of open questions how the crucial functions are performed in detail deleteTree, moveTree, insertNode, getSubTree, getPath...
W. Randelshofer 6.5.2009

Depth restriction

The depth restriction bothers us as well. But please note, that the 21 levels on our installation are a really extreme case. 99.9 % of all objects are contained in the first 17 levels.

I digged in the MySQL documentation. If we use a prefix index instead of a regular index, we are only limited by the maximal length of varchar fields (which is 65'535 bytes) and the maximal row size (which is 65'535 bytes as well). If we use ASCII-encoding instead of UTF-8 encoding, we can thus easily store paths with more than 5,400 levels. (I noticed that ASCII-encoding has the added benefit of reducing the storage requirements of the index by factor 3. So we should use it anyway for the materialized path field.). Since MySQL performs a prefix compression on the index keys, and also only needs storage space for bytes in the path fields which are actually used, we should not see any performance degradation or excessive storage space usage if we create a path field of type ASCII VARCHAR(65000) with an index of type BTREE with a prefix length of 255 characters. - Of course, we are going to measure this.

If a database other than MySQL is used, the limits may be different. I don't know what the limitations are of every possible database out there. The suggested Hex-encoding of the keys is easy to implement, and it has the added benefit of reducing the storage space requirements of the tree table. So, it won't hurt if we implement it.
I don't know whether the HEX() and UNHEX() functions are supported by other databases though.

We can use a property in client.ini.php to define the maximal depth of the materialized path.


Abstraction of the Locking Strategy

As far as I know, ILIAS 3.11 is going to provide a database abstraction layer. I think, we can easily integrate the locking strategy into this abstraction layer.

We can use a property in client.ini.php to define which strategy shall be used.

We are going to make a specification for this.



Specification of Tree Operations

We are going to make a specification of all tree operations, showing a comparison of the current tree operations with the new operations.

FYI, We have already implemented all of the tree operations for the materialized path data structure. We are currently doing tests and some bug fixing. If you want, you can take a look at the implementation. You can search for "BEGIN PATCH Tree" to find all code sequences which use the materialized path data structure.
Alex, 8 May 2009:

Would it be necessary to do the Hex/Unhex enconding on the DB level? If possible I would do it in PHP, even better would be to use base_convert with a base of 36.
Werner, 9 May 2009:

I quickly looked through the code that we have implemented so far - I think we don't need to do the encoding at the DB level. I am going to take a closer look at this on Monday.

For our installation, I am going now for a maximal depth of 100 levels. I tested this already - it works like a charm.
I used decimal encoding and the ASCII character set, a VARCHAR(1200) field to store the path, and a prefix 255 index.

With base-36 encoding a VARCHAR(900) would suffice to store a path of 100 levels, but…



…I don't know whether we can use the function base_convert to perform the conversion. The PHP specification states that this function uses floating point numbers to perform the conversion. It does not state whether 32-bit or 64-bit floating point numbers are used.

This is unfortunate since we are dealing with very large integer numbers with 11 decimal digits. With that many digits we exceed the precision of 32-bit floating point numbers. This might lead to subtle errors that are very hard to detect in the future.


btw. the PHP dechex function only supports integer numbers with up to 9 decimal digits, then it performs a floating point conversion as well. So, we must not use it.

Unless there is a pressing need for hex-, or base36-encoding, I would rather go with the more robust decimal encoding.
Therefore I am asking: Is there a pressing need for any other decoding than decimal since we can easily represent a path with more than 5,400 levels with MySQL using decimal encoding?
JF 16 Nov 2009: We should try to get these improvements into ILIAS 4.1.
Simon Moor 20.9.2011: Feature won't get into 4.2, what about 4.3?
Alex 20 Mar 2012: I would like to put the materialized path implementation on the agenda of 4.3.
JF 2 May 2012: We schedule the materialized path implementation for 4.3 again and hope to find someone who funds its integration.
Alex 1 Nov 2012: Funding for 4.4 integration is settled.
JF 10 Dec 2012: We highly appreciate the development and schedule it for 4.4.
Jochen 29 Oct 2013: What does "currently no switch to mp possible" on release 4.4 page mean? Currently in "ILIAS 4.4 Beta 1" or in  finale release.
Stefan 07 Nov 2013: The switch is implemented in the setup in release 4.4 Beta 2.

5 Implementation

A comprehensive specification of the repository tree changes is available here:
Redesign of ILIAS Repository Tree for Performance

Last edited: 7. Nov 2013, 09:26, Meyer, Stefan [smeyer]