A partition file purge could unexpectedly fail with error PUSD_ERR (718). While attempting to reproduce the issue reported from the field, it was found that an internal server hang could also occur. A partition purge makes an internal file block call, which allows an open partition member to be closed, while still holding a mutex for the partition host. If another thread was waiting on the same host file for another operation, such as closing a partition member, that thread could not complete without acquiring the same mutex.
Depending on timing, this could result in either a file block timeout or deadlocked threads, the latter causing a server hang. The following reported server log message suggested that this had occurred:
- User# 00020 ctFILBLK Note: no progress clearing threads from core. Abort block attempt.: 842
To avoid this condition, the mutex is now released before the file block is called and reacquired after the file block completes.
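
The locking pattern described above can be illustrated with a minimal, self-contained sketch. This is not the server's code: it is written in Python for brevity, and every name in it (`purge_partition`, `host_mutex`, `member_closed`, `close_member`) is hypothetical. The "file block" is modeled simply as waiting, with a timeout, for another thread to finish closing a partition member.

```python
import threading


def purge_partition(release_before_block, timeout=0.5):
    """Return True if the simulated file block completes, False if it times out."""
    host_mutex = threading.Lock()       # hypothetical partition host mutex
    member_closed = threading.Event()   # set once the open member is closed

    def close_member():
        # The closing thread cannot finish without the host mutex.
        with host_mutex:
            member_closed.set()

    host_mutex.acquire()                # the purge starts out holding the mutex
    closer = threading.Thread(target=close_member)
    closer.start()

    if release_before_block:
        # New behavior: release the mutex so the closer can make progress,
        # run the file block, then reacquire the mutex afterwards.
        host_mutex.release()
        blocked = member_closed.wait(timeout)
        host_mutex.acquire()
    else:
        # Old behavior: the file block waits while the mutex is still held,
        # so the closer is stuck and the wait can only time out.
        blocked = member_closed.wait(timeout)

    host_mutex.release()
    closer.join()
    return blocked


if __name__ == "__main__":
    print(purge_partition(release_before_block=True))   # True
    print(purge_partition(release_before_block=False))  # False
```

In this sketch the old ordering always ends in a timeout because the wait is bounded. An unbounded wait in the same position would never return, which corresponds to the deadlock case. Any state protected by the mutex may change while it is released, so a real implementation would need to revalidate that state after reacquiring it.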