Kongsberg HAIN – configuration of time synchronisation
The world has turned a few times since this page was first published in 2008 and it refers to a very early version of the software, but it seems a shame just to throw it away, so we’ve left it here for students of history. Who knows – it could still be useful to someone, but do check first.
OSKTimeSynch and NTP
Before HAIN was introduced, Kongsberg Maritime used a simple form of time synchronisation (OSKTimeSynch.exe) that would get the time from the master APOS to set local PC clocks. This is adequate for most purposes, but the HAIN system implements time-based filtering (Kalman) which would give unstable results if the clock time were to jump, as it could if using OSKTimeSynch.
For this reason, HAIN systems use Network Time Protocol (NTP). NTP is a mature, reliable and well-developed protocol which polls reliable time sources and disciplines the PC clock to run accurately by varying its speed without introducing any jumps. On the Master APOS station NTP is configured to get its time from the IOserver that interfaces to GPS time (and the 1pps signal if available); the HAIN system and the other APOS stations have NTP configured to get time from the Master APOS.
NTP is implemented by the executable program ntpd.exe and the configuration file ntp.conf. The only configuration difference between stations is the IP address that is used as the time server source. On all stations except the Master, the “server” address is the address of the Master; on the Master station the “server” address is 127.127.28.2, which causes NTP to read GPS time from the IOserver via shared memory (SHM). If this is not available, the Master station will use its own free-running clock, 127.127.1.0, as the reference.
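As a rough illustration of the per-station difference just described, the Python sketch below builds the “server” lines for a station. The function name and the example Master address 192.168.1.10 are hypothetical; only the “server” keyword and the two reference clock addresses come from the text, and a real ntp.conf will contain other lines that are not shown here.

```python
def server_lines(is_master, master_ip=None):
    """Return the ntp.conf 'server' lines for one station (illustrative only)."""
    if is_master:
        return [
            "server 127.127.28.2",  # SHM reference clock fed by the IOserver
            "server 127.127.1.0",   # local free-running clock, the fallback
        ]
    if not master_ip:
        raise ValueError("non-Master stations need the Master's IP address")
    # Every other station simply points at the Master APOS.
    return ["server " + master_ip]


if __name__ == "__main__":
    print("\n".join(server_lines(True)))
    # 192.168.1.10 is a made-up address used purely as an example.
    print("\n".join(server_lines(False, "192.168.1.10")))
```

Comparing the output with the “server” lines actually present on each station is a quick way to spot a station that is pointing at the wrong source.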
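There are three configuration issues worth checking if you have any concerns about time synchronisation in a HAIN system: OSKTimeSynch running alongside NTP, NTP not getting time from the IOserver (SHM), and the clock time jumping. Each is described below.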
Note: HAIN does not care whether the time is correct; it just needs the time synchronised between the PCs without jumping. Surveys, of course, will require accurate time-stamps on fixes.
Problem: OSKTimeSynch running
In recent software versions this should not be an issue. In early installations it was not unusual to find OSKTimeSynch still configured to start up on stations that had NTP running, which causes a conflict. This can be discovered by using ALT-Tab to cycle through all of the running programs. If OSKTimeSynch is found to be running as well as ntpd.exe then it should be stopped by closing (not just minimising) its window. The lines that start OSKTimeSynch should also be commented out of the startup batch files in the APOS directory so that it does not start again.
In newer software versions the main APOS process, WinHPRu, starts ntpd.exe (if it is not already running) and kills OSKTimeSynch.
Problem: NTP not getting time from IOserver (SHM)
This can be caused by a "race condition" between NTP and the IOserver. NTP must be running before the IOserver starts, but although the APOS startup batch file starts them in the correct sequence, the IOserver sometimes starts before NTP has finished initialising.
On the Master APOS station, the IOserver receives GPS time. NTP requests this time regularly, and this can be seen by ALT-tabbing to the ntpd.exe window. When the transfer of time is working correctly, there will be regular updates from the reference clock 127.127.28.2, e.g.:
refclock_transmit: at 749934 127.127.28.2
The offset value is the difference in seconds between the reference time and the system clock time – in a working system that has stabilised this should never be more than a few milliseconds (i.e. 0.00xxxx).
If NTP is not fetching time from the SHM, then error messages will be seen in the ntpd.exe window, e.g.:
refclock_transmit: at 15 127.127.28.2
Note that there is no refclock_receive but instead a clk_noreply event. There will also be entries from NTP in Windows' Application Event log of the form:
SHM: No new value found in shared memory
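If the ntpd.exe window output has been captured to text, the check described above can be roughed out in Python. This is only a sketch: it looks for the refclock_transmit, refclock_receive and clk_noreply keywords mentioned in the text, and it assumes that the clock address appears on the transmit and receive lines, which may not hold for every NTP version. The function name is my own.

```python
SHM_CLOCK = "127.127.28.2"  # SHM reference clock address used on the Master


def shm_status(lines):
    """Classify captured ntpd output as 'ok', 'no_reply' or 'unknown'."""
    status = "unknown"
    for line in lines:
        if "refclock_receive" in line and SHM_CLOCK in line:
            status = "ok"            # time was actually received from SHM
        elif "clk_noreply" in line:
            status = "no_reply"      # NTP asked, but the clock did not answer
        elif ("refclock_transmit" in line and SHM_CLOCK in line
              and status == "unknown"):
            status = "no_reply"      # a request with no reception seen so far
    return status


if __name__ == "__main__":
    # The single line below is the failing example quoted in the text.
    print(shm_status(["refclock_transmit: at 15 127.127.28.2"]))
```

A result of 'no_reply' corresponds to the failure case described here; 'unknown' simply means that none of the keywords was found in the captured text.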
There are two methods that will get NTP talking to the SHM clock. The first is easier with a full keyboard, but can also be done by logging in as Service and selecting the Task Manager from the Utility menu:
You should now find that NTP starts to get time from SHM. The alternative method is simply to stop and restart APOS (without rebooting the PC). This works because stopping APOS does not stop NTP, so when APOS restarts, NTP will already be running and ready to connect to the IOserver.
Problem: Clock time jumping
This is a critical problem, because the whole reason for using NTP is to avoid clock jumps that would destabilise the Kalman filter. When NTP first starts it will jump the clock to the correct time (as long as it is within 1000 seconds), but after that, control should only be carried out by minute adjustment of the clock speed. NTP also stores the drift rate of the clock in the drift.conf file, to be read as a starting adjustment. I have observed, however, that in some HAIN systems there are periodic gross clock adjustments, sometimes exceeding 1000 ms.
These can be recognised in two ways:
These messages should only be seen shortly after start-up, after NTP has confirmed that the time source appears to be stable. By default, NTP never steps the clock unless there is an error greater than 128 ms. When NTP is working normally this is a large error that should not occur; however, sometimes there can be instability, and the step limit needs to be increased. If clock resets continue to be seen every few hours or more frequently, then the work-around is to include the following line in the ntp.conf file:
This line should be early in the file – it is best to insert it just before the server lines. The effect is to alter the default step size so that NTP will not step the clock unless the error exceeds 1.5 seconds; this is a condition that should never exist after the initial step at start-up.
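For anyone who has to apply this to several stations, here is a hedged Python sketch of the edit. It assumes that the line in question is the standard ntpd directive `tinker step 1.5`, which sets the step threshold in seconds; that is my assumption, so check it against your own NTP documentation. The function name is hypothetical, and the sketch works on the file contents as a string rather than touching the real file.

```python
def add_step_threshold(conf_text, seconds=1.5):
    """Insert 'tinker step <seconds>' just before the first 'server' line."""
    lines = conf_text.splitlines()
    # Leave the text alone if a 'tinker step' line is already present.
    if any(line.split()[:2] == ["tinker", "step"] for line in lines):
        return conf_text
    new_line = "tinker step " + str(seconds)  # assumed directive, see above
    for i, line in enumerate(lines):
        if line.split()[:1] == ["server"]:
            lines.insert(i, new_line)
            break
    else:
        lines.insert(0, new_line)  # no server lines found: put it first
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "server 127.127.28.2\nserver 127.127.1.0\n"
    print(add_step_threshold(sample), end="")
```

NTP reads ntp.conf only when it starts, so ntpd.exe has to be restarted before a change like this takes effect.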
NTP slews the clock at a maximum of 0.5 ms/s (half a millisecond per second), so the minimum time to correct a 1 second error is 2000 seconds. Thus, a 1.5 second error will take at least 3000 seconds (50 minutes) to correct. In a system that has stabilised, you should see offsets of below 10 ms.
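The slew arithmetic is easy to reproduce. The short Python sketch below uses the 0.5 ms/s maximum slew rate quoted above; the names are my own, and the result is a lower bound, since NTP will not necessarily slew at the maximum rate throughout.

```python
MAX_SLEW_US_PER_S = 500  # 0.5 ms of correction per second of elapsed time


def min_slew_time(offset_s):
    """Minimum seconds needed to slew away an offset given in seconds."""
    return abs(offset_s) * 1_000_000 / MAX_SLEW_US_PER_S


if __name__ == "__main__":
    print(min_slew_time(1.0))       # 2000.0 seconds
    print(min_slew_time(1.5) / 60)  # 50.0 minutes
```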