Medical Image Format FAQ - Part 6

Hosts and Compression


4. Host Machines

4.1 Data General

4.1.1 Data General Data

Data General Integers

Integers are 16 bit two's complement and stored in big-endian format as on Sun Sparc and opposite to the Dec VAX.

Data General Floating Point

Single precision real values are 32 bits long, in big-endian format. The high bit is the sign bit, followed by a 7 bit excess 64 exponent (power to which 16 must be raised) then a 24 bit hexadecimally normalized mantissa with the radix point to the left of the most significant hexadecimal digit. Double precision values just have another 32 bits tacked on to the mantissa and the same exponent format.

           |<-->|<------ Exponent ------>|<--------- Mantissa -------->|
            ______________ ______________ ______________ ______________
           |              |              |              |              |
            31          28 27          24 23          20 19          16
           |<----------------------- Mantissa ------------------------>|
            ______________ ______________ ______________ ______________
           |              |              |              |              |
            15          12 11           8 7            4 3            0

Here is a little piece of C++ code that should run on anything and convert Data General floats to whatever the host's floating point format is.

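		// Needs <math.h> for pow(); instream is an istream opened on the
		// data, and Uint16/Uint32 are 16 and 32 bit unsigned integer typedefs
		double		value;
		unsigned char	sign;
		Uint16		exponent;
		Uint32		mantissa;

		unsigned char buffer[4];
		instream.read((char *)buffer,4);
		if (instream) {
			// DataGeneral is a Big Endian machine, so buffer[0] holds
			// the sign and exponent and buffer[1..3] the mantissa;
			// picking the fields out with shifts and masks does not
			// depend on the host's byte order or bit-field layout
			sign     = buffer[0] >> 7;
			exponent = buffer[0] & 0x7f;
			mantissa = ((Uint32)buffer[1] << 16)
				 | ((Uint32)buffer[2] << 8)
				 |  (Uint32)buffer[3];

			value = (double) mantissa / (1 << 24) *
				pow (16.0, (double)exponent - 64.0);
			value = (sign == 0) ? value : -value;
		}
		else {
			cerr << "read failed\n" << flush;
		}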
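
The double precision layout described above (the same sign and exponent byte, followed by a 56 bit mantissa) can be handled in much the same way. What follows is only a sketch under that description, not something taken from Data General documentation: the function name dgDoubleToHost is made up for illustration, and the eight bytes are assumed to be in the big-endian order in which they were stored. Note that a 56 bit mantissa does not fit exactly into an IEEE double, so the lowest few bits may be rounded.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (the name is an assumption): convert 8 bytes holding
// a Data General double precision value, in big-endian order, to a host double.
double dgDoubleToHost(const unsigned char bytes[8])
{
	// High bit is the sign, the next 7 bits are the excess 64 exponent
	const bool negative = (bytes[0] & 0x80) != 0;
	const int  exponent = (bytes[0] & 0x7f) - 64;

	// The remaining 56 bits are the hexadecimally normalized mantissa
	std::uint64_t mantissa = 0;
	for (int i = 1; i < 8; ++i) {
		mantissa = (mantissa << 8) | bytes[i];
	}

	// The radix point is to the left of the mantissa, so scale by 2^-56,
	// then apply the power of 16 (16^e is the same as 2^(4e))
	const double value = std::ldexp(static_cast<double>(mantissa), -56 + 4 * exponent);
	return negative ? -value : value;
}
```

Using ldexp rather than pow keeps the scaling exact, since only the binary exponent of the result changes.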

4.1.2 Data General Operating System

Data General RDOS

Used on the GE CT 9800 family. Severely primitive, but then it is running on an old machine that can only map 64 KB of memory at a time, after all. It is apparently multitasking. Documentation may still be available from Data General (try DG Direct) but is not supplied with the scanner by GE. If anyone knows where I can find it at a reasonable price let me know. Here is a brief command summary culled from a nifty pocket book from GE for SunOS/Genesis users that compares commands:

                 CHATR  - file attributes
                 CRAND  - create randomly organized file
                 CDIR   - create directory
                 DELETE - files or directories
                 DIR    - change directory
                 DISK   - free space
                 FILCOM - compare files
                 GDIR   - show working directory name
                 GTOD   - show date and time
                 LINK   - files (symbolic)
                 LIST   - directory contents
                 MOVE   - a file
                 RENAME - a file
                 SDAT   - set date
                 STOD   - set time
                 SDUMP  - write files to a device
                 SLOAD  - read dumped files
                 SPEED  - text editor
                 TYPE   - contents of file
                 XFER   - copy a file

                 wildcards: '-' is series, '*' is single character

Data General AOS/VS

Used on the GE Signa 3X and 4X family. Quite a nice operating system with multi-tasking and hierarchical directories. Here is a brief command summary again culled from a nifty pocket book from GE for SunOS/Genesis users that compares commands:

                 ACL         - access control list (ownership)
                 BYE         - exit command process
                 COPY        - a file
                 CREATE      - a text file
                 CREATE/DIR  - a directory
                 CREATE/LINK - link files
                 DELETE      - files & directories
                 DIR         - display or change working directory
                 DUMP        - to peripheral
                 F/AS/S      - directory listing with file status
                 DATE        - show or set
                 LOAD        - DUMPed files
                 MOVE        - a file
                 RENAME      - a file
                 PATH        - show pathname of a file
                 PAUSE       - the command line interpreter
                 SUPERU ON   - enable superuser
                 SED         - text editor
                 TIME        - show or set
                 TYPE        - contents of text file
                 ?           - list processes running

                 wildcards: '+' is series, '*' is single character

Other useful hints include the use of "^" to refer to the next directory up (like ".." in Unix) in DIR commands. Command options follow the command name without any spaces and are indicated by a slash. COPY operations specify the destination name first and then the source name. Devices like the mag tape are indicated by "@", for example "@MTB0" is tape drive zero. Files on the tape can be referred to as "@MTB0:nn" which is very handy. For example to read a file off a CT 9800 tape under AOS/VS:

                COPY/V/IMTRSIZE=8192 B038040101.YP @MTB0:18

Perhaps most importantly, there is an extensive online help system ... use the HELP command.

4.1.3 Data General Network

If you have a GE Signa based on a DG then you can get the so-called "High Speed Network" card and software from GE. From memory it is pretty pricey, and there used to be a "slower" network interface that was cheaper, but I don't think this is available anymore.

If you have a CT 9800 based on the DG S/140 and you need to get it connected there are a number of solutions:

              $2,850 - EC-10 ethernet controller
              $1,645 - RDOS TCP/IP software (telnet client,ftp client/server)

I have not personally tried either of these approaches, and I am sure there are others (talk to Merge or DeJarnette), but I am getting really tired of carrying 9-track tapes around so perhaps I will bite the bullet soon (and upgrade to a HighSpeed Advantage !).

4.2 Vax

4.2.1 Vax Data

Vax Integers

Vax Floating Point

Vax Strings

4.2.2 Vax Operating System

Vax VMS

(See also Vax VMS Tools)

Truly one of the world's most irritating operating systems to use, especially if you are a unix fan. Still it works, has a great online help system that saves one's butt almost often enough to be useful, and if you can remember the directory where kermit is stored and the weird command to invoke it you can get by (barely).

If you don't know VMS and the vendor doesn't supply the manuals, get them from DEC ... you need them bad ... real bad. If (like me) you throw them out every time you move and then encounter another piece of archaic equipment, you need the "vaxbook" written by Joseph E St Sauver, which is available via ftp and summarizes commands, files and all sorts of application specific stuff, though it is no substitute for the real thing.

Recent VMS update: goddamn file formats ! Why can't VMS behave like a real operating system and forget this file format crap ! I have some Philips S5 MR images exported in ACR/NEMA format and I can't get the things off the host's Vax using Kermit, because though they have fixed length 512 byte records, some cretinous program sets the "carriage return carriage control" record attributes, which causes kermit to send with all the '0A' characters scrubbed out amongst other atrocities.

I am getting desperate and about to try using the Hex/Dehex utility that came with Kermit to get the stuff off and then decode the hex format ! Or perhaps even use "dump" to make a textfile, transfer, and decipher that. (No I don't have a C compiler for the Vax so I guess I can't use uuencode unless someone wants to mail me a hex'ed executable). Any hints, or instructions as to how to use FDL and Convert, to change it to a normal format would be appreciated. (Why can't they just have a "set file record attribute xxx" command like all the other millions of set commands ? Grrrr.).

More recent VMS update: finally had an inspiration while staring at hex dumps of these files - why not use the VMS "DUMP" utility which produces hex dumps as a "poor man's uuencode" by saving the dump to a file, transferring it as an ascii file, and then decoding it at the destination ? Of course there are no nifty line checksums or anything, but a transfer protocol such as kermit takes care of this.

The DUMP output defaults to eight 32 bit long words per line, displayed as hex and separated by spaces, then an ascii string (32 bytes), and then a 24 bit hex address offset from the start of the fixed length record. All the lines containing data start with a single space, whereas the descriptions at the start of each record begin in the first column, hence the data lines can be easily selected out. By the way, the hex version of the data is listed in reverse order, so the first byte of each line is the rightmost pair of hex digits ! VMS is so bizarre ! For example, here is a fixed length 512 byte record file from a Philips S5 MRI (some of the hex words elided to make the line fit on the page):

Dump of file SYS$SYSROOT:[GYROSCAN]ABAALKHAIL02010201010001.ANI;1 ...
File ID (2419,301,0)   End of file block 198 / Allocated 200

Virtual block number 1 (00000001), 512 (0200) bytes

 0000000C 00100008 ... 00000008 .............................. 000000
 00083932 2E36302E ... 2D524341 ACR-NEMA 1.0.. .....1994.06.29.. 000020
 00600008 4D5F4553 ... 00000030 0.......@.........A.....SE_M..`. 000040
 494B0000 00100080 ... 00000002 ....MR..p.....Philips ........KI 000060

 00183148 00000002 ... 32200000 .. 2........63865375........H1.. 0001E0
Dump of file SYS$SYSROOT:[GYROSCAN]ABAALKHAIL02010201010001.ANI;1 ...
File ID (2419,301,0)   End of file block 198 / Allocated 200

Virtual block number 2 (00000002), 512 (0200) bytes

 40000018 45424F52 ... 00161250 P.....AGACQ_PT_SURFACE_PROBE...@ 000000

And so on ... you get the idea. This ugly little C++ utility written quickly during this moment of inspiration will take saved DUMP output and make it binary again:

#include <iostream>

using namespace std;

// Convert one hex digit to its value, or -1 if it is not a hex digit
signed char
hextobin(char c)
{
	signed char r;
	switch (c) {
		case '0':	r=0; break;
		case '1':	r=1; break;
		case '2':	r=2; break;
		case '3':	r=3; break;
		case '4':	r=4; break;
		case '5':	r=5; break;
		case '6':	r=6; break;
		case '7':	r=7; break;
		case '8':	r=8; break;
		case '9':	r=9; break;
		case 'A':
		case 'a':	r=0xa; break;
		case 'B':
		case 'b':	r=0xb; break;
		case 'C':
		case 'c':	r=0xc; break;
		case 'D':
		case 'd':	r=0xd; break;
		case 'E':
		case 'e':	r=0xe; break;
		case 'F':
		case 'f':	r=0xf; break;
		default:	r=-1; break;
	}
	return r;
}

// Reads saved DUMP output on standard input, writes binary on standard output
int
main(int argc,char **argv)
{
	(void)argc; (void)argv;
	while (1) {
		const int linemax=132;		// only needs 113
		char line[linemax];
		cin.getline(line,linemax);
		if (!cin || cin.eof()) {
			// cerr << "Bad or eof\n" << flush;
			break;
		}
		// gcount() includes the newline that getline() discarded
		unsigned count=(unsigned)cin.gcount();
		// only the data lines start with a space
		if (count == 0 || line[0] != ' ') continue;
		if (count != 113) {
			cerr << "Line length " << count << "\n" << flush;
			break;
		}
		unsigned i;
		// point just past the last of the 8 long words
		char *ptr = line + 8*(1+8);
		// line is in reverse order ...
		for (i=0; i<8; ++i) {
			unsigned j;
			for (j=0; j<4; ++j) {
				// 2 hex digits -> 1 byte
				char bytelo = *--ptr;
				char bytehi = *--ptr;
				unsigned char byte
					= (hextobin(bytehi)<<4)
					  + hextobin(bytelo);
				cout.put((char)byte);
			}
			--ptr;	// space between long words
		}
	}
	return 0;
}

Note that the nature of fixed length records under VMS means that the last record will be padded out to 512 bytes without any indication of the "real" end-of-file. This means you have to cope with trailing garbage gracefully.

Hot VMS/Philips news: Peter Neelin tells me there is an extremely useful tool for fiddling with binary files called FILE, from DECUS. It allows you to change a file's header information without modifying the content of the file. This then permits ftp, kermit, etc. to do the right thing with Philips .ANI files. It also permits wildcards and does not make a copy of the file (so it is fast). He also says that someone has told him that they succeeded in using convert to fix these files, but his general experience with it is not positive (it will often change the content of the file and it doesn't allow wildcards, in addition to promoting the use of the horrible fdl editor!). If you are interested, you can get FILE through gopher (look for the DECUS software library archives, under essential tools). The binary is provided in case you don't have a compiler. FILE, and many other useful things, are also available from the sites listed in Vax VMS Tools.

Some other useful hints:

			UNIX FTP server       Vax/VMS FTP server

			cd dir                cd [.dir]
			cd dir/subdir         cd [.dir.subdir]
			cd ..                 cd [-]

ULTRIX

OSF

4.3 Sun - Sun3 68000 and Sun4 Sparc

4.3.1 Sun Data

The sun3 and sun4 architectures use much the same formats. Even though the processors are different both are big-endian and the float formats are IEEE. See the Sparc Architecture Manual - Chapter 3 - Data Formats for more details.

One very important difference though, is that the sun3 convention is not to align 32 bit and 64 bit data types on 4 and 8 byte boundaries respectively, whereas the sparc (sun4) architecture usually does, as dictated by a compile time option. Be very careful when using the same header files on both architectures. This drove me nuts when trying to figure out why the well described Genesis (sun3) layout did not match the unknown Advantage Windows (sun4) data. It was pretty obvious when it was pointed out though :).

Sun Integers

Integers are 8, 16, 32, or 64 bit unsigned or signed two's complement and stored in big-endian format as on Data General and opposite to the Dec VAX. Most C compilers treat short as 16 bits, and int and long as 32 bits.

Sun Floating Point

Formats conform to the IEEE 754-1985 Standard for Binary Floating-Point Arithmetic. Single precision real values are 32 bits long, in big-endian format. The high bit is the sign bit, followed by an 8 bit excess 127 exponent (power to which 2 must be raised) then a 23 bit normalized mantissa with the binary point to the left of the most significant bit, from which 1.0 has been subtracted (that is, the leading 1 is implied rather than stored). Double precision values have an 11 bit excess 1023 exponent and a 52 bit mantissa. Quad precision values have a 15 bit excess 16383 exponent and a 112 bit mantissa.

           |<-->|<-------- Exponent -------->|<------- Mantissa ------>|
            ______________ ______________ ______________ ______________
           |              |              |              |              |
            31          28 27          24 23          20 19          16
           |<----------------------- Mantissa ------------------------>|
            ______________ ______________ ______________ ______________
           |              |              |              |              |
            15          12 11           8 7            4 3            0

Here is a little piece of C++ code that should run on anything and convert Sun IEEE floats to whatever the host's floating point format is. It probably should take into account a few special cases to be strictly correct:

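		// Needs <math.h> for pow(); instream is an istream opened on the
		// data, and Uint16/Uint32 are 16 and 32 bit unsigned integer typedefs
		double value = 0;
		unsigned char buffer[4];
		instream.read((char *)buffer,4);
		if (instream) {
			unsigned char	sign;
			Uint16		exponent;
			Uint32		mantissa;

			// Sparc is a Big Endian machine, so buffer[0] is the most
			// significant byte; assembling the word with shifts does
			// not depend on the host's byte order or bit-field layout
			Uint32 bits = ((Uint32)buffer[0] << 24)
				    | ((Uint32)buffer[1] << 16)
				    | ((Uint32)buffer[2] << 8)
				    |  (Uint32)buffer[3];
			sign     = (unsigned char)(bits >> 31);
			exponent = (Uint16)((bits >> 23) & 0xff);
			mantissa = bits & 0x7fffff;

			if (exponent) {
				// normalized (Inf and NaN, exponent 255, not handled)
				value = (1.0 + (double)mantissa / (1 << 23)) *
					pow (2.0, (double)exponent - 127.0);
			}
			else {
				if (mantissa) {
					// denormalized: no implied leading 1
					value = (double)mantissa / (1 << 23) *
						pow (2.0, -126.0);
				}
				else {
					value = 0;
				}
			}
			value = (sign == 0) ? value : -value;
		}
		else {
			cerr << "read failed\n" << flush;
		}

Sun Strings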

Strings obey the usual C convention of null terminated strings without a length preamble.

4.3.2 Sun Operating System

5. Compression Schemes

5.1 Reversible Compression

5.2 Irreversible Compression

5.2.1 Perimeter Encoding

5.3 DICOM Compression

In DICOM, compression (both reversible and irreversible) is achieved by specifying a particular "transfer syntax" either during negotiation of the network connection (association) or in the media application profile for files stored on media (and specified in the meta information header so the reader knows which transfer syntax to switch to).

The compressed data stream is actually encoded as an "encapsulated" data stream as defined in Part 5 of DICOM. Uncompressed data (unencapsulated) is sent in DICOM as a series of raw bytes or words (little or big endian) in the Value field of the Pixel Data element (7FE0,0010). Encapsulated data on the other hand is sent not as raw bytes or words but as Fragments contained in Items that are the Value field of Pixel Data. The encoding of these Items follows the same pattern as is used to specify Sequences in DICOM, though the VR (Value Representation) field of the Pixel Data is OB not SQ.

The encapsulated compressed data may be a single frame or it may contain multiple frames for those SOP Classes that allow multiframe images (such as XA, XRF, US and NM). The rules in Part 5 further specify that the first Item will either be empty or contain a list of offsets to the beginning of the Item containing each frame (or the only frame for a single frame image). Also, a frame may be split into multiple fragments, but each fragment may contain data for only one frame; that is, a fragment may not span different frames. The reason for the fragments in the first place is that each fragment (each Item) must have a fixed, known length, so unless one buffers the entire compressed frame before encoding it, one doesn't know in advance how long it will be. In practice, most encoders do send one frame per fragment, but all decoders must be prepared to handle the case where a frame spans several fragments. Furthermore, all fragments have to be of even length, and there are padding rules in Part 5 for the last fragment of a frame (that are consistent with the definition of padding in the JPEG standard).
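
As a rough illustration of the offset table rule, here is a sketch that works out the offsets for a list of frames, each given as a list of fragment lengths. It assumes that offsets are measured from the first byte of the first Item tag that follows the offset table Item, and that every Item has an 8 byte header (4 byte tag plus 4 byte length). The names basicOffsetTable and FragmentLengths are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Each frame is described by the byte lengths of its fragments.
// (Hypothetical type for illustration; not from dicom3tools.)
typedef std::vector<std::uint32_t> FragmentLengths;

// Returns one offset per frame, measured from the first byte of the first
// fragment Item tag (the one immediately after the offset table Item).
std::vector<std::uint32_t>
basicOffsetTable(const std::vector<FragmentLengths> &frames)
{
	const std::uint32_t itemHeaderLength = 8;	// tag (4) + VL (4)
	std::vector<std::uint32_t> offsets;
	std::uint32_t position = 0;
	for (std::size_t f = 0; f < frames.size(); ++f) {
		offsets.push_back(position);	// a frame starts at its first Item tag
		for (std::size_t i = 0; i < frames[f].size(); ++i) {
			position += itemHeaderLength + frames[f][i];
		}
	}
	return offsets;
}
```

For a first frame split into fragments of 4C6 and 24A (hex) bytes followed by a second frame in a single fragment, this gives offsets of 0 and 720 (hex).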

Part 5 contains several examples of how to fill in the various fields in Items of the encapsulated sequence-like value for Pixel Data, so these will not be repeated here. However the overall strategy looks something like this for an image with two frames, the first split across two fragments, and an empty offset table:

		(7FE0,0010) VR=OB VL=FFFFFFFF Pixel Data
		(FFFE,E000) VR=   VL=00000000 Item (empty offset table, hence zero length)
		(FFFE,E000) VR=   VL=000004C6 Item (first fragment of first frame)
		.... compressed byte stream here (4C6 bytes)
		(FFFE,E000) VR=   VL=0000024A Item (second fragment of first frame)
		.... compressed byte stream here (24A bytes)
		(FFFE,E000) VR=   VL=00000628 Item (first fragment of second frame)
		.... compressed byte stream here (628 bytes)
		(FFFE,E0DD) VR=   VL=00000000 Sequence Delimiter

Note that the Item and Sequence Delimiter tags have no VR, that the Item Delimiter tag is never used, since Items are required to be of fixed not undefined length, and that the Sequence Delimiter tag is always used, since the Pixel Data is always of undefined length (that is FFFFFFFF) for encapsulated data.
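
To make the layout concrete, here is a minimal sketch of writing such a structure for fragments that have already been compressed. It is not taken from dicom3tools; the names encapsulate, put16 and put32 are invented for the example. It assumes explicit VR little endian encoding (hence the "OB" and the two reserved bytes after it), always writes an empty offset table, and leaves padding to the caller by refusing odd length fragments.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Append 16 and 32 bit values in little-endian byte order
static void put16(Bytes &out, std::uint16_t v)
{
	out.push_back(static_cast<unsigned char>(v & 0xff));
	out.push_back(static_cast<unsigned char>(v >> 8));
}

static void put32(Bytes &out, std::uint32_t v)
{
	put16(out, static_cast<std::uint16_t>(v & 0xffff));
	put16(out, static_cast<std::uint16_t>(v >> 16));
}

// Hypothetical helper: wrap already compressed fragments as encapsulated
// Pixel Data with an empty offset table (explicit VR little endian assumed).
Bytes encapsulate(const std::vector<Bytes> &fragments)
{
	Bytes out;
	// (7FE0,0010) VR=OB, two reserved bytes, undefined length
	put16(out, 0x7fe0); put16(out, 0x0010);
	out.push_back('O'); out.push_back('B');
	put16(out, 0x0000);
	put32(out, 0xffffffff);
	// Empty offset table Item
	put16(out, 0xfffe); put16(out, 0xe000); put32(out, 0);
	for (std::size_t i = 0; i < fragments.size(); ++i) {
		// Padding depends on the compression scheme, so it is not done here
		if (fragments[i].empty() || fragments[i].size() % 2 != 0) {
			throw std::invalid_argument("fragment length must be even and non-zero");
		}
		put16(out, 0xfffe); put16(out, 0xe000);
		put32(out, static_cast<std::uint32_t>(fragments[i].size()));
		out.insert(out.end(), fragments[i].begin(), fragments[i].end());
	}
	// Sequence Delimiter, always present and always with zero length
	put16(out, 0xfffe); put16(out, 0xe0dd); put32(out, 0);
	return out;
}
```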

If one is trying to decode a DICOM image encoded with an encapsulated transfer syntax, one therefore has to get to the Pixel Data tag, and start parsing the sequence-like structure. One cannot just pass the entire Value field of Pixel Data to a conventional JPEG decoder for instance. One needs to strip out the embedded Item tags and the trailing Sequence Delimiter. For an example of how to do this see the source code from dicom3tools in "libsrc/include/pixeldat/unencap.h", a simplified version of which (without the GE bug handling) is reproduced here.

	// Member function excerpt: istr, buffer, maxlength, lefttoreadthisfragment,
	// fragmentnumber (assumed to start at -1), finished and bad are state kept
	// by the enclosing class between calls
	size_t read(void)
		{
			// - non-pixel data is always LE, including fragment delimiters and lengths
			// - 1st item is offset table, may have zero VL
			// - other items are fragments
			// - finally sequence delimitation tag (with zero VL)
			// - each delimiter is 2 byte group,2 byte element, 4 byte VL, little endian
			// - Item tag      is (0xfffe,0xe000)
			// - Seq delimiter is (0xfffe,0xe0dd)

			size_t length=0;

			// find the start of the next fragment, if not already inside one
			while (!lefttoreadthisfragment && !finished && !bad) {
				Uint16 group=read16();
				Uint16 element=read16();
				Uint32 vl=read32();
				if (group == 0xfffe) {
					if (element == 0xe0dd) {	// Sequence Delimiter Tag
						Assert(vl == 0);
						finished=true;
					}
					else /* if (element == 0xe000) */ {	// Item Tag
						if (++fragmentnumber > 0) {
							Assert(vl);	// Zero length fragments thought not to be legal
							lefttoreadthisfragment=vl;
						}
						else {
							// skip the offset table
							Assert(vl%4 == 0);
							unsigned i=0;
							while (vl) {
								Uint32 offset=read32();
								(void)offset;	// not needed when reading sequentially
								vl-=4;
								++i;
							}
						}
					}
				}
				else {
					// bad tag group in encapsulated data
					bad=true;
				}
			}

			// hand back as much of the current fragment as fits in the buffer
			if (lefttoreadthisfragment && !bad) {
				length=unsigned(lefttoreadthisfragment > maxlength ? maxlength : lefttoreadthisfragment);
				if (istr->read(buffer,length)) {
					lefttoreadthisfragment-=length;
				}
				else {
					bad=true;
					length=0;
				}
			}

			return length;
		}
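
Since the excerpt above depends on the rest of its class, here is a self-contained sketch of the same idea working on a buffer already in memory. It is an illustration only, not the dicom3tools code: stripEncapsulation, get16 and get32 are made-up names, and the input is assumed to be the Value field of Pixel Data, that is everything after the FFFFFFFF length.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Read little-endian 16 and 32 bit values at a given position
static std::uint16_t get16(const Bytes &in, std::size_t pos)
{
	return static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8));
}

static std::uint32_t get32(const Bytes &in, std::size_t pos)
{
	return static_cast<std::uint32_t>(get16(in, pos))
	     | (static_cast<std::uint32_t>(get16(in, pos + 2)) << 16);
}

// Hypothetical helper: return the fragments concatenated into one stream,
// with the offset table, Item tags and Sequence Delimiter removed.
Bytes stripEncapsulation(const Bytes &value)
{
	Bytes out;
	std::size_t pos = 0;
	bool firstItem = true;
	while (true) {
		if (pos + 8 > value.size()) {
			throw std::runtime_error("missing sequence delimiter");
		}
		const std::uint16_t group   = get16(value, pos);
		const std::uint16_t element = get16(value, pos + 2);
		const std::uint32_t vl      = get32(value, pos + 4);
		pos += 8;
		if (group != 0xfffe) {
			throw std::runtime_error("bad tag group in encapsulated data");
		}
		if (element == 0xe0dd) {
			break;	// Sequence Delimiter: finished
		}
		if (element != 0xe000 || vl > value.size() - pos) {
			throw std::runtime_error("bad item in encapsulated data");
		}
		if (!firstItem) {	// the first Item is the offset table: skip it
			const unsigned char *start = value.data() + pos;
			out.insert(out.end(), start, start + vl);
		}
		firstItem = false;
		pos += vl;
	}
	return out;
}
```

Holding everything in memory is simpler to follow than the streaming version, but the latter is what one wants for large multiframe images.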

An application that will take a DICOM dataset and write a pure byte stream (having stripped off the DICOM encapsulation) is also in dicom3tools, "dctoraw". One can feed the output of this utility straight to a JPEG decoder such as the Stanford PVRG utility "jpeg -d". If any padding is present at the end of each frame, it should have been encoded in a manner consistent with JPEG padding defined in ISO 10918-1 so that the JPEG decoder won't fail if it encounters padding between the image frames.

Note also that the use of the terms "image" and "frame" is slightly different in DICOM than in JPEG, so be careful when comparing the two standards.

When using images with more than one component (that is a color image rather than a grayscale image), take care about the color space. One of the features of the ISO 10918-1 JPEG standard is that it specifies only a compressed bitstream, and not a file format. Even if there are three components specified in the compressed bitstream, that does not mean they are RGB or YBR or whatever. This has to be signalled outside the bitstream, and in DICOM this is done in Photometric Interpretation (this is somewhat controversial however, and one should look at recent proposed DICOM CPs on the matter, such as CP 143).

In the non-DICOM world, the color space is specified in the file header such as the commonly used JFIF header, or its superset, the SPIFF header as defined in ISO 10918-3. Be especially careful that one does not assume during decoding that a JFIF header is present in the DICOM compressed bit stream ... it is not. If one wants to feed the extracted bitstream to a JPEG decoder that needs a JFIF header (like the IJG code), then you need to add one. Conversely, never create an encapsulated DICOM image with a bitstream that contains the JFIF header ... strip it off first or use an encoder like Stanford PVRG JPEG that doesn't create JFIF headers.
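
Stripping a JFIF header off is not hard. The sketch below assumes the usual layout in which the stream starts with the SOI marker (FFD8) immediately followed by an APP0 segment (marker FFE0, a two byte big-endian length, then the identifier "JFIF" and a zero byte). It removes only that one segment and does not look for any further APP0 extension segments; the name stripJfifHeader is made up for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical helper: return a copy of a JPEG bitstream with a leading
// JFIF APP0 segment removed; streams without one are returned unchanged.
Bytes stripJfifHeader(const Bytes &jpeg)
{
	// Need SOI (FFD8), APP0 marker (FFE0), 2 byte length and "JFIF\0"
	if (jpeg.size() < 11
	 || jpeg[0] != 0xff || jpeg[1] != 0xd8
	 || jpeg[2] != 0xff || jpeg[3] != 0xe0
	 || std::memcmp(&jpeg[6], "JFIF\0", 5) != 0) {
		return jpeg;
	}
	// The segment length is big-endian and counts its own two bytes
	// but not the two marker bytes in front of it
	const std::size_t segmentLength = (static_cast<std::size_t>(jpeg[4]) << 8) | jpeg[5];
	const std::size_t end = 4 + segmentLength;
	if (segmentLength < 2 || end > jpeg.size()) {
		return jpeg;	// malformed, so leave it alone
	}
	Bytes out(jpeg.begin(), jpeg.begin() + 2);	// keep the SOI marker
	out.insert(out.end(), jpeg.data() + end, jpeg.data() + jpeg.size());
	return out;
}
```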

Here JPEG has been discussed, but the same principle applies to other encapsulated data sets in DICOM, including the RLE compression scheme popular in Ultrasound images (which is equivalent to the TIFF PackBits compression scheme). The compression scheme to interpret the encapsulated bitstream is different, but the encapsulation mechanism using Item tags and fragments is identical.
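
For what it is worth, decoding PackBits style runs takes only a few lines. This sketch handles a single run-length encoded byte stream and nothing else; it does not deal with any header or segment structure that DICOM places around the runs, and the name unpackBits is invented for the example.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Decode one PackBits run-length encoded byte stream.
Bytes unpackBits(const Bytes &in)
{
	Bytes out;
	std::size_t pos = 0;
	while (pos < in.size()) {
		const unsigned char n = in[pos++];
		if (n < 128) {
			// literal run: copy the next n+1 bytes unchanged
			const std::size_t count = n + 1u;
			if (count > in.size() - pos) break;	// truncated input
			out.insert(out.end(), in.data() + pos, in.data() + pos + count);
			pos += count;
		}
		else if (n > 128) {
			// replicate run: repeat the next byte 257-n times
			// (that is 1-n when n is read as a signed byte)
			if (pos >= in.size()) break;		// truncated input
			out.insert(out.end(), 257u - n, in[pos++]);
		}
		// n == 128 is a no-op
	}
	return out;
}
```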

This mechanism has been widely used in the cardiac angiography world on the DICOM CDs that these devices make, on Ultrasound 90 mm MODs, and on GE's more recent CT and MR scanners that use the CT and MR media application profile on 130 mm MODs. Note that early implementations of the encapsulation mechanism and the JPEG lossless encoding contain some bugs which are described in detail in the section on GE CTI.

6. Getting Connected

6.1 Tapes

Nine-track half-inch tapes were the old medium of choice for archiving and image exchange and many older pieces of equipment will have these. Unfortunately most people don't have such a drive on their workstation or personal computer. There are several possibilities:

The Qualstar 1054 is one such drive, that attaches to a SCSI port, and works with the regular SunOS SCSI tape driver, once a few tables in the kernel have been updated as follows, and the kernel rebuilt:

{root}% pwd

{root}% diff -c stdef.h.prequalstar stdef.h
*** stdef.h.prequalstar Tue Aug 30 19:32:24 1994
--- stdef.h     Tue Aug 30 19:32:24 1994
*** 43,48 ****
--- 43,49 ----
  #define       ST_TYPE_FUJI            0x21    /* Fujitsu - (not tested) */
  #define       ST_TYPE_KENNEDY         0x22    /* Kennedy */
  #define       ST_TYPE_HP              0x23    /* HP */
+ #define       ST_TYPE_QUALSTAR        0x24    /* Qualstar */
  #define       ST_TYPE_HIC             0x26    /* Generic 1/2" Cartridge */
  #define       ST_TYPE_REEL            0x27    /* Generic 1/2" Reel Tape */
{root}% diff -c st_conf.c.prequalstar st_conf.c
*** st_conf.c.prequalstar       Tue Aug 30 19:32:22 1994
--- st_conf.c   Tue Aug 30 19:32:22 1994
*** 153,158 ****
--- 153,174 ----
   * so our best guess as to their capabilities is
   * included herein.
+ /* Qualstar 1054 or 1260s scsi 9-track with 64KB buffer */
+ {
+       "Qualstar 1054/1260s 1/2\" Reel", 7, "NCR ADP-53", ST_TYPE_QUALSTAR, 10240,
+       300, 300,
+       { 0x00, 0x02, 0x06, 0x03},
+       {  0, 0, 0, 0 }
+ },
+ /* Qualstar 1054 scsi 9-track with 256KB buffer */
+ {
+       "Qualstar 1054 1/2\" Reel", 10, "QUALSTAR10", ST_TYPE_QUALSTAR, 10240,
+       300, 300,
+       { 0x00, 0x02, 0x06, 0x06},
+       {  0, 0, 0, 0 }
+ },
  /* Wangtek QIC-150 1/4" cartridge */ {
        "Wangtek QIC-150", 14, "WANGTEK 5150ES", ST_TYPE_WANGTEK, 512,

I got my Qualstar 1054 from Bill Power at Power Computer Services for only $750 and have successfully read GE 9800 CT and Philips S15 MR tapes with it so far. See the "Sources" section for where to get one.

Once you have such a tape drive connected to the SCSI port, you can either write simple programs to read files (easiest if the tape has variable length records) or use shell scripts and the "dd" command with whatever the correct block size is. See dd(1), mt(1), and mtio(3) for more information. Remember that the read(2) call reads one fixed or variable length record at a time and returns 0 bytes read for a tape mark, and that two tape marks in a row (normally) indicate the end of the tape. If you encounter short files with a series of records 80 bytes long, chances are you are dealing with header/end markers. This is what ANSI standard tapes off VAX VMS seem to look like.
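
The record and tape mark logic can be sketched as follows. To keep the example independent of any particular drive, the actual reading is passed in as a function; with a real drive it would wrap read(2) on the open tape device. The names scanTape and RecordReader are made up, and the 64 KB buffer is just an assumption about the largest record likely to be met.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Reads one tape record into the buffer and returns its length,
// 0 for a tape mark, or a negative value on error (as read(2) does).
typedef std::function<long (char *, std::size_t)> RecordReader;

// Hypothetical helper: returns, for each file on the tape, the lengths of
// its records; stops at two tape marks in a row or on a read error.
std::vector<std::vector<long> > scanTape(const RecordReader &readRecord)
{
	std::vector<std::vector<long> > files;
	std::vector<char> buffer(65536);	// assumed larger than any record
	std::vector<long> current;
	bool lastWasMark = false;
	while (true) {
		const long n = readRecord(buffer.data(), buffer.size());
		if (n < 0) break;			// read error
		if (n == 0) {				// tape mark ends a file
			if (lastWasMark) break;		// two in a row: end of tape
			files.push_back(current);
			current.clear();
			lastWasMark = true;
		}
		else {
			current.push_back(n);
			lastWasMark = false;
		}
	}
	// keep a final file that was cut short by an error
	if (!current.empty()) files.push_back(current);
	return files;
}
```

A listing of record lengths per file is usually enough to spot the 80 byte label records and to guess the block size to give to "dd".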

Anyone who has any further information about tape formats and handling, especially references to standard or on-line documents please let me know.

6.2 Ethernet

6.3 Serial Ports

The next part is part7 - information sources.