PCjs v2 (version 2.x) disk images are JSON objects with the following properties:
Sector objects contain the raw sector data, using 32-bit signed decimal values, and for sectors that contain data from a file in fileTable, the object will also contain an index into fileTable, along with the data’s offset within the file.
For example, look at the PC DOS 2.00 diskette and examine the following sector object:
The f property is a zero-based index into fileTable that refers to the following file entry:
and the o property indicates that sector’s byte offset (1024) within COMMAND.COM. Having all this information about the FAT file system encoded in the disk image makes it easy to observe low-level I/O and immediately know which portions of which files are being accessed. Other tasks, such as cataloging the contents of disk images, locating identical files, finding files older or newer than a certain date, or extracting other information about the disks or their files, become much simpler as well.
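As a rough illustration of how the f and o properties can be used, the sketch below resolves a sector object to a file and an offset. The sample objects are hypothetical: the `path` property on the file entries and all sample values are assumptions made for illustration, not copied from an actual disk image.

```javascript
// Hypothetical fileTable; the "path" property is an assumed field name,
// and real file entries are expected to carry more detail than this.
const fileTable = [
    { path: "/IBMBIO.COM" },
    { path: "/IBMDOS.COM" },
    { path: "/COMMAND.COM" }
];

// Hypothetical sector object: f is a zero-based index into fileTable,
// and o is the sector's byte offset within that file.
const sector = { f: 2, o: 1024 };

// Return the file and offset a sector belongs to, or null if the sector
// does not contain file data (ie, it has no f property).
function describeSector(sector, fileTable) {
    if (sector.f === undefined) return null;
    const entry = fileTable[sector.f];
    if (!entry) return null;
    return { path: entry.path, offset: sector.o };
}

console.log(describeSector(sector, fileTable));
```

A monitor that logs low-level I/O could call a helper like this on each sector it touches to report which file is being accessed.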
Unusual sectors, like those used in copy-protection schemes, may have non-standard sector IDs or additional properties that simulate special behavior; for example, sectors with a dataError property can trigger a read or write failure at a certain point within the sector. Some of these properties have been discussed in PCjs blog posts and may be documented more fully at a later date.
Older PCjs v1 (version 1.x) disk images were basically just an array of CHS sector data (what is now called the diskData object), without any other information. Such disk images are still supported, but all the disk images now stored on PCjs disk servers, such as diskettes.pcjs.org, have been converted to the v2 format.
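The difference between the two formats can be sketched as a small check on the parsed JSON. This is only an illustration based on the description above (v1 is a bare array, v2 is an object containing diskData); the function name is hypothetical, and the actual loader may validate much more than this.

```javascript
// Return the sector data for either format, along with the detected version.
// Assumes "image" is the result of JSON.parse() on a disk image file.
function getDiskData(image) {
    if (Array.isArray(image)) {
        // v1: the whole image is just the array of CHS sector data
        return { version: 1, diskData: image };
    }
    if (image && Array.isArray(image.diskData)) {
        // v2: an object whose diskData property holds that same array
        return { version: 2, diskData: image.diskData };
    }
    throw new Error("unrecognized disk image");
}
```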
To build a PCjs disk image, such as this PC DOS 2.00 diskette, from an IMG file:
diskimage.js PCDOS200-DISK1.img PCDOS200-DISK1.json
In addition to IMG files, DiskImage also includes (experimental) support for PSI (PCE Sector Image) files, which can in turn be built from Kryoflux RAW files. Here are the basic steps, using tools from PCE:
which translates to these commands (using a 360K PC diskette named “disk1” as an example):
pfi disk1.00.0.raw disk1.pfi
pfi disk1.pfi -p double-step -r 600000 -p decode pri disk1.pri
pri disk1.pri -c 40-99 -p delete disk1.pri
pri disk1.pri -p decode mfm disk1.psi
diskimage.js disk1.psi disk1.json
To build a VisiCalc diskette from a directory containing VC.COM, specify the name of the directory, including a trailing slash, like so:
diskimage.js /miscdisks/pcx86/app/other/visicalc/1981/archive/VISICALC-1981/ VISICALC-1981.json
Alternatively, use the --dir option:
diskimage.js --dir=/miscdisks/pcx86/app/other/visicalc/1981/archive/VISICALC-1981 VISICALC-1981.json
By default, the diskette will be given an 11-character volume label derived from the directory name (eg, “VISICALC-19”); however, you can use --label to specify your own label (eg, --label=VISICALC81), or use --label=none to suppress the volume label.
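The label rules above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name is hypothetical, and the use of simple truncation and upper-casing is an assumption that happens to reproduce the “VISICALC-19” example.

```javascript
// Derive a volume label from a directory path and an optional --label value.
// Assumptions: labels are upper-cased and truncated to 11 characters.
function deriveLabel(dirPath, labelOption) {
    if (labelOption === "none") return null;            // --label=none suppresses the label
    if (labelOption) return labelOption.toUpperCase().slice(0, 11);
    // Use the last non-empty path component, so a trailing slash is harmless
    const parts = dirPath.split("/").filter(part => part.length > 0);
    const name = parts.length ? parts[parts.length - 1] : "";
    return name.toUpperCase().slice(0, 11);
}

console.log(deriveLabel("/miscdisks/pcx86/app/other/visicalc/1981/archive/VISICALC-1981/"));
```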
The smallest standard PC diskette format that can accommodate all the files will be automatically selected, but you can specify a different target size using --target=N, where N is 160K, 180K, 320K, 360K, 720K, 1200K, or 1440K. For example, if your diskette must work with PC DOS 1.0, use --target=160K. Sizes are in kilobytes by default, so the K is optional.
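The selection rule can be approximated with a short sketch. It is deliberately simplified: it compares only the total file size against each raw capacity and ignores the space taken by the boot sector, FATs, root directory, and cluster rounding, all of which the real tool has to account for. The function names are hypothetical, and only kilobyte values are handled.

```javascript
// Standard diskette capacities listed above, in kilobytes.
const STANDARD_SIZES_KB = [160, 180, 320, 360, 720, 1200, 1440];

// Parse a --target value such as "160K" or "160" (the K is optional).
function parseTargetKb(value) {
    const match = /^(\d+)K?$/i.exec(value);
    if (!match) throw new Error("unsupported target: " + value);
    return parseInt(match[1], 10);
}

// Pick the smallest standard capacity that could hold the given number of
// file bytes, or null if none is large enough (file system overhead ignored).
function smallestFormatKb(totalFileBytes) {
    for (const kb of STANDARD_SIZES_KB) {
        if (totalFileBytes <= kb * 1024) return kb;
    }
    return null;
}
```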
You can also create a hard disk image by specifying a capacity in megabytes (Mb) rather than kilobytes (Kb), for example a 10Mb hard disk image. diskimage.js should be able to read any DOS-formatted disk image, but its ability to create new hard disk images is currently limited to 10Mb (larger formats will be coming soon).
Another useful option is --normalize, which will transform the line-endings in all recognized text files from LF to CR/LF; a recognized text file is any file ending with one of these extensions (.MD, .ME, .BAS, .BAT, .ASM, .LRF, .MAK, .TXT, or .XML) and which contains only 7-bit ASCII characters, since some files, like .BAS files, can contain either ASCII or non-ASCII data. The list of recognized text file extensions is likely to grow over time.
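Here is a minimal sketch of the two checks described above, followed by the line-ending conversion. The function names are hypothetical, and the real implementation may differ in details (for example, in how it treats control characters).

```javascript
// Extensions listed above; this list is expected to grow over time.
const TEXT_EXTENSIONS = [".MD", ".ME", ".BAS", ".BAT", ".ASM", ".LRF", ".MAK", ".TXT", ".XML"];

// A file is a recognized text file if its extension is in the list AND
// every byte is 7-bit ASCII (bytes may be an Array or a Uint8Array).
function isRecognizedText(fileName, bytes) {
    const dot = fileName.lastIndexOf(".");
    const ext = dot >= 0 ? fileName.slice(dot).toUpperCase() : "";
    if (!TEXT_EXTENSIONS.includes(ext)) return false;
    return bytes.every(b => b < 0x80);
}

// Convert LF to CR/LF, leaving any existing CR/LF pairs unchanged.
function normalizeLineEndings(text) {
    return text.replace(/\r?\n/g, "\r\n");
}
```

Matching an optional CR before each LF keeps the conversion idempotent, so running it twice on the same file does not produce CR/CR/LF sequences.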
There are many large software collections where the diskette contents have been archived as ZIP files rather than as disk images, and in theory, it’s trivial to unzip them into separate folders and then use diskimage.js to build new images from those folders (as described above for directories).
For example, I originally recreated all the PC-SIG Library diskette images from the “PC-SIG Library Eighth Edition” CD-ROM files stored at cd.textfiles.com. Some of the diskettes on the CD-ROM had been completely archived as single ZIP files – probably because the diskettes contained filenames that were not allowed on CD-ROM – so I used unzip on macOS to extract those ZIP files to folders, and then recreated disk images from those folders.
In practice, however, that approach caused problems. First, the original order of the filenames was not preserved. Modern operating systems (eg, macOS) list files alphabetically, and as a result, the files on the recreated diskettes were sorted alphabetically as well.
Second, while the ZIP archives appeared to more-or-less preserve non-ASCII filenames, unzip did not. IBM PCs used a character set now known as Code Page 437 (CP437), which included a variety of line-drawing characters and other symbols that unzip failed to translate to their modern (UTF-8) counterparts.
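To show what such a translation involves, here is a small sketch that maps CP437 bytes to a JavaScript (Unicode) string using the standard CP437 assignments for bytes 0x80 through 0xFF. The names are hypothetical and this is not the tool's own code; bytes below 0x80 are passed through as ASCII, so the CP437 glyphs for control codes 0x00-0x1F are not handled.

```javascript
// Unicode equivalents of CP437 bytes 0x80-0xFF, 16 characters per row.
const CP437_HIGH = [
    "ÇüéâäàåçêëèïîìÄÅ",         // 0x80-0x8F
    "ÉæÆôöòûùÿÖÜ¢£¥₧ƒ",         // 0x90-0x9F
    "áíóúñÑªº¿⌐¬½¼¡«»",         // 0xA0-0xAF
    "░▒▓│┤╡╢╖╕╣║╗╝╜╛┐",         // 0xB0-0xBF
    "└┴┬├─┼╞╟╚╔╩╦╠═╬╧",         // 0xC0-0xCF
    "╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀",         // 0xD0-0xDF
    "αßΓπΣσµτΦΘΩδ∞φε∩",         // 0xE0-0xEF
    "≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\u00A0"     // 0xF0-0xFF (0xFF is a no-break space)
].join("");

// Convert an array of CP437 bytes to a Unicode string.
function cp437ToString(bytes) {
    let result = "";
    for (const b of bytes) {
        result += b < 0x80 ? String.fromCharCode(b) : CP437_HIGH[b - 0x80];
    }
    return result;
}
```

Once the name is a JavaScript string, writing it to a modern host as UTF-8 is handled by the runtime.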
To resolve all these issues, I updated diskimage.js with an option (--zip) to read ZIP archives directly. I started with an NPM package called node-stream-zip, which is essentially a module that understands the ZIP file format, identifies all the compressed files inside the ZIP file, and uses Node’s built-in zlib functionality to decompress them.
Here’s an example of --zip in action:
diskimage.js --zip=/Volumes/PCSIG_13B/BBS/DISK0042.ZIP --output=DISK0042.json --verbose
DiskImage v3.00
Copyright © 2012-2023 Jeff Parsons <Jeff@pcjs.org>
Options: --zip=/Volumes/PCSIG_13B/BBS/DISK0042.ZIP --output=DISK0042.json --verbose
reading: /Volumes/PCSIG_13B/BBS/DISK0042.ZIP
Filename        Length   Method     Size  Ratio   Date       Time       CRC
--------        ------   ------     ----  -----   ----       ----       ---
MSVIBM.EXE      131392   Implode   73651   44%    1990-02-19 19:30:30   ac9163ba
MSR300.UPD       20338   Implode    8627   58%    1990-02-19 19:35:12   61372fc6
MSKERM.HLP       35263   Implode   13799   61%    1990-02-19 19:39:16   1c61d95c
MSKERM.BWR       27985   Implode   12132   57%    1990-02-19 19:42:28   353a76ed
MSKERMIT.INI      4760   Implode    2309   51%    1990-02-19 20:07:20   00e884a5
GO.BAT              40   Shrink       38    5%    1980-01-01 06:00:08   75d72756
FILE0042.TXT      3870   Implode     896   77%    1990-11-12 01:46:16   3a817bda
GO.TXT            1002   Implode     307   69%    1990-11-09 06:21:54   e64455e9
processing DISK0042: 327680 bytes (checksum -1217186896, hash bba045788185bc8284f5e4cde0929b70)
writing DISK0042.json...
The --verbose option generates the PKZIP-style file listing, displaying the individual file names, compressed and uncompressed file sizes, compression ratio, etc.
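The Ratio column in a PKZIP-style listing is the percentage of space saved. A small sketch of that arithmetic, assuming ordinary rounding to the nearest whole percent (which matches the figures in the listing above, though the tool's exact rounding is an assumption here):

```javascript
// Percentage saved: 100 * (1 - compressed / uncompressed), rounded.
function compressionRatio(length, size) {
    if (length === 0) return 0;     // avoid dividing by zero for empty files
    return Math.round(100 * (1 - size / length));
}

console.log(compressionRatio(131392, 73651) + "%");   // MSVIBM.EXE from the listing
```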
In fact, creating a disk image is entirely optional; you can use diskimage.js to simply examine the contents of a ZIP file:
diskimage.js --zip=/Volumes/PCSIG_13B/BBS/DISK0042.ZIP --verbose
To simplify dealing with large collections of files, I also added an --all option:
diskimage.js --all="/Volumes/PCSIG_13B/**/*.ZIP" --verbose
That command will locate all matching ZIP files and process each one with any other options you specify (eg, --verbose to display their contents).
--all also supports disk image file extensions; --disk is assumed for any file ending with one of those extensions, whereas --zip is assumed for any file ending with a .ZIP extension.
If you want to create a disk image for every ZIP file:
diskimage.js --all="/Volumes/PCSIG_13B/**/*.ZIP" --output=tmp --type=img
--output specifies the output folder and --type specifies the output file type (either IMG or JSON). Each output file will have the same basename as the ZIP file. You can also use “%d” anywhere in the --output value to represent the directory of the corresponding input file.
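The naming rules for --all output can be sketched with plain string handling. The helper names are hypothetical, and the lower-case extension on the result is an assumption taken from the --type=img example rather than documented behavior.

```javascript
// Directory portion of a path ("." if there is none).
function dirName(filePath) {
    const slash = filePath.lastIndexOf("/");
    if (slash < 0) return ".";
    return slash === 0 ? "/" : filePath.slice(0, slash);
}

// File name without its directory or extension.
function baseName(filePath) {
    const name = filePath.slice(filePath.lastIndexOf("/") + 1);
    const dot = name.lastIndexOf(".");
    return dot > 0 ? name.slice(0, dot) : name;
}

// Build an output file name: every "%d" in the --output value is replaced
// with the input file's directory; the basename is kept, the type appended.
function outputPath(inputFile, outputOption, type) {
    const folder = outputOption.split("%d").join(dirName(inputFile));
    return folder + "/" + baseName(inputFile) + "." + type;
}

console.log(outputPath("/Volumes/PCSIG_13B/BBS/DISK0042.ZIP", "tmp", "img"));
```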
ZIP files inside disk images can be automatically expanded during disk image processing as well; just add the new --expand option. Each ZIP file will be replaced with a folder of the same name, and that folder will contain the entire uncompressed contents of the archive; the original ZIP file will not be included in the disk image:
diskimage.js --all="/Volumes/PCSIG_13B/**/*.ZIP" --expand --output=tmp
Finally, support for the ARC file format (ZIP’s predecessor) is now available. Just use --arc instead of --zip, or specify input files with .ARC extensions instead of .ZIP. All the same capabilities apply.
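The way an input option is implied by a file's extension can be sketched as a simple lookup. The function name is hypothetical, and the list of disk image extensions (.IMG and .JSON) is an assumption based on the file types used elsewhere in this document; the tool may recognize others.

```javascript
// Assumed disk image extensions (not an authoritative list).
const DISK_EXTENSIONS = [".IMG", ".JSON"];

// Return the option implied by a file's extension, or null if unrecognized.
function inputOptionFor(fileName) {
    const dot = fileName.lastIndexOf(".");
    const ext = dot >= 0 ? fileName.slice(dot).toUpperCase() : "";
    if (ext === ".ZIP") return "--zip";
    if (ext === ".ARC") return "--arc";
    if (DISK_EXTENSIONS.includes(ext)) return "--disk";
    return null;
}
```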
You can extract the contents of a single disk image to your current directory, or to a specific directory using --extdir:
diskimage.js DISK0001.IMG --extract
diskimage.js DISK0001.IMG --extract --extdir=tmp
You can also extract the contents of an entire collection of disk images, placing the contents of each either in the same directory as the original disk image or in a specific directory:
diskimage.js --all="*.IMG" --extract --extdir=%d
diskimage.js --all="*.IMG" --extract --extdir=tmp
You can also expand any ZIP files during the extraction process by including the --expand option:
diskimage.js --all="*.IMG" --extract --expand --extdir=tmp
Also, while the --normalize option was originally created to “normalize” files read from the host (eg, to convert LF to CR/LF in text files), it can also be used during extraction now, when files are being written to the host.
For example, if you want any filenames with CP437 characters to be created properly on the host, or you want the contents of any CP437 text files, BASIC files, etc, to be stored in readable form on the host, use the --normalize option along with the --extract option; eg:
diskimage.js --all="/Volumes/PCSIG_13B/**/*.ZIP" --extract --expand --normalize --extdir=tmp
In addition to converting line-endings back from CR/LF to LF, --normalize will also convert any tokenized .BAS files to plain-text UTF-8 files on the host, as well as decrypt any .BAS files that have been “protected” by BASIC with the P option of the SAVE command.
The --output option is available with all of the above commands as well, but that option only affects disk image creation, not file extraction. If you don’t want any disk images created at the same time, don’t use --output.
A disk image must either be the first argument or be specified using the --disk option. It can be either a local disk image or a remote disk image.
Note that the PCjs web server automatically maps certain implicit diskette paths, such as /diskettes, to specific disk servers, such as https://diskettes.pcjs.org. The list of implicit paths for PC disks currently includes (but is not limited to):
diskimage.js does not perform any local-to-remote mapping. Instead, whenever it sees an implicit path, it will look for that path inside the /disks folder in the PCjs repository, which is the recommended location for all PCjs disk repositories when running PCjs locally. If you want diskimage.js to use a remote image, you must provide a complete URL.
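The lookup rule described above can be sketched as follows. Everything here is illustrative: the function name and the repoRoot parameter are hypothetical, and /diskettes is the only implicit path included because it is the only one named in the text.

```javascript
// Implicit paths known to this sketch (the real list is longer).
const IMPLICIT_PATHS = ["/diskettes"];

// Decide where a disk image argument should be read from.
// repoRoot is a hypothetical path to a local copy of the PCjs repository.
function resolveDiskImage(arg, repoRoot) {
    if (/^https?:\/\//i.test(arg)) {
        return { remote: true, location: arg };         // complete URL: fetch remotely
    }
    const implicit = IMPLICIT_PATHS.some(
        prefix => arg === prefix || arg.startsWith(prefix + "/")
    );
    if (implicit) {
        // No local-to-remote mapping: look inside the repository's /disks folder
        return { remote: false, location: repoRoot + "/disks" + arg };
    }
    return { remote: false, location: arg };            // ordinary local path
}
```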
To get a DOS-compatible directory listing of a disk image:
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --list
To list only files matching a file specification (eg, *.EXE):
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --list="*.EXE"
To display all the unused bytes of a disk image (JSON-encoded disk images only):
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --list=unused
NOTE: Unused bytes are a superset of free bytes. Free bytes are always measured in terms of unused clusters, multiplied by the cluster size, whereas unused bytes are the combination of all completely unused cluster space plus any partially unused cluster space. Being able to see all the unused bytes on a disk can be useful for studying disk image usage, or simply making sure that a disk is free of any unwanted data.
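The distinction between free and unused bytes can be made concrete with a short calculation. This sketch is simplified and hypothetical: it considers only file data, so space in directory and FAT sectors is not counted.

```javascript
// Compute free and unused bytes for a set of file sizes on a volume with
// the given cluster size (in bytes) and total number of data clusters.
function spaceReport(fileSizes, clusterSize, totalClusters) {
    let usedClusters = 0;
    let slackBytes = 0;     // partially unused cluster space at the end of files
    for (const size of fileSizes) {
        const clusters = Math.ceil(size / clusterSize);
        usedClusters += clusters;
        slackBytes += clusters * clusterSize - size;
    }
    const freeBytes = (totalClusters - usedClusters) * clusterSize;
    return { freeBytes, unusedBytes: freeBytes + slackBytes };
}

// Two files (1000 and 2048 bytes) on a 10-cluster volume with 1K clusters
console.log(spaceReport([1000, 2048], 1024, 10));
```

In this example the 1000-byte file leaves 24 bytes of its cluster unused, which is why the unused total exceeds the free total.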
TODO: Update the unused byte report to include unused bytes, if any, in all FAT sectors and directory sectors.
To extract all the files from a disk image:
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --extract
To extract a specific file from a disk image:
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --extract=COMMAND.COM
To display the contents of a specific file in a disk image:
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/3.00/PCDOS300-DISK2.json --type=VDISK.LST
To extract files from a disk image into a specific directory (eg, tmp):
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --extract --extdir=tmp
To dump a specific (C:H:S) sector from a disk image:
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --dump=0:0:1
To dump multiple (C:H:S) sectors from a disk image track, follow the C:H:S values with a sector count; eg:
diskimage.js https://diskettes.pcjs.org/pcx86/sys/dos/ibm/2.00/PCDOS200-DISK1.json --dump=0:0:1:4
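For reference, the --dump value format can be parsed with a few lines of code. This is a hypothetical sketch of the C:H:S[:count] syntax shown above, not the tool's own parser; treating a missing count as 1 is an assumption.

```javascript
// Parse "C:H:S" or "C:H:S:count" into numeric fields.
function parseDumpSpec(spec) {
    const parts = spec.split(":").map(part => parseInt(part, 10));
    if (parts.length < 3 || parts.length > 4 || parts.some(n => Number.isNaN(n))) {
        throw new Error("invalid dump specification: " + spec);
    }
    const [cylinder, head, sector, count = 1] = parts;
    return { cylinder, head, sector, count };
}

console.log(parseDumpSpec("0:0:1:4"));
```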