public class PseudonymizeAndSequester
extends java.lang.Object
A class to implement bulk de-identification and pseudonymization of DICOM files with sequestration of files that may have risk of identity leakage.
Modifier and Type | Class and Description |
---|---|
protected class |
PseudonymizeAndSequester.OurMediaImporter
A protected class that actually does all the work of finding and processing the files.
|
Modifier and Type | Field and Description |
---|---|
protected static java.util.Date | defaultEarliestDateInSet |
protected java.util.Map<java.lang.String,java.util.Date> | earliestDateByOrignalPatientID |
protected static java.util.Date | epochForDateModification |
protected java.util.Map<java.lang.String,java.lang.String> | newPatientIDByOriginalPatientID |
protected java.util.Map<java.lang.String,java.lang.String> | newPatientIDByOriginalStudyInstanceUID |
protected java.util.Map<java.lang.String,java.lang.String> | newPatientNameByNewPatientID |
protected static java.lang.String | ourCalledAETitle |
protected static int | radixForRandomPseudonymousID |
protected java.util.Random | random |
Constructor and Description |
---|
PseudonymizeAndSequester(java.lang.String inputPathName,
java.lang.String outputFolderCleanName,
java.lang.String outputFolderDirtyName,
java.lang.String pseudonymizationControlFileName,
java.lang.String pseudonymizationResultByOriginalPatientIDFileName,
java.lang.String pseudonymizationResultByOriginalStudyInstanceUIDFileName,
java.lang.String failedFilesFileName,
java.lang.String uidMapResultFileName,
java.lang.String seed,
boolean keepAllPrivate,
boolean addContributingEquipmentSequence,
boolean keepDescriptors,
boolean keepSeriesDescriptors,
boolean keepProtocolName,
boolean keepPatientCharacteristics,
boolean keepDeviceIdentity,
boolean keepInstitutionIdentity,
int handleDates,
int handleStructuredContent)
Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
|
Modifier and Type | Method and Description |
---|---|
protected static boolean |
containsOverlay(AttributeList list) |
protected java.lang.String |
createNewPseudonymousPatientAndAddToMaps(java.lang.String originalPatientID,
java.lang.String originalStudyInstanceUID)
Create a new PatientID and PatientName and add them to the maps.
|
protected static boolean |
isDirty(AttributeList list) |
static void |
main(java.lang.String[] arg)
Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
|
protected static java.lang.String |
makeOutputFileName(java.lang.String outputFolderName,
java.lang.String inputFileName,
java.lang.String sopInstanceUID)
Make a suitable file name to use for a deidentified and redacted input file.
|
protected void |
readPseudonymizationControlFile(java.lang.String pseudonymizationControlFileName)
Read a file mapping original PatientID or StudyInstanceUID to new PatientID and PatientName and add them to the maps.
|
protected void |
writePseudonymizationResultByOriginalPatientID(java.io.PrintWriter w) |
protected void |
writePseudonymizationResultByOriginalStudyInstanceUID(java.io.PrintWriter w) |
protected void |
writeUIDMapResult(java.io.PrintWriter uidMapResultWriter) |
protected static java.util.Date defaultEarliestDateInSet
protected java.util.Map<java.lang.String,java.util.Date> earliestDateByOrignalPatientID
protected static java.util.Date epochForDateModification
protected java.util.Map<java.lang.String,java.lang.String> newPatientIDByOriginalPatientID
protected java.util.Map<java.lang.String,java.lang.String> newPatientIDByOriginalStudyInstanceUID
protected java.util.Map<java.lang.String,java.lang.String> newPatientNameByNewPatientID
protected static java.lang.String ourCalledAETitle
protected static int radixForRandomPseudonymousID
protected java.util.Random random
public PseudonymizeAndSequester(java.lang.String inputPathName, java.lang.String outputFolderCleanName, java.lang.String outputFolderDirtyName, java.lang.String pseudonymizationControlFileName, java.lang.String pseudonymizationResultByOriginalPatientIDFileName, java.lang.String pseudonymizationResultByOriginalStudyInstanceUIDFileName, java.lang.String failedFilesFileName, java.lang.String uidMapResultFileName, java.lang.String seed, boolean keepAllPrivate, boolean addContributingEquipmentSequence, boolean keepDescriptors, boolean keepSeriesDescriptors, boolean keepProtocolName, boolean keepPatientCharacteristics, boolean keepDeviceIdentity, boolean keepInstitutionIdentity, int handleDates, int handleStructuredContent) throws DicomException, java.io.FileNotFoundException, java.io.IOException
Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
Searches the specified input path recursively for suitable files.
The pseudonymizationControlFileName and pseudonymizationResultFileName files contain three columns of tab-delimited UTF-8 text: the original PatientID, the new PatientID and the new PatientName.
Parameters:
inputPathName
- the path to search for DICOM files
outputFolderCleanName
- where to store all the low risk processed output files (must already exist)
outputFolderDirtyName
- where to store all the high risk processed output files (must already exist)
pseudonymizationControlFileName
- values to use for pseudonymization, may be null or empty in which case random values are used
pseudonymizationResultByOriginalPatientIDFileName
- file into which to store the pseudonymization performed, by original PatientID
pseudonymizationResultByOriginalStudyInstanceUIDFileName
- file into which to store the pseudonymization performed, by original StudyInstanceUID
failedFilesFileName
- file into which to store the paths of files that failed to process
uidMapResultFileName
- file into which to store the map of original to new UIDs
seed
- the initial seed used to generate random pseudonymous identifiers (for deterministic creation of pseudonyms), a long integer as a string, or null or zero length if none
keepAllPrivate
- retain all private attributes, not just known safe ones
addContributingEquipmentSequence
- whether or not to add ContributingEquipmentSequence
keepDescriptors
- if true, keep the text description and comment attributes
keepSeriesDescriptors
- if true, keep the series description even if all other descriptors are removed
keepProtocolName
- if true, keep protocol name even if all other descriptors are removed
keepPatientCharacteristics
- if true, keep patient characteristics (such as might be needed for PET SUV calculations)
keepDeviceIdentity
- if true, keep device identity
keepInstitutionIdentity
- if true, keep institution identity
handleDates
- keep, remove or modify dates and times
handleStructuredContent
- keep, remove or modify structured content
Throws:
DicomException
java.io.IOException
java.io.FileNotFoundException
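The seed parameter described above makes pseudonym creation repeatable. The following is a minimal sketch of that idea only, not the class's actual implementation: the class name SeededPseudonymSketch, its methods, and the radix value of 36 are all assumptions made for illustration (the value of radixForRandomPseudonymousID is not stated on this page).

```java
import java.util.Random;

// Hypothetical helper for illustration; not part of PseudonymizeAndSequester.
class SeededPseudonymSketch {

    // Assumed radix; the documented radixForRandomPseudonymousID value is not given here.
    static final int ASSUMED_RADIX = 36;

    // A null or zero length seed means "no seed", so fall back to an unseeded generator.
    static Random makeRandom(String seed) {
        if (seed == null || seed.isEmpty()) {
            return new Random();
        }
        return new Random(Long.parseLong(seed));
    }

    // Turn the next random long into a short upper-case identifier.
    // Masking with Long.MAX_VALUE clears the sign bit so the result is never negative.
    static String nextPseudonymousID(Random random) {
        long nonNegative = random.nextLong() & Long.MAX_VALUE;
        return Long.toString(nonNegative, ASSUMED_RADIX).toUpperCase();
    }

    public static void main(String[] args) {
        Random first = makeRandom("12345");
        Random second = makeRandom("12345");
        // The same seed yields the same sequence of identifiers.
        System.out.println(nextPseudonymousID(first).equals(nextPseudonymousID(second))); // prints true
    }
}
```

With a fixed seed, two runs over the same input would be expected to assign the same identifiers in the same order, which is convenient when a de-identification run has to be reproduced.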
protected static boolean containsOverlay(AttributeList list)
protected java.lang.String createNewPseudonymousPatientAndAddToMaps(java.lang.String originalPatientID, java.lang.String originalStudyInstanceUID)
Create a new PatientID and PatientName and add them to the maps.
Parameters:
originalPatientID
- the old PatientID
originalStudyInstanceUID
- the old StudyInstanceUID
protected static boolean isDirty(AttributeList list)
public static void main(java.lang.String[] arg)
Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
Searches the specified input path recursively for suitable files.
The pseudonymizationControlFile and pseudonymizationResultFile are tab delimited with a header row containing either:
originalPatientID newPatientID newPatientName
or
originalStudyInstanceUID newPatientID newPatientName
Parameters:
arg
- seven or eight parameters plus options, the inputPath (file or folder), outputFolderClean, outputFolderDirty, pseudonymizationControlFile, pseudonymizationResultByOriginalPatientIDFile, pseudonymizationResultByOriginalStudyInstanceUIDFile, failedFilesFile, uidMapResultFile, and optionally a random seed for deterministic creation of pseudonyms, then various options controlling de-identification
protected static java.lang.String makeOutputFileName(java.lang.String outputFolderName, java.lang.String inputFileName, java.lang.String sopInstanceUID) throws java.io.IOException
Make a suitable file name to use for a deidentified and redacted input file.
The default is the UID plus "_Anon.dcm" in the outputFolderName (ignoring the inputFileName).
Override this method in a subclass if a different file name is required.
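The default naming rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the class OutputFileNameSketch is hypothetical, and treating a missing UID as the case where "a filename cannot be constructed" is a guess.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-in for illustration; not part of PseudonymizeAndSequester.
class OutputFileNameSketch {

    // Builds <outputFolderName>/<sopInstanceUID>_Anon.dcm and ignores inputFileName,
    // mirroring the documented default behaviour.
    static String makeOutputFileName(String outputFolderName, String inputFileName, String sopInstanceUID)
            throws IOException {
        // Assumption: a missing UID is what prevents a name from being constructed.
        if (sopInstanceUID == null || sopInstanceUID.isEmpty()) {
            throw new IOException("Cannot construct a file name without a SOP Instance UID");
        }
        return new File(outputFolderName, sopInstanceUID + "_Anon.dcm").getPath();
    }

    public static void main(String[] args) throws IOException {
        // The input path plays no part in the result.
        System.out.println(makeOutputFileName("clean", "/data/in/IM0001", "1.2.3.4"));
    }
}
```

Naming by UID rather than by input file name avoids carrying identifying text from the original path into the output.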
Parameters:
outputFolderName
- where to store all the processed output files
inputFileName
- the path to search for DICOM files
sopInstanceUID
- the SOP Instance UID of the output file
Throws:
java.io.IOException
- if a filename cannot be constructed
protected void readPseudonymizationControlFile(java.lang.String pseudonymizationControlFileName) throws java.io.IOException
Read a file mapping original PatientID or StudyInstanceUID to new PatientID and PatientName and add them to the maps.
The type of file is detected based on a header line of the form:
originalPatientID newPatientID newPatientName
or
originalStudyInstanceUID newPatientID newPatientName
Parameters:
pseudonymizationControlFileName
- the control file, if any
Throws:
java.io.IOException
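The header-based detection described above might look roughly like the following. This is a sketch, assuming the lines of the control file have already been read as UTF-8 text; the class ControlFileSketch and its parse method are hypothetical, the map names simply echo the documented fields, and the error handling is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for illustration; not the library's readPseudonymizationControlFile.
class ControlFileSketch {
    final Map<String, String> newPatientIDByOriginalPatientID = new HashMap<>();
    final Map<String, String> newPatientIDByOriginalStudyInstanceUID = new HashMap<>();
    final Map<String, String> newPatientNameByNewPatientID = new HashMap<>();

    static ControlFileSketch parse(List<String> lines) {
        ControlFileSketch result = new ControlFileSketch();
        if (lines.isEmpty()) {
            return result;
        }
        // The first column of the header row decides which map the rows are keyed into.
        String[] header = lines.get(0).split("\t", -1);
        Map<String, String> keyed;
        if (header.length == 3 && "originalPatientID".equals(header[0])) {
            keyed = result.newPatientIDByOriginalPatientID;
        } else if (header.length == 3 && "originalStudyInstanceUID".equals(header[0])) {
            keyed = result.newPatientIDByOriginalStudyInstanceUID;
        } else {
            throw new IllegalArgumentException("Unrecognized header row: " + lines.get(0));
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // tolerate blank lines (an assumption)
            }
            String[] fields = line.split("\t", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected three tab-delimited columns: " + line);
            }
            keyed.put(fields[0], fields[1]);                               // original key -> new PatientID
            result.newPatientNameByNewPatientID.put(fields[1], fields[2]); // new PatientID -> new PatientName
        }
        return result;
    }

    public static void main(String[] args) {
        ControlFileSketch sketch = parse(Arrays.asList(
                "originalPatientID\tnewPatientID\tnewPatientName",
                "12345\tANON001\tDoe^Jane"));
        String newID = sketch.newPatientIDByOriginalPatientID.get("12345");
        System.out.println(newID + " " + sketch.newPatientNameByNewPatientID.get(newID)); // prints ANON001 Doe^Jane
    }
}
```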
protected void writePseudonymizationResultByOriginalPatientID(java.io.PrintWriter w)
protected void writePseudonymizationResultByOriginalStudyInstanceUID(java.io.PrintWriter w)
protected void writeUIDMapResult(java.io.PrintWriter uidMapResultWriter)