An impending reality - the Patient Contributed Image Repository
Summary: A test site for the PCIR is now up and running, allowing contribution and downloading; feedback is sought on the idea, the contribution process, the contribution agreement, and the site design itself. Go to "http://www.pcir.org/".
Long Version.
As you may recall from an earlier blog entry, I have been exploring the feasibility of a repository to which members of the general public could contribute their own digital medical images. Rather than wait for some grand scheme involving multiple protagonists and sources of funding to come together, I thought that it might be easier just to "build it" in the hope that "they will come". The "they", in this case, being patients willing to contribute their data.
By leveraging some very simple tools, existing relatively cheap web and data hosting services and my own time and funds, this turned out to be relatively straightforward, at least for the initial pilot.
If you wish to take a look at the test site that I have created, go to "http://www.pcir.org/". Send any comments you have back to me at "mailto:dclunie@dclunie.com".
The principles behind this site are straightforward, if perhaps somewhat naive:
Approach.
The basic approach is that:
The definition of "public domain" is somewhat nebulous in this context; it is well-defined with respect to "creative works" like art and music and literature (and even computer software), but it is unlikely that medical images are creative works. The term is also used in the context of patents and land. Perhaps there can be no formal definition of "public domain" with respect to medical images, or medical records in general, until the term is used in a legislative or common law context to apply to such things, or until it is declared that they should be treated as if they were creative works and subject to copyright (not that I am advocating the latter). Regardless, for the PCIR's purposes, the analogy to creative works may suffice to convey the intent of both the contributor and the PCIR in this regard.
Do patients' even have the right to contribute their own images ? For that matter who actually owns them ? Certainly in the US, the HIPAA Privacy Rule has clarified that patients have a right to a copy of their medical record, regardless of who owns the "original". This seems to be a general principle that spans international boundaries, including in Europe, where the Privacy Directive specifically addresses access rights in general, not just to medical records. We assume that medical images are to all practical intents and purposes also medical records; though some medical records departments in hospitals may deny this, that seems to be because they do not store them (nor have a responsibility to), the radiology department does. The PCIR agreement in its current draft proposes that contributors do have such a right and are agreeing that they are not constrained in any manner from exercising it. This seems to be a reasonable strategy until somebody argues otherwise.
Implementation.
You may also be interested in some of the details of how this test site currently works.
To make tractable maintenance of the informative web pages, I use Apache Forrest. This tool, as discussed in a previous blog entry, allows one to construct the source form of the text and organization and external links in a simple XML format, and then to "build" the site using an appropriate "skin" to generate the look and feel. I can't really say enough good things about Forrest. I dare say many folks have commercial web site design tools that are more sophisticated and produce a more visually appealing result, but for the humble novice like myself who is more comfortable at the command line with a plain text editor, Forrest gets the job done.
The uploading tool when the patient decides to make a contribution is a Java Applet. This approach was chosen because a platform-neutral approach is a basic requirement; I dare say something Microsoft Windows specific would cover the majority of potential contributors, but I do not want to exclude anyone if possible. Using Java applets requires that the contributor's browser be both capable of and enabled to allow these to work. The invocation of the applet is through an HTML page that will prompt the user's browser, or the user themselves if necessary, to install a sufficiently recent version of Java to work. The lowest level of JRE that will work is SE5 (1.5), due to the need for support of various encryption features used by the applet.
The applet also requires access to resources on the local machine, both in order to read files and CDs to be uploaded, as well as to be able to transfer these over the network to the PCIR. This requires a signed applet, and for the user to agree to "trust" the applet. It seems to have become relatively commonplace nowadays for users to routinely click on "yes I trust you", pretty much regardless of the source of the applet.
It is possible to use a "self-signed certificate" to sign a Java applet and allow it to work, as long as the user does not mind seeing a message that the "digital signature has not been verified"; if one goes to the trouble of obtaining a legitimate code signing certificate from a certifying authority that is installed in the JRE, then the user instead sees a message that the "digital signature has been verified". The difference, frankly, is a little subtle. For the purposes of the test site, the applet used has been signed by PixelMed's verifiable certificate from Comodo.
[As an aside, getting such a code signing certificate is actually reasonably cheap and not too difficult; being inherently cheap, I searched long and hard using Google to find the lowest cost provider. Verisign charges a fortune for these; Thawte is not much cheaper. Comodo themselves have a relatively high price on their own site, but their reselling partners are generally much cheaper. I ended up using KSoftware; their price is right ($USD 85 for one year), and though their web site sucks, and causes all sorts of browser error messages, and will not accept credit card numbers until you give up and use their PayPal payment method, eventually you get to the Comodo site and things go smoothly from there. Since PixelMed is a legitimate business entity already, I had no trouble providing the appropriate credentials (in this case a bank statement by fax) and got the certificate almost immediately. I was also worried about getting just a Microsoft Authenticode certificate, which is all Comodo offer, since I had read all sorts of early posts about how to convert the various certificate forms from one to another and into something that the Java jarsigner can use. I need not have worried; since I was using Firefox (on a Mac as it happens), when I picked up my certificate it got automatically saved in the browser's collection of certificates. All I needed to do was then "export" it (to a PKCS12 file, as it happens), specifying a password for that exported file that I would need every time I signed with it; it worked fine with jarsigner, by specifying the exported file as the "-keystore" command line option, and using the "-storetype pkcs12" option, though I am not sure if that is strictly necessary). The CAcert Wiki was somewhat helpful in figuring out some of this.]
How does the applet manifest itself to the user ? Well, when the user navigates to the page, it checks to see if they have agreed to the contribution agreement, if not it asks them to do so, then displays the applet in the page. The user can:
You can try this out yourself by going to the PCIR upload page, and uploading your own images; please be sure that you really do agree to contribute these to the public domain if you do so.
What is happening behind the scenes is that:
Once uploaded, the files enter a manually supervised de-identification process and all images and documents are both mechanically and visually checked for leakage of identifiable information, which is then removed. This includes:
On the download side, since this is only a test site, there is relatively little present; just a few examples. The primary requirement here is to make bulk downloading easy; no unwieldy "shopping cart" interfaces here. The de-identified exams are packaged up as a single set into bzip'd tar files. The reasoning for this is explained in the FAQ, but in short is to make the most efficient use of the bandwidth and storage available in lieu of their being any need to "browse" or "visualize" individual images from the PCIR website itself; i.e., you need to download and unpackage the set to use them.
Entering Reasons and Other Conditions
One of the core issues with having patients contribute their own images, is that those images would be more useful in context than alone. The PCIR site tries to encourage the contributor to also scan their radiology and pathology reports, but frankly, this may be too burdensome for many of them. With luck, some uploaded CDs may contain at least the radiology reports. A modest amount of information may occasionally be present in the DICOM image headers. As a fall back position, better than no information at all might be something that the patient themselves was willing to enter.
Accordingly, I put some effort into constructing a set of menus from which the patient could chose from a list of categories. After looking at a bunch of different available coding schemes, including SNOMED, ICD-9CM and ICD10CM, I finally settled on the Medical Subject Headings (MeSH) used by the NLM to index articles in medical journals. MeSH seemed to offer a comprehensive range of terms without being too detailed, is not encumbered by expensive or nationally-specific licensing restrictions, can be downloaded in an easily processable XML form, and most importantly, was already organized into hierarchies that translated well into menus. Some massaging was required for the lay person (e.g., to turn words like "neoplasm" into "cancer"), and there were a few missing critical categories (such as for healthy screening exams).
Let me know what you think of the result, which you can test by going to the PCIR upload page and clicking "Enter Reason".
David
Long Version.
As you may recall from an earlier blog entry, I have been exploring the feasibility of a repository to which members of the general public could contribute their own digital medical images. Rather than wait for some grand scheme involving multiple protagonists and sources of funding to come together, I thought that it might be easier just to "build it" in the hope that "they will come". The "they", in this case, being patients willing to contribute their data.
By leveraging some very simple tools, existing relatively cheap web and data hosting services and my own time and funds, this turned out to be relatively straightforward, at least for the initial pilot.
If you wish to take a look at the test site that I have created, go to "http://www.pcir.org/". Send any comments you have back to me at "mailto:dclunie@dclunie.com".
The principles behind this site are straightforward, if perhaps somewhat naive:
- that patients have an interest in promoting the common good;
- that they can be convinced that contributing their own images to the public domain is for the common good;
- that if only a modest level of effort is required they would be willing to do so;
- that patients have sufficient basic computer skills, equipment and fast enough connections to do so;
- that patients will be satisfied that their privacy will be protected;
- that providing unrestricted downloading will disseminate the images to the most users;
- most important of all, that the images will actually be useful.
- there is sufficient interest in the concept to justify proceeding
- the ease of use is within range of the target audience of non-medical, non-IT patient contributors
- the level of effort to obtain images +/- accompanying information is feasible
- the type of images and information collected will be of sufficient use
- the concept of anonymous contribution to the public domain stands up to legal and ethical scrutiny
- form the non-profit corporation to manage the effort,
- have the "contribution agreement" tidied up by the lawyers,
- start soliciting contributions of images and funds from the general public, and
- engage the advocacy organizations in promoting and supporting the effort.
Approach.
The basic approach is that:
- patients agree to contribute their own images and documents TO THE PUBLIC DOMAIN
- the PCIR receives and de-identifies their images and documents
- the PCIR distributes the de-identified set to ANYONE WITHOUT RESTRICTION
The definition of "public domain" is somewhat nebulous in this context; it is well-defined with respect to "creative works" like art and music and literature (and even computer software), but it is unlikely that medical images are creative works. The term is also used in the context of patents and land. Perhaps there can be no formal definition of "public domain" with respect to medical images, or medical records in general, until the term is used in a legislative or common law context to apply to such things, or until it is declared that they should be treated as if they were creative works and subject to copyright (not that I am advocating the latter). Regardless, for the PCIR's purposes, the analogy to creative works may suffice to convey the intent of both the contributor and the PCIR in this regard.
Do patients' even have the right to contribute their own images ? For that matter who actually owns them ? Certainly in the US, the HIPAA Privacy Rule has clarified that patients have a right to a copy of their medical record, regardless of who owns the "original". This seems to be a general principle that spans international boundaries, including in Europe, where the Privacy Directive specifically addresses access rights in general, not just to medical records. We assume that medical images are to all practical intents and purposes also medical records; though some medical records departments in hospitals may deny this, that seems to be because they do not store them (nor have a responsibility to), the radiology department does. The PCIR agreement in its current draft proposes that contributors do have such a right and are agreeing that they are not constrained in any manner from exercising it. This seems to be a reasonable strategy until somebody argues otherwise.
Implementation.
You may also be interested in some of the details of how this test site currently works.
To make tractable maintenance of the informative web pages, I use Apache Forrest. This tool, as discussed in a previous blog entry, allows one to construct the source form of the text and organization and external links in a simple XML format, and then to "build" the site using an appropriate "skin" to generate the look and feel. I can't really say enough good things about Forrest. I dare say many folks have commercial web site design tools that are more sophisticated and produce a more visually appealing result, but for the humble novice like myself who is more comfortable at the command line with a plain text editor, Forrest gets the job done.
The uploading tool when the patient decides to make a contribution is a Java Applet. This approach was chosen because a platform-neutral approach is a basic requirement; I dare say something Microsoft Windows specific would cover the majority of potential contributors, but I do not want to exclude anyone if possible. Using Java applets requires that the contributor's browser be both capable of and enabled to allow these to work. The invocation of the applet is through an HTML page that will prompt the user's browser, or the user themselves if necessary, to install a sufficiently recent version of Java to work. The lowest level of JRE that will work is SE5 (1.5), due to the need for support of various encryption features used by the applet.
The applet also requires access to resources on the local machine, both in order to read files and CDs to be uploaded, as well as to be able to transfer these over the network to the PCIR. This requires a signed applet, and for the user to agree to "trust" the applet. It seems to have become relatively commonplace nowadays for users to routinely click on "yes I trust you", pretty much regardless of the source of the applet.
It is possible to use a "self-signed certificate" to sign a Java applet and allow it to work, as long as the user does not mind seeing a message that the "digital signature has not been verified"; if one goes to the trouble of obtaining a legitimate code signing certificate from a certifying authority that is installed in the JRE, then the user instead sees a message that the "digital signature has been verified". The difference, frankly, is a little subtle. For the purposes of the test site, the applet used has been signed by PixelMed's verifiable certificate from Comodo.
[As an aside, getting such a code signing certificate is actually reasonably cheap and not too difficult; being inherently cheap, I searched long and hard using Google to find the lowest cost provider. Verisign charges a fortune for these; Thawte is not much cheaper. Comodo themselves have a relatively high price on their own site, but their reselling partners are generally much cheaper. I ended up using KSoftware; their price is right ($USD 85 for one year), and though their web site sucks, and causes all sorts of browser error messages, and will not accept credit card numbers until you give up and use their PayPal payment method, eventually you get to the Comodo site and things go smoothly from there. Since PixelMed is a legitimate business entity already, I had no trouble providing the appropriate credentials (in this case a bank statement by fax) and got the certificate almost immediately. I was also worried about getting just a Microsoft Authenticode certificate, which is all Comodo offer, since I had read all sorts of early posts about how to convert the various certificate forms from one to another and into something that the Java jarsigner can use. I need not have worried; since I was using Firefox (on a Mac as it happens), when I picked up my certificate it got automatically saved in the browser's collection of certificates. All I needed to do was then "export" it (to a PKCS12 file, as it happens), specifying a password for that exported file that I would need every time I signed with it; it worked fine with jarsigner, by specifying the exported file as the "-keystore" command line option, and using the "-storetype pkcs12" option, though I am not sure if that is strictly necessary). The CAcert Wiki was somewhat helpful in figuring out some of this.]
How does the applet manifest itself to the user ? Well, when the user navigates to the page, it checks to see if they have agreed to the contribution agreement, if not it asks them to do so, then displays the applet in the page. The user can:
- specify a reason for the exam that they are going to upload,
- upload an entire CD (e.g., of DICOM images), or
- upload selected image files (e.g., of scanned documents like reports)
You can try this out yourself by going to the PCIR upload page, and uploading your own images; please be sure that you really do agree to contribute these to the public domain if you do so.
What is happening behind the scenes is that:
- on starting the applet, any existing session information (stored in local preferences) is checked, so as not to keep asking the user to re-agree
- if it is a new session, then the agreement itself is downloaded from the web site (in order to keep the web site version and the applet displayed version in concordance) and rendered in a dialog box to the user; they must agree to in order to proceed
- when "enter reason" is clicked, a pop-up dialog is opened with buttons that have automatically generated tear off menus attached to them - these menus allow the user to choose from a pre-defined hierarchy (by category or by alphabetical nesting) of reasons for imaging exams (more about this later)
- when upload disk or files is selected, a file chooser dialog appears; the reason for the separate buttons are two-fold; firstly, that the default directory is different (e.g., to the "My Computer" directory on Windows for CDs, or the "My Documents" folder on Windows for files); secondly, there is a well-known Java bug related to not being able to select entire drives under Windows
- once the user has selected a CD or a set of files, these are packaged into a zip file, compressed whilst doing so, and encrypted using an AES symmetric cipher
- the packaged, compressed and encrypted files are then transferred to the PCIR server, together with an RSA encrypted copy of the symmetric key encrypted with the current PCIR uploading public key, as well as an encrypted copy of the contribution agreement; the received files are not accessible for downloading
Once uploaded, the files enter a manually supervised de-identification process and all images and documents are both mechanically and visually checked for leakage of identifiable information, which is then removed. This includes:
- editing of the pixel data to remove burned in identification
- removal of all text strings that contain identifying information
- checking and removal of either all private attributes, or those that are unsafe
On the download side, since this is only a test site, there is relatively little present; just a few examples. The primary requirement here is to make bulk downloading easy; no unwieldy "shopping cart" interfaces here. The de-identified exams are packaged up as a single set into bzip'd tar files. The reasoning for this is explained in the FAQ, but in short is to make the most efficient use of the bandwidth and storage available in lieu of their being any need to "browse" or "visualize" individual images from the PCIR website itself; i.e., you need to download and unpackage the set to use them.
Entering Reasons and Other Conditions
One of the core issues with having patients contribute their own images, is that those images would be more useful in context than alone. The PCIR site tries to encourage the contributor to also scan their radiology and pathology reports, but frankly, this may be too burdensome for many of them. With luck, some uploaded CDs may contain at least the radiology reports. A modest amount of information may occasionally be present in the DICOM image headers. As a fall back position, better than no information at all might be something that the patient themselves was willing to enter.
Accordingly, I put some effort into constructing a set of menus from which the patient could chose from a list of categories. After looking at a bunch of different available coding schemes, including SNOMED, ICD-9CM and ICD10CM, I finally settled on the Medical Subject Headings (MeSH) used by the NLM to index articles in medical journals. MeSH seemed to offer a comprehensive range of terms without being too detailed, is not encumbered by expensive or nationally-specific licensing restrictions, can be downloaded in an easily processable XML form, and most importantly, was already organized into hierarchies that translated well into menus. Some massaging was required for the lay person (e.g., to turn words like "neoplasm" into "cancer"), and there were a few missing critical categories (such as for healthy screening exams).
Let me know what you think of the result, which you can test by going to the PCIR upload page and clicking "Enter Reason".
David
5 Comments:
Great start David. I'm currently trying to drum up some images to post.
Hi David,
Nice idea. Couple of feature suggestions. First some clarification on "Reason". I interpreted this to be the clinical indications which prompted the exam to be performed rather than a summary report after the fact. I went to upload images taken after a skiing accident. The reasons for the exam were a) Trauma and b) Pain. All the terms provided from the NLM index seemed to be diagnostic rather than indicative in nature, certainly I couldn't find anything to fit my category. Second, while you may get a fair number of uploads from "imaging groupies", I suspect the general public would be more more inclined to upload their own images, if they got some visibility in how those images were subsequently used. It is one thing to throw your images into a black hole knowing that somebody might use them but not having any visibility to how or when. It is another thing to know "My images were used in a study looking at [name your interesting subject]". While providing such tracking is a set of functionality would substantially increase the complexity, I think you're likely to get a much greater response from the general public if they feel a sense of participation in specific research.
Hi Eric
The reason menu is a bit convoluted; there is actually a "pain" entry, which can be found alphabetically, or by category under "Pathological Condition, Sign or Symptom", then "Sign or Symptom". Trauma is classified as "injury", and in this case, specifically "leg injury"; by category this can be found under "Disorder of Environmental Origin" (!), then "Wound or Injury", etc. Obviously the top level category needs to be made more user friendly!
Also, I noticed that your upload did not run to completion; did you cancel it deliberately, or was there an error, or did you get the impression that it did work ?
As to your second point, yes, I had planned to include some information about projects that actually use the data - e.g., article references and web links to projects that credit the PCIR as the source of their material (even though such credit is not a pre-requisite for downloading); but I had not planned to go so far as to link it to specific contributions; indeed, quite the converse - the nature of the contribution process is intended to be as anonymous as possible - once a contribution has been made and the set de-identified, there will be no record kept of where it came from or who contributed it, to minimize risks of privacy breaches.
In particular, there is no way of contacting the contributor (e.g., I cannot contact you directly to tell you that your upload failed).
Thanks for testing the site ... David
Hi David, The upload was left running for about 15 or 20 minutes. It only appeared to make progress during the first few seconds, then progressed only by a few 10s of bytes per minute. Estimate was sitting at 4 hours for the upload, and growing. So I canceled. I was uploading a whole disc, which was IHE portable media compliant (efilm generated). After your de-id and compression steps ran, it was about 80 MB to upload
Hi Eric
Thought it might be something like that. The instructions do suggest leaving the upload to run overnight!
Seriously though, we were having bandwidth problems on the upload server yesterday that seem to be resolved now; feel free to try again, since it should not take that long.
David
Post a Comment
Links to this post:
Create a Link
<< Home