CEDARS Administrator Manual

CEDARS is provided as-is with no guarantee whatsoever, and users agree to be responsible for compliance with their local governmental and institutional regulations. All CEDARS installations should be reviewed with institutional information security authorities.

Software Installation

  1. Minimum hardware requirements: 32 GB of memory and 8 CPU cores (t2.2xlarge on AWS)

  2. To run CEDARS, you will need two .env files:

    • The first .env file is placed under the ROOT DIR
    • The second .env file goes under the CEDARS/cedars directory.

    .sample.env files are available - please modify them with your settings and rename them to .env

    CEDARS/
    ├── .env
    ├── docker-compose.yml
    ├── cedars/
    │   ├── .env
    │   ├── Dockerfile
    │   └── ...

    • CEDARS/.env: This file contains environment variables used by Docker Compose.
    • CEDARS/cedars/.env: This file contains environment variables specific to the cedars application.
    • docker-compose.yml: The Docker Compose configuration file.
    • cedars/Dockerfile: The Dockerfile for building the cedars app.

Detailed Requirements

Local Installation Requirement

WARNING

Local installation is not recommended unless you want to modify the underlying codebase. It is recommended to use the Docker deployment method.

For example, the CEDARS/cedars/.env file might look like this:

SECRET_KEY = \xcfR\xd9D\xaa\x06\x84S\x19\xc0\xdcA\t\xf7it
HOST=0.0.0.0 
DB_HOST=localhost  # change to DB_HOST=db if running docker container
DB_NAME=cedars
DB_PORT=27017
MINIO_HOST=localhost
MINIO_PORT=9000
MINIO_ACCESS_KEY=ROOTUSER
MINIO_SECRET_KEY=CHANGEME123
ENV=dev
PINES_API_URL=<>  # if using PINES
RQ_DASHBOARD_URL=/rq # URL for dashboard to interact with redis queues
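
For reference, here is a minimal sketch of how such variables are typically consumed on the application side (illustrative only; the names mirror the sample above, and the actual CEDARS config module may differ):

import os

# Connection settings from the environment (see the sample .env above)
db_host = os.getenv("DB_HOST", "localhost")   # "db" when running inside Docker Compose
db_port = int(os.getenv("DB_PORT", "27017"))
db_name = os.getenv("DB_NAME", "cedars")
mongo_uri = f"mongodb://{db_host}:{db_port}/{db_name}"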

CEDARS is a Flask web application and depends on the following software:

  1. Python 3.9 - 3.11

    You can install Python from the official website.

    If you have multiple Python versions installed, you can manage the environments using pyenv.

    Windows Setup Specification

    On Windows machines, only Python 3.9 is supported for development setups (not using Docker). If using Windows, installing Python via WSL is recommended.

  2. Poetry

    To install Poetry, run pipx install poetry or follow the official instructions.

  3. Mongo 7.0 or later

    To run Mongo, you have multiple options:

    • You might use your own enterprise Mongo instance
    • You can use a cloud-based service like MongoDB Atlas
    • You can run a local instance of Mongo using Docker
    • You can run a local instance of Mongo using the official installation
  4. Minio

    Similar to Mongo, you have multiple options to install MinIO:

    • You might use your own enterprise MinIO instance
    • You can use a cloud-based MinIO service
    • You can run a local instance of MinIO using Docker
    • You can run a local instance of MinIO using the official installation
  5. Redis

Mac Fork Issue

On macOS, if you see an issue with forked processes, you will need to export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES before running the RQ workers.

Redis is used to manage long-running processes such as uploads, downloads, spaCy labelling, and PINES jobs. You can either:

  • Install Redis locally on your computer
  • Run the Redis Docker image
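
Before starting CEDARS, a quick connectivity check against the backing services can save debugging time. Here is a minimal sketch, assuming the pymongo, minio and redis Python packages and the credentials from the sample .env above (the Redis port 6379 is an assumption, as it is not listed in the sample):

from pymongo import MongoClient
from minio import Minio
from redis import Redis

# MongoDB: ping the server
MongoClient("localhost", 27017).admin.command("ping")

# MinIO: list buckets using the sample credentials
Minio("localhost:9000", access_key="ROOTUSER",
      secret_key="CHANGEME123", secure=False).list_buckets()

# Redis: ping (default port assumed)
Redis(host="localhost", port=6379).ping()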

Docker Requirement

TIP

This is the easiest way to run CEDARS and encapsulates all dependencies above.

TIP

If using Docker on Windows, it is recommended to install Docker via WSL.

Install Docker and Docker Compose.

TIP

Please install Docker Compose v2; the Compose file uses the deploy spec, which is not compatible with v1.

System Architecture

CEDARS Operational Schema

The CEDARS application runs on a web server and generates an online graphical user interface (GUI) using Flask. All data are stored in a MongoDB instance hosted separately. However, most CEDARS instances are dockerized in order to streamline the project setup process and ensure adequate compatibility of dependencies.

Once the instance is running, electronic health record (EHR) documents are imported and processed through the CEDARS natural language processing (NLP) pipeline. Additional document annotation with a PINES model is optional. A CEDARS annotation project can be set up entirely from the GUI, using the administrator panel. The existing annotations can be downloaded at any point from this interface.

Annotators can connect to the CEDARS app by accessing a web URL provided by the administrator. CEDARS performs the operations to pull selected documents from the database, process them and present them to the annotators. Data entered by users is processed by CEDARS and saved to the database. Multiple users can work on one CEDARS project at the same time. The application will automatically select individual patient records for each user. Record locking is implemented to prevent collisions and inconsistencies.

Installing CEDARS

To install CEDARS, please start by cloning the repository and installing the required dependencies. You can then run the app locally or using Docker.

  • Clone the Repo: git clone git@github.com:CEDARS-NLP/CEDARS.git
  • Change directory: cd CEDARS
  • Initialize submodules: git submodule init
  • Download submodules: git submodule update

git submodule update times out

If you are accessing Git over HTTPS rather than SSH, take the following steps:

  • Update .gitmodules in the root dir with url = https://github.com/CEDARS-NLP/PINES.git
  • Run: git submodule sync
  • Run: git submodule update

Standalone CEDARS Python Package Installation

Make sure all the local requirements above are met. Then, you can install the package using Poetry:

$ cd cedars
$ poetry install  # do not cd into cedars/app
$ poetry run python -m app.wsgi

Setting Up VS Code Debugger for Flask Application (OPTIONAL)

If you are a developer and wish to use a code debugger while working with CEDARS, then you can follow the steps below to setup a VS Code debugger.

1. Create a python virtual environment (preferably using [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#installation)).

2. Create a profile in launch.json (VS Code) as defined in [this](https://code.visualstudio.com/docs/python/tutorial-flask#_run-the-app-in-the-debugger) article.

3. Set the FLASK_APP variable to "app/wsgi.py" in the new launch.json you created.

4. Follow these [instructions](https://code.visualstudio.com/docs/python/environments) to load the python virtual environment you created in step 1 into VS Code.

5. Select your new debugger profile in the debugger tab and run it.

Docker Deployment

The most straightforward way to complete a CEDARS project is via Docker containers. This approach allows fast and reliable installation on premises or in the cloud, with on-demand access to compute resources, including graphics processing units (GPUs). Including the required dependencies in the containers mitigates the version incompatibilities inherent to ad hoc builds. Docker images can also be installed in air-gapped environments, which is sometimes an institutional requirement. A CEDARS docker deployment will include:

  • CEDARS Flask web server
  • MongoDB database service
  • MINIO object storage service
  • PINES NLP annotation service (optional)

Each component runs as a service encapsulated in a Docker container, and these services are coordinated within a single deployment. The PINES service requires a GPU for model training; a GPU is optional for inference (i.e. annotating documents).

After cloning as described above, create the required .env files as described in the Software Installation section.

After creating .env files, run the following commands:

$ cd CEDARS
# if you are not using a GPU and want all the services to be hosted on docker
$ docker compose --profile cpu --profile selfhosted up --build -d
# if you are using a GPU
$ docker compose --profile gpu --profile selfhosted up --build -d
# if you want to use a native service such as AWS Document DB
$ docker compose --profile gpu  up --build -d  # gpu
$ docker compose --profile cpu  up --build -d  # cpu

Once all services are started, the app will be available at:

http://<hostaddress>:80

AWS/Server Deployment

  1. Install Docker: Ubuntu
  2. Make sure you have Docker Compose v2. If you are running Docker as sudo, please follow this Stack Overflow link to run as a non-sudo user.

  3. Install Compose v2 using this link.

For example, to use AWS DocumentDB with TLS, you can create the .env file (under CEDARS/cedars) like this:

DB_HOST=<your-cluster-ip>.docdb.amazonaws.com
DB_NAME=cedars
DB_PORT=27017
DB_USER=<docdbuser>
DB_PWD=<docDBpassword>
DB_PARAMS="tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
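
These variables compose into a standard MongoDB connection string. A minimal sketch, assuming the URI is assembled along these lines (the exact construction inside CEDARS may differ):

import os

user = os.getenv("DB_USER")
pwd = os.getenv("DB_PWD")
host = os.getenv("DB_HOST")
port = os.getenv("DB_PORT", "27017")
name = os.getenv("DB_NAME", "cedars")
params = os.getenv("DB_PARAMS", "")

# e.g. mongodb://docdbuser:docDBpassword@<cluster>.docdb.amazonaws.com:27017/cedars?tls=true&...
uri = f"mongodb://{user}:{pwd}@{host}:{port}/{name}?{params}"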

Project Execution

Overview

Determining clinical event dates with CEDARS is a simple, sequential process:

CEDARS Project Execution

After generation of a CEDARS instance, EHR documents are uploaded, a keyword search query is generated and automatic NLP annotations are launched, following which manual data entry can begin. If known event dates exist, those can be imported before annotator work starts. Once all patient records have been annotated manually for clinical events, the dataset can be downloaded and used immediately in time-to-event analyses. Alternatively, estimated event dates can be obtained without the human review step if a PINES model of satisfactory accuracy was used to classify documents.

The package authors suggest that a random sample of patients be selected for manual review via independent means. If performance metrics are unsatisfactory, the search query can be modified and CEDARS annotations updated through the same process.

Setting Up a CEDARS Project and Users

The first step after running CEDARS is to set up a new project. This is done by the administrator through the GUI. The following steps are required:

1. At first login, the administrator will be prompted to register a new user. This user will be the administrator of the project.

2. The administrator will then fill in Project Details such as Project Name.

3. The administrator can also create new users who will only work on the Annotation Interface.

4. The administrator will provide these credentials to the annotators.

Electronic Health Record Corpus Upload

Keyword Search Query Design

The CEDARS search query incorporates the following wildcards:

"?": for one character, for example "r?d" would match "red" or "rod" but not "reed"

"*": for zero to any number of characters, for example "r*" would match "red", "rod", "reed", "rd", etc.

CEDARS also applies the following Boolean operators:

"AND": both conditions present "OR": either present present "!": negation, for example "!red" would only match sentences without the word "red"

Lastly, the "(" and ")" operators can be used to further develop logic within a query.
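
For example, the following hypothetical query matches any sentence containing the word "DVT", or any sentence containing a word starting with "embol" as long as the word "chronic" is absent:

DVT OR (embol* AND !chronic)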

Search Query Implementation

The expected query is a set of keyword expressions separated by the OR keyword.

Each expression separated by OR can combine sub-expressions with AND or NOT, and the keywords can also contain wildcards.

Spacy Requirements:
  • ! - negation
  • Each dictionary in a list matches one token only
  • A list matches all the dictionaries inside it (and condition)
  • A list of list contains OR conditions
  • [{"TEXT": {"REGEX": "abc*"}}] represents one token with regex match
  • [{"LOWER": "dvt"}] matches case-insenstitive DVT
  • [{"LEMMA": "embolus"}] matches the lemmatized version of embolus as well in text
Implementation:
  1. Split the query by OR
  2. Split each expression by AND
  3. Split each expression by NOT
  4. Split each expression by wildcard
  5. Convert each expression to a spacy pattern
  6. Combine the patterns
  7. Return the combined pattern
Source code in cedars/app/nlpprocessor.py
def query_to_patterns(query: str) -> list:
    """
    Expected query will be a set of keywords separated
    by OR keyword.

    Each expression separated by OR can have expressions
    combined by AND or NOT and the keywords can also contain
    wildcards.

    ##### Spacy Requirements:
    - ! - negation
    - Each dictionary in a list matches one token only
    - A list matches all the dictionaries inside it (and condition)
    - A list of list contains OR conditions
      - [{"TEXT": {"REGEX": "abc*"}}] represents one token with regex match
      - [{"LOWER": "dvt"}] matches case-insenstitive  DVT
      - [{"LEMMA": "embolus"}] matches the lemmatized version of embolus as well in text

    ##### Implementation:
      1. Split the query by OR
      2. Split each expression by AND
      3. Split each expression by NOT
      4. Split each expression by wildcard
      5. Convert each expression to a spacy pattern
      6. Combine the patterns
      7. Return the combined pattern
    """

    or_expressions = query.split(" OR ")
    res = [[] for _ in range(len(or_expressions))]
    for i, expression in enumerate(or_expressions):
        spacy_pattern = []
        expression = expression.strip().replace("(", "").replace(")", "")
        and_expressions = expression.split(" AND ")
        for tok in and_expressions:
            tok = tok.strip()
            if not tok:
                continue
            if "*" in tok or "?" in tok:
                spacy_pattern.append(get_regex_dict(tok))
            elif "!" in tok:
                spacy_pattern.append(get_negated_dict(tok.replace("!", "")))
            else:
                spacy_pattern.append(get_lemma_dict(tok))
        logger.debug(f"{expression} -> {spacy_pattern}")
        res[i] = spacy_pattern
    return res
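
As an illustration, the hypothetical query from the previous section would be converted into one spaCy pattern list per OR group (the exact dictionaries come from the get_lemma_dict, get_regex_dict and get_negated_dict helpers, so the shapes below are approximate):

patterns = query_to_patterns("DVT OR (embol* AND !chronic)")
# patterns[0] -> a single lemma token for "DVT"
# patterns[1] -> a regex token for "embol*" plus a negated token for "chronic"
for i, pattern in enumerate(patterns):
    matcher.add(f"DVT_{i}", [pattern])  # as done with the spaCy Matcher in process_notes below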

Natural Language Processing Annotations

The process of automatically parsing clinical documents before presentation to an annotator is performed in three steps:

1. NLP annotation via the SpaCy traditional NLP pipeline: In this step, sentence boundaries, lemmas and negation status are characterized.

Negation Detection

This function takes a spacy token and determines if it has been negated in the sentence.

Ex.
This is not an apple.
In the above sentence, the token apple is negated.
Parameters:
  • span (spaCy token) –

    This is a token of a single word after spaCy runs a model on some text.

Returns:
  • (bool) : True if the token is negated in the sentence.

Source code in cedars/app/nlpprocessor.py
def is_negated(span):
    """
    ##### Negation Detection

    This function takes a spacy token and determines if it has been negated in the sentence.
    ```
    Ex.
    This is not an apple.
    In the above sentence, the token apple is negated.
    ```

    Args:
        spacy token : This is a token of a single word after spacy
        runs a model on some text.

    Returns:
        (bool) : True if the token is negated in the sentence.
    """
    neg_words = ['no', 'not', "n't", "wouldn't", 'never', 'nobody', 'nothing',
                 'neither', 'nowhere', 'noone', 'no-one', 'hardly', 'scarcely', 'barely']

    for token in span.subtree:
        parents = list(token.ancestors)
        children = list(token.children)

        for parent in token.ancestors:
            children.extend(list(parent.children))

        if ("neg" in [child.dep_ for child in children]) or ("neg" in [par.dep_ for par in parents]):
            return True

        parents_text = [par.text for par in parents]
        children_text = [child.text for child in children]

        for word in neg_words:
            if word in parents_text or word in children_text:
                return True

    return False
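
A quick illustration of this behavior, assuming a spaCy model such as en_core_web_sm is installed:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is not an apple.")
apple = doc[4:5]  # the span covering "apple"
print(is_negated(apple))  # True: "not" attaches with a "neg" dependency in the subtree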

2. Keyword query matching: only documents with at least one sentence matching the search query are retained. Sentences from documents without a match will be marked as reviewed. Patients with no remaining sentences/documents will be considered not to have sustained the event of interest and will not be reviewed manually.

Process Query Matching

This function takes a medical note and a regex query as input and annotates the relevant sections of the text.

Source code in cedars/app/nlpprocessor.py
def process_notes(self, patient_id: str, processes=1, batch_size=20):
    """
    ##### Process Query Matching

    This function takes a medical note and a regex query as input and annotates
    the relevant sections of the text.
    """
    # nlp_model = spacy.load(model_name)
    assert len(self.matcher) == 0
    query = db.get_search_query()

    # load previously processed documents
    # document_processed = load_progress()
    spacy_patterns = query_to_patterns(query)
    for i, item in enumerate(spacy_patterns):
        self.matcher.add(f"DVT_{i}", [item])

    # check all documents already processed
    documents_to_process = []
    if patient_id is not None:
        # get all notes for the patient which are not reviewed
        documents_to_process = db.get_documents_to_annotate(patient_id)
    else:
        # get all notes which are not in annotation collection.
        documents_to_process = db.get_documents_to_annotate()

    document_list = [document for document in documents_to_process]
    if len(document_list) == 0:
        # no notes found to annotate
        logger.info(f"No documents to process for patient {patient_id}")
        if db.get_search_query("tag_query")["nlp_apply"] is True:
            self.process_patient_pines(patient_id)
        return

    document_text = [document["text"] for document in document_list]
    if patient_id is not None:
        logger.info(f"Found {len(document_list)}/{db.get_total_counts('NOTES', patient_id=patient_id)} to process")
    else:
        logger.info(f"Found {len(document_list)}/{db.get_total_counts('NOTES')} documents to process")

    # logger.info(f"sample document: {document_text[0][:100]}")
    annotations = self.nlp_model.pipe([document["text"].lower() for document in document_list],
                                      n_process=processes,
                                      batch_size=batch_size)
    logger.info(f"Starting to process document annotations: {len(document_text)}")
    count = 0
    docs_with_annotations = 0
    for document, doc in zip(document_list, annotations):
        match_count = 0
        sentence_start = 0
        sentence_end = 0
        for sent_no, sentence_annotation in enumerate(doc.sents):
            sentence_text = sentence_annotation.text.strip()
            sentence_end = sentence_start + len(sentence_text)
            matches = self.matcher(sentence_annotation)
            for match in matches:
                _, start, end = match
                token = sentence_annotation[start:end]
                has_negation = is_negated(token)
                token_start = token.start_char
                token_end = token_start + len(token.text)
                annotation = {
                                "sentence": sentence_text,
                                "token": token.text,
                                "isNegated": has_negation,
                                "note_start_index": token_start,
                                "note_end_index": token_end,
                                "sentence_number": sent_no,
                                "sentence_start" : sentence_start,
                                "sentence_end" : sentence_end
                                }
                annotation['note_id'] = document["text_id"]
                annotation["text_date"] = document["text_date"]
                annotation["patient_id"] = document["patient_id"]
                annotation["reviewed"] = False
                db.insert_one_annotation(annotation)
                if not has_negation:
                    if match_count == 0:
                        docs_with_annotations += 1
                    match_count += 1

            sentence_start = sentence_end + 1

        if match_count == 0:
            db.mark_note_reviewed(document["text_id"], reviewed_by="CEDARS")
        count += 1
        if (count) % 10 == 0:
            logger.info(f"Processed {count} / {len(document_list)} documents")

    # Mark the patient as reviewed if no annotations are found.
    if docs_with_annotations == 0:
        db.mark_patient_reviewed(patient_id, "CEDARS")

    # check if nlp processing is enabled
    if docs_with_annotations > 0 and db.get_search_query("tag_query")["nlp_apply"] is True:
        logger.info(f"Processing {docs_with_annotations} documents with PINES")
        self.process_patient_pines(patient_id)

3. Transformer model labelling (optional): individual documents are labelled for their probability (p) of occurring at or after a clinical event. This last step is facultative and offers the possibility of further narrowing the scope of material to be reviewed manually, further improving efficiency. Documents with a p below the predetermined threshold, along with their associated sentences, are marked as reviewed. Patients with no remaining sentences/documents will be considered not to have sustained the event of interest and will not be reviewed manually.

PINES predictions

Get prediction from endpoint. Text goes in the POST request.

Source code in cedars/app/db.py
def get_prediction(note: str) -> float:
    """
    ##### PINES predictions

    Get prediction from endpoint. Text goes in the POST request.
    """

    pines_api_url = get_pines_url()

    url = f'{pines_api_url}/predict'
    data = {'text': note}
    log_notes = None
    try:
        response = requests.post(url, json=data, timeout=3600)
        response.raise_for_status()
        res = response.json()["prediction"]
        score = res.get("score")
        label = res.get("label")
        if isinstance(label, str):
            score = 1 - score if "0" in label else score
        else:
            score = 1 - score if label == 0 else score
        log_notes = re.sub(r'\d', '*', note[:20])
        logger.debug(f"Got prediction for note: {log_notes} with score: {score} and label: {label}")
        return score
    except requests.exceptions.RequestException as e:
        logger.error(f"Failed to get prediction for note: {log_notes}")
        raise e
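
Based on the parsing logic above, the PINES endpoint is expected to return JSON of roughly the following shape (values hypothetical):

example = {"prediction": {"score": 0.9, "label": "LABEL_0"}}
# Because "0" appears in the label, get_prediction would return 1 - 0.9 = 0.1,
# i.e. the score is normalized to the probability of the positive (event) class.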

Save PINES predictions

Predict and save the predictions for the given text_ids.

Source code in cedars/app/db.py
def predict_and_save(text_ids: Optional[list[str]] = None,
                     note_collection_name: str = "NOTES",
                     pines_collection_name: str = "PINES",
                     force_update: bool = False) -> None:
    """
    ##### Save PINES predictions

    Predict and save the predictions for the given text_ids.
    """
    notes_collection = mongo.db[note_collection_name]
    pines_collection = mongo.db[pines_collection_name]
    query = {}
    if text_ids is not None:
        query = {"text_id": {"$in": text_ids}}

    cedars_notes = notes_collection.find(query)
    count = 0
    for note in cedars_notes:
        note_id = note.get("text_id")
        if force_update or get_note_prediction_from_db(note_id, pines_collection_name) is None:
            logger.info(f"Predicting for note: {note_id}")
            prediction = get_prediction(note.get("text"))
            pines_collection.insert_one({
                "text_id": note_id,
                "text": note.get("text"),
                "text_date" : note.get("text_date"),
                "patient_id": note.get("patient_id"),
                "predicted_score": prediction,
                "report_type": note.get("text_tag_3"),
                "document_type": note.get("text_tag_1")
                })
        count += 1
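
For example (a sketch based on the signature above; the note IDs are hypothetical):

# Recompute predictions for two specific notes, overwriting cached scores
predict_and_save(text_ids=["note_001", "note_002"], force_update=True)

# Predict for every note that does not yet have a cached PINES score
predict_and_save()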

Event Pre-Loading

Sometimes a cohort of patients will already have been assessed with other methods, and CEDARS is used as a redundant method to pick up any previously missed events. In this use case, a list of known clinical events with their dates will exist. This information can be loaded into CEDARS as a "starting point", so as to avoid re-discovering already documented events.

Manual Assessment for Clinical Events

The process by which human abstractors annotate patient records for events is described in the End User Manual. This step can be skipped altogether if a PINES model was used to classify documents, in which case an estimated event date will be generated by PINES. Transformer models often exhibit sufficient performance to be used without individual record review, but an audit step as detailed below is strongly advised to confirm satisfactory sensitivity, specificity and event time estimation.

Error Handling and Queues

All jobs are processed at the patient level. For each patient, a job is submitted to an RQ queue. If a job fails, it is retried 3 times before being moved to the failed queue.
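
A minimal sketch of this pattern with RQ (illustrative only; process_patient is a placeholder for the actual task function, and the real job submission lives in the CEDARS codebase):

from redis import Redis
from rq import Queue, Retry

def process_patient(patient_id):
    ...  # placeholder for the actual CEDARS task

queue = Queue("cedars", connection=Redis(host="redis"))

# Retried up to 3 times on failure, then moved to the failed job registry
job = queue.enqueue(process_patient, "patient_123", retry=Retry(max=3))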

Queue Operations

  • docker ps - to see the list of all Docker containers
  • docker exec -it <any-worker-docker-container-id> bash
  • export REDIS_HOST=redis - the service name in nginx/nginx.conf
  • rq info (status)
  • rq requeue --queue cedars -a (requeue all failed jobs)

Launch a task and add it to Mongo if it doesn't already exist.

TODO: insert only one

Source code in cedars/app/db.py
def add_task(task):
    """
    Launch a task and add it to Mongo if it doesn't already exist.
    # TODO: insert only one
    """
    task_db = mongo.db["TASK"]
    task_db.insert_one(task)

Dataset Download

Once there are no patient records left to review, event data can be downloaded from the database via the GUI. Detailed information is provided, including clinical event dates, individual annotator contributions and review times. If a PINES model was used but no manual annotations were applied, estimated event dates can be used in a time-to-event analysis instead of manual entry.

Download Completed Annotations

This generates a CSV file with the following specifications:

  1. Find all patients in the PATIENTS database; these patients become a single row in the CSV file.
  2. For each patient -
     a. list the number of total notes in the database
     b. list the number of reviewed notes
     c. list the number of total sentences from annotations
     d. list the number of reviewed sentences
     e. list all sentences as a list of strings
     f. add event date from the annotations for each patient
     g. add the first and last note date for each patient
  3. Convert all columns to proper datatypes.

Source code in cedars/app/ops.py
@bp.route('/download_annotations', methods=["POST"])
@auth.admin_required
def download_file(filename='annotations.csv'):
    """
    ##### Download Completed Annotations

    This generates a CSV file with the following specifications:
    1. Find all patients in the PATIENTS database,
            these patients become a single row in the CSV file.
    2. For each patient -
        a. list the number of total notes in the database
        b. list the number of reviewed notes
        c. list the number of total sentences from annotations
        d. list the number of reviewed sentences
        e. list all sentences as a list of strings
        f. add event date from the annotations for each patient
        g. add the first and last note date for each patient
    3. Convert all columns to proper datatypes
    """
    logger.info("Downloading annotations")
    filename = request.form.get("filename")
    file = minio.get_object(g.bucket_name, f"annotated_files/{filename}")
    logger.info(f"Downloaded annotations from s3: {filename}")

    return flask.Response(
        file.stream(32*1024),
        mimetype='text/csv',
        headers={"Content-Disposition": f"attachment;filename=cedars_{filename}"}
    )

Audit

CEDARS is by definition semi-automated, and depending on the specific use case and search query, some events might be missed. This problem should be quantified by means of a systematic, old-fashioned review of randomly selected patients. Typically, at least 200 patients would be selected and their corpora reviewed manually for events. Alternatively, a different method (e.g. billing codes) could be used. This audit dataset should be overlapped with the CEDARS event table to estimate the sensitivity of the search query in the cohort at large. If this parameter falls below the previously established minimum acceptable value, the search query scope should be broadened, followed by a database reset, uploading of previously identified events and a new human annotation pass, followed by a repeat audit.
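
As a worked example with hypothetical figures: if 200 randomly selected patients are audited and 40 true events are found, of which the CEDARS event table captures 37, the estimated sensitivity is 37 / 40 = 92.5%.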

Project Termination

Once all events have been tallied and the audit results are satisfactory, the CEDARS project database can, if desired, be deleted from MongoDB. This is an irreversible operation.

In the future, there will be a way to archive CEDARS projects, but this feature is not yet available.

Issues

  • Unable to install thinc: downgrade to a Python version below 3.12

Terminate the Project

Reset the database to the initial state.

Source code in cedars/app/db.py
def terminate_project():
    """
    ##### Terminate the Project

    Reset the database to the initial state.
    """
    logger.info("Terminating project.")
    # Delete all mongo DB collections
    mongo.db.drop_collection("ANNOTATIONS")
    mongo.db.drop_collection("NOTES")
    mongo.db.drop_collection("PATIENTS")
    mongo.db.drop_collection("USERS")
    mongo.db.drop_collection("QUERY")
    mongo.db.drop_collection("PINES")
    mongo.db.drop_collection("TASK")
    mongo.db.drop_collection("RESULTS")

    project_id = os.getenv("PROJECT_ID", None)

    create_project(project_name=fake.slug(),
                   investigator_name=fake.name(),
                   project_id = project_id)