Create LP-Cumulus-Access-Constraints-Procedure (#3984) #3993
Open
etcart wants to merge 2 commits into nasa:ecarton/mdgoetz-lp-ac-procedure from mdgoetz:patch-1
276 changes: 276 additions & 0 deletions
docs/data-cookbooks/LP-Cumulus-Access-Constraints-Procedure
---
id: LP-Cumulus-Access-Constraints-Procedure
title: LPDAAC Cumulus Access Constraints Procedure
hide_title: false
---

## Purpose:

The purpose of this SOP is to provide instruction on setting Access Constraints (formerly known as Restriction Flags).

## Scope:

The scope of this SOP includes the System Engineer, System Operator, and Data Manager skillsets on the Cumulus project who will use the UpdateCmrAccessConstraints workflow. This SOP describes the steps needed to update multiple granules from the dashboard; the process is applicable in SIT, UAT, and PROD. This SOP does not include troubleshooting steps if the process fails.

## Procedure:

Use the following steps for the Cumulus Dashboard UpdateCmrAccessConstraints functionality:

1. Log in to the Cumulus Dashboard in SIT, UAT, or PROD.
   a. Cumulus instances are located at https://wiki.earthdata.nasa.gov/display/LPCUMULUS/Cumulus+Instances.
   b. Users will need to connect to the NASA VPN and provide their Launchpad username and password.
   c. Go to the "Granules" page by selecting "Granules".

### Option 1: You have a few granules on the dashboard to update

1. Select the collection of the granules to be updated.
2. Select the granules to be updated. Users can select all granules on the page or use the "Search" field to search for specific granules.
3. Click on the "Execute" button.
4. Choose the "UpdateCmrAccessConstraints" workflow from the dropdown.
5. Click on the "Add Custom Workflow Meta" link.
6. Enter JSON with the access constraint and description in the following format, and then click "Confirm":

UpdateCmrAccessConstraints example:

```json
{
  "meta": {
    "accessConstraints": {
      "value": 6,
      "description": "access constraint description"
    }
  }
}
```

### Option 2: You have a list of granuleIds you want to update

1. Select "Run Bulk Granules".
2. Select "Run Bulk Operations".
3. Enter the following JSON with the IDs you want to update, and then click the "Run Bulk Operations" button:

Bulk Granules list input:

```json
{
  "workflowName": "UpdateCmrAccessConstraints",
  "index": "",
  "query": "",
  "ids": ["ASTGTMV003_N37E009", "ASTGTMV003_N12E021"],
  "meta": {
    "accessConstraints": {
      "value": 6,
      "description": "access constraint description"
    }
  }
}
```

### Option 3: Bulk update using an Elasticsearch query

1. If you have many granules to update, you can use an Elasticsearch query:
   a. Determine the Elasticsearch index in the Cloud Metrics ELK stack for this Cumulus environment (see Cumulus Instances for the Kibana URL). Normally, you should be able to use the globbed value 'lpdaac-granule-prod*'.
   b. First, use the Cloud Metrics Kibana instance to generate the query.
      i. Navigate to the URL.
      ii. Go to the Discover tab, and select the index filter '*-granule-*'.
      iii. Construct a Lucene query to select the granules you want to update. You'll likely want to query by 'collectionId' and a temporal range (a sketch follows below). Make sure to hit "Refresh" if you change the parameters.
      
   c. Now, extract data for the 'query' object.
      i. Select the "Inspect" menu option, then select the "Request" tab.
         
      ii. Locate the 'query' object, and extract it.

   d. Create the bulk operation JSON object, and use the extracted 'query' object as the query value. Below is an example:

Bulk UpdateCmrAccessConstraints example:

```json
{
  "workflowName": "UpdateCmrAccessConstraints",
  "index": "lpdaac-granule-prod*",
  "query": {
    "query": {
      "bool": {
        "must": [],
        "filter": [
          {
            "bool": {
              "filter": [
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "collectionId": "HLSL30___1.5"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                },
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "_index": "lpdaac-granule-prod*"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                }
              ]
            }
          },
          {
            "range": {
              "@timestamp": {
                "gte": "2020-01-01T06:00:00.000Z",
                "lte": "2021-01-01T06:00:00.000Z",
                "format": "strict_date_optional_time"
              }
            }
          }
        ],
        "should": [],
        "must_not": []
      }
    }
  },
  "ids": [],
  "meta": {
    "accessConstraints": {
      "value": 6,
      "description": "access constraint description"
    }
  }
}
```

   e. Click the "Run Bulk Operations" button.

### Option 4: Bulk update using scripts

1. Log on to elpdvx3 as the websvc user.
2. cd cumulus-utilities/operations/ops
3. Per-mode .env files (e.g., PROD.env) are in /home/websvc/cumulus-utilities/config/.
4. Logs will be written to /home/websvc/cumulus-utilities/logs/.
5. Output files will be created in /home/websvc/cumulus-utilities/output/.
6. Verify the cumulus-utilities container is running:
   a. $ docker ps -a | grep ops_cumulus-utilities-app
      ec5eb5bffd7f elpdvx68.cr.usgs.gov:6000/lp-daac-cloud/cumulus/cumulus-utilities/cumulus-utilities-app:ops "python3" 4 days ago Up 4 days ops_cumulus-utilities-app_1
7. Generate the collection files for a provider from CMR. These will contain the concept-id needed to get the granules for a collection. Note that you only need to run this once, unless collections are added to CMR:
   a. $ sh cumulus_utilities_control.sh utility cmr_granule_search_to_file.py --action ACTION --dir DIR [--collfileid COLLFILEID] [--collfile COLLFILE] [--requireToken REQUIRETOKEN]
      i. ACTION: populate_collection_file
      ii. DIR: /app/output/, the output directory inside the container. This maps to /home/websvc/cumulus-utilities/output on elpdvx3.
      iii. COLLFILEID: a short code of your choosing to identify your files.
      iv. REQUIRETOKEN: true to log in to CMR with the user identified in the <mode>.env (PROD.env) file under /home/websvc/cumulus-utilities/config; false to query as a guest.
      v. Ex: $ sh cumulus_utilities_control.sh utility cmr_granule_search_to_file.py --action populate_collection_file --dir /app/output --collfileid ACL --requireToken true
8. Generate a file containing the list of granules for a collection. These will be the granules that you will run the UpdateCmrAccessConstraints workflow for:
   a. Choose one of the files produced by the previous command:
      i. $ ls -l /home/websvc/cumulus-utilities/output
      ii. $ cat /home/websvc/cumulus-utilities/output/PROD_<collfileid value>*token.json (Ex: cat /home/websvc/cumulus-utilities/output/PROD_ACL*token.json)
   b. $ sh cumulus_utilities_control.sh utility cmr_granule_search_to_file.py --action ACTION --dir DIR [--collfileid COLLFILEID] [--collfile COLLFILE] [--requireToken REQUIRETOKEN] [--pageSize PAGESIZE] [--cmrSearchAfter CMRSEARCHAFTER] [--startDate STARTDATE] [--endDate ENDDATE]
   c. ACTION: process_collection_file
   d. DIR: /app/output/, the output directory inside the container. This maps to /home/websvc/cumulus-utilities/output on elpdvx3.
   e. COLLFILE: the name of a collection file that was generated in the previous step.
   f. REQUIRETOKEN: true to log in to CMR with the user identified in the <mode>.env (PROD.env) file under /home/websvc/cumulus-utilities/config; false to query as a guest.
   g. PAGESIZE: the number of records to query from CMR in one call (calls are made in a loop).
   h. CMRSEARCHAFTER: used for restarting an incomplete search. Before restarting, save off the granuleId output file so it's not overwritten. Then find the last cmr-search-after value in the log. Run the same command you initially ran, but include this --cmrSearchAfter argument.
   i. STARTDATE: optional, in yyyy-MM-ddTHH:mm:ssZ format. Omit it to start at the beginning. Supplying a value will get granules having a Temporal.RangeDateTime in Cumulus after and including this date.
   j. ENDDATE: optional, in yyyy-MM-ddTHH:mm:ssZ format. Omit it to ignore. Supplying a value will get granules having a Temporal.RangeDateTime in Cumulus up to and including this date.
   k. Ex: $ sh cumulus_utilities_control.sh utility cmr_granule_search_to_file.py --action process_collection_file --collfile PROD_ACL_ECO1BGEO_001_token.json --requireToken true --pageSize 500 --endDate 2018-08-07T20:24:23.070000Z --dir /app/output/
   l. The filename containing the granules will have been written near the top of the log. You will need it for the next step. Ex. granule file name: /app/output/C1239578043-LPCLOUD_ECO1BGEO.001_PROD_1_1900.txt
   m. Note that when you ran the previous step, a listing of commands was displayed. These were also captured in a file with 'process_cmds' in the name. This file contains commands that should be very similar to what you need to run:
      i. $ ls -l /home/websvc/cumulus-utilities/output/*process_cmds.txt
         -rw-r--r-- 1 17895 2030 8443 Nov 8 11:21 /home/websvc/cumulus-utilities/output/SIT_ACL_process_cmds.txt
      ii. $ cat /home/websvc/cumulus-utilities/output/PROD_ACL_process_cmds.txt
      iii. The commands are labeled 'local' and 'container'. Use the container version.
9. Run the run_bulk_operation.py script to run the UpdateCmrAccessConstraints workflow against the granules in the input file you created in the previous step:
   a. $ sh cumulus_utilities_control.sh utility run_bulk_operation.py --workflow WORKFLOW [--meta META] [--granulelistfile GRANULELISTFILE] --dataset DATASET --dir DIR [--percent_failure_acceptable PERCENT_FAILURE_ACCEPTABLE] [--percent_running_acceptable PERCENT_RUNNING_ACCEPTABLE] [--limit LIMIT]
      i. WORKFLOW: UpdateCmrAccessConstraints
      ii. META: "'{\"accessConstraints\":{\"value\":101,\"description\":\"Restricted for limited public release\"}}'" (enter your own values for the access constraint value and description)
      iii. GRANULELISTFILE: the file created in the previous step, which contains a list of granules for a collection.
      iv. DATASET: the dataset the GRANULELISTFILE is for. It is a shortname and version (which must match the shortname and version on the Collections tab of the Cumulus Dashboard) joined by 3 underscores. Ex: ECO1BGEO___001
      v. DIR: /app/output/, the output directory inside the container. This maps to /home/websvc/cumulus-utilities/output on elpdvx3.
      vi. PERCENT_FAILURE_ACCEPTABLE: the percentage of failed granules in a batch that is acceptable. Default is 2. Ex: using the default of 2%, if more than 2 of 100 granules fail, processing will stop. If you increase it to 10%, processing will stop only if more than 10 of 100 granules fail.
      vii. PERCENT_RUNNING_ACCEPTABLE: the percentage of still-running granules in a batch that is acceptable before submitting another batch. Default is 0. Ex: using the default of 0%, another batch will not be submitted until all the granules in the current batch are done running. If you set it to 5%, another batch will be submitted once 5 or fewer of 100 granules are still running; otherwise, it will pause and then check again.
      viii. LIMIT: the number of granules to stage in one batch. Default is 20.
      ix. Ex: $ sh cumulus_utilities_control.sh utility run_bulk_operation.py --workflow UpdateCmrAccessConstraints --meta "'{\"accessConstraints\":{\"value\":101,\"description\":\"Restricted for limited public release\"}}'" --dataset ECO1BGEO___001 --granulelistfile C1239578043-LPCLOUD_ECO1BGEO.001_PROD_1_1900.txt --percent_failure_acceptable 10 --percent_running_acceptable 5 --limit 5 --dir /app/output/

### Restarting

If the run_bulk_operation fails to process the entire granule input file, you can restart it using this guidance.

You'll need the last granule processed. If you still have the output on your screen, look for a line like this:

```
2022-11-15 17:25:57.109673 +0000 INFO running bulk operation with this data: {'ids': '[ECOv002_L1A_BB_21239_014_20220405T002747_0700_01...ECOv002_L1A_BB_21241_012_20220405T032925_0700_01]', 'workflowName': 'UpdateCmrAccessConstraints', 'queueUrl': 'https://sqs.us-west-2.amazonaws.com/643705676985/lp-prod-forward-processing-throttled-queue', 'meta': {'accessConstraints': {'value': 101, 'description': 'Restricted for limited public release'}}} ...
```

If it's not on your screen, find the log from your run. It might help to locate your log by reverse sorting the log files by date: ls -lrt. Search the log for the last entry containing "running bulk operation with this data:".
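As a sketch (the log path is illustrative; use your run's actual log under /home/websvc/cumulus-utilities/logs/), you can pull the last such entry directly:

```sh
# Print the most recent "running bulk operation" entry from a given log file.
# The file name <your-run>.log is a placeholder for your actual log.
grep "running bulk operation with this data:" /home/websvc/cumulus-utilities/logs/<your-run>.log | tail -n 1
```
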
That entry shows the first and last granule from the range of granules submitted. Get the last granule. In this example, it is ECOv002_L1A_BB_21241_012_20220405T032925_0700_01.

Then split the input file at that granule inside the container:

```sh
docker exec --user root -it ops_cumulus-utilities-app_1 /bin/bash
cd output
ls -l

root@daa9f4032aff:/app/output# wc -l C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900.txt
57776 C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900.txt

root@daa9f4032aff:/app/output# grep -n ECOv002_L1A_BB_21241_012_20220405T032925_0700_01 C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900.txt
1100:ECOv002_L1A_BB_21241_012_20220405T032925_0700_01

root@daa9f4032aff:/app/output# head -n 1100 C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900.txt > C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900_done.txt

root@daa9f4032aff:/app/output# tail -n +1101 C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900.txt > C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900_todo.txt

root@daa9f4032aff:/app/output# wc -l C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900_done.txt
1100 C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900_done.txt

root@daa9f4032aff:/app/output# wc -l C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900_todo.txt
56676 C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900_todo.txt

root@daa9f4032aff:/app/output# cat C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900_done.txt
```

The last line of the 'done' file should be ECOv002_L1A_BB_21241_012_20220405T032925_0700_01.

You can now run the run_bulk_operation script using the 'todo' file as your input file, as in the sketch below.

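A sketch of the restart invocation, reusing the documented command shape from step 9. The meta values carry over from the earlier example, the granule list file is the 'todo' file created above, and the dataset name is inferred from that file name using the shortname___version rule:

```sh
# Illustrative restart: replace every value below with your own run's values.
sh cumulus_utilities_control.sh utility run_bulk_operation.py \
  --workflow UpdateCmrAccessConstraints \
  --meta "'{\"accessConstraints\":{\"value\":101,\"description\":\"Restricted for limited public release\"}}'" \
  --dataset ECO_L1A_BB___002 \
  --granulelistfile C2076119270-LPCLOUD_ECO_L1A_BB.002_PROD_1_1900_todo.txt \
  --dir /app/output/
```
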
### 403 errors

If you see a 403 error similar to this when trying to get a token, it might mean you have too many tokens. You can only have two:

```
getting token for user lpdaac_bmgt_ts2
making request to https://urs.earthdata.nasa.gov/api/users/token
A request error occurred - <class 'requests.exceptions.HTTPError'>: 403 Client Error: Forbidden for url: https://urs.earthdata.nasa.gov/api/users/token
```

Get the CMR_USER and CMR_PASSWORD and base64 encode them:

```sh
echo -n 'cmr_user:cmr_password' | base64
```

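For illustration, with the placeholder credentials above this prints:

```
Y21yX3VzZXI6Y21yX3Bhc3N3b3Jk
```

Substitute your real CMR_USER and CMR_PASSWORD before encoding.
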
To list your tokens:

```sh
curl --request GET --url https://urs.earthdata.nasa.gov/api/users/tokens -H 'Authorization: Basic <base64encoded info here>'
```

To revoke a token:

```sh
curl --request POST --url 'https://urs.earthdata.nasa.gov/api/users/revoke_token?token=<TOKEN here>' -H 'Authorization: Basic <base64encoded info here>'
```

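After revoking, the utility should be able to obtain a token again on its next run. If you want to mint one manually, the token endpoint the script calls (visible in the log above) accepts a POST; assuming it takes the same Basic authorization header as the calls above, a sketch:

```sh
# Assumed manual token request against the endpoint seen in the script's log.
curl --request POST --url https://urs.earthdata.nasa.gov/api/users/token -H 'Authorization: Basic <base64encoded info here>'
```
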
To edit files inside the container:

```sh
docker exec --user root -it ops_cumulus-utilities-app_1 /bin/bash
apt-get update    # refresh package lists so the vim install can resolve
apt-get install vim
cd output
ls -l
```

## Monitoring execution for all Options:

1. Click on the link to be directed to the Operations page.
   a. Verify that the bulk action is running, and check the status of the event when it completes. For example:
      

## Removing an Access Constraint:

Follow the same procedures, assigning an access constraint value of 0 (assuming 0 is not restricted). A sketch follows.
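For example, with the Option 4 script only the meta values change. The dataset and granule list file below are the illustrative ones from step 9, and the description text is a placeholder:

```sh
# Clear the constraint by assigning value 0; reuse your own dataset and file names.
sh cumulus_utilities_control.sh utility run_bulk_operation.py \
  --workflow UpdateCmrAccessConstraints \
  --meta "'{\"accessConstraints\":{\"value\":0,\"description\":\"access constraint removed\"}}'" \
  --dataset ECO1BGEO___001 \
  --granulelistfile C1239578043-LPCLOUD_ECO1BGEO.001_PROD_1_1900.txt \
  --dir /app/output/
```
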
Review comment: I think this elpdvx3 is something specific to your stack/environment? Can you describe it so that someone can find their version of the same? From context clues I'm assuming it's an EC2 instance; is this one of your stack's ECS instances?