
rs_workflows/flow_utils.md


Utility module for the Prefect flows.

DprProcessIn

Bases: BaseModel

Input parameters for the 'dpr-process' flow

Attributes:

- `env` (`FlowEnvArgs`): Prefect flow environment
- `processor_name` (`DprProcessor | str`): DPR processor name
- `processor_version` (`str`): DPR processor version
- `dask_cluster_label` (`str`): Dask cluster label, e.g. `"dask-l0"`
- `s3_payload_file` (`str`): S3 path where the processor payload will be written
- `pipeline` (`DprPipeline | str | None`): Processor pipeline name. The task table proposes one or several pipelines. Mandatory if `unit` is not provided.
- `unit` (`str | None`): Processor unit name. Advanced users can directly call a single unit of the task table. Mandatory if `pipeline` is not provided.
- `priority` (`Priority`): Priority used by the Dask cluster to prioritise task execution. Defaults to `"low"`.
- `workflow_type` (`WorkflowType`): Workflow type (benchmarking, on-demand, systematic). Defaults to `"on-demand"`.
- `input_products` (`list[dict[str, tuple[str, str]]]`): List of input products for the processor, structured as `input_products.name` mapped to `(stac item identifier, collection name)`. Example: `[( "S1CADUS", ["S1A1234", "s01-cadip-session"])]`
- `generated_product_to_collection_identifier` (`list[dict[str, str | tuple[str, str]]]`): List of output products for the processor, structured as `output_products.name` mapped to `(product:type, collection name)` or to `product:type` alone. When the collection name is not specified, it is equal to `product:type`. Example: `[( "SRAL0", "s03sral0_" ), ( "MWRL0", "s03mwrl0", "my-collection" )]`
- `auxiliary_product_to_collection_identifier` (`dict[str, str]`): Collection name where to push each auxiliary file (in rs-catalog). To apply the same treatment to all product types simultaneously, a `*` wildcard can be used. By default (when no input is provided), the collection name is set to `<mission>-aux-<product:type>`.
- `processing_mode` (`list[ProcessingMode]`): List of modes to be applied when calling the DPR processor.
- `start_datetime` (`datetime | None`): Date that can be used to retrieve auxiliary data in the right time frame.
- `end_datetime` (`datetime | None`): Date that can be used to retrieve auxiliary data in the right time frame.
- `satellite` (`SentinelSatellite | str | None`): In certain CQL2 queries from task tables, the `<satellite>` parameter must be provided, as some auxiliary files depend on the satellite.
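Concretely, the three product-mapping fields can be written as plain Python literals. This is an illustrative sketch that follows the declared type annotations; the product names, item identifiers, and collection names are hypothetical:

```python
# Shapes follow the declared type annotations:
#   input_products: list[dict[str, tuple[str, str]]]
#   generated_product_to_collection_identifier: list[dict[str, str | tuple[str, str]]]
#   auxiliary_product_to_collection_identifier: dict[str, str]

input_products = [
    # input product name -> (stac item identifier, collection name)
    {"S1CADUS": ("S1A1234", "s01-cadip-session")},
]

generated_product_to_collection_identifier = [
    # output name -> product:type (collection name defaults to product:type)
    {"SRAL0": "s03sral0_"},
    # output name -> (product:type, collection name)
    {"MWRL0": ("s03mwrl0", "my-collection")},
]

auxiliary_product_to_collection_identifier = {
    # a "*" wildcard applies the same collection to every auxiliary product type
    "*": "s03-aux",
}
```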

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
class DprProcessIn(BaseModel):  # pylint: disable=too-many-instance-attributes
    """
    Input parameters for the 'dpr-process' flow

    Attributes:
        env: Prefect flow environment
        processor_name: DPR processor name
        processor_version: DPR processor version
        dask_cluster_label: Dask cluster label e.g. "dask-l0"
        s3_payload_file: S3 path where the processor payload will be written
        pipeline: Processor pipeline name. The task table proposes one or several pipelines.
          Mandatory if unit is not provided.
        unit: Processor unit name. Advanced users can directly call a single unit of the task table.
          Mandatory if pipeline is not provided.
        priority: Priority used by the Dask cluster to prioritise task execution. Defaults to "low".
        workflow_type: Workflow type (benchmarking, on-demand, systematic). Defaults to "on-demand".
        input_products: List of input products for the processor, structured as follows:
          * input_products.name
          * (stac item identifier, collection name)
          Example: [( "S1CADUS", ["S1A1234", "s01-cadip-session"])]
        generated_product_to_collection_identifier: List of output products for the processor, structured as follows:
          * output_products.name
          * (product:type, collection name)
          or
          * product:type
          When the collection name is not specified, it is equal to product:type.
          Example: [( "SRAL0", "s03sral0_" ),( "MWRL0", "s03mwrl0", "my-collection" )]
        auxiliary_product_to_collection_identifier: Collection name where to push each auxiliary file (in rs-catalog).
          To apply the same treatment to all product types simultaneously, a "*" wildcard can be used.
          By default (when no input is provided), the collection name is set to <mission>-aux-<product:type>
        processing_mode: List of modes to be applied when calling the DPR processor.
        start_datetime: Date that can be used to retrieve auxiliary data on the right time frame.
        end_datetime: Date that can be used to retrieve auxiliary data on the right time frame.
        satellite: In certain CQL2 queries from task tables, the <satellite> parameter must be provided,
          as some auxiliary files depend on the satellite.
    """

    env: FlowEnvArgs = Field(description="Prefect flow environment")
    processor_name: DprProcessor | str = Field(description="DPR processor name")
    processor_version: str = Field(description="DPR processor version")
    dask_cluster_label: str = Field(description='Dask cluster label e.g. "dask-l0"')
    s3_payload_file: str = Field(description="S3 path where the processor payload will be written")
    # 'pipeline' or 'unit' must be provided
    pipeline: DprPipeline | str | None = Field(
        default=None,
        description="Processor pipeline name. The task table proposes one or several pipelines. "
        "Mandatory if unit is not provided.",
    )
    unit: str | None = Field(
        default=None,
        description="Processor unit name. Advanced users can directly call a single unit of the task table. "
        "Mandatory if pipeline is not provided.",
    )

    priority: Priority = Field(
        default=Priority.LOW,
        description="Priority used by the Dask cluster to prioritise task execution. Default: `low`.",
    )
    workflow_type: WorkflowType = Field(
        default=WorkflowType.ON_DEMAND,
        description="Workflow type (benchmarking, on-demand, systematic). Default: `on-demand`.",
    )

    input_products: list[dict[str, tuple[str, str]]] = Field(
        description="List of input products for the processor, structured as follows: "
        "`input_products.name, (stac item identifier, collection name)`. "
        'Example: `[( "S1CADUS", ["S1A1234", "s01-cadip-session"])]`',
    )
    generated_product_to_collection_identifier: list[dict[str, str | tuple[str, str]]] = Field(
        description="List of output products for the processor, structured as follows: "
        "`output_products.name, (product:type, collection name)` "
        "or "
        "`product:type`. "
        "When the collection name is not specified, it is equal to `product:type`. "
        'Example: `[( "SRAL0", "s03sral0_" ),( "MWRL0", "s03mwrl0", "my-collection" )]`',
    )
    auxiliary_product_to_collection_identifier: dict[str, str] = Field(
        default_factory=dict,
        description="Collection name where to push each auxiliary file (in rs-catalog). "
        "To apply the same treatment to all product types simultaneously, a `*` wildcard can be used. "
        "By default (when no input is provided), the collection name is set to `<mission>-aux-<product:type>`",
    )

    processing_mode: list[ProcessingMode] = Field(
        default_factory=list,
        description="List of modes to be applied when calling the DPR processor.",
    )
    start_datetime: datetime | None = Field(
        default=None,
        description="Date that can be used to retrieve auxiliary data on the right time frame.",
    )
    end_datetime: datetime | None = Field(
        default=None,
        description="Date that can be used to retrieve auxiliary data on the right time frame.",
    )
    satellite: SentinelSatellite | str | None = Field(
        default=None,
        description="In certain CQL2 queries from task tables, the `<satellite>` parameter must be provided, "
        "as some auxiliary files depend on the satellite.",
    )

    @field_validator("processor_name", mode="before")
    @classmethod
    def normalize_processor_name(cls, v):
        """Normalize the processor name to a string."""
        return v.value if isinstance(v, DprProcessor) else v

    @field_validator("satellite", mode="before")
    @classmethod
    def normalize_satellite_name(cls, v):
        """Normalize the satellite name to a string."""
        return v.value if isinstance(v, SentinelSatellite) else v

    @model_validator(mode="after")
    def check_model(self):
        """
        Ensure required inputs are not empty and that exactly one of 'pipeline' or 'unit' is provided.

        The caller must specify either a pipeline or a unit, but not both
        and not neither.
        """
        has_pipeline = bool(self.pipeline)
        has_unit = bool(self.unit)
        if has_pipeline == has_unit:
            raise ValueError("Exactly one of 'pipeline' or 'unit' must be provided.")

        if not self.input_products:
            raise ValueError("'input_products' must contain at least one pystac.Item.")

        if not self.generated_product_to_collection_identifier:
            raise ValueError("'generated_product_to_collection_identifier' must not be empty.")

        if not self.auxiliary_product_to_collection_identifier:
            raise ValueError("'auxiliary_product_to_collection_identifier' must not be empty.")

        return self

check_model()

Ensure required inputs are not empty and that exactly one of 'pipeline' or 'unit' is provided.

The caller must specify either a pipeline or a unit, but not both and not neither.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
@model_validator(mode="after")
def check_model(self):
    """
    Ensure required inputs are not empty and that exactly one of 'pipeline' or 'unit' is provided.

    The caller must specify either a pipeline or a unit, but not both
    and not neither.
    """
    has_pipeline = bool(self.pipeline)
    has_unit = bool(self.unit)
    if has_pipeline == has_unit:
        raise ValueError("Exactly one of 'pipeline' or 'unit' must be provided.")

    if not self.input_products:
        raise ValueError("'input_products' must contain at least one pystac.Item.")

    if not self.generated_product_to_collection_identifier:
        raise ValueError("'generated_product_to_collection_identifier' must not be empty.")

    if not self.auxiliary_product_to_collection_identifier:
        raise ValueError("'auxiliary_product_to_collection_identifier' must not be empty.")

    return self

normalize_processor_name(v) classmethod

Normalize the processor name to a string.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
@field_validator("processor_name", mode="before")
@classmethod
def normalize_processor_name(cls, v):
    """Normalize the processor name to a string."""
    return v.value if isinstance(v, DprProcessor) else v

normalize_satellite_name(v) classmethod

Normalize the satellite name to a string.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
@field_validator("satellite", mode="before")
@classmethod
def normalize_satellite_name(cls, v):
    """Normalize the satellite name to a string."""
    return v.value if isinstance(v, SentinelSatellite) else v
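Both validators apply the same enum-to-string normalization. The pattern can be reproduced standalone, here with a subset of the SentinelSatellite values defined in this module:

```python
from enum import Enum

class SentinelSatellite(str, Enum):
    """Sentinel satellite name (string value = STAC standardized value).
    Subset of the full enum, for illustration."""
    S1A = "sentinel-1a"
    S3A = "sentinel-3a"

def normalize_satellite_name(v):
    """Return the enum member's string value; pass anything else through unchanged."""
    return v.value if isinstance(v, SentinelSatellite) else v

print(normalize_satellite_name(SentinelSatellite.S1A))  # sentinel-1a
print(normalize_satellite_name("sentinel-3a"))          # sentinel-3a
print(normalize_satellite_name(None))                   # None
```

Running the validator with `mode="before"` means this conversion happens before Pydantic's own type coercion, so callers may pass either the enum member or its string value.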

DprProcessOut dataclass

Output parameters for the 'dpr-process' flow

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
@dataclass
class DprProcessOut:
    """
    Output parameters for the 'dpr-process' flow
    """

    status: bool
    product_identifier: list[Item] = field(default_factory=list)

FlowEnv

Prefect flow environment and reusable objects.

Attributes:

- `owner_id` (`str`): User/owner ID
- `calling_span` (`SpanContext | None`): OpenTelemetry span of the calling flow, if any.
- `this_span` (`SpanContext | None`): Current OpenTelemetry span.
- `rs_client` (`RsClient`): RsClient instance

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
class FlowEnv:
    """
    Prefect flow environment and reusable objects.

    Attributes:
        owner_id (str): User/owner ID
        calling_span (SpanContext | None): OpenTelemetry span of the calling flow, if any.
        this_span (SpanContext | None): Current OpenTelemetry span.
        rs_client (RsClient): RsClient instance
    """

    def __init__(self, args: FlowEnvArgs):
        """Constructor."""
        self.owner_id: str = args.owner_id
        self.calling_span: SpanContext | None = None
        self.this_span: SpanContext | None = None

        # Deserialize the calling span, if any
        if args.calling_span:
            self.calling_span = SpanContext(*args.calling_span)

        # Read prefect blocks into env vars
        prefect_utils.read_prefect_blocks(self.owner_id, _sync=True)  # type: ignore

        # Init opentelemetry traces
        init_opentelemetry.init_traces("rs.client")

        # Init the RsClient instance from the env vars
        self.rs_client = RsClient(
            rs_server_href=os.getenv("RSPY_WEBSITE"),
            rs_server_api_key=os.getenv("RSPY_APIKEY"),
            owner_id=self.owner_id,
            logger=get_run_logger(),  # type: ignore
        )

    def serialize(self) -> FlowEnvArgs:
        """Serialize this object with Pydantic."""

        # The serialized object will be used by a new opentelemetry span.
        # Its calling span will be either the current span, or the current calling span.
        new_calling_span = self.this_span or self.calling_span
        if new_calling_span:
            # Only keep the first 3 attributes, the others need custom serialization
            serialized_span = tuple(new_calling_span)[:3]
        else:
            serialized_span = None

        return FlowEnvArgs(owner_id=self.owner_id, calling_span=serialized_span)  # type: ignore

    @_agnosticcontextmanager
    def start_span(
        self,
        instrumenting_module_name: str,
        name: str,
    ) -> Iterator[Span]:
        """
        Context manager for creating a new main or child OpenTelemetry span and set it
        as the current span in this tracer's context.

        Args:
            instrumenting_module_name: Caller module name, just pass __name__
            name: The name of the span to be created (use a custom name)

        Yields:
            The newly-created span.
        """
        # Create new span and save it
        with init_opentelemetry.start_span(  # pylint: disable=contextmanager-generator-missing-cleanup
            instrumenting_module_name,
            name,
            self.calling_span,
        ) as span:
            self.this_span = trace.get_current_span().get_span_context()
            yield span

__init__(args)

Constructor.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
def __init__(self, args: FlowEnvArgs):
    """Constructor."""
    self.owner_id: str = args.owner_id
    self.calling_span: SpanContext | None = None
    self.this_span: SpanContext | None = None

    # Deserialize the calling span, if any
    if args.calling_span:
        self.calling_span = SpanContext(*args.calling_span)

    # Read prefect blocks into env vars
    prefect_utils.read_prefect_blocks(self.owner_id, _sync=True)  # type: ignore

    # Init opentelemetry traces
    init_opentelemetry.init_traces("rs.client")

    # Init the RsClient instance from the env vars
    self.rs_client = RsClient(
        rs_server_href=os.getenv("RSPY_WEBSITE"),
        rs_server_api_key=os.getenv("RSPY_APIKEY"),
        owner_id=self.owner_id,
        logger=get_run_logger(),  # type: ignore
    )

serialize()

Serialize this object with Pydantic.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
def serialize(self) -> FlowEnvArgs:
    """Serialize this object with Pydantic."""

    # The serialized object will be used by a new opentelemetry span.
    # Its calling span will be either the current span, or the current calling span.
    new_calling_span = self.this_span or self.calling_span
    if new_calling_span:
        # Only keep the first 3 attributes, the others need custom serialization
        serialized_span = tuple(new_calling_span)[:3]
    else:
        serialized_span = None

    return FlowEnvArgs(owner_id=self.owner_id, calling_span=serialized_span)  # type: ignore
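The round-trip relies on the first three SpanContext fields being plain serializable values (trace id, span id, remote flag, matching the `tuple[int, int, bool]` annotation of `FlowEnvArgs.calling_span`). A minimal sketch of the idea, using a named-tuple stand-in rather than the real opentelemetry `SpanContext`:

```python
from typing import NamedTuple

class FakeSpanContext(NamedTuple):
    """Stand-in for an OpenTelemetry SpanContext: the first three fields are
    plain values, the remaining ones would need custom serialization."""
    trace_id: int
    span_id: int
    is_remote: bool
    trace_flags: object = None
    trace_state: object = None

span = FakeSpanContext(trace_id=0xABC, span_id=0x123, is_remote=False)

# Serialize: keep only the first 3 fields, as FlowEnv.serialize does
serialized = tuple(span)[:3]

# Deserialize: rebuild the context from the serialized tuple,
# as FlowEnv.__init__ does with SpanContext(*args.calling_span)
restored = FakeSpanContext(*serialized)

print(serialized)  # (2748, 291, False)
```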

start_span(instrumenting_module_name, name)

Context manager that creates a new main or child OpenTelemetry span and sets it as the current span in this tracer's context.

Parameters:

- `instrumenting_module_name` (`str`, required): Caller module name, just pass `__name__`
- `name` (`str`, required): The name of the span to be created (use a custom name)

Yields:

- `Span`: The newly-created span.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
@_agnosticcontextmanager
def start_span(
    self,
    instrumenting_module_name: str,
    name: str,
) -> Iterator[Span]:
    """
    Context manager for creating a new main or child OpenTelemetry span and set it
    as the current span in this tracer's context.

    Args:
        instrumenting_module_name: Caller module name, just pass __name__
        name: The name of the span to be created (use a custom name)

    Yields:
        The newly-created span.
    """
    # Create new span and save it
    with init_opentelemetry.start_span(  # pylint: disable=contextmanager-generator-missing-cleanup
        instrumenting_module_name,
        name,
        self.calling_span,
    ) as span:
        self.this_span = trace.get_current_span().get_span_context()
        yield span
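The shape of this context manager can be illustrated with a plain `contextlib` stand-in (hypothetical names; the real method delegates span creation to `init_opentelemetry.start_span`):

```python
from contextlib import contextmanager

class ToyFlowEnv:
    """Stand-in for FlowEnv tracing: start_span records the newly-current
    span on the instance, mirroring how FlowEnv.start_span saves this_span."""
    def __init__(self):
        self.this_span = None

    @contextmanager
    def start_span(self, instrumenting_module_name: str, name: str):
        # A real implementation opens an OpenTelemetry span here;
        # we just build an identifier to show the flow of control.
        span = f"{instrumenting_module_name}:{name}"
        self.this_span = span
        yield span

env = ToyFlowEnv()
with env.start_span("rs_workflows.flows", "dpr-process") as span:
    print(span)        # rs_workflows.flows:dpr-process
print(env.this_span)   # rs_workflows.flows:dpr-process
```

Saving `this_span` on the instance is what later lets `serialize()` hand the current span over as the calling span of a downstream flow.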

FlowEnvArgs

Bases: BaseModel

Prefect flow environment arguments.

Attributes:

- `owner_id` (`str`): User/owner ID (necessary to retrieve the user info: API key and OAuth2 cookie) from the right Prefect block. NOTE: may be useless after each user has their own Prefect server, because there will be only one block.
- `calling_span` (`tuple`): Serialized OpenTelemetry span of the calling flow, if any.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
class FlowEnvArgs(BaseModel):
    """
    Prefect flow environment arguments.

    Attributes:
        owner_id: User/owner ID (necessary to retrieve the user info: API key and OAuth2 cookie)
          from the right Prefect block. NOTE: may be useless after each user has their own prefect
          server because there will be only one block.
        calling_span (tuple): Serialized OpenTelemetry span of the calling flow, if any.
    """

    owner_id: str = Field(
        description="User/owner ID (necessary to retrieve the user info) from the right Prefect block",
    )
    calling_span: tuple[int, int, bool] | None = Field(
        default=None,
        description="Serialized OpenTelemetry span of the calling flow, if any",
    )

Priority

Bases: str, Enum

Priority for the cluster dask to be able to prioritise task execution.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
35
36
37
38
39
40
41
42
class Priority(str, Enum):
    """
    Priority for the cluster dask to be able to prioritise task execution.
    """

    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
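Because Priority subclasses both `str` and `Enum`, its members compare equal to their string values, so either form can be passed where a priority is expected. A quick demonstration, reproducing the enum as defined above:

```python
from enum import Enum

class Priority(str, Enum):
    """Priority used by the Dask cluster to prioritise task execution."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

print(Priority.LOW == "low")        # True: plain string comparison works
print(Priority("medium").name)      # MEDIUM: lookup by value
print([p.value for p in Priority])  # ['low', 'medium', 'high']
```

The same `str`/`Enum` mixin pattern is used by `WorkflowType`, `ProcessingMode`, and `SentinelSatellite` below.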

ProcessingMode

Bases: str, Enum

Modes that can be applied when calling the DPR processor.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
55
56
57
58
59
60
61
62
63
64
class ProcessingMode(str, Enum):
    """
    Modes that can be applied when calling the DPR processor.
    """

    NRT = "nrt"
    NTC = "ntc"
    REPROCESSING = "reprocessing"
    SUBS = "subs"
    ALWAYS = "always"

SentinelSatellite

Bases: str, Enum

Sentinel satellite name

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
67
68
69
70
71
72
73
74
75
76
77
78
class SentinelSatellite(str, Enum):
    """Sentinel satellite name"""

    # String value = STAC standardized value
    S1A = "sentinel-1a"
    S1B = "sentinel-1b"
    S1C = "sentinel-1c"
    S2A = "sentinel-2a"
    S2B = "sentinel-2b"
    S2C = "sentinel-2c"
    S3A = "sentinel-3a"
    S3B = "sentinel-3b"

WorkflowType

Bases: str, Enum

Workflow type.

Source code in docs/rs-client-libraries/rs_workflows/flow_utils.py
class WorkflowType(str, Enum):
    """
    Workflow type.
    """

    BENCHMARKING = "benchmarking"
    ON_DEMAND = "on-demand"
    SYSTEMATIC = "systematic"