DE int

 0    18 flashcards    guest3164346
Question (English)    Answer (English)
ETL (Extract, Transform, Load)
A process where data is extracted from a source, transformed (e.g. cleaned or aggregated), and then loaded into a database or data warehouse.
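A minimal sketch of the three steps in plain Python, assuming an in-memory SQLite database stands in for the warehouse. The sample rows, the `user_totals` table, and the function names are hypothetical.

```python
import sqlite3

def extract():
    # Hypothetical source rows; a real pipeline would read a file, API, or database.
    return [
        {"user_id": 1, "amount": " 10.50 "},
        {"user_id": 2, "amount": None},   # dirty row: missing amount
        {"user_id": 1, "amount": "4.50"},
    ]

def transform(rows):
    # Clean (skip rows without an amount, parse strings) and aggregate per user.
    totals = {}
    for row in rows:
        if row["amount"] is None:
            continue
        totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + float(row["amount"])
    return sorted(totals.items())

def load(conn, records):
    # Write the already-transformed records into the destination table.
    conn.execute("CREATE TABLE IF NOT EXISTS user_totals (user_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO user_totals VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
etl_result = conn.execute("SELECT user_id, total FROM user_totals").fetchall()
```

The transformation runs in application code before anything reaches the database, which is the defining trait of ETL.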
ELT (Extract, Load, Transform)
Raw data is first loaded into the destination (like BigQuery), and then transformed using SQL or other tools inside the warehouse.
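A small sketch of the same idea with the order of steps swapped, assuming SQLite stands in for a warehouse such as BigQuery. The `raw_orders` and `user_totals` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw values go into the warehouse untouched, as text.
conn.execute("CREATE TABLE raw_orders (user_id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("1", "10.50"), ("2", None), ("1", "4.50")],
)

# Transform: cleaning and aggregation happen afterwards, in SQL, inside the warehouse.
conn.execute(
    """
    CREATE TABLE user_totals AS
    SELECT CAST(raw_orders.user_id AS INTEGER) AS user_id,
           SUM(CAST(raw_orders.amount AS REAL)) AS total
    FROM raw_orders
    WHERE raw_orders.amount IS NOT NULL
    GROUP BY raw_orders.user_id
    """
)
elt_result = conn.execute(
    "SELECT user_id, total FROM user_totals ORDER BY user_id"
).fetchall()
```

Because the raw table is kept, the SQL transformation can be rewritten and rerun later without extracting the data again.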
DAG (Directed Acyclic Graph – Airflow)
A structure used in Airflow to define workflows. It represents tasks and the dependencies between them; because the graph has no cycles, every task runs only after the tasks it depends on have finished.
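The "no cycles" rule can be illustrated without Airflow. This sketch uses the standard-library `graphlib` module (Python 3.9+) rather than the Airflow API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A valid DAG always has at least one order that respects every dependency.
run_order = list(TopologicalSorter(dependencies).static_order())

# Making "extract" depend on "report" closes a loop, so no valid order exists.
cyclic = dict(dependencies, extract={"report"})
try:
    list(TopologicalSorter(cyclic).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

A scheduler can only decide what to run next when such an order exists, which is why a workflow with a circular dependency is rejected.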
Partitioning (BigQuery)
Dividing a large table into parts (usually by date) to make queries faster and cheaper by scanning only relevant partitions.
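A toy model of why partitioning reduces the amount of data scanned. This is plain Python, not BigQuery, which performs the equivalent pruning internally; the event rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical event rows: (event_date, user_id).
events = [
    ("2024-01-01", 1), ("2024-01-01", 2),
    ("2024-01-02", 3),
    ("2024-01-03", 1), ("2024-01-03", 4), ("2024-01-03", 5),
]

# "Partition" the table by date: one bucket of rows per day.
partitions = defaultdict(list)
for event_date, user_id in events:
    partitions[event_date].append(user_id)

def scan_full(day):
    # Unpartitioned: every row is read to find the matching ones.
    matches = sum(1 for event_date, _ in events if event_date == day)
    return matches, len(events)   # (matching rows, rows scanned)

def scan_partitioned(day):
    # Partitioned: only the bucket for `day` is read.
    rows = partitions.get(day, [])
    return len(rows), len(rows)   # (matching rows, rows scanned)
```

Both functions return the same count for a given day, but the partitioned version reads only that day's rows.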
JOIN (SQL)
A way to combine data from two or more tables based on a related column (e.g. user_id).
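A short example run through Python's built-in `sqlite3` module, assuming two hypothetical tables, `users` and `orders`, related by `user_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 3.0), (12, 3, 7.0);
    """
)

# INNER JOIN: only rows with a match in both tables.
inner = conn.execute(
    "SELECT u.name, o.order_id FROM users u "
    "JOIN orders o ON o.user_id = u.user_id ORDER BY o.order_id"
).fetchall()

# LEFT JOIN: every user, with NULL (None in Python) where no order matches.
left = conn.execute(
    "SELECT u.name, o.order_id FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.user_id ORDER BY u.user_id, o.order_id"
).fetchall()
```

Order 12 belongs to a `user_id` that is not in `users`, so it appears in neither result.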
HAVING vs WHERE (SQL)
WHERE filters rows before aggregation; HAVING filters after. Example: HAVING COUNT(*) > 100.
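Both filters can appear in one query. The sketch below uses `sqlite3` with a hypothetical `orders` table and a smaller threshold than the card's example so the sample data stays short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 5.0), (1, 50.0), (1, 60.0), (2, 70.0), (3, 2.0);
    """
)

having_rows = conn.execute(
    """
    SELECT user_id, COUNT(*) AS big_orders
    FROM orders
    WHERE amount > 10          -- row filter, applied before grouping
    GROUP BY user_id
    HAVING COUNT(*) > 1        -- group filter, applied after aggregation
    ORDER BY user_id
    """
).fetchall()
```

WHERE first discards the two small orders; HAVING then keeps only users with more than one remaining order.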
PySpark
Python API for Apache Spark. It’s used to process very large datasets in a distributed, parallelized way.
BigQuery
A serverless cloud data warehouse from Google, designed for running fast SQL queries on large datasets.
Data Lake
A storage system that keeps data in its raw form, whether structured, semi-structured, or unstructured; often used for flexible analytics or staging.
Data Warehouse
A structured database optimized for analysis and reporting, typically holding cleaned and transformed data.
Airflow Operator
A template for a unit of work in an Airflow DAG; it defines what a task does when it runs (e.g. PythonOperator, BashOperator).
Kafka Topic
A named data stream in Apache Kafka where producers send and consumers receive messages.
IAM (Identity and Access Management – GCP)
A system for managing permissions and access to resources in Google Cloud – defines who can do what.
KPI (Key Performance Indicator)
A measurable value that shows how effectively a process or business is performing (e.g. conversion rate, average delay).
Lazy Evaluation (Spark)
Transformations are not executed until an action (like .count() or .collect()) is called; this lets Spark optimize the whole plan before running it.
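The behaviour can be shown with an analogy in plain Python, where `map` is lazy in the same sense. This is not Spark code; the `double` function and the `calls` log are hypothetical helpers for the illustration.

```python
calls = []

def double(x):
    calls.append(x)   # record when the work really happens
    return x * 2

# "Transformation": builds a lazy pipeline, nothing runs yet.
pipeline = map(double, range(5))
calls_before_action = list(calls)

# "Action": consuming the pipeline forces the computation.
lazy_result = list(pipeline)
```

In PySpark the corresponding pair would be a transformation such as `rdd.map(...)` followed by an action such as `.collect()`.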
Retry (Airflow)
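A setting that makes Airflow rerun a task automatically after it fails, which helps with transient problems such as network timeouts. It is configured on the task through the `retries` and `retry_delay` arguments.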
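The retry behaviour can be sketched as a plain Python loop. This is an illustration of the idea, not Airflow's implementation; `run_with_retries` and `flaky_task` are hypothetical names, and the parameters only mirror the meaning of Airflow's `retries` and `retry_delay`.

```python
import time

def run_with_retries(task, retries=3, retry_delay=0.0):
    # Run once, then up to `retries` more times if the task keeps failing.
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > retries:
                raise              # out of retries: surface the failure
            time.sleep(retry_delay)

state = {"calls": 0}

def flaky_task():
    # Hypothetical unstable operation: fails twice, then succeeds.
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

retry_result = run_with_retries(flaky_task, retries=3)
```

With `retries=3` the task may run up to four times in total: the first attempt plus three retries.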
Data Validation
The process of ensuring that data is accurate and consistent – includes checking for missing values, duplicates, or wrong formats.
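A minimal sketch of the three checks named above, assuming rows arrive as dictionaries with hypothetical `user_id` and `signup_date` fields and that dates should look like YYYY-MM-DD.

```python
import re

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed format: YYYY-MM-DD

def validate(rows):
    """Return a list of (row_index, problem) tuples."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Missing value and duplicate checks on the identifier.
        if row.get("user_id") is None:
            problems.append((i, "missing user_id"))
        elif row["user_id"] in seen_ids:
            problems.append((i, "duplicate user_id"))
        else:
            seen_ids.add(row["user_id"])
        # Format check on the date string.
        if not DATE_PATTERN.match(str(row.get("signup_date", ""))):
            problems.append((i, "bad signup_date format"))
    return problems

sample = [
    {"user_id": 1, "signup_date": "2024-01-05"},
    {"user_id": 1, "signup_date": "2024-01-06"},     # duplicate id
    {"user_id": None, "signup_date": "05/01/2024"},  # missing id, wrong format
]
validation_problems = validate(sample)
```

Returning a list of problems, rather than raising on the first one, lets a pipeline report every issue in a batch at once.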
Window Function (SQL)
A function that performs calculations across a "window" of rows related to the current row, without collapsing them into a single result (e.g. ROW_NUMBER(), AVG(...) OVER(...)).
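A short example of both functions mentioned above, run through `sqlite3`. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; the `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0);
    """
)

window_rows = conn.execute(
    """
    SELECT user_id,
           amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount) AS rn,
           AVG(amount) OVER (PARTITION BY user_id) AS user_avg
    FROM orders
    ORDER BY user_id, amount
    """
).fetchall()
```

All three input rows survive in the output, each carrying its per-user rank and average; a GROUP BY would have reduced them to one row per user.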
