Dubai Land Department (DLD) - Dataset EDA
Source: Dubai Pulse Open Data - DLD Transactions and Rent Contracts
Files: Transactions.csv, Rent_Contracts.csv
Purpose: Familiarisation with the raw dataset - structure, coverage, distributions, and data quality notes before any modelling or analysis.
Setup
1. Transactions Dataset
The transactions file records registered property transactions in Dubai: sales, mortgages, and gifts. Each row is one transaction. We load the full file and inspect it before any filtering.
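The load-and-inspect step can be sketched as follows. This is a minimal illustration, not the notebook's original cell: the helper name `load_raw_csv` is hypothetical, and a two-row in-memory sample (values taken from the preview rows) stands in for Transactions.csv.

```python
import io

import pandas as pd


def load_raw_csv(source):
    """Read a DLD CSV export as-is, with no filtering or date parsing."""
    # low_memory=False lets pandas infer each dtype from the whole file
    # instead of chunk by chunk, which avoids mixed-type columns.
    return pd.read_csv(source, low_memory=False)


# Tiny stand-in for Transactions.csv (assumption: same column names).
sample = io.StringIO(
    "transaction_id,trans_group_en,instance_date,actual_worth\n"
    "1-11-2004-2023,Sales,27-09-2004,4101000.0\n"
    "2-13-2008-381,Mortgages,06-03-2008,3000000.0\n"
)
df = load_raw_csv(sample)
print(f"Shape: {df.shape[0]:,} rows x {df.shape[1]} columns")
```

Dates are deliberately left as strings at this stage, so the raw format can be checked before parsing.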
Shape: 1,253,267 rows x 46 columns
| | transaction_id | procedure_id | trans_group_id | trans_group_ar | trans_group_en | procedure_name_ar | procedure_name_en | instance_date | property_type_id | property_type_ar | property_type_en | property_sub_type_id | property_sub_type_ar | property_sub_type_en | property_usage_ar | ... | nearest_metro_ar | nearest_metro_en | nearest_mall_ar | nearest_mall_en | rooms_ar | rooms_en | has_parking | procedure_area | actual_worth | meter_sale_price | rent_value | meter_rent_price | no_of_parties_role_1 | no_of_parties_role_2 | no_of_parties_role_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1-11-2004-2023 | 11 | 1 | مبايعات | Sales | بيع | Sell | 27-09-2004 | 1 | أرض | Land | NaN | NaN | NaN | سكني | ... | محطة مترو بنك أبوظبي التجاري | ADCB Metro Station | مول دبي | Dubai Mall | NaN | NaN | 0 | 1904.88 | 4101000.00 | 2152.89 | NaN | NaN | 14.00 | 1.00 | 0.00 |
| 1 | 2-13-2008-381 | 13 | 2 | رهون | Mortgages | تسجيل رهن | Mortgage Registration | 06-03-2008 | 4 | فيلا | Villa | NaN | NaN | NaN | أخرى | ... | محطة مترو بنك أبوظبي التجاري | ADCB Metro Station | مول دبي | Dubai Mall | NaN | NaN | 0 | 896.61 | 3000000.00 | 3345.94 | NaN | NaN | 1.00 | 1.00 | 0.00 |
| 2 | 3-9-2006-300097 | 9 | 3 | هبات | Gifts | هبه | Grant | 17-07-2006 | 4 | فيلا | Villa | NaN | NaN | NaN | سكني | ... | محطة مترو الجافلية | Al Jafiliya Metro Station | مول دبي | Dubai Mall | NaN | NaN | 0 | 341.81 | 1199971.00 | 3510.64 | NaN | NaN | 1.00 | 1.00 | 0.00 |
3 rows × 46 columns
<class 'pandas.DataFrame'>
RangeIndex: 1253267 entries, 0 to 1253266
Data columns (total 46 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | transaction_id | 1253267 non-null | str |
| 1 | procedure_id | 1253267 non-null | int64 |
| 2 | trans_group_id | 1253267 non-null | int64 |
| 3 | trans_group_ar | 1253267 non-null | str |
| 4 | trans_group_en | 1253267 non-null | str |
| 5 | procedure_name_ar | 1253267 non-null | str |
| 6 | procedure_name_en | 1253267 non-null | str |
| 7 | instance_date | 1253267 non-null | str |
| 8 | property_type_id | 1253267 non-null | int64 |
| 9 | property_type_ar | 1253267 non-null | str |
| 10 | property_type_en | 1253267 non-null | str |
| 11 | property_sub_type_id | 975897 non-null | float64 |
| 12 | property_sub_type_ar | 975897 non-null | str |
| 13 | property_sub_type_en | 975897 non-null | str |
| 14 | property_usage_ar | 1253267 non-null | str |
| 15 | property_usage_en | 1253267 non-null | str |
| 16 | reg_type_id | 1253267 non-null | int64 |
| 17 | reg_type_ar | 1253267 non-null | str |
| 18 | reg_type_en | 1253267 non-null | str |
| 19 | area_id | 1253267 non-null | int64 |
| 20 | area_name_ar | 1253267 non-null | str |
| 21 | area_name_en | 1253267 non-null | str |
| 22 | building_name_ar | 866918 non-null | str |
| 23 | building_name_en | 867318 non-null | str |
| 24 | project_number | 846489 non-null | float64 |
| 25 | project_name_ar | 846489 non-null | str |
| 26 | project_name_en | 846489 non-null | str |
| 27 | master_project_en | 1030218 non-null | str |
| 28 | master_project_ar | 1030168 non-null | str |
| 29 | nearest_landmark_ar | 1071296 non-null | str |
| 30 | nearest_landmark_en | 1071296 non-null | str |
| 31 | nearest_metro_ar | 938194 non-null | str |
| 32 | nearest_metro_en | 938194 non-null | str |
| 33 | nearest_mall_ar | 933053 non-null | str |
| 34 | nearest_mall_en | 933053 non-null | str |
| 35 | rooms_ar | 957141 non-null | str |
| 36 | rooms_en | 957141 non-null | str |
| 37 | has_parking | 1253267 non-null | int64 |
| 38 | procedure_area | 1253267 non-null | float64 |
| 39 | actual_worth | 1253267 non-null | float64 |
| 40 | meter_sale_price | 1253267 non-null | float64 |
| 41 | rent_value | 34794 non-null | float64 |
| 42 | meter_rent_price | 34794 non-null | float64 |
| 43 | no_of_parties_role_1 | 1252341 non-null | float64 |
| 44 | no_of_parties_role_2 | 1252341 non-null | float64 |
| 45 | no_of_parties_role_3 | 1252341 non-null | float64 |
dtypes: float64(10), int64(6), str(30)
memory usage: 439.8 MB
1.1 Missing values
| Column | Missing | Pct % |
|---|---|---|
| meter_rent_price | 1218473 | 97.20 |
| rent_value | 1218473 | 97.20 |
| project_name_ar | 406778 | 32.50 |
| project_number | 406778 | 32.50 |
| project_name_en | 406778 | 32.50 |
| building_name_ar | 386349 | 30.80 |
| building_name_en | 385949 | 30.80 |
| nearest_mall_en | 320214 | 25.60 |
| nearest_mall_ar | 320214 | 25.60 |
| nearest_metro_en | 315073 | 25.10 |
| nearest_metro_ar | 315073 | 25.10 |
| rooms_en | 296126 | 23.60 |
| rooms_ar | 296126 | 23.60 |
| property_sub_type_en | 277370 | 22.10 |
| property_sub_type_id | 277370 | 22.10 |
| property_sub_type_ar | 277370 | 22.10 |
| master_project_ar | 223099 | 17.80 |
| master_project_en | 223049 | 17.80 |
| nearest_landmark_ar | 181971 | 14.50 |
| nearest_landmark_en | 181971 | 14.50 |
| no_of_parties_role_1 | 926 | 0.10 |
| no_of_parties_role_2 | 926 | 0.10 |
| no_of_parties_role_3 | 926 | 0.10 |
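A table of this shape can be produced with a short helper. The sketch below is an assumption about how the summary was built, not the notebook's own code; `missing_summary` is a hypothetical name and the demo values are placeholders.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Per-column null count and percentage, largest first."""
    missing = df.isna().sum()
    pct = (missing / len(df) * 100).round(1)
    out = pd.DataFrame({"Missing": missing, "Pct %": pct})
    # Keep only the columns that actually contain nulls.
    return out[out["Missing"] > 0].sort_values("Missing", ascending=False)


# Placeholder frame: three of four rent values and one room label missing.
demo = pd.DataFrame({
    "rent_value": [np.nan, np.nan, np.nan, 50000.0],
    "rooms_en": ["a", None, "b", "c"],
    "actual_worth": [1.0, 2.0, 3.0, 4.0],
})
summary = missing_summary(demo)
print(summary)
```

Columns that share a null count (for example the `_ar` / `_en` pairs) are usually populated together, which is worth confirming before dropping either one.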
1.2 Transaction types
The trans_group_en column distinguishes three transaction groups: sales, mortgages, and gifts.
| trans_group_en | Count |
|---|---|
| Sales | 944983 |
| Mortgages | 262422 |
| Gifts | 45862 |
1.3 Annual transaction volume (2000 onward)
Years before 2000 have very sparse and inconsistent entries. We restrict the time axis to 2000–2025 for a clean view of the modern market.
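The yearly counts behind this view can be sketched as below. It assumes `instance_date` is the date column used and that strings follow the DD-MM-YYYY format noted in the data quality section; the function name `annual_counts` is hypothetical.

```python
import pandas as pd


def annual_counts(dates, start_year=2000, end_year=2025):
    """Count rows per calendar year within [start_year, end_year]."""
    # Explicit day-first format; unparseable strings become NaT.
    parsed = pd.to_datetime(dates, format="%d-%m-%Y", errors="coerce")
    years = parsed.dt.year
    # NaT rows compare as False here and are dropped with the out-of-range years.
    in_range = years.between(start_year, end_year)
    return years[in_range].astype(int).value_counts().sort_index()


demo = pd.Series([
    "27-09-2004", "06-03-2008", "17-07-2006",
    "01-01-1995", "15-05-2008", "not a date",
])
counts = annual_counts(demo)
print(counts)
```

In the demo, the 1995 row and the unparseable string are both excluded, leaving one row each for 2004 and 2006 and two for 2008.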
1.4 Property type breakdown
--- property_type_en ---
| property_type_en | Count |
|---|---|
| Unit | 867318 |
| Villa | 247338 |
| Land | 104854 |
| Building | 33757 |
--- property_sub_type_en ---
| property_sub_type_en | Count |
|---|---|
| Flat | 752822 |
| Villa | 108357 |
| Office | 61053 |
| Hotel Apartment | 24672 |
| Shop | 14043 |
| Hotel Rooms | 13119 |
| Workshop | 515 |
| Stacked Townhouses | 439 |
| Store | 319 |
| Building | 225 |
--- property_usage_en ---
| property_usage_en | Count |
|---|---|
| Residential | 1022134 |
| Commercial | 158182 |
| Hospitality | 37854 |
| Other | 27213 |
| Industrial | 4149 |
| Multi-Use | 1971 |
| Agricultural | 1080 |
| Storage | 655 |
| Residential / Commercial | 29 |
--- reg_type_en ---
| reg_type_en | Count |
|---|---|
| Existing Properties | 876328 |
| Off-Plan Properties | 376939 |
1.5 Room type distribution
1.6 Sale price per m² distribution
Prices are clipped at the 95th percentile to remove extreme commercial and land transaction values, giving a readable view of the residential price range.
Count (before clip): 1,253,262
Count (after clip): 1,190,598
Clip threshold: AED 28,545 / m²
Mean: AED 10,738 / m²
Median: AED 9,671 / m²
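Because the row count falls after clipping, the step appears to drop rows above the threshold rather than cap them. A minimal sketch under that assumption (the helper name `clip_upper_percentile` is hypothetical):

```python
import pandas as pd


def clip_upper_percentile(values, q=0.95):
    """Drop values above the q-th quantile; return (kept, threshold)."""
    values = values.dropna()
    threshold = values.quantile(q)
    return values[values <= threshold], threshold


# Placeholder data: the numbers 1..100.
demo = pd.Series([float(v) for v in range(1, 101)])
kept, threshold = clip_upper_percentile(demo)
print(f"Count (before clip): {len(demo):,}")
print(f"Count (after clip):  {len(kept):,}")
print(f"Clip threshold:      {threshold:,.2f}")
```

If capping were intended instead, `Series.clip(upper=threshold)` would keep every row and pile the outliers up at the threshold, which distorts a histogram's last bin.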
1.7 Unit area distribution
Areas are clipped at 500 m² to focus on residential units. The vast majority of flats and apartments fall well below this threshold; values above it represent villas, plots, and commercial units.
Count (before clip): 1,253,267
Count (after clip): 1,052,105 (83.9% of total)
1.8 Top areas and projects
--- Top 15 areas ---
| area_name_en | Transactions |
|---|---|
| Marsa Dubai | 118929 |
| Business Bay | 88294 |
| Al Thanyah Fifth | 84158 |
| Al Barsha South Fourth | 71289 |
| Burj Khalifa | 62300 |
| Al Warsan First | 53377 |
| Jabal Ali First | 43916 |
| Palm Jumeirah | 39104 |
| Al Hebiah Fourth | 37347 |
| Wadi Al Safa 5 | 36569 |
| Al Merkadh | 31243 |
| Al Thanyah Third | 31079 |
| Hadaeq Sheikh Mohammed Bin Rashid | 30751 |
| Al Thanayah Fourth | 29651 |
| Nadd Hessa | 26842 |
--- Top 15 projects ---
| project_name_en | Transactions |
|---|---|
| REMRAAM | 10281 |
| SKY COURTS | 9663 |
| JUMEIRAH PARK | 6558 |
| INTERNATIONAL CITY EMARATI | 4775 |
| VICTORY HEIGHTS | 4107 |
| LAKESIDE | 3842 |
| CHURCHILL TOWER | 3706 |
| AL KHAIL HEIGHTS | 3430 |
| LAGO VISTA | 3245 |
| DAMAC TOWERS BY PARAMOUNT | 3232 |
| MARINA RESIDENCE | 3189 |
| SEVEN CITY JLT | 3186 |
| BURJ KHALIFA TOWERS | 3168 |
| TOWN SQUARE ZAHRA | 3118 |
| PARK ISLANDS | 3094 |
1.9 Median sale price per m² over time (2000–2025)
2. Rent Contracts Dataset
The Ejari rent contracts file records all registered tenancy agreements. We inspect the raw file before any filtering.
Shape: 7,111,733 rows x 40 columns
| | contract_id | contract_reg_type_id | contract_reg_type_ar | contract_reg_type_en | contract_start_date | contract_end_date | contract_amount | annual_amount | no_of_prop | line_number | is_free_hold | ejari_bus_property_type_id | ejari_bus_property_type_ar | ejari_bus_property_type_en | ejari_property_type_id | ... | master_project_ar | master_project_en | area_id | area_name_ar | area_name_en | actual_area | nearest_landmark_ar | nearest_landmark_en | nearest_metro_ar | nearest_metro_en | nearest_mall_ar | nearest_mall_en | tenant_type_id | tenant_type_ar | tenant_type_en |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CRT1012981266 | 1 | جديد | New | 07-04-2019 | 06-04-2020 | 85000 | 85000 | 1 | 1 | 1 | 2 | وحدة | Unit | 2.00 | ... | الخليج التجاري | Business Bay | 526 | الخليج التجارى | Business Bay | 140.00 | وسط مدينة دبي | Downtown Dubai | محطة مترو بوج خليفة دبي مول | Buj Khalifa Dubai Mall Metro Station | مول دبي | Dubai Mall | 1.00 | شخص | Person |
| 1 | CRT1012983196 | 1 | جديد | New | 20-04-2019 | 19-04-2020 | 110000 | 110000 | 1 | 1 | 1 | 4 | فيلا | Villa | 841.00 | ... | قرية جميرا المثلثة | Jumeirah Village Triangle | 442 | البرشاء جنوب الخامسة | Al Barsha South Fifth | 734.00 | أكاديمية المدينة الرياضية للسباحة | Sports City Swimming Academy | محطة مترو النخيل | Nakheel Metro Station | مارينا مول | Marina Mall | 1.00 | شخص | Person |
| 2 | CRT1012984226 | 1 | جديد | New | 11-04-2019 | 10-04-2020 | 100000 | 100000 | 1 | 1 | 1 | 4 | فيلا | Villa | 841.00 | ... | NaN | NaN | 506 | اليلايس 1 | Al Yelayiss 1 | 324.00 | دورة دبي للدراجات | Dubai Cycling Course | NaN | NaN | NaN | NaN | 1.00 | شخص | Person |
3 rows × 40 columns
<class 'pandas.DataFrame'>
RangeIndex: 7111733 entries, 0 to 7111732
Data columns (total 40 columns):
| # | Column | Dtype |
|---|---|---|
| 0 | contract_id | str |
| 1 | contract_reg_type_id | int64 |
| 2 | contract_reg_type_ar | str |
| 3 | contract_reg_type_en | str |
| 4 | contract_start_date | str |
| 5 | contract_end_date | str |
| 6 | contract_amount | int64 |
| 7 | annual_amount | int64 |
| 8 | no_of_prop | int64 |
| 9 | line_number | int64 |
| 10 | is_free_hold | int64 |
| 11 | ejari_bus_property_type_id | int64 |
| 12 | ejari_bus_property_type_ar | str |
| 13 | ejari_bus_property_type_en | str |
| 14 | ejari_property_type_id | float64 |
| 15 | ejari_property_type_en | str |
| 16 | ejari_property_type_ar | str |
| 17 | ejari_property_sub_type_id | float64 |
| 18 | ejari_property_sub_type_en | str |
| 19 | ejari_property_sub_type_ar | str |
| 20 | property_usage_en | str |
| 21 | property_usage_ar | str |
| 22 | project_number | float64 |
| 23 | project_name_ar | str |
| 24 | project_name_en | str |
| 25 | master_project_ar | str |
| 26 | master_project_en | str |
| 27 | area_id | int64 |
| 28 | area_name_ar | str |
| 29 | area_name_en | str |
| 30 | actual_area | float64 |
| 31 | nearest_landmark_ar | str |
| 32 | nearest_landmark_en | str |
| 33 | nearest_metro_ar | str |
| 34 | nearest_metro_en | str |
| 35 | nearest_mall_ar | str |
| 36 | nearest_mall_en | str |
| 37 | tenant_type_id | float64 |
| 38 | tenant_type_ar | str |
| 39 | tenant_type_en | str |
dtypes: float64(5), int64(8), str(27)
memory usage: 2.1 GB
2.1 Missing values
| Column | Missing | Pct % |
|---|---|---|
| project_name_en | 6040500 | 84.90 |
| project_name_ar | 6040500 | 84.90 |
| project_number | 6040500 | 84.90 |
| master_project_ar | 4602506 | 64.70 |
| master_project_en | 4602490 | 64.70 |
| nearest_mall_ar | 841530 | 11.80 |
| nearest_mall_en | 841530 | 11.80 |
| nearest_metro_ar | 776980 | 10.90 |
| nearest_metro_en | 776980 | 10.90 |
| tenant_type_en | 758070 | 10.70 |
| tenant_type_ar | 758070 | 10.70 |
| tenant_type_id | 758070 | 10.70 |
| nearest_landmark_ar | 504240 | 7.10 |
| nearest_landmark_en | 504240 | 7.10 |
| actual_area | 142885 | 2.00 |
| ejari_property_sub_type_en | 60697 | 0.90 |
| ejari_property_sub_type_ar | 60697 | 0.90 |
| ejari_property_sub_type_id | 55853 | 0.80 |
| ejari_property_type_en | 54628 | 0.80 |
| ejari_property_type_ar | 54628 | 0.80 |
| ejari_property_type_id | 53519 | 0.80 |
| property_usage_en | 10949 | 0.20 |
| property_usage_ar | 10949 | 0.20 |
2.2 Contract and property type breakdown
--- contract_reg_type_en ---
| contract_reg_type_en | Count |
|---|---|
| New | 3624049 |
| Renew | 3487684 |
--- ejari_bus_property_type_en ---
| ejari_bus_property_type_en | Count |
|---|---|
| Unit | 6518338 |
| Villa | 536520 |
| Land | 53519 |
| Building | 3356 |
--- ejari_property_type_en ---
| ejari_property_type_en | Count |
|---|---|
| Flat | 4157051 |
| Office | 786665 |
| Shop | 731861 |
| Labor Camps | 527968 |
| Villa | 480489 |
| Warehouse | 85368 |
| Studio | 70075 |
| Hotel | 51386 |
--- ejari_property_sub_type_en ---
| ejari_property_sub_type_en | Count |
|---|---|
| 1bed room+Hall | 1605137 |
| 2 bed rooms+hall | 1535866 |
| Studio | 818154 |
| Office | 771676 |
| Shop | 674592 |
| 3 bed rooms+hall | 556293 |
| Room in labor Camp | 519487 |
| 4 bed rooms+hall | 189635 |
--- property_usage_en ---
| property_usage_en | Count |
|---|---|
| Residential | 5270287 |
| Commercial | 1750968 |
| Industrial | 29019 |
| Industrial / Commercial | 25583 |
| Multi Usage | 9573 |
| Industrial / Commercial / Residential | 7076 |
| Storage | 2972 |
| Tourist origin | 1972 |
2.3 Annual contract volume (2010–2025)
Rent contract registration on Ejari became systematic from around 2010. We restrict the axis to 2010–2025 to avoid showing spurious future dates from data entry errors.
2.4 Annual rent distribution
Annual rents are first floored at AED 5,000 to remove entry errors, then clipped at the 95th percentile. This focuses the view on the realistic residential rental range.
Count (after floor): 7,093,787
Count (after clip): 6,739,097
Clip threshold: AED 1,299,816
Mean: AED 115,925
Median: AED 61,600
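The order of the two filters matters, because the percentile is computed on whatever survives the floor. A minimal sketch, assuming the floor is applied to `annual_amount` first and rows outside either bound are dropped (the helper name `floor_then_clip` is hypothetical):

```python
import pandas as pd


def floor_then_clip(amounts, floor=5_000, q=0.95):
    """Apply a minimum first, then drop values above the q-th quantile."""
    floored = amounts[amounts >= floor]
    # The threshold is taken from the floored data, not the raw column.
    threshold = floored.quantile(q)
    clipped = floored[floored <= threshold]
    return floored, clipped, threshold


# Placeholder amounts: two entry errors at the bottom, one extreme at the top.
demo = pd.Series([0, 100, 5_000, 60_000, 85_000, 100_000, 110_000, 10_000_000])
floored, clipped, threshold = floor_then_clip(demo)
print(f"Count (after floor): {len(floored):,}")
print(f"Count (after clip):  {len(clipped):,}")
```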
2.5 Unit area distribution (rent contracts)
Areas are clipped at 500 m², matching the sales dataset, to focus on the residential unit range.
Count (before clip): 6,844,081
Count (after clip): 6,505,600 (95.1% of the pre-clip count)
2.6 Median annual rent per m² over time (2010–2025)
Computed from valid records only (area > 0, date within range). The rent per m² metric normalises for unit size differences across years.
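The metric can be sketched as below. Two choices are assumptions, since the source does not name them: the year is taken from `contract_start_date`, and the rent figure is `annual_amount`. The function name `median_rent_per_sqm` is hypothetical.

```python
import pandas as pd


def median_rent_per_sqm(df, start_year=2010, end_year=2025):
    """Median of annual_amount / actual_area per contract start year."""
    start = pd.to_datetime(
        df["contract_start_date"], format="%d-%m-%Y", errors="coerce"
    )
    year = start.dt.year
    # Valid records: positive area and a start year inside the window.
    valid = (df["actual_area"] > 0) & year.between(start_year, end_year)
    sub = df.loc[valid]
    rent_per_sqm = sub["annual_amount"] / sub["actual_area"]
    return rent_per_sqm.groupby(year[valid].astype(int)).median()


demo = pd.DataFrame({
    "contract_start_date": [
        "07-04-2019", "20-04-2019", "11-04-2019", "01-01-2035", "05-05-2020",
    ],
    "annual_amount": [85000, 110000, 100000, 90000, 60000],
    "actual_area": [140.0, 734.0, 324.0, 100.0, 0.0],
})
result = median_rent_per_sqm(demo)
print(result)
```

In the demo, the 2035 contract fails the date check and the 2020 contract fails the area check, so only 2019 remains.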
2.7 Top areas by contract volume
--- Top 15 areas (rent contracts) ---
| area_name_en | Contracts |
|---|---|
| Al Warsan First | 302247 |
| Jabal Ali First | 259119 |
| Naif | 211797 |
| Al Karama | 208099 |
| Marsa Dubai | 196764 |
| Jabal Ali Industrial First | 194143 |
| Business Bay | 170394 |
| Al Nahda Second | 169891 |
| Al Mararr | 163685 |
| Nadd Hessa | 158552 |
| Al Suq Al Kabeer | 158142 |
| Al Barsha First | 157948 |
| Al Goze Industrial Second | 152263 |
| Al Murqabat | 146972 |
| Mirdif | 145465 |
3. Joint Overview - Sales vs Rent Trends
We use the filtered residential segment (2–3 bed flats, 70–160 m², 2014 onward) to compare sale prices and rents on the same time axis.
Filtered sales rows: 81,790
Filtered rent rows: 475,542
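For the rent file, the segment filter might look like the sketch below. The category labels come from the breakdown tables in section 2.2, but the choice of columns, the inclusive area bounds, and the use of `contract_start_date` for the year are assumptions; `filter_rent_segment` is a hypothetical name.

```python
import pandas as pd

# Labels as they appear in ejari_property_sub_type_en.
BEDROOM_LABELS = {"2 bed rooms+hall", "3 bed rooms+hall"}


def filter_rent_segment(df, min_area=70, max_area=160, start_year=2014):
    """Keep 2-3 bed flats within the area band from start_year onward."""
    start = pd.to_datetime(
        df["contract_start_date"], format="%d-%m-%Y", errors="coerce"
    )
    mask = (
        (df["ejari_property_type_en"] == "Flat")
        & df["ejari_property_sub_type_en"].isin(BEDROOM_LABELS)
        & df["actual_area"].between(min_area, max_area)  # inclusive bounds
        & (start.dt.year >= start_year)
    )
    return df.loc[mask]


demo = pd.DataFrame({
    "ejari_property_type_en": ["Flat", "Flat", "Villa", "Flat", "Flat", "Flat"],
    "ejari_property_sub_type_en": [
        "2 bed rooms+hall", "Studio", "3 bed rooms+hall",
        "3 bed rooms+hall", "3 bed rooms+hall", "3 bed rooms+hall",
    ],
    "actual_area": [120.0, 40.0, 150.0, 150.0, 200.0, 160.0],
    "contract_start_date": [
        "07-04-2019", "20-04-2019", "11-04-2019",
        "01-06-2012", "05-05-2020", "10-10-2020",
    ],
})
segment = filter_rent_segment(demo)
print(f"Filtered rent rows: {len(segment):,}")
```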
| date | Gross Yield % |
|---|---|
| 2014 | 6.67 |
| 2015 | 7.97 |
| 2016 | 7.85 |
| 2017 | 7.39 |
| 2018 | 7.62 |
| 2019 | 7.46 |
| 2020 | 6.91 |
| 2021 | 5.30 |
| 2022 | 5.00 |
| 2023 | 5.60 |
| 2024 | 6.37 |
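The source does not state how the yield figures were derived. One common definition, used here purely as an assumption, divides the median annual rent per m² by the median sale price per m² for the same year. The function name and the input numbers below are illustrative only and are not taken from the data.

```python
import pandas as pd


def gross_yield_pct(median_rent_per_sqm, median_price_per_sqm):
    """Gross yield in percent; both inputs are Series indexed by year."""
    ratio = median_rent_per_sqm / median_price_per_sqm * 100
    # Index alignment leaves NaN for years missing from either series.
    return ratio.dropna().round(2)


# Illustrative inputs: 2016 has rent data but no sales data.
rent = pd.Series({2014: 700.0, 2015: 900.0, 2016: 850.0})
price = pd.Series({2014: 10000.0, 2015: 12000.0})
yields = gross_yield_pct(rent, price)
print(yields)
```

Gross yield ignores service charges, vacancy, and transaction costs, so it overstates the return an owner actually receives.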
4. Data Quality Notes
| Issue | Detail |
|---|---|
| Sparse early years | Transactions before 2000 are very few. Consistent coverage begins around 2013-2014. |
| Future dates in rent file | Some rent contracts have malformed dates that parse to 2030 or later. Filter the date index to 2025 or earlier. |
| Outlier prices | meter_sale_price has extreme values from land and commercial transactions. Clip at 95th percentile for residential analysis. |
| Unit area outliers | procedure_area includes very large plots. Clip at 500 m² for residential scope. |
| Master project nulls | About 17.8% of transactions (223,049 rows) have no master_project_en. These rows are excluded from project-level analysis. |
| Thin projects | Many projects have fewer than 10 transactions. Median price from sparse data is unreliable. Filter to projects with sufficient volume. |
| Rent area nulls | actual_area is null in about 2.0% of rent contracts (142,885 rows), preventing rent per m² calculation for those rows. |
| Date format | Both files use DD-MM-YYYY string format. Parse with format='%d-%m-%Y' to avoid ambiguity. |
| Transaction types | The transactions file includes sales, mortgages, and gifts. Filter to trans_group_en == 'Sales' for sale-price analysis, and add reg_type_en == 'Existing Properties' to isolate resales from off-plan purchases. |
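The thin-projects filter from the table can be sketched as follows. The cut-off of 10 comes from the note above; the helper name `drop_thin_projects` and the demo rows are hypothetical.

```python
import pandas as pd


def drop_thin_projects(df, min_transactions=10, project_col="project_name_en"):
    """Keep rows whose project has at least min_transactions rows."""
    # value_counts ignores nulls, so rows without a project map to NaN
    # and fail the comparison below.
    per_row_count = df[project_col].map(df[project_col].value_counts())
    return df[per_row_count >= min_transactions]


# Placeholder frame: one well-covered project, one thin one, one null.
demo = pd.DataFrame({
    "project_name_en": ["REMRAAM"] * 12 + ["LAKESIDE"] * 3 + [None],
    "meter_sale_price": [float(v) for v in range(16)],
})
kept = drop_thin_projects(demo)
print(f"Rows kept: {len(kept)} of {len(demo)}")
```

Applying this after the sales-only filter, rather than before, keeps mortgage and gift rows from counting toward a project's volume.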
For yield estimation, ROI projections, and project-level analysis see the companion analysis notebook.