Lesson 3: Creating DataFrames

Course: Data Engineering | Duration: 45 minutes | Level: Intermediate

Learning Objectives

By the end of this lesson, you will be able to:

Create a DataFrame from a dict of lists (the most common pattern)
Create a DataFrame from a list of dicts (JSON-like records)
Identify and use key DataFrame attributes: .shape, .columns, .index, .dtypes, .size

Prerequisites

Lesson 2: Creating and Inspecting Series

Lesson Outline

Part 1: From a Dict of Lists (Most Common)

The most common way to create a DataFrame in practice is to pass a dictionary where each key is a column name and each value is a list of column values.

python

import pandas as pd
 
# Dict of lists — each key becomes a column
employees = pd.DataFrame({
    'name':             ['Alice', 'Bob', 'Carol', 'David'],
    'department':       ['Engineering', 'Marketing', 'Engineering', 'Marketing'],
    'salary':           [95000, 72000, 88000, 68000],
    'years_experience': [5, 3, 7, 2]
})
 
print(employees)
#     name   department  salary  years_experience
# 0  Alice  Engineering   95000                 5
# 1    Bob    Marketing   72000                 3
# 2  Carol  Engineering   88000                 7
# 3  David    Marketing   68000                 2

Rules:

All lists must have the same length — pandas raises a ValueError if they differ
Column order matches the dict key order (Python 3.7+ guarantees dict insertion order)
pandas infers the dtype for each column automatically
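To see the first rule in action, here is a small sketch that deliberately passes lists of different lengths. The data is illustrative, and the exact wording of the error depends on your pandas version, so the sketch only captures and prints it.

```python
import pandas as pd

# 'salary' has three values but 'name' has four, so pandas cannot
# line the lists up into rows and raises a ValueError
bad_data = {
    'name':   ['Alice', 'Bob', 'Carol', 'David'],
    'salary': [95000, 72000, 88000],
}

error_message = None
try:
    pd.DataFrame(bad_data)
except ValueError as err:
    error_message = str(err)

# The message text differs between pandas versions
print(error_message)
```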

Part 2: From a List of Dicts (JSON-Like Records)

When data comes from a JSON API or database query, it typically arrives as a list of records — each record is a dict. pandas handles this pattern directly.

python

import pandas as pd
 
# List of dicts — each dict is one row
records = [
    {'name': 'Alice', 'department': 'Engineering', 'salary': 95000},
    {'name': 'Bob',   'department': 'Marketing',   'salary': 72000},
    {'name': 'Carol', 'department': 'Engineering', 'salary': 88000},
]
 
df = pd.DataFrame(records)
print(df)
#     name   department  salary
# 0  Alice  Engineering   95000
# 1    Bob    Marketing   72000
# 2  Carol  Engineering   88000
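Real records are often incomplete. The sketch below (with illustrative data) drops the `salary` key from Bob's record to show what pandas does: the column list is the union of keys across all records, and the missing cell becomes NaN. Because NaN is a float, the whole `salary` column is upcast from integers to `float64`.

```python
import pandas as pd

# Bob's record has no 'salary' key
records = [
    {'name': 'Alice', 'department': 'Engineering', 'salary': 95000},
    {'name': 'Bob',   'department': 'Marketing'},
    {'name': 'Carol', 'department': 'Engineering', 'salary': 88000},
]

df = pd.DataFrame(records)
print(df)
#     name   department   salary
# 0  Alice  Engineering  95000.0
# 1    Bob    Marketing      NaN
# 2  Carol  Engineering  88000.0

# The missing value forces the integer column to float
print(df['salary'].dtype)  # float64
```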

Part 3: From CSV/File (Preview)

In practice, most DataFrames come from files, not hardcoded dicts. Here is a quick preview — Section 3 covers this in full detail:

python

import pandas as pd
 
# Reading from a CSV file (Section 3 covers all options)
# df = pd.read_csv('data/employees.csv')
 
# Reading from JSON
# df = pd.read_json('data/employees.json')
 
# Reading from Parquet
# df = pd.read_parquet('data/employees.parquet')

The same DataFrame attributes you learn now apply equally to DataFrames loaded from files.
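Because `read_csv` accepts file-like objects as well as file paths, you can try it without a file on disk. This is a minimal sketch that uses `io.StringIO` to hold a small CSV in memory as a stand-in for `data/employees.csv`; the CSV contents are illustrative.

```python
import io
import pandas as pd

# A small CSV held in memory, standing in for a real file
csv_text = """name,department,salary
Alice,Engineering,95000
Bob,Marketing,72000
Carol,Engineering,88000
"""

# read_csv accepts any file-like object, so io.StringIO works in place of a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (3, 3)
print(df.dtypes)  # column types are inferred from the text
```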

Part 4: DataFrame Attributes

Once you have a DataFrame, these attributes describe its size, its row and column labels, and its column types:

python

import pandas as pd
 
df = pd.DataFrame({
    'name':             ['Alice', 'Bob', 'Carol', 'David'],
    'department':       ['Engineering', 'Marketing', 'Engineering', 'Marketing'],
    'salary':           [95000, 72000, 88000, 68000],
    'years_experience': [5, 3, 7, 2]
})
 
print(df.shape)    # (4, 4) — (rows, columns)
print(df.columns)  # Index(['name', 'department', 'salary', 'years_experience'], dtype='object')
print(df.index)    # RangeIndex(start=0, stop=4, step=1)
print(df.size)     # 16 — total number of cells (4 rows × 4 columns)
print()
print(df.dtypes)
# name                object
# department          object
# salary               int64
# years_experience     int64
# dtype: object

| Attribute | What it returns |
|---|---|
| `.shape` | `(rows, cols)` tuple |
| `.columns` | Index of column names |
| `.index` | Row index (default: RangeIndex) |
| `.dtypes` | Series of column name → dtype |
| `.size` | Total number of elements (rows × cols) |
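The attributes relate to each other simply: `.size` is always the product of the two `.shape` entries. A quick check, reusing the employee data from above, shows the usual ways to get row and column counts. `len(df)` is included because it is a common shorthand for the row count.

```python
import pandas as pd

df = pd.DataFrame({
    'name':             ['Alice', 'Bob', 'Carol', 'David'],
    'department':       ['Engineering', 'Marketing', 'Engineering', 'Marketing'],
    'salary':           [95000, 72000, 88000, 68000],
    'years_experience': [5, 3, 7, 2]
})

n_rows, n_cols = df.shape           # unpack the (rows, columns) tuple
print(n_rows, n_cols)               # 4 4
print(len(df))                      # 4, since len() of a DataFrame counts rows
print(df.size == n_rows * n_cols)   # True
```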

Practice

Key Takeaways

Dict of lists: most common pattern — pd.DataFrame({'col': [val, ...]}) — all lists must be the same length
List of dicts: JSON/API pattern — pd.DataFrame([{'col': val}, ...]) — missing keys become NaN
Core DataFrame attributes: .shape (rows, cols), .columns (column names), .index (row labels), .dtypes (column types), .size (total cells)
Use .shape[0] for row count and .shape[1] for column count — .size is total cells (rows × cols)

Common Mistakes to Avoid

Unequal list lengths: all value lists in a dict-of-lists must have the same length
Using .size for rows: .size is rows × columns; use .shape[0] for rows
Expecting the DataFrame to track its source dict: pandas copies the data at creation time, so later changes to the original dict (or to the lists inside it) do not affect the DataFrame
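A short sketch of that copy behaviour, using illustrative data: after the DataFrame is built, the code changes a value in one of the source lists and adds a new key to the source dict. Neither change shows up in the DataFrame.

```python
import pandas as pd

salaries = [95000, 72000, 88000]
data = {'name': ['Alice', 'Bob', 'Carol'], 'salary': salaries}

df = pd.DataFrame(data)

# Mutate the source list and the source dict after construction
salaries[0] = 0
data['bonus'] = [1, 2, 3]

print(df['salary'].tolist())  # [95000, 72000, 88000], unchanged
print(list(df.columns))       # ['name', 'salary'], no 'bonus' column
```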
