SQLAlchemy Basics

SQLAlchemy Core Components
About ORM (Object-Relational Mapping)
ORM is a programming technique used to establish a mapping between object-oriented programming languages and relational databases. Its full name is Object-Relational Mapping.
By using this technique, you can operate on a database directly through a programming language without having to write SQL statements manually. It also makes it easier to migrate code across different database systems.
Basic ORM concepts:
- Entity: Corresponds to a table in the database.
- Property: Corresponds to a column in a table.
- Relationship: Represents associations between entities, such as one-to-many or many-to-many.
Python SQLAlchemy Example
Suppose we have a User class and a corresponding users table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | |
In this example, we define a User class that maps to the users table in the database. We can use an object-oriented approach to create, query, and manipulate user data without writing SQL statements directly.
Comparison between ORM and traditional database operations:
- Traditional approach:
sql INSERT INTO users (name, email) VALUES ('Alice', '[email protected]'); SELECT * FROM users WHERE name = 'Alice'; - ORM approach:
python new_user = User(name="Alice", email="[email protected]") session.add(new_user) users = session.query(User).filter(User.name == "Alice").all()
The ORM approach is closer to object-oriented programming thinking, and the code is more intuitive and easier to maintain.
Engine
A SQLAlchemy engine uses the create_engine function to connect to a database.
For example:
1 2 | |
The second line creates a SQLAlchemy engine object, which is responsible for managing the database connection. Let’s break down what this line means:
1 | |
- The
create_enginefunction:
create_engineis a function provided by SQLAlchemy for creating anengineobject. This object is the core of interacting with the database.
- Database URL:
-
'sqlite+pysqlite:///:memory:'is a database URL that specifies the database type and connection method.-
sqlite: specifies that SQLite is being used. -
pysqlite: specifies the DBAPI driver. In modern Python,pysqliteusually corresponds to the standard-librarysqlite3. -
/:memory:: specifies that the database runs in memory.
This means the database is temporary, starts with no tables or data, and is not persisted to disk. You need to create tables and insert data manually after connecting in order to do real database operations. Every time the program restarts, the database resets. This approach is especially suitable for testing and development.
-
- The
echo=Trueparameter:
echo=Trueis an optional parameter that tells SQLAlchemy to print all generated SQL statements to standard output. This is very useful for debugging and for learning how SQLAlchemy generates SQL.
Connection
Create a Connection and Execute SQL Statements
SQLAlchemy provides a simple yet powerful way to interact with a database. Let’s look at the basic connection and SQL execution process through an example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | |
Running this code produces the following output:
1 2 3 4 5 6 7 8 | |
Code Explanation: text() --> execute()
text() is used to create an executable SQL text object, which can then be passed to a connection object’s execute() method.
note:
In early versions of SQLAlchemy, the
execute()method could accept raw SQL strings directly. However, starting from SQLAlchemy 1.4, passing SQL strings directly was deprecated for better security and consistency.
Using text() allows us to use named parameters like :parameter, which helps prevent SQL injection attacks and improves code readability.
For example:
1 2 3 4 5 6 7 | |
And in our example:
1 2 3 4 5 6 7 8 | |
BEGIN (implicit) Transaction Management
When SQLAlchemy logs BEGIN (implicit), it means SQLAlchemy has recognized the start of a transaction, but this is based on the behavior of the database driver rather than SQLAlchemy or the application explicitly sending a BEGIN command.
Many database drivers automatically start a transaction when you execute the first operation that requires one. This means you do not need to call BEGIN explicitly to start a transaction.
Query Return Values from execute
Let’s first look at a basic query execution example:
1 2 3 4 5 6 7 8 9 | |
result is a list. In this example, result contains two tuple values: (1, 1) and (2, 4).
Parameterized Query
1 2 3 4 5 6 7 8 9 10 | |
Executing a query with parameters:
text("SELECT x, y FROM some_table WHERE y > :y"): creates a SQL query text object using the named parameter:y{"y": 2}: a parameter dictionary that binds the named parameter:yto the value2
Explanation of the runtime log output:
BEGIN (implicit): starts an implicit transactionSELECT x, y FROM some_table WHERE y > ?: the actual SQL being executed, where?is the parameter placeholder-
[generated in 0.00051s] (2,): shows the time taken to generate the query and the bound parameter value -
x: 2 y: 4: prints the row that matches the conditiony > 2, wherexis2andyis4 -
ROLLBACK: ends the transaction and rolls it back. This usually happens automatically when thewithblock ends
ORM
Session is a core concept in SQLAlchemy ORM (Object-Relational Mapping). It provides a higher-level abstraction for managing database interactions.
example code
1 2 3 4 5 6 7 8 9 | |
result
1 2 3 4 5 6 | |
Metadata
Imagine you are designing a house:
- You need a blueprint that marks the location, size, and purpose of each room
- You also need construction instructions telling the workers how to build the house
In SQLAlchemy, Metadata is like that blueprint:
- It records the “design drawings” of all tables (table names, column names, data types, and so on)
- It also includes “construction instructions” (
create_all,drop_all, and similar methods) that tell the database how to create tables according to the design
Core define
Use Table objects to define table structures directly, for example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | |
~~No one normally uses this~~
ORM Declarative Forms define
First, look at the complete code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | |
About the empty Base class
By defining an empty Base class, you can add custom methods or attributes in the future that will be inherited by all model classes, without modifying each model individually.
For example, adding a created_at column to all tables:
```python from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column from sqlalchemy import DateTime from datetime import datetime
class Base(DeclarativeBase): created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow) ```
In large projects, the Base class can be defined in a separate file for better code organization.
Changes between old and new versions:
It is worth noting that SQLAlchemy has changed how declarative base classes are handled across versions. In older versions, the
declarative_base()function was used to create the base class. In SQLAlchemy 2.0, theDeclarativeBaseclass was introduced.
Mapped and mapped_column
Mapped and mapped_column are new concepts in SQLAlchemy 2.0: type annotations.
Mapped[]is used for type hints to specify the type of an ORM-mapped attributemapped_column()defines the specific properties of a column and creates aColumnobject
Reading them together with the User and Address classes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | |
In older versions of SQLAlchemy, you might see code like this:
```python class User(Base): tablename = "user_account"
1 2 3 4id = Column(Integer, primary_key=True) name = Column(String(30)) fullname = Column(String) addresses = relationship("Address", back_populates="user")```
The main difference in newer versions is the use of
something: Mapped[int]andmapped_cloumn, which makes the code clearer and more type-safe. However, SQLAlchemy 2.0 still supports the old definition style.
For more details, please read Table Configuration with Declarative — SQLAlchemy 2.0 Documentation
Common built-in Python types (especially in a database context):
1 2 3 4 5 6 7 8 | |
Common SQLAlchemy database column types:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |
If the Python type inside Mapped[] is incompatible with the database column type inside mapped_column(), SQLAlchemy will raise a runtime error. For example:
1 2 3 4 5 6 7 | |
SQLAlchemy’s type system checks whether these types are compatible. Here are some common mappings:
1 2 3 4 5 6 7 8 9 10 | |
In SQLAlchemy 2.0+, if you write only Mapped[str] without using mapped_column(), SQLAlchemy will use some default configuration:
For a simple Mapped[str]:
1 2 3 | |
In this case, SQLAlchemy will use the database’s default VARCHAR length:
- MySQL/MariaDB: the default is VARCHAR(255)
- PostgreSQL: there is no specific limit, but it is still recommended to specify a length explicitly
- SQLite: there is no enforced length limit
Best practice:
1 2 3 4 5 6 7 8 | |
Metadata Features
The table information for all model classes that inherit from Base is automatically registered in Base.metadata.
You can access all registered tables through metadata.tables.
1 2 3 4 5 6 7 8 | |
Run ORM Declarative Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | |
Before creating the tables, SQLAlchemy first executes PRAGMA statements to check whether these tables already exist in the main database (main) and the temporary database (temp), ensuring that existing tables are not created again.
1 2 3 4 | |
CUDR Operation
Similarly, there are Core-style CUDR operations and ORM-style CUDR operations. We will introduce both, with a focus on ORM.
Core CUDR
Insert()
1 2 3 4 5 6 7 8 9 10 | |
Select()
1 2 3 4 5 6 7 | |
Update() and Delete()
1 2 3 4 5 6 7 8 9 10 11 | |
More details are in Basic.py file
ORM CUDR
Session.new() & session.add()
session.new is a property of the SQLAlchemy Session object. It represents the set of all new objects that have been added to the session but not yet committed to the database.
Let’s understand it through code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | |
Other related session state collections:
session.dirty: contains modified but uncommitted objectssession.deleted: contains objects marked for deletion but not yet committedsession.identity_map: contains all objects tracked by the session
Together, these collections form SQLAlchemy’s object state management system, helping support efficient database operations and transaction management.
Session.get()
session.get(entity, primary_key) takes two main arguments:
1. The first argument is the entity class (in this example, the User class)
2. The second argument is the primary key value (in this example, 4)
If the record already exists in the session’s identity map, it returns the existing object directly.
If it does not exist, it sends a SQL query to the database to fetch the data.
Some usage examples:
1 2 3 4 5 6 7 8 9 | |
Session.flush() & Session.commit()
flush only synchronizes the current changes in the session to the database, but those changes are still inside the transaction and have not yet truly been committed.
commit actually commits the transaction, permanently saving the changes to the database. In practice, commit() first performs flush, then commits the current transaction, and then starts a new transaction.
Usually, you do not need to call flush() manually; let SQLAlchemy handle it automatically.
Remember: flush() is like writing changes into a draft, while commit() is the real save.
Use case:
If you need to use the primary key
idbeforecommit(), you can callflush()explicitly.```python from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class User(Base): tablename = 'users' id = Column(Integer, primary_key=True) name = Column(String)
创建用户
user = User(name="Alice") session.add(user) print(user.id) # 输出: None
flush 之后
session.flush() print(user.id) # 输出: 1 (已经有 ID 了)
这时其他事务还看不到这条数据
commit 之后其他事务才能看到
session.commit() ```
Session.delete()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | |
A concrete example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |
Key points:
- session.delete() does not apply only to uncommitted data; it actually deletes records from the database
- But the deletion only becomes truly permanent after commit()
- Before commit(), you can undo the deletion with session.rollback()
So the behavior of session.delete() is consistent with add() / update():
- First mark the change in the session
- Then synchronize it to the database with flush
- Finally save it permanently with commit
Rollback
session.rollback() undoes all changes that have not been committed. Let me explain in detail:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | |
A complete example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | |
So you can understand SQLAlchemy transaction operations like this:
- commit() saves all changes
- rollback() undoes all unsaved changes
- flush() previews the effect of changes, but they can still be undone with rollback()
Session.close()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | |
Best practices:
- Use a context manager (recommended)
1 2 3 4 5 6 7 8 | |
- Full pattern for manual management
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | |
Querying with query & select
Type of query object:
1 2 3 4 | |
Methods for retrieving query results:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | |
Modern querying style (2.0 style) ==recommended==
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | |
Complex query examples:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | |
- Suggestions for choosing query methods
- When you expect exactly one result:
- If there must be a result: use
one()- If there may be no result: useone_or_none()- If you only need the first result: usefirst()
- When you need multiple results:
- Use
all()to get a list -
Use
scalars().all()(2.0 style) -
When querying by primary key:
-
Use
session.get(User, id) -
When pagination is needed:
1 2 3 4 5 6 7 8 9 | |
Remember, newer versions of SQLAlchemy recommend using the 2.0-style query syntax (using select()), which provides better type hints and consistency. However, the older query() syntax is still available and is used in many existing codebases.
Session.execute
In SQLAlchemy 2.x, there are indeed two main ways to execute queries: directly using the scalar() family of methods, and using the execute() method. Each has its own use cases, so let’s analyze them:
-
Direct use of the
scalar()family: -scalar()-scalar_one()-scalar_one_or_none() -
Use
execute(), then call other methods as needed such asscalars().all()
Which approach is recommended mainly depends on your specific needs:
- When you expect the query to return a single result, using the
scalar()family directly is cleaner and more straightforward:
- If you are sure the query will return exactly one result or
None, usescalar_one_or_none() - If you are sure the query must return exactly one result, use
scalar_one() - If you want the first result, if any, use
scalar()
These methods return ORM objects directly, with no further processing required.
- When you need more flexible result handling, or expect multiple results,
execute()is more appropriate:
execute()returns aResultobject, which provides more flexibility and control- You can call
scalars(),all(),first(), and other methods as needed - This is suitable for handling multi-row results, or when you need additional processing
Overall, the recommended practice in SQLAlchemy 2.x is:
- For single-result queries, prefer the
scalar()family, because they are more concise and return ORM objects directly. - For queries that may return multiple results, or when more result-processing options are needed, use
execute().
For example:
1 2 3 4 5 | |
This approach keeps the code concise while still providing enough flexibility to handle different query scenarios.
Further Notes on select
Once we import and use select, there is no need to import other keywords such as where, order_by, and so on separately, because select already comes with them:
SELECT and Related Constructs — SQLAlchemy 2.0 Documentation
There you can see the methods included with select:
1 | |
ORM CUDR SELECT
source: Using SELECT Statements — SQLAlchemy 2.0 Documentation
.where usage
In SQLAlchemy 2.0, you can use multiple .where() clauses, and they will be connected with AND.
1 2 3 4 5 | |
However, there are several equivalent alternatives:
Separate multiple conditions with commas inside where (they will automatically be joined with AND):
1 2 3 4 5 6 | |
Use the and_ function:
1 2 3 4 5 6 7 8 9 10 | |
Note
For usage of and_() and or_(), refer to the example below:
python
from sqlalchemy import and_, or_
print(
select(Address.email_address).where(
and_(
or_(User.name == "squidward", User.name == "sandy"),
Address.user_id == User.id,
)
)
)
equal to
sql
SELECT address.email_address
FROM address, user_account
WHERE (user_account.name = :name_1 OR user_account.name = :name_2)
AND address.user_id = user_account.id
These methods are equivalent and generate the same SQL query.
ORM relationship
Features related to ORM relationships
Relationship loading strategies
We can understand four different relationship loading strategies through a concrete example. Suppose we have the following data:
- 10 users
- Each user has 3 addresses
Let’s compare how different loading strategies behave:
Select in load()
1 2 3 4 5 6 7 | |
Lazy Load (default) - N+1 problem
1 2 3 4 5 6 7 8 9 10 11 | |
Joined Load
1 2 3 4 5 6 7 8 9 10 11 12 13 | |
Network overhead analysis:
1 2 3 4 5 6 7 8 9 10 11 12 | |
Best practice recommendations:
- Small datasets, one-to-one relationships: joinedload can be used
- Large datasets, one-to-many relationships: use selectinload
- If you are not sure whether related data will be needed: use the default lazy loading
Session execution methods
1 2 3 4 5 | |
These two approaches are essentially the same. session.scalar() is actually shorthand for session.execute().scalar().
Usage recommendations:
- When querying for a single result, use
session.scalar():
1 2 | |
- When querying for multiple results, use
session.execute().scalars().all():
1 2 | |
- When you need to do more processing on the result, use
execute():
1 2 3 4 5 | |
Short version:
session.scalar() is for querying a single result
session.execute() returns a Result object, offering more control and flexibility, and is suitable for multi-row results
stmt functions
Common functions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | |
Common modifiers:
- .where(): add conditions
- .join(): inner join
- .outerjoin(): left join
- .distinct(): remove duplicates
- .order_by(): sort
- .limit(): limit the number of rows
- .offset(): skip records
- .options(): configure loading options
Common filter conditions:
- ==: equals
- !=: not equals
- .in_(): in a list
- .like(): fuzzy match
- .ilike(): case-insensitive fuzzy match
- >, <, >=, <=: comparisons
- and_(): and
- or_(): or
- .is_(None): is null
- .is_not(None): is not null