Journal of Computational Design and Engineering, 2024
ETH Zurich
A framework that treats architectural space layout design as a reinforcement learning problem, using laser-wall partitioning and Proximal Policy Optimization (PPO) agents to explore constrained floor-plan configurations.
Problem
The paper starts from a practical difficulty: early floor-plan design has too many possible configurations to search manually or with rigid subdivision rules.
Space layout design has a combinatorial search space shaped by geometric constraints, room topology, and design objectives. This work frames the task as a Markov decision process so a learning agent can iteratively compose layouts inside a simulated design environment.
The paper introduces SpaceLayoutGym, an OpenAI Gym-compatible environment for layout planning. The environment supports customizable design scenarios and evaluates agent performance across area, proportion, and adjacency objectives.
The story is simple: define a more flexible way to draw walls, turn layout quality into an RL objective, then test whether trained agents can generate plausible design options.
Core Idea
Instead of choosing complete room polygons directly, the agent builds the plan through a sequence of wall-placement decisions.
In SpaceLayoutGym, the agent starts from an empty plan and places a sequence of wall components. Each action modifies the layout, while rewards guide the agent toward target room areas, proportions, and adjacency relationships.
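The episode structure described above can be illustrated with a deliberately tiny stand-in. The sketch below is not SpaceLayoutGym and does not reproduce its API or its laser-wall geometry: `ToyWallEnv`, its one-dimensional plan, and its end-of-episode area reward are all hypothetical simplifications, chosen only to show an agent starting from an empty plan, adding one wall per action, and receiving a reward tied to target room sizes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWallEnv:
    """Hypothetical 1-D stand-in for a wall-placement environment.

    Assumption: the plan is a strip of length `width`, and each wall is a
    single cut position. This is an illustration, not SpaceLayoutGym.
    """
    width: float
    target_sizes: list[float]
    walls: list[float] = field(default_factory=list)

    def reset(self):
        # Start every episode from an empty plan (no walls placed).
        self.walls = []
        return tuple(self.walls)

    def room_sizes(self):
        # Rooms are the intervals between consecutive cuts.
        cuts = [0.0] + sorted(self.walls) + [self.width]
        return [b - a for a, b in zip(cuts, cuts[1:])]

    def step(self, position):
        # One action = one wall. The episode ends once there are enough
        # walls to create one room per target size.
        self.walls.append(position)
        done = len(self.walls) >= len(self.target_sizes) - 1
        reward = 0.0
        if done:
            # Assumed reward: negative total size misfit, matching rooms
            # to targets by sorted order.
            achieved = sorted(self.room_sizes())
            desired = sorted(self.target_sizes)
            reward = -sum(abs(a - d) for a, d in zip(achieved, desired))
        return tuple(sorted(self.walls)), reward, done


env = ToyWallEnv(width=10.0, target_sizes=[2.0, 3.0, 5.0])
env.reset()
for wall_position in (2.0, 5.0):
    observation, reward, done = env.step(wall_position)
```

In this toy setting, a learning agent would choose `wall_position` itself instead of following a fixed list; the paper's environment additionally scores proportions and adjacencies, which this sketch leaves out.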
Objective
Once wall placement becomes an action, the design problem still needs a learning signal. Equation 1 defines the target fit, and Table 1 shows how misfit is converted into reward.
Equation 1
The formulation minimizes the difference between achieved and desired adjacencies while enforcing area, proportion, and entrance-to-corridor constraints during layout generation.
Table 1
| Reward type | Formula |
|---|---|
| Linear | $R = y_{end} - \frac{y_{end} - y_{start}}{x_{end} - x_{start}} (x - x_{start})$ |
| Quadratic | $R = y_{end} - \frac{y_{end} - y_{start}}{(x_{end} - x_{start})^2} (x - x_{start})^2$ |
| Logarithmic | $R = y_{end} - \frac{y_{end} - y_{start}}{\ln(x_{end} - x_{start} + 1)} \ln(x - x_{start} + 1)$ |
Here, x is the layout misfit and y is the reward. The experiments use these mappings to convert lower misfit into higher reward for the learning agent.
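As a minimal sketch, the three mappings in Table 1 can be written directly as functions. The function and parameter names below are illustrative choices, not taken from the paper's code. All three formulas return `y_end` at `x = x_start` and `y_start` at `x = x_end`; they differ only in how the reward falls off between those two points.

```python
import math


def linear_reward(x, x_start, x_end, y_start, y_end):
    """Linear mapping from misfit x to reward, as in Table 1."""
    slope = (y_end - y_start) / (x_end - x_start)
    return y_end - slope * (x - x_start)


def quadratic_reward(x, x_start, x_end, y_start, y_end):
    """Quadratic mapping: small misfits are penalized gently."""
    scale = (y_end - y_start) / (x_end - x_start) ** 2
    return y_end - scale * (x - x_start) ** 2


def logarithmic_reward(x, x_start, x_end, y_start, y_end):
    """Logarithmic mapping: small misfits are penalized sharply."""
    scale = (y_end - y_start) / math.log(x_end - x_start + 1)
    return y_end - scale * math.log(x - x_start + 1)


# Illustrative values only (not from the paper): misfit in [0, 10],
# reward between -1 and 1, evaluated at the midpoint of the misfit range.
bounds = dict(x_start=0.0, x_end=10.0, y_start=-1.0, y_end=1.0)
for mapping in (linear_reward, quadratic_reward, logarithmic_reward):
    print(mapping.__name__, mapping(5.0, **bounds))
```

When `y_end` is larger than `y_start`, the quadratic curve lies on or above the linear one across the whole misfit range, and the logarithmic curve lies on or below it, so the choice of mapping controls how strongly an agent is pushed to remove the last small amounts of misfit.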
Environment
The environment connects the design scenario, the agent's actions, layout evaluation, and policy learning into one repeatable training loop.
Results
The output is not a final architectural plan. It is an early-stage design proposal that satisfies measurable constraints and can broaden the set of options for a designer.
The trained agents generate floor-plan layouts that respond to the selected constraints. The study compares PPO behavior with genetic algorithms and with selected human-designed layouts from RPLAN to examine both optimization performance and design diversity.
Citation
@article{kakooee2024reimagining,
title = {Reimagining space layout design through deep reinforcement learning},
author = {Kakooee, Reza and Dillenburger, Benjamin},
journal = {Journal of Computational Design and Engineering},
volume = {11},
number = {3},
pages = {43--55},
year = {2024},
publisher = {Oxford University Press},
doi = {10.1093/jcde/qwae025}
}