Journal of Computational Design and Engineering, 2024

Reimagining Space Layout Design through Deep Reinforcement Learning

Reza Kakooee and Benjamin Dillenburger

ETH Zurich

A framework that treats architectural space layout design as a reinforcement learning problem, using laser-wall partitioning and PPO agents to explore constrained floor-plan configurations.

Problem

Why layout design needs a different search strategy

The paper starts from a practical difficulty: early floor-plan design has too many possible configurations to search manually or with rigid subdivision rules.

Space layout design has a combinatorial search space shaped by geometric constraints, room topology, and design objectives. This work frames the task as a Markov decision process so a learning agent can iteratively compose layouts inside a simulated design environment.

The paper introduces SpaceLayoutGym, an OpenAI Gym-compatible environment for layout planning. The environment supports customizable design scenarios and evaluates agent performance against area, proportion, and adjacency objectives.

The story is simple: define a more flexible way to draw walls, turn layout quality into an RL objective, then test whether trained agents can generate plausible design options.

Core Idea

Let the agent compose space by placing laser-walls

Instead of choosing complete room polygons directly, the agent builds the plan through a sequence of wall-placement decisions.

Fig. 3. Laser-wall partitioning. The agent places wall primitives one step at a time. Each placed wall extends until it meets the plan boundary or another wall, allowing the environment to create layouts that are less restricted than standard rectangular dissection.
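
The wall-extension rule can be sketched on a discretized plan. This is a minimal illustration, not SpaceLayoutGym's implementation: it assumes the plan is a NumPy cell grid (0 = free, 1 = wall) and that a wall primitive is an origin cell plus a list of ray directions. `place_laser_wall` and `DIRECTIONS` are hypothetical names.

```python
from __future__ import annotations

import numpy as np

FREE, WALL = 0, 1
# Hypothetical ray encoding: (row step, column step) for each direction.
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def place_laser_wall(grid: np.ndarray, origin: tuple[int, int], rays: list[str]) -> int:
    """Place one (assumed) laser-wall primitive on `grid`, in place.

    The origin cell becomes wall, then each ray grows cell by cell until it
    leaves the plan or reaches a cell that is already wall. Returns the number
    of newly walled cells (0 if the origin already lies on a wall).
    """
    r0, c0 = origin
    if grid[r0, c0] == WALL:
        return 0  # assumption: placing a wall on an existing wall is a no-op
    grid[r0, c0] = WALL
    placed = 1
    rows, cols = grid.shape
    for name in rays:
        dr, dc = DIRECTIONS[name]
        r, c = r0 + dr, c0 + dc
        # Extend until the plan boundary or another wall is reached.
        while 0 <= r < rows and 0 <= c < cols and grid[r, c] != WALL:
            grid[r, c] = WALL
            placed += 1
            r, c = r + dr, c + dc
    return placed


plan = np.zeros((6, 8), dtype=int)
print(place_laser_wall(plan, (3, 2), ["up", "down"]))     # full-height wall: 6
print(place_laser_wall(plan, (1, 5), ["left", "right"]))  # stops at the first wall: 5
```

In this sketch the agent only chooses where a wall starts and which way its rays point; the existing geometry decides how long the wall becomes.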

In SpaceLayoutGym, the agent starts from an empty plan and places a sequence of wall components. Each action modifies the layout, while rewards guide the agent toward target room areas, proportions, and adjacency relationships.

  • Customizable floor-plan outlines and room counts.
  • Image or feature-vector state representations.
  • Reward functions for constraint satisfaction and objective attainment.
  • Compatibility with PPO and other RL algorithms.
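
The two observation types can be illustrated with a small encoder. This sketch assumes the layout is stored as an integer room-label grid (0 for walls or unassigned cells, k for room k); the one-hot image and the per-room [area, aspect ratio] vector are illustrative choices, not the exact state format used in SpaceLayoutGym. Both function names are hypothetical.

```python
import numpy as np


def image_observation(labels: np.ndarray, n_rooms: int) -> np.ndarray:
    """One-hot encode a room-label grid into an (n_rooms + 1, H, W) image.

    Assumed labelling: 0 marks walls or unassigned cells, k in 1..n_rooms marks room k.
    """
    channels = np.arange(n_rooms + 1)[:, None, None]
    # Broadcasting (1, H, W) against (n_rooms + 1, 1, 1) gives one mask per channel.
    return (labels[None, :, :] == channels).astype(np.float32)


def feature_observation(labels: np.ndarray, n_rooms: int) -> np.ndarray:
    """Flatten per-room [area in cells, bounding-box aspect ratio] into one vector."""
    features = []
    for k in range(1, n_rooms + 1):
        rows, cols = np.nonzero(labels == k)
        if rows.size == 0:
            features += [0.0, 0.0]  # room k does not exist yet
            continue
        height = rows.max() - rows.min() + 1
        width = cols.max() - cols.min() + 1
        features += [float(rows.size), max(height, width) / min(height, width)]
    return np.asarray(features, dtype=np.float32)


labels = np.array([[1, 1, 0, 2],
                   [1, 1, 0, 2],
                   [0, 0, 0, 2]])
print(image_observation(labels, 2).shape)  # (3, 3, 4)
print(feature_observation(labels, 2))      # [4. 1. 3. 3.]
```

The image keeps the full geometry for a convolutional policy, while the vector is compact but discards room shape beyond each bounding box.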

Objective

Translate design requirements into reward

Once wall placement becomes an action, the design problem still needs a learning signal. Equation 1 defines the target fit, and Table 1 shows how misfit is converted into reward.

Equation 1

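$$
\begin{aligned}
\min\quad & \lvert \mathrm{Adj} - \mathrm{Adj}^{*} \rvert \\
\text{s.t.}\quad & A \ge A_{\min} \\
& \lvert A - A^{*} \rvert \le A_{\mathrm{th}} \\
& 1 \le P \le P^{*} \\
& (C, E) \in \widehat{\mathrm{Adj}}
\end{aligned}
$$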

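The formulation minimizes the difference between achieved and desired adjacencies (Adj and Adj*) while enforcing constraints during layout generation: each room's area A must be at least A_min and within A_th of its target A*, its proportion P must lie between 1 and the limit P*, and the corridor C must be adjacent to the entrance E.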
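
To show how Equation 1 separates the objective from the constraints, here is a sketch that scores one finished layout. The symbol readings are assumptions: |Adj - Adj*| is taken as the number of desired adjacencies the layout misses, P as a room's long-to-short side ratio, and A_th as a per-room area tolerance. `RoomSpec`, `pair`, and `evaluate_layout` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RoomSpec:
    """Hypothetical per-room targets standing in for A_min, A*, A_th and P*."""
    a_min: float
    a_target: float
    a_th: float
    p_max: float


def pair(a: str, b: str) -> frozenset[str]:
    return frozenset((a, b))  # adjacency is treated as unordered


def evaluate_layout(areas, proportions, specs, achieved, desired):
    """Return (adjacency misfit, constraints satisfied) for one finished layout.

    Assumption: |Adj - Adj*| is read as the number of desired adjacencies that
    the layout fails to realise; P is assumed to be long side / short side.
    """
    misfit = len(desired - achieved)
    checks = [pair("corridor", "entrance") in achieved]  # the (C, E) constraint
    for name, spec in specs.items():
        area, prop = areas[name], proportions[name]
        checks += [
            area >= spec.a_min,                       # A >= A_min
            abs(area - spec.a_target) <= spec.a_th,   # |A - A*| <= A_th
            1 <= prop <= spec.p_max,                  # 1 <= P <= P*
        ]
    return misfit, all(checks)


specs = {"bedroom": RoomSpec(9, 12, 2, 2.0), "kitchen": RoomSpec(6, 8, 2, 2.5)}
achieved = {pair("corridor", "entrance"), pair("kitchen", "living")}
desired = {pair("kitchen", "living"), pair("bedroom", "living")}
print(evaluate_layout({"bedroom": 11.0, "kitchen": 8.5},
                      {"bedroom": 1.4, "kitchen": 2.0}, specs, achieved, desired))
# (1, True): all constraints hold, but the bedroom-living adjacency is missing
```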

Table 1

Different reward functions

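Reward type    Formula
Linear         $R = y_{\mathrm{end}} - \dfrac{y_{\mathrm{end}} - y_{\mathrm{start}}}{x_{\mathrm{end}} - x_{\mathrm{start}}}\,(x - x_{\mathrm{start}})$
Quadratic      $R = y_{\mathrm{end}} - \dfrac{y_{\mathrm{end}} - y_{\mathrm{start}}}{(x_{\mathrm{end}} - x_{\mathrm{start}})^{2}}\,(x - x_{\mathrm{start}})^{2}$
Logarithmic    $R = y_{\mathrm{end}} - \dfrac{y_{\mathrm{end}} - y_{\mathrm{start}}}{\ln(x_{\mathrm{end}} - x_{\mathrm{start}} + 1)}\,\ln(x - x_{\mathrm{start}} + 1)$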

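Here, x is the layout misfit and R is the reward; $x_{\mathrm{start}}$ and $x_{\mathrm{end}}$ bound the misfit range, and $y_{\mathrm{start}}$ and $y_{\mathrm{end}}$ bound the reward range. Every mapping gives $R = y_{\mathrm{end}}$ at $x = x_{\mathrm{start}}$ and $R = y_{\mathrm{start}}$ at $x = x_{\mathrm{end}}$; the three differ only in the shape of the curve between those points. The experiments use these mappings so that lower misfit yields higher reward for the learning agent.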
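
The three mappings in Table 1 translate directly into functions. Clamping x to [x_start, x_end] is an assumption added here so the logarithm stays defined, and the example range (misfit 0 to 10 mapped to reward 1 down to -1) is hypothetical.

```python
import math


def _clip(x, x_start, x_end):
    # Assumption: misfit outside [x_start, x_end] is clamped to the range ends.
    return min(max(x, x_start), x_end)


def linear_reward(x, x_start, x_end, y_start, y_end):
    x = _clip(x, x_start, x_end)
    return y_end - (y_end - y_start) / (x_end - x_start) * (x - x_start)


def quadratic_reward(x, x_start, x_end, y_start, y_end):
    x = _clip(x, x_start, x_end)
    return y_end - (y_end - y_start) / (x_end - x_start) ** 2 * (x - x_start) ** 2


def logarithmic_reward(x, x_start, x_end, y_start, y_end):
    x = _clip(x, x_start, x_end)
    return y_end - (y_end - y_start) / math.log(x_end - x_start + 1) * math.log(x - x_start + 1)


# Hypothetical range: misfit 0..10 mapped to reward 1..-1, evaluated at misfit 5.
for f in (linear_reward, quadratic_reward, logarithmic_reward):
    print(f.__name__, round(f(5, 0, 10, -1, 1), 3))
# linear_reward 0.0
# quadratic_reward 0.5
# logarithmic_reward -0.494
```

With the same endpoints, the quadratic curve stays close to full reward for small misfit, the logarithmic curve penalizes even small misfit sharply, and the linear curve sits between them.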

Environment

SpaceLayoutGym closes the loop

The environment connects the design scenario, the agent's actions, layout evaluation, and policy learning into one repeatable training loop.

Fig. 7. SpaceLayoutGym framework. The framework translates layout requirements into an RL environment, lets the agent interact with the plan through wall-placement actions, and evaluates the resulting layout against geometric and topological design objectives.
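
The loop in Fig. 7 follows the usual reset/step cycle of an RL environment. The sketch below assumes a Gymnasium-style interface and uses a toy environment and a random policy as stand-ins: `ToyLayoutEnv`, its placeholder reward, and `rollout` are hypothetical, and a PPO implementation would replace the random policy and update it from the collected transitions.

```python
import random


class ToyLayoutEnv:
    """Hypothetical stand-in for a SpaceLayoutGym-like environment.

    Follows the Gymnasium-style convention: reset() -> (obs, info) and
    step() -> (obs, reward, terminated, truncated, info). Each action picks one
    of `n_actions` wall positions; the placeholder reward (not the paper's) is
    given once, at the end, as the share of distinct positions used.
    """

    def __init__(self, n_actions=16, max_walls=3):
        self.n_actions = n_actions
        self.max_walls = max_walls
        self.walls = []

    def reset(self):
        self.walls = []
        return tuple(self.walls), {}

    def step(self, action):
        self.walls.append(action)
        terminated = len(self.walls) == self.max_walls
        reward = len(set(self.walls)) / self.max_walls if terminated else 0.0
        return tuple(self.walls), reward, terminated, False, {}


def rollout(env, policy, episodes):
    """Run full episodes and return each episode's total reward."""
    returns = []
    for _ in range(episodes):
        obs, _info = env.reset()
        total, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, _info = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return returns


env = ToyLayoutEnv()
rng = random.Random(0)


def random_policy(obs):
    return rng.randrange(env.n_actions)  # placeholder for a trained PPO policy


print(rollout(env, random_policy, episodes=5))
```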

Results

The trained agent produces layout alternatives

The output is not a final architectural plan. It is an early-stage design proposal that satisfies measurable constraints and can broaden the set of options for a designer.

Fig. 10. Generated layouts. These six examples show how trained PPO agents respond to different geometric and topological objectives. The layouts are not copied from a dataset; they are produced through sequential interaction with the SpaceLayoutGym environment.

The trained agents generate floor-plan layouts that respond to the selected constraints. The study compares PPO behavior with genetic algorithms and with selected human-designed layouts from RPLAN to examine both optimization performance and design diversity.

Emergent behaviors

  1. Living room by remainder. The agent tends not to assign an explicit wall sequence to the living room. It places the other rooms first and leaves the remaining central area to become the living room.
  2. Rooms along facades. The agent tends to distribute rooms around the perimeter, which supports facade access for daylight and naturally keeps the living room more central.
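
Both behaviors can be checked on a generated plan. This hypothetical post-hoc analysis (not part of the paper's pipeline) flood-fills the free cells of a wall grid into regions and reports which regions touch the plan's outer edge, assuming the grid boundary is the facade. In the toy wall pattern below, four corner rooms sit on the facade and a central remainder region is left over. `label_regions` and `facade_regions` are illustrative names.

```python
from __future__ import annotations

from collections import deque

import numpy as np


def label_regions(walls: np.ndarray) -> np.ndarray:
    """Flood-fill 4-connected free cells (walls == 0) into regions 1, 2, ...

    Wall cells keep label 0.
    """
    labels = np.zeros(walls.shape, dtype=int)
    rows, cols = walls.shape
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if walls[r, c] or labels[r, c]:
                continue
            next_label += 1
            labels[r, c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed cell
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and not walls[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
    return labels


def facade_regions(labels: np.ndarray) -> set[int]:
    """Regions with at least one cell on the plan's outer edge (assumed facade)."""
    edge = np.concatenate([labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]])
    return {int(v) for v in edge if v}


walls = np.zeros((7, 7), dtype=int)
walls[1, 1:6] = walls[5, 1:6] = 1          # inner ring, top and bottom
walls[1:6, 1] = walls[1:6, 5] = 1          # inner ring, left and right
walls[0, 3] = walls[6, 3] = walls[3, 0] = walls[3, 6] = 1  # split the perimeter
labels = label_regions(walls)
print(int(labels.max()), sorted(facade_regions(labels)))  # 5 [1, 2, 4, 5]
```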

Citation

BibTeX

@article{kakooee2024reimagining,
  title = {Reimagining space layout design through deep reinforcement learning},
  author = {Kakooee, Reza and Dillenburger, Benjamin},
  journal = {Journal of Computational Design and Engineering},
  volume = {11},
  number = {3},
  pages = {43--55},
  year = {2024},
  publisher = {Oxford University Press},
  doi = {10.1093/jcde/qwae025}
}