A new 3D mesh building extraction method

Written by Liuyun Duan | Nov 14, 2023 5:29:00 PM

At Luxcarta, we recently developed a novel deep learning technique for extracting 3D buildings from textured meshes. The process significantly speeds up the creation of accurate 3D maps of dense urban areas.

Key takeaways:

Accurate 3D maps of urban areas are vital to many industries but are challenging and time-consuming to produce.
We developed an innovative 3D mesh building extraction method that uses deep learning techniques to generalize and speed up the whole process.
Learn about our new process and how we evaluated it.
Read about potential uses for this new technique.

In recent years, vast swathes of the planet’s surface have been photographed using satellites, aircraft, drones, and other methods. Using powerful computers, it is now possible to stitch together these images to create a ‘textured mesh’. Using 3D building extraction from meshes, we can further enhance them. This process cuts out building footprints with heights from the mesh. It allows us to identify individual structures.

This is an incredibly powerful tool. Polygonal extraction of building footprints allows urban planners, architects, telecommunications RF planners, utilities providers, engineers, and many other professionals to achieve a far deeper understanding of urban areas and building heights for all sorts of purposes.
However, 3D mesh building modeling is typically very time-consuming and resource-intensive. So, we decided to experiment with a new deep learning method for building segmentation from colour images with elevation data, one that speeds up the process of generating accurate Level of Detail 2 (LoD2) buildings. We demonstrate the performance and potential of our new method by evaluating it on three cities around the world with different characteristics – work we presented at the SPIE conference in October 2023 (you can read the paper here).


A faster method for 3D building extraction is needed

For many years, cartographers have been able to manually ‘cut out’ building footprints from images and add them as a layer in their GIS mapping systems. However, this process is very time-consuming. Similarly, various techniques also exist for turning 2D aerial or satellite images into a 3D model. But again, this process tends to be resource-intensive and can take several days or even weeks to complete.

This is problematic for several reasons.

First and foremost, it adds a significant delay to any project. Imagine that a city wanted to create a map of the urban environment to help plan their flood defences. Creating a detailed, 3D map would usually add several weeks to the process – and may also require skilled (and expensive) consultants.

There’s also the issue of change. In modern cities, new buildings – both permitted and unofficial (i.e., informal housing) – can be added rapidly and so existing maps can quickly go out of date. If a utilities business wants to build new electricity lines, they need the most up-to-date maps to know where buildings are, and their height. If new structures have appeared in formerly empty space, then this could seriously disrupt the plans. Being able to create up-to-date and accurate 3D maps is therefore very valuable.
Another common problem is image noise and distortion. Satellite and aerial images must be orthorectified (corrected so that they appear as if the photo was taken from directly above). But in urban environments, this can be very challenging – sometimes tall buildings obscure the lower buildings next to them. Identifying and correcting these sorts of issues usually requires highly experienced technicians.

A new process for 3D mesh building extraction

Recent advances in deep learning techniques present tantalising possibilities for polygonal extraction of building footprints from imagery. At Luxcarta, we wanted to explore what semantic segmentation of textured 3D meshes could achieve.

First, some definitions can be helpful:

Textured 3D mesh: this simply refers to a 3D map of a place, where coloured triangles are placed over surfaces and objects in the image. For example, if you were extracting features from a mesh of a street, all pixels containing buildings might be shaded blue, all pixels containing roads red, and all pixels containing vegetation green. The aim is to help with recognising features of interest.
Semantic segmentation: this is a computer pattern recognition technique. Simplifying somewhat, the same label is applied to all pixels that contain the same category of object. Rather than a human searching for the objects of interest in the image and applying the corresponding label to each pixel, deep learning techniques allow a machine to do this smartly, rapidly, and at scale.
Level of detail: this refers to how much detail is included in 3D building images. LoD0 is the lowest level, showing just the area (‘building footprint’) of the structure. LoD1 is a 3D block model, showing a building as a flat-roofed cuboid. An LoD2 building is more complex: it typically shows the shape of the roof and may distinguish the walls and roof with different colors. An LoD3 building shows things like windows and doors. And finally, an LoD4 building shows indoor elements of the building.
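To make the idea of per-pixel labels concrete, here is a minimal Python sketch of what a segmentation output looks like once a model has produced it. The class ids and the tiny grid are made up for illustration; they are not taken from our pipeline.

```python
from collections import Counter

# Hypothetical class ids, chosen only for this illustration.
BUILDING, ROAD, VEGETATION = 1, 2, 3


def class_pixel_counts(label_map):
    """Count how many pixels carry each class label."""
    return Counter(label for row in label_map for label in row)


def binary_mask(label_map, class_id):
    """Return a same-sized grid of booleans: True where the pixel has class_id."""
    return [[label == class_id for label in row] for row in label_map]


# A toy 3 x 4 label map: one class id per pixel.
label_map = [
    [1, 1, 2, 3],
    [1, 1, 2, 3],
    [2, 2, 2, 3],
]

counts = class_pixel_counts(label_map)
building_mask = binary_mask(label_map, BUILDING)
```

The building mask is the kind of intermediate result that a later step can turn into footprint polygons.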

Overview of our 3D mesh extraction process

For a complete description of our method, read our SPIE paper. But here’s an overview of our new 3D mesh extraction process:

First, we collected 50 km² of textured meshes covering 22 different cities around the world from various sources.
We generated orthorectified images and their corresponding elevation maps from these meshes as our training data. Our experienced technicians manually generated ground-truth building polygons, then verified and validated them.
We used a U-Net-based CNN architecture for semantic segmentation. An orientation map is also generated as an output layer to facilitate polygonization. Leveraging multiple learning tasks in our method improves the segmentation quality.
We applied our self-developed geometrical polygonization algorithm for compact LoD1 building vectorization. Our automatic pipeline also generates a high-quality Digital Height Model, which is created by extracting the Digital Terrain Model and is used to assign height values to building polygons.
We then conducted various tests to validate the quality and accuracy of the 3D building models we had generated. We selected three typical cities from Brazil, the USA, and France, each representing a different urban design style, to showcase the visual quality and statistical accuracy of our method. For detailed technical statistics and comparisons, please refer to our paper: SPIE 2023.
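The height assignment step can be illustrated with a short Python sketch. It assumes that the elevation map acts as a surface model, that the height model is the per-pixel difference between surface and terrain, and that one building's height is the median over its pixels. The median and the function names are choices made for this example, not a description of our production pipeline.

```python
from statistics import median


def height_model(surface, terrain):
    """Per-pixel height above ground: surface elevation minus terrain elevation."""
    return [
        [s - t for s, t in zip(surface_row, terrain_row)]
        for surface_row, terrain_row in zip(surface, terrain)
    ]


def building_height(heights, mask):
    """Median height over the pixels flagged as belonging to one building."""
    values = [
        h
        for height_row, mask_row in zip(heights, mask)
        for h, inside in zip(height_row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("mask selects no pixels")
    return median(values)


# Toy 2 x 3 rasters in metres; the left 2 x 2 block is one building.
surface = [[110.0, 112.0, 100.5],
           [111.0, 112.0, 100.0]]
terrain = [[100.0, 100.0, 100.0],
           [100.0, 100.0, 100.0]]
mask = [[True, True, False],
        [True, True, False]]

heights = height_model(surface, terrain)
roof_height = building_height(heights, mask)
```

A median is used here because it is less sensitive than a mean to stray pixels such as chimneys or overhanging trees.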

The results were impressive. Our model demonstrated high (90%+) levels of accuracy (precision and recall), automatically identifying large numbers of building structures, their elevations, and footprints in very different urban environments – from suburban US cities, to compact semi-formal structures in Brazil, through to mixed building types in France. Most importantly, the process was significantly faster than manual polygonal extraction of building footprints. We estimate it could deliver a fourfold increase in productivity.
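For readers unfamiliar with precision and recall, the following Python sketch shows how the two scores can be computed per pixel from a predicted building mask and a ground-truth mask. This is a generic illustration with toy data; the exact evaluation protocol (for example, pixel-wise versus object-wise scoring) is described in the paper and may differ.

```python
def precision_recall(predicted, truth):
    """Pixel-wise precision and recall for two same-sized binary masks."""
    tp = fp = fn = 0
    for predicted_row, truth_row in zip(predicted, truth):
        for p, t in zip(predicted_row, truth_row):
            if p and t:
                tp += 1  # building predicted, building present
            elif p and not t:
                fp += 1  # building predicted, nothing there
            elif t and not p:
                fn += 1  # building present, missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy masks: 1 marks a building pixel, 0 marks background.
predicted = [[1, 1, 0, 0],
             [1, 1, 1, 0]]
truth = [[1, 1, 0, 0],
         [1, 1, 0, 1]]

precision, recall = precision_recall(predicted, truth)
```

Precision answers "how much of what the model marked as building really is building?", while recall answers "how much of the real building area did the model find?".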

Qualitative evaluation on Rio de Janeiro, Brazil: segmentation

Qualitative evaluation on Rio de Janeiro, Brazil: polygonization


Implications of our new model

Our new method for building extraction via semantic segmentation of textured 3D meshes has potential use cases in almost any sector that requires accurate maps of towns and cities. It offers a much faster and more accurate way of building 3D models of large areas than was previously possible, which is especially valuable in challenging areas. Here are just some example use cases:

Deep learning in urban design and architecture: Will a new building be overlooked by neighbours? How will it change the feel of a street? Where will shadows be cast by a new structure? Our 3D models can help architects and urban designers have a much better understanding of the impacts of their structures through accurate, compact, and light building layers rather than heavy meshes or other low-quality products.
Utilities: Our mesh segmentation model can help utilities companies plan pipe networks, electricity cables, and other infrastructure depending on the height and location of buildings.
Urban planning: The ability to quickly generate an accurate model of a town or city allows urban planners to better plan out interventions. From waste collection to cycle paths to the location of EV charging points or flood defences, accurate 3D maps allow them to plan the most effective possible interventions.
Logistics: When planning routes for drivers, cycle couriers, or even drone delivery, an accurate view of building height, location, parking, and space is essential to logistics businesses.
Telecoms: When planning out the location of cell towers, 5G infrastructure, or RF networks, telecoms firms need an extremely detailed view of what buildings are where. Our model helps them to clearly identify line of sight obstacles.
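As an illustration of the telecoms use case, here is a simplified Python sketch of a line-of-sight check along a straight path between two antennas. It assumes building heights have already been sampled at evenly spaced points along the path, and it ignores effects a real RF planning tool would model, such as Fresnel zones and earth curvature. The function name and the numbers are hypothetical.

```python
def blocking_samples(profile, tx_height, rx_height):
    """Return indices of samples whose obstacle height reaches the sight line.

    profile holds obstacle heights at evenly spaced points from transmitter
    to receiver, endpoints included. Heights share one vertical reference.
    """
    n = len(profile)
    if n < 2:
        raise ValueError("profile needs at least the two endpoints")
    blocked = []
    for i, obstacle in enumerate(profile):
        if i in (0, n - 1):
            continue  # the endpoints are the antenna sites themselves
        # Height of the straight sight line above sample i.
        ray = tx_height + (rx_height - tx_height) * i / (n - 1)
        if obstacle >= ray:
            blocked.append(i)
    return blocked


# Toy path: a 22 m building sits halfway between a 30 m and a 10 m antenna.
profile = [0, 5, 22, 8, 0]
blocked = blocking_samples(profile, tx_height=30, rx_height=10)
```

With building polygons that carry height values, a profile like this can be read directly from the building layer instead of a heavy mesh.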

Balancing the need for speed and detail in 3D building images

When creating a surface mesh, mappers need to find a balance between speed and detail. On the one hand, you could have a very basic mesh that just shows the flat, 2D footprint of buildings (LoD0). But of course, you wouldn’t be able to visualize the height of each of these structures. This would not be so valuable for most kinds of detailed planning work.

Alternatively, you could create a very high-quality mesh with LoD3 or even LoD4 buildings. These can be used to create ‘digital twins’. This kind of mesh can represent details of the facades, colors of building materials, and designs of structures. This amount of texturing is perfectly possible, but it takes more time and energy.

For our model, we chose LoD2 mesh generation. LoD2 buildings show important details such as the height of individual structures and roof type and shape, but without showing the design elements of individual structures. This approach allows us to create 3D meshes quickly, while still providing enough detail to help our clients make decisions.

In our experience, the needs of most kinds of planning maps can be met with LoD2 buildings. Common LoD2 applications include:

Telecoms planning: Using LoD2 buildings shows the height, density and size of buildings and obstacles in the landscape.
Urban planning: A map that uses LoD2 buildings allows urban planners to quickly understand building density, type and height - you don’t always need higher levels of detail to plan services.

Need to map your world faster?

Our new technique for extracting 3D buildings from meshes is robust, reliable, accurate, and fast. We can significantly speed up the mapping of towns and cities, automatically producing LoD2 buildings. We achieve this by applying deep learning techniques that perform semantic segmentation of textured 3D meshes.

Would you like to rapidly and accurately create 3D maps of your town or city? Contact Luxcarta today and learn about our AI-powered 3D meshing features.
