How We Remove Unauthorized Apartment Listings From Our Data Set at Scale

What is an Unauthorized Apartment Listing?

At HelloData, we track over 25 million apartment units across the U.S., and maintaining the integrity of our data is a top priority. In our ongoing efforts to ensure the highest quality of data for our clients, we have developed a robust system to identify and remove unauthorized rental listings.

Unauthorized listings are created by brokers trying to capture a leasing commission. The broker posts a listing on an internet listing service (ILS) to intercept prospective renters before they reach the official listing or the property website. These listings typically feature the same unit-level data and photographs as the property manager's listing, but with a slightly different address, which makes them difficult to spot at first glance.

When we initially put together our dataset, we identified 129,578 potentially unauthorized listings. They are particularly challenging to detect because they often appear almost identical to genuine listings, especially those from professionally managed buildings.

To tackle this issue, we combine four techniques: image analysis, address checks, price monitoring, and source prioritization.

Real Estate Image Analysis

One of our key tools is image analysis. Photos play a major part in describing units and attracting tenants, so we built computer vision algorithms to analyze them: for example, they can detect rooms, the level of finishes, amenities, and the presence of watermarks. We've found that unauthorized listings commonly reuse photos from existing listings. Our models detect these duplicates even after cropping, rescaling, or light editing (the addition of a watermark, for example), which lets us filter out unreliable listings.
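
To illustrate the duplicate-photo idea, here is a minimal sketch in Python using a difference hash (dHash), a common perceptual hashing technique. It is not our production model: the function names, the 8x8 hash size, and the distance threshold are assumptions chosen for illustration, and the image is represented as a plain grid of grayscale values so the example stays self-contained. A hash like this tolerates rescaling and uniform brightness changes; handling heavy cropping would need a stronger method.

```python
from typing import List

Gray = List[List[float]]  # grayscale image as rows of pixel intensities


def downsample(img: Gray, width: int, height: int) -> Gray:
    """Shrink an image to width x height by averaging blocks of pixels."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for r in range(height):
        r0 = r * src_h // height
        r1 = max(r0 + 1, (r + 1) * src_h // height)
        row = []
        for c in range(width):
            c0 = c * src_w // width
            c1 = max(c0 + 1, (c + 1) * src_w // width)
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Gray, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair of cells."""
    small = downsample(img, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_like_same_photo(a: Gray, b: Gray, max_distance: int = 10) -> bool:
    """Treat two images as near-duplicates if their hashes are close.

    max_distance is a hypothetical threshold, not a tuned value.
    """
    return hamming(dhash(a), dhash(b)) <= max_distance
```

Because the hash compares neighboring cells of a heavily shrunken image, two copies of a photo at different resolutions produce nearly the same bits, while unrelated photos usually differ in many positions.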

Incorrect Address Identification

A significant pattern we've observed is that many unauthorized listings are posted under completely different addresses. These addresses often correspond to unrelated locations like nearby convenience stores or parking garages. Cross-referencing this discrepancy with our image analysis has proven highly effective at uncovering unauthorized listings.
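
The cross-reference between photos and addresses can be sketched as follows. This is a simplified illustration, not our actual pipeline: the `Listing` structure, the abbreviation table, and the `min_shared_photos` threshold are all hypothetical, and photo matching is reduced to comparing sets of precomputed photo fingerprints.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Listing:
    listing_id: str
    address: str
    photo_hashes: FrozenSet[int]  # hypothetical precomputed photo fingerprints


# Small, illustrative abbreviation table; a real one would be far larger.
_ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "boulevard": "blvd",
    "road": "rd",
    "drive": "dr",
}


def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and unify common street suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(_ABBREVIATIONS.get(token, token) for token in tokens)


def is_suspect(candidate: Listing, official: Listing,
               min_shared_photos: int = 2) -> bool:
    """Flag a listing that reuses the official photos under another address."""
    shared = len(candidate.photo_hashes & official.photo_hashes)
    same_address = (
        normalize_address(candidate.address)
        == normalize_address(official.address)
    )
    return shared >= min_shared_photos and not same_address
```

Normalizing first keeps harmless formatting differences ("Street" versus "St.") from triggering a flag, so only listings whose address really differs from the official one are escalated.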

Price Monitoring Systems

To keep our data as clean as possible, we remove listings with rents substantially lower than those of comparable units. A suspiciously low price often indicates an unauthorized listing or a data error. Regardless of the cause, removing this data is crucial for the accuracy of any real-estate analysis.
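
As a rough sketch of this kind of price check, the Python function below flags a listing when its rent falls well below the median rent of comparable units. The 60% ratio and the minimum of three comparables are assumptions made for the example, not the thresholds we use.

```python
from statistics import median
from typing import Sequence


def is_suspiciously_cheap(rent: float,
                          comparable_rents: Sequence[float],
                          min_ratio: float = 0.6,
                          min_comps: int = 3) -> bool:
    """Return True if rent is far below the median of comparable units.

    min_ratio and min_comps are hypothetical values chosen for illustration.
    With too few comparables the check abstains rather than guess.
    """
    if len(comparable_rents) < min_comps:
        return False
    return rent < min_ratio * median(comparable_rents)
```

Using the median rather than the mean keeps a single mispriced comparable from shifting the baseline.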

Source Prioritization Algorithms

A key factor in distinguishing unauthorized listings is the credibility of the source. We place the highest trust in data from property websites or verified listings, treating them as our primary sources of truth. When neither is available for a property, a prioritization algorithm selects the most trustworthy remaining source, which minimizes the risk of including unreliable data.
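
One simple way to express source prioritization is a ranked lookup, sketched below in Python. Only the top tier (property websites and verified listings) comes from the description above; the lower tiers, the source labels, and the record layout are hypothetical placeholders.

```python
from typing import Dict, Iterable, Mapping

# Lower rank means more trusted. Tiers below 0 are illustrative assumptions.
SOURCE_RANK = {
    "property_website": 0,
    "verified_listing": 0,
    "ils_feed": 1,
}
UNKNOWN_RANK = 99  # any source not listed above is trusted least


def trust_rank(record: Mapping) -> int:
    """Look up how much we trust the source of a record."""
    return SOURCE_RANK.get(record["source"], UNKNOWN_RANK)


def select_per_unit(records: Iterable[Mapping]) -> Dict[str, Mapping]:
    """Keep, for each unit, the record from the most trusted source.

    On a tie, the record seen first is kept.
    """
    best: Dict[str, Mapping] = {}
    for record in records:
        current = best.get(record["unit_id"])
        if current is None or trust_rank(record) < trust_rank(current):
            best[record["unit_id"]] = record
    return best
```

With this shape, a unit that has a property-website record always resolves to it, and a unit without one falls back to the best source available.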

Our commitment to data integrity extends beyond identifying unauthorized listings. We continuously refine our algorithms and techniques to keep pace with evolving unauthorized listing tactics, so that our clients can trust the data they receive from us and make informed decisions based on accurate, reliable information.

