Soft Bliss Academy

Detecting Anomalies in Idealista’s Data – The Official Blog of BigML.com

by softbliss
April 6, 2025

At BigML we love data. Recently, Idealista published this blog post describing an analysis of properties located in several Spanish cities. The accompanying data, dated 2018, was published as well. Since part of our team lives in Spain and summertime instills a playful disposition, we jumped onto our platform to play with it a bit and created some anomaly detectors. This post simply describes our work and the results we found with little effort.

Describing the Data

The repository referenced in the post contains several data files, but we focused on the ones with sale information, such as the ID, price, unit price, number of bedrooms, etc. They cover properties located in Madrid, Barcelona, and Valencia, and location is one of the available variables. Unfortunately, the data was not in plain CSV files, so even though we are totally partial to Python, we were forced to use R to extract them; that was a minor setback. Once the CSVs were created, the only transformation we applied was removing a geolocation field that duplicated information, and we were ready to work.
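Removing a redundant field from a CSV takes only a few lines. The following is a minimal sketch of that kind of cleanup, not the code used for this post: the column names (including `GEOMETRY`) and the sample row are invented for illustration, since the post does not name the field that was dropped.

```python
import csv
import io


def drop_column(src, dst, column):
    """Copy CSV rows from src to dst, leaving out one column."""
    reader = csv.DictReader(src)
    kept = [name for name in reader.fieldnames if name != column]
    # extrasaction="ignore" silently skips the dropped column in each row.
    writer = csv.DictWriter(
        dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


# In-memory example; every column name and value here is hypothetical.
sample = io.StringIO(
    "ID,PRICE,LATITUDE,LONGITUDE,GEOMETRY\n"
    "A1,250000,39.47,-0.37,POINT (-0.37 39.47)\n"
)
cleaned = io.StringIO()
kept_columns = drop_column(sample, cleaned, "GEOMETRY")
print(cleaned.getvalue())
```

With real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module documentation recommends.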

The Work in the Platform

Starting from the CSVs, we dove into BigML. First, we uploaded the three files, one per city, by dragging and dropping them, and checked the field types inferred automatically for the first one. Only a couple of date fields written in a custom format needed attention, so we configured them to be parsed properly. After that, you just create a dataset that summarizes the information and an anomaly detector that assigns each row an anomaly score, a number ranging from 0 (totally normal) to 1 (very anomalous). All of this takes only 1-click actions in our Dashboard (no code needed!).
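For readers who prefer code to clicks, the same source, dataset, anomaly detector, and batch score chain could be sketched with the BigML Python bindings. This is an assumption about how one might script it, not what we did for this post: the function name `score_city_file` and the output filename are hypothetical, the `all_fields` argument is our choice, and the custom date-format configuration mentioned above is left out.

```python
def score_city_file(api, csv_path):
    """Upload one CSV and return the finished batch anomaly score resource.

    `api` is expected to behave like bigml.api.BigML.
    """
    source = api.create_source(csv_path)
    api.ok(source)  # block until the source is ready

    dataset = api.create_dataset(source)
    api.ok(dataset)

    anomaly = api.create_anomaly(dataset)
    api.ok(anomaly)

    # all_fields=True keeps the original columns next to the score.
    batch = api.create_batch_anomaly_score(
        anomaly, dataset, {"all_fields": True}
    )
    api.ok(batch)
    return batch


def main():
    # Requires `pip install bigml` and the BIGML_USERNAME / BIGML_API_KEY
    # environment variables to be set.
    from bigml.api import BigML

    api = BigML()
    batch = score_city_file(api, "Valencia_Sale.csv")
    # The output filename is a hypothetical choice.
    api.download_batch_anomaly_score(batch, filename="valencia_scores.csv")
```

Calling `main()` runs the whole chain for one city. Passing the connection in as a parameter keeps `score_city_file` easy to reuse for the other two files.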

Understanding the Anomalies

Each file has its own outstanding anomalies, and each anomaly is flagged for a different set of reasons. The following image lists the top anomalies found in the Valencia_Sale.csv file. The right column shows the fields that contributed most to the top-ranked anomaly: being a duplex with a north orientation, a doorman, a terrace, and a swimming pool.

That property is certainly not the usual flat one finds in Valencia. Looking at its remaining attributes, one discovers that it is a detached house with air conditioning, a lift, a box room, and a wardrobe, so it really stands out from the crammed flats of a dense city. The remaining top anomalies all refer to duplexes, most of them studios, with lots of amenities, so our anomaly detectors mainly found uncommon luxurious flats or houses.

Anomalies Distribution

We’ve discussed some of the relevant anomalies detected in the data and their individual properties, but so far we know nothing about how the anomaly scores are distributed. Do the anomalies group under some conditions? To analyze that, we simply compute a batch anomaly score in 1-click. That adds a new column to our dataset containing the anomaly score for each row. The scores can then be plotted as a histogram, which shows a small tail of quite anomalous properties for sale.

In all cases, the tail seems to start around 0.6, so we consider rows with scores above that value anomalous.
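The histogram and the 0.6 cutoff are easy to reproduce once the scores are downloaded. Here is a minimal sketch in plain Python; the score values are made up for illustration and the function names are hypothetical.

```python
def score_histogram(scores, bins=10):
    """Count scores in equal-width bins over the range [0, 1]."""
    counts = [0] * bins
    for score in scores:
        # A score of exactly 1.0 goes into the last bin.
        index = min(int(score * bins), bins - 1)
        counts[index] += 1
    return counts


def flag_anomalous(scores, threshold=0.6):
    """Return the row indices whose score lies above the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Made-up scores: most rows look normal, a couple sit in the tail.
scores = [0.31, 0.35, 0.38, 0.42, 0.44, 0.47, 0.52, 0.55, 0.63, 0.71]
histogram = score_histogram(scores)
flagged = flag_anomalous(scores)

# Text rendering of the histogram, one bar per bin.
for i, count in enumerate(histogram):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * count}")
print("anomalous rows:", flagged)
```

The threshold is a judgment call read off the histogram, so it is kept as a parameter rather than hard-coded.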

Our Summer App

Following the summer spirit, which inspires us to engage in all sorts of projects, we decided to build an app to show off those results. Since we had the location of each property, we were curious to know whether the anomalies were distributed evenly throughout the city or, on the contrary, appeared more frequently in some neighborhoods. Geolocation might be helpful, so we just downloaded the batch anomaly score dataset and used Streamlit and Mapbox to create a simple visualization on a map.

And voilà! We see that anomalies appear more frequently in some neighborhoods. For instance, in Barcelona we see them in the upper part of town, where luxurious flats and houses were built, and along the seashore. The latter also happens in Valencia, where we find them in an old, poor seaside neighborhood that has recently been gentrifying. The distribution of anomalies on a map (or even across windows of time) is an interesting indicator of change and a meta-anomaly insight in itself. If you are acquainted with any of these cities, you might want to check the live app here.
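The same question, whether anomalies concentrate in certain areas, can also be checked numerically by bucketing listings into a coarse latitude/longitude grid and comparing the share of anomalous rows per cell. This sketch is not the Streamlit app: the coordinates and scores are made up, and the cell size and the function name `anomaly_share_by_cell` are assumptions.

```python
import math
from collections import defaultdict


def anomaly_share_by_cell(rows, threshold=0.6, cell_size=0.01):
    """Group (lat, lon, score) rows into grid cells.

    Returns {cell: (anomalous_count, total_count, share)}.
    """
    totals = defaultdict(int)
    anomalous = defaultdict(int)
    for lat, lon, score in rows:
        # A cell is identified by integer grid coordinates.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        totals[cell] += 1
        if score > threshold:
            anomalous[cell] += 1
    return {
        cell: (anomalous[cell], total, anomalous[cell] / total)
        for cell, total in totals.items()
    }


# Made-up listings: two clusters, only the first has high scores.
rows = [
    (41.405, 2.155, 0.72),
    (41.406, 2.156, 0.65),
    (41.404, 2.154, 0.40),
    (41.385, 2.175, 0.35),
    (41.386, 2.176, 0.45),
    (41.384, 2.174, 0.38),
]
shares = anomaly_share_by_cell(rows)
for cell, (hits, total, share) in sorted(shares.items()):
    print(cell, f"{hits}/{total} anomalous ({share:.0%})")
```

A fixed grid is a crude stand-in for real neighborhood boundaries, but it is enough to see whether the anomalous share varies across the city.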

My Summer Notebook

Analyzing this data has been a refreshing project that took just a small amount of time and led to a nice example of what anomaly information can reveal. In fact, the automation provided by the BigML platform via scriptify helped us reproduce on the remaining files the process we had done by point-and-click in the Dashboard on one of them. Using that, we could repeat it in parallel and at scale for every city. Of course, we still need to walk the last mile and bring the information given by the Machine Learning models to the domain environment, in this case the city maps. This integration in the domain of application is sometimes key for users to see the real power of Machine Learning models… and in this case, it was also fun to do and nice to look at!
