WiFi CSV data analytics

Project Owners

  • Alistair Cook (Design Manager and Precinct Planning Representative, CIS)
  • Tariq Adnan (ICT, Networking)

Project Credits

  • Jim Cook (ICT, TechLab)
  • Lydia Gu (ICT, TechLab)

Start Date

08 May 2017

Completion Date

19 Jun 2017

Problem Statement

Measuring class usage with wireless data: TechLab is to analyse the actual utilisation of teaching spaces using previously recorded WiFi data (WiFi session reports from 2015 to November 2016, in CSV format).

The idea is to drill down into the session report data, identify the access points (APs) in each room, check who is connected to those APs, and record their session details (username, connection time, session duration, etc.). A student connected to an AP in a classroom can then be assumed to be present in that classroom.
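The room-level logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: the column names (`username`, `ap_name`, `start_time`, `duration_s`), the timestamp format, the AP and room names, and the `AP_TO_ROOM` lookup are all hypothetical. It counts the distinct users connected to a room's APs in every clock hour that a session overlaps, as one possible proxy for attendance.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical AP-to-room lookup; in practice this would come from an AP inventory.
AP_TO_ROOM = {
    "AP-B10-101": "B10.101",
    "AP-B10-102": "B10.102",
}

# Hypothetical session-report extract (column names are assumptions).
SAMPLE_CSV = """username,ap_name,start_time,duration_s
s1,AP-B10-101,2016-03-01 09:05:00,3000
s2,AP-B10-101,2016-03-01 09:30:00,3600
s1,AP-B10-102,2016-03-01 11:00:00,600
s3,AP-LIB-001,2016-03-01 09:00:00,3600
"""


def room_hourly_attendance(csv_text, ap_to_room):
    """Count distinct users per (room, clock hour) from session-report rows.

    Assumes hypothetical columns: username, ap_name,
    start_time ("YYYY-MM-DD HH:MM:SS") and duration_s (seconds).
    """
    users = defaultdict(set)  # (room, hour_start) -> usernames seen
    for row in csv.DictReader(io.StringIO(csv_text)):
        room = ap_to_room.get(row["ap_name"])
        if room is None:
            continue  # AP is not in a mapped teaching space
        start = datetime.strptime(row["start_time"], "%Y-%m-%d %H:%M:%S")
        end = start + timedelta(seconds=int(row["duration_s"]))
        hour = start.replace(minute=0, second=0, microsecond=0)
        # Credit the user to the starting hour and every later hour the session overlaps.
        while True:
            users[(room, hour)].add(row["username"])
            hour += timedelta(hours=1)
            if hour >= end:
                break
    return {key: len(names) for key, names in users.items()}


if __name__ == "__main__":
    counts = room_hourly_attendance(SAMPLE_CSV, AP_TO_ROOM)
    for (room, hour), n in sorted(counts.items()):
        print(room, hour.strftime("%Y-%m-%d %H:00"), n)
```

Collecting usernames in a set means a student who reconnects several times within the same hour is counted only once; sessions on unmapped APs (such as the library AP in the sample) are ignored.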

The output of the data analysis should be filterable. For instance, a user interested in only one specific classroom should be able to filter the data down to that room in the frontend UI.
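The filtering requirement can be illustrated with a small Python sketch over already-aggregated results. The `UsageRecord` type, its fields and the `filter_usage` function are hypothetical names introduced only for illustration; the delivered frontend in this project was a PowerBI dashboard, so this shows the filtering logic rather than the UI.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical aggregated row: distinct users seen in a room on a day."""
    room: str
    day: date
    distinct_users: int


def filter_usage(
    records: Iterable[UsageRecord],
    room: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[UsageRecord]:
    """Keep records matching every filter that is set; None disables a filter."""
    return [
        r
        for r in records
        if (room is None or r.room == room)
        and (start is None or r.day >= start)
        and (end is None or r.day <= end)
    ]


if __name__ == "__main__":
    records = [
        UsageRecord("B10.101", date(2016, 3, 1), 40),
        UsageRecord("B10.102", date(2016, 3, 1), 12),
        UsageRecord("B10.101", date(2016, 3, 8), 35),
    ]
    # Only one classroom, from 2 March 2016 onwards.
    print(filter_usage(records, room="B10.101", start=date(2016, 3, 2)))
```

Treating `None` as "no filter" lets the same function serve an unfiltered overview and a single-classroom drill-down.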

Final Briefs

Using TechLab's MS Azure resources, we created Virtual Machines (VMs) in the cloud hosting a multi-node Hadoop cluster, which ran the ETL processing of over 100 GB of data via Hadoop streaming, and stored the data in Azure Blob Storage. The dashboard was designed in PowerBI and delivered to the project owners as a PowerBI app embedded in Office 365.
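The Hadoop streaming step can be sketched as a single Python script that acts as either mapper or reducer. This is an illustrative assumption rather than the project's actual job: the column positions, the `ap_name` header value, the file name `wifi_job.py` and the chosen aggregation (session counts per AP per day) are all hypothetical. Hadoop streaming feeds input lines to the mapper on stdin, sorts the mapper output by key (the text before the first tab) and passes the sorted lines to the reducer, which is why `reducer` can group consecutive keys with `itertools.groupby`.

```python
import csv
import sys
from itertools import groupby

# Hypothetical column positions in the raw session-report CSV.
AP_COL = 1
START_COL = 2  # start timestamp, assumed to begin with YYYY-MM-DD


def mapper(lines):
    """Emit 'ap_name|YYYY-MM-DD<TAB>1' for each session row."""
    for fields in csv.reader(lines):
        if len(fields) <= START_COL or fields[AP_COL] == "ap_name":
            continue  # skip malformed rows and (possibly repeated) header rows
        day = fields[START_COL][:10]
        yield f"{fields[AP_COL]}|{day}\t1"


def reducer(lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked by Hadoop streaming as `python3 wifi_job.py map` or `... reduce`.
    step = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in step(sys.stdin):
        print(record)
```

A job could then be launched with something like `hadoop jar <path-to>/hadoop-streaming.jar -files wifi_job.py -mapper "python3 wifi_job.py map" -reducer "python3 wifi_job.py reduce" -input <input-dir> -output <output-dir>`; the location of the streaming JAR depends on the Hadoop distribution. Using `csv.reader` instead of a plain `split(",")` keeps quoted fields containing commas intact.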

Regular face-to-face meetings to discuss the approach, together with regular email updates, kept everyone in the team connected and committed to the project through to completion. A documented, reusable dataset was created to support subsequent data analysis within the project scope, and it was even extended to support other data analytics projects, such as CPC data analytics.

Challenges & Learnings

  • C1. Hadoop streaming jobs processing nearly 89 GB of data consistently failed on TechLab's local single-node Linux Hadoop cluster, until we found the right approach: running the processing on cloud VMs via MS Azure (TechLab resources).
  • L1a. A follow-up plan to explore cloud computing in TechLab is expected, covering core cloud infrastructure and architecture; virtual machines, containers and virtual networking in the cloud; cloud data storage; cloud costs; and cloud privacy.
  • L1b. Knowledge of cloud computing and distributed systems has become a prerequisite for Big Data and Machine Learning work.

  • C2. The project owners were concerned about privacy.

  • L2. Ensure the University privacy policy is communicated and understood before the project starts.

Languages / Frameworks

Python, R, Hadoop MapReduce, Azure Blob Storage, PowerBI

Links to Resources

Source Code repository path

BitBucket Repo

Dev deployment path

  1. VM (in the cloud): MS Azure (TechLab account) HDInsight cluster (terminated once development was complete)

  2. Data (in the cloud): MS Azure (TechLab account) Blob Storage (secured access only; please refer to the documentation)