Scraper

Overview

Objective: Scrape UNSW course reviews from other websites to be displayed on Unilectives

Milestones

  • Term 2: Scrape at least one of Uninotes and ATAR Notes / StudentVIP

  • Term 3: Reddit Scraping & Leftover Tasks

Term 2 Schedule

Week 3

  • Get clarity on permission to use Uninotes / ATAR Notes & StudentVIP data

  • Implement a basic scraper for the websites

Week 4-7

  • Finalize scraper design

  • Decide on the Review schema/format and update the database accordingly

  • Clean & format the review data

Week 8-10

  • Create a server program (including APIs) to insert reviews into the database

Specifications

Component

Details

Overview

The idea is to implement the following data pipeline:

(1) Scrape data in raw form → (2) Format the data and upload the data to our DB → (3) Add new reviews to frontend.

Stages (1) and (2) should each be implemented as a separate script.

Scrape data in raw form

The script(s) here should collect raw data from the following sites,

  • Uninotes / ATAR Notes;

  • StudentVIP;

  • Reddit,

and write the raw data to a .json file.
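
As a minimal sketch of the output step only, assuming Python is chosen for the scripts (the language is still an open discussion point), the raw records could be written out as below. The function name `write_raw_reviews` and the record fields (`source`, `course_code`, `text`, `url`) are hypothetical; the real shape depends on what each site exposes.

```python
import json
from pathlib import Path


def write_raw_reviews(records: list, path: Path) -> int:
    """Write scraped records to a .json file and return how many were written."""
    # Create the output directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII review text readable in the file.
        json.dump(records, f, ensure_ascii=False, indent=2)
    return len(records)


# Hypothetical raw record, left unformatted on purpose (stage 2 cleans it up).
example_records = [
    {
        "source": "reddit",
        "course_code": "comp1511",
        "text": "  Great course!  ",
        "url": None,
    }
]
```

Keeping the raw file untouched by any cleaning means stage (2) can be rerun on the same data if the schema changes.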

Format the data (and then upload it)

The script(s) here should take in the raw data from the above .json file and format it according to our own schema shown here.
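
The formatting step might look roughly like the sketch below. The actual schema is the one linked above and is not reproduced here, so the output fields (`courseCode`, `description`, `source`) and the input fields are placeholders, not the real Unilectives schema.

```python
from typing import Optional


def format_review(raw: dict) -> Optional[dict]:
    """Turn one raw record into a formatted review, or None if it is unusable."""
    text = (raw.get("text") or "").strip()
    course = (raw.get("course_code") or "").strip().upper()
    # Skip records with no review text or no course code.
    if not text or not course:
        return None
    return {
        "courseCode": course,    # placeholder field name
        "description": text,     # placeholder field name
        "source": raw.get("source", "unknown"),
    }


def format_all(raw_records: list) -> list:
    """Format every raw record and drop the unusable ones."""
    formatted = (format_review(r) for r in raw_records)
    return [r for r in formatted if r is not None]
```

For example, a raw record with `course_code` of `" comp1511 "` and `text` of `"  Great course!  "` would come out with `courseCode` `"COMP1511"` and `description` `"Great course!"`.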

To insert the reviews, we can either send requests to the backend server or write to the database directly.
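
The first option, going through the backend server, could be sketched as follows. The `/reviews` route, the base URL, and the payload shape are assumptions for illustration only; the real route is whatever the backend ends up exposing. The function only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_insert_request(base_url: str, review: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one formatted review."""
    # "/reviews" is a hypothetical route, not a confirmed backend endpoint.
    url = base_url.rstrip("/") + "/reviews"
    body = json.dumps(review).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Building the request has no side effects; nothing is sent yet.
request = build_insert_request("http://localhost:3000/", {"courseCode": "COMP1511"})
```

Sending it would be a call to `urllib.request.urlopen(request)`. Going through the backend keeps its validation in the loop, whereas writing to the database directly skips it.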

[Resources]

Links: Uni Notes, Reddit, ATAR Notes

Discussion Points

  • TypeScript vs Python

  • Scrape:

    • Reddit

    • My Experience

    • Uninotes:

  • Create a new repository separate from Unilectives

  • Pretty big project

  • Just do an API call to get the pages to be scraped

Insert Review to Production Ideas

  1.  

To Do’s as of 8 August 2024

Ticket | Assignee | Status

Create ATARNotes Review Prisma Schema | Falco | Done
Create ATARNotes Routes & Services | Falco | Done
Create Uninotes & StudentVIP Review Modal | Tom | Done
Integrate Frontend | Tom | Done
Insert Scraped Reviews to Production | Together |
Push to Production | Together |
