Scraper
Overview
Objective: Scrape UNSW course reviews from other websites to be displayed on Unilectives
Milestones
Term 2: Scrape at least one of Uninotes and ATAR Notes / StudentVIP
Term 3: Reddit Scraping & Leftover Tasks
Term 2 Schedule
Milestones | Timeline |
---|---|
 | Week 3 |
 | Week 4-7 |
 | Week 8-10 |
Specifications
Component | Details |
---|---|
Overview | The idea is to implement the following data pipeline: (1) scrape the data in raw form → (2) format the data and upload it to our DB → (3) add the new reviews to the frontend. Stages (1) and (2) should each be done via an individual script. |
Scrape data in raw form | The script(s) here should collect raw data from the target sites (listed under Resources below) and output a raw data file (see the first sketch after this table). |
Format the data (and upload) | The script(s) here should take in the raw data from the above step, format it, and upload it to our DB (see the second sketch after this table). To inject, we can either make requests to the backend server or write to the DB directly. |
Resources | Links: Uni Notes, Reddit, ATAR Notes |
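A rough sketch of stage (1), assuming the raw output is simply the fetched HTML written to disk. The URL list, output directory, and file naming below are placeholders, not the real Uninotes / ATAR Notes endpoints.

```typescript
// Stage (1) sketch: fetch each course page and dump the raw HTML to disk.
// COURSE_URLS and the raw/ output directory are hypothetical placeholders.
import { mkdir, writeFile } from "node:fs/promises";

const COURSE_URLS: string[] = [
  "https://example-notes-site.com/unsw/COMP1511", // placeholder, not a real endpoint
];

async function scrapeRaw(): Promise<void> {
  await mkdir("raw", { recursive: true });

  for (const url of COURSE_URLS) {
    const res = await fetch(url);
    if (!res.ok) {
      console.error(`Failed to fetch ${url}: ${res.status}`);
      continue;
    }

    // Keep the data raw here; all formatting happens in stage (2).
    const html = await res.text();
    const outPath = `raw/${encodeURIComponent(url)}.html`;
    await writeFile(outPath, html, "utf-8");
    console.log(`Saved ${outPath}`);
  }
}

scrapeRaw().catch(console.error);
```

(Needs Node 18+ for the built-in fetch.)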
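A matching sketch of stage (2), assuming stage (1) is extended to emit JSON rather than raw HTML. The ScrapedReview shape, the /api/scraped-reviews route, and the localhost URL are assumptions for illustration, not the actual backend API.

```typescript
// Stage (2) sketch: read the raw dump, shape it into review objects, and upload them.
// The ScrapedReview fields and the backend route below are assumptions.
import { readFile } from "node:fs/promises";

interface ScrapedReview {
  courseCode: string;   // e.g. "COMP1511"
  source: string;       // e.g. "uninotes" | "atarnotes" | "studentvip"
  rating: number | null;
  text: string;
}

async function uploadReviews(rawPath: string): Promise<void> {
  const reviews: ScrapedReview[] = JSON.parse(await readFile(rawPath, "utf-8"));

  for (const review of reviews) {
    // Option A: inject via the backend server (hypothetical route).
    const res = await fetch("http://localhost:3000/api/scraped-reviews", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(review),
    });
    if (!res.ok) {
      console.error(`Upload failed for ${review.courseCode}: ${res.status}`);
    }
    // Option B: skip the server and write to the DB directly,
    // e.g. prisma.scrapedReview.create({ data: review }) with the Prisma client.
  }
}

uploadReviews("raw/reviews.json").catch(console.error);
```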
Discussion Points
- TypeScript vs Python
- Scrape:
  - Reddit
- My Experience
- Uninotes:
  - Create a new repository separate from Unilectives
  - Pretty big project
  - Just do an API call to get the pages to be scraped (see the sketch after this list)
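If Uninotes does expose an API for listing its course pages, the "API call" step could look roughly like this. The endpoint and response shape are purely hypothetical.

```typescript
// Hypothetical example of listing the pages to be scraped via an API call.
// The endpoint and the { pages: [{ url }] } response shape are assumptions.
async function getPagesToScrape(): Promise<string[]> {
  const res = await fetch("https://example-notes-site.com/api/unsw/pages"); // placeholder
  if (!res.ok) throw new Error(`Listing failed: ${res.status}`);

  const body: { pages: { url: string }[] } = await res.json();
  return body.pages.map((page) => page.url);
}

getPagesToScrape().then((urls) => console.log(`${urls.length} pages to scrape`));
```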
Insert Review to Production Ideas
To Do’s as of 8 August 2024
Ticket | Assignee | Status |
---|---|---|
Create ATARNotes Review Prisma Schema | Falco | Done |
Create ATARNotes Routes & Services | Falco | Done |
Create Uninotes & StudentVIP Review Modal | Tom | Done |
Integrate Frontend | Tom | Done |
Insert Scraped Reviews to Production | Together | |
Push to Production | Together | |
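For reference, the "Create ATARNotes Review Prisma Schema" ticket above might correspond to a Prisma model roughly like the sketch below; the model and field names are guesses, not the schema that was actually merged.

```prisma
// Rough sketch only: the real ATARNotes review model may differ.
model AtarNotesReview {
  id         String   @id @default(uuid())
  courseCode String
  rating     Float?
  text       String
  sourceUrl  String
  createdAt  DateTime @default(now())
}
```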