Component

Details

Overview

The idea is to implement the following data pipeline:

(1) Scrape data in raw form → (2) Format the data and upload it to our DB → (3) Add new reviews to the frontend.

Stages (1) and (2) should each be implemented as a separate script.

Scrape data in raw form

The script(s) here should collect raw data from the following sites,

  • Uninotes / ATAR Notes;

  • StudentVIP / ATAR Notes;

  • Reddit,

and write the raw data to a .json file.
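
As a rough sketch of the output step only (the scraping itself is site-specific and not shown), the snippet below writes a list of scraped records to one file per source. The function name `save_raw`, the `raw/` directory, and the shape of each record are assumptions made for illustration, not an agreed layout.

```python
import json
from pathlib import Path


def save_raw(records, source, out_dir="raw"):
    """Write the scraped records for one source to <out_dir>/<source>.json.

    `records` is assumed to be a list of JSON-serialisable dicts, exactly
    as scraped, with no cleaning applied yet.
    """
    out_path = Path(out_dir) / f"{source}.json"
    # Create the output directory on first use.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII review text readable in the file.
        json.dump(records, f, ensure_ascii=False, indent=2)
    return out_path
```

Keeping one file per source would let the formatting script be rerun on a single site without scraping the others again.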

Format the data (and then upload)

The script(s) here should take in the raw data from the above .json file and format it according to our own schema shown here.
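
A minimal sketch of the formatting step might look like the following. The real target schema is the one linked above; the field names used here (`source`, `subject_code`, `review_text`, `rating`) and the raw keys (`subject`, `text`, `rating`) are placeholders assumed for illustration only.

```python
def format_review(raw, source):
    """Map one raw scraped record onto a (hypothetical) review schema."""
    return {
        "source": source,
        # Normalise subject codes such as " comp1511 " to "COMP1511".
        "subject_code": str(raw.get("subject", "")).strip().upper(),
        "review_text": str(raw.get("text", "")).strip(),
        # Not every site provides a rating, so this may be None.
        "rating": raw.get("rating"),
    }


def format_reviews(raw_records, source):
    """Format all records from one source, dropping those with no text."""
    formatted = [format_review(r, source) for r in raw_records]
    return [r for r in formatted if r["review_text"]]
```

The raw records would be loaded with `json.load` from the .json file produced by the scraping stage and passed in as `raw_records`.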

To insert the formatted data into the DB, we can either send requests to the backend server or write to the database directly.
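
For the first option, the upload could be sketched with the standard library as below. The endpoint URL and the idea of sending all reviews in one JSON array are assumptions; the backend's actual routes and payload format are not specified on this page.

```python
import json
import urllib.request


def build_upload_request(reviews, endpoint):
    """Build (but do not send) a POST request carrying the formatted reviews.

    `endpoint` is a hypothetical backend URL, e.g. a reviews route.
    """
    body = json.dumps(reviews).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(req)`. Going through the backend reuses whatever validation it already applies, whereas writing to the database directly skips that layer.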

[Resources]

Links: Uni Notes, Reddit, ATAR Notes
