PCS951 Cloud and Distributed Resources for High Volume Data Processing

Course description for academic year 2024/2025

Contents and structure

The course presents technologies and principles for distributed computing in computer clusters, in grid systems and in the cloud.

The students will do practical work on many of the technologies covered by the course. Examples of technologies include systems for grid computing (e.g. ARC, JAliEn), job schedulers (e.g. Slurm, HTCondor, TORQUE), systems for distributed parallel processing (e.g. MPICH for MPI, Hadoop, Spark), software management in clusters (e.g. Kubernetes, Nomad, Puppet), container technologies (e.g. Podman, Apptainer, Docker) and more.

The course concentrates on challenges related to safe and efficient utilisation of computing resources managed by heterogeneous operators, including protection concerns between the project owner and facility management.

Learning Outcome

Upon completion of the course the candidate should be able to:

Knowledge

  • discuss challenges and solutions for high volume data processing.
  • explain the philosophy of cloud and grid computing.
  • identify tasks well suited for the different distributed computing models.
  • assess selected research papers in the field of high volume data processing.
  • explain the different cloud service models.
  • describe the different hypervisor models used for virtualization.
  • explain the MapReduce programming model.

Skills

  • define and monitor job management, storage management and security in a grid system.
  • design and implement applications of Service Oriented Computing at a global scale.
  • design, implement and run applications on a MapReduce framework.
  • design, implement and run tasks through a computer clustering management platform.

General competence

  • evaluate and apply distributed computing resources using textual and graphical interfaces.
  • revise application software to make it suitable for distributed computing.

Entry requirements

General admission criteria for the PhD programme.

Recommended previous knowledge

Experience with using a Unix/Linux operating system.

Teaching methods

Lectures, practical work in lab, project work, and presentation of papers and the course project.

The project should include both a theoretical study and a practical problem solution. The theoretical study should be presented as a lecture, and the practical solution in a shorter oral presentation. The project should also be documented in a written report covering both the theoretical study and the practical problem solution.

Compulsory learning activities

All assignments must be completed within the set deadlines and approved before the exam can be taken. Deadlines for the exercises will be published at the beginning of the semester.

Four assignments take the form of lab exercises, with written lab reports that must be submitted through Canvas.

Five assignments take the form of oral presentations, with presentation slides to be submitted through Canvas:

  • a paper presentation,
  • three status reports on the course project,
  • a presentation of the project results.

One assignment takes the form of a lecture in which the student presents the theoretical background of the course project.

An assignment that is not approved can be resubmitted up to two more times in the same semester.

In order to take the exam, the deadlines must be respected.

Approved assignments also give access to a postponed examination in the following two semesters.

Because the technologies used in the course are in constant development, approved assignments remain valid only for the following two semesters. The learning outcome from the exercises must be up to date with the standards and technologies used in the course.

Assessment

Oral exam (duration: 40 minutes) - 60% of final grade

Assignment (project report) - 40% of final grade

Grading scale: pass / fail.

Both parts must receive a passing grade in order to obtain a final grade for the course. If one of the parts receives a failing grade, that part can be retaken at the re-sit exam.

Examination support material

Oral exam: No examination support materials allowed.

Assignment (project report): All examination support materials allowed.

Course reductions

  • DAT351 - Skyløsninger og Distribuerte Dataressurser for Høg-Volum Dataprosessering - Reduction: 10 study points