PCS951 Cloud and Distributed Resources for High Volume Data Processing
Course description for academic year 2024/2025
Contents and structure
The course presents technologies and principles for distributed computing in computer clusters, in grid systems and in the cloud.
The students will do practical work on many of the technologies covered by the course. Examples of technologies include systems for grid computing (e.g. ARC, JAliEn), job schedulers (e.g. Slurm, HTCondor, TORQUE), systems for distributed parallel processing (e.g. MPICH for MPI, Hadoop, Spark), software management in clusters (e.g. Kubernetes, Nomad, Puppet), container technologies (e.g. Podman, Apptainer, Docker) and more.
The course concentrates on challenges related to safe and efficient utilisation of computing resources managed by heterogeneous operators, including protection concerns between the project owner and facility management.
Learning Outcome
Upon completion of the course the candidate should be able to:
Knowledge
- discuss challenges and solutions for high volume data processing.
- explain the philosophy of cloud and grid computing.
- identify tasks well suited for the different distributed computing models.
- assess selected research papers in the field of high volume data processing.
- explain the different cloud service models.
- describe the different hypervisor models used for virtualization.
- explain the MapReduce programming model.
Skills
- define and monitor job management, storage management and security in a grid system.
- design and implement applications of Service Oriented Computing at a global scale.
- design, implement and run applications on a MapReduce framework.
- design, implement and run tasks through a computer clustering management platform.
General competence
- evaluate and apply distributed computing resources using textual and graphical interfaces.
- revise application software to make it suitable for distributed computing.
Entry requirements
General admission criteria for the PhD programme.
Recommended previous knowledge
Experience with using a Unix/Linux operating system.
Teaching methods
Lectures, practical work in the lab, project work, and presentation of papers and the course project.
The project should include both a theoretical study and a practical problem solution. The theoretical study should be presented as a lecture and the practical solution in a shorter oral presentation. The project should also be documented in a written report covering both parts.
Compulsory learning activities
All assignments must be completed within the set deadlines and approved before the exam can be taken. Deadlines for the exercises will be published at the beginning of the semester.
Four assignments have the form of lab exercises, with written lab reports that must be handed in through Canvas.
Five assignments have the form of oral presentations, with presentation slides to be handed in through Canvas:
- a paper presentation,
- three status reports on the course project,
- a presentation of the project results.
One assignment has the form of a lecture in which the student must give a presentation of the theoretical background of the course project.
An assignment that is not approved can be delivered two more times in the same semester.
In order to take the exam, the deadlines must be respected.
Approved exercises also give access to a postponed examination in the following two semesters.
Because the technologies used in the course are in constant development, the lifetime of approved assignments is limited to the following two semesters. The learning outcome from the exercises must be up to date with the standards and technologies used in the course.
Assessment
Oral exam (duration: 40 minutes) - 60% of final grade
Assignment (project report) - 40% of final grade
Grading scale: pass / fail.
Both parts must get a passing grade in order to get a final grade for the course. In case one of the parts gets a failing grade, that part can be taken as a re-sit exam.
Examination support material
Oral exam: No examination support materials allowed.
Assignment (project report): All examination support materials allowed.
Course reductions
- DAT351 - Skyløsninger og Distribuerte Dataressurser for Høg-Volum Dataprosessering - Reduction: 10 study points