1: Big Data Basics

1. Big Data Use Cases

Task: Research and present real-world examples of Big Data applications across different industries (e.g., healthcare, finance, retail, social media).

  • Deliverables: A presentation that highlights at least 3 use cases of Big Data in different industries. Include the problems Big Data helped solve, the type of data involved, and the impact on decision-making or performance.

2. Big Data vs. Traditional Data

Task: Investigate and compare Big Data with traditional databases and data processing methods.

  • Deliverables: A comparison chart or infographic showing the differences between Big Data and traditional data systems. Focus on factors like volume, velocity, variety, and the technologies used.

3. The 5 Vs of Big Data

Task: Explore and explain the 5 Vs of Big Data: Volume, Velocity, Variety, Veracity, and Value.

  • Deliverables: Create a visual presentation or poster that defines and gives real-world examples of each V. Discuss how each characteristic affects how data is processed and stored.

4. Data Privacy and Ethics in Big Data

Task: Research the ethical challenges and privacy concerns related to Big Data.

  • Deliverables: A group debate or panel discussion, with each team member taking a different stance (e.g., the company’s perspective, the consumer’s perspective, the regulator’s perspective). Conclude with a summary of the most critical privacy challenges and proposed solutions.

5. Horizontal vs. Vertical Scaling

Task: Investigate and compare horizontal scaling (adding more servers) and vertical scaling (adding more power to one server).

  • Deliverables: A side-by-side analysis (e.g., slide deck or diagram) outlining the pros and cons of each approach. Include real-world examples of companies that use horizontal or vertical scaling.
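
As a starting point for the diagram, here is a minimal Python sketch of one idea behind horizontal scaling: spreading records across several servers by hashing a key. The function names (`shard_for`, `distribute`) and the sample keys are hypothetical and chosen only for illustration. Real systems add replication, rebalancing, and failure handling on top of this.

```python
import hashlib


def shard_for(key: str, num_servers: int) -> int:
    """Map a key to a server index with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def distribute(keys, num_servers):
    """Group keys by the server index that would store them."""
    shards = {i: [] for i in range(num_servers)}
    for key in keys:
        shards[shard_for(key, num_servers)].append(key)
    return shards


if __name__ == "__main__":
    # Horizontal scaling: the same keys spread over more machines.
    sample_keys = ["user-1", "user-2", "user-3", "user-4"]
    print(distribute(sample_keys, 3))
```

Vertical scaling would keep a single server and give it more CPU, memory, or disk. One trade-off worth noting in the analysis: with simple modulo hashing like this, changing the number of servers moves most keys to a different shard, which is why production systems often use consistent hashing instead.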

6. Batch vs. Stream Processing

Task: Compare batch processing and stream processing and identify situations where each method is more effective.

  • Deliverables: Create a decision tree or flowchart that helps companies decide when to use batch vs. stream processing, based on factors like data size, processing time, and use case scenarios.
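
To make the contrast concrete before building the flowchart, here is a minimal Python sketch that computes the same average in two ways. The function names are hypothetical, and the example uses plain Python rather than any real batch or streaming framework.

```python
def batch_average(values):
    """Batch style: wait for the full dataset, then compute one result."""
    values = list(values)
    return sum(values) / len(values)


def stream_average(values):
    """Stream style: update the result as each value arrives."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
        yield total / count  # an up-to-date answer after every event


if __name__ == "__main__":
    readings = [2, 4, 6]
    print(batch_average(readings))         # one answer at the end
    print(list(stream_average(readings)))  # one answer per incoming value
```

The batch version needs all the data before it can answer, while the streaming version always has a current answer but only ever sees one value at a time. That difference in latency and in how much data is held at once is a useful axis for the decision tree.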

7. Overview of Distributed Processing Systems

Task: Research distributed processing and identify key technologies and frameworks like Hadoop, Spark, and Kafka.

  • Deliverables: An overview of each framework, its main purpose, and examples of companies or projects that use it. Present findings in a group discussion or as an interactive Q&A with the class.
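
For the Hadoop part of the overview, the following Python sketch shows the map, shuffle, and reduce steps of a MapReduce-style word count. It runs on a single machine and does not use any Hadoop API. All function names are hypothetical, and the point is only to show the shape of the computation that a distributed framework spreads across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "big ideas"]))
```

In a real cluster, each document (or block of a file) would be mapped on a different node, and the shuffle step would move data over the network so that all values for one key reach the same reducer.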

8. Big Data Architecture

Task: Research different types of Big Data architectures (e.g., Lambda, Kappa, Data Lakes).

  • Deliverables: A visual diagram of each architecture, including a brief description of how data flows through the system. Discuss the strengths and weaknesses of each in terms of scalability, complexity, and use cases.
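
As an aid for the Lambda diagram, here is a minimal Python sketch of its three layers using in-memory counters. The function names and the sample events are hypothetical. A real deployment would use a batch engine, a stream processor, and a serving database in place of these stand-ins.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute counts from the full historical dataset."""
    return Counter(master_dataset)


def speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(recent_events)


def serve(batch, speed, key):
    """Serving layer: answer a query by merging both views."""
    return batch[key] + speed[key]


if __name__ == "__main__":
    history = ["click", "view", "click"]  # already processed by the batch layer
    recent = ["click"]                    # not yet included in a batch run
    print(serve(batch_view(history), speed_view(recent), "click"))
```

A Kappa architecture drops the separate batch layer and treats everything as a stream, rebuilding views by replaying the event log when needed. Comparing that with the two code paths above is one way to frame the complexity trade-off.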

9. Big Data Frameworks Showdown

Task: Each group chooses a Big Data framework (Hadoop, Spark, Flink, Kafka, etc.) and dives deep into its features, uses, and limitations.

  • Deliverables: A mini "framework showdown" where each group pitches its chosen technology to the class, explaining why it’s the best tool for specific Big Data tasks.

10. The Future of Big Data

Task: Research current developments and the future of Big Data, focusing on emerging technologies, trends like edge computing, and the integration of AI and machine learning.

  • Deliverables: A group presentation highlighting the most exciting innovations in Big Data. Include predictions for the next 5 to 10 years, backed up by research from recent articles, blogs, and industry reports.

Bonus Assignment: Big Data in Everyday Life

Task: Explore how Big Data is used in daily life, from social media to online shopping, and how it affects consumer behavior.

  • Deliverables: A creative presentation or video illustrating the journey of a data point, from its creation (e.g., when you like a post on Instagram) to how it’s processed, analyzed, and used to influence decisions or recommendations.