These are my two course projects for Practical Data Science at CMU.
In the mid-term project, I made a tutorial to introduce some basic methods for doing learning analytics, particularly focusing on using different sources of data about students to predict their outcomes and find at-risk students.
In the final project, my teammates and I scraped and analyzed review texts and related feature data from the school pages to answer the core question: what makes a good public high school in the NYC area?
Check out data set details at >>
Open University Learning Analytics Dataset (OULAD) | Niche Website
Showed how to build at-risk student prediction models in Python, specifically using the pandas, numpy, sklearn, and matplotlib libraries.
Specifically, I first used students' demographic data to build a basic prediction model. Then I used assessment data and VLE (Virtual Learning Environment) data to build week-w prediction models within a loop, so that each week-w model draws on the data accumulated up to week w. Results showed that the models' recall reached its maximum at around the midpoint of the course.
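The week-w loop described above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's actual code: the data is synthetic, the column names (`student_id`, `week`, `clicks`) and the helper `week_w_recall` are hypothetical and do not follow the OULAD schema, and logistic regression is an assumed stand-in for whichever classifier the tutorial used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_students, n_weeks = 400, 10

# Synthetic stand-in labels: 1 = at-risk, 0 = not at-risk.
at_risk = rng.integers(0, 2, size=n_students)

# Long-format weekly activity log, one row per (student, week).
log = pd.DataFrame({
    "student_id": np.repeat(np.arange(n_students), n_weeks),
    "week": np.tile(np.arange(1, n_weeks + 1), n_students),
})
# Assumption for the simulation only: at-risk students click less on average.
mean_clicks = np.where(at_risk[log["student_id"].to_numpy()] == 1, 8, 12)
log["clicks"] = rng.poisson(mean_clicks)


def week_w_recall(log, labels, max_week):
    """Train one model per week w, using only data observed up to week w."""
    recalls = {}
    for w in range(1, max_week + 1):
        seen = log[log["week"] <= w]
        # One feature column per week seen so far, so features accumulate with w.
        wide = seen.pivot(index="student_id", columns="week", values="clicks")
        X = wide.sort_index().to_numpy()
        y = labels[wide.sort_index().index.to_numpy()]
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=0, stratify=y
        )
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        # Recall on the at-risk class: share of at-risk students that are caught.
        recalls[w] = recall_score(y_te, model.predict(X_te))
    return recalls


recalls = week_w_recall(log, at_risk, n_weeks)
for week, value in recalls.items():
    print(f"week {week}: recall = {value:.2f}")
```

Recall is tracked here because missing an at-risk student is usually the costlier error. With real data, the per-week recall values could be plotted with matplotlib to see where the curve peaks.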
Led the team to clarify the three research questions we wanted to explore: (1) What do parents and students usually focus on when they write reviews about high schools? (2) How are public high schools in New York distributed based on different factors? (3) What factors might influence the average graduation rate of public high schools in NY?
Took responsibility for crawling and cleaning the data using Selenium, preparing it for the later analyses.