2024 INFORMS Data Mining Society Data Challenge
Predicting Short Video Popularity for Future Content Success
1. Problem:
In recent years, short videos have emerged as a leading content format across various sectors, experiencing exponential growth. Platforms such as Instagram, YouTube, and Netflix have adapted by introducing their own short-form video features: Reels, Shorts, and Fast Laughs, respectively. Moreover, platforms dedicated to short-form content, like TikTok, have soared in global popularity. This surge is attributed to the format's convenience, accessibility, and ease of creation. Beyond entertainment, short videos have become pivotal in marketing, advertising, news production, and education. This shift is expected to foster more creators and expand business opportunities across different sectors. In this evolving media landscape, the ability to understand and predict the popularity of short-form content becomes essential. Accurate popularity prediction models can empower content creators to better tailor their videos, optimize content creation, and enhance viewer engagement. These insights can also help marketers refine video production and advertising strategies, boost brand awareness, and improve conversion rates.
Currently, predicting the popularity of short videos remains a complex and dynamic challenge, owing to the scarcity of public datasets and the early state of research methodologies in this field. To address this challenge, we have curated a dataset of short videos with associated meta-information from TikTok, the most popular short video platform. This data challenge invites participants to research and develop innovative predictive models that can assess short video popularity through four key engagement metrics: views, hearts (likes), comments, and shares. By understanding these aspects of user engagement and content reach, we aim to foster further research and development in short video popularity prediction, a crucial factor for strategic decision-making in the competitive online social network environment.
The task of each contestant team is to use relevant features of short videos (e.g., release date, the author's follower count) to predict their popularity, measured through the four engagement metrics above: views, hearts (likes), comments, and shares.
2. Timeline:
The competition will run until September 9th, 2024. During the training phase, contestants should use the provided training data to develop their models. The testing phase will be conducted on a holdout dataset for which only the competition committee knows the true values. The testing period will be August 26 to September 9, and the holdout dataset will be released prior to August 26. The leaderboard on the Workshop's website will be updated once per week. The final ranking will be released on September 10. We will invite the top four competitors to the INFORMS Data Mining Workshop to present their solutions.
3. Data:
The training data consists of 2,200 TikTok videos.
Training Data Description
● video_id - unique identification for each video
● author_id - encoded author identification
● author_follower_count - total number of the author's followers
● author_following_count - total number of accounts the author is following
● author_total_heart_count - total number of hearts across all posted videos
● author_total_video_count - total number of videos created by author
● video_create_date - video creation date as a Unix timestamp (time elapsed since the Unix epoch, January 1, 1970, at 00:00:00 UTC)
● video_description - description of the video
● video_definition - video definition
● video_format - video format
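Because video_create_date is stored as a Unix timestamp, it usually needs to be converted before it can serve as a feature. The following is a minimal sketch of one way to do this, not part of the official challenge materials. It assumes the timestamps are expressed in seconds (the description above does not state the unit), and the function name timestamp_features and the chosen features are hypothetical.

```python
from datetime import datetime, timezone


def timestamp_features(video_create_date: int) -> dict:
    """Derive simple calendar features from a Unix timestamp.

    Assumption: the timestamp is in seconds, not milliseconds.
    """
    created = datetime.fromtimestamp(video_create_date, tz=timezone.utc)
    return {
        "year": created.year,
        "month": created.month,
        "day_of_week": created.weekday(),  # Monday = 0, Sunday = 6
        "hour_utc": created.hour,
    }


# Illustrative value only; it is not taken from the challenge data.
example = timestamp_features(1700000000)
print(example)
```

If the values in the data turn out to be in milliseconds, they would need to be divided by 1000 before this conversion.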
Four Response Variables
● video_comment_count - total number of comments the video obtained
● video_heart_count - total number of hearts the video obtained
● video_play_count - total number of plays the video obtained
● video_share_count - total number of shares the video obtained
Test Data
After developing a prediction algorithm on the training data, each contestant team will generate predictions for the holdout test set, which consists of 368 TikTok videos.
4. Evaluation Criteria:
● For each of the four response variables, we will rank all the teams by the Mean Absolute Percentage Error (MAPE) on the test data.
● We will then compute the average rank of each team. For instance, if a team ranks 2nd, 1st, 10th, 5th in the four response variables, then the average rank of that team is (2 + 1 + 10 + 5) / 4 = 4.5.
● Finally, all teams are ranked by the aforementioned average rank.
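The ranking procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the committee's scoring code: it assumes no true value is zero (MAPE is undefined in that case) and breaks ties by input order, since the rules above do not specify either detail. The function names and the team scores are hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumption: no value in `actual` is zero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def average_ranks(mape_by_team):
    """Map each team to its average rank across the response variables.

    mape_by_team: {team: [MAPE for each response variable]}.
    Lower MAPE gets a better (smaller) rank. Ties are broken by input
    order, which is an assumption and not a stated rule.
    """
    teams = list(mape_by_team)
    n_metrics = len(next(iter(mape_by_team.values())))
    rank_sums = {team: 0 for team in teams}
    for m in range(n_metrics):
        ordered = sorted(teams, key=lambda t: mape_by_team[t][m])
        for rank, team in enumerate(ordered, start=1):
            rank_sums[team] += rank
    return {team: rank_sums[team] / n_metrics for team in teams}


# Made-up MAPE values for three hypothetical teams, four metrics each.
scores = {
    "A": [10, 20, 30, 40],
    "B": [15, 10, 35, 30],
    "C": [20, 30, 10, 50],
}
ranks = average_ranks(scores)
print(ranks)
```

Teams would then be ordered by ascending average rank to obtain the final standing.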
5. Submission
● Each team needs to submit its predictions on the test data through this Google form. Closer to the testing period, we will provide the test set as well as a template for predictions.
● We will release the ranking results for the four response variables on August 27th, September 3rd, and September 10th. The first two releases (on Aug 27 and Sep 3) serve to help contestants improve their models. The final ranking is computed based on the result released on Sep 10.
● One member in each team needs to be identified as the primary contact and provide their email in the submission form.
6. Written Report
All teams invited as finalists are required to submit a report of at most 5 pages summarizing their methodologies, due on October 6th, 2024 (Anywhere on Earth). Winners will be chosen by a panel of judges based on the presentations and the reports.
7. Prize
The finalists will be chosen directly based on numerical results, using the average ranking across the four response variables. Each finalist team will receive one complimentary registration code for the workshop, but will still need to pay for the main INFORMS conference registration (if attending). Each of the four finalist teams will receive a monetary prize. The final placement of the winners will be based on the quality of the presentation and the written methodology, as assessed by our panel of judges.
● First Place Prize: 1000 USD
● Second Place Prize: 500 USD
● Third Place Prize: 250 USD
● Fourth Place Prize: 125 USD
8. Registration
Please register for the competition HERE. After registration, you will be emailed a link to access the training data. If you have any questions, feel free to contact the data challenge chairs below.
Data Challenge Chairs
Hieu Pham
Assistant Professor
Department of Information Systems, Supply Chain, and Analytics, College of Business
The University of Alabama in Huntsville
hieu.pham@uah.edu
Kaizheng Wang
Assistant Professor
Department of Industrial Engineering and Operations Research
Columbia University
kw2934@columbia.edu
Shouyi Wang
Associate Professor
Department of Industrial, Manufacturing & Systems Engineering
The University of Texas at Arlington
shouyiw@uta.edu
------------------------------
Shouyi Wang, Ph.D.
Associate Professor
Department of Industrial, Manufacturing & Systems Engineering
Center on Stochastic Modeling, Optimization, and Statistics
The University of Texas at Arlington
500 West First Street, Arlington, TX 76019
Office: Woolf Hall 420H
Tel: 817-272-2921
Fax: 817-272-3406
Email: shouyiw@uta.edu
Web: https://mentis.uta.edu/explore/profile/shouyi-wang
------------------------------