INFORMS Open Forum

2024 INFORMS Data Mining Society Data Challenge


    Predicting Short Video Popularity for Future Content Success

    1.    Problem:

    In recent years, short videos have emerged as a leading content format across many sectors, experiencing exponential growth. Platforms such as Instagram, YouTube, and Netflix have responded by introducing their own short-form video features: Reels, Shorts, and Fast Laughs, respectively. Meanwhile, platforms dedicated to short-form content, such as TikTok, have soared in global popularity. This surge is attributed to the format's convenience, accessibility, and ease of creation. Beyond entertainment, short videos have expanded into marketing, advertising, news production, and education, becoming pivotal in these domains. This shift is expected to attract more creators and expand business opportunities across sectors. In this evolving media landscape, the ability to understand and predict the popularity of short-form content is essential. Accurate popularity prediction models can help content creators tailor their videos, optimize content creation, and enhance viewer engagement. These insights can also help creators and marketers refine video production and advertising strategies, boost brand awareness, and improve conversion rates.

    Predicting the popularity of short videos remains a complex and dynamic challenge due to the scarcity of public datasets and the early stage of research methodologies in this field. To address this challenge, we have curated a dataset of short videos and their associated meta-information from TikTok, the most popular short video platform. This data challenge invites participants to develop innovative predictive models that assess short video popularity through four key engagement metrics: views, hearts (likes), comments, and shares. By understanding these aspects of user engagement and content reach, we aim to foster further research on short video popularity prediction, a crucial factor in strategic decision-making in the competitive online social network environment.

    Each team's task is to use relevant features of short videos (e.g., release dates, authors' follower counts) to predict their popularity as measured by the four engagement metrics: views, hearts (likes), comments, and shares.

    2.    Timeline:

    The competition will run until September 9, 2024. During the training phase, contestants should use the provided training data to develop their models. The testing phase will be conducted on a holdout dataset whose true values are known only to the competition committee. The testing period runs from August 26 to September 9, and the holdout dataset will be released before August 26. The leaderboard on the Workshop's website will be updated once per week, and the final ranking will be released on Tuesday, September 10. We will invite the top four teams to the INFORMS Data Mining Workshop to present their solutions.

    3.    Data:

    The training data consists of 2,200 TikTok videos.

    Training Data Description

          video_id - unique identifier for each video

          author_id - encoded author identifier

          author_follower_count - total number of the author's followers

          author_following_count - total number of accounts the author is following

          author_total_heart_count - total number of hearts received across all of the author's posted videos

          author_total_video_count - total number of videos created by the author

          video_create_date - video creation date as a Unix timestamp, i.e., time elapsed since the Unix epoch (January 1, 1970, at 00:00:00 UTC)

          video_description - description of the video

          video_definition - video definition

          video_format - video format
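    The video_create_date field is stored as a raw Unix timestamp. As a minimal sketch, assuming the value counts whole seconds since the epoch (the usual Unix convention; the data description does not state the unit), it can be converted to a UTC datetime and a few calendar features with Python's standard library. The function names and the chosen features are hypothetical illustrations, not part of the challenge materials.

```python
from datetime import datetime, timezone


def parse_create_date(ts):
    """Convert a video_create_date value to a timezone-aware UTC datetime.

    Assumption: the timestamp counts whole seconds since the Unix epoch;
    the challenge description does not state the unit.
    """
    return datetime.fromtimestamp(int(ts), tz=timezone.utc)


def calendar_features(ts):
    """Hypothetical example features derived from the creation time."""
    created = parse_create_date(ts)
    return {
        "year": created.year,
        "month": created.month,
        "weekday": created.weekday(),  # Monday = 0, Sunday = 6
        "hour_utc": created.hour,
    }


print(parse_create_date(1717200000))  # 2024-06-01 00:00:00+00:00
print(calendar_features(1717200000))
```

    Working in UTC keeps the features independent of the machine's local time zone; if the timestamps turn out to be in milliseconds, the value would need to be divided by 1,000 first.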

    Four Response Variables

          video_comment_count - total number of comments the video obtained

          video_heart_count - total number of hearts the video obtained

          video_play_count - total number of plays the video obtained

          video_share_count - total number of shares the video obtained

    Test Data

    After developing its prediction model on the training data, each team will predict the four response variables for a holdout test set of 368 TikTok videos.

    4.    Evaluation Criteria:

          For each of the four response variables, we will rank all teams by their Mean Absolute Percentage Error (MAPE) on the test data, where a lower MAPE earns a better rank.

          We will then compute the average rank of each team. For instance, if a team ranks 2nd, 1st, 10th, 5th in the four response variables, then the average rank of that team is (2 + 1 + 10 + 5) / 4 = 4.5.

          Finally, all teams are ranked by the aforementioned average rank.
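    The scoring rule can be sketched in a few lines of Python. MAPE for one response variable is the mean of |y - y_hat| / |y| over the test videos, multiplied by 100; teams are then ranked per variable and the four ranks are averaged. This is a minimal sketch under stated assumptions, not the committee's scoring code: the rules do not say how zero-valued targets (for example, a video with no shares) or tied scores are handled, so here zeros are skipped and tied teams share the better rank. All function names are hypothetical.

```python
from statistics import mean


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Assumption: pairs whose true value is 0 are skipped, since the
    percentage error is undefined there.
    """
    terms = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every true value is zero")
    return 100.0 * sum(terms) / len(terms)


def rank_teams(errors):
    """Map {team: MAPE} to {team: rank}, where rank 1 is the lowest MAPE.

    Assumption: tied teams share the better rank (1, 2, 2, 4, ...).
    """
    ordered = sorted(errors.values())
    return {team: ordered.index(err) + 1 for team, err in errors.items()}


def average_ranks(mape_by_target):
    """Map {target: {team: MAPE}} to {team: average rank across targets}."""
    per_target = [rank_teams(errors) for errors in mape_by_target.values()]
    teams = per_target[0].keys()
    return {team: mean(ranks[team] for ranks in per_target) for team in teams}


def final_standings(mape_by_target):
    """Order teams by average rank, best (lowest) first."""
    avg = average_ranks(mape_by_target)
    return sorted(avg, key=avg.get)


# The worked example from the rules: ranks 2, 1, 10, 5 average to 4.5.
print(mean([2, 1, 10, 5]))  # 4.5
```

    Because MAPE divides by the true value, a small absolute miss on a video with few comments or shares can cost more than a large miss on a viral video, which may be worth keeping in mind when choosing a training loss.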

    5.    Submission:

          Each team needs to submit its predictions on the test data through this Google form. Closer to the testing phase, we will provide the test set as well as a template for predictions.

          We will release the rankings for the four response variables on August 27, September 3, and September 10. The first two releases (August 27 and September 3) are intended to help contestants improve their models; the final ranking is based on the results released on September 10.

          One member of each team must be designated as the primary contact and provide their email address in the submission form.

    6.    Written Report:

    All teams invited as finalists must submit a report of at most 5 pages summarizing their methodology, due October 6, 2024 (Anywhere on Earth). A panel of judges will choose the winners based on the presentations and the reports.

    7.    Prize:

    Finalists will be selected directly on numerical results, using the average rank across the four response variables. Each finalist team will receive one complimentary registration code for the workshop but will still need to pay for the main INFORMS conference registration (if attending). Each of the four finalist teams will receive a monetary prize. The final placement of the winners will be based on the quality of the presentation and the written methodology, as judged by our panel of judges.

          First Place Prize: 1000 USD

          Second Place Prize: 500 USD

          Third Place Prize: 250 USD

          Fourth Place Prize: 125 USD

    8.    Registration:

    Please register for the competition using the link HERE. After registering, you will receive an email with a link to access the training data. If you have any questions, feel free to contact the Data Challenge Chairs listed below.

    Data Challenge Chairs

    Hieu Pham

    Assistant Professor

    Department of Information Systems, Supply Chain, and Analytics, College of Business

    The University of Alabama in Huntsville

    hieu.pham@uah.edu

    Kaizheng Wang

    Assistant Professor

    Department of Industrial Engineering and Operations Research

    Columbia University

    kw2934@columbia.edu

    Shouyi Wang

    Associate Professor

    Department of Industrial, Manufacturing & Systems Engineering

    The University of Texas at Arlington

    shouyiw@uta.edu



    ------------------------------
    Shouyi Wang, Ph.D.
    Associate Professor
    Department of Industrial, Manufacturing & Systems Engineering
    Center on Stochastic Modeling, Optimization, and Statistics
    The University of Texas at Arlington
    500 West First Street, Arlington, TX 76019
    Office: Woolf Hall 420H
    Tel: 817-272-2921
    Fax: 817-272-3406
    Email: shouyiw@uta.edu
    Web: https://mentis.uta.edu/explore/profile/shouyi-wang
    ------------------------------