2017 Problem Solving Announcement

Instructions for the 2017 Problem Solving Competition are now posted.

Data Analytics for Railroad Empty-to-Load Peak Kips Prediction

The INFORMS Railroad Application Section (RAS) is pleased to announce the 2017 edition of the RAS Problem Solving Competition.


This competition is designed to introduce participants to the railroad industry and its wealth of analytics problems. Railroad operations are inherently complex and large-scale, and they are a source of many exciting and challenging research topics for practitioners of operations research and management science.


First Prize: $2000
Second Prize: $1000
Third Prize: $750

Each winning team will also receive a certificate declaring their prize winning placement in the contest.

In addition, the first prize winner’s contribution will be considered for publication in Networks. The paper will still go through the journal’s normal refereeing procedure, but refereeing and publication will be expedited. More details about this journal can be found here.


Any practitioners of operations research and management science who are interested in solving problems in the railroad domain using OR and analytics tools are welcome to participate. Registration is open to all, with the exception of RAS officers and organizing committee members, who are NOT eligible to participate. Likewise, members of the organizing committee may NOT help or guide any participating team.

Teams of up to three members can participate. At least one member of each prize winning team must be available to present the team’s approach and results at the INFORMS Annual Meeting.


Every team must register by the registration deadline given below to participate in the contest. To register, please e-mail the following information by the deadline. For each team member, provide:

  • Member Name, Email, Organization, Position.
  • Prior Experience in problems related to Railroad analytics (Y/N).
  • Brief statement describing what motivated you to participate.


After submitting your registration email, you will receive an email confirming your team’s successful registration and eligibility.

Important dates

May 20, 2017: Challenge problem published

June 1, 2017: Registration opens

June 30, 2017: Registration deadline

July 15, 2017: Deadline for submitting clarification questions

September 10, 2017: Deadline for submitting full solution and report

October 10, 2017: Announcement of finalists

October 22, 2017: Finalist presentations and winner announcement at the INFORMS Annual Meeting


After the solution submission deadline, completed submissions will be reviewed by the RAS problem competition judging panel. Winners will be selected after collaboration and approval by all panel judges.


All participating teams must submit the following by the due date:

  1. A report not exceeding 10 pages (including the cover page), with normal margins, double spacing, and a font size of 10. The report must include a title page listing the team members and their affiliations, and a clear and concise description of the problem formulation, solution method, implementation details (software/hardware), and results.
  2. Problem solution. Make sure you provide the solution in the prescribed format.
  3. Computer program/model.


The judging panel will select finalists based on the following criteria:  

  1. Novelty and elegance of the model proposed
  2. Solution quality
  3. Solution approach
  4. Computation time/complexity
  5. Quality of the report submitted


The finalists will make a presentation at the INFORMS Annual Meeting. Aside from the previous factors, the judging panel will take into consideration the clarity of the presentation to make a final decision about the first, second and third places for the competition. Note that being among the finalists and presenting at the Annual Meeting does not guarantee a finalist will receive first, second or third place. The decision of the judges is final. 


E-mail all files described above, specifying your team’s name in the e-mail’s subject. Attach only these files: (i) the report, (ii) the solution files and (iii) a zip file (.zip) containing all supporting (source) files.


Can I Publish?

Yes, you can. In fact, RAS encourages you to do so. Anyone can use the RAS competition problem and the provided datasets in their publication. Each year's competition has its own URL, which will not change, so you can cite that URL when referencing a specific year's problem.


Questions and Answers

If you have any questions about the competition problem, submit them to the organizers. In fairness to all participants, RAS will publish a Q&A document on the website, and all participants will also receive it by e-mail. If further questions arrive, the same process will be repeated. Questions may be collected and answered once a week. No questions will be entertained after July 15, 2017.


If you have any suggestions or ideas for future competitions, please feel free to contact April Kuo or Yanfeng Ouyang. All feedback is greatly appreciated.


2017 Organizing Committee

Clark Cheng, Norfolk Southern

Pooja Dewan, BNSF

Jerry Kam, CSX

Xiaopeng Li, University of South Florida

Yanfeng Ouyang, University of Illinois at Urbana-Champaign

Steven Tyber, GE Transportation

Shanshan Wang, BNSF

Aihong Wen, CSX

Questions and Answers

  • 1. Train symbol

In trn_id, what is the meaning of the train symbol? We found that the train symbol consists of 6 characters, and it seems that it can be split into 2 parts of 3 letters each. For example, DOWULF can be seen as DOW and ULF. Does each part represent a train station? If so, does the first part represent the departure station and the second part the destination station? What is the relation between a train symbol and a trip?

       Yes. The train symbol abbreviates a train’s origin and destination (first three letters: origin; last three letters: destination). A trn_id uniquely identifies a train trip.

  • 2. Train section, train day and train priority

We found that train priority ranges from A to W. Suppose there are 5 trains waiting to depart from a train station. Does it mean that a train with an earlier letter, say A, will leave before a train with a later letter, say B? Another guess is that it sets the order in which trains run: if two trains both come to a siding, the train with higher priority proceeds ahead of the other.

       Although it is named “train priority”, it does not really relate to the priority of a train.

  • What is the meaning of train section in the trn_id column? It takes 10 values, ranging from 0 to 9. Does this number denote, say, a time unit such as a month, or a zone id (e.g., if there are 10 zones on the map)? It would be helpful to learn more about what it means.
  • We found that the train day (in the trn_id column) ranges from 1 to 31. Does it mean the departure day within a month?

       The meaning of train section depends on the train type. For certain train types, train section means the priority of the train within that train type. For some other train types, it is a counter of the number of trains within the year.

       For most train types, train day means the departure day within a month. For some train types, it is a counter of the number of trains within the year.

      Please refer to the following table:

Train Type | Train Priority | Meaning of Train section | Meaning of Train day
[Table rows not preserved. The only surviving cell, under "Meaning of Train day", reads: Departure day of a month]


       For example, if a train has train type F, its train section is a counter of the number of trains, regardless of its train priority. If a train has train type J, A or X and its train priority equals T, its train section is also a counter. For all train types other than F, H, J, A and X, train section means priority within that train type.

     Look at the following example trn_id:

FABCDEF101U means this train is the 101st F-type train within the year, with train symbol ABCDEF and train priority U.
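The field layout described in answers 1 and 2 can be sketched in code. This is only an illustration under stated assumptions: it assumes every trn_id is exactly 11 characters (1 type letter, 6 symbol letters, 3 digits, 1 priority letter), as in the two examples FABCDEF101U and EWKKQDP068A, and it assumes that for J/A/X trains with priority T all three digits form the counter, as they do for type F. The day/section split follows the answer to question 22 (first two digits: train day; last digit: train section). The names `TrainId` and `parse_trn_id` are hypothetical.

```python
from typing import NamedTuple, Optional


class TrainId(NamedTuple):
    train_type: str         # first character, e.g. "F"
    origin: str             # first three letters of the train symbol
    destination: str        # last three letters of the train symbol
    digits: str             # raw three-digit field
    priority: str           # trailing letter ("train priority")
    counter: Optional[int]  # set when the digits are a yearly counter
    day: Optional[int]      # set when the digits encode day + section
    section: Optional[int]


def parse_trn_id(trn_id: str) -> TrainId:
    """Split a trn_id into fields (assumed fixed-width layout)."""
    if len(trn_id) != 11 or not trn_id[7:10].isdigit():
        raise ValueError(f"unexpected trn_id layout: {trn_id!r}")
    train_type, symbol = trn_id[0], trn_id[1:7]
    digits, priority = trn_id[7:10], trn_id[10]

    counter = day = section = None
    if train_type == "F" or (train_type in {"J", "A", "X"} and priority == "T"):
        # Counter of trains within the year.
        counter = int(digits)
    elif train_type not in {"F", "H", "J", "A", "X"}:
        # Assumed split, per the answer to question 22: DD + S.
        day, section = int(digits[:2]), int(digits[2])
    # Type H, and J/A/X without priority T, are left uninterpreted.
    return TrainId(train_type, symbol[:3], symbol[3:], digits,
                   priority, counter, day, section)
```

Leaving the remaining type/priority combinations uninterpreted is deliberate, since the text above does not say how their digits should be read.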

  • 3. eqp_axle_nbr

Given the description, we suppose it means the location of a wheel on a car. Suppose a car has 9 axles, so that there are 9 wheels on each side. Assume the wheel closest to the head of the car has number 1. Does that mean the wheel with number 9 is at the tail of the car?

     There is no head and tail concept of a car. However, the smaller the eqp_axle_nbr is, the closer the axle is to the brake end of the car.


  • 4. Equipment

What is an equipment? Does it mean something loaded onto a car, or is it just another name for a car (i.e., an equipment = a car)?

      Yes. An equipment = a car.

  • 5. car_initial

For example, what does ‘EQVI’ mean?

      In most cases car_initial indicates the car owner. Generally, car_initial and car_number together uniquely identify a car within North America.

  • 6. aar_ct_c

In AAR Car Type Codes Explained & Resources, is the car description on that page correct for aar_ct_c?

For example, D113 is a locomotive because of the letter D. Then, what does 113 mean?

      You can refer to that website for explanation. The number gives other characteristics of the car, for example, length, height, etc.

  • 7. eqp_grp

We found that there are 9 distinct codes in eqp_grp. They appear to correspond to BOXC, FLAT, GOND, HOPP, IFLT, MFLT, MISC, TANK and VFLT, as found in the Railroad article on the wiki. Is this guess for eqp_grp correct? Furthermore, what is the difference between aar_ct_c and eqp_grp?

       Eqp_grp indicates the logical grouping of car types. For example, BOXC means box cars, FLAT means flat cars. aar_ct_c indicates AAR unique designation of car types. Eqp_grp is a more general categorization compared to aar_ct_c.

  • 8. Missing data

In the training dataset, row 7032197, an observation of unq_whl_id WHLA56870R1-2011-01-08::2014-01-08, has 13 variable values missing, shown as “NA”. Are these values indeed missing, or is this a mistake caused by reading the file incorrectly?

       There are missing data in the datasets, and you are expected to clean them yourself.

  • 9. Data for all wheels

Are there sufficient data for all the wheels? For example, suppose a car contains 9 wheels, does the training data contain all 9 wheels?

        Not necessarily. The data is collected based on wheels that have been repaired. And note that if a wheel needs to be repaired, its mate wheel will also be replaced. So you will expect to see the sensor readings for a wheel and its mate wheel as follows (unq_whl_id below is only for demonstration. The specific id might not necessarily be in the training/testing dataset):

EQVI587399L5-2007-06-28::2010-06-28 & EQVI587399R5-2007-06-28::2010-06-28  
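The pairing of a wheel with its mate can be sketched as a string transformation on unq_whl_id. This is a minimal sketch that assumes the id always has the shape shown in the example above (car letters, car digits, a side letter L or R, an axle number, then the date range) and that the mate wheel shares the same axle and dates, as it does in that example. The helper name `mate_wheel_id` is hypothetical.

```python
import re

# Car initial + car number, side letter, then axle number and date range.
_SIDE = re.compile(r"^([A-Z]+\d+)([LR])(\d+-.*)$")


def mate_wheel_id(unq_whl_id: str) -> str:
    """Return the id of the wheel on the opposite side of the same axle."""
    match = _SIDE.match(unq_whl_id)
    if match is None:
        raise ValueError(f"unexpected unq_whl_id layout: {unq_whl_id!r}")
    prefix, side, rest = match.groups()
    return prefix + ("R" if side == "L" else "L") + rest
```

A lookup like this may help when checking whether both wheels of a repaired axle are present in a dataset. It says nothing about whether the mate's readings actually exist.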

  • 10. Sensor health

Can we assume that all the sensors work well (i.e., perfectly)? In other words, can we trust the sensors, so that any strange number collected is never caused by a failed sensor?

       Please refer to assumptions (c) and (d) for this question.

  • 11. Data transformation

Do we need to predict the peak kips only for the rows where an L (loaded) follows an E (empty) run? This is mentioned in the document, but the test data contain a mix of L and E type rows.

       Yes, you will need to transform data (both training and testing) and focus only on E-L prediction.
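One way to picture this transformation is to order each wheel's readings in time and keep only the loaded readings that directly follow an empty one. The sketch below is an illustration, not the organizers' procedure: the field names `status` (with values "E" and "L") and `ts` (a sortable reading timestamp) are hypothetical stand-ins for whatever the dataset actually calls them, and readings are plain dictionaries.

```python
from itertools import groupby


def empty_to_load_pairs(readings):
    """Return (previous_empty, current_loaded) pairs for each wheel.

    `readings` is an iterable of dicts with the keys "unq_whl_id",
    "ts" and "status" (hypothetical names for this sketch).
    """
    ordered = sorted(readings, key=lambda r: (r["unq_whl_id"], r["ts"]))
    pairs = []
    for _, group in groupby(ordered, key=lambda r: r["unq_whl_id"]):
        rows = list(group)
        # Compare each reading with the one immediately before it.
        for prev, cur in zip(rows, rows[1:]):
            if prev["status"] == "E" and cur["status"] == "L":
                pairs.append((prev, cur))
    return pairs
```

Each returned pair holds the empty reading that may supply predictors and the loaded reading whose peak kips is the prediction target.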

  • 12. Units

  What are the units of the grs_rail_wgt column?       

        It is in pounds.

  • 13. About grs_rail_wgt and tare

Does the gross weight column include the tare weight of the car?

        Yes. The gross weight column represents the weight that the particular wheel type can sustain, which includes the tare weight of the car and the weight of its load. Note, however, that the tare weight of the car in the datasets is in tons.
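Because grs_rail_wgt is in pounds while tare is in tons, the two need a common unit before they are combined. The sketch below assumes the US short ton of 2,000 lb, which the Q&A does not state explicitly. The function names and the sample numbers are illustrative only.

```python
LB_PER_TON = 2000.0  # assumption: US short ton; not confirmed by the Q&A


def tare_in_pounds(tare_tons: float) -> float:
    """Convert the tare weight from tons to pounds."""
    return tare_tons * LB_PER_TON


def max_load_in_pounds(grs_rail_wgt_lb: float, tare_tons: float) -> float:
    """Gross rail weight minus tare, both expressed in pounds."""
    return grs_rail_wgt_lb - tare_in_pounds(tare_tons)


# Illustrative values only: a 286,000 lb gross limit and a 30-ton tare.
print(max_load_in_pounds(286000.0, 30.0))  # 226000.0
```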

  • 14.

If we need to predict peak kips in the [next loaded condition], can we assume that we know what the load weight will be? That is, is the peak kip prediction based on [data about last run only], or [data about last run + load in current run]?       

        Yes. You can use car weight, speed, age and detector location (vndr_edr_nme) as predictors.

  • 15.

Do the train priorities and car initials have any particular significance for the speed/load constraints of the train?

        Please refer to Answers for 2 and 5.

  • 16.

For the variable "Kipdays" in the section 3.1 data description, it mentions that kipdays is -999 when the conditions are not met for "KipDays5". What does "KipDays5" mean?

       On this line the format is a bit off. 5 here denotes the footnote number.

  • 17.

We found some values of the variable "edr_eqp_spd" (i.e. the speed of the wheel) are 0. Does it mean that the wheel is measured while the train or the wheel stops exactly over the WILD Detectors?

       Please refer to assumption (c) for this question.

  • 18.

We also found some values of the variable "whl_dyn_kips" are negative. Is that possible? Or it is an error in the measurement?

       This question can be answered by looking at the equation in the box at Page 1. Please note that you are expected to apply data cleaning if needed.

  • 19. Start_date/end_date

(1) What are the meanings of "start_date" and "end_date" in the variable "unq_whl_id"? Does "start_date" mean the first date the wheel was used? Does "end_date" mean the date the wheel was discarded, or something else?

(2) Do the start dates and end dates in the unq_whl_id represent the dates when the wheel was mounted and dismounted? For example, there are several occurrences of wheels for which the difference between the end date and the start date is exactly 3 years, a suspiciously round number. Is there some sort of expiration date on wheels, which forces them to be removed from a car even if they do not appear to have developed any defect? If so, is there a way to know this theoretical maximal lifetime?

       Start_date and end_date indicate when a wheel is attached to and removed from a car, respectively. The difference gives the duration of a specific wheel. For example, the unq_whl_id EQVI587399L5-2007-06-28::2010-06-28 indicates that the wheel was attached to equipment EQVI587399 at fifth axle on the left-side and was used from 2007-06-28 to 2010-06-28.

       The 3-year span is an empirical number used when no previous repair date is found. There is no wheel expiration date. You should look at the data to get the expected lifetime of a wheel.
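The structure of unq_whl_id described in this answer can be unpacked as follows. This is a sketch under the assumption that every id matches the pattern of the example (letters for car_initial, digits for car_number, a side letter L or R, the axle number, then start and end dates in YYYY-MM-DD form separated by "::"). The names `WheelId` and `parse_unq_whl_id` are hypothetical.

```python
import re
from datetime import date
from typing import NamedTuple


class WheelId(NamedTuple):
    car_initial: str
    car_number: str
    side: str      # "L" or "R"
    axle: int      # eqp_axle_nbr position encoded in the id
    start: date    # date the wheel was attached to the car
    end: date      # date the wheel was removed from the car


_PATTERN = re.compile(
    r"^([A-Z]+)(\d+)([LR])(\d+)-(\d{4}-\d{2}-\d{2})::(\d{4}-\d{2}-\d{2})$"
)


def parse_unq_whl_id(unq_whl_id: str) -> WheelId:
    match = _PATTERN.match(unq_whl_id)
    if match is None:
        raise ValueError(f"unexpected unq_whl_id layout: {unq_whl_id!r}")
    initial, number, side, axle, start, end = match.groups()
    return WheelId(initial, number, side, int(axle),
                   date.fromisoformat(start), date.fromisoformat(end))


wheel = parse_unq_whl_id("EQVI587399L5-2007-06-28::2010-06-28")
print((wheel.end - wheel.start).days)  # 1096 days in service
```

The 1,096-day duration of the example id is the kind of exact 3-year span that the question noticed.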

  • 20.

From when is the age of a wheel counted: the time the wheel was made, the time it was fitted to the car, or something else?

      Age = reading date – start date of a unique wheel.
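The age formula can be written directly against the start date embedded in unq_whl_id. The sketch below assumes age is measured in days (the answer does not state a unit) and that the ten characters before "::" are the start date, as in the example ids in this Q&A. The function name `wheel_age_days` is hypothetical.

```python
from datetime import date


def wheel_age_days(unq_whl_id: str, reading_date: date) -> int:
    """Age = reading date - start date, in days (unit is an assumption)."""
    # The start date is the YYYY-MM-DD text just before the "::" separator.
    start_text = unq_whl_id.split("::")[0][-10:]
    return (reading_date - date.fromisoformat(start_text)).days
```

For instance, a reading taken on 2008-06-28 for EQVI587399L5-2007-06-28::2010-06-28 would give an age of 366 days, because the interval includes 29 February 2008.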

  • 21.

For the variable "rpr_why_cd", we would like to confirm its timing. Does it stand for the code of the most recent (i.e., previous) repair, the next repair, or the reason why the wheel can no longer work (i.e., the code for its final failure)?

      It indicates why a wheel is repaired at the end of its lifetime. Taking the unq_whl_id EQVI587399L5-2007-06-28::2010-06-28 as an example, it indicates the repair reason on 2010-06-28.

  • 22.

Observation number 5 in training data set is EWKKQDP068A. As per the previous explanations provided train type 'E' falls under 'other section'. It means that the train day is the departure day of the month. Then how does 68 represent the departure day? 

      06 is the train day and 8 is the train section.

  • 23. Average vs dynamic kips

Do the sensors measure these two quantities separately, which then get summed up to compute the peak kips? Or do the sensors measure the peak kips directly, and some formula is applied to derive the average and dynamic kips? In the latter case, what is this formula?

      Please refer to the equation on problem statement page 1 for this question.

  • 24. Criteria for evaluation

In the description of the problem, it is mentioned that we must achieve a +/-2 peak kips accuracy for 90% of the observations, while "minimizing" both the type I and type II alarm errors. Is there any value in trying to push the +/-2 peak kips accuracy beyond 90%, or should we focus elsewhere once we reach this threshold to maximize our chances of winning? In other words, how important is accuracy in the evaluation policy?

      The accuracy is the target and there is definitely value if you try to get an even higher accuracy.
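The accuracy target mentioned in the question can be checked with a short helper. This is a sketch, not the judges' scoring code: it assumes an observation counts as a hit when the absolute error is at most 2 kips (boundary included), and the function name `within_tolerance_rate` is hypothetical.

```python
def within_tolerance_rate(actual, predicted, tol=2.0):
    """Fraction of predictions within +/- tol kips of the actual peak kips."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(a - p) <= tol for a, p in zip(actual, predicted))
    return hits / len(actual)


# Illustrative numbers only: three of the four predictions fall within 2 kips.
rate = within_tolerance_rate([30.0, 40.0, 50.0, 60.0], [31.0, 43.0, 50.0, 58.0])
print(rate, rate >= 0.90)  # 0.75 False
```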

  • 25. What information can be used for making predictions

The goal is to predict the peak kips at every reading k where the status is loaded, and where the status at reading k-1 is empty. But obviously, some information about reading k cannot be used by us for making a prediction at time k: for example, we cannot use both the average and dynamic kips at reading k to predict the peak kips at reading k, since then the problem would be trivial (and unrealistic). In the Q-A pdf you provided, you said: "If we need to predict peak kips in the [next loaded condition], can we assume that we know what the load weight will be? That is, is the peak kip prediction based on [data about last run only], or [data about last run + load in current run]? Yes. You can use car weight, speed, age and detector location (vndr_edr_nme) as predictors."

  a) Are these the only features (car weight, speed, age and vndr_edr_nme) from reading k that can be used to predict peak kips at reading k? What about trn_id, vnder_edr_ts_cst, edr_vndr, axle_side, eqp_axle_nbr, eqp_seq_nbr, whl_avg_kips, whl_peak_kips, whl_dyn_kips, whl_dyn_ratio, tare, aar_ct_c, eqp_grp, grs_rail_wgt? Which ones from a reading can be used to predict peak kips at the same reading?

       The predictors I gave are only examples. You should use your own knowledge to decide what belongs in the predictor list.

  b) Moreover, you mention that we can use all measurement features (except rpr_why_cd) from reading k-1 to predict peak kips at reading k. Can we also use all features from measurements k-2, k-3, ..., 1, if we wish, to perform the prediction at reading k?

      Yes, you can.