Questions and Answers

  • 1. Train symbol

In trn_id, what is the meaning of the train symbol? We found that the train symbol consists of 6 characters, and it seems that it can be split into 2 parts of 3 letters each. For example, DOWULF can be seen as DOW and ULF. Does each part represent a train station? If so, does the first part represent the departure station and the second part the destination station? What is the relation between a train symbol and a trip?

       Yes. The train symbol gives the abbreviations of a train's origin and destination (first three letters: origin; last three letters: destination). A trn_id uniquely identifies a train trip.

  • 2. Train section, train day and train priority

We found that train priority ranges from the letter A to W. Suppose there are 5 trains waiting to depart from a station. Does it mean that a train with an earlier letter, say A, will leave before a train with a later letter, say B? Another guess is that it determines which train runs ahead of another: if two trains both come to a siding, the train with higher priority proceeds before the other.

       Although it is named "train priority", it does not really relate to the "priority" of a train.

  • What is the meaning of train section in the column trn_id? It takes 10 values, with numbers ranging from 0 to 9. Does this number represent, say, a time such as a month, or a zone id (e.g., if there are 10 zones in total on the map)? It would be helpful to learn more about what it means.
  • We found the train day (in the column of trn_id) ranges from 1 to 31. Does it mean the departure date in a month?

       The meaning of train section depends on the train type. For certain train types, train section means the priority of the train within that train type. For some other train types, it is a running count of trains within the year.

       For most train types, train day means the departure day in a month. For some train types, it is a running count of trains within the year.

       Please refer to the following table:

Train Type | Train Priority | Meaning of Train section | Meaning of Train day
[table rows not recoverable; the only surviving cell, under "Meaning of Train day", reads: Departure day of a month]


       For example, if a train has train type F, its train section is a running count of trains regardless of its train priority. If a train has train type J, A or X, its train section is a count only when its train priority equals T. For all train types other than F, H, J, A and X, train section means the priority within that train type.

       Consider the following example trn_id:

FABCDEF101U means that this train is the 101st F-type train within the year, with train symbol ABCDEF and train priority U.
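
The positional layout used in this example can be sketched in Python. This is only a sketch, assuming every trn_id is exactly 11 characters laid out as in the example (1-character train type, 6-character train symbol, 3 digits, 1-letter train priority). The names `TrainId` and `parse_trn_id` are hypothetical and do not come from the dataset documentation. How to read the 3 digits depends on the train type, so they are returned as a raw string.

```python
from typing import NamedTuple


class TrainId(NamedTuple):
    train_type: str   # first character, e.g. "F"
    symbol: str       # six characters: origin (3) + destination (3)
    origin: str       # first three letters of the symbol
    destination: str  # last three letters of the symbol
    digits: str       # three digits; meaning depends on the train type
    priority: str     # trailing letter


def parse_trn_id(trn_id: str) -> TrainId:
    """Split a trn_id into positional fields (assumed 11-character layout)."""
    if len(trn_id) != 11:
        raise ValueError(f"expected 11 characters, got {len(trn_id)}: {trn_id!r}")
    symbol = trn_id[1:7]
    digits = trn_id[7:10]
    if not digits.isdigit():
        raise ValueError(f"expected three digits at positions 8-10: {trn_id!r}")
    return TrainId(trn_id[0], symbol, symbol[:3], symbol[3:], digits, trn_id[10])


parsed = parse_trn_id("FABCDEF101U")
print(parsed.train_type, parsed.symbol, parsed.digits, parsed.priority)
```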

  • 3. eqp_axle_nbr

Given the description, we suppose it means the location of a wheel on a car. Suppose a car has 9 axles, so there are 9 wheels on each side. Assume the wheel closest to the head of the car has number 1. Does that mean the wheel with number 9 is at the tail of the car?

     There is no concept of head and tail for a car. However, the smaller the eqp_axle_nbr, the closer the axle is to the brake end of the car.


  • 4.

What is equipment? Is it something loaded onto a car, or is it just another name for a car (i.e., equipment = a car)?

      Yes. An equipment = a car.

  • 5. car_initial

For example, what does ‘EQVI’ mean?

      In most cases car_initial indicates the car owner. Generally, car_initial and car_number together uniquely identify a car within North America.

  • 6. aar_ct_c

Is the car description on the page "AAR Car Type Codes Explained & Resources" correct for aar_ct_c?

For example, D113 is a locomotive because of the letter D. Then, what does 113 mean?

      You can refer to that website for explanation. The number gives other characteristics of the car, for example, length, height, etc.

  • 7. eqp_grp

We found that eqp_grp takes 9 distinct values. They correspond to BOXC, FLAT, GOND, HOPP, IFLT, MFLT, MISC, TANK and VFLT, as found under Railroad on the wiki. Is this guess for eqp_grp correct? Furthermore, what is the difference between aar_ct_c and eqp_grp?

       Eqp_grp indicates the logical grouping of car types. For example, BOXC means box cars, FLAT means flat cars. aar_ct_c indicates AAR unique designation of car types. Eqp_grp is a more general categorization compared to aar_ct_c.

  • 8. Missing data

In the training dataset, at row 7032197, an observation of unq_whl_id WHLA56870R1-2011-01-08::2014-01-08, 13 variable values are missing and shown as "NA". Are these values really missing, or is this a mistake caused by reading the file incorrectly?

       There are missing data in the datasets, and you are expected to clean those missing data yourself.
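
As a starting point for that cleaning, here is a minimal sketch that counts "NA" entries per column using only the Python standard library. The sample rows are made up for illustration, and the helper name `count_missing` is hypothetical. Only the "NA" token and the column names come from this document.

```python
import csv
import io
from collections import Counter


def count_missing(rows, missing_token="NA"):
    """Count, per column, how many rows hold an empty or missing-token value."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() in ("", missing_token):
                counts[column] += 1
    return counts


# Illustrative data only; real files would be opened with open(path, newline="").
sample = io.StringIO(
    "unq_whl_id,edr_eqp_spd,whl_peak_kips\n"
    "A,NA,10.5\n"
    "B,32,NA\n"
    "C,NA,11.0\n"
)
missing = count_missing(csv.DictReader(sample))
print(dict(missing))
```

Whether a row with missing values should be dropped or imputed is left to your own judgment.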

  • 9. Data for all wheels

Are there sufficient data for all the wheels? For example, suppose a car contains 9 wheels, does the training data contain all 9 wheels?

        Not necessarily. The data is collected for wheels that have been repaired. Note that if a wheel needs to be repaired, its mate wheel will also be replaced. You can therefore expect to see the sensor readings for a wheel and its mate wheel as follows (the unq_whl_id values below are for demonstration only; these specific ids are not necessarily in the training/testing dataset):

EQVI587399L5-2007-06-28::2010-06-28 & EQVI587399R5-2007-06-28::2010-06-28  
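
The pairing in this example can be sketched as a small helper that derives the mate's id by swapping the side letter. This assumes, as in the example above, that the mate wheel shares the same car, axle number and dates and differs only in L versus R. The function name `mate_wheel_id` is hypothetical.

```python
import re

# Car identifier (letters then digits), side letter, then axle number and dates.
_WHEEL_ID = re.compile(
    r"^([A-Z]+\d+)([LR])(\d+-\d{4}-\d{2}-\d{2}::\d{4}-\d{2}-\d{2})$"
)


def mate_wheel_id(unq_whl_id: str) -> str:
    """Return the id of the wheel on the opposite side of the same axle."""
    match = _WHEEL_ID.match(unq_whl_id)
    if match is None:
        raise ValueError(f"unrecognized unq_whl_id: {unq_whl_id!r}")
    car, side, rest = match.groups()
    other_side = "R" if side == "L" else "L"
    return f"{car}{other_side}{rest}"


print(mate_wheel_id("EQVI587399L5-2007-06-28::2010-06-28"))
```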

  • 10. Sensor health

Can we assume that all the sensors work well (i.e., perfectly)? In other words, can we trust the sensors so that any strange numbers collected are never caused by a failed sensor?

       Please refer to assumptions (c) and (d) for this question.

  • 11. Data transformation

Do we need to predict the peak kips only for the rows where an L (loaded) follows an E (empty) run? This is mentioned in the document, but the test data contain a mix of L and E type rows.

       Yes, you will need to transform data (both training and testing) and focus only on E-L prediction.
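
One way to sketch this transformation is to walk through the readings of a single wheel in time order and keep only the pairs where an empty reading is immediately followed by a loaded one. The key name `load_status` is a hypothetical placeholder for whichever column holds the E/L flag, and the sample readings are made up for illustration.

```python
def empty_to_loaded_pairs(readings, status_key="load_status"):
    """Return (previous, current) pairs where previous is E and current is L.

    `readings` must belong to one wheel and be sorted chronologically.
    """
    pairs = []
    for previous, current in zip(readings, readings[1:]):
        if previous[status_key] == "E" and current[status_key] == "L":
            pairs.append((previous, current))
    return pairs


# Illustrative readings for one wheel, already in time order.
readings = [
    {"k": 1, "load_status": "E"},
    {"k": 2, "load_status": "L"},
    {"k": 3, "load_status": "L"},
    {"k": 4, "load_status": "E"},
    {"k": 5, "load_status": "E"},
    {"k": 6, "load_status": "L"},
]
pairs = empty_to_loaded_pairs(readings)
print([(p["k"], c["k"]) for p, c in pairs])
```

The second element of each pair is the reading whose peak kips would be the prediction target.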

  • 12. Units

  What are the units of the grs_rail_wgt column?       

        It is in pounds.

  • 13. About grs_rail_wgt and tare

Does the gross weight column include the tare weight of the car?

        Yes. The gross weight column represents the tonnage that the particular wheel type can sustain, which includes the tare weight of the car and the weight of the car's load. Note, however, that the tare weight of the car in the datasets is in tons, whereas grs_rail_wgt is in pounds.
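
Because the two columns use different units, it may help to convert before combining them. The sketch below assumes that "tons" means short tons of 2,000 pounds, which this document does not state, so verify it against the data. The function names and the sample numbers are illustrative only.

```python
POUNDS_PER_TON = 2000  # assumption: short tons; not confirmed by the source


def tare_in_pounds(tare_tons: float) -> float:
    """Convert a tare weight given in tons to pounds."""
    return tare_tons * POUNDS_PER_TON


def weight_left_for_load(grs_rail_wgt_lb: float, tare_tons: float) -> float:
    """Gross weight (pounds) minus tare weight (converted to pounds)."""
    return grs_rail_wgt_lb - tare_in_pounds(tare_tons)


# Made-up values for illustration.
print(weight_left_for_load(grs_rail_wgt_lb=286000, tare_tons=30))
```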

  • 14.

If we need to predict peak kips in the [next loaded condition], can we assume that we know what the load weight will be? That is, is the peak kip prediction based on [data about last run only], or [data about last run + load in current run]?       

        Yes. You can use car weight, speed, age and detector location (vndr_edr_nme) as predictors.

  • 15.

Do the train priorities and car initials have any particular significance for the speed/load constraints of the train?

        Please refer to the answers to questions 2 and 5.

  • 16.

For the variable "Kipdays" in the section 3.1 data description, it mentions that kipdays is -999 when the conditions are not met for "KipDays5". What does "KipDays5" mean?

       The formatting on that line is a bit off. The 5 denotes a footnote number and is not part of the variable name.

  • 17.

We found some values of the variable "edr_eqp_spd" (i.e. the speed of the wheel) are 0. Does it mean that the wheel is measured while the train or the wheel stops exactly over the WILD Detectors?

       Please refer to assumption (c) for this question.

  • 18.

We also found that some values of the variable "whl_dyn_kips" are negative. Is that possible, or is it an error in the measurement?

       This question can be answered by looking at the equation in the box on page 1. Please note that you are expected to apply data cleaning if needed.

  • 19. Start_date/end_date

(1) What are the meanings of "start_date" and "end_date" in the variable "unq_whl_id"? Does "start_date" mean the first date the wheel was used? Does "end_date" mean the date the wheel was discarded, or something else?

(2) Do the start and end dates in the unq_whl_id represent the dates when the wheel was mounted and dismounted? For example, there are several occurrences of wheels for which the difference between their end date and their start date is exactly 3 years, a suspiciously round number. Is there some sort of expiration date on wheels, which forces them to be removed from a car even if they don't appear to have developed any defect? If so, is there a way to know this theoretical maximal lifetime?

       Start_date and end_date indicate when a wheel is attached to and removed from a car, respectively. The difference gives the service duration of a specific wheel. For example, the unq_whl_id EQVI587399L5-2007-06-28::2010-06-28 indicates that the wheel was attached to equipment EQVI587399 at the fifth axle on the left side and was used from 2007-06-28 to 2010-06-28.

       3 years is an empirical value used when no previous repair date is found. There is no wheel expiration date. You should look at the data to get the expected lifetime of a wheel.
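
The structure of unq_whl_id described in this answer can be sketched as a parser. This assumes the car initial is letters only and the car number is digits only, as in the example id. The names `WheelId` and `parse_unq_whl_id` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date

_PATTERN = re.compile(
    r"^(?P<initial>[A-Z]+)(?P<number>\d+)(?P<side>[LR])(?P<axle>\d+)"
    r"-(?P<start>\d{4}-\d{2}-\d{2})::(?P<end>\d{4}-\d{2}-\d{2})$"
)


@dataclass(frozen=True)
class WheelId:
    car_initial: str
    car_number: str
    axle_side: str      # "L" or "R"
    eqp_axle_nbr: int
    start_date: date    # wheel attached to the car
    end_date: date      # wheel removed from the car

    @property
    def lifetime_days(self) -> int:
        """Days between attachment and removal."""
        return (self.end_date - self.start_date).days


def parse_unq_whl_id(unq_whl_id: str) -> WheelId:
    match = _PATTERN.match(unq_whl_id)
    if match is None:
        raise ValueError(f"unrecognized unq_whl_id: {unq_whl_id!r}")
    return WheelId(
        car_initial=match["initial"],
        car_number=match["number"],
        axle_side=match["side"],
        eqp_axle_nbr=int(match["axle"]),
        start_date=date.fromisoformat(match["start"]),
        end_date=date.fromisoformat(match["end"]),
    )


wheel = parse_unq_whl_id("EQVI587399L5-2007-06-28::2010-06-28")
print(wheel.car_initial, wheel.car_number, wheel.axle_side, wheel.lifetime_days)
```

Grouping the parsed wheels by `lifetime_days` is one way to see how often the 3-year value occurs.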

  • 20.

When does one start to record the age of a certain wheel: at the time the wheel was made, at the time the car was equipped with that wheel, or at some other time?

      Age = reading date – start date of a unique wheel.
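
As a worked instance of this formula, the sketch below takes the start date from the unq_whl_id and subtracts it from a reading date. Expressing age in days is an assumption, since the unit is not stated here, and the name `wheel_age_days` is hypothetical.

```python
from datetime import date


def wheel_age_days(unq_whl_id: str, reading_date: date) -> int:
    """Age = reading date - start date of the wheel, in days (assumed unit)."""
    # The start date is the last 10 characters before the "::" separator.
    start_text = unq_whl_id.split("::")[0][-10:]
    start_date = date.fromisoformat(start_text)
    return (reading_date - start_date).days


print(wheel_age_days("EQVI587399L5-2007-06-28::2010-06-28", date(2007, 7, 28)))
```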

  • 21.

For the variable "rpr_why_cd", we would like to confirm its timing. Does it stand for the code of the most recent (i.e., previous) repair, the next repair, or the reason why the wheel can no longer work (i.e., the code for its final failure)?

      It indicates why a wheel is repaired at the end of the lifetime of the wheel. For example, for the unq_whl_id EQVI587399L5-2007-06-28::2010-06-28, it indicates the repair reason on 2010-06-28.

  • 22.

Observation number 5 in the training dataset is EWKKQDP068A. According to the previous explanations, train type 'E' falls under 'other section', which means that the train day is the departure day of the month. How, then, does 68 represent the departure day?

      06 is the train day; 8 is the train section.
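
The split in this answer can be written out as a short sketch. It assumes the three digits sit at positions 8 to 10 of an 11-character trn_id, with the first two digits as train day and the third as train section. This reading applies only to train types where the digits are not a yearly counter, and the function name `split_day_section` is hypothetical.

```python
def split_day_section(trn_id: str) -> tuple:
    """Return (train_day, train_section) from the three digits of a trn_id."""
    digits = trn_id[7:10]
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"expected three digits at positions 8-10: {trn_id!r}")
    return int(digits[:2]), int(digits[2])


print(split_day_section("EWKKQDP068A"))
```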

  • 23. Average vs dynamic kips

Do the sensors measure these two quantities separately, which then get summed up to compute the peak kips? Or do the sensors measure the peak kips directly, and some formula is applied to derive the average and dynamic kips? In the latter case, what is this formula?

      Please refer to the equation on problem statement page 1 for this question.

  • 24. Criteria for evaluation

In the description of the problem, it is mentioned that we must achieve a +/-2 peak kips accuracy for 90% of the observations while "minimizing" both the alarm type I and type II errors. Is there any value in trying to push the +/-2 peak kips accuracy beyond 90%, or should we focus elsewhere once we achieve this threshold to maximize our chances of winning? In other words, how important is accuracy in the evaluation policy?

      The accuracy is the target and there is definitely value if you try to get an even higher accuracy.
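
To track progress against the 90% target, the share of predictions within +/-2 peak kips can be computed as below. This is a sketch of one plausible reading of the criterion. Treating a difference of exactly 2 as a hit is an assumption, and the function name and sample values are illustrative.

```python
def within_tolerance_rate(actual, predicted, tolerance=2.0):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Made-up peak kips values for illustration.
rate = within_tolerance_rate([30, 35, 40, 45], [31, 38, 40.5, 43])
print(rate)
```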

  • 25. What information can be used for making predictions

The goal is to predict the peak kips at every reading k where the status is loaded, and where the status at reading k-1 is empty. But obviously, some information about reading k cannot be used by us for making a prediction at time k: for example, we cannot use both the average and dynamic kips at reading k to predict the peak kips at reading k, since then the problem would be trivial (and unrealistic). In the Q-A pdf you provided, you said: "If we need to predict peak kips in the [next loaded condition], can we assume that we know what the load weight will be? That is, is the peak kip prediction based on [data about last run only], or [data about last run + load in current run]? Yes. You can use car weight, speed, age and detector location (vndr_edr_nme) as predictors."

  1. a) Are these the only features (car weight, speed, age and vndr_edr_nme) from reading k that can be used to predict peak kips at reading k? What about trn_id, vnder_edr_ts_cst, edr_vndr, axle_side, eqp_axle_nbr, eqp_seq_nbr, whl_avg_kips, whl_peak_kips, whl_dyn_kips, whl_dyn_ratio, tare, aar_ct_c, eqp_grp, grs_rail_wgt? Which ones from a reading can be used to predict peak kips at the same reading?

       The predictors I gave are only examples. You should use your own knowledge to decide what belongs in the predictor list.

  1. b) Moreover, you mention we can use all measurement features (except rpr_why_cd) from reading k-1 to predict peak kips at reading k. Can we also use all features from measurements k-2, k-3, ... 1, if we wish, to perform that prediction at reading k?

      Yes, you can.