The databases available online from the Federal Railroad Administration are dirty. There are two: the Railroad Equipment Accident database and the Grade Crossing Accident database. Both contain accident reports, and both have missing, incorrectly coded, and misplaced entries, as well as incorrect latitude and longitude values.
The website is at FRA-Homepage
I hope this is useful.
------------------------------
Trefor Williams
Professor of Civil Engineering
Rutgers University
Princeton NJ
Original Message:
Sent: 12-29-2016 10:38
From: William Christian
Subject: Looking for dirty data
In my work, we come across lots of dirty data (also proprietary and unshareable), and the challenge is always determining truth. The solution often involves generating some simulated data and trying to accurately clean it up.
So have you considered taking a clean dataset and "dirtying" it? If you dirty up the data on your own, you can test the accuracy of different techniques against various forms of dirt (e.g., missing data, misspelled entries, swapped entries, erroneous entries--intentional and unintentional).
That being said, if you find some, do share with the group.
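For anyone who wants to try the "dirty it yourself" route, here is a minimal sketch in Python, not a definitive implementation. It assumes each record is a dict of string fields with at least two fields per record. The function names (dirty_records, cell_accuracy), the three kinds of dirt, and the sample records are all made up for illustration. The point is that you keep the clean original as ground truth, so any cleaning technique can be scored against it.

```python
import random


def dirty_records(records, rate=0.3, seed=0):
    """Return (dirty_copy, log); log holds (row_index, field, kind) per injected error.

    Assumes every record is a dict of strings with at least two fields.
    """
    rng = random.Random(seed)          # seeded so the dirt is reproducible
    dirty = [dict(r) for r in records]  # copy; the clean data is left untouched
    log = []
    for i, row in enumerate(dirty):
        if rng.random() >= rate:
            continue                    # leave this row clean
        kind = rng.choice(["missing", "misspelled", "swapped"])
        fields = list(row)
        if kind == "missing":
            field = rng.choice(fields)
            row[field] = ""
        elif kind == "misspelled":
            field = rng.choice(fields)
            value = row[field]
            if len(value) < 2:
                continue                # too short to transpose characters
            j = rng.randrange(len(value) - 1)
            # Transpose two adjacent characters to mimic a typo.
            row[field] = value[:j] + value[j + 1] + value[j] + value[j + 2:]
        else:
            a, b = rng.sample(fields, 2)
            row[a], row[b] = row[b], row[a]
            field = f"{a}<->{b}"
        if row != records[i]:           # log only changes that altered the row
            log.append((i, field, kind))
    return dirty, log


def cell_accuracy(cleaned, truth):
    """Fraction of cells in `cleaned` that match the clean original."""
    total = sum(len(t) for t in truth)
    correct = sum(
        c.get(k) == v for c, t in zip(cleaned, truth) for k, v in t.items()
    )
    return correct / total


# Hypothetical sample data, for illustration only.
clean = [
    {"city": "Princeton", "state": "NJ"},
    {"city": "Severn", "state": "MD"},
    {"city": "Indianapolis", "state": "IN"},
]
dirty, log = dirty_records(clean, rate=1.0, seed=1)
print(log)
print(cell_accuracy(dirty, clean))  # baseline score before any cleaning
```

Run a cleaning step on `dirty`, pass the result to cell_accuracy along with `clean`, and compare against the baseline. The log also lets you break the score down by kind of dirt.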
------------------------------
William Christian
Severn MD
Original Message:
Sent: 12-28-2016 09:57
From: Jerry Flatto
Subject: Looking for dirty data
I have an unusual question for folks. Does anyone have any good sources for semi-dirty or dirty data for class projects?
I am teaching a class in unstructured data analysis for undergraduates. The first part of the class will look at cleaning data using OpenRefine. Most of the data sets that I have are relatively clean, and the data sets available online also seem to be relatively clean. Thus, counter to normal requests, I am looking for data sets that need to be cleaned that I can give as class projects for students to work on. While many organizations have dirty data, I have found it difficult to get them to share any of it for proprietary reasons.
Any ideas are appreciated. You can reply to this posting or email me directly at jflatto@uindy.edu.
Jerry Flatto
"No trees were harmed in the sending of this message; however, a large number
of electrons were slightly inconvenienced..."
Dr. Jerry Flatto, Professor, Information Systems Department - School of Business
University of Indianapolis, Indianapolis, Indiana, USA jflatto@uindy.edu