Thursday, November 18, 2010

Statistically Improbable Cheating

I recently ran across an article on finding test cheaters with statistics (http://www.telegraph.co.uk/news/newsvideo/weirdnewsvideo/8140456/200-students-admit-cheating-after-professors-online-rant.html), and it reminded me of something that happened when I was a teacher. So I dug up this message I wrote on 03/16/2005, back when I was teaching:

I gave my students a test recently that contained 65 multiple choice questions. A student had also told me my class was easy, so I took that as a challenge to whoop the students with the next test.
So students are taking the test (which I give on computers, and it grades automagically) and the grades are coming back between the 40s and the mid-60s. Then some more classes come in and they start making between the 40s and the 80s. So the final average is about 55. The scores are spread out fairly evenly, so I can work with it.


I start figuring out how I am going to scale the test (in public schools they frown on true bell curves; they are really looking for a line with about 5% failing and 30% A's). So I do some computer stuff and figure that I will sort the questions from hardest to easiest and split them across a scale with 5 steps. The hardest are worth 1.8 and the easiest are worth 2.2. That gives me a good line, and theoretically removes any bonuses for guessing correctly.
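For the curious, here is a rough sketch of how a 5-step weighting like that could look in Python. This is not the script I actually used. The function names, the in-between weights (1.9, 2.0, 2.1), and the idea of putting an equal number of questions in each step are all assumptions for illustration.

```python
def question_weights(fraction_correct, weights=(1.8, 1.9, 2.0, 2.1, 2.2)):
    """Assign each question a weight based on its difficulty rank.

    fraction_correct[q] is the share of students who answered question q
    correctly. The 5 weights and the equal-size bands are assumptions;
    only the endpoints 1.8 (hardest) and 2.2 (easiest) come from the post.
    """
    n = len(fraction_correct)
    # Rank questions from hardest (lowest fraction correct) to easiest.
    order = sorted(range(n), key=lambda q: fraction_correct[q])
    result = [0.0] * n
    for rank, q in enumerate(order):
        band = rank * len(weights) // n  # 0 = hardest band, last = easiest
        result[q] = weights[band]
    return result


def weighted_score(is_correct, weights):
    """Sum the weights of the questions a student answered correctly."""
    return sum(w for ok, w in zip(is_correct, weights) if ok)
```

You would call `question_weights` once with the per-question results for the whole group, then `weighted_score` once per student.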


So, I get this knocked out. Finally, I decide to write a little script to see if anyone is cheating. I compare everyone's answers against everyone else's, because two people giving the same answer on 60 of the 65 questions is highly unlikely. Seeing how most students only got about 55% correct, that much agreement between two different test-takers should not be there.
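The original script is long gone, but the idea is simple enough to sketch in Python. The data layout (a dict mapping a student name to a list of chosen answers) and the function names are made up for this example, and the threshold of 60 is just the number mentioned above.

```python
from itertools import combinations


def count_matches(answers_a, answers_b):
    """Count the questions on which two students chose the same answer."""
    return sum(1 for a, b in zip(answers_a, answers_b) if a == b)


def find_similar_pairs(answers, threshold=60):
    """Return (student1, student2, matches) for every pair at or above threshold.

    answers maps a student name to that student's list of chosen answers,
    one entry per question, in the same question order for everyone.
    """
    pairs = []
    # Compare every student with every other student exactly once.
    for (s1, a1), (s2, a2) in combinations(sorted(answers.items()), 2):
        matches = count_matches(a1, a2)
        if matches >= threshold:
            pairs.append((s1, s2, matches))
    return pairs
```

Note that this counts matching right answers as well as matching wrong ones, so two students who both know the material will also score high. Matching wrong answers are the more suspicious kind.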


After I run this script, I get 4 pairs of students that have 60 matching answers. In 3 of the 4 pairs, the students sit right next to each other. In the other pair, one student sits directly behind the other. There are no other correlations.


Convenient, huh?


The odds of two students, out of 100 spread across five classes, sitting directly next to each other and randomly putting the exact same answers on 60 of 65 four-choice multiple choice questions are .00064%.
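If you want to play with the numbers yourself, here is one simple way to model it in Python. It is not necessarily how I got the figure above. It assumes both students guess every question independently and uniformly at random, so any single question matches with probability 1/4, and it ignores seating and the number of students entirely. The function name and parameters are made up for this sketch.

```python
from math import comb


def prob_at_least_matches(n_questions=65, min_matches=60, n_choices=4):
    """Binomial tail: chance that two random guessers agree on at least
    min_matches of n_questions questions with n_choices options each.

    Assumes independent, uniform guessing, so the per-question match
    probability is 1 / n_choices.
    """
    p = 1.0 / n_choices
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(min_matches, n_questions + 1)
    )
```

Real students are not pure guessers, of course. Two students who both know an answer will match on it, so this model understates how often honest students agree.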


I think it was just brain waves interfering.
