
The fact that CAPTCHA sweatshops exist is a testament to its failure as a protocol, not to mention the privacy implications (just run X.exe to continue).



Seems like spammers wouldn't hire humans if they could fully automate it? Forcing them to pay people is about the best you could do as a defense.


Why do you assume they aren't automating it? The obvious thing to do if I'm a spammer is to hire humans to solve the problem, collect their output, and feed it into my ML training. I now have the same dataset that Google is using, for my own ML.

Actually, I'm not sure I need to go to full ML: after a few rounds I can probably just use image comparison (not ML) and only send humans the images I haven't seen before.
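To make the "image compare, no ML" idea concrete, here is a minimal sketch of a lookup table that only falls back to a human for images it has not seen before. Everything in it is hypothetical: `SeenImageCache` and `ask_human` are names made up for illustration, and exact SHA-256 hashing is an assumption that only matches byte-identical images. Any resizing or noise added to the image would defeat it and call for some form of similarity matching instead.

```python
import hashlib
from typing import Callable, Dict


class SeenImageCache:
    """Remember the label a human gave for each image already seen.

    Hypothetical illustration: images are matched by exact SHA-256 hash,
    so only byte-identical repeats count as "seen before".
    """

    def __init__(self, ask_human: Callable[[bytes], str]) -> None:
        self._labels: Dict[str, str] = {}
        self._ask_human = ask_human  # assumed callback that returns a label
        self.human_calls = 0         # how often we had to fall back to a human

    def label(self, image: bytes) -> str:
        key = hashlib.sha256(image).hexdigest()
        if key not in self._labels:
            # New image: pay for one human answer and remember it.
            self.human_calls += 1
            self._labels[key] = self._ask_human(image)
        return self._labels[key]
```

The `human_calls` counter is the number that matters for the argument: if the pool of images is small and reused often, it stops growing after the first few rounds.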

Of course, round two is to expand on the above. Doing ML for image recognition isn't hard (other than CPU cost). I can also collect statistics: images that humans take longer on, I will take longer on as well. (I can potentially collect eye movement, so I have better data than Google here, and this can feed into the ML.) For images that humans are unsure of, I will fake being unsure by sometimes clicking and sometimes not, at rates similar to humans.

I don't know what ML Google has that isn't public, but we also don't know what scammers have. Ultimately Google needs to expose enough data to scammers (who see more captchas than anyone else by the nature of their operations) that the scammers' ML algorithms have a large training set. Once a scammer realizes the types of data Google is looking for, it isn't hard to collect other samples for a private training set. Go outside in any city and you will find stoplights and street signs... you now have a training set that isn't Google's to test on. You need a few cities' and seasons' worth, of course, but that is an implementation detail.


This is why Google will have to keep changing how the captchas work. Maybe using adversarial examples?


Which is why spammers will have humans in the back room for the foreseeable future. If Google tries something different, they go to humans to figure it out; if Google keeps doing it, they automate it.

The game is more expensive for Google because Google needs expensive people to create the scheme, while spammers can hire cheap people to figure it out. (If cheap people can't figure it out, Google has failed.) Spammers need expensive people only if/when they decide to automate the scheme.



