Clients of the 2captcha service face a wide variety of tasks, from parsing data from small sites to collecting large amounts of information from large resources or search engines. A large number of services integrated with 2captcha exist to automate and simplify this work, but it is not easy to make sense of this variety and pick the optimal solution for a specific task.
With the help of our customers, we have studied popular data parsing services and compiled a top 10 of the most convenient and flexible ones. Since this list includes a wide range of solutions, from open source projects to hosted SaaS solutions to desktop software, there is sure to be something for everyone looking to make use of web data, including tools you can download to your computer or local environment.
We added ReCaptcha v3 support almost 2 years ago. ReCaptcha v3 has been evolving all this time, and so have our bypassing algorithms. Now we have changed them again and want to tell you more about it.
1. It works well.
2. You pay only for valid tokens that actually worked for you.
3. It works well, but initially you have to send from 500 to 2000 captchas before it starts working.
4. You have to send reportgood/reportbad requests.
First, we'd like to remind you that ReCaptcha V3 is not a captcha as we know it. It is a human/robot probability estimate based on the IP address, the user's browsing history and their behaviour on a website. ReCaptcha V3 can't know whether you are a robot or a human, but it can estimate the probability of you being a human, with some imprecision. An interesting thing about ReCaptcha V3 is that it can give different scores to the same browser, with the same set of cookies and the same IP address, on different pages of a given website. And of course the score will be different for different websites.
Unfortunately, in two years we have failed to find a way to determine what score a given PC will receive for a certain website. Previously we used a probability model that worked as "if a recognition from this PC worked with 5 websites and didn't work with the 6th website, then most likely it will work for the 7th". We have now decided to discontinue it because sometimes it didn't work at all. However, if one recognition from a certain PC worked with some website, then recognitions from this PC will likely work 5-50 more times.
What we did now
We have implemented an AllowList and a BlockList of workers' PCs for every customer. Initially we send your requests to different PCs of our workers. Then, as you send us reportgood and reportbad requests, we sort our workers among these lists based on your reports. When your AllowList contains at least 50 online PCs, half of your requests are sent to these workers. When the list has 500 online PCs, all subsequent requests are distributed only among the workers in your AllowList. If your captcha request wasn't solved, we return its cost to your balance, as always. If you send us a reportbad request, we now return the cost of this captcha to your balance, as we did before with certain limitations.
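The thresholds above can be restated as a short sketch. It is illustrative only: the function name is hypothetical, it is not the actual routing code, and it assumes the share stays at one half for any AllowList size between 50 and 499 online PCs.

```python
def allowlist_share(online_allowlisted: int) -> float:
    """Return the fraction of a customer's requests sent to AllowList workers.

    Thresholds are taken from the text: 50 online PCs -> half of the requests,
    500 online PCs -> all of the requests. Everything else is an assumption.
    """
    if online_allowlisted < 0:
        raise ValueError("the number of online PCs cannot be negative")
    if online_allowlisted >= 500:
        return 1.0
    if online_allowlisted >= 50:
        return 0.5
    return 0.0


for size in (10, 50, 499, 500):
    print(size, allowlist_share(size))
```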
What you need to do
Make sure you are sending reportgood and reportbad requests. If you do, you will receive more valid tokens, and you will also get refunds when you send reportbad requests.
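As a rough sketch, a report request could be built like this. The res.php endpoint and the "key", "action" and "id" parameter names are assumptions not stated in this post, so check them against the API documentation. The helper name is hypothetical, and the code only builds the URL without sending anything.

```python
from urllib.parse import urlencode

# Assumed endpoint; this post only names the reportgood/reportbad actions.
RES_URL = "https://2captcha.com/res.php"


def report_url(api_key: str, captcha_id: str, token_worked: bool) -> str:
    """Build a reportgood or reportbad URL for one solved captcha."""
    action = "reportgood" if token_worked else "reportbad"
    query = urlencode({"key": api_key, "action": action, "id": captcha_id})
    return f"{RES_URL}?{query}"


# Report every token, whether it worked or not.
print(report_url("YOUR_API_KEY", "1234567890", token_worked=False))
```

Reporting both outcomes matters: reportgood fills the AllowList, reportbad fills the BlockList and triggers the refund.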
How payment works
We charge you, as usual, every time you send us a captcha.
A refund will be issued if a token didn't work and you send us a reportbad request.
A refund will also be issued if we didn't come up with a solution and you receive ERROR_CAPTCHA_UNSOLVABLE.
Does it work? Any proof?
We have customers who need to solve v3 on websites with an incorrectly configured ReCaptcha that treats half of the real visitors as robots. Only 4.5% of our workers provide valid solutions for these websites. So when a customer starts sending us requests, only 4.5% of the tokens they receive are valid. But after about 1500 requests almost 50% of the tokens are valid, and after 10000 requests the percentage of valid tokens hits 80%! As for other websites with a properly configured v3, you may count on 90% valid tokens from the beginning, 95% after one hundred requests and almost 99% after 1000 captchas.
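To put these percentages in perspective, here is a small back-of-the-envelope sketch. It assumes each request succeeds independently with the stated probability, which is a simplification, and the function name is hypothetical. Under that assumption the average number of requests per working token is 1 divided by the valid rate.

```python
def requests_per_valid_token(valid_rate: float) -> float:
    """Average number of requests needed to get one working token."""
    if not 0 < valid_rate <= 1:
        raise ValueError("valid_rate must be in the range (0, 1]")
    return 1 / valid_rate


# Valid-token rates quoted in the text for a badly configured website.
for rate in (0.045, 0.5, 0.8):
    average = requests_per_valid_token(rate)
    print(f"{rate:.1%} valid tokens: about {average:.2f} requests per working token")
```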
What if I send requests for different websites? Can I have different lists?
Unfortunately, for now you can only have one pair of access lists. But you may have more than one account and use different 2captcha accounts for different v3 websites.
Why can't you do a pair of lists for every domain instead?
We can't, because if one customer makes a mistake while sending reportgood/reportbad requests, they will affect other customers solving captchas on the same host.
Do I need to indicate a score when sending a request to the API?
Yes. We have different algorithms implemented for different score ranges. For example, there is no difference between 0.4 and 0.9, but there is a difference between 0.3 and 0.4.
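A minimal sketch of a v3 request that carries the score is shown below. The parameter names ("method", "version", "googlekey", "pageurl", "action", "min_score") are assumptions not spelled out in this post, so verify them in the API documentation. The helper name and the sample values are hypothetical. The function only builds the parameters for in.php and sends nothing.

```python
def build_v3_request(api_key: str, site_key: str, page_url: str,
                     action: str, min_score: float) -> dict:
    """Build in.php parameters for a ReCaptcha v3 task (assumed names)."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("a ReCaptcha v3 score lies between 0.0 and 1.0")
    return {
        "key": api_key,
        "method": "userrecaptcha",
        "version": "v3",
        "googlekey": site_key,
        "pageurl": page_url,
        "action": action,
        "min_score": min_score,  # the score the target website requires
    }


params = build_v3_request("YOUR_API_KEY", "SITE_KEY",
                          "https://example.com/login", "login", 0.3)
print(params)
```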
How long will you keep the data?
Two days only. We could expand this later, but for now it is two days only.
What if I put all workers in a BlockList?
So far no one has been able to do that, not even for 10% of our workers. But if you are working on it, you are obviously doing something wrong.
I have a question!
If you have questions regarding bypassing ReCaptcha v3, please ask them on our forum or contact our Support team.
Update regarding solving ReCaptcha on google.com
This article is an update to this one.
Today we noticed two issues causing problems with solving ReCaptcha on google.com.
1. A lot of unsolved captchas. Unfortunately, if a worker could not solve a captcha because of a bad proxy, or could not load it for some reason, this captcha will never be solved. That's because we can't pass it to another worker, as that worker's solution would certainly not work.
2. A low percentage of valid tokens (from 40 to 60%).
Issue number one has no solution for now. But we always return your funds if we can't solve a captcha for you.
The second issue fortunately has a solution. What we found today:
1. A proxy is not necessary, but the percentage of good tokens is higher with a proxy.
2. You are advised to send us your cookies from google.com. But you should not use the cookies of our workers.
If you send us your proxy and your cookies, the percentage of valid tokens will be 100%!
So, here is what is new today.
Add a "cookies" parameter to your request to in.php. In its value, ":" separates a cookie name from its value and ";" separates different cookies.
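A minimal sketch of building that value from a dictionary of cookies follows. The cookie names and values are made-up placeholders, and the helper name is hypothetical. Whether a trailing ";" is expected is not stated here, so this sketch does not add one.

```python
def format_cookies(cookies: dict) -> str:
    """Join cookies as name:value pairs separated by ';'."""
    return ";".join(f"{name}:{value}" for name, value in cookies.items())


# Placeholder cookies saved from your own google.com session.
saved = {"cookie_one": "value1", "cookie_two": "value2"}
print(format_cookies(saved))
```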
Add a "proxy" parameter in the following format:
login:email@example.com:3128
and a "proxytype" parameter indicating the type of your proxy: HTTP, HTTPS, SOCKS4, SOCKS5.
proxy=login:firstname.lastname@example.org:3128 proxytype=HTTP
If your proxy has access control, add our IP address 126.96.36.199 as allowed.
1. How do you get google.com cookies if your parser never visits google.com before a captcha arrives? For example, you are parsing www.google.sm and you don't have google.com cookies. All you need to do is open google.com and save its cookies. Then, when you parse google.sm and receive a captcha, send it to us together with the saved cookies.
2. If a token doesn't work for you, or if we didn't solve a captcha for you, you can't just try to solve it again. Instead, you must go back and get a new captcha from the search. If you don't, and try to send the same captcha again, your IP address will be banned by Google.
By the way, you can see our investigation live on our forum post at captchaforum.com.
Solving captchas for google.com search is working now. You have to add an additional parameter "data-s" to your request to our API. You can find its value in the "data-s" variable on the page where you want to bypass the captcha challenge. More info below.
Starting from May 18, ReCaptcha sometimes did not work on the first submission on the google.com/sorry/index webpage. The problem wasn't obvious and was taken for an accident. Nevertheless, the percentage of such cases kept rising and reached almost 100% by the weekend.
Unfortunately, at that time we thought it was a result of errors in google.com itself. Even manual solving in a browser sometimes didn't work on the first try and resulted in a repeated captcha challenge. There probably really was an error in google.com; you can check the video attached to this post. It has now been fixed by Google, so you can bypass the captcha challenge on the first try in a browser. But it still did not work via 2captcha.
Meanwhile, on May 25 we started to understand the scale of the problem. Almost all services for product parsing and semantic analysis had stopped working, so we started our investigation.
What to do now to solve captcha on google.com/sorry/index
1. Find the value of the "data-s" variable on the google.com/sorry/index page and send it as the value of the "data-s" parameter in your request to in.php. Note that this value will be new every time you solve a captcha.
2. You should NOT run the JS that loads the captcha on google.com/sorry/index. If you load the captcha on the webpage and then send it to us for solving, the token we provide will NOT work. That is because of the new data-s variable: a captcha can be loaded only once with a given data-s.
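The two steps above can be sketched as follows: read the raw HTML without executing any JavaScript and pull out the "data-s" value. This sketch assumes the value is exposed as a data-s HTML attribute. The class and function names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class DataSFinder(HTMLParser):
    """Remember the first data-s attribute found in the page source."""

    def __init__(self) -> None:
        super().__init__()
        self.data_s: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.data_s is not None:
            return  # keep only the first match
        for name, value in attrs:
            if name == "data-s":
                self.data_s = value


def extract_data_s(html: str) -> Optional[str]:
    """Return the data-s value from raw HTML, or None if it is absent."""
    finder = DataSFinder()
    finder.feed(html)
    return finder.data_s


# Made-up markup standing in for the google.com/sorry/index page source.
page = '<div class="g-recaptcha" data-sitekey="SITE_KEY" data-s="FRESH_VALUE"></div>'
value = extract_data_s(page)
if value is not None:
    extra_params = {"data-s": value}  # add this to your in.php request
    print(extra_params)
```

Because each data-s value can be used only once, fetch the page source again and extract a fresh value for every captcha you send.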
What else is new about this captcha
- Google has released undocumented functionality. There is nothing about "data-s" in the ReCaptcha V2 documentation.
- The "data-s" value is unique, and ReCaptcha makes sure that each data-s is accessed only once. So if you open a webpage and the captcha is loaded, we will not be able to provide a working solution for this captcha. You have to parse the HTML code of the page for the "data-s" value and send it to us.
- There are a few more tricks implemented by ReCaptcha along with the "data-s" requirement. They now control the integrity of data on the page with the captcha. It took us a lot of time to figure out how to deal with these tricks, and we had to reimplement from scratch the way we transfer captchas to workers. But you should not worry about that; everything is done on our end.
- Proxy, UserAgent and cookies don't affect the solving process. A solution will work from your IP address with your UserAgent and cookies even though it was solved from a worker's IP address anywhere in the world, with the UserAgent set to Mozilla Firefox and with the worker's cookies. But if you'd like to send us your proxy and UserAgent, we will use them. More details on that are in our API documentation.
We'd like to say thanks to A-Parser. They gave us a hint in the right direction and helped with testing. We believe A-Parser is the most powerful search engine parser at the moment. They can even parse Amazon and put the results in an SQL database, as they have a server version that supports unlimited threads (it is limited only by your server's performance).