Google ads why?
Jun. 02, 2012, 09:39 PM (This post was last modified: Jun. 05, 2012 01:18 AM by ProxRocks.)
Post: #46
RE: Google ads why?
Quote:Are you sure that this link was created by a script in the browser? I think it was not matched because the # and the data that followed was not sent to the server.

Yes, I noticed that it never sends full links like that, although I had imagined it did. Let's look at the headers log:

(log1 attached)

The word "search", which I believed was there, really does not appear in the URLs.

The problem is that the links are so generic that there is no way to limit my filter to exactly the text search results. Any ideas on how to do it?

Quote:The google filters that I use try to avoid these scripts.

If you mean the headers rule posted above, I tried it again, but it didn't change anything. I don't know if it is supposed to. It would be nice, though.

For now, http works for me. But it irritates me that google often redirects to https. Maybe there will be a simpler way to block ads in that search.


Attached File(s)
.txt  log1.txt (Size: 3.3 KB / Downloads: 871)
Jun. 02, 2012, 09:59 PM (This post was last modified: Jun. 05, 2012 01:22 AM by ProxRocks.)
Post: #47
RE: Google ads why?
By the way, that log was taken with the "Proxomitron" in bypass mode. I didn't pay attention to that and thought the logs would be pretty much the same with it off or on.

But if I switch it on, the links are different and really do contain full links. However, it still doesn't behave as if it saw them. I marked all the useful links, but I don't know why it doesn't work when I put masks for these links into the rule's URL-limiter line.

(log2 attached)


Attached File(s)
.txt  log2.txt (Size: 32.48 KB / Downloads: 640)
Jun. 03, 2012, 12:37 AM
Post: #48
RE: Google ads why?
The reply for "GET 852" is "HTTP/1.1 204 No Content" and "Content-Length: 0". So, there is nothing there for your web filters to filter.
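As a rough illustration of why such a reply leaves nothing to filter, here is a minimal Python sketch (not Proxomitron code). The function name has_filterable_body and the plain-dict header format are assumptions made for the example; the rule that 1xx, 204 and 304 responses carry no body comes from RFC 2616.

```python
# Statuses that never carry a message body (RFC 2616, section 4.3).
NO_BODY_STATUSES = {204, 304}


def has_filterable_body(status, headers):
    """Return True if a response could contain content for a body filter.

    'headers' is a plain dict of header name to value (an assumption
    made for this sketch, not a Proxomitron data structure).
    """
    if 100 <= status < 200 or status in NO_BODY_STATUSES:
        return False
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("content-length", "").strip() != "0"


print(has_filterable_body(204, {"Content-Length": "0"}))
print(has_filterable_body(200, {"Content-Type": "text/html"}))
```

The first call reports False for the 204 reply seen in the log, the second reports True for an ordinary HTML response.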

The other URLs are matched by the URL matches of the filters that you posted. So, I will guess that your filters' Matching Expressions failed to match. You may be able to see this by enabling "HTML Debug Info" and then loading the URL in the browser. See http://proxomitron.info/45/help/Log.html and http://proxomitron.info/45/help/URL%20Commands.html .

About URL matches,

Code:
URL = "\www.google.\w"

I would not do this. It would unnecessarily slow browsing.
\w matches everything up to a "space character" or a ">".
The first \w will search every character of all unmatched URLs for a match.
The second \w is not needed.


You can test URL matches. Right click on the "URL Match" field and select "Test matching".
Jun. 03, 2012, 03:32 AM (This post was last modified: Jun. 03, 2012 03:33 AM by Gravemind.)
Post: #49
RE: Google ads why?
Quote:I would not do this. It would unnecessarily slow browsing.
\w matches everything up to a "space character" or a ">".
The first \w will search every character of all unmatched URLs for a match.
The second \w is not needed.

That was just for testing. Also, I don't use \w at the end, only at the beginning; the trailing one was a leftover.

I see no alternative to simply matching all domains, subdomains, and URL strings. It works fine for me in the more than 10000 filters that I use and doesn't slow down pages. So, it is not that bad. Maybe there is a better symbol?

?++ is mostly the same, but also consumes spaces, which is not necessary. Phrases like "http://" with ^ and other characters won't do in my filters. I can't edit them individually by hand, since there are thousands of them.

Quote:The reply for "GET 852" is "HTTP/1.1 204 No Content" and "Content-Length: 0". So, there is nothing there for your web filters to filter.

I'm surprised you noticed that; I didn't. But that page was trash, killed by \k with

/csi\?v=3\&s — it was in my adblock list.

So it is zero.

Quote:You can test URL matches. Right click on the "URL Match" field and select "Test matching".

That's exactly the problem. When I test it in that window, it tests fine, but in practice the match doesn't happen on the same links, whether taken from the browser window or from the headers.

All search html pages contain the word "search" in their urls, so for starters I am trying to match with \wgoogle\wsearch\?

But it doesn't work. It only starts working when I cut it down to \wgoogle. This I can't understand.

I will update if I fix it.

Those filters are very reliable though. However, they destroy all pages other than the web search.

Still, they always work fine on https and the ads are always destroyed by adding style="display:none" to their HTML-tags.
Jun. 03, 2012, 03:42 AM
Post: #50
RE: Google ads why?
Quote:The google filters that I use try to avoid these scripts.

Do you use other rules for google besides the one you posted above?
Jun. 03, 2012, 05:55 AM
Post: #51
RE: Google ads why?
(Jun. 03, 2012 03:32 AM)Gravemind Wrote:  I see no alternative to simply matching all domains, subdomains, and URL strings. It works fine for me in the more than 10000 filters that I use and doesn't slow down pages. So, it is not that bad. Maybe there is a better symbol?

?++ is mostly the same, but also consumes spaces, which is not necessary. Phrases like "http://" with ^ and other characters won't do in my filters. I can't edit them individually by hand, since there are thousands of them.

Things have changed. Computers are much faster now than when I started using the Proxomitron but still...

Something like "[^/]++.google" stops looking when the first "/" is found. Adding $TYPE() to URL matches, like "$TYPE(htm)[^/]++.google" would further reduce unnecessary testing.
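The idea of confining the wildcard to the host part can be sketched with Python regular expressions. This is only an analogy, not Proxomitron syntax: the pattern names are made up for the example, and the URLs are written without the "http://" prefix on the assumption that this is the form a URL match is tested against.

```python
import re

# Unanchored: the engine may try every position in the URL before giving up.
ANYWHERE = re.compile(r"google")
# Host-only: [^/]* cannot cross the first "/", so only the host name is tested.
HOST_ONLY = re.compile(r"[^/]*\.google\.")


def host_match(url):
    """Return True if the host part of 'url' contains '.google.'.

    'url' is given without the scheme, e.g. 'www.google.ru/search?q=x'.
    """
    return HOST_ONLY.match(url) is not None


print(host_match("www.google.ru/search?q=shoes"))
print(host_match("example.com/redirect?to=www.google.ru"))
print(ANYWHERE.search("example.com/redirect?to=www.google.ru") is not None)
```

The host-only pattern accepts the first URL and rejects the second without looking past "example.com", while the unanchored pattern scans into the query string and matches it.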

Have you had problems with "*"?

What do you gain by using "?++" instead of "*"? "*" should be quicker.
"\w" is good for what it is intended to do: match up to a space or ">". Otherwise, I use "*".

(Jun. 03, 2012 03:32 AM)Gravemind Wrote:  
Quote:The reply for "GET 852" is "HTTP/1.1 204 No Content" and "Content-Length: 0". So, there is nothing there for your web filters to filter.

I'm surprised you noticed that; I didn't. But that page was trash, killed by \k with

/csi\?v=3\&s — it was in my adblock list.

So it is zero.

Ah, I missed seeing the kill.
But, I think, a "HTTP/1.1 204 No Content" connection is used to send data to a server, and the reply's "Content-Length: 0" is from Google's server. http://www.w3.org/Protocols/rfc2616/rfc2...#sec10.2.5 .

The log looks like you blocked after the request was sent and Google received the info. Is your URL Killer an "In" header filter?

(Jun. 03, 2012 03:32 AM)Gravemind Wrote:  
Quote:You can test URL matches. Right click on the "URL Match" field and select "Test matching".

That's exactly the problem. When I test it in that window, it tests fine, but in practice the match doesn't happen on the same links, whether taken from the browser window or from the headers.

All search html pages contain the word "search" in their urls, so for starters I am trying to match with \wgoogle\wsearch\?

But it doesn't work. It only starts working when I cut it down to \wgoogle. This I can't understand.

I don't have an answer beyond AJAX and maybe Google isn't always sending you the same code.
You might try adding a flag to your URL match. Like "\wgoogle\wsearch\?$ALERT(URL Match worked)" or "$TYPE(htm)[^/]++.google.*search\?$ALERT(URL Match worked)".

(Jun. 03, 2012 03:42 AM)Gravemind Wrote:  
Quote:The google filters that I use try to avoid these scripts.

Do you use other rules for google besides the one you posted above?

Yes, http://prxbx.com/forums/showthread.php?tid=1870 , and I probably should fix some of them.

Jun. 04, 2012, 03:08 AM
Post: #52
RE: Google ads why?
Quote:Have you had problems with "*"?

Yes, this caused me some problems. Also, I read somewhere that some of the wildcards can be hashed, while others can't. The asterisk was not marked as hashable, while \w was, so I switched to \w in almost all rules.

However, * is the best choice in adblock plus filters in Firefox.

Quote:What do you gain by using "?++"


I've read that there are ?+ and ?++, and that the latter works better when you need it to stop at some exact point. I am not sure if there is any real difference.

Quote:Is your URL Killer an "In" header filter?

Yes, because I believed there was no point in killing out-requests. Now I will kill both.

Quote:I don't have an answer beyond AJAX and maybe Google isn't always sending you the same code.

Apparently, this is the cause, since the page never fully reloaded, and the Proxomitron had nothing to filter. The details don't matter, but below is a simple way to fix this.

This is one funny thing about the google search.

When I first input the request, it returns the link like this:

http://www.google.ru:443/ (this is the input page http://www.google.ru/ — ignore)

>>>

http://www.google.ru:443/#hl=ru&newwindow=1&output=search&sclient=psy-ab&q=%D0%BA%D1%83%D0%BF%D0%B8%D1%82%D1%8C+%D0%BE%D0%B1%D1%83%D0%B2%D1%8C&oq=%D0%BA%D1%83%D0%BF%D0%B8%D1%82%D1%8C+%D0%BE%D0%B1%D1%83%D0%B2%D1%8C&aq=f&aqi=g10&aql=&gs_l=hp.3..0l10.1627.3047.1.3193.12.11.0.0.0.0.527.3193.3-5j2j1.8.0...0.0.PQhPLuuJ9R4&psj=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=27ec78adb151ef06&biw=1400&bih=726

or like this:

http://www.google.ru:443/webhp?hl=ru&tab=ww#hl=ru&newwindow=1&sclient=psy-ab&q=%D0%BA%D1%83%D0%BF%D0%B8%D1%82%D1%8C+%D0%BE%D0%B1%D1%83%D0%B2%D1%8C&oq=%D0%BA%D1%83%D0%BF%D0%B8%D1%82%D1%8C+%D0%BE%D0%B1%D1%83%D0%B2%D1%8C&aq=f&aqi=g10&aql=&gs_l=hp.3..0l10.4447.5041.2.5301.5.5.0.0.0.0.359.1403.3-4.4.0...0.0.w4SaoEusqXY&psj=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=27ec78adb151ef06&biw=1400&bih=726

The first link does contain the word "search" (in output=search), but only after the #, and the "Proxomitron" does not seem to detect it there. Then, if I click on "Images", the link turns back to normal.

>>>

http://www.google.ru:443/search?q=%D0%BA%D1%83%D0%BF%D0%B8%D1%82%D1%8C+%D0%BE%D0%B1%D1%83%D0%B2%D1%8C&hl=ru&newwindow=1&prmd=imvns&source=lnms&tbm=isch&ei=PgjMT5_pGeb54QSA3okV&sa=X&oi=mode_link&ct=mode&cd=2&ved=0CJEBEPwFKAE&biw=1400&bih=726

Now I go back to web search and the web search, too, is back to normal again:

>>>

http://www.google.ru:443/search?q=%D0%BA%D1%83%D0%BF%D0%B8%D1%82%D1%8C+%D0%BE%D0%B1%D1%83%D0%B2%D1%8C&hl=ru&newwindow=1&prmd=imvns&source=lnms&ei=hBLMT_nxOI_04QTkv6ka&sa=X&oi=mode_link&ct=mode&cd=1&ved=0CA8Q_AUoAA&biw=1400&bih=726


Now you see the correct /search?q= right after the host name (this is what I mean by "normal"), unlike those /webhp?hl= or #hl= forms. And with /search?q= there is no need for any rules beyond the simple ones that erase the ad blocks.

It appears that the results for /webhp?hl= or #hl= are delivered in a form different from an HTML page, so they don't pass through the "Proxomitron" in the full sense.
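A small Python sketch of what the proxy gets to see for each URL form. The helper name request_target is made up for the example, and the URLs are shortened stand-ins for the long ones above (q=shoes is a placeholder query). It relies only on the fact that the part after # is a fragment, which the browser keeps to itself and does not put into the request.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path and query that a browser puts in the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment (everything after "#") is deliberately left out:
    # it stays in the browser and is never sent over the wire.
    return target


ajax_url = "http://www.google.ru/#hl=ru&output=search&q=shoes"
webhp_url = "http://www.google.ru/webhp?hl=ru&tab=ww#hl=ru&q=shoes"
plain_url = "http://www.google.ru/search?q=shoes&hl=ru"

for url in (ajax_url, webhp_url, plain_url):
    print(request_target(url))
```

Only the third form yields a request that contains /search?q=, which fits the observation that a URL match on "search" works there and not on the # forms.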

So, the solution is to either use the search panel in your browser, which returns the correct pages with /search?q=, or to use the advanced search as your main google gateway:

http://www.google.ru/advanced_search

(Use .com instead.)

It also returns filterable results.

Or you may change the search to the image search and then back to the web search. This is a hassle, so ignore this method.

To make this hassle-free and seamless, it would be nice to have a way to make the "Proxomitron" capture the search query from the input box on the main google page http://www.google.ru/ / http://www.google.com/ and then redirect it to http://www.google.ru/search?q=(your query here). I don't have a rule for this yet, but it would be very convenient for everyone.
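The string transformation itself is simple; here is a hedged Python sketch of it, not a Proxomitron rule. The function name to_plain_search is hypothetical. Since the part after # is not sent with the request, a real solution would probably have to act inside the page (for example on its script) rather than on the request URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def to_plain_search(url):
    """Rewrite a '#...q=' Google URL into '/search?q=...'.

    Returns None when the fragment carries no 'q' parameter.
    """
    parts = urlsplit(url)
    params = parse_qs(parts.fragment)
    query = params.get("q")
    if not query:
        return None
    # Keep the original scheme and host; re-encode only the search terms.
    return "%s://%s/search?%s" % (
        parts.scheme,
        parts.netloc,
        urlencode({"q": query[0]}),
    )


print(to_plain_search("http://www.google.ru/#hl=ru&q=buy+shoes"))
print(to_plain_search("http://www.google.ru/"))
```

The first call prints the /search?q= form of the same query; the second prints None because there is nothing to redirect.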

Any ideas?

Haha. And that was all hidden in plain sight.

Also, that JavaScript decryption stuff did work on those lame URLs, but it is so much hassle and trouble that it is only good for perverts, since there are simpler ways.
Jun. 04, 2012, 10:01 PM
Post: #53
RE: Google ads why?
Wildcards are not hashed but where they are used can affect hashing. "*" should be ok where "?++" is ok.

(Jun. 04, 2012 03:08 AM)Gravemind Wrote:  I've read that there are ?+ and ?++, and that the latter works better when you need it to stop at some exact point. I am not sure if there is any real difference.

"?+" is "greedy". It matches until there is nothing left. "?+a" would never match because "?+" would consume the "a".
"?++", like "*", looks ahead. "?++a" or "*a" would match because the "a" is not consumed by the wildcard.

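For readers more used to ordinary regular expressions, the difference between "?+a" and "?++a" can be imitated in Python. This is an analogy under assumptions, not Proxomitron's matcher: the lookahead-plus-backreference trick stands in for a wildcard that consumes everything and gives nothing back, and the lazy ".*?" stands in for a wildcard that checks what follows.

```python
import re

# (?=(.*))\1 emulates a wildcard that grabs everything and never gives it back,
# so the trailing "a" has nothing left to match.
GREEDY_NO_BACKTRACK = re.compile(r"(?=(.*))\1a")
# .*? stops as soon as the following "a" can match.
LOOKS_AHEAD = re.compile(r".*?a")

text = "banana"
print(GREEDY_NO_BACKTRACK.search(text))
print(LOOKS_AHEAD.search(text).group(0))
```

The first search finds nothing, the second stops at the first "a" and returns "ba".
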
(Jun. 04, 2012 03:08 AM)Gravemind Wrote:  Any ideas?

Avoid AJAX. Does this filter help?

Code:
[Patterns]
Name = "Google search No Ajax  12.06.04 [add]"
Active = TRUE
URL = "$TYPE(htm)(www|encrypted).google.(*/)+{1}(intl/(*/)+{1}(^?)|search\?(^tbm=isch|*\&tbm=isch)|webhp|(^?))"
Limit = 256
Match = "if\(c\&\&c.getElementById\)if\(typeof XMLHttpRequest!=d\)a=\"2\";"
Replace = "if (!0) { /* PROX: S-Spec If: ! Removed - (c&&c.getElementById) */ }"
Jun. 05, 2012, 12:13 AM
Post: #54
RE: Google ads why?
any complaints if i take those massive logs and make them "attachments" instead of ultra-huge scroll-scroll-scroll-scroll-scroll-scroll blocks?

or we could just wait six more posts and our "new page" won't have them, lol...
Jun. 05, 2012, 01:08 AM
Post: #55
RE: Google ads why?
Quote:any complaints if i take those massive logs

You should probably delete them. They are not of much use.
Jun. 05, 2012, 01:28 AM
Post: #56
RE: Google ads why?
Yes, this filter does the job.

The only side effect is that it changes the left panel layout a little.

If anybody is curious, I have uploaded 2 pictures. The one with icons appears after the rule is applied. But it's probably even better.

Thanks.


[Image: db7d67c8d04737fa1db87bd306fe3e21.jpg]


[Image: 4f90d5960becc2172281c77a4bf67ecb.jpg]
Jun. 05, 2012, 01:33 AM
Post: #57
RE: Google ads why?
(Jun. 05, 2012 01:08 AM)Gravemind Wrote:  
Quote:any complaints if i take those massive logs

You should probably delete them. They are not of much use.

moved them to log1/log2 attachments...
not trying to be "Type A" or anything, lol...
Oct. 04, 2012, 03:46 AM
Post: #58
RE: Google ads why?
(Mar. 11, 2012 03:19 PM)JJoe Wrote:  The set provides and filters Google cookies. I'll look at changing the localization to USA but Google will still know where you are.
Google knows! mwa ha ha ha ha!
However, you (Mele20) may care about the actual page content, and whoever might read it if you're not using https://
Disable "geo" in the browser
http://www.google.com/search?q=Disable+G...as_qdr=all
Also, maybe look through the js and html for "geo".


I mentioned in another thread that google sends a cleaner page if you send google an archaic user-agent. I rely on that.


BTW, Tor uses their version of firefox ESR, and multiple Tor version updates maintain the same user agent override.
From preference.js in torbutton extension folder here's:
Code:
pref("extensions.torbutton.useragent_override",
     "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0");
Regular Firefox 15.0.1 user agent was
Code:
User-Agent: Mozilla/5.0 (Windows NT XXXXXXX; rv:15.0) Gecko/20100101 Firefox/15.0.1

Feb. 26, 2013, 03:43 PM
Post: #59
RE: Google ads why?
(Mar. 11, 2012 03:19 PM)JJoe Wrote:  Start with http://prxbx.com/forums/showthread.php?tid=1870 and then replace "Google Search: Remove Ad Blocks part 1" with

Code:
[Patterns]
Name = "Google Search: Remove Ad Blocks part 1     12.03.11 [multi] (d.s) [ADD] test"
Active = TRUE
Multi = TRUE
URL = "$TST(hCT=*html)(www|encrypted).google."
Limit = 32766
Match = "<div\s?(*>)+{1} <h2 class=$AV(spon)*"
        "("
        "( (<div id=$AV( i+res ) >)+{1,2} <ol>)\#"
        "$SET(sSpec=$GET(sSpec)sponsfloat.)"
        "|"
        "(</div> <div id=$AV(foot) >)\#"
        "$SET(sSpec=$GET(sSpec)sponsfloatfoot.)"
        ")"
        "|"
        "<div id=$AV(bottomads)* (<div [^>]++id=$AV(foot))\#"
        "$SET(sSpec=$GET(sSpec)sponsbottomads.)"
        "|"
        "<div id=$AV(topstuff)$INEST(<div,</div>)</div>"
Replace = "\@"

i've just noticed that the "Remove Ad Blocks part 1 12.03.11" axes Google's "calculator"...
the top-of-results tidbit for when you do a search for "2 x 4 =" or "2 feet to inches"...
Feb. 26, 2013, 05:44 PM
Post: #60
RE: Google ads why?
(Feb. 26, 2013 03:43 PM)ProxRocks Wrote:  i've just noticed that the "Remove Ad Blocks part 1 12.03.11" axes Google's "calculator"...
the top-of-results tidbit for when you do a search for "2 x 4 =" or "2 feet to inches"...

Can't remember why I did that.

Code:
[Patterns]
Name = "Google Search: Remove Ad Blocks part 1     13.02.26 [multi] (d.s) [ADD] test"
Active = TRUE
Multi = TRUE
URL = "$TST(hCT=*html)(www|encrypted).google."
Limit = 32766
Match = "<div\s?(*>)+{1} <h2 class=$AV(spon)*"
        "("
        "( (<div id=$AV( i+res ) >)+{1,2} <ol>)\#"
        "$SET(sSpec=$GET(sSpec)sponsfloat.)"
        "|"
        "(</div> <div id=$AV(foot) >)\#"
        "$SET(sSpec=$GET(sSpec)sponsfloatfoot.)"
        ")"
        "|"
        "<div id=$AV(bottomads)* (<div [^>]++id=$AV(foot))\#"
        "$SET(sSpec=$GET(sSpec)sponsbottomads.)"
Replace = "\@"

We'll see what shows up.