Cleaning up deleted forms
Mar. 28, 2006, 02:15 AM (This post was last modified: Apr. 12, 2006 01:17 PM by z12.)
Post: #1
(Moved to Filters In Progress, 3/28/06 - Admin)

Removing forms is a pita, mainly because of sloppy html coding.

It seems that most filter sets use a similar method of matching common inner nested tags.

No matter how you match it, it's hard to match everything that should match and, maybe more importantly, to avoid matching what shouldn't be matched.


To get around this, I'm currently testing the code that follows.

The idea is to match the form to be removed with proxo, then use javascript injected immediately after the form to remove the form node from the dom.


Code:
see revised code


As far as I can tell it seems to be working fine. Checking the dom after a page loads shows no trace of the form.

As is, the main drawback to the above filter is that the browser still requests any images, scripts or whatnot that might be contained in the form.

I was hoping that with the "display:none" style attribute, the browser (Firefox) wouldn't request anything, but that's not the case.

So, the bottom line is you may still want to match & remove some tags.

The nice thing is that with the javascript call injected after the form, any other tag element remnants contained in the form will be cleanly removed.


Ideas or comments welcome

Mike
Apr. 09, 2006, 11:56 PM
Post: #2
 
hmm...

Opera is having an issue with this filter.

It doesn't work, no js errors but the form is not removed.

Firefox did the same thing when I tried the script inside the form node.

I'll have to dig a bit deeper on this.

Mike
Apr. 10, 2006, 09:11 PM
Post: #3
 
Mike,

What's the reasoning behind wanting to remove a form in the first place? IOW, why are you doing this?


Oddysey

I'm no longer in the rat race - the rats won't have me!
Apr. 10, 2006, 11:28 PM
Post: #4
 
That one's easy - because it is OFF-SITE

I basically run sidki's "out-of-the-box" config (with the WIP's added in) for roughly 90% of my surfing activity...

But I do find myself keeping a 'sidki-modified' around that treats any and ALL "off-site" cr@p (even CSS) like, well, like whatever you treat something you don't want ANY of, lol...

Like another thread has brought up, if it is "off-site", it's a 'bug' tracking you in some way or form (to tie in with the current off-site "beast")...

And for that 10% of browsing (like following links from, oh, say Ernie's House), I would just rather ensure no "off-site" web servers are logging my every move, lol...


You know, does Google REALLY "need" to know just how many sites that are "powered by Google" have been visited? What about PayPal?
Apr. 11, 2006, 01:07 AM
Post: #5
 
I hate forms.

No seriously, I remove forms for the following reasons:


1. It's an ad container.

2. It's offsite.

3. It's a nag to subscribe to one thing or another.

IOW, I'm easily annoyed.


The technique of using the dom to aid proxo in removing the form could be used with any container.

I'm just using it with forms, since they tend to be a bit trickier in matching what should be removed.

In an ideal world, you could just match with $NEST(<form,</form>) and be done with it.

Unfortunately, this will often break the page layout due to tags like <div|table|tr|td> having the start tag before the form, but the end tag within it.

Avoiding this usually involves matching common nested tags and other elements to be removed.

However, it seemed I was often adding something new to the matching expression to remove yet another tag that was embedded within the form. Since almost any html tag can be in the form, trying to match all of them seemed a bit excessive to me.

The good news is that removing the form via the dom avoids this matching problem.

When a node is removed via the dom, it removes all child nodes of the node as well. If required, the dom will effectively add any closing tags that are needed to close sibling or parent nodes.

This allows the matching expression to be simplified, since you don't need to match any tags within the form (unless you want to). The js call after the form will remove any tags that weren't removed by proxo.

Now if I could just get it to work with opera. I haven't tried it with ie yet, it may or may not have the same problem.

My theory about this problem is that opera hasn't finished processing the form node and this is preventing the js from accessing it.


ProxRocks Wrote:You know, does Google REALLY "need" to know just how many sites that are "powered by Google" have been visited? What about PayPal?

Yahoo is bad also. They seem to think that "all your clicks are belong to us". They use addEventListener on their links to add an "onclick" method that replaces the href when you click it. Since there's no "onclick" attribute in the html, your only hope is to catch it in the javascript, or possibly a header filter. I'm currently trying out a method that defeats this (works in ie, moz & opera), but that's a topic for another thread.


Mike
Apr. 12, 2006, 07:06 AM
Post: #6
 
Mike,

OK, hating forms is fine, I can live with that. But how do you respond to posts here on the UOPF, or make new ones yourself??????


Oddysey

I'm no longer in the rat race - the rats won't have me!
Apr. 12, 2006, 07:09 AM
Post: #7
 
ProxRocks,

If you're worried about an ad being in a container (a form), I'd have to ask, why would Proxo suddenly say to itself "Wow, this ad I'm supposed to remove is inside a form, I'd better not remove it"? I contend that if the ad is not being removed regardless of where it appears on the page, then the filter itself is at fault, not the coding of a container around it.

And FWIW, the page itself is a container, and at that, it's not even the top level in the DOM.


Oddysey

I'm no longer in the rat race - the rats won't have me!
Apr. 12, 2006, 01:16 PM
Post: #8
 
Oddysey Wrote:But how do you respond to posts here on the UOPF, or make new ones yourself??????

The forms here are not off-site. Anyway, I usually bypass when logging in.

Oddysey Wrote:I contend that if the ad is not being removed regardless of where it appears on the page, then the filter itself is at fault, not the coding of a container around it.

hmm... I thought I was the one saying it's hard to match what's "inside" a form.


As it turns out, it seems that with the browsers I've been debugging with (Firefox, Opera & IE), the DOM treats forms differently than other containers.

To remove the form, I was putting a div tag "inside" the form and the js would then remove the div tag's parent node (the form, I thought).

However, when I disabled the filter's js and viewed the form node in Firefox's DOM Inspector, the div tag I put in was not "In" the form. In fact, nothing was in the form, no child nodes at all. All I found was a form "elements" node which is a collection of form controls that are associated with the form. I suppose this makes some sense as the w3c says that a form is a container for form controls. The w3c goes on to say you can put other elements inside a form, but apparently from the dom's point of view, these elements are not "inside" the form.
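
One way to check this observation without browsing the tree by hand is to compare the form's "elements" collection against the form's actual descendants. The following is only a sketch: the function names are made up for illustration, and it assumes nothing beyond the standard parentNode property and the form's elements collection.

```javascript
// Walk up the parentNode links to see whether "ancestor" sits above "node".
function isDescendant(node, ancestor) {
  var p = node.parentNode;
  while (p) {
    if (p === ancestor) {
      return true;
    }
    p = p.parentNode;
  }
  return false;
}

// Return the controls listed in form.elements that the DOM did not
// place beneath the form node itself. (Hypothetical helper name.)
function controlsOutsideForm(form) {
  var outside = [];
  for (var i = 0; i < form.elements.length; i++) {
    if (!isDescendant(form.elements[i], form)) {
      outside.push(form.elements[i]);
    }
  }
  return outside;
}
```

In a page, something like controlsOutsideForm(document.forms[0]).length would tell you how many controls ended up outside the form node. A non-zero count means that removing only the form node would leave those controls behind.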

For debugging, I "injected" div tags into forms, then browsed the dom. The div tags I injected seemed to appear in random places. Definitely not where I thought they should have appeared.

Next, I tried putting the form "inside" a div tag. This didn't work either. Perhaps this is because the div tag is a block-level element.

Finally I tried putting the form inside a table. This seems to work as the form now shows up as a child node of the table.

So here's the revised filter & js code:

Code:
[Patterns]
Name = "Off-Site Form Killer"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<form,</form>)"
Limit = 12200
Match = "(<(form)\0)\1([^>]+>&&(^*\s id=$AV(prxKillForm))"
        "(\#\s id=$AV(*))+\#"
        "&*\s action=$AV(*://(^*(poll|search|google.))(^([^/]++.|)$TST(uDom))\9)"
        ") \2"
Replace = "\r\n<table id="prxKillNode">\r\n"
          "<tr><td>\r\n"
          "\1 id="prxKillForm"\@\2\r\n"
          "</td></tr></table>\r\n"
          "<proxo><?nako removed=\0: \9 ?></proxo>\r\n"
          "<script type="text/javascript">prxRemoveNode("prxKillNode");</script>\r\n"


// script located in the js injected by proxo in the head section
function prxRemoveNode(tagId){
  var n=document.getElementById(tagId);
  var pn = n.parentNode;
  pn.removeChild(n);
}
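
A slightly more defensive variant of the helper above, as a sketch only. The name prxRemoveNodeSafe and the extra "doc" parameter are made up for illustration. Passing the document in just makes the function easy to try outside a browser; in a page you would pass window.document.

```javascript
// Hypothetical defensive variant of prxRemoveNode.
// "doc" is any object with a getElementById method, e.g. window.document.
function prxRemoveNodeSafe(doc, tagId) {
  var n = doc.getElementById(tagId);
  // Nothing to do if the id is missing or the node is already detached.
  if (!n || !n.parentNode) {
    return false;
  }
  n.parentNode.removeChild(n);
  return true;
}
```

The only behavioural difference is that a missing id returns false instead of throwing a js error, which matters if the filter matches but the browser never builds the node where the script expects it.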

To prevent prox from getting stuck in a matching loop on the form, I added a check to make sure the form doesn't have an id="prxKillForm".

Filter is now working with Firefox, Opera & IE (so far).

Mike
Apr. 12, 2006, 04:43 PM
Post: #9
 
here is a site where I am getting that filter to match something: http://puzzles.usatoday.com/sudoku/

but the filter seems to be breaking the page...
Apr. 12, 2006, 07:23 PM
Post: #10
 
I see that, I'll look into it.

On an odd note, I was reading up on sudoku earlier today. Saw a link to that page but I didn't check it out.

Mike
Apr. 21, 2006, 09:51 AM
Post: #11
 
Ok, I gave up on the idea of trying to contain the form inside a table that was injected by the filter's replacement code.

So here's the revised code.

Code:
[Patterns]
Name = "Off-Site Form Killer"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<form,</form>)*</\w>"
Limit = 12200
Match = "(<form)\2([^>]+>&&(\#\s id=$AV(*))+\#"
        "&*\s action=$AV(*://(^*(poll|search|google.))(^([^/]++.|)$TST(uDom))\9)*) \3"
Replace = "\2 id="prxKillForm" \@\3\r\n"
          "<script type="text/javascript">prxRemoveForm("prxKillForm");</script>\r\n"




// script located in the js injected by proxo in the head section
//-----------------------------------

// create a contains method for mozilla to check if one node is contained in another
// based on http://www.quirksmode.org/blog/archives/2006/01/contains_for_mo.html
// ie & opera have a contains method
// added isSameNode check to make it work like ie & opera

function prxContainsCheck(){
  var n=document.documentElement;
  if(typeof(n.contains)=='undefined' && typeof(n.compareDocumentPosition)!='undefined'){
    try{
      Node.prototype.contains=function(arg){return !!(this.compareDocumentPosition(arg) & 16 || this.isSameNode(arg))};
    }catch (e){
      alert("create contains method failed");
    }
  }
}

prxContainsCheck();


//-----------------------------------


function prxRemoveForm(tagId){
  var f, tn, bn, pn;
  
  // get the form node
  f = document.getElementById(tagId);
  
  // get the form control element that's highest in the document tree
  tn = f.elements[0];
  
  // get the form control element that's lowest in the document tree
  bn = f.elements[f.elements.length-1];
  
  // start with the parentNode of the top form control element
  pn=tn.parentNode;
  
  // bubble up till all the form's control elements are found
  while(!(pn.contains(bn)&& pn.contains(f))){
    pn=pn.parentNode;
  }
  
  // remove the node that contains the form & all its elements
  pn.parentNode.removeChild(pn);
  
}


//-----------------------------------

The js now looks for the node that contains all the form's elements, which, in some cases, is not the form.

Since mozilla doesn't have a contains method, the code above creates one that works like the one in ie & opera.
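
For comparison, the same lookup can be done by walking parentNode links only, which would avoid needing a contains method at all. This is a sketch, not the code used by the filter: the function names are made up, and it should find roughly the same container as the loop in prxRemoveForm, except that it falls back to the form itself when the form has no controls.

```javascript
// Collect a node and all of its ancestors, nearest first.
function ancestorChain(node) {
  var chain = [];
  for (var n = node; n; n = n.parentNode) {
    chain.push(n);
  }
  return chain;
}

// Lowest node that contains (or is) both a and b, or null if the two
// nodes do not share a tree.
function commonAncestor(a, b) {
  var chainA = ancestorChain(a);
  for (var n = b; n; n = n.parentNode) {
    for (var i = 0; i < chainA.length; i++) {
      if (chainA[i] === n) {
        return n;
      }
    }
  }
  return null;
}

// Hypothetical variant of the lookup in prxRemoveForm: the node that
// holds the form plus its first and last controls.
function findFormContainer(form) {
  var els = form.elements;
  if (!els || els.length === 0) {
    return form;
  }
  var c = commonAncestor(els[0], els[els.length - 1]);
  return c ? commonAncestor(c, form) : form;
}
```

The node returned by findFormContainer would then be removed with container.parentNode.removeChild(container), same as before.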

IE doesn't like it when a node is deleted before IE has processed it. Symptoms included getting a 404 after the page had partially loaded or IE popping up an "IE must close down" message.

To get around this problem, I modified the filter's bounds expression to look for a closing tag after the form is closed. So far this seems to be working, but time will tell. A short time delay also works, but that's not the route I want to go if I don't have to.

Also, I turned on "Multi" since the filter now matches past the form.

Mike
Apr. 21, 2006, 01:46 PM
Post: #12
 
Hmm... still a bug with this.

Have to take another look.

Mike
Apr. 21, 2006, 02:10 PM
Post: #13
 
IE is being a pain. It only works reliably when I put a delay in.

Here's the filter that works with ie:

Code:
[Patterns]
Name = "Off-Site Form Killer (ie)"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<form,</form>)"
Limit = 12200
Match = "(<form)\2([^>]+>&&(\#\s id=$AV(*))+\#"
        "&*\s action=$AV(*://(^*(poll|search|google.))(^([^/]++.|)$TST(uDom))\9)*) \3"
Replace = "\2 id="prxKillForm" \@\3\r\n"
          "<script type="text/javascript">window.setTimeout('prxRemoveForm("prxKillForm")',10);</script>\r\n"

Since I disable timers till after the page loads, I have to use

window.RealSetTimeout

to call the real timer function.

If you do something similar, modify the setTimeout function call accordingly.

The js is the same for either filter.

I'd like to figure out a better solution to this ie issue.
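
One possible direction, offered only as an untested sketch: instead of a fixed delay, queue the form ids as they are injected and do the actual removal once the page has finished loading. Every name here (prxPendingForms, prxQueueForm, prxFlushForms) is made up for illustration, and whether waiting for the load event really avoids the IE problem is an assumption.

```javascript
// Hypothetical names throughout; this is not part of the filter above.
var prxPendingForms = [];

// Called from the injected inline script instead of removing right away.
function prxQueueForm(tagId) {
  prxPendingForms.push(tagId);
}

// Remove every queued form with the supplied remover and empty the queue.
// Returns the number of ids that were processed.
function prxFlushForms(removeFn) {
  var ids = prxPendingForms;
  prxPendingForms = [];
  for (var i = 0; i < ids.length; i++) {
    try {
      removeFn(ids[i]);
    } catch (e) {
      // One failing form should not stop the rest from being removed.
    }
  }
  return ids.length;
}

// In a browser, run the flush once the page has finished loading.
if (typeof window !== "undefined") {
  var prxOnLoad = function () { prxFlushForms(prxRemoveForm); };
  if (window.addEventListener) {
    window.addEventListener("load", prxOnLoad, false);
  } else if (window.attachEvent) {
    window.attachEvent("onload", prxOnLoad); // older IE
  }
}
```

With this, the injected script would call prxQueueForm("prxKillForm") rather than prxRemoveForm directly. The trade-off is that the form stays in the page until the load event fires, so it may be visible briefly on slow pages.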

Mike