Another Filtering Proxy
May. 27, 2015, 02:26 PM
Post: #16
RE: Another Filtering Proxy
I spent several hours but couldn't figure out why. It seems to be related to urllib3's headers handling.

Anyway, attached is the unreleased v0.5 with an ugly patch. Let me know if it works.


Attached File(s)
.zip  testing.zip (Size: 11.59 KB / Downloads: 587)
May. 28, 2015, 04:26 AM (This post was last modified: May. 28, 2015 07:42 AM by cattleyavns.)
Post: #17
RE: Another Filtering Proxy
Great job! I think we should rewrite self.headers.update too; I'm trying to do that now. urllib3's header handling isn't good, at least at this time. I think we should move the whole header feature away from urllib3 and use the built-in modules as much as possible.

Can you tell me how to get these lines into URLFilter.py so I can modify them as I want:
Code:
        headers = urllib3._collections.HTTPHeaderDict()
        [headers.add(key, value) for (key, value) in self.headers.items()]
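
If I understand correctly, these two lines rebuild the headers with HTTPHeaderDict because it preserves duplicate header fields (e.g. multiple Set-Cookie lines), which a plain dict would collapse. A quick check of that behavior (the header values here are made up):
Code:
import urllib3

headers = urllib3._collections.HTTPHeaderDict()
headers.add('Set-Cookie', 'a=1')
headers.add('Set-Cookie', 'b=2')  # add() keeps duplicate keys instead of overwriting
print(headers['Set-Cookie'])      # -> 'a=1, b=2'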

I'm adding a proxy feature to AFProxy using proxy_from_url, but I want to work around the problem above by setting headers = self.headers (req.headers in URLFilter.py).


I'm learning Python, but I have a really tough question about "threading"; threading in Python is not easy at all. I would like to ask you some questions and hope you can help me:
- In threading, how can we download a big file in parts but join the parts one by one as each finishes, instead of waiting for them all to finish and then joining?
Here is the code; save it as a .py file and run it.
Code:
import os, requests
import threading
import urllib.request, urllib.error, urllib.parse
import time

URL = "https://peach.blender.org/wp-content/uploads/poster_bunny_big.jpg"

def buildRange(value, numsplits):
    # build "start-end" byte-range strings, one per chunk
    lst = []
    for i in range(numsplits):
        if i == 0:
            lst.append('%s-%s' % (i, int(round(1 + i * value/(numsplits*1.0) + value/(numsplits*1.0)-1, 0))))
        else:
            lst.append('%s-%s' % (int(round(1 + i * value/(numsplits*1.0),0)), int(round(1 + i * value/(numsplits*1.0) + value/(numsplits*1.0)-1, 0))))
    return lst

def main(url=None, splitBy=3):
    start_time = time.time()
    if not url:
        print("Please Enter some url to begin download.")
        return

    fileName = "1.jpg"
    sizeInBytes = requests.head(url, headers={'Accept-Encoding': 'identity'}).headers.get('content-length', None)
    if not sizeInBytes:
        print("Size cannot be determined.")
        return
    print("%s bytes to download." % sizeInBytes)

    dataDict = {}

    # split total num bytes into ranges
    ranges = buildRange(int(sizeInBytes), splitBy)

    def downloadChunk(idx, irange):
        print(idx)
        req = urllib.request.Request(url)
        req.headers['Range'] = 'bytes={}'.format(irange)
        dataDict[idx] = urllib.request.urlopen(req).read()
        print("finish: " + str(irange))

    # create one downloading thread per chunk
    downloaders = [
        threading.Thread(
            target=downloadChunk,
            args=(idx, irange),
        )
        for idx,irange in enumerate(ranges)
        ]

    # start all threads so the chunks download in parallel
    for th in downloaders:
        th.start()
    # then wait for every thread to finish
    for th in downloaders:
        th.join()

    print('done: got {} chunks, total {} bytes'.format(
        len(dataDict), sum( (
            len(chunk) for chunk in list(dataDict.values())
        ) )
    ))

    print("--- %s seconds ---" % str(time.time() - start_time))

    if os.path.exists(fileName):
        os.remove(fileName)
    # reassemble the file in the correct order
    with open(fileName, 'wb') as fh:
        for _idx,chunk in sorted(dataDict.items()):
            fh.write(chunk)


    print("Finished Writing file %s" % fileName)
    print('file size {} bytes'.format(os.path.getsize(fileName)))

if __name__ == '__main__':
    main(URL, splitBy=3)

What I want is:
- For example, we have a big file of 100 MB.
- We split that file using its Content-Length.
- We use the "threading" module to download the file in parts, so the download is as fast as possible, instead of downloading the parts one by one without threading and then joining them.
- But the problem is threading's "join()": we cannot stream the file or write it to disk instantly like Free Download Manager/FlashGet do, because "join()" waits for all threads to finish.
- But without join(), the script simply does not work; the file ends up 0 bytes because it is written before the download tasks finish.
- So I want threading to work like this (see the sketch below):
+ Download a file with 4 threads.
+ When thread 1 finishes, stream thread 1's data immediately, then wait for thread 2 and append its data. Even if threads 3 and 4 finish earlier than thread 2, they must not be appended before it, because that would corrupt the file; only after thread 2 is appended can 3 and 4 follow.
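
Since Thread.join() blocks until that one thread finishes, a simple way to get this behavior is to join the download threads in index order and write each chunk as soon as its own thread is done; chunks that finished earlier are then written immediately after. A minimal sketch of the idea (downloadChunk and ranges are placeholders for the pieces above):
Code:
import threading

def write_in_order(fileName, ranges, downloadChunk):
    # downloadChunk(idx, irange) -> bytes is assumed to fetch one chunk
    results = {}

    def worker(idx, irange):
        results[idx] = downloadChunk(idx, irange)

    threads = [threading.Thread(target=worker, args=(i, r))
               for i, r in enumerate(ranges)]
    for t in threads:
        t.start()

    with open(fileName, 'wb') as fh:
        # join in index order: chunk 0 is written as soon as it is done,
        # even while later chunks are still downloading
        for i, t in enumerate(threads):
            t.join()
            fh.write(results[i])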
May. 28, 2015, 09:53 AM
Post: #18
RE: Another Filtering Proxy
(May. 28, 2015 04:26 AM)cattleyavns Wrote:  I'm adding proxy feature to AFProxy using proxy_from_url, but I want to patch above problem by set headers = self.headers (req.headers in URLFilter.py)

I am not sure if I get your point, but self.headers is available as req.headers in URLFilter.py, and you can operate on it freely.

(May. 28, 2015 04:26 AM)cattleyavns Wrote:  - But problem is with threading "join()", we cannot stream file or write file to disk instantly like Free Download Manager/Flashget software because "join()" wait for all thread finish.

I think you can create the file in advance, and have each thread write its data at its own offset via f.seek(offset, from_what). You also need to take care of Semaphore acquire() and release(). They are all documented in the manual.
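
For example, a minimal sketch of that offset-writing idea (the names are made up, and a plain Lock stands in for the Semaphore):
Code:
import threading

lock = threading.Lock()

def write_at(fobj, offset, data):
    # serialize seek+write so concurrent threads cannot interleave them
    with lock:
        fobj.seek(offset)  # jump to this chunk's own region of the file
        fobj.write(data)

# usage: open the file once, share fobj between threads,
# and have each thread call write_at(fobj, its_start_offset, block)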
May. 29, 2015, 04:07 AM (This post was last modified: May. 29, 2015 04:13 AM by cattleyavns.)
Post: #19
RE: Another Filtering Proxy
I also want to report a funny bug in AFProxy:
thread.dameon = True

It must be, as in the official documentation:
thread.daemon = True

But if I change it to thread.daemon = True, AFProxy simply stops working, so I decided to remove the line by commenting it out. Do we really need this line "thread.daemon = True"?

(May. 28, 2015 09:53 AM)whenever Wrote:  I am not sure if I get your point, but self.headers is available as req.headers in URLFilter.py, and you can operate it freely as you want.

Thank you, but the headers variable I want to get and change is the "headers" in headers=headers from version 0.4. I want to change it from, for example, URLFilter.py; I can already change self.headers via req.headers.

(May. 28, 2015 09:53 AM)whenever Wrote:  I think you can create the file in advance, and in each thread write the data to specified offset via f.seek(offset, from_what). Also you need to take care of Semaphore acquire() and release(). They are all documented in the manual.

Thank you, here is what I have so far. I hope it contributes a little if you ever want to add new features to AFProxy; it also shows a way to do bandwidth throttling (speed limiting):

Split a file into parts, then download and join them in parallel:
Code:
import threading
import urllib.request, urllib.error, urllib.parse
import sys

max_thread = 10
# Initialize lock
lock = threading.RLock()

class Downloader(threading.Thread):
    def __init__(self, url, start_size, end_size, fobj, buffer):
        self.url = url
        self.buffer = buffer
        self.start_size = start_size
        self.end_size = end_size
        self.fobj = fobj
        threading.Thread.__init__(self)

    def run(self):
        """Thread entry point; the real work happens in _download()."""
        with lock:
            print(('starting: %s' % self.getName()))
        self._download()

    def _download(self):
        """Download this thread's byte range and write it at the right offset."""
        req = urllib.request.Request(self.url)
        # add an HTTP Range header so only this thread's slice is downloaded
        req.headers['Range'] = 'bytes=%s-%s' % (self.start_size, self.end_size)
        f = urllib.request.urlopen(req)
        # initialize the current thread's file offset
        offset = self.start_size
        while 1:
            block = f.read(self.buffer)
            # exit the current thread once all of its data has been received
            if not block:
                with lock:
                    print(('%s done.' % self.getName()))
                break
            # hold the lock while writing so seek() and write() stay paired
            # ("with lock" replaces the traditional lock.acquire()/lock.release()
            # and requires Python >= 2.5)
            with lock:
                sys.stdout.write('%s saving block...' % self.getName())
                # set the file object's offset, then write the received data
                self.fobj.seek(offset)
                self.fobj.write(block)
                offset = offset + len(block)
                sys.stdout.write('done.\n')


def main(url, thread=3, save_file='', buffer=1024):
    # the number of threads may not exceed max_thread
    thread = thread if thread <= max_thread else max_thread
    # get the file size
    req = urllib.request.urlopen(url)
    size = int(req.getheader('Content-Length'))
    print(size)
    # create the output file object shared by all threads
    fobj = open(save_file, 'wb')
    # work out each thread's HTTP Range from the thread count
    avg_size, pad_size = divmod(size, thread)
    plist = []
    for i in range(thread):
        start_size = i*avg_size
        end_size = start_size + avg_size - 1
        if i == thread - 1:
            # the last thread also takes the remainder
            end_size = end_size + pad_size + 1
        t = Downloader(url, start_size, end_size, fobj, buffer)
        plist.append(t)

    # start the downloader threads
    for t in plist:
        t.start()

    # wait for all threads to finish
    for t in plist:
        t.join()

    # all done; remember to close the file object
    fobj.close()
    print('Download completed!')

if __name__ == '__main__':
    url = 'http://userscripts-mirror.org/scripts/source/57662.user.js'
    main(url=url, thread=4, save_file='a.user.js', buffer=16*1024)

Limit download speed:
Code:
"""Rate limiters with shared token bucket."""

import os
import threading
import time
import urllib.request, urllib.parse, urllib.error

class TokenBucket(object):
    """An implementation of the token bucket algorithm.
    source: http://code.activestate.com/recipes/511490/

    >>> bucket = TokenBucket(80, 0.5)
    >>> print bucket.consume(10)
    True
    >>> print bucket.consume(90)
    False
    """
    def __init__(self, tokens, fill_rate):
        """tokens is the total tokens in the bucket. fill_rate is the
        rate in tokens/second that the bucket will be refilled."""
        self.capacity = float(tokens)
        self._tokens = float(tokens)
        self.fill_rate = float(fill_rate)
        self.timestamp = time.time()
        self.lock = threading.RLock()

    def consume(self, tokens):
        """Consume tokens from the bucket. Returns 0 if there were
        sufficient tokens, otherwise the expected time until enough
        tokens become available."""
        with self.lock:
            expected_time = (tokens - self.tokens) / self.fill_rate
            if expected_time <= 0:
                self._tokens -= tokens
        return max(0, expected_time)

    @property
    def tokens(self):
        with self.lock:
            if self._tokens < self.capacity:
                now = time.time()
                delta = self.fill_rate * (now - self.timestamp)
                self._tokens = min(self.capacity, self._tokens + delta)
                self.timestamp = now
            return self._tokens

class RateLimit(object):
    """Rate limit a url fetch.
    source: http://mail.python.org/pipermail/python-list/2008-January/472859.html
    (but mostly rewritten)
    """
    def __init__(self, bucket, filename):
        self.bucket = bucket
        self.last_update = 0
        self.last_downloaded_kb = 0

        self.filename = filename
        self.avg_rate = None

    def __call__(self, block_count, block_size, total_size):
        total_kb = total_size / 1024.

        downloaded_kb = (block_count * block_size) / 1024.
        just_downloaded = downloaded_kb - self.last_downloaded_kb
        self.last_downloaded_kb = downloaded_kb

        predicted_size = block_size/1024.

        wait_time = self.bucket.consume(predicted_size)
        while wait_time > 0:
            time.sleep(wait_time)
            wait_time = self.bucket.consume(predicted_size)

        now = time.time()
        delta = now - self.last_update
        if self.last_update != 0:
            if delta > 0:
                rate = just_downloaded / delta
                if self.avg_rate is not None:
                    rate = 0.9 * self.avg_rate + 0.1 * rate
                self.avg_rate = rate
            else:
                rate = self.avg_rate or 0.
            print(("%20s: %4.1f%%, %5.1f KiB/s, %.1f/%.1f KiB" % (
                    self.filename, 100. * downloaded_kb / total_kb,
                    rate, downloaded_kb, total_kb,
                )))
        self.last_update = now


def main():
    """Fetch the contents of urls"""

    rate_limit  = float(20)
    urls = {"http://userscripts-mirror.org/scripts/source/57662.user.js"}
    bucket = TokenBucket(10*rate_limit, rate_limit)

    print(("rate limit = %.1f" % (rate_limit,)))

    threads = []
    for url in urls:
        path = urllib.parse.urlparse(url,'http')[2]
        filename = os.path.basename(path)
        print(('Downloading "%s" to "%s"...' % (url,filename)))
        rate_limiter = RateLimit(bucket, filename)
        t = threading.Thread(
            target=urllib.request.urlretrieve,
            args=(url, filename, rate_limiter))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

    print('All downloads finished')

if __name__ == "__main__":
    main()
Jun. 03, 2015, 07:56 AM
Post: #20
RE: Another Filtering Proxy
(May. 29, 2015 04:07 AM)cattleyavns Wrote:  I also want to report a funny bug of AFProxy:

Well, you found an old bug, well done!

We don't need it. We can safely remove that line.

(May. 29, 2015 04:07 AM)cattleyavns Wrote:  Thank you, but the headers variable I want to get and change is the "headers" in headers=headers from version 0.4, I want to change it with for example "URLFilter.py", I already could change self.headers with req.headers.

In version 0.4, whatever you change via req.headers in URLFilter.py will be copied to headers by the lines below from AFProxy.py. I don't think you need to do any extra work.

Code:
headers = urllib3._collections.HTTPHeaderDict()
[headers.add(key, value) for (key, value) in self.headers.items()]


(May. 29, 2015 04:07 AM)cattleyavns Wrote:  Thank you, here is what I get so far, hope this contribute a little bit if you want to add new feature to AFProxy, also the way to do bandwidth throttling (speed limit):

Thanks, but I want AFProxy to focus on filtering. You are free to make a new proxy based on AFProxy with whatever new features you like. Smile!
Jun. 05, 2015, 12:26 PM (This post was last modified: Jun. 05, 2015 12:42 PM by cattleyavns.)
Post: #21
RE: Another Filtering Proxy
(Jun. 03, 2015 07:56 AM)whenever Wrote:  Well, you found an old bug, well done!

We don't need it. We can safely remove that line.

Well, I think we do need daemon: without it we cannot press Ctrl+C to exit AFProxy, and I feel a little uncomfortable using the "X" button to exit instead. Also, without daemon threads we lose the on-exit event of the "atexit" module (import atexit).

Do you have any idea how to make thread.daemon = True or thread.setDaemon(True) work? I tried to restore this feature, but all I got was AFProxy not working.

(Jun. 03, 2015 07:56 AM)whenever Wrote:  In version 0.4, what you change via req.headers in URLFilter.py will be copied to headers via below lines in AFProxy.py. I don't think you need to do extra work.

Code:
headers = urllib3._collections.HTTPHeaderDict()
[headers.add(key, value) for (key, value) in self.headers.items()]


I found another bug: we should move
Code:
        ########## Apply HeaderFilterOut ##########
        if config.HeaderFilter:
            if self.applyFilters('HeaderFilter', 'Out') == 'GetOut':
                return

right above "headers = urllib3._collections.HTTPHeaderDict()" in your quote above; otherwise we cannot change/add/remove headers.
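
In other words, the corrected order would look roughly like this (a sketch stitched together from the two snippets quoted above, not the exact AFProxy.py source):
Code:
########## Apply HeaderFilterOut ##########
if config.HeaderFilter:
    if self.applyFilters('HeaderFilter', 'Out') == 'GetOut':
        return

# snapshot the headers only after the filters had a chance to modify them
headers = urllib3._collections.HTTPHeaderDict()
[headers.add(key, value) for (key, value) in self.headers.items()]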

(Jun. 03, 2015 07:56 AM)whenever Wrote:  Thanks, but I want AFProxy to focus on filtering. You are free to make a new proxy based on AFProxy with whatever new features you like. Smile!

Great, thank you for that offer Wink

Here is my patch to make AFProxy work partially with a SOCKS proxy using urllib (weird, my implementation looks horrible, but better than nothing, right?). It is based on version 0.4 because that version is stable.

It needs another module:
Code:
pip install pySocks

My implementation has its own problems, for example:
Code:
http://prxbx.com
-> TLSv1 ERROR
http://ghacks.net
-> Redirect forever error 30x

You can install Bitvise SSH Client or AdvOR and set the listen port to 10080, or change the line containing "10080" to point at your own SOCKS proxy.

Changelog:
Code:
- Added socks support
- Moved:
        ########## Apply HeaderFilterOut ##########
        if config.HeaderFilter:
            if self.applyFilters('HeaderFilter', 'Out') == 'GetOut':
                return

right after URLFilter.

Test: http://ghacks.net/ip/


Attached File(s)
.zip  AFProxy.zip (Size: 5.26 KB / Downloads: 570)
Jun. 17, 2015, 09:14 AM (This post was last modified: Jun. 17, 2015 09:23 AM by cattleyavns.)
Post: #22
RE: Another Filtering Proxy
Okay, continuing: I fixed a SERIOUS problem in the http.server library:

- Technical details:
+ Use Firefox.
+ Open the Network tool (Tools -> Developer Tools -> Network).
+ Open http://www.facebook.com.
+ Find this URL ('ai.php'; filter for it with the Network tool's search box). Its response status icon is filled with pink, not green, and pink means error:
Quote:O https://www.facebook.com/ai.php?ego=++++++++++++

The reason is that the http.server library parses 'raw_requestline' the wrong way, so our GET|POST|CONNECT|HEAD command ends up looking like this:
Code:
__user+++++GET

And of course there is no do___user+++++GET handler, only do_GET.

And here is my patch: I modified http.server's parse_request function and embedded it into ProxyTool.py. Just replace your ProxyTool.py with:

Code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"HTTP Proxy Tools, pyOpenSSL version"

_name = "ProxyTool"
__author__ = 'phoenix'
__version__ = '1.0'

import time
from datetime import datetime
import logging
import cgi
import socket
import select
import selectors
import ssl

from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
from CertTool import get_cert
#Fix SERIOUS problem https://www.facebook.com/ai.php++++ 501(pink, Firefox) on www.facebook.com and youtube.com too
import http.client
import re
#Fix SERIOUS problem

from colorama import *
init(autoreset=True)

logger = logging.getLogger('__main__')

message_format = """\
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
        <title>Proxy Error: %(code)d</title>
    </head>
    <body>
        <h1>%(code)d: %(message)s</h1>
        <p>The following error occurred while trying to access <strong>%(url)s</strong></p>
        <p><strong>%(explain)s</strong></p>
        <hr>Generated on %(now)s by %(server)s.
    </body>
</html>
"""

def read_write(socket1, socket2):
    "Read and Write contents between 2 sockets, wait 5s for no data before return"
    start = time.time()
    with selectors.DefaultSelector() as selector:
        socket1.setblocking(False)
        socket2.setblocking(False)
        selector.register(socket1, selectors.EVENT_READ)
        selector.register(socket2, selectors.EVENT_READ)
        while True:
            tasks = selector.select(5)
            if not tasks: break
            for key, events in tasks:
                if events & selectors.EVENT_READ:
                    reader = key.fileobj
                    writer = socket2 if reader is socket1 else socket1
                    try:
                        data = reader.recv(1024)
                        if data:
                            writer.sendall(data)
                        else:
                            # EOF
                            selector.unregister(reader)
                            selector.unregister(writer)
                    except (ConnectionAbortedError, ConnectionResetError, BrokenPipeError):
                        pass
        logger.debug("took %.2Fs" % (time.time()-start))

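# NOTE: this second definition of read_write overrides the selectors-based
# version above, so only the select()-based implementation below is in effect.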
def read_write(socket1, socket2, max_idling=10):
    "Read and Write contents between 2 sockets"
    iw = [socket1, socket2]
    ow = []
    count = 0
    while True:
        count += 1
        (ins, _, exs) = select.select(iw, ow, iw, 1)
        if exs: break
        if ins:
            for reader in ins:
                writer = socket2 if reader is socket1 else socket1
                try:
                    data = reader.recv(1024)
                    if data:
                        writer.send(data)
                        count = 0
                except (ConnectionAbortedError, ConnectionResetError, BrokenPipeError):
                    pass
        if count == max_idling: break

class ProxyRequestHandler(BaseHTTPRequestHandler):
    """RequestHandler with do_CONNECT method defined
    """
    server_version = "%s/%s" % (_name, __version__)
    # do_CONNECT() will set self.ssltunnel to override this
    ssltunnel = False
    # Override default value 'HTTP/1.0'
    protocol_version = 'HTTP/1.1'
    
    def parse_request(self):
        """Parse a request (internal).

        The request should be stored in self.raw_requestline; the results
        are in self.command, self.path, self.request_version and
        self.headers.

        Return True for success, False for failure; on failure, an
        error is sent back.

        """
#Fix SERIOUS problem https://www.facebook.com/ai.php++++ 501(pink, Firefox) on www.facebook.com and youtube.com too
        self.command = None  # set in case of error on the first line
        self.request_version = version = self.default_request_version
        self.close_connection = 1
        requestline = str(self.raw_requestline, 'iso-8859-1')
        requestline = requestline.rstrip('\r\n')
        self.requestline = requestline
        words = requestline.split()

        # strip any junk the client prepended to the method, e.g.
        # "__user+++++GET" -> "GET"; also guard against an empty request line
        if words and not re.match('^(GET|POST|CONNECT|PUT|DELETE|PATCH|HEAD)$', words[0]):
            words[0] = re.sub('^.*?(GET|POST|CONNECT|PUT|DELETE|PATCH|HEAD)$', '\\1', words[0])
        if len(words) == 3:
            command, path, version = words
            if version[:5] != 'HTTP/':
                self.send_error(400, "Bad request version (%r)" % version)
                return False
            try:
                base_version_number = version.split('/', 1)[1]
                version_number = base_version_number.split(".")
                # RFC 2145 section 3.1 says there can be only one "." and
                #   - major and minor numbers MUST be treated as
                #      separate integers;
                #   - HTTP/2.4 is a lower version than HTTP/2.13, which in
                #      turn is lower than HTTP/12.3;
                #   - Leading zeros MUST be ignored by recipients.
                if len(version_number) != 2:
                    raise ValueError
                version_number = int(version_number[0]), int(version_number[1])
            except (ValueError, IndexError):
                self.send_error(400, "Bad request version (%r)" % version)
                return False
            if version_number >= (1, 1) and self.protocol_version >= "HTTP/1.1":
                self.close_connection = 0
            if version_number >= (2, 0):
                self.send_error(505,
                          "Invalid HTTP Version (%s)" % base_version_number)
                return False
        elif len(words) == 2:
            command, path = words
            self.close_connection = 1
            if command != 'GET':
                self.send_error(400,
                                "Bad HTTP/0.9 request type (%r)" % command)
                return False
        elif not words:
            return False
        else:
            self.send_error(400, "Bad request syntax (%r)" % requestline)
            return False
        self.command, self.path, self.request_version = command, path, version

        # Examine the headers and look for a Connection directive.
        try:
            self.headers = http.client.parse_headers(self.rfile,
                                                     _class=self.MessageClass)
        except http.client.LineTooLong:
            self.send_error(400, "Line too long")
            return False

        conntype = self.headers.get('Connection', "")
        if conntype.lower() == 'close':
            self.close_connection = 1
        elif (conntype.lower() == 'keep-alive' and
              self.protocol_version >= "HTTP/1.1"):
            self.close_connection = 0
        # Examine the headers and look for an Expect directive
        expect = self.headers.get('Expect', "")
        if (expect.lower() == "100-continue" and
                self.protocol_version >= "HTTP/1.1" and
                self.request_version >= "HTTP/1.1"):
            if not self.handle_expect_100():
                return False
        return True
    
    def log_message(self, format, *args):
        return

    def do_CONNECT(self):
        "Descrypt https request and dispatch to http handler"
        # request line: CONNECT www.example.com:443 HTTP/1.1
        self.host, self.port = self.path.split(":")
        # SSL MITM
        self.wfile.write(("HTTP/1.1 200 Connection established\r\n" +
                          "Proxy-agent: %s\r\n" % self.version_string() +
                          "\r\n").encode('ascii'))
        commonname = '.' + self.host.partition('.')[-1] if self.host.count('.') >= 2 else self.host
        dummycert = get_cert(commonname)
        # set a flag for do_METHOD
        self.ssltunnel = True

        ssl_sock = ssl.wrap_socket(self.connection, keyfile=dummycert, certfile=dummycert, server_side=True)
        # Ref: Lib/socketserver.py#StreamRequestHandler.setup()
        self.connection = ssl_sock
        self.rfile = self.connection.makefile('rb', self.rbufsize)
        self.wfile = self.connection.makefile('wb', self.wbufsize)
        # dispatch to do_METHOD()
        self.handle_one_request()

    def handle_one_request(self):
        """Catch more exceptions than default

        Intend to catch exceptions on local side
        Exceptions on remote side should be handled in do_*()
        """
        try:
            BaseHTTPRequestHandler.handle_one_request(self)
            return
        except (ConnectionError, FileNotFoundError) as e:
            logger.warning(Fore.RED + "%s", e)
        except (ssl.SSLEOFError, ssl.SSLError) as e:
            if hasattr(self, 'url'):
                # Happens after the tunnel is established
                logger.warning(Fore.YELLOW + '"%s" while operating on established local SSL tunnel for [%s]' % (e, self.url))
            else:
                logger.warning(Fore.YELLOW + '"%s" while trying to establish local SSL tunnel for [%s]' % (e, self.path))
        self.close_connection = 1

    def sendout_error(self, url, code, message=None, explain=None):
        "Modified from http.server.send_error() for customized display"
        try:
            shortmsg, longmsg = self.responses[code]
        except KeyError:
            shortmsg, longmsg = '???', '???'
        if message is None:
            message = shortmsg
        if explain is None:
            explain = longmsg
        content = (message_format %
                   {'code': code, 'message': message, 'explain': explain,
                    'url': url, 'now': datetime.today(), 'server': self.server_version})
        body = content.encode('UTF-8', 'replace')
        self.send_response_only(code, message)
        self.send_header("Content-Type", self.error_content_type)
        self.send_header('Content-Length', int(len(body)))
        self.end_headers()
        if self.command != 'HEAD' and code >= 200 and code not in (204, 304):
            self.wfile.write(body)

    def deny_request(self):
        self.send_response_only(403)
        self.send_header('Content-Length', 0)
        self.end_headers()


    def redirect(self, url):
        self.send_response_only(302)
        self.send_header('Content-Length', 0)
        self.send_header('Location', url)
        self.end_headers()

    def forward_to_https_proxy(self):
        "Forward https request to upstream https proxy"
        logger.debug('Using Proxy - %s' % self.proxy)
        proxy_host, proxy_port = self.proxy.split('//')[1].split(':')
        server_conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            server_conn.connect((proxy_host, int(proxy_port)))
            server_conn.send(('CONNECT %s HTTP/1.1\r\n\r\n' % self.path).encode('ascii'))
            server_conn.settimeout(0.1)
            datas = b''
            while True:
                try:
                    data = server_conn.recv(4096)
                except socket.timeout:
                    break
                if data:
                    datas += data
                else:
                    break
            server_conn.setblocking(True)
            if b'200' in datas and b'established' in datas.lower():
                logger.info(Fore.CYAN + '[P] SSL Pass-Thru: https://%s/' % self.path)
                self.wfile.write(("HTTP/1.1 200 Connection established\r\n" +
                                  "Proxy-agent: %s\r\n\r\n" % self.version_string()).encode('ascii'))
                read_write(self.connection, server_conn)
            else:
                logger.warning(Fore.YELLOW + 'Proxy %s failed.', self.proxy)
                if datas:
                    logger.debug(datas)
                    self.wfile.write(datas)
        finally:
            # We don't maintain a connection reuse pool, so close the connection anyway
            server_conn.close()

    def forward_to_socks5_proxy(self):
        "Forward https request to upstream socks5 proxy"
        logger.warning(Fore.YELLOW + 'Socks5 proxy not implemented yet, please use https proxy')

    def tunnel_traffic(self):
        "Tunnel traffic to remote host:port"
        logger.info(Fore.CYAN + '[D] SSL Pass-Thru: https://%s/' % self.path)
        server_conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            server_conn.connect((self.host, int(self.port)))
            self.wfile.write(("HTTP/1.1 200 Connection established\r\n" +
                              "Proxy-agent: %s\r\n" % self.version_string() +
                              "\r\n").encode('ascii'))
            read_write(self.connection, server_conn)
        except TimeoutError:
            self.wfile.write(b"HTTP/1.1 504 Gateway Timeout\r\n\r\n")
            logger.warning(Fore.YELLOW + 'Timed Out: https://%s:%s/' % (self.host, self.port))
        except socket.gaierror as e:
            self.wfile.write(b"HTTP/1.1 503 Service Unavailable\r\n\r\n")
            logger.warning(Fore.YELLOW + '%s: https://%s:%s/' % (e, self.host, self.port))
        finally:
            # We don't maintain a connection reuse pool, so close the connection anyway
            server_conn.close()

    def ssl_get_response(self, conn):
        try:
            server_conn = ssl.wrap_socket(conn, cert_reqs=ssl.CERT_REQUIRED, ca_certs="cacert.pem", ssl_version=ssl.PROTOCOL_TLSv1)
            server_conn.sendall(('%s %s HTTP/1.1\r\n' % (self.command, self.path)).encode('ascii'))
            server_conn.sendall(self.headers.as_bytes())
            if self.postdata:
                server_conn.sendall(self.postdata)
            while True:
                data = server_conn.recv(4096)
                if data:
                    self.wfile.write(data)
                else: break
        except (ssl.SSLEOFError, ssl.SSLError) as e:
            logger.error(Fore.RED + Style.BRIGHT + "[SSLError]")
            self.send_error(417, message="Exception %s" % str(e.__class__), explain=str(e))

    def purge_headers(self, headers):
        "Remove hop-by-hop headers that shouldn't pass through a Proxy"
        for name in ["Connection", "Keep-Alive", "Upgrade",
                     "Proxy-Connection", "Proxy-Authenticate"]:
            try:
                del headers[name]
            except KeyError:
                pass

    def write_headers(self, headers):
        self.purge_headers(headers)
        for key, value in headers.items():
            self.send_header(key, value)
        self.end_headers()
        
    def write_headers2(self, headers):
        self.purge_headers(headers)
        for key, value in headers.items():
            self.send_header(key, value)
        
    def stream_to_client(self, response):
        bufsize = 1024 * 64
        need_chunked = 'Transfer-Encoding' in response.headers
        written = 0
        while True:
            data = response.read(bufsize)
            if not data:
                if need_chunked:
                    self.wfile.write(b'0\r\n\r\n')
                break
            if need_chunked:
                self.wfile.write(('%x\r\n' % len(data)).encode('ascii'))
            self.wfile.write(data)
            if need_chunked:
                self.wfile.write(b'\r\n')
            written += len(data)
        return written
        
    def http_request_info(self):
        """Return HTTP request information in bytes
        """    
        context = ["CLIENT VALUES:",
                   "client_address = %s" % str(self.client_address),
                   "requestline = %s" % self.requestline,
                   "command = %s" % self.command,
                   "path = %s" % self.path,
                   "request_version = %s" % self.request_version,
                   "",
                   "SERVER VALUES:",
                   "server_version = %s" % self.server_version,
                   "sys_version = %s" % self.sys_version,
                   "protocol_version = %s" % self.protocol_version,
                   "",
                   "HEADER RECEIVED:"]
        for name, value in sorted(self.headers.items()):
            context.append("%s = %s" % (name, value.rstrip()))

        if self.command == "POST":
            context.append("\r\nPOST VALUES:")
            form = cgi.FieldStorage(fp=self.rfile,
                                    headers=self.headers,
                                    environ={'REQUEST_METHOD': 'POST'})
            for field in form.keys():
                fielditem = form[field]
                if fielditem.filename:
                    # The field contains an uploaded file
                    file_data = fielditem.file.read()
                    file_len = len(file_data)
                    context.append('Uploaded %s as "%s" (%d bytes)'
                                   % (field, fielditem.filename, file_len))
                else:
                    # Regular form value
                    context.append("%s = %s" % (field, fielditem.value))
                                    
        return("\r\n".join(context).encode('ascii'))

def demo():
    PORT = 8000

    class ProxyServer(ThreadingMixIn, HTTPServer):
        """Handle requests in a separate thread."""
        pass

    class RequestHandler(ProxyRequestHandler):
        "Displaying HTTP request information"
        server_version = "DemoProxy/0.1"

        def do_METHOD(self):
            "Universal method for GET, POST, HEAD, PUT and DELETE"
            message = self.http_request_info()
            self.send_response(200)
            # 'Content-Length' is important for HTTP/1.1
            self.send_header('Content-Length', len(message))
            self.end_headers()
            self.wfile.write(message)

        do_GET = do_POST = do_HEAD = do_PUT = do_DELETE = do_OPTIONS = do_METHOD

    print('%s serving now, <Ctrl-C> to stop ...' % RequestHandler.server_version)
    print('Listen Addr  : localhost:%s' % PORT)
    print("-" * 10)
    server = ProxyServer(('', PORT), RequestHandler)
    server.serve_forever()

if __name__ == '__main__':
    try:
        demo()
    except KeyboardInterrupt:
        print("Quitting...")
Jul. 01, 2015, 08:55 AM (This post was last modified: Jul. 01, 2015 09:22 AM by cattleyavns.)
Post: #23
RE: Another Filtering Proxy
Do you think we could make AFProxy filter HTTPS websites without the help of pyOpenSSL? pyOpenSSL is not a small Python lib: it requires "cffi" and "cryptography", and both of those need to be compiled with GCC (on Linux), which reduces AFProxy's portability. I could hardly get pyOpenSSL installed and working on Lubuntu, even after a bunch of apt-get install ... So I want to replace pyOpenSSL with Python's native ssl module to do the MITM.

So my goal is to rewrite CertTool.py and replace all the pyOpenSSL code with some pure-Python crypto library (no C extension, nothing to compile).
Jul. 19, 2015, 07:32 AM
Post: #24
RE: Another Filtering Proxy
(Jun. 05, 2015 12:26 PM)cattleyavns Wrote:  Do you have any idea how to make thread.daemon = True work ?

We need to make the main thread not quit so that it can catch the KeyboardInterrupt exception.

Code:
...
    while True:
        time.sleep(1)
except KeyboardInterrupt:
...
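
For instance, a complete minimal version of that pattern might look like this (the serve() loop is just a stand-in for AFProxy's server thread):
Code:
import threading
import time

def serve():
    while True:
        time.sleep(1)  # stand-in for the proxy's serve loop

thread = threading.Thread(target=serve)
thread.daemon = True  # daemon threads die together with the main thread
thread.start()

try:
    # keep the main thread alive so it can receive Ctrl+C
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("Quitting...")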

(Jun. 05, 2015 12:26 PM)cattleyavns Wrote:  I found another bug, we should move ...

It should already be fixed in the last testing version here.

(Jun. 17, 2015 09:14 AM)cattleyavns Wrote:  Because, http.server library parses 'raw_requestline' the wrong way, so our GET|POST|CONNECT|HEAD command will look like this:
Code:
__user+++++GET

It's more like a browser problem, because it seems the browser didn't compose the request line correctly. It's not the duty of http.server to validate the request commands.

(Jul. 01, 2015 08:55 AM)cattleyavns Wrote:  So my goal is rewrite CertTool.py and remove all pyOpenSSL code with a random native Python (without C extension or have to compile it) crypto library.

How are your findings? I don't think we have many choices unless a similar module becomes part of the standard Python installation.
Jul. 19, 2015, 05:53 PM (This post was last modified: Jul. 19, 2015 05:55 PM by cattleyavns.)
Post: #25
RE: Another Filtering Proxy
(Jul. 19, 2015 07:32 AM)whenever Wrote:  How is your finding? I don't think we have much choices unless a similar module becomes part of the standard Python installation.

Well, I have temporarily given it up for now; I tried a lot but got nowhere. I'm trying other things instead, for example making AFProxy work as a SOCKS proxy. mitmproxy did that (it can modify HTTPS traffic), so I think I will try the same. Here is a draft version that works well with Python 3; it still cannot decrypt HTTPS content, but that is just one more step away.

SOCKS is way better than an HTTP or HTTPS proxy: it works with almost every protocol, like email and chat, and (tunneled over something like SSH) the data between client and server can be encrypted, so better privacy (while still being able to block ads and modify webpages; mitmproxy does it).

mitmproxy: http://mitmproxy.org/doc/features/socksproxy.html
Make sure you import the certificate via mitm.it so mitmproxy can filter HTTPS traffic.
Command line: mitmdump --socks -p 1080

And set your browser's Socks5 proxy as 127.0.0.1 : 1080

Do you have any advice for me? I think moving to SOCKS would be great!


Attached File(s)
.zip  filtered-socks5-proxy-master.zip (Size: 6.5 KB / Downloads: 562)
Jul. 20, 2015, 03:31 AM (This post was last modified: Jul. 20, 2015 03:34 AM by whenever.)
Post: #26
RE: Another Filtering Proxy
mitmproxy depends on pyOpenSSL too. Check https://github.com/mitmproxy/netlib/blob...rtutils.py

In fact, pyOpenSSL is used for making certificates only. I used to have a version of CertTool.py which uses only the native openssl command line tool. You can modify it to work with other certificate manipulation tools if you like.
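
For illustration, driving the openssl CLI from Python looks roughly like this (a sketch only, not the attached CertTool.py; the file names and subject are made up):
Code:
import subprocess

# mint a throwaway self-signed certificate with the openssl CLI
subprocess.check_call([
    'openssl', 'req', '-x509', '-newkey', 'rsa:2048',
    '-keyout', 'dummy.key', '-out', 'dummy.crt',
    '-days', '365', '-nodes',      # -nodes: leave the private key unencrypted
    '-subj', '/CN=example.com',    # common name of the dummy certificate
])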

On the other hand, I'm not sure content filtering is even available at the SOCKS level. I had thought the SOCKS mode of mitmproxy was just a frontend to the backend HTTP proxy.


Attached File(s)
.zip  CertTool OpenSSL.zip (Size: 1.49 KB / Downloads: 544)
Jul. 20, 2015, 06:17 AM
Post: #27
RE: Another Filtering Proxy
Great! Thanks for sharing.

At first I also thought that mitmproxy's SOCKS mode could not filter webpages, like HandyCache's SOCKS mode, but I was wrong: it can filter webpages, even HTTPS ones. It does match by IP address instead of by domain (http://8.8.8.8 instead of http://www.google.com, for example), but we could use the Host request header to work around that limitation (see the sketch below).
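
A tiny sketch of that Host-header workaround (the function and its arguments are hypothetical, just to show the idea):
Code:
def filter_url(headers, ip, path):
    # prefer the Host header over the raw IP so domain-based filters can match
    host = headers.get('Host', ip)  # e.g. 'www.google.com' instead of '8.8.8.8'
    return 'http://%s%s' % (host, path)

# e.g. filter_url({'Host': 'www.google.com'}, '8.8.8.8', '/') -> 'http://www.google.com/'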
Jan. 05, 2016, 04:53 PM
Post: #28
RE: Another Filtering Proxy
Hi whenever, any update on Another Filtering Proxy?
Jan. 06, 2016, 08:44 AM
Post: #29
RE: Another Filtering Proxy
I'm sorry, I haven't been working on it anymore. I have been quite busy for the past half year, and it looks like that will continue.
Jan. 06, 2016, 07:40 PM
Post: #30
RE: Another Filtering Proxy
(Jan. 06, 2016 08:44 AM)whenever Wrote:  I'm sorry I haven't been working on it any more. I had been quite busy for the past half year and it seems to continue.

Good luck. I have another question: I want to compile AFProxy to an exe. What should I prepare to do that? Thanks!