**whenever** · May. 27, 2015, 02:26 PM

I spent several hours but couldn't figure out why. It seems to be related to urllib3's headers handling.

Anyway, attached is the unreleased v0.5 with an ugly patch. Let me know if it works.

cattleyavns · (This post was last modified: May. 28, 2015 07:42 AM by cattleyavns.)

Great job! I think we should rewrite self.headers.update too; I'm trying to do that now. urllib3's header handling is not good, at least for now, so I think we should stop relying on it for headers altogether. I prefer to use built-ins as much as possible.

Can you tell me how to get this line to URLFilter.py and modify it as I want:

Code:

        headers = urllib3._collections.HTTPHeaderDict()

        [headers.add(key, value) for (key, value) in self.headers.items()]

I'm adding a proxy feature to AFProxy using proxy_from_url, but I want to patch the problem above by setting headers = self.headers (req.headers in URLFilter.py).

I'm learning Python and I have a really tough question about "threading", which is not easy at all in Python. I would like to ask you a question and hope you can help:
- With threading, how can we download a big file in parts and join the parts one by one as they finish, instead of waiting for all of them to finish and then joining?
Here is the code; save it as a .py file and run it.

Code:

import os, requests

import threading

import urllib.request, urllib.error, urllib.parse

import time

URL = "https://peach.blender.org/wp-content/uploads/poster_bunny_big.jpg"

def buildRange(value, numsplits):

    """Split `value` bytes into `numsplits` inclusive 'start-end' byte ranges."""

    lst = []

    for i in range(numsplits):

        # each range starts right after the previous one; the last ends at byte value - 1

        start = i * value // numsplits

        end = (i + 1) * value // numsplits - 1

        lst.append('%s-%s' % (start, end))

    return lst

def main(url=None, splitBy=3):

    start_time = time.time()

    if not url:

        print("Please Enter some url to begin download.")

        return

    fileName = "1.jpg"

    sizeInBytes = requests.head(url, headers={'Accept-Encoding': 'identity'}).headers.get('content-length', None)

    print("%s bytes to download." % sizeInBytes)

    if not sizeInBytes:

        print("Size cannot be determined.")

        return

    dataDict = {}

    # split total num bytes into ranges

    ranges = buildRange(int(sizeInBytes), splitBy)

    def downloadChunk(idx, irange):

        print(idx)

        req = urllib.request.Request(url)

        req.headers['Range'] = 'bytes={}'.format(irange)

        dataDict[idx] = urllib.request.urlopen(req).read()

        print("finish: " + str(irange))

    # create one downloading thread per chunk

    downloaders = [

        threading.Thread(

            target=downloadChunk, 

            args=(idx, irange),

        )

        for idx,irange in enumerate(ranges)

        ]

    # start all threads so they run in parallel

    for th in downloaders:

        th.start()

    # wait for all of them to finish

    for th in downloaders:

        th.join()

    print('done: got {} chunks, total {} bytes'.format(

        len(dataDict), sum( (

            len(chunk) for chunk in list(dataDict.values())

        ) )

    ))

    print("--- %s seconds ---" % str(time.time() - start_time))

    if os.path.exists(fileName):

        os.remove(fileName)

     #reassemble file in correct order

    with open(fileName, 'wb') as fh:

        for _idx,chunk in sorted(dataDict.items()):

            fh.write(chunk)

    #stream_chunk = 16 * 1024

    #with open(fileName, 'wb') as fp:

    #  while True:

    #      for _idx,chunk in sorted(dataDict.items()):

            #fh.write(chunk)

     #       chunking = chunk.read(stream_chunk)

      #      if not chunk:

       #         break

        #    fp.write(chunking)

    print("Finished Writing file %s" % fileName)

    print('file size {} bytes'.format(os.path.getsize(fileName)))

if __name__ == '__main__':

    main(URL, splitBy=3)

What I want is:
- For example, we have a big file, 100 MB in size.
- We split that file into ranges using its Content-Length.
- We use the "threading" module to download the parts in parallel, to get the fastest possible download speed, instead of downloading the parts one by one without threading and then joining them.
- The problem is threading's "join()": we cannot stream the file or write it to disk immediately, as Free Download Manager/Flashget do, because "join()" waits for all threads to finish.
- But without join() the script simply does not work: the file size is 0 bytes, because the file is written before the download tasks finish.
- So I want threading to work like this:
+ Download a file with 4 threads.
+ When thread 1 finishes, stream its data, then wait until thread 2 finishes and append thread 2's data after thread 1's. Even if threads 3 and 4 finish earlier than thread 2, their data must not be appended yet, because that would corrupt the file. They must wait until thread 2's data has been appended; then parts 3 and 4 are appended in order.
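One way to get the behaviour described in the list above is to start every part at once but consume the results strictly in part order. The following is a minimal sketch, not AFProxy code: `fetch_part` is a hypothetical stand-in that fakes a download with a delay, and in a real script it would issue the HTTP `Range` request instead.

```python
import io
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_part(index, data, delay):
    """Hypothetical stand-in for a ranged HTTP download: sleep, then return bytes."""
    time.sleep(delay)
    return data

def download_in_order(parts, out, workers=4):
    """Start every part at once, but write them to `out` strictly in order.

    `parts` is a list of (data, delay) tuples. Part N is written as soon as
    parts 0..N have all finished, even if later parts finished earlier.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_part, i, data, delay)
                   for i, (data, delay) in enumerate(parts)]
        for future in futures:            # iterate in submission order
            out.write(future.result())    # blocks only until *this* part is done

if __name__ == "__main__":
    # The second part is the slowest; the later parts finish first
    # but are still written after it.
    parts = [(b"AAAA", 0.01), (b"BBBB", 0.2), (b"CCCC", 0.05), (b"DDDD", 0.0)]
    buf = io.BytesIO()
    download_in_order(parts, buf)
    print(buf.getvalue())
```

Because the loop walks the futures in the order they were submitted, the first part can be written while the others are still downloading, and no part is ever written ahead of an earlier one.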

**whenever** · May. 28, 2015, 09:53 AM

(May. 28, 2015 04:26 AM)cattleyavns Wrote: I'm adding proxy feature to AFProxy using proxy_from_url, but I want to patch above problem by set headers = self.headers (req.headers in URLFilter.py)

I am not sure if I get your point, but self.headers is available as req.headers in URLFilter.py, and you can operate it freely as you want.

(May. 28, 2015 04:26 AM)cattleyavns Wrote: - But problem is with threading "join()", we cannot stream file or write file to disk instantly like Free Download Manager/Flashget software because "join()" wait for all thread finish.

I think you can create the file in advance, and in each thread write the data to specified offset via f.seek(offset, from_what). Also you need to take care of Semaphore acquire() and release(). They are all documented in the manual.
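The suggestion above could be sketched like this, assuming a plain `threading.Lock` is enough to serialise each seek/write pair (a semaphore with a count of 1 would behave the same way). The chunk data is hard-coded for illustration, and the names `write_at` and `assemble` are made up for this example.

```python
import os
import tempfile
import threading

def write_at(fobj, lock, offset, data):
    """Write `data` at `offset`; the lock keeps seek+write atomic across threads."""
    with lock:
        fobj.seek(offset)
        fobj.write(data)

def assemble(path, total_size, chunks):
    """Pre-size the file, then let one thread per chunk fill in its own region.

    `chunks` is a list of (offset, data) pairs that may complete in any order.
    """
    lock = threading.Lock()
    with open(path, "wb") as fobj:
        fobj.truncate(total_size)          # create the file at full size up front
        threads = [threading.Thread(target=write_at, args=(fobj, lock, off, data))
                   for off, data in chunks]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    assemble(path, 12, [(8, b"CCCC"), (0, b"AAAA"), (4, b"BBBB")])
    with open(path, "rb") as f:
        print(f.read())
```

Since every thread writes only inside its own region, the order in which the threads finish does not matter for the final file contents.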

cattleyavns · (This post was last modified: May. 29, 2015 04:13 AM by cattleyavns.)

I also want to report a funny bug in AFProxy:
thread.dameon = True

According to the official documentation it should be:
thread.daemon = True

But if I change it to thread.daemon = True, AFProxy simply stops working, so I decided to comment the line out. Do we really need this "thread.daemon = True" line?

(May. 28, 2015 09:53 AM)whenever Wrote: I am not sure if I get your point, but self.headers is available as req.headers in URLFilter.py, and you can operate it freely as you want.

Thank you, but the headers variable I want to get and change is the "headers" in headers=headers from version 0.4. I want to change it from, for example, "URLFilter.py". I can already change self.headers through req.headers.

(May. 28, 2015 09:53 AM)whenever Wrote: I think you can create the file in advance, and in each thread write the data to specified offset via f.seek(offset, from_what). Also you need to take care of Semaphore acquire() and release(). They are all documented in the manual.

Thank you. Here is what I have so far; I hope it contributes a little if you want to add new features to AFProxy. It also shows a way to do bandwidth throttling (speed limiting):

Split a file into parts, then download and join them in parallel:

Code:

import threading

import urllib.request, urllib.error, urllib.parse

import sys

max_thread = 10

# Initialize lock

lock = threading.RLock()

class Downloader(threading.Thread):

    def __init__(self, url, start_size, end_size, fobj, buffer):

        self.url = url

        self.buffer = buffer

        self.start_size = start_size

        self.end_size = end_size

        self.fobj = fobj

        threading.Thread.__init__(self)

    def run(self):

        """Thread entry point: announce the start, then do the download."""

        with lock:

            print(('starting: %s' % self.getName()))

        self._download()

    def _download(self):

        """Do the actual work: fetch this thread's byte range and write it to the file."""

        req = urllib.request.Request(self.url)

        # Add the HTTP Range header to select the byte range to download

        req.headers['Range'] = 'bytes=%s-%s' % (self.start_size, self.end_size)

        f = urllib.request.urlopen(req)

        # Initialize this thread's offset into the file object

        offset = self.start_size

        while 1:

            block = f.read(self.buffer)

            # Exit the thread once all data has been received

            if not block:

                with lock:

                    print(('%s done.' % self.getName()))

                break

            # Hold the lock while writing so seek() and write() from different threads do not interleave

            # "with lock" replaces the traditional lock.acquire() ... lock.release() and requires Python >= 2.5

            with lock:

                sys.stdout.write('%s saving block...' % self.getName())

                # Set the file object's offset

                self.fobj.seek(offset)

                # Write the data

                self.fobj.write(block)

                offset = offset + len(block)

                sys.stdout.write('done.\n')

def main(url, thread=3, save_file='', buffer=1024):

    # The number of threads cannot exceed max_thread

    thread = thread if thread <= max_thread else max_thread

    # Get the file size

    req = urllib.request.urlopen(url)

    size = int(req.getheader('Content-Length'))

    print(size)

    # Open the output file

    fobj = open(save_file, 'wb')

    # Work out the HTTP Range each thread is responsible for, based on the number of threads

    avg_size, pad_size = divmod(size, thread)

    plist = []

    for i in range(thread):

        start_size = i*avg_size

        end_size = start_size + avg_size - 1

        if i == thread - 1:

            # The last thread also takes the pad_size remainder (Range ends are inclusive)

            end_size = end_size + pad_size

        t = Downloader(url, start_size, end_size, fobj, buffer)

        plist.append(t)

    # Start all downloads

    for t in plist:

        t.start()

    # Wait for all threads to finish

    for t in plist:

        t.join()

    # All done; remember to close the file object

    fobj.close()

    print('Download completed!')

if __name__ == '__main__':

    url = 'http://userscripts-mirror.org/scripts/source/57662.user.js'

    main(url=url, thread=4, save_file='a.user.js', buffer=16*1024)

Limit download speed:

Code:

"""Rate limiters with shared token bucket."""

import os

import sys

import threading

import time

import urllib.request, urllib.parse, urllib.error

class TokenBucket(object):

    """An implementation of the token bucket algorithm.

    source: http://code.activestate.com/recipes/511490/

    >>> bucket = TokenBucket(80, 0.5)

    >>> bucket.consume(10) == 0

    True

    >>> bucket.consume(90) > 0

    True

    """

    def __init__(self, tokens, fill_rate):

        """tokens is the total tokens in the bucket. fill_rate is the

        rate in tokens/second that the bucket will be refilled."""

        self.capacity = float(tokens)

        self._tokens = float(tokens)

        self.fill_rate = float(fill_rate)

        self.timestamp = time.time()

        self.lock = threading.RLock()

    def consume(self, tokens):

        """Consume tokens from the bucket. Returns 0 if there were

        sufficient tokens, otherwise the expected time until enough

        tokens become available."""

        with self.lock:

            # Refill first, then take exactly the requested amount if it is available

            available = self.tokens

            if tokens <= available:

                self._tokens -= tokens

                return 0

            return (tokens - available) / self.fill_rate

    @property

    def tokens(self):

        self.lock.acquire()

        if self._tokens < self.capacity:

            now = time.time()

            delta = self.fill_rate * (now - self.timestamp)

            self._tokens = min(self.capacity, self._tokens + delta)

            self.timestamp = now

        value = self._tokens

        self.lock.release()

        return value

class RateLimit(object):

    """Rate limit a url fetch.

    source: http://mail.python.org/pipermail/python-list/2008-January/472859.html

    (but mostly rewritten)

    """

    def __init__(self, bucket, filename):

        self.bucket = bucket

        self.last_update = 0

        self.last_downloaded_kb = 0

        self.filename = filename

        self.avg_rate = None

    def __call__(self, block_count, block_size, total_size):

        total_kb = total_size / 1024.

        downloaded_kb = (block_count * block_size) / 1024.

        just_downloaded = downloaded_kb - self.last_downloaded_kb

        self.last_downloaded_kb = downloaded_kb

        predicted_size = block_size/1024.

        wait_time = self.bucket.consume(predicted_size)

        while wait_time > 0:

            time.sleep(wait_time)

            wait_time = self.bucket.consume(predicted_size)

        now = time.time()

        delta = now - self.last_update

        if self.last_update != 0:

            if delta > 0:

                rate = just_downloaded / delta

                if self.avg_rate is not None:

                    rate = 0.9 * self.avg_rate + 0.1 * rate

                self.avg_rate = rate

            else:

                rate = self.avg_rate or 0.

            print(("%20s: %4.1f%%, %5.1f KiB/s, %.1f/%.1f KiB" % (

                    self.filename, 100. * downloaded_kb / total_kb,

                    rate, downloaded_kb, total_kb,

                )))

        self.last_update = now

def main():

    """Fetch the contents of urls"""

    rate_limit  = float(20)

    urls = {"http://userscripts-mirror.org/scripts/source/57662.user.js"}

    bucket = TokenBucket(10*rate_limit, rate_limit)

    print(("rate limit = %.1f" % (rate_limit,)))

    threads = []

    for url in urls:

        path = urllib.parse.urlparse(url,'http')[2]

        filename = os.path.basename(path)

        print(('Downloading "%s" to "%s"...' % (url,filename)))

        rate_limiter = RateLimit(bucket, filename)

        t = threading.Thread(

            target=urllib.request.urlretrieve,

            args=(url, filename, rate_limiter))

        t.start()

        threads.append(t)

    for t in threads:

        t.join()

    print('All downloads finished')

if __name__ == "__main__":

    main()
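To see the token-bucket arithmetic without waiting on a real download, here is a stripped-down bucket with an injectable clock. It is a simplified sketch rather than the class above: `SimpleBucket` and its `clock` parameter are invented for this example.

```python
class SimpleBucket:
    """Minimal token bucket; `clock` is any zero-argument function returning seconds."""

    def __init__(self, capacity, fill_rate, clock):
        self.capacity = float(capacity)
        self.fill_rate = float(fill_rate)
        self.clock = clock
        self.tokens = float(capacity)
        self.stamp = clock()

    def consume(self, amount):
        """Take `amount` tokens and return 0.0, or return the seconds to wait."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.stamp) * self.fill_rate)
        self.stamp = now
        if amount <= self.tokens:
            self.tokens -= amount
            return 0.0
        return (amount - self.tokens) / self.fill_rate

if __name__ == "__main__":
    fake_now = [0.0]
    bucket = SimpleBucket(capacity=200, fill_rate=20, clock=lambda: fake_now[0])
    print(bucket.consume(150))   # enough tokens in the bucket
    print(bucket.consume(100))   # only 50 left, so a wait time is returned
    fake_now[0] += 2.5           # pretend 2.5 seconds have passed
    print(bucket.consume(100))
```

With a capacity of 200 and a fill rate of 20 per second, a caller that sleeps for the returned wait time and retries is held to about 20 units per second on average, while still being allowed a burst of up to 200.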

**whenever** · Jun. 03, 2015, 07:56 AM

(May. 29, 2015 04:07 AM)cattleyavns Wrote: I also want to report a funny bug of AFProxy:

Well, you found an old bug, well done!

We don't need it. We can safely remove that line.

(May. 29, 2015 04:07 AM)cattleyavns Wrote: Thank you, but the headers variable I want to get and change is the "headers" in headers=headers from version 0.4, I want to change it with for example "URLFilter.py", I already could change self.headers with req.headers.

In version 0.4, whatever you change via req.headers in URLFilter.py is copied into headers by the lines below in AFProxy.py. I don't think you need to do any extra work.

Code:

headers = urllib3._collections.HTTPHeaderDict()

[headers.add(key, value) for (key, value) in self.headers.items()]

(May. 29, 2015 04:07 AM)cattleyavns Wrote: Thank you, here is what I get so far, hope this contribute a little bit if you want to add new feature to AFProxy, also the way to do bandwidth throttling (speed limit):

Thanks, but I want AFProxy to focus on filtering. You are free to make a new proxy based on AFProxy with whatever new features you like. :)

cattleyavns · (This post was last modified: Jun. 05, 2015 12:42 PM by cattleyavns.)

(Jun. 03, 2015 07:56 AM)whenever Wrote: Well, you found an old bug, well done!

We don't need it. We can safely remove that line.

Well, I think we do need daemon threads. Without daemon we cannot press "Ctrl + C" to exit AFProxy, and I feel a little uncomfortable using the "X" button to exit instead. Without daemon we also miss the on-exit event provided by the "atexit" module (import atexit).

Do you have any idea how to make thread.daemon = True or thread.setDaemon(True) work? I tried to restore this feature, but all I got was AFProxy no longer working.
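For reference, this is roughly how daemon threads, Ctrl+C and `atexit` interact in plain Python. It is a generic sketch, not AFProxy's code: `serve` is a placeholder for the proxy's serving loop, and the sketch does not establish why AFProxy itself stops working. Two things are worth knowing, though. Assigning to the misspelled `dameon` only creates an unused attribute, so the thread keeps its default daemon setting. And once a thread really is a daemon, it is killed as soon as the main thread returns, so the main thread has to stay alive.

```python
import atexit
import threading
import time

stop = threading.Event()

def serve():
    """Placeholder for a server loop (hypothetical; AFProxy's loop differs)."""
    while not stop.is_set():
        time.sleep(0.05)

def start_background(target):
    """Run `target` in a daemon thread so it never blocks interpreter exit."""
    thread = threading.Thread(target=target)
    thread.daemon = True      # note the spelling: 'daemon', not 'dameon'
    thread.start()
    return thread

def on_exit():
    print("cleanup at exit")

if __name__ == "__main__":
    atexit.register(on_exit)
    worker = start_background(serve)
    try:
        # Keep the main thread alive, otherwise the daemon thread dies with it.
        while worker.is_alive():
            time.sleep(0.5)
    except KeyboardInterrupt:
        print("Ctrl+C received, shutting down")
        stop.set()
```

If the main thread of a program ends right after starting its worker threads, switching those threads to daemon mode makes the whole process exit immediately, which would look like the program "stops working".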

(Jun. 03, 2015 07:56 AM)whenever Wrote: In version 0.4, what you change via req.headers in URLFilter.py will be copied to headers via below lines in AFProxy.py. I don't think you need to do extra work.

Code:

headers = urllib3._collections.HTTPHeaderDict()

[headers.add(key, value) for (key, value) in self.headers.items()]

I found another bug. We should move

Code:

        ########## Apply HeaderFilterOut ##########

        if config.HeaderFilter:

            if self.applyFilters('HeaderFilter', 'Out') == 'GetOut':

                return

to right above "headers = urllib3._collections.HTTPHeaderDict()" in your quote above; otherwise we cannot change, add or remove headers.
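The ordering issue can be illustrated with plain dictionaries. This is only a sketch: `header_filter_out` and `build_outgoing` are hypothetical stand-ins for the HeaderFilterOut step and the copy into `HTTPHeaderDict`, not AFProxy functions.

```python
def header_filter_out(headers):
    """Hypothetical outgoing filter: drop Referer and add DNT, in place."""
    headers.pop("Referer", None)
    headers["DNT"] = "1"

def build_outgoing(request_headers, filter_first):
    """Copy the request headers for sending, filtering before or after the copy."""
    if filter_first:
        header_filter_out(request_headers)
    outgoing = dict(request_headers)          # snapshot that is actually sent
    if not filter_first:
        header_filter_out(request_headers)    # too late: the snapshot is unchanged
    return outgoing

if __name__ == "__main__":
    original = {"Host": "example.com", "Referer": "http://example.org/"}
    print(build_outgoing(dict(original), filter_first=True))
    print(build_outgoing(dict(original), filter_first=False))
```

When the filter runs after the copy, its changes land on the original mapping only and never reach the headers that are sent.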

(Jun. 03, 2015 07:56 AM)whenever Wrote: Thanks, but I want AFProxy to focus on filtering. You are free to make a new proxy based on AFProxy with whatever new features you like.

Great, thank you for that offer ;)

Here is my patch that makes AFProxy work, partially, with a SOCKS proxy using Urllib(2) (odd, and my implementation looks horrible, but for me it is better than nothing, right?). It is based on version 0.4 because that version is stable.

It needs another module:

Code:

pip install pySocks

My implementation has its own problems, for example:

Code:

http://prxbx.com

-> TLSv1 ERROR

http://ghacks.net

-> Redirect forever error 30x

You can install BitviseSSHClient or AdvOr and set its listen port to 10080, or change the line containing "10080" to match your SOCKS proxy.

Changelog:

Code:

- Added socks support

- Moved:

        ########## Apply HeaderFilterOut ##########

        if config.HeaderFilter:

            if self.applyFilters('HeaderFilter', 'Out') == 'GetOut':

                return

right after URLFilter.

Test: http://ghacks.net/ip/

cattleyavns · (This post was last modified: Jun. 17, 2015 09:23 AM by cattleyavns.)

Okay, continuing: I fixed a SERIOUS problem in the http.server library.

- Technical details:
+ Use Firefox
+ Open the Network tool (Tools -> Developer Tools -> Network)
+ Open http://www.facebook.com
+ Find this URL (filter for 'ai.php' with the Network tool's search box). Its response status icon is filled with pink instead of green; pink means error:

Quote:O https://www.facebook.com/ai.php?ego=++++++++++++

Because the http.server library parses 'raw_requestline' the wrong way, our GET|POST|CONNECT|HEAD command ends up looking like this:

Code:

__user+++++GET

And of course there is no do___user+++++GET handler, only do_GET.
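The idea of stripping whatever precedes a known method name can be shown in isolation. This is a small sketch, assuming the stray characters always come before the method and contain no whitespace; `clean_method` is a name made up for this example.

```python
import re

# Methods the proxy is expected to handle.
_METHOD_RE = re.compile(r'^.*?(GET|POST|CONNECT|PUT|DELETE|PATCH|HEAD)$')

def clean_method(requestline):
    """Split a request line into words, stripping junk in front of the method."""
    words = requestline.rstrip('\r\n').split()
    if words:  # an empty request line has no method to clean
        words[0] = _METHOD_RE.sub(r'\1', words[0])
    return words

if __name__ == "__main__":
    print(clean_method("__user+++++GET /ai.php HTTP/1.1\r\n"))
    print(clean_method("POST /form HTTP/1.1\r\n"))
```

A first word that is already a clean method name passes through unchanged, because the lazy `.*?` prefix then matches nothing.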

Here is my patch: I modified http.server's parse_request function and embedded it into ProxyTool.py. Just replace your ProxyTool.py with:

Code:

#!/usr/bin/env python3

# -*- coding: utf-8 -*-

"HTTP Proxy Tools, pyOpenSSL version"

_name = "ProxyTool"

__author__ = 'phoenix'

__version__ = '1.0'

import time

from datetime import datetime

import logging

import cgi

import socket

import select

import selectors

import ssl

from http.server import HTTPServer, BaseHTTPRequestHandler

from socketserver import ThreadingMixIn

from CertTool import get_cert

#Fix SERIOUS problem https://www.facebook.com/ai.php++++ 501(pink, Firefox) on www.facebook.com and youtube.com too

import http.client

import re

#Fix SERIOUS problem

from colorama import *

init(autoreset=True)

logger = logging.getLogger('__main__')

message_format = """\

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html>

    <head>

        <meta http-equiv="Content-Type" content="text/html;charset=utf-8">

        <title>Proxy Error: %(code)d</title>

    </head>

    <body>

        <h1>%(code)d: %(message)s</h1>

        <p>The following error occurred while trying to access <strong>%(url)s</strong></p>

        <p><strong>%(explain)s</strong></p>

        <hr>Generated on %(now)s by %(server)s.

    </body>

</html>

"""

def read_write(socket1, socket2):

    "Read and Write contents between 2 sockets, wait 5s for no data before return"

    start = time.time()

    with selectors.DefaultSelector() as selector:

        socket1.setblocking(False)

        socket2.setblocking(False)

        selector.register(socket1, selectors.EVENT_READ)

        selector.register(socket2, selectors.EVENT_READ)

        while True:

            tasks = selector.select(5)

            if not tasks: break

            for key, events in tasks:

                if events & selectors.EVENT_READ:

                    reader = key.fileobj

                    writer = socket2 if reader is socket1 else socket1

                    try:

                        data = reader.recv(1024)

                        if data:

                            writer.sendall(data)

                        else:

                            # EOF

                            selector.unregister(reader)

                            selector.unregister(writer)

                    except (ConnectionAbortedError, ConnectionResetError, BrokenPipeError):

                        pass

        logger.debug("took %.2Fs" % (time.time()-start))

def read_write(socket1, socket2, max_idling=10):

    "Read and Write contents between 2 sockets"

    iw = [socket1, socket2]

    ow = []

    count = 0

    while True:

        count += 1

        (ins, _, exs) = select.select(iw, ow, iw, 1)

        if exs: break

        if ins:

            for reader in ins:

                writer = socket2 if reader is socket1 else socket1

                try:

                    data = reader.recv(1024)

                    if data:

                        writer.send(data)

                        count = 0

                except (ConnectionAbortedError, ConnectionResetError, BrokenPipeError):

                    pass

        if count == max_idling: break

class ProxyRequestHandler(BaseHTTPRequestHandler):

    """RequestHandler with do_CONNECT method defined

    """

    server_version = "%s/%s" % (_name, __version__)

    # do_CONNECT() will set self.ssltunnel to override this

    ssltunnel = False

    # Override default value 'HTTP/1.0'

    protocol_version = 'HTTP/1.1'

    def parse_request(self):

        """Parse a request (internal).

        The request should be stored in self.raw_requestline; the results

        are in self.command, self.path, self.request_version and

        self.headers.

        Return True for success, False for failure; on failure, an

        error is sent back.

        """

#Fix SERIOUS problem https://www.facebook.com/ai.php++++ 501(pink, Firefox) on www.facebook.com and youtube.com too

        self.command = None  # set in case of error on the first line

        self.request_version = version = self.default_request_version

        self.close_connection = 1

        requestline = str(self.raw_requestline, 'iso-8859-1')

        requestline = requestline.rstrip('\r\n')

        self.requestline = requestline

        words = requestline.split()

        # Strip any junk in front of the method name, e.g. "__user+++++GET" -> "GET".

        # Skip empty request lines so the "elif not words" branch below stays reachable.

        if words:

            words[0] = re.sub(r'^.*?(GET|POST|CONNECT|PUT|DELETE|PATCH|HEAD)$', r'\1', words[0])

        if len(words) == 3:

            command, path, version = words

            if version[:5] != 'HTTP/':

                self.send_error(400, "Bad request version (%r)" % version)

                return False

            try:

                base_version_number = version.split('/', 1)[1]

                version_number = base_version_number.split(".")

                # RFC 2145 section 3.1 says there can be only one "." and

                #   - major and minor numbers MUST be treated as

                #      separate integers;

                #   - HTTP/2.4 is a lower version than HTTP/2.13, which in

                #      turn is lower than HTTP/12.3;

                #   - Leading zeros MUST be ignored by recipients.

                if len(version_number) != 2:

                    raise ValueError

                version_number = int(version_number[0]), int(version_number[1])

            except (ValueError, IndexError):

                self.send_error(400, "Bad request version (%r)" % version)

                return False

            if version_number >= (1, 1) and self.protocol_version >= "HTTP/1.1":

                self.close_connection = 0

            if version_number >= (2, 0):

                self.send_error(505,

                          "Invalid HTTP Version (%s)" % base_version_number)

                return False

        elif len(words) == 2:

            command, path = words

            self.close_connection = 1

            if command != 'GET':

                self.send_error(400,

                                "Bad HTTP/0.9 request type (%r)" % command)

                return False

        elif not words:

            return False

        else:

            self.send_error(400, "Bad request syntax (%r)" % requestline)

            return False

        #print(command)

        self.command, self.path, self.request_version = command, path, version

        # Examine the headers and look for a Connection directive.

        try:

            self.headers = http.client.parse_headers(self.rfile,

                                                     _class=self.MessageClass)

        except http.client.LineTooLong:

            self.send_error(400, "Line too long")

            return False

        conntype = self.headers.get('Connection', "")

        if conntype.lower() == 'close':

            self.close_connection = 1

        elif (conntype.lower() == 'keep-alive' and

              self.protocol_version >= "HTTP/1.1"):

            self.close_connection = 0

        # Examine the headers and look for an Expect directive

        expect = self.headers.get('Expect', "")

        if (expect.lower() == "100-continue" and

                self.protocol_version >= "HTTP/1.1" and

                self.request_version >= "HTTP/1.1"):

            if not self.handle_expect_100():

                return False

        return True

    def log_message(self, format, *args):

        return

    def do_CONNECT(self):

        "Decrypt https request and dispatch to http handler"

        # request line: CONNECT www.example.com:443 HTTP/1.1

        self.host, self.port = self.path.split(":")

        # SSL MITM

        self.wfile.write(("HTTP/1.1 200 Connection established\r\n" +

                          "Proxy-agent: %s\r\n" % self.version_string() +

                          "\r\n").encode('ascii'))

        commonname = '.' + self.host.partition('.')[-1] if self.host.count('.') >= 2 else self.host

        dummycert = get_cert(commonname)

        # set a flag for do_METHOD

        self.ssltunnel = True

        ssl_sock = ssl.wrap_socket(self.connection, keyfile=dummycert, certfile=dummycert, server_side=True)

        # Ref: Lib/socketserver.py#StreamRequestHandler.setup()

        self.connection = ssl_sock

        self.rfile = self.connection.makefile('rb', self.rbufsize)

        self.wfile = self.connection.makefile('wb', self.wbufsize)

        # dispatch to do_METHOD()

        self.handle_one_request()

    def handle_one_request(self):

        """Catch more exceptions than default

        Intend to catch exceptions on local side

        Exceptions on remote side should be handled in do_*()

        """

        try:

            BaseHTTPRequestHandler.handle_one_request(self)

            return

        except (ConnectionError, FileNotFoundError) as e:

            logger.warning(Fore.RED + "%s", e)

        except (ssl.SSLEOFError, ssl.SSLError) as e:

            if hasattr(self, 'url'):

                # Happens after the tunnel is established

                logger.warning(Fore.YELLOW + '"%s" while operating on established local SSL tunnel for [%s]' % (e, self.url))

            else:

                logger.warning(Fore.YELLOW + '"%s" while trying to establish local SSL tunnel for [%s]' % (e, self.path))

        self.close_connection = 1

    def sendout_error(self, url, code, message=None, explain=None):

        "Modified from http.server.send_error() for customized display"

        try:

            shortmsg, longmsg = self.responses[code]

        except KeyError:

            shortmsg, longmsg = '???', '???'

        if message is None:

            message = shortmsg

        if explain is None:

            explain = longmsg

        content = (message_format %

                   {'code': code, 'message': message, 'explain': explain,

                    'url': url, 'now': datetime.today(), 'server': self.server_version})

        body = content.encode('UTF-8', 'replace')

        self.send_response_only(code, message)

        self.send_header("Content-Type", self.error_content_type)

        self.send_header('Content-Length', int(len(body)))

        self.end_headers()

        if self.command != 'HEAD' and code >= 200 and code not in (204, 304):

            self.wfile.write(body)

    def deny_request(self):

        self.send_response_only(403)

        self.send_header('Content-Length', 0)

        self.end_headers()

    def redirect(self, url):

        self.send_response_only(302)

        self.send_header('Content-Length', 0)

        self.send_header('Location', url)

        self.end_headers()

    def forward_to_https_proxy(self):

        "Forward https request to upstream https proxy"

        logger.debug('Using Proxy - %s' % self.proxy)

        proxy_host, proxy_port = self.proxy.split('//')[1].split(':')

        server_conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

        try:

            server_conn.connect((proxy_host, int(proxy_port)))

            server_conn.send(('CONNECT %s HTTP/1.1\r\n\r\n' % self.path).encode('ascii'))

            server_conn.settimeout(0.1)

            datas = b''

            while True:

                try:

                    data = server_conn.recv(4096)

                except socket.timeout:

                    break

                if data:

                    datas += data

                else:

                    break

            server_conn.setblocking(True)

            if b'200' in datas and b'established' in datas.lower():

                logger.info(Fore.CYAN + '[P] SSL Pass-Thru: https://%s/' % self.path)

                self.wfile.write(("HTTP/1.1 200 Connection established\r\n" +

                                  "Proxy-agent: %s\r\n\r\n" % self.version_string()).encode('ascii'))

                read_write(self.connection, server_conn)

            else:

                logger.warning(Fore.YELLOW + 'Proxy %s failed.', self.proxy)

                if datas:

                    logger.debug(datas)

                    self.wfile.write(datas)

        finally:

            # We don't maintain a connection reuse pool, so close the connection anyway

            server_conn.close()

    def forward_to_socks5_proxy(self):

        "Forward https request to upstream socks5 proxy"

        logger.warning(Fore.YELLOW + 'Socks5 proxy not implemented yet, please use https proxy')

    def tunnel_traffic(self):

        "Tunnel traffic to remote host:port"

        logger.info(Fore.CYAN + '[D] SSL Pass-Thru: https://%s/' % self.path)

        server_conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

        try:

            server_conn.connect((self.host, int(self.port)))

            self.wfile.write(("HTTP/1.1 200 Connection established\r\n" +

                              "Proxy-agent: %s\r\n" % self.version_string() +

                              "\r\n").encode('ascii'))

            read_write(self.connection, server_conn)

        except TimeoutError:

            self.wfile.write(b"HTTP/1.1 504 Gateway Timeout\r\n\r\n")

            logger.warning(Fore.YELLOW + 'Timed Out: https://%s:%s/' % (self.host, self.port))

        except socket.gaierror as e:

            self.wfile.write(b"HTTP/1.1 503 Service Unavailable\r\n\r\n")

            logger.warning(Fore.YELLOW + '%s: https://%s:%s/' % (e, self.host, self.port))

        finally:

            # We don't maintain a connection reuse pool, so close the connection anyway

            server_conn.close()

    def ssl_get_response(self, conn):

        try:

            server_conn = ssl.wrap_socket(conn, cert_reqs=ssl.CERT_REQUIRED, ca_certs="cacert.pem", ssl_version=ssl.PROTOCOL_TLSv1)

            server_conn.sendall(('%s %s HTTP/1.1\r\n' % (self.command, self.path)).encode('ascii'))

            server_conn.sendall(self.headers.as_bytes())

            if self.postdata:

                server_conn.sendall(self.postdata)

            while True:

                data = server_conn.recv(4096)

                if data:

                    self.wfile.write(data)

                else:

                    break

        except (ssl.SSLEOFError, ssl.SSLError) as e:

            logger.error(Fore.RED + Style.BRIGHT + "[SSLError]")

            self.send_error(417, message="Exception %s" % str(e.__class__), explain=str(e))

    def purge_headers(self, headers):

        "Remove hop-by-hop headers that shouldn't pass through a Proxy"

        for name in ["Connection", "Keep-Alive", "Upgrade",

                     "Proxy-Connection", "Proxy-Authenticate"]:

            try:

                del headers[name]

            except KeyError:

                pass

    def write_headers(self, headers):

        self.purge_headers(headers)

        for key, value in headers.items():

            self.send_header(key, value)

        self.end_headers()

    def write_headers2(self, headers):

        self.purge_headers(headers)

        for key, value in headers.items():

            self.send_header(key, value)

    def purge_write_headers(self, headers):

        self.purge_headers(headers)

        for key, value in headers.items():

            self.send_header(key, value)

        self.end_headers()

    def stream_to_client(self, response):

        bufsize = 1024 * 64

        # Re-chunk the body only if the upstream response declared a Transfer-Encoding

        need_chunked = 'Transfer-Encoding' in response.headers

        written = 0

        while True:

            data = response.read(bufsize)

            if not data:

                if need_chunked:

                    # A zero-length chunk terminates a chunked body

                    self.wfile.write(b'0\r\n\r\n')

                break

            if need_chunked:

                # Each chunk is framed as: <hex size>\r\n<data>\r\n

                self.wfile.write(('%x\r\n' % len(data)).encode('ascii'))

            self.wfile.write(data)

            if need_chunked:

                self.wfile.write(b'\r\n')

            written += len(data)

        return written

    def stream_to_client2(self, response):

        bufsize = 1024 * 64

        # Re-chunk the body only if the upstream response declared a Transfer-Encoding

        need_chunked = 'Transfer-Encoding' in response.headers

        written = 0

        while True:

            data = response.read(bufsize)

            if not data:

                if need_chunked:

                    # A zero-length chunk terminates a chunked body

                    self.wfile.write(b'0\r\n\r\n')

                break

            if need_chunked:

                # Each chunk is framed as: <hex size>\r\n<data>\r\n

                self.wfile.write(('%x\r\n' % len(data)).encode('ascii'))

            self.wfile.write(data)

            if need_chunked:

                self.wfile.write(b'\r\n')

            written += len(data)

        return written

    def http_request_info(self):

        """Return HTTP request information as bytes."""

        context = ["CLIENT VALUES:",

                   "client_address = %s" % str(self.client_address),

                   "requestline = %s" % self.requestline,

                   "command = %s" % self.command,

                   "path = %s" % self.path,

                   "request_version = %s" % self.request_version,

                   "",

                   "SERVER VALUES:",

                   "server_version = %s" % self.server_version,

                   "sys_version = %s" % self.sys_version,

                   "protocol_version = %s" % self.protocol_version,

                   "",

                   "HEADER RECEIVED:"]

        for name, value in sorted(self.headers.items()):

            context.append("%s = %s" % (name, value.rstrip()))

        if self.command == "POST":

            context.append("\r\nPOST VALUES:")

            form = cgi.FieldStorage(fp=self.rfile,

                                    headers=self.headers,

                                    environ={'REQUEST_METHOD': 'POST'})

            for field in form.keys():

                fielditem = form[field]

                if fielditem.filename:

                    # The field contains an uploaded file

                    file_data = fielditem.file.read()

                    file_len = len(file_data)

                    context.append('Uploaded %s as "%s" (%d bytes)'

                                   % (field, fielditem.filename, file_len))

                else:

                    # Regular form value

                    context.append("%s = %s" % (field, fielditem.value))

        # Replace non-ASCII characters instead of raising UnicodeEncodeError

        return "\r\n".join(context).encode('ascii', 'replace')

def demo():

    PORT = 8000

    class ProxyServer(ThreadingMixIn, HTTPServer):

        """Handle requests in a separate thread."""

        pass

    class RequestHandler(ProxyRequestHandler):

        "Displaying HTTP request information"

        server_version = "DemoProxy/0.1"

        def do_METHOD(self):

            "Universal method for GET, POST, HEAD, PUT and DELETE"

            message = self.http_request_info()

            self.send_response(200)

            # 'Content-Length' is important for HTTP/1.1

            self.send_header('Content-Length', len(message))

            self.end_headers()

            self.wfile.write(message)

        do_GET = do_POST = do_HEAD = do_PUT = do_DELETE = do_OPTIONS = do_METHOD

    print('%s serving now, <Ctrl-C> to stop ...' % RequestHandler.server_version)

    print('Listen Addr  : localhost:%s' % PORT)

    print("-" * 10)

    server = ProxyServer(('', PORT), RequestHandler)

    server.serve_forever()

if __name__ == '__main__':

    try:

        demo()

    except KeyboardInterrupt:

        print("Quitting...")

cattleyavns · (This post was last modified: Jul. 01, 2015 09:22 AM by cattleyavns.)

Do you think AFProxy could filter HTTPS websites without the help of pyOpenSSL? pyOpenSSL is not a small Python library: it requires "cffi" and "cryptography", and both need to be compiled with GCC (on Linux), which reduces the portability of AFProxy. I could barely get pyOpenSSL installed and working on Lubuntu, and only after a bunch of apt-get install ... commands. So I want to replace pyOpenSSL with Python's native SSL to do MITM.

So my goal is to rewrite CertTool.py and replace all the pyOpenSSL code with some pure-Python crypto library (no C extension, nothing to compile).

**whenever** · Jul. 19, 2015, 07:32 AM

(Jun. 05, 2015 12:26 PM)cattleyavns Wrote: Do you have any idea how to make thread.daemon = True work?

We need to keep the main thread alive so that it can catch the KeyboardInterrupt exception.

Code:

...

    while True:

        time.sleep(1)

except KeyboardInterrupt:

...
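The elided snippet above can be fleshed out as a minimal sketch. It assumes the server is started with `serve_forever()` in a daemon thread; `ThreadedServer`, `HelloHandler`, `start_in_background` and `main` are hypothetical names for illustration, not AFProxy's actual code.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedServer(ThreadingMixIn, HTTPServer):
    # Worker threads are daemons too, so they never block interpreter exit.
    daemon_threads = True


class HelloHandler(BaseHTTPRequestHandler):  # hypothetical placeholder handler
    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_in_background(server):
    """Run server.serve_forever() in a daemon thread and return the thread."""
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True  # the thread is killed when the main thread exits
    thread.start()
    return thread


def main(port=8000):
    server = ThreadedServer(("127.0.0.1", port), HelloHandler)
    start_in_background(server)
    try:
        # Keep the main thread alive; Ctrl-C is delivered here.
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        print("Quitting...")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `main()` blocks until Ctrl-C. Without the sleep loop the main thread would return right after starting the daemon thread, and the whole process would exit with it.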

(Jun. 05, 2015 12:26 PM)cattleyavns Wrote: I found another bug, we should move ...

It should already be fixed in the latest testing version here.

(Jun. 17, 2015 09:14 AM)cattleyavns Wrote: Because the http.server library parses 'raw_requestline' the wrong way, our GET|POST|CONNECT|HEAD command will look like this:

Code:

__user+++++GET

It looks more like a browser problem, because the browser doesn't seem to have composed the request line correctly. It's not the duty of http.server to validate the request command.

(Jul. 01, 2015 08:55 AM)cattleyavns Wrote: So my goal is to rewrite CertTool.py and replace all the pyOpenSSL code with some pure-Python crypto library (no C extension, nothing to compile).

How are your findings so far? I don't think we have many choices unless a similar module becomes part of the standard Python installation.
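As a rough illustration of why a malformed request line ends up as a strange command: http.server decodes the raw request line and splits it on whitespace, so whatever precedes the first space becomes the command. The sketch below mimics only that step. `split_request_line`, `is_known_method` and `KNOWN_METHODS` are hypothetical helpers for a handler that wants to reject such commands itself.

```python
# Methods this hypothetical proxy is willing to handle.
KNOWN_METHODS = {"GET", "POST", "HEAD", "PUT", "DELETE", "OPTIONS", "CONNECT"}


def split_request_line(raw_requestline):
    """Split a raw request line into (command, path, version) on whitespace."""
    text = raw_requestline.decode("iso-8859-1").rstrip("\r\n")
    words = text.split()
    if len(words) != 3:
        raise ValueError("unexpected request line: %r" % text)
    return tuple(words)


def is_known_method(command):
    """Return True if the command is one of the methods we expect."""
    return command in KNOWN_METHODS


# A well-formed line and the malformed one quoted in this thread.
good = split_request_line(b"GET / HTTP/1.1\r\n")
bad = split_request_line(b"__user+++++GET / HTTP/1.1\r\n")
print(good[0], is_known_method(good[0]))
print(bad[0], is_known_method(bad[0]))
```

Since http.server dispatches to a method named `do_` plus the command, an unknown command like the second one has no matching handler method and is answered with an error rather than being repaired.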

cattleyavns · (This post was last modified: Jul. 19, 2015 05:55 PM by cattleyavns.)

(Jul. 19, 2015 07:32 AM)whenever Wrote: How are your findings so far? I don't think we have many choices unless a similar module becomes part of the standard Python installation.

Well, I've temporarily given up on it, because I tried a lot but didn't get anywhere. I'm trying other things instead, for example making AFProxy work as a SOCKS proxy. mitmproxy does that (and can still modify HTTPS traffic), so I think I will try to do the same. Here is a draft version that works well with Python 3. It still cannot decrypt HTTPS content, but one more step and I can get there.

SOCKS is way better than an HTTP or HTTPS proxy: it works with almost all protocols, like email and chat, and it can encrypt data between client and server, so privacy is better (and it can still block ads and modify webpages, as mitmproxy does).

mitmproxy: http://mitmproxy.org/doc/features/socksproxy.html
Make sure you import the certificate using mitm.it so that mitmproxy can filter HTTPS traffic.
Command line: mitmdump --socks -p 1080

And set your browser's SOCKS5 proxy to 127.0.0.1:1080

Do you have any advice for me? I think moving to SOCKS would be great!
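For reference, here is a minimal sketch of the SOCKS5 messages involved, following RFC 1928 and assuming the "no authentication" method. It only builds and parses the bytes. The function names are hypothetical and this is not AFProxy's or mitmproxy's code.

```python
import socket
import struct

SOCKS_VERSION = 5


def build_greeting():
    """Client greeting: VER=5, NMETHODS=1, METHODS=[0x00 (no authentication)]."""
    return bytes([SOCKS_VERSION, 1, 0x00])


def build_connect_request(host, port):
    """Client CONNECT request using ATYP=3 (domain name)."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("host name too long for SOCKS5")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3, then length-prefixed name and port.
    return bytes([SOCKS_VERSION, 1, 0, 3, len(name)]) + name + struct.pack("!H", port)


def parse_connect_request(data):
    """Server side: return (host, port) from a CONNECT request."""
    ver, cmd, _rsv, atyp = data[:4]
    if ver != SOCKS_VERSION or cmd != 1:
        raise ValueError("not a SOCKS5 CONNECT request")
    if atyp == 3:  # domain name, length-prefixed
        length = data[4]
        host = data[5:5 + length].decode("ascii")
        offset = 5 + length
    elif atyp == 1:  # raw IPv4 address: the proxy sees no domain name at all
        host = socket.inet_ntoa(data[4:8])
        offset = 8
    else:
        raise ValueError("address type %d not handled in this sketch" % atyp)
    (port,) = struct.unpack("!H", data[offset:offset + 2])
    return host, port
```

After the handshake, SOCKS itself only relays bytes between the client and the target, so any filtering has to be done by whatever layer reads those bytes afterwards.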

**whenever** · (This post was last modified: Jul. 20, 2015 03:34 AM by whenever.)

mitmproxy depends on pyOpenSSL too. Check https://github.com/mitmproxy/netlib/blob...rtutils.py

In fact, pyOpenSSL is used for making certificates only. I used to have a version of CertTool.py which used only the native openssl command line tool. You can modify it to work with other certificate manipulation tools if you like.

On the other hand, I'm not sure if content filtering is available at the SOCKS level. I had thought the SOCKS mode of mitmproxy was just a frontend for the backend HTTP proxy.
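The command-line approach could look roughly like the sketch below. This is not the old CertTool.py. The file names (`ca.crt`, `ca.key`, `<hostname>.key` and so on) and both function names are assumptions for illustration, and it requires the openssl binary to be on the PATH when the commands are actually run.

```python
import subprocess


def leaf_cert_commands(hostname, ca_cert="ca.crt", ca_key="ca.key", days=365):
    """Return the openssl commands that create a key, a CSR and a CA-signed cert."""
    key = hostname + ".key"
    csr = hostname + ".csr"
    crt = hostname + ".crt"
    return [
        # 1. Private key for the site certificate
        ["openssl", "genrsa", "-out", key, "2048"],
        # 2. Certificate signing request with the host name as CN
        ["openssl", "req", "-new", "-key", key,
         "-subj", "/CN=" + hostname, "-out", csr],
        # 3. Sign the request with the local CA
        ["openssl", "x509", "-req", "-in", csr,
         "-CA", ca_cert, "-CAkey", ca_key, "-CAcreateserial",
         "-out", crt, "-days", str(days)],
    ]


def make_leaf_cert(hostname, **kwargs):
    """Run the commands in order; raises CalledProcessError if one fails."""
    for cmd in leaf_cert_commands(hostname, **kwargs):
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the execution makes it easy to swap in another certificate tool, as suggested above. Current browsers also expect a subjectAltName entry in the certificate, which this sketch does not add.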

cattleyavns · Jul. 20, 2015, 06:17 AM

Great! Thanks for sharing.

At first I also thought that mitmproxy's SOCKS mode could not filter webpages, like HandyCache's SOCKS mode, but I was wrong: it apparently can filter webpages, including HTTPS ones. It matches on IP address instead of domain (http://8.8.8.8 instead of http://www.google.com, for example), but we could use the Host request header to work around that limitation.
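A minimal sketch of that Host header idea, assuming the request bytes are already available in plain text (for HTTPS, only after the MITM decryption step). `host_from_request` and `effective_url` are hypothetical helper names.

```python
def host_from_request(raw_request):
    """Return the Host header value from raw HTTP request bytes, or None."""
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    # Skip the request line, then scan the header lines.
    for line in head.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("iso-8859-1")
    return None


def effective_url(scheme, connected_ip, raw_request):
    """Rebuild a domain-based URL for filter matching, falling back to the IP."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("iso-8859-1")
    path = request_line.split()[1]
    host = host_from_request(raw_request) or connected_ip
    return "%s://%s%s" % (scheme, host, path)


raw = (b"GET /search?q=1 HTTP/1.1\r\n"
       b"Host: www.google.com\r\n"
       b"Accept: */*\r\n\r\n")
print(effective_url("http", "8.8.8.8", raw))
```

With the rebuilt URL, the existing domain-based filter rules could be matched even when the SOCKS client only supplied an IP address.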

cattleyavns · Jan. 05, 2016, 04:53 PM

Hi whenever, any update for Another Filtering Proxy?

**whenever** · Jan. 06, 2016, 08:44 AM

I'm sorry, I haven't been working on it any more. I have been quite busy for the past half year and it looks like that will continue.

cattleyavns · Jan. 06, 2016, 07:40 PM

(Jan. 06, 2016 08:44 AM)whenever Wrote: I'm sorry, I haven't been working on it any more. I have been quite busy for the past half year and it looks like that will continue.

Good luck. I have another question: I want to compile AFProxy to an exe. What do I need to prepare to do that? Thanks!