Another Filtering Proxy
May. 28, 2015, 04:26 AM (This post was last modified: May. 28, 2015 07:42 AM by cattleyavns.)
Post: #17
RE: Another Filtering Proxy
Great job! I think we should rewrite self.headers.update too, and I'm trying to do that now. The urllib3 headers feature is not good, at least for now, so I think we should decouple the whole header handling from urllib3 and use built-ins as much as possible.

Can you tell me how to move these lines into URLFilter.py so I can modify them as I want:
Code:
        headers = urllib3._collections.HTTPHeaderDict()
        # add() keeps repeated header names instead of overwriting them
        for key, value in self.headers.items():
            headers.add(key, value)

I'm adding a proxy feature to AFProxy using proxy_from_url, but I want to patch the problem above by setting headers = self.headers (req.headers in URLFilter.py).
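A minimal stdlib-only sketch of why the snippet above calls `headers.add()` for every item instead of building a plain dict, assuming `self.headers` is the `http.client.HTTPMessage` that `http.server.BaseHTTPRequestHandler` provides: that object can hold the same header name more than once, and a plain dict keeps only the last value. A list of `(name, value)` pairs is one built-in structure that keeps every occurrence. The helper `header_pairs` and the header values are hypothetical, for illustration only.

```python
from http.client import HTTPMessage

def header_pairs(msg):
    """Hypothetical helper: copy every header, duplicates included,
    into a plain list of (name, value) tuples, in original order."""
    return list(msg.items())

# Build a message like the one http.server produces for a request with two
# Cookie lines (the values here are made up).
msg = HTTPMessage()
msg["Host"] = "example.com"
msg["Cookie"] = "a=1"
msg["Cookie"] = "b=2"   # Message.__setitem__ appends; it never replaces

print(dict(msg.items())["Cookie"])  # b=2  (the first Cookie is lost)
print(msg.get_all("Cookie"))        # ['a=1', 'b=2']
print(header_pairs(msg))            # all three headers, in order
```

The list of pairs can then be replayed one header at a time by whatever client sends the upstream request, which is what `HTTPHeaderDict.add()` does inside urllib3.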


I'm learning Python, but I have a really tough question about threading; threading in Python is not easy at all. I'd like to ask you a question and hope you can help me:
- With threading, how can we download a big file in parts but join the parts one by one, instead of waiting for all of them to finish and then joining?
Here is my code; save it as a .py file and run it.
Code:
import os
import threading
import time
import urllib.request

import requests

URL = "https://peach.blender.org/wp-content/uploads/poster_bunny_big.jpg"

def buildRange(value, numsplits):
    # Split `value` bytes into `numsplits` contiguous, inclusive byte ranges
    # using integer arithmetic, e.g. buildRange(10, 3) -> ['0-2', '3-5', '6-9'].
    numsplits = max(1, min(numsplits, value))  # never produce an empty range
    lst = []
    for i in range(numsplits):
        start = i * value // numsplits
        end = (i + 1) * value // numsplits - 1  # HTTP Range ends are inclusive
        lst.append('%s-%s' % (start, end))
    return lst

def main(url=None, splitBy=3):
    start_time = time.time()
    if not url:
        print("Please enter a URL to begin the download.")
        return

    fileName = "1.jpg"
    # Ask for the uncompressed size; follow redirects so we measure the real file.
    head = requests.head(url, headers={'Accept-Encoding': 'identity'},
                         allow_redirects=True)
    sizeInBytes = int(head.headers.get('content-length', 0))
    if not sizeInBytes:
        print("Size cannot be determined.")
        return
    print("%s bytes to download." % sizeInBytes)

    dataDict = {}

    # split total num bytes into ranges
    ranges = buildRange(sizeInBytes, splitBy)

    def downloadChunk(idx, irange):
        print("start: " + irange)
        req = urllib.request.Request(url)
        req.add_header('Range', 'bytes={}'.format(irange))
        with urllib.request.urlopen(req) as resp:
            # 206 Partial Content means the server honoured the Range header;
            # a 200 would mean every thread is fetching the whole file.
            if resp.status != 206:
                raise RuntimeError("server ignored Range header for " + irange)
            dataDict[idx] = resp.read()
        print("finish: " + irange)

    # create one downloading thread per chunk
    downloaders = [
        threading.Thread(target=downloadChunk, args=(idx, irange))
        for idx, irange in enumerate(ranges)
    ]

    # start ALL threads first so they run in parallel...
    for th in downloaders:
        th.start()
    # ...and only then wait for them. Joining inside the start loop would
    # make each download wait for the previous one to finish.
    for th in downloaders:
        th.join()

    if len(dataDict) != len(ranges):
        print("Some chunks failed to download.")
        return

    print('done: got {} chunks, total {} bytes'.format(
        len(dataDict), sum(len(chunk) for chunk in dataDict.values())))

    print("--- %s seconds ---" % str(time.time() - start_time))

    # reassemble file in correct order ('wb' truncates any existing file)
    with open(fileName, 'wb') as fh:
        for _idx, chunk in sorted(dataDict.items()):
            fh.write(chunk)

    print("Finished writing file %s" % fileName)
    print('file size {} bytes'.format(os.path.getsize(fileName)))

if __name__ == '__main__':
    main(URL, splitBy=3)
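
The reassembly loop at the end of the script keeps every chunk in memory until all threads are done. Download managers typically avoid that by preallocating the output file and letting each thread write its bytes straight to its own offset, so nothing has to be joined in order at all. A minimal sketch of that idea, assuming a hypothetical `fake_fetch()` stands in for the Range request (in the script it would be the `urlopen(req).read()` call inside `downloadChunk`):

```python
import os
import random
import tempfile
import threading
import time

def fake_fetch(data, start, end):
    """Hypothetical stand-in for an HTTP Range request: returns
    data[start..end] (inclusive) after a random delay, so chunks
    finish in a random order."""
    time.sleep(random.uniform(0.01, 0.05))
    return data[start:end + 1]

def download_to_offsets(data, numsplits, path):
    size = len(data)
    # Preallocate the output file so every chunk has a place to land.
    with open(path, 'wb') as f:
        f.truncate(size)

    def worker(start, end):
        chunk = fake_fetch(data, start, end)
        # Each thread opens its own handle, so seek/write pairs don't race.
        with open(path, 'r+b') as f:
            f.seek(start)
            f.write(chunk)

    threads = []
    for i in range(numsplits):
        start = i * size // numsplits
        end = (i + 1) * size // numsplits - 1
        threads.append(threading.Thread(target=worker, args=(start, end)))
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # only needed to know when the whole file is complete

if __name__ == '__main__':
    data = os.urandom(100000)
    path = os.path.join(tempfile.mkdtemp(), 'out.bin')
    download_to_offsets(data, 4, path)
    with open(path, 'rb') as f:
        print(f.read() == data)
```

Because the ranges never overlap, the order in which threads finish no longer matters; the file is complete once the last thread exits.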

What I want is:
- For example, say we have a 100 MB file.
- We split that file into byte ranges based on its Content-Length.
- We use the threading module to download the parts in parallel, to get the fastest possible download speed, instead of downloading the parts one by one without threads and then joining them.
- The problem is threading's join(): we cannot stream the file or write it to disk immediately like Free Download Manager/FlashGet do, because join() waits for every thread to finish.
- But without join(), the script simply does not work: the file size comes out as 0 bytes because the file is written before the download tasks finish.
- So I want threading to work like this:
+ Download a file with 4 threads.
+ When thread 1 finishes, stream thread 1's data, then wait until thread 2 finishes and append its data after thread 1's. Even if threads 3 and 4 finish earlier than thread 2, their data must not be appended after thread 1's, because that would corrupt the file; it must wait until thread 2 finishes, append 2 after 1, and then append 3 and 4.
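
The ordering described above does not need a special threading primitive: `Thread.join()` waits only for the one thread it is called on, not for all of them. So if all threads are started first and then joined in index order, each chunk can be written the moment its own thread and all earlier ones are done. A minimal sketch, where `fake_fetch()` is a hypothetical stand-in for the Range request (in the script it would be the `urlopen(req).read()` call in `downloadChunk`) and an in-memory buffer stands in for the output file:

```python
import io
import random
import threading
import time

def fake_fetch(data, start, end):
    """Hypothetical stand-in for an HTTP Range request: returns
    data[start..end] (inclusive) after a random delay, so threads
    finish in a random order."""
    time.sleep(random.uniform(0.01, 0.05))
    return data[start:end + 1]

def download_streaming(data, numsplits, out):
    size = len(data)
    chunks = {}  # idx -> bytes, filled in by the worker threads

    def worker(idx, start, end):
        chunks[idx] = fake_fetch(data, start, end)

    threads = []
    for i in range(numsplits):
        start = i * size // numsplits
        end = (i + 1) * size // numsplits - 1
        th = threading.Thread(target=worker, args=(i, start, end))
        th.start()  # every thread starts right away and runs in parallel
        threads.append(th)

    # Join in index order. join() blocks only until *that* thread is done,
    # so chunk 0 is written as soon as thread 0 finishes, even while the
    # others are still downloading.
    for i, th in enumerate(threads):
        th.join()
        out.write(chunks.pop(i))

if __name__ == '__main__':
    data = bytes(range(256)) * 400
    buf = io.BytesIO()
    download_streaming(data, 4, buf)
    print(buf.getvalue() == data)
```

Chunks that finish early simply wait in `chunks` until every earlier chunk has been written, which is the "threads 3 and 4 must wait for thread 2" rule from the list above.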


