自动登陆华理校园网的python脚本

前段时间实验室的校园网总是被挤掉，虽然没用python写过爬虫，但是这也是契机来学习一下标准库里的urllib, urllib2, cookielib等这些类似的Lib.于是就着手开始写了个自动检测网络连接，并自动连接校园网的python脚本(loggers|GitHub)，便于以后进行扩展，分别写了两个类, 一个class LoggerBase, 一个子类class EcustLogger(LoggerBase)其中的代码风格模仿了这学期看的一个叫CatMap|GitHub的Micro Kinetic Model的风格，从其中学习了利用字典数据类型进行脚本日志的生成等，例如
日志记录

def log(self, event, **kwargs):
    "append log info into log file"
    file_obj = open(self.log_file,'a')
    #append new log infomation
    message_template = Template(self._log_str[event])
    message = message_template.substitute(kwargs)
    append_ctnt = self._log_format % (message, '['+time.ctime()+']')
    file_obj.write(append_ctnt)
    file_obj.close()

其中日志的log_str如下：

self._log_str = {
    'match_file_fail' : 'No file link in \'${url}\'',
    'suffix_unmatch:' : ('Warning: unmatched suffix : \'${suffix_1}\' and '
                   '\'${suffix_2}\'    ,force to change to \'${suffix_2}\''),
    'illegal_path'    : 'illegal_path : ${illegal_path}',
    'download_fail'   : '${filename} download overtime',
    'download_times'  : '${filename} download time:${times} ',
    'download_time'   : '${filename} download time used : ${time_pass}s '
    }

这种生成日志的原理就是利用dict存固定格式的log语句，然后利用re模块的substitute方法替换其中的可变的字符串，然后写个log()方法吧字符串append到日志文件中。

脚本原理还是很简单的，就是不断的想处理表单的程序发送请求，获取返回信息。由于本人对于网站的学习只学习了一点点皮毛，看到学校的校园网登陆的html代码发现处理表单的是一个cgi脚本，不是我以前写网站的时候直接用一个php页面来处理(这里我不是太懂，望轻喷)。
主要登陆的方法：

def do_login(self):
    #set cookie
    cj = cookielib.CookieJar()

    url_login = self.url_login
    form_data = self.form_data_dict
        try:
            opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
            urllib2.install_opener(opener)
            req=urllib2.Request(url_login,urllib.urlencode(form_data))
            u=urllib2.urlopen(req)
            return u.read().decode('utf-8').encode('gbk')
        except:
            print "Ooops! Failed to log in !>_< there may be a problem."
            return

首先先设置cookie，获取表单信息，向表单处理cgi发送request，华理的表单处理cgi在 http://172.20.13.100/cgi-bin/srun_portal 上。最后用urllib2.urlopen()抓取页面信息，转码并返回。

检测网络连接

检测网络连接就是用ping命令了，用python的subprocess module的call函数。我分别选择了华理校园网登录页面，百度，和微博三个进行检测。
网络连接一共分为四种情况：

3个网址都无法ping通: 那就return ‘connect2none’
能ping通登录首页，无法登录其他两个网站: 这就是指连接了校园网还没有登录校园网的情况。
三个网站都能ping通:
这个要进一步分析是否是没有登录校园网，以为在测试脚本的时候，会发现，即使用户没有登录校园网账号，ping外网还是能ping通，不知道这个是什么原因了，囧。那我就尝试获取页面的title的方法看是不是相同来判断是否已登录校园网。单独在LoggerBase里面写了个获取title的方法：

def get_page_title(self, url):
    page_ctnt = urllib2.urlopen(url).read()
    match = re.search(r'(<title>)(.*)(</title>)',page_ctnt)
    return match.group(2) #title

如果获取的title相同，那说明指连接到了校园网不能连接到外网。如果都不同，外网就已经连接上了。* 能连接到校园网而且能够连接到其他两个任意一个:

最终不知道为什么对于1m和4m的区分区别不开，opt选项无论是1还是2，登陆页面都显示已登陆，但是还是连不上外网。其中一些其他的form中的参数我也搞不懂是做什么的，就直接硬上了。。。最终脚本对于4m账号的有限连接有效，对于1m以及无线连接无效(´-ι_-｀)。

自动登陆华理校园网的python脚本

Comments