

Published: 2016-07-12 10:44:08  Source: linux网站  Author: 江洋大盗与鸭子
sudo pip install scrapy
scrapy crawl dmoz
AttributeError: 'module' object has no attribute 'Spider'
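The AttributeError above typically means that the installed scrapy package does not expose Spider at the top level, as was the case with older releases, which is why the next step is an upgrade. Below is a minimal sketch for checking this before crawling. The helper name exposes_spider is hypothetical, and the two stand-in modules only imitate the old and new cases so the check can be tried without Scrapy installed.

```python
import types


def exposes_spider(module):
    """Return True if the module has a top-level `Spider` attribute."""
    return hasattr(module, "Spider")


# Stand-in modules (assumptions for illustration, not the real Scrapy package).
old_style = types.ModuleType("scrapy")  # no Spider attribute, like the failing install
new_style = types.ModuleType("scrapy")
new_style.Spider = type("Spider", (object,), {})  # mimics a top-level Spider class

print(exposes_spider(old_style))  # False
print(exposes_spider(new_style))  # True
```

With Scrapy installed, running `import scrapy` and then `exposes_spider(scrapy)` applies the same check to the real package.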
sudo pip install scrapy --upgrade
creating build/temp.linux-x86_64-2.7/src/lxml
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -Isrc/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
In file included from src/lxml/lxml.etree.c:320:0:
src/lxml/includes/etree_defs.h:14:31: fatal error: libxml/xmlversion.h: No such file or directory
#include "libxml/xmlversion.h"
compilation terminated.
Compile failed: command 'x86_64-linux-gnu-gcc' failed with exit status 1
creating tmp
cc -I/usr/include/libxml2 -c /tmp/xmlXPathInitM_KXBh.c -o tmp/xmlXPathInitM_KXBh.o
cc tmp/xmlXPathInitM_KXBh.o -lxml2 -o a.out
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Rolling back uninstall of lxml
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-F1ulO4/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-OMbiRQ-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-F1ulO4/lxml/
sudo apt-get install python-dev libxml2-dev libxslt1-dev zlib1g-dev
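The fatal error in the build log is a missing libxml/xmlversion.h, which the libxml2-dev package installed above is expected to provide. Below is a small sketch for confirming that the header is now visible to the compiler. The function name find_header is hypothetical, and the search directories are assumptions based on the -I/usr/include/libxml2 flag that appears in the log.

```python
import os


def find_header(relative_path, include_dirs):
    """Return the first existing full path to the header, or None if absent."""
    for directory in include_dirs:
        candidate = os.path.join(directory, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


# Directories assumed from the compiler flags in the log; adjust for your system.
dirs = ["/usr/include/libxml2", "/usr/include"]
print(find_header("libxml/xmlversion.h", dirs))
```

If this prints None, the development headers are probably still missing and the lxml build is likely to fail again.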
sudo pip install lxml --upgrade
beast@beast:~/Code/python/tutorial$ sudo pip install lxml --upgrade
The directory '/home/beast/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/beast/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
Collecting lxml
Downloading lxml-3.6.0.tar.gz (3.7MB)
100% |████████████████████| 3.7MB 213kB/s 
Installing collected packages: lxml
Found existing installation: lxml 3.3.3
Uninstalling lxml-3.3.3:
Successfully uninstalled lxml-3.3.3
Running setup.py install for lxml ... done
Successfully installed lxml-3.6.0
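Before rerunning the crawl, it can be worth confirming that Python actually imports the freshly built lxml. Below is a minimal sketch, assuming only that lxml.etree exposes its version as the LXML_VERSION tuple. The function name lxml_version is hypothetical, and it returns None when lxml cannot be imported.

```python
def lxml_version():
    """Return lxml's version tuple, or None if lxml is not importable."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return etree.LXML_VERSION


print(lxml_version())
```

After the upgrade shown above, the tuple should begin with 3, 6, 0 if the new build is the one on the import path.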
beast@beast:~/Code/python/tutorial$ scrapy crawl dmoz
/usr/local/lib/python2.7/dist-packages/scrapy/settings/deprecated.py:26: ScrapyDeprecationWarning: You are using the following settings which are deprecated or obsolete (ask scrapy-users@googlegroups.com for alternatives):
BOT_VERSION: no longer used (user agent defaults to Scrapy now)
warnings.warn(msg, ScrapyDeprecationWarning)
2016-07-11 16:41:56 [scrapy] INFO: Scrapy 1.1.0 started (bot: tutorial)
2016-07-11 16:41:56 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'USER_AGENT': 'tutorial/1.0', 'BOT_NAME': 'tutorial'}
2016-07-11 16:41:56 [scrapy] INFO: Enabled extensions:
2016-07-11 16:41:56 [scrapy] INFO: Enabled downloader middlewares:
2016-07-11 16:41:56 [scrapy] INFO: Enabled spider middlewares:
2016-07-11 16:41:56 [scrapy] INFO: Enabled item pipelines:
2016-07-11 16:41:56 [scrapy] INFO: Spider opened
2016-07-11 16:41:56 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-07-11 16:41:56 [scrapy] DEBUG: Telnet console listening on
2016-07-11 16:41:58 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2016-07-11 16:41:58 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2016-07-11 16:41:58 [scrapy] INFO: Closing spider (finished)
2016-07-11 16:41:58 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 472,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 16392,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 7, 6, 8, 41, 58, 337488),
'log_count/DEBUG': 3,
'log_count/INFO': 7,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2016, 7, 6, 8, 41, 56, 777087)}
2016-07-11 16:41:58 [scrapy] INFO: Spider closed (finished)
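The stats dump contains enough to work out how long the crawl took and the average response size. Below is a short sketch using values copied from the dump above. The variable names are illustrative only.

```python
import datetime

# Values copied from the Scrapy stats dump above.
stats = {
    "downloader/response_bytes": 16392,
    "downloader/response_count": 2,
    "start_time": datetime.datetime(2016, 7, 6, 8, 41, 56, 777087),
    "finish_time": datetime.datetime(2016, 7, 6, 8, 41, 58, 337488),
}

# Wall-clock duration of the crawl in seconds.
elapsed = (stats["finish_time"] - stats["start_time"]).total_seconds()

# Average size of a downloaded response in bytes.
avg_bytes = stats["downloader/response_bytes"] / float(stats["downloader/response_count"])

print(elapsed)    # 1.560401
print(avg_bytes)  # 8196.0
```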