Published 2021-05-30 20:45
Download the chromedriver build that matches your installed Chrome browser version.
from selenium import webdriver
driver=webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')# path to the downloaded chromedriver executable
url='https://www.ptpress.com.cn/search/books'
driver.get(url)
data=driver.page_source
print(data)
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver=webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
url='https://www.ptpress.com.cn/search/books'
driver.get(url)
wait=WebDriverWait(driver,10)# maximum wait in seconds; wait.until(...) raises TimeoutException if the condition is still unmet after this time
print(driver.find_element_by_id("searchVal"))
<selenium.webdriver.remote.webelement.WebElement (session="8181d8d581c649abba8af74d83941a8a", element="6f9142c5-b326-4933-92b4-05e9e8f93097")>
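The `WebDriverWait` object created above does nothing until its `until` method is called with a condition. As a rough illustration of what an explicit wait does, the sketch below polls a condition until it returns a truthy value or the timeout elapses. It is not Selenium's actual implementation, and the names `wait_until`, `WaitTimeout`, and `poll` are hypothetical.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still falsy after the timeout (hypothetical name)."""


def wait_until(driver, condition, timeout=10.0, poll=0.5):
    """Call condition(driver) repeatedly until it returns a truthy value.

    Any exception raised by the condition is treated as "not ready yet".
    This is a simplification: Selenium's WebDriverWait ignores only
    selected exception types.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition(driver)
            if result:
                return result
        except Exception:
            pass  # not ready yet; retry after a short pause
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)
```

With the real library, the corresponding call would be `wait.until(EC.presence_of_element_located((By.ID, "searchVal")))` (with `By` imported from `selenium.webdriver.common.by`), which returns the element once it exists or raises `TimeoutException`.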
import time
driver=webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
url='https://www.ptpress.com.cn/search/books'
driver.get(url)
driver.execute_script('window.open()')
#print(driver.window_handles)
# Load http://www.tipdm.org in the first tab
# Load http://www.tipdm.com in the second tab
driver.switch_to.window(driver.window_handles[1])  # switch_to_window() was removed in Selenium 4
driver.get('http://www.tipdm.com')
time.sleep(1)
driver.switch_to.window(driver.window_handles[0])
driver.get('http://www.tipdm.org')
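The open-then-switch steps above can be wrapped in a small helper. This is only a sketch: `open_in_new_tab` is a hypothetical name, and it assumes `driver` is a Selenium WebDriver exposing `execute_script`, `window_handles`, `switch_to.window`, and `get`, as in the snippet above.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab, switch to it, and return the new tab's handle."""
    known = set(driver.window_handles)
    driver.execute_script('window.open()')
    # The new tab is whichever handle was not present before window.open().
    new_handles = [h for h in driver.window_handles if h not in known]
    if not new_handles:
        raise RuntimeError('no new tab appeared after window.open()')
    driver.switch_to.window(new_handles[0])
    driver.get(url)
    return new_handles[0]
```

Comparing handles before and after the call avoids relying on the order of `window_handles`, which the index-based version assumes.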
driver=webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
url='https://www.ptpress.com.cn/search/books'
driver.get(url)
# Scroll to the bottom of the page
driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
driver.execute_script('alert("Python web crawler")')
driver=webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
url='https://www.ptpress.com.cn/search/books'
driver.get(url)
input_first=driver.find_element_by_id("searchVal")
input_second=driver.find_element_by_css_selector("#searchVal")
input_third=driver.find_element_by_xpath('//*[@id="searchVal"]')
print(input_first)
print(input_second)
print(input_third)
<selenium.webdriver.remote.webelement.WebElement (session="143db49e478bb37f15f74791fccbffb3", element="726216cc-cf93-469f-95ba-24918c912096")>
<selenium.webdriver.remote.webelement.WebElement (session="143db49e478bb37f15f74791fccbffb3", element="726216cc-cf93-469f-95ba-24918c912096")>
<selenium.webdriver.remote.webelement.WebElement (session="143db49e478bb37f15f74791fccbffb3", element="726216cc-cf93-469f-95ba-24918c912096")>
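Elements can also be located through the By class. Selenium 4.3 and later removed the find_element_by_* helpers used above, so this is the form to use with current releases.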
from selenium.webdriver.common.by import By
driver=webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
url='https://www.ptpress.com.cn/search/books'
driver.get(url)
input_first=driver.find_element(By.ID,"searchVal")
print(input_first)
<selenium.webdriver.remote.webelement.WebElement (session="485bf28500c1cbf5f4e6341baf3f6a20", element="275a8a3d-08bd-4de9-a0b9-6d24c5d38537")>
driver=webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
url='https://www.ptpress.com.cn/search/books'
driver.get(url)
lis=driver.find_element_by_css_selector('#nav')
print(lis)
<selenium.webdriver.remote.webelement.WebElement (session="990d7412f2982c53040b045aad28cb5f", element="70fa1b17-1fd3-489f-a268-e52fda75b040")>
The same element can likewise be located through the By class:
driver=webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
url='https://www.ptpress.com.cn/search/books'
driver.get(url)
lis=driver.find_element(By.CSS_SELECTOR,'#nav')
print(lis)
<selenium.webdriver.remote.webelement.WebElement (session="0a411b2b9333f56fe7b3eb888961f643", element="5077e561-0e14-420b-94e6-587ad2be3e7f")>
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import re
import time
driver = webdriver.Chrome('D:/chromedriver_win32/chromedriver.exe')
url = 'https://www.ptpress.com.cn/search/books'
wait = WebDriverWait(driver, 10)
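# Simulate a search for "Python编程" (Python programming)
# Open the page
driver.get(url)
# Wait for the search box to be present
search_box = wait.until(
    EC.presence_of_element_located((By.ID, 'searchVal'))
)
# Type the query into the search box
search_box.send_keys('Python编程')
# Wait until the confirm (search) button is clickable
confirm_btn = wait.until(
    EC.element_to_be_clickable(
        (By.CSS_SELECTOR, '#app>div:nth-child(1)>div>div>div>button>i'))
)
# Click the confirm button
confirm_btn.click()
# Give the results 5 seconds to load
time.sleep(5)
html = driver.page_source
# Use BeautifulSoup to locate the blocks that hold the book information
soup = BeautifulSoup(html, 'lxml')
a = soup.select('.rows')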
# Extract the book image URLs with a regular expression
ls1 = '<img src="(.*?)"/></div>'
pattern = re.compile(ls1, re.S)
res_img = re.findall(pattern, str(a))
# Extract the book text information with a regular expression
ls2 = '<img src=".*?"/></div>.*?<p>(.*?)</p></a>'
pattern1 = re.compile(ls2, re.S)
res_text = re.findall(pattern1, str(a))
print(res_text, res_img)
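The two patterns return parallel lists, one of image URLs and one of text entries. The sketch below runs the same patterns on a small hand-written HTML string and pairs the results with `zip`. The markup, URLs, and titles are hypothetical and only shaped to fit the patterns; the real page structure may differ.

```python
import re

# Hypothetical markup shaped like what the article's patterns expect.
sample_html = (
    '<div class="rows">'
    '<a href="/book/1"><div class="book_img">'
    '<img src="https://example.com/a.jpg"/></div>'
    '<p>Python Basics</p></a>'
    '<a href="/book/2"><div class="book_img">'
    '<img src="https://example.com/b.jpg"/></div>'
    '<p>Python Crawling</p></a>'
    '</div>'
)

# Same patterns as in the article, compiled once and reused.
img_pattern = re.compile(r'<img src="(.*?)"/></div>', re.S)
title_pattern = re.compile(r'<img src=".*?"/></div>.*?<p>(.*?)</p></a>', re.S)

images = img_pattern.findall(sample_html)
titles = title_pattern.findall(sample_html)

# Pair each title with its image; this assumes every book has both.
books = list(zip(titles, images))
for title, image in books:
    print(title, image)
```

Because `zip` stops at the shorter list, a book missing either its image or its text would silently shift or drop pairs, so it may be worth comparing `len(images)` and `len(titles)` before pairing.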
Author: 卡卡卡
Link: http://www.phpheidong.com/blog/article/86914/a8656665d7dea12a614b/
Source: php黑洞网
Any reproduction must credit the source. Infringement, once discovered, will be pursued under the law.