[Python] Notes from web crawling with BeautifulSoup...

#Beautiful Soup Documentation — Beautiful Soup 4.9.0 ...

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

https://www.youtube.com/watch?v=ZTJjW7XuHIY

https://blog.naver.com/htk1019/220975205958

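# Getting the href address of every <a> tag

from bs4 import BeautifulSoup

# res is assumed to be the HTTP response fetched earlier (e.g. with requests.get)
soup = BeautifulSoup(res.content, 'html.parser', from_encoding='utf-8')

for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))  # falls back to '/' when the tag has no href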
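The loop above can be tried without a live page. This is a minimal sketch that parses an inline HTML string; the sample markup and the `collect_hrefs` helper are made up for illustration and are not part of the original notes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for res.content
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/a">A</a>
  <a href="/relative/b">B</a>
  <a name="no-href">C</a>
</body></html>
"""


def collect_hrefs(html, default='/'):
    """Return the href of every <a> tag, using `default` when href is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    return [anchor.get('href', default) for anchor in soup.find_all('a')]


hrefs = collect_hrefs(SAMPLE_HTML)
print(hrefs)
```

The third anchor has no href attribute, so `get` returns the default `'/'` for it instead of `None`.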

# Selecting <span class="ah_k"> and printing only its text

soup = BeautifulSoup(res.content, 'html.parser', from_encoding='utf-8')

for anchor in soup.select('span.ah_k'):
    print(anchor.get_text())
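A small sketch of the same CSS-selector lookup on inline markup. The HTML below is invented for illustration (only the `ah_k` class name comes from the notes), and `get_text(strip=True)` is used here to trim surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name ah_k is taken from the notes
SAMPLE_HTML = """
<ul>
  <li><span class="ah_r">1</span> <span class="ah_k">first keyword</span></li>
  <li><span class="ah_r">2</span> <span class="ah_k extra"> second keyword </span></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# 'span.ah_k' matches any <span> whose class list contains ah_k
keywords = [span.get_text(strip=True) for span in soup.select('span.ah_k')]
print(keywords)
```

The second span carries two classes, and the selector still matches it because `.ah_k` only requires that one of the classes be `ah_k`.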

# Finding elements by class and printing their text

# <td width="7%" class="snapshot-td2-cp" align="left" title="">Short Float</td>
# <td width="8%" class="snapshot-td2" align="left"><b>8.06%</b></td>

soup = BeautifulSoup(res.content, 'html.parser', from_encoding='utf-8')

names = soup.find_all(class_='snapshot-td2-cp')  # label cells
datas = soup.find_all(class_='snapshot-td2')     # value cells

# Pair each label with its value; zip stops at the shorter list
for name_tag, data_tag in zip(names, datas):
    name = name_tag.getText()  # extract only the text from the tag
    data = data_tag.getText()
    print(name + " " + data)
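The label/value pairing above can also be collected into a dictionary. This is a sketch on inline markup: the first two cells are the ones quoted in the notes, while the `P/E` row and its value are placeholders added only so the example has more than one pair.

```python
from bs4 import BeautifulSoup

# First two <td> cells come from the notes; the P/E row is a made-up placeholder
SAMPLE_HTML = """
<table><tr>
  <td width="7%" class="snapshot-td2-cp" align="left" title="">Short Float</td>
  <td width="8%" class="snapshot-td2" align="left"><b>8.06%</b></td>
  <td class="snapshot-td2-cp">P/E</td>
  <td class="snapshot-td2"><b>12.5</b></td>
</tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

names = soup.find_all(class_='snapshot-td2-cp')
datas = soup.find_all(class_='snapshot-td2')

# class_ compares against whole class names, so 'snapshot-td2'
# does not also pick up the 'snapshot-td2-cp' cells
snapshot = {
    name.get_text(strip=True): data.get_text(strip=True)
    for name, data in zip(names, datas)
}
print(snapshot)
```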

***

soup.select("title")  # select by the title tag

soup.select("p:nth-of-type(3)")  # each <p> that is the third <p> among its siblings

soup.select("body a")  # every <a> among the descendants of <body> (descendants = children, children of children, and so on)
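The three selectors above, run against a tiny page as a sketch. The markup is invented for illustration; note that `nth-of-type` counts only sibling `<p>` elements, so the `<div>` between them does not affect the count.

```python
from bs4 import BeautifulSoup

# Hypothetical page used only to exercise the selectors
SAMPLE_HTML = """
<html><head><title>Demo page</title></head>
<body>
  <p>one</p>
  <p>two <a href="/x">x</a></p>
  <div><p>nested</p><a href="/y">y</a></div>
  <p>three</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

titles = [t.get_text() for t in soup.select("title")]
third_p = [p.get_text() for p in soup.select("p:nth-of-type(3)")]
links = [a.get('href') for a in soup.select("body a")]

print(titles)
print(third_p)
print(links)
```

The `<a href="/y">` sits inside a `<div>`, not directly under `<body>`, and is still matched because the space in `body a` is the descendant combinator.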

[The two snippets below print the same text for a simple tag like this one]

# Getting the text with getText()

soup = BeautifulSoup(res.content, 'html.parser', from_encoding='utf-8')

text = soup.find('span', {'class': 'Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)'})  # returns a <class 'bs4.element.Tag'>

print(text.getText())
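When `find` is given a class value containing spaces, Beautiful Soup compares it against the exact string of the class attribute, so the order of the class names matters. A sketch on inline markup follows; the span's text `123.45` is a placeholder value, not real data.

```python
from bs4 import BeautifulSoup

# The class string is the one from the notes; the text 123.45 is a placeholder
HTML = (
    '<div>'
    '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)">123.45</span>'
    '<span class="Fw(n)">ignored</span>'
    '</div>'
)

soup = BeautifulSoup(HTML, 'html.parser')

# Full class string, in the same order as in the markup
exact = soup.find('span', {'class': 'Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)'})

# A single class name is enough to match the same tag
single = soup.find('span', class_='Fz(36px)')

# Same class names in a different order: no match
reordered = soup.find('span', {'class': 'Fw(b) Trsdu(0.3s) Fz(36px) Mb(-4px) D(ib)'})

print(exact.getText())
print(single is exact)
print(reordered)
```

Matching on one stable class name tends to be less brittle than copying the whole attribute, since a site may reorder or add utility classes.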

# Removing the <tags> and keeping only the text

import re  # re module: regular expression support

...

soup = BeautifulSoup(res.content, 'html.parser', from_encoding='utf-8')

text = soup.find('span', {'class': 'Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)'})  # returns a <class 'bs4.element.Tag'>

text = str(text)  # convert the Tag to its HTML string

text = re.sub(r'<.+?>', '', text).strip()  # strip every <...> tag; regex reference: https://wikidocs.net/4308#match

print(text)

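. : any character (except a newline, by default)

+ : the preceding item repeated one or more times

? : the preceding item is optional (matches whether it is present or not); placed right after + as in .+? it makes the repetition non-greedy, so the match stops at the first >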
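The difference the `?` makes in `<.+?>` is easiest to see side by side. This sketch uses only the standard library; the sample string is made up from the table cell quoted earlier in the notes.

```python
import re

# Hypothetical HTML fragment used only to compare the two patterns
html = '<span class="price"><b>8.06%</b></span>'

# Non-greedy: each <...> match stops at the first '>', so only tags are removed
lazy = re.sub(r'<.+?>', '', html).strip()

# Greedy: one match runs from the first '<' to the last '>', removing everything
greedy = re.sub(r'<.+>', '', html).strip()

# '?' after an ordinary character makes that character optional
optional_u = [bool(re.fullmatch(r'colou?r', word)) for word in ('color', 'colour')]

print(repr(lazy))
print(repr(greedy))
print(optional_u)
```

With the greedy pattern the text between the tags is swallowed as part of a single match, which is why the notes use `.+?` rather than `.+`.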
