Scraping (Beautiful Soup)
Review
1. Scraping
import re
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://movie.naver.com/movie"

def get_page(page_url):
    page = requests.get(page_url)
    soup = BeautifulSoup(page.content, 'html.parser')
    return soup, page
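The helpers `get_movie_code` and `get_reviews` used below are not shown in this post. As a minimal sketch of the parsing step `get_reviews` might perform, here is a self-contained example against a hand-written HTML snippet; the tag names and classes (`td.title`, `em`, `a.report`) are assumptions standing in for the real Naver markup, which may differ:

```python
from bs4 import BeautifulSoup

# Sample markup standing in for one page of the review list;
# treat the structure and class names as assumptions.
SAMPLE_HTML = """
<table class="list_netizen">
  <tr><td class="title">
    <em>8</em><a class="report">Great pacing and acting.</a>
  </td></tr>
  <tr><td class="title">
    <em>5</em><a class="report">Average at best.</a>
  </td></tr>
</table>
"""

def parse_reviews(html):
    """Return {review_text: star_score} from one review-list page."""
    soup = BeautifulSoup(html, 'html.parser')
    reviews = {}
    for cell in soup.select('td.title'):
        star = float(cell.find('em').get_text())
        text = cell.find('a', class_='report').get_text(strip=True)
        reviews[text] = star
    return reviews

print(parse_reviews(SAMPLE_HTML))
# {'Great pacing and acting.': 8.0, 'Average at best.': 5.0}
```

Keeping the review text as the dict key (as the functions below do) silently drops duplicate reviews; that matches the `update()`-based merging used throughout this post.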
def scrape_by_review_num(movie_title, review_num):
    movie_code = get_movie_code(movie_title)
    # each review page holds 10 reviews, so fetch enough pages to cover review_num
    page_range = review_num // 10 + 1
    temp = {}
    review_list = {}
    for i in range(page_range):
        review_to_update = get_reviews(movie_code, page_num=i+1)
        temp.update(review_to_update)
    i = 0
    for k, v in temp.items():
        if i == review_num:
            break
        review_list[k] = v
        i += 1
    return review_list
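The second loop above just keeps the first `review_num` entries of the merged dict. Since Python dicts preserve insertion order, the same truncation can be written more compactly with `itertools.islice`:

```python
from itertools import islice

# Stand-in for the merged review dict (review text -> star score)
temp = {'review a': 8.0, 'review b': 5.0, 'review c': 9.0, 'review d': 7.0}
review_num = 3

# islice takes the first review_num items in insertion order
review_list = dict(islice(temp.items(), review_num))
print(review_list)
# {'review a': 8.0, 'review b': 5.0, 'review c': 9.0}
```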
def scrape_by_page_num(movie_title, page_num=10):
    reviews_con = {}
    movie_code = get_movie_code(movie_title)  # look up the code once, not per page
    for i in range(1, page_num + 1):
        reviews_con.update(get_reviews(movie_code, i))
    reviews = [{k: v} for k, v in reviews_con.items()]
    return reviews
2. DB
import os
import sqlite3

DATABASE_PATH = os.path.join(os.getcwd(), 'scrape_data.db')
conn = sqlite3.connect(DATABASE_PATH)

def store_by_page_num(movie_title, page_num=10, conn=conn):
    cur = conn.cursor()
    reviews_con = {}
    movie_code = get_movie_code(movie_title)
    for page in range(1, page_num + 1):
        reviews_con.update(get_reviews(movie_code, page))
    for k, v in reviews_con.items():
        # a parameterized query handles quotes in the review text safely,
        # unlike formatting the tuple straight into the SQL string
        cur.execute(
            """INSERT INTO Review (review_text, review_star, movie_title)
               VALUES (?, ?, ?)""",
            (k, v, movie_title),
        )
    conn.commit()
    print(cur.execute("SELECT COUNT(*) FROM Review;").fetchone()[0])
def init_db(conn=conn):
    drop_table_if_exists = "DROP TABLE IF EXISTS Review;"
    create_table = """CREATE TABLE Review (
        id INTEGER,
        review_text TEXT,
        review_star FLOAT,
        movie_title VARCHAR(128),
        PRIMARY KEY (id)
    );"""
    cur = conn.cursor()
    cur.execute(drop_table_if_exists)
    cur.execute(create_table)
    cur.close()
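The schema and insert logic can be exercised without hitting the site by pointing sqlite3 at an in-memory database and feeding it a hand-made review dict in place of the `get_reviews` output (the table layout is copied from `create_table` above; the sample reviews are made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS Review;")
cur.execute("""CREATE TABLE Review (
    id INTEGER,
    review_text TEXT,
    review_star FLOAT,
    movie_title VARCHAR(128),
    PRIMARY KEY (id)
);""")

# Fake scraped data standing in for get_reviews() output
reviews_con = {'Loved it': 9.0, 'Too long': 4.0}
for text, star in reviews_con.items():
    cur.execute(
        "INSERT INTO Review (review_text, review_star, movie_title) VALUES (?, ?, ?)",
        (text, star, 'Example Movie'),
    )
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM Review;").fetchone()[0])
# 2
```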
Reference: Beautiful Soup Documentation — https://www.crummy.com/software/BeautifulSoup/bs4/doc/