Scraping (Beautiful Soup)

by EXUPERY 2021. 3. 16.

Review

 

 


 

1. Scraping

import re
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://movie.naver.com/movie"

def get_page(page_url):

    page = requests.get(page_url)
    soup = BeautifulSoup(page.content, 'html.parser')

    return soup, page



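def scrape_by_review_num(movie_title, review_num):
    # get_movie_code and get_reviews are helpers that are not shown in this post.
    movie_code = get_movie_code(movie_title)
    # Assumption: each review page holds 10 reviews; adjust if the site differs.
    page_range = (review_num + 9) // 10

    temp = {}
    for i in range(page_range):
        review_to_update = get_reviews(movie_code, page_num=i+1)
        temp.update(review_to_update)

    # Keep only the first review_num entries.
    review_list = {}
    for i, (k, v) in enumerate(temp.items()):
        if i == review_num:
            break
        review_list[k] = v

    return review_list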
    
    
    
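def scrape_by_page_num(movie_title, page_num=10):
    # Look up the movie code once instead of on every page request.
    movie_code = get_movie_code(movie_title)
    reviews_con = {}
    for i in range(1, page_num + 1):
        reviews_con.update(get_reviews(movie_code, i))
    # One single-entry dict per review: [{review_text: review_star}, ...]
    reviews = [{k: v} for k, v in reviews_con.items()]

    return reviews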
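
The `get_reviews` helper called above is not shown in this post. As a rough sketch of what such a helper might do once a page has been fetched, the example below parses a small inline HTML string with Beautiful Soup and builds the same kind of `{review_text: review_star}` dictionary. The markup, the class names, and the `parse_reviews` function are all hypothetical and invented for illustration; a real page would need its own tag and class lookups.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tag and class names are invented for this sketch
# and are not taken from any real site.
SAMPLE_HTML = """
<ul class="reviews">
  <li><span class="star">9</span><p class="text">Great pacing and a strong cast.</p></li>
  <li><span class="star">4</span><p class="text">Too long for such a thin plot.</p></li>
</ul>
"""


def parse_reviews(html):
    """Return {review_text: star_rating} for each <li> in the sample markup."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = {}
    for item in soup.find("ul", class_="reviews").find_all("li"):
        text = item.find("p", class_="text").get_text(strip=True)
        star = int(item.find("span", class_="star").get_text(strip=True))
        reviews[text] = star
    return reviews


reviews = parse_reviews(SAMPLE_HTML)
print(reviews)
```

Because the review text is the dictionary key, two reviews with identical text would collapse into one entry, which is worth keeping in mind when counting results.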

 

 

2. DB

 

import os
import sqlite3

DATABASE_PATH = os.path.join(os.getcwd(), 'scrape_data.db')
conn = sqlite3.connect(DATABASE_PATH)


def store_by_page_num(movie_title, page_num=10, conn=conn):
    # Call init_db() first so that the Review table exists.
    cur = conn.cursor()
    reviews_con = {}
    movie_code = get_movie_code(movie_title)

    for page in range(1, page_num + 1):
        reviews_con.update(get_reviews(movie_code, page))

    # Parameterized query: quotes inside a review can no longer break the SQL.
    for k, v in reviews_con.items():
        cur.execute("""INSERT INTO Review
                    (review_text, review_star, movie_title)
                    VALUES (?, ?, ?)""", (k, v, movie_title))

    # Without a commit the inserted rows are lost when the connection closes.
    conn.commit()
    print(cur.execute("SELECT COUNT(*) FROM Review;").fetchone()[0])
    cur.close()


        
        
def init_db(conn=conn):

    create_table = """CREATE TABLE Review (
                        id INTEGER,
                        review_text TEXT,
                        review_star FLOAT,
                        movie_title VARCHAR(128),
                        PRIMARY KEY (id)
                        );"""

    drop_table_if_exists = "DROP TABLE IF EXISTS Review;"

    cur = conn.cursor()

    cur.execute(drop_table_if_exists)
    cur.execute(create_table)
    cur.close()
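
To try the table layout without touching `scrape_data.db`, a minimal sketch along these lines may help. It assumes an in-memory database and two made-up rows (the review texts and the title "Sample Movie" are invented for illustration), and it passes values through `?` placeholders rather than formatting them into the SQL string.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE Review (
                   id INTEGER PRIMARY KEY,
                   review_text TEXT,
                   review_star FLOAT,
                   movie_title VARCHAR(128)
               );""")

# Made-up rows; the apostrophe is the kind of character that makes
# string-built SQL fragile, while placeholders handle it safely.
rows = [
    ("It's better than I expected.", 9, "Sample Movie"),
    ("Too long.", 4, "Sample Movie"),
]
cur.executemany(
    "INSERT INTO Review (review_text, review_star, movie_title) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM Review;").fetchone()[0]
stored = cur.execute(
    "SELECT review_text, review_star FROM Review ORDER BY id;"
).fetchall()
print(count, stored)
conn.close()
```

Since `id` is an `INTEGER PRIMARY KEY`, SQLite assigns it automatically, so the insert only needs to name the other three columns.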

 

 

 

Reference: Beautiful Soup Documentation (Beautiful Soup 4.9.0 documentation), www.crummy.com/software/BeautifulSoup/bs4/doc/

 
