2018142125 조정빈

“I, 조정빈, pledge that this assignment is my own work. I am committed to upholding the highest standards of integrity in all academic endeavors. I understand that any form of dishonesty, such as plagiarism, will not be tolerated and may result in disciplinary action.”

Graph-Based Text Analysis Report

(a) Explanation of Stop Words and Their Importance

Stop Words

Stop words are very common words in a language that are usually filtered out before text data is processed. Examples include "is," "and," "the," and "to." Because they appear in almost every text regardless of topic, they usually contribute little when extracting meaningful information from a text.
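
As a minimal sketch of the filtering step described above, the following Python snippet removes stop words from a sentence. The STOP_WORDS set and the remove_stop_words function are illustrative assumptions, not part of any particular library; real stop word lists are much longer and depend on the language and the task.

```python
import re

# Illustrative stop word list (an assumption; practical lists are far longer).
STOP_WORDS = {"is", "and", "the", "to", "a", "of", "in"}

def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

filtered = remove_stop_words("The cat is on the mat and wants to sleep.")
print(filtered)
```

Only the words in the assumed list are removed, so a word such as "on" survives here; a fuller list would filter it as well.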

Importance in Keyword Extraction and Text Summarization

(b) TextRank Algorithm for Keyword Extraction

Process Description

  1. Preprocessing: Tokenize the text and remove stop words and punctuation.
  2. Graph Construction: Build a graph where each node represents a word, and edges represent co-occurrence within a fixed window of words.
  3. Edge Weights: Assign weights to edges based on the frequency and proximity of co-occurrence.
  4. TextRank Calculation: Apply the TextRank algorithm, an adaptation of PageRank to word graphs, to calculate the importance of each word (node) in the graph. Scores are updated iteratively until they converge, that is, until they change by less than a chosen threshold between iterations.
  5. Keyword Extraction: Extract top-ranked words as keywords based on their TextRank scores.
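
The five steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions rather than a reference implementation: the stop word list, the window size of 3, the 1/distance edge weighting, the damping factor of 0.85, and all function names are choices made for this example.

```python
import re
from collections import defaultdict

# Assumed, deliberately small stop word list for illustration.
STOP_WORDS = {"is", "and", "the", "to", "a", "of", "in", "on", "for", "are"}

def preprocess(text):
    """Step 1: tokenize, dropping punctuation and stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_graph(tokens, window=3):
    """Steps 2-3: undirected co-occurrence graph with weighted edges.

    Each co-occurrence inside the window adds 1/distance to the edge,
    so frequent and close pairs get heavier edges (an assumed scheme).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                weight = 1.0 / (j - i)
                graph[word][other] += weight
                graph[other][word] += weight
    return graph

def textrank(graph, damping=0.85, tol=1e-6, max_iter=100):
    """Step 4: PageRank-style score updates, repeated until convergence."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    total_weight = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(max_iter):
        new_scores = {}
        for node in graph:
            # Each neighbor passes on a share of its score proportional
            # to the weight of the edge it shares with this node.
            incoming = sum(
                graph[nbr][node] / total_weight[nbr] * scores[nbr]
                for nbr in graph[node]
            )
            new_scores[node] = (1 - damping) + damping * incoming
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores

def extract_keywords(text, top_k=5, window=3):
    """Step 5: return the top_k words by TextRank score."""
    scores = textrank(build_graph(preprocess(text), window))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

sample = ("Graph nodes link words. Graph edges weight words. "
          "Graph ranking scores words.")
print(extract_keywords(sample, top_k=2))
```

In the sample text, "graph" and "words" co-occur with far more neighbors than any other word, so they receive the highest scores. A fuller implementation would typically also filter candidates by part of speech and merge adjacent keywords into phrases.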

(c) Tokenization and Part-of-Speech Tagging Analysis

Tokenization

Part-of-Speech (POS) Tagging