Question

Develop a crawler that collects the email addresses in the visited web pages. You can write...

Develop a crawler that collects the email addresses in the visited web pages. You can write a function emails() that takes a document (as a string) as input and returns the set of email addresses (i.e., strings) appearing in it. You should use a regular expression to find the email addresses in the document. Explain the code and how it works.

0 0
Add a comment Improve this question Transcribed image text
Answer #1

the code is written in the python file

It will be very easy to make you understand the python code. Just follow the below code and execute it as shown below.

import urllib.request#it is imported because we have to make a request to a site

import re# we have to use the regular expression therefore imported regular expression lib


def email(str):#function email has an argument as str which is your site url

    text=urllib.request.urlopen(str).read().decode(errors='replace')

    regex=re.compile(r'[\w.-]+@[\w.-]+')

    # \w means all characters[A-Z][a-z] and followed by it @ and after that . or domains

    # Examples. .com , .uk , .co.in

    #+ indicates that they should occur atleast once

    emaillist=re.findall(regex,text)#Findall Occurences

    print("All emails are kept in a list format shown below\n",emaillist)

    #Print all emails in the list format

    f=open("email.txt","a+")#Append all emails to text file a+

    f.write("Title:-"+str+"\n")# from which site you got email will be appended as title

    for i in emaillist:

        f.write(i+" ")

url=input("Enter your site :-")

email(url)

Execution

site containing all emails for sample data

Site url:- http://e-mailid.blogspot.com/

Create a file email.txt in the same directory of the python file

Output written to email.txt

Add a comment
Know the answer?
Add Answer to:
Develop a crawler that collects the email addresses in the visited web pages. You can write...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • (Python) Develop a crawler that collects the email addresses in the visited web pages. You can...

    (Python) Develop a crawler that collects the email addresses in the visited web pages. You can write a function emails() that takes a document (as a string) as input and returns the set of email addresses (i.e., strings) appearing in it. You should use a regular expression to find the email addresses in the document. Explain the code and how it works.

  • Below is the PHP function to sanitize incoming values from web pages. Write code to call...

    Below is the PHP function to sanitize incoming values from web pages. Write code to call this function in the if(isset(x)) code block. Example is given to you below. function mysql_fix_string($conn, $string){ if(get_magic_quotes_gpc(i) $string = stripslashes($string); } return $conn->real_escape_string($string); Il Il Il Il Il II II Il 11 11 = 11 11 = 11 global $conn; if(isset($_POST['name'])) { $name = $_POST['name']; al [Select] he function $name = mysql_fix_string($string); $name = mysql_fix_string($conn, $string); $name = mysql_fix_string($conn, $name); $name = get_magic_quotes_gpc($name);

  • In C++ write a complete and correct x++ program that generates student email addresses and prints...

    In C++ write a complete and correct x++ program that generates student email addresses and prints them to the screen based upon the data found in an input file, called students . txt. Each line of the space, followed by the student’s last name. Every email address is to be of the form [email protected]. In general, each username has four parts in this order: (i) the student; last name in lowercase letters; (ii) a number between 10- 99 generated at...

  • Assignment 14.3: Valid Email (10 pts) image source Write a program that takes as input an...

    Assignment 14.3: Valid Email (10 pts) image source Write a program that takes as input an email address, and reports to the user whether or not the email address is valid. For the purposes of this assignment, we will consider a valid email address to be one that contains an @ symbol The program must allow the user to input as many email addresses as desired until the user enters "q" to quit. For each email entered, the program should...

  • Problem 1 In this problem, you will write two functions. The first function takes in a...

    Problem 1 In this problem, you will write two functions. The first function takes in a string and returns that string without any dashes. The second function takes in the first and last name and returns a string that is lastname_firstname and uses the previous function to remove any dashes (-) in the name. Note that you’ll be testing it by calling it from the command line; you’ll call it from a script in problem 3. Deliverables: Function file that...

  • Please write the code in a text editor and explain thank you! 1. LINKED-LIST: Implement a...

    Please write the code in a text editor and explain thank you! 1. LINKED-LIST: Implement a function that finds if a given value is present in a linked-list. The function should look for the first occurrence of the value and return a true if the value is present or false if it is not present in the list. Your code should work on a linked-list of any size including an empty list. Your solution must be RECURSIVE. Non-recursive solutions will...

  • Creating the Home and Template Pages Overview In this assignment, you will start building your Web...

    Creating the Home and Template Pages Overview In this assignment, you will start building your Web site for your fictional organization by creating a homepage using HTML5 and some of the key elements that define a Web page. You are required to use either a simple text editor to write your code, or an enhanced text editor such as Brackets. Note: Microsoft Word is not a good tool for developing code because it is a document processor and not a...

  • Python! Input: For this first part you will be given a set of strings that represents a fully par...

    python! Input: For this first part you will be given a set of strings that represents a fully parenthesized infix expression that eventually (next assignment) be differentiated. The expression will be composed of single digit integers ("A"-"E"), the variable "X", parenthesis "(" and ")', and the binary operators +, -, *, /, and ^ (exponentiation). No spaces. Process: Generate a binary parse tree from the given input. Output: (1) Echo print the input string. (2) Print out the parse tree...

  • Can I see that source code in python Write the following value-returning function: encoded - Takes...

    Can I see that source code in python Write the following value-returning function: encoded - Takes a string as parameter and returns the string with each vowel substituted by its corresponding number as follows: a -1 e-2 0-4 。u_5 The function MUST USE string slicing Write a main program that will take an input string (text) from command line and it will display the input text encoded. You will call the encoded function to get the text encoded.

  • Write a definition for a class named Book with attributes title, price and author, where author...

    Write a definition for a class named Book with attributes title, price and author, where author is a Contact object and title is a String, and price is a float number. You also need to define the Contact class, which has attributes name and email, where name and email are both String. Instantiate a couple of Book objects that represents books with title, price and author (You can come up with the attributes values for each object) Write a function...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT