Question

A program called "grabbing website data", to scrape all data from a website (for example...

The program is called "grabbing website data". It should scrape all data from a website (for example, Google), including its tabs (hyperlinks), and copy it to a file using Java. There is some HTML code involved that I can't remember, and it must be added. We must collect the data in an array and then copy it to the file. Please keep it simple, without using advanced Java libraries. Is there any help?

Answer #1

No advanced or third-party libraries are used, just basic code divided into multiple methods.

In the main method, find the TODO comments and update the URL and the output file name.

Save the code below as GrabData.java.

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrabData {

    /**
     * Writes each link on its own line to a text file.
     * @param fileName path of the output file
     * @param links    links to write
     */
    public static void writeTofFile(String fileName, List<String> links) {
        // try-with-resources flushes and closes the writer automatically
        try (FileWriter writer = new FileWriter(fileName)) {
            for (String link : links)
                writer.write(link + "\n");
        } catch (Exception e) {
            e.printStackTrace();
            // exit with a non-zero status to signal failure
            System.exit(1);
        }
    }

    /**
     * Downloads the HTML of the given URL.
     * @param theUrl address of the page to fetch
     * @return the page content as one string
     */
    public static String getUrlContents(String theUrl) {
        // collects the HTML line by line
        StringBuilder htmldata = new StringBuilder();
        try {
            // build a URL object from the given address and open a connection
            URL url = new URL(theUrl);
            URLConnection connection = url.openConnection();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader bf = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
                String temp;
                while ((temp = bf.readLine()) != null) {
                    htmldata.append(temp).append("\n");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            // exit with a non-zero status to signal failure
            System.exit(1);
        }
        return htmldata.toString();
    }

    /**
     * Extracts the absolute http/https links from an HTML page.
     * @param HTMLPage HTML source of the page
     * @return list of href values that start with http:// or https://
     */
    public static ArrayList<String> getAllLinks(String HTMLPage) {
        // capture the quoted href value of each <a ...> tag; anchors without
        // a quoted href are skipped instead of causing an index error
        Pattern linkPat = Pattern.compile("<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher pageMatcher = linkPat.matcher(HTMLPage);
        // list for collecting all links
        ArrayList<String> links = new ArrayList<String>();
        while (pageMatcher.find()) {
            String link = pageMatcher.group(1);
            // keep only absolute http/https links
            if (link.startsWith("https://") || link.startsWith("http://")) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // TODO: change the URL as needed
        final String URL = "https://www.google.com";
        // TODO: change the file name as needed
        final String fileName = "urls.txt";
        String htmlPage = getUrlContents(URL);
        List<String> links = getAllLinks(htmlPage);
        writeTofFile(fileName,links);
        System.out.println("OUTPUT WRITTEN TO "+fileName);
    }

}

//OUTPUT

OUTPUT WRITTEN TO urls.txt

A text file (urls.txt by default) will be generated, containing the absolute http:// and https:// links found on the page, one per line.
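Relative links such as /about do not start with http:// or https://, so the program does not write them to the file. If they are wanted too, one possible approach is to resolve each href against the page address with java.net.URI, which is part of the standard library. The sketch below is only an illustration: the class name ResolveLinks, the method resolveAll, and the example.com addresses are made up for this example and are not part of GrabData.java.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ResolveLinks {

    // Resolves each href against the page URL (hypothetical helper);
    // hrefs that cannot be parsed as a URI are skipped
    public static List<String> resolveAll(String pageUrl, List<String> hrefs) {
        URI base = URI.create(pageUrl);
        List<String> resolved = new ArrayList<String>();
        for (String href : hrefs) {
            try {
                resolved.add(base.resolve(href.trim()).toString());
            } catch (IllegalArgumentException e) {
                // malformed href, ignore it
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> hrefs = new ArrayList<String>();
        hrefs.add("/about");                 // relative link
        hrefs.add("https://example.org/x");  // already absolute, stays unchanged
        for (String link : resolveAll("https://example.com/index.html", hrefs)) {
            System.out.println(link);
        }
    }
}
```

To use something like this with GrabData.java, the link extraction would have to collect every href value first and apply the http/https filter only after resolving.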

Please comment if you have any concerns.
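The question asks for the data to be held in an array before it is copied to the file, while the program keeps the links in a List. A minimal sketch of the array variant follows, using only the standard library. The class name LinksToArray, the method names, the sample links, and the file name urls-array.txt are assumptions made for this illustration.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LinksToArray {

    // Copies the list into a plain String array (hypothetical helper)
    public static String[] toArray(List<String> links) {
        return links.toArray(new String[0]);
    }

    // Writes each array element on its own line
    public static void writeArrayToFile(String fileName, String[] links) throws IOException {
        try (FileWriter writer = new FileWriter(fileName)) {
            for (int i = 0; i < links.length; i++) {
                writer.write(links[i] + "\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // sample data standing in for the links extracted from a page
        List<String> found = new ArrayList<String>();
        found.add("https://example.com/a");
        found.add("https://example.com/b");
        String[] links = toArray(found);
        writeArrayToFile("urls-array.txt", links);
        System.out.println(links.length + " links written");
    }
}
```

A List is still convenient while collecting, because the number of links on a page is not known in advance; the conversion to an array can happen once at the end.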

