
Read PDF using iTextSharp

Hello Friends,

In this article I am going to show how you can use iTextSharp to read a table or paragraph from a PDF.
In the last article, How to download ItextSharp, I showed how to download iTextSharp and include it in a project.


Follow these steps to read a PDF using iTextSharp:

Step 1: Create a project and add a reference to the iTextSharp DLL.

Step 2: Create a folder named PDF to store the PDF files that we will read.

Step 3: In the code-behind, create a method getParagraphByCoOrdinate that reads the text inside a given area of a PDF page:

// add these directives at the top of the code-behind file
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string[] getParagraphByCoOrdinate(string filepath, int pageno, int llx, int lly, int urx, int ury)
    {
        PdfReader reader = new PdfReader(filepath);
        try
        {
            // the area of the page to read, in PDF points
            iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(llx, lly, urx, ury);
            // filter out any text that lies outside rect
            RenderFilter[] renderFilter = new RenderFilter[1];
            renderFilter[0] = new RegionTextRenderFilter(rect);
            // LocationTextExtractionStrategy orders the remaining text by its position on the page
            ITextExtractionStrategy textExtractionStrategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), renderFilter);
            string text = PdfTextExtractor.GetTextFromPage(reader, pageno, textExtractionStrategy);
            // one array element per line of extracted text
            return text.Split('\n');
        }
        finally
        {
            reader.Close(); // release the PDF file
        }
    }
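To see what the rectangle does, the simplified model below keeps only the text that falls inside the area and then assembles it into lines from the top of the page down. It is an illustration under stated assumptions, not iTextSharp's implementation: TextChunk and RegionModel are hypothetical names, each chunk's X and Y stand in for the start of its baseline in PDF points, and lines are grouped by exact Y, where real PDFs would need a tolerance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a piece of text plus the start of its baseline, in PDF points.
public sealed class TextChunk
{
    public TextChunk(string text, double x, double y)
    {
        Text = text;
        X = x;
        Y = y;
    }

    public string Text { get; }
    public double X { get; }
    public double Y { get; }
}

public static class RegionModel
{
    // Simplified model of region-filtered extraction (not iTextSharp code):
    // keep chunks inside the rectangle, then build lines from top to bottom.
    public static string[] ReadRegion(IEnumerable<TextChunk> chunks,
                                      double llx, double lly, double urx, double ury)
    {
        // This sketch accepts the two corners in either order.
        double minX = Math.Min(llx, urx), maxX = Math.Max(llx, urx);
        double minY = Math.Min(lly, ury), maxY = Math.Max(lly, ury);

        return chunks
            .Where(c => c.X >= minX && c.X <= maxX && c.Y >= minY && c.Y <= maxY)
            .GroupBy(c => c.Y)                 // same baseline y = same line (exact match here)
            .OrderByDescending(g => g.Key)     // larger y is higher on the page
            .Select(g => string.Join(" ", g.OrderBy(c => c.X).Select(c => c.Text)))
            .ToArray();
    }
}
```

With the corner values used in Page_Load, RegionModel.ReadRegion(chunks, 50, 850, 450, 600) keeps text whose baseline lies between y = 600 and y = 850. Sorting the corners is a choice of this sketch, not a statement about how RegionTextRenderFilter treats them.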


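This method takes six parameters:

 File path: path to the PDF file.

 Page no: number of the PDF page we want to read (the first page is 1).

The last four parameters define the rectangular area of the page that we want to read. They are passed to the Rectangle constructor, which takes four values:
        public Rectangle(float llx, float lly, float urx, float ury);

llx - lower left x
lly - lower left y
urx - upper right x
ury - upper right y

These are PDF coordinates, not screen coordinates: by default the origin (0, 0) is the lower-left corner of the page, y grows upward, and the unit is the point (1/72 inch). An A4 page, for example, is about 595 by 842 points.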
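Most PDF viewers and rulers measure from the top of the page, which is one reason these values can be hard to guess. The hypothetical helper below (PdfCoords is not part of iTextSharp) converts a box measured from the top-left corner, in points, into llx, lly, urx and ury. It assumes an unrotated page whose lower-left corner is (0, 0); in iTextSharp the page height can be read with reader.GetPageSize(pageno).Height.

```csharp
using System;

public static class PdfCoords
{
    // 1 inch = 72 PDF points.
    public const double PointsPerInch = 72.0;

    public static double InchesToPoints(double inches) => inches * PointsPerInch;

    // Hypothetical helper: converts a box measured from the TOP-left corner
    // of the page into the llx, lly, urx, ury values used by Rectangle,
    // whose origin is the BOTTOM-left corner. All values are in points.
    // Assumes an unrotated page whose lower-left corner is (0, 0).
    public static (double llx, double lly, double urx, double ury) FromTopLeft(
        double left, double top, double width, double height, double pageHeight)
    {
        double llx = left;
        double urx = left + width;
        double ury = pageHeight - top;            // top edge of the box
        double lly = pageHeight - (top + height); // bottom edge of the box
        return (llx, lly, urx, ury);
    }
}
```

For example, a box 1 inch from the left and 1.5 inches from the top of an A4 page, 5.5 inches wide and 2 inches tall, is FromTopLeft(72, 108, 396, 144, 842), which gives llx = 72, lly = 590, urx = 468, ury = 734.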





Now, in Page_Load, we pass these values to the method:
 protected void Page_Load(object sender, EventArgs e)
    {
        // physical path of ak2.pdf inside the PDF folder
        string filepath = Server.MapPath("PDF/ak2.pdf");

        // read page 1, area with corners (50, 850) and (450, 600)
        string[] strarray = getParagraphByCoOrdinate(filepath, 1, 50, 850, 450, 600);

        // display the data on the web page; HtmlEncode keeps characters such as < and & from being read as markup
        Response.Write("<b>Table 1</b> <br/>");
        foreach (string str in strarray)
        {
            Response.Write(Server.HtmlEncode(str) + "<br/>");
        }

        // read the second table from a lower area of the same page
        string[] strarray1 = getParagraphByCoOrdinate(filepath, 1, 50, 590, 450, 505);
        Response.Write("<b>Table 2</b> <br/>");
        foreach (string str in strarray1)
        {
            Response.Write(Server.HtmlEncode(str) + "<br/>");
        }
    }
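The two display loops above write the same markup for each table. A hypothetical helper such as TableHtml.Build (not part of the original code) could build that markup once per table; it uses System.Net.WebUtility.HtmlEncode so that text taken from the PDF is shown literally rather than interpreted as HTML.

```csharp
using System.Net;
using System.Text;

public static class TableHtml
{
    // Builds the markup that Page_Load writes for one table: a bold title,
    // then each extracted line followed by <br/>. Text is HTML-encoded so
    // characters such as < and & in the PDF display literally.
    public static string Build(string title, string[] lines)
    {
        var html = new StringBuilder();
        html.Append("<b>").Append(WebUtility.HtmlEncode(title)).Append("</b> <br/>");
        foreach (string line in lines)
        {
            html.Append(WebUtility.HtmlEncode(line)).Append("<br/>");
        }
        return html.ToString();
    }
}
```

In Page_Load each table would then need a single call, for example Response.Write(TableHtml.Build("Table 1", strarray));.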


Comments

  1. Hello, this is a very nice example. But I cannot understand how the llx, lly, urx and ury coordinates work! Please help, this is the perfect solution.

    1. Hello, these are coordinates; anything between these coordinates is read by getParagraphByCoOrdinate.

  2. I don't understand the coordinates either. The program works well, but when I try to change the area to read different parts of the PDF, it doesn't find them. They do not look like normal x/y coordinates.

    1. Yes, they are not like normal x/y coordinates. I don't know where it takes the origin; just try by trial and error, it is easy to identify.

  3. Hello, is there a method that can retrieve data from a table while respecting each column, in other words to obtain the same structure as an Excel file? Thanks


