Hello friends,
In this article I will show how you can use iTextSharp to read a table or paragraph from a PDF.
In the last article I showed how to download and include iTextSharp in a project: How to download ItextSharp
Follow these steps to read a PDF using iTextSharp.
Step 1:
Create a project and add a reference to the iTextSharp DLL.
Step 2:
I have created a folder named PDF to store the PDF file that we are going to read.
Step 3:
In the code-behind, create a method getParagraphByCoOrdinate which reads your PDF:
// These two using directives go at the top of the code-behind file
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string[] getParagraphByCoOrdinate(string filepath, int pageno, int coordinate1, int coordinate2, int coordinate3, int coordinate4)
{
    PdfReader reader = new PdfReader(filepath);
    try
    {
        // Only text that falls inside this rectangle is extracted
        iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(coordinate1, coordinate2, coordinate3, coordinate4);
        RenderFilter[] renderFilter = new RenderFilter[1];
        renderFilter[0] = new RegionTextRenderFilter(rect);
        ITextExtractionStrategy textExtractionStrategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), renderFilter);
        string text = PdfTextExtractor.GetTextFromPage(reader, pageno, textExtractionStrategy);
        // One array element per extracted line of text
        string[] words = text.Split('\n');
        return words;
    }
    finally
    {
        // Release the file handle even if extraction fails
        reader.Close();
    }
}
This method takes six parameters:
File path: path to the PDF file
Page no: the number of the PDF page that we want to read
The remaining four parameters are coordinates. We read the PDF by defining a rectangle around a particular area. For that we use the Rectangle constructor, which takes four values:
public Rectangle(float llx, float lly, float urx, float ury);
llx - lower left x
lly - lower left y
urx - upper right x
ury - upper right y
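These values are easier to pick once you know where the origin is. By default, PDF coordinates are measured in points (1/72 inch) from the lower-left corner of the page, and y grows upward, which is the opposite of how most viewers and image tools measure. Below is a minimal sketch of a helper that converts a box measured from the top-left corner into llx, lly, urx and ury. The class name PdfRegion and the method FromTopLeft are hypothetical (they are not part of iTextSharp), and the sketch assumes an unrotated page whose lower-left corner is at (0, 0).

```csharp
using System;

// Hypothetical helper, not part of iTextSharp.
public static class PdfRegion
{
    // Converts a box measured from the top-left corner of the page
    // (left, top, width, height, all in points) into the four values
    // llx, lly, urx, ury measured from the lower-left corner.
    // Assumes an unrotated page whose lower-left corner is at (0, 0).
    public static float[] FromTopLeft(float left, float top, float width, float height, float pageHeight)
    {
        float llx = left;
        float urx = left + width;
        float ury = pageHeight - top;   // flip the y axis
        float lly = ury - height;
        return new[] { llx, lly, urx, ury };
    }
}
```

For example, on an A4 page (842 points high), a box that starts 50 points from the left edge at the very top of the page and is 400 points wide and 250 points high gives llx = 50, lly = 592, urx = 450, ury = 842. If you do not know the page height, reader.GetPageSize(pageno).Height should give it to you in iTextSharp.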
Now, on page load, we pass these values:
protected void Page_Load(object sender, EventArgs e)
{
    // code to get the file path
    string filepath = Server.MapPath("PDF");
    filepath = filepath + "\\ak2.pdf";

    // passing the coordinates to the method
    string[] strarray = getParagraphByCoOrdinate(filepath, 1, 50, 850, 450, 600);

    // code to display the data on the web page
    Response.Write("<b>Table 1</b> <br/>");
    foreach (string str in strarray)
    {
        Response.Write(str + "<br/>");
    }

    string[] strarray1 = getParagraphByCoOrdinate(filepath, 1, 50, 590, 450, 505);
    Response.Write("<b>Table 2</b> <br/>");
    foreach (string str in strarray1)
    {
        Response.Write(str + "<br/>");
    }
}
Nice tutorial
Glad to hear it.
Thanks
Hello, this is a very nice example, but I cannot understand how the llx, lly, urx and ury coordinates work. Please help; this is the perfect solution.
Hello, these are coordinates; anything between these coordinates is read by getParagraphByCoOrdinate.
I don't understand the coordinates either. The program works well, but when I try to change the area for different parts of the PDF, it doesn't find them. They do not look like normal x y coordinates.
Yes, they are not like normal x y coordinates. I don't know where it takes the origin from; just use trial and error, it is easy to identify.
Hello, is there a method that can retrieve data from a table while respecting each column, in other words, to obtain the same structure as an Excel file? Thanks.