Picking out text from a screenshot

Metspitzer · Aug 12, 2013

I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?

Peter Jason · Aug 12, 2013

I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?

Perhaps if you load it into AcrobatX and OCR it?

Ed Cryer · Aug 12, 2013

Metspitzer said:
I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?

What you want is an OCR program. I'll leave you to google for one, but
I've just hit on this one;
http://www.free-ocr.com/
I've never tried it, but your situation seems ideal for a test run.
Try one and let us know.

Ed

James Silverton · Aug 12, 2013

Perhaps if you load it into AcrobatX and OCR it?

PureText may do what you want. It's free and I use it a lot.

J. P. Gilliver (John) · Aug 12, 2013

Metspitzer said:
I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?

I don't know the answer (though I suspect not), but if you have Office,
that has some OCR ability.

What are you going to do with the non-text parts? How are you going to
handle overlapping window parts? Is there a reason you can't just use
highlight-and-copy anyway?
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

"I'd give my right arm to be ambidextrous"

I already am largely ambisinistral.

John · Aug 12, 2013

I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?

Anything that says "OCR" or "Optical Character Reading" should work.
Something called "ABBYY FineReader" came with one of my scanners.
I can't find anything inbuilt into Win7 but this:
http://www.sevenforums.com/software/217440-victory-irfanview-ocr.html
might be useful.
J.

John · Aug 12, 2013

Anything that says "OCR" or "Optical Character Reading" should work.
Something called "ABBYY FineReader" came with one of my scanners.
I can't find anything inbuilt into Win7 but this:
http://www.sevenforums.com/software/217440-victory-irfanview-ocr.html
might be useful.
J.

Or you could have a read of :

http://answers.microsoft.com/en-us/...ionality/0c90f381-40cb-41ad-8e5e-25831dd8989f
which is very authoritative.
Sort of.
J.

Paul · Aug 12, 2013

Metspitzer said:
I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?

You're looking for OCR. (That's a general function,
to go from a pixmap, to a string of text, perhaps
output in Word format.)
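
To make the "pixmap to a string of text" idea concrete, here is a toy sketch in Python. It is not how any OCR product mentioned in this thread works: the 3x5 font in GLYPHS, the fixed character cells, and the function names render and recognize are all made up for illustration. It only shows the basic idea of matching pixel patterns against known letter shapes.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# GLYPHS is a made-up 3x5 bitmap font ('#' = ink, '.' = background).
GLYPHS = {
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "C": ("###", "#..", "#..", "#..", "###"),
    "R": ("##.", "#.#", "##.", "#.#", "#.#"),
}
GLYPH_WIDTH = 3
GLYPH_HEIGHT = 5


def render(text):
    """Draw text into a pixmap: a tuple of 5 strings, one per pixel row,
    with one blank column between characters."""
    return tuple(
        ".".join(GLYPHS[ch][y] for ch in text) for y in range(GLYPH_HEIGHT)
    )


def count_mismatches(cell, glyph):
    """Number of pixels that differ between a character cell and a glyph."""
    return sum(
        got != want
        for cell_row, glyph_row in zip(cell, glyph)
        for got, want in zip(cell_row, glyph_row)
    )


def recognize(pixmap):
    """Go from a pixmap back to a string: cut it into fixed-width cells
    and pick, for each cell, the glyph with the fewest mismatched pixels."""
    width = len(pixmap[0])
    text = []
    for x in range(0, width, GLYPH_WIDTH + 1):
        cell = tuple(row[x:x + GLYPH_WIDTH] for row in pixmap)
        best = min(GLYPHS, key=lambda ch: count_mismatches(cell, GLYPHS[ch]))
        text.append(best)
    return "".join(text)


if __name__ == "__main__":
    print(recognize(render("OCR")))
```

Because the match is "fewest mismatched pixels" rather than an exact comparison, a single stray pixel does not change the answer in this toy font. Real screenshots bring variable fonts, sizes, and spacing, which is why a dedicated OCR program is needed.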

And generally that's something you pay for. I don't
know if any of the free ones are "worthy" or not.

http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software

*******

Another area that tries to do things like that
is "screen readers", or text-to-voice functions. They
need to vocalize the text they see on the screen
for the visually impaired. This doesn't immediately
solve your problem, but the article shows there are
other "hooks" in the system that can help acquire
the text strings you want.

http://en.wikipedia.org/wiki/Screen_reader

You would need a screen reader, that happens to keep a
text copy of "what it saw". That then, would be a
"poor man's OCR", relying on messages from the system
for the details. That is better than starting from
scratch, picking apart pixmaps.

Paul

J. P. Gilliver (John) · Aug 12, 2013

James Silverton said:
PureText may do what you want. It's free and I use it a lot.

He did say built in - or is AcrobatX part of 7?

(As others have said, what you need is OCR: screenshots of plain text
should give near 100% accuracy. There are a few free ones, or if you
have a scanner, I'd be slightly surprised if it didn't come with some.)

Robin Bignall · Aug 13, 2013

He did say built in - or is AcrobatX part of 7?

(As others have said, what you need is OCR: screenshots of plain text
should give near 100% accuracy. There are a few free ones, or if you
have a scanner, I'd be slightly surprised if it didn't come with some.)

Mine came with the ABBYY OCR program, that has quite a clever screen
copier. Very good, so I bought the Pro version (not cheap).

Robin Bignall · Aug 13, 2013

I don't know the answer (though I suspect not), but if you have Office,
that has some OCR ability.

I use the ABBYY OCR program that came with my scanner.*

What are you going to do with the non-text parts?

Copy them to a graphics program.

How are you going to handle overlapping window parts?

ABBYY allows just text, or just graphics or both, to a whole bunch of
places: clipboard, file, Word etc. You can choose whole screen or bits
of it, such as a window.

Is there a reason you can't just use highlight-and-copy anyway?

Dunno. Never tried.

* Any decent OCR program with a screen copier should be able to do what you asked.

Peter Jason · Aug 13, 2013

He did say built in - or is AcrobatX part of 7?

.......uh, these technical matters confuse me. I
use the Acrobat thing because it's fast and has a
very good search. It is compatible with Win7.

Metspitzer · Aug 13, 2013

I don't know the answer (though I suspect not), but if you have Office,
that has some OCR ability.

What are you going to do with the non-text parts? How are you going to
handle overlapping window parts? Is there a reason you can't just use
highlight-and-copy anyway?

Highlight and copy is all I want to do. Is there a way to do that
with a jpg image?
Win7 defaults to Windows photo viewer. What should I be using?

Metspitzer · Aug 13, 2013

Anything that says "OCR" or "Optical Character Reading" should work.
Something called "ABBYY FineReader" came with one of my scanners.
I can't find anything inbuilt into Win7 but this:
http://www.sevenforums.com/software/217440-victory-irfanview-ocr.html
might be useful.
J.

I bookmarked this. I have used IrfanView before. It seemed useful.
Thanks

Metspitzer · Aug 13, 2013

Or you could have a read of :

http://answers.microsoft.com/en-us/...ionality/0c90f381-40cb-41ad-8e5e-25831dd8989f
which is very authoritative.
Sort of.

I have a Canon Image class scanner. I may give the OCR a look.
Thanks

Metspitzer · Aug 13, 2013

Paul said:
You're looking for OCR. (That's a general function,
to go from a pixmap, to a string of text, perhaps
output in Word format.)

OCR. Got it.
Thanks

Paul · Aug 13, 2013

Metspitzer said:
OCR. Got it.
Thanks

I did a test, and you can see a "partial" result here.

http://imageshack.us/a/img849/3530/mak3.png

There is a problem with your idea. The problem with screen
captures, is things like ClearType. If your OS has
ClearType enabled, it puts "color fringes" around
the letters.

http://en.wikipedia.org/wiki/Cleartype
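
One possible workaround for the color fringes, not tried anywhere in this thread, is to flatten the screenshot to pure black and white before handing it to an OCR program. The sketch below assumes the image is already available as rows of (r, g, b) tuples; the function name to_black_and_white and the threshold of 128 are illustrative choices, and whether this actually helps a given OCR tool is an open question.

```python
def to_black_and_white(pixels, threshold=128):
    """Turn rows of (r, g, b) pixels into rows of 0 (ink) / 1 (paper).

    Luminance uses the common Rec. 601 weights. The threshold of 128 is
    an arbitrary midpoint chosen for this sketch, not a tuned value.
    """
    result = []
    for row in pixels:
        out = []
        for r, g, b in row:
            luminance = 0.299 * r + 0.587 * g + 0.114 * b
            out.append(0 if luminance < threshold else 1)
        result.append(out)
    return result


if __name__ == "__main__":
    # One row across a letter stroke: white, orange fringe, black ink,
    # blue fringe, white. The fringe colors are made-up sample values.
    row = [(255, 255, 255), (255, 180, 120), (0, 0, 0),
           (120, 180, 255), (255, 255, 255)]
    print(to_black_and_white([row]))
```

In this sample row the two tinted fringe pixels are bright enough to be treated as paper, so only the black stroke survives. Darker fringes would instead be counted as ink and thicken the stroke, so the threshold matters.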

*******

For my experiment, I chose to view some text in a web browser
(rather than some dialog box).

I chose a couple ways to capture the web page. One was "Export to PDF",
which avoids ClearType and renders the web page into a PDF. That
gives a clean copy of the screen. I converted the PDF to an image, so
I could pretend that test file, came from a paper scanner.

For the second method, I used "screen capture" on the web page.
Doing a screen capture also captures the
effects of ClearType.

In my Imageshack screenshot, the upper left is an "Export To PDF"
method, while the lower left is via screen capture. You can see
the color fringes around the text in the lower left.

When I ran OCR on the image in the lower left (with the color
fringes), the recognition rate was 0%. Nothing got captured.
There was no text to wipe over and copy/paste.

For the view in the upper right, there I took a picture copy
of the PDF (so the OCR could work on it), and brought it over
to my OCR tool. You can see in the upper right "results",
I managed to wipe over some selections. In Acrobat Paper Capture,
if you can wipe the text cursor over the surface of the document,
and things highlight, that means the OCR step worked properly.
Since Adobe Paper Capture (in Acrobat), layers the text strings
on top of the original image, you can check for proper character
recognition, by looking for differences between the string
on top of the image, and the image itself underneath. In my upper-right
example, you can see there are no differences, or 100% recognition
in the sample area. (I zoomed in, to make those examples easier
to see, but the whole document on the upper right, was clean like that.)
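
The "0%" and "100%" figures above were judged by eye. If you have the original text to compare against, a rough recognition rate can be computed instead. This is a minimal sketch using Python's standard difflib module; the function name recognition_rate and the choice to measure against the expected text's length are assumptions made for this example, not something any OCR tool reports.

```python
from difflib import SequenceMatcher


def recognition_rate(expected, recognized):
    """Percentage of the characters in `expected` that also appear,
    in the same order, in the OCR output `recognized`."""
    if not expected:
        return 100.0
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(expected)


if __name__ == "__main__":
    print(recognition_rate("screen capture", "screen capture"))
    print(recognition_rate("screen capture", ""))
    print(recognition_rate("ClearType", "C1earType"))
```

Identical strings score 100, an empty OCR result scores 0, and a single misread character (such as "l" read as "1") lowers the score in proportion to the text length.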

Summary: Screen capture sucks as an information source, unless you're
very careful to turn off any screen anti-aliasing method.

Paul

Metspitzer · Aug 13, 2013

Paul said:
Summary: Screen capture sucks as an information source, unless you're
very careful to turn off any screen anti-aliasing method.

Thanks for that info. It is really a shame. I would have thought
that a computer would be pretty good at recognizing typed text.

Gene E. Bloch · Aug 13, 2013

Thanks for that info. It is really a shame. I would have thought
that a computer would be pretty good at recognizing typed text.

And you would have been right.

Ed Cryer · Aug 13, 2013

Gene said:
And you would have been right.

I've just put FreeOCR to a rigorous test, and it passed with flying colours.

Ed
