ÿþK in text files

Todd

Hi All,

I asked about this once before and found the answer but the
article had expired. :-(

I can open certain text files in notepad. If I open them
in Leafpad (A Linux text editor), all I get is "ÿþK".

Would the kind individual who told me which encoding was missing
please re-inform me so I can ask the Leafpad folks to include
it?

Many thanks,
-T

Funny thing, I open the file in Hexedit and it looks just fine.
Hmmmm.

For the time being, I am stuck with Notepad in Wine.
(I would use Notepad in Wine a lot more, except for the bug
where the Open With path leaves off the first two letters of
the path. I reported it to Wine.)
 
Sunny Bard

Todd said:
I can open certain text files in notepad. If I open them
in Leafpad (A Linux text editor), all I get is "ÿþK".

Sounds like some of your text files use a Unicode character encoding, but
Leafpad doesn't handle them, only your 8-bit ASCII files ...
 
Paul

Todd said:
I asked about this once before and found the answer but the
article had expired. :-(

When you have a URL in your bookmarks where the original
article has been deleted, take the URL here and try the
archive. If the original web site uses "No Robots", the site cannot
be archived. But if the web site is a regular one, you can
go back several years and still be able to read the original page.

http://www.archive.org

I'm finding fewer matches on that site, than I used to,
and I don't know exactly what that means. The archive.org
server has room for (5500) 1TB disks, and you'd think they'd
never have to throw out old content.

Paul
 
Todd

Sunny Bard said:
Sounds like some of your text files use a Unicode character encoding, but
Leafpad doesn't handle them, only your 8-bit ASCII files ...

Hi Sunny,

If memory serves me, and I don't think it does at the moment,
I think it is Unicode 16 or some such.

-T
 
Jeff Layman

Just wondering why you chose to post this in a Win7 newsgroup. Wouldn't
it be better to post in a linux group?

If you got the answer previously, would it be available through Google
Groups?
 
Todd

Jeff Layman said:
Just wondering why you chose to post this in a Win7 newsgroup. Wouldn't
it be better to post in a linux group?

Actually no. The offending file came from W7's regedit. No one
over on the Linux group would know what I am talking about.

A note about my office server/workstation: it is Linux. I run
several virtual machines to support the various other OSes
that my customers use: XP, Vista, W7, Fedora, others.
When I am doing my own work, I try to stay in the host: the VMs
are slower, although XP does a good job of keeping up.

I was modifying a .reg file for a customer that I had exported
from W7 and was annoyed that my favorite Linux text editor
(Leafpad) wouldn't read the darned thing. So I asked this group again
what the encoding was called so I could ask the Leafpad guys
to support it. Funny thing about Open Source: OpenOffice
(you all should switch to LibreOffice ASAP) being the exception,
if you write a well-documented, respectful letter to the author,
you usually get what you want.

I mention my full setup, Linux and all, because I thought it
was best to disclose everything that was going on, in case
others knew of something I was missing. Though, when the
troll/evangelists find out you are using other OSes as well
(XP, Linux, etc.), you do run the risk of getting snotted on.
But these trolls' knowledge is usually very pedestrian, so
they are never very helpful anyway. And the real deals (non-posers
and other experts) don't care. Or they just ask me what
is going on, like you did, like a professional.

A tip on getting rid of the troll/evangelists is to abbreviate
Microsoft as M$ enough times and they will eventually
killfile you. Then you are left with helpful folks -- snot free.
And the trolls seldom add to the knowledge of mankind, other
than that M$ can do no wrong.

Jeff Layman said:
If you got the answer previously, would it be available through Google
Groups?

For some reason, I have never been able to find my postings to
this group over on Google. Other groups, but not this one.
Do you have a tip on this?

Many thanks,
-T
 
Paul

Todd said:
For some reason, I have never been able to find my posting to
this group over on Google. Other groups, but not this one.
Do you have a tip on this?

Many thanks,
-T
Alt.windows7.general is a "new" group, in terms of date created.
It is carried on AIOE and Eternal-September, and probably servers
like them.

Google, on the other hand, is deaf-dumb-blind. They don't have
an effective "abuse" address. The only way alt.windows7.general
would get added to their archive, is if a valid, signed, newgroup
request (server to server messaging) of some sort was received.
And because Google is clueless, we don't even know if anyone monitors
that stuff or not. There aren't any external signs of intelligence
at Google.

Not just any "newgroup" command will work, because in the past,
hundreds of thousands of them have been created, to the point that
server admins just ignored them when they couldn't be authenticated.
It means alternatives have to be used to manage groups.

This also caused problems for microsoft.* , because it wasn't
created by normal server to server messaging. A guy used to "fake"
the necessary messages, to make it look like Microsoft was
managing their connection to USENET. When Microsoft shut down
their own USENET server, they had the option of emitting
a couple thousand "rmgroup" messages, to cause other server
admins to consider deleting all those groups from their servers.
At least one server administrator claimed, if a valid signed
set of messages had been received, he would have considered
removing microsoft.* . As you can see, microsoft.* still
exists, and as far as I know, Google is still archiving it.

So there are proper ways to do things, and a lot of "epic fails"
along the way. And it takes a public presence (working "abuse"
address, or server admin address to send requests to), to make
a properly functioning USENET operation.

*******

Based on your description, that this is a text file from Regedit,
I was able to recreate the condition here. I tested creation of
both .txt and .reg by exporting from Windows 7 regedit. I used
a hex editor to examine the file, which is where I got the hex
code from (the 0xFE thing).

The sequence in the .txt is actually

0xFF 0xFE K e y N a m e

implying this is a sixteen-bit-wide text encoding. And the first
two bytes are a declaration of the encoding.

In the .reg file I can see

0xFF 0xFE W i n d o w s R e g i s t r y E d i t o r

so the same thing is happening.
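A minimal Python sketch of why an 8-bit-only editor might show just "ÿþK" for such a file. The sample bytes are hand-typed from the hex dump above, not read from a real export, and the idea that the display stops at the first zero byte is an assumption about how the editor handles its text, not something confirmed for Leafpad.

```python
# First bytes of the regedit export as described in the hex dump:
# FF FE, then "Key" with a zero byte after each letter.
raw = b"\xff\xfeK\x00e\x00y\x00"

# Read one byte per character (Latin-1), as an 8-bit-only editor might.
as_8bit = raw.decode("latin-1")

# Assumption: the visible text ends at the first NUL byte, which is how
# C-style string handling usually behaves.
visible = as_8bit.split("\x00", 1)[0]

# Decoded as UTF-16, the leading FF FE selects little-endian byte order
# and is dropped from the result.
as_utf16 = raw.decode("utf-16")

print(repr(visible))   # 'ÿþK'
print(repr(as_utf16))  # 'Key'
```

Under that assumption, the two marker bytes show up as "ÿ" and "þ", the "K" follows, and the zero byte after it ends the visible text.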

Notepad seems to be aware of this encoding, which is why
everything appears "normal". It is even possible, if you
installed WINE on the Linux box, it comes with a
"Notepad" lookalike, which likely supports whatever encoding
that is as well. (Just the basic WINE install, should
give you a working Notepad lookalike, without actually
copying a Notepad over from elsewhere.)

And here is the answer, with regard to the encoding -

http://en.wikipedia.org/wiki/Byte_order_mark

"The byte order mark (BOM) is a Unicode character used to
signal the endianness (byte order) of a text file or stream.
Its code point is U+FEFF. BOM use is optional, and, if used,
should appear at the start of the text stream. Beyond its specific
use as a byte-order indicator, the BOM character may also indicate
which of the several Unicode representations the text is encoded in."
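As a rough illustration of the "which representation" part, here is a small Python sketch that looks at the first bytes of some data and reports which byte order mark, if any, it starts with. The function name sniff_bom is made up for this example. It only recognises the marks defined in Python's codecs module and says nothing about files that carry no mark at all.

```python
import codecs

# Longer marks are checked first: the UTF-32 LE mark (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no mark is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# Usage with the first four bytes described for the regedit export:
name, skip = sniff_bom(b"\xff\xfeK\x00")
print(name, skip)                     # utf-16-le 2
print(b"K\x00".decode("utf-16-le"))   # K
```

One known limitation of this ordering: UTF-16 LE text whose first real character is U+0000 would be reported as UTF-32 LE. That should be rare for ordinary text files.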

HTH,
Paul
 
Dave "Crash" Dummy

Todd said:
Actually no. The offending file came from W7's regedit. No one over
on the Linux group would know what I am talking about.

There is your answer. Windows 7 Regedit exports files in 16-bit Unicode
format, probably to accommodate the many languages Windows supports. You
need to convert the files to 8-bit ANSI or use a Linux-compatible text
editor that will accommodate Unicode.
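For the conversion route, a short Python sketch, assuming the export really is UTF-16 with a byte order mark. The function name convert_text_file and the file names in the usage line are hypothetical. UTF-8 is used as the default target. Passing "cp1252" would give a Western European "ANSI" file, but that raises UnicodeEncodeError if the text contains characters that code page cannot represent. Whether regedit accepts the converted file on import is not covered here.

```python
def convert_text_file(src, dst, src_encoding="utf-16", dst_encoding="utf-8"):
    """Re-encode a text file and return the number of characters copied."""
    # newline="" leaves the Windows CRLF line endings untouched.
    # The "utf-16" codec reads the byte order mark and drops it.
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as fout:
        fout.write(text)
    return len(text)


# Usage (hypothetical file names):
# convert_text_file("export.reg", "export-utf8.reg")
```

Swapping the two encoding arguments converts an edited copy back to UTF-16 before it goes back to the Windows side.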
 
Todd

Paul said:
And here is the answer, with regard to the encoding -

http://en.wikipedia.org/wiki/Byte_order_mark

Excellent response. Thank you! Now I can ask the Leafpad
guys to support it.

-T

Wine does have a Notepad. I am using it, but try to avoid it
due to a bug where Notepad drops the first two letters from
the "Open With" path.
 
Todd

There is your answer. Windows 7 Regedit exports files in 16-bit Unicode
format, probably to accommodate the many languages Windows supports. You
need to convert the files to 8-bit ANSI or use a Linux-compatible text
editor that will accommodate Unicode.

Thank you. That was what I was looking for!

-T
 
