Jose Figueroa-O'Farrill
j.m.f****@ed*****
Tue Sep 13 19:20:28 JST 2005
Hi,

I am having some problems with encoding which I hope someone in this list can help me with. I am not sure that they are Carbon Emacs specific, hence if you think this post is off-topic, please direct me to a more appropriate list.

Let me preface by stating that despite previous NeXTStep experiences in the early 90's, I have been using Mac OS X (and the "Japanese" Carbon Emacs) for only a couple of months. I have to say that it's on the whole a very satisfying working environment and I am very grateful to the maintainers. Domo arigato gozaimass!

Now, since late 2001 and until my switch to Mac OS X I was using XEmacs (native) on a notebook running Windows XP [I'm not proud, but there you go: there were good reasons at the time]. This XEmacs was major version 21 and did not have MULE support. As a result my expertise with coding systems is minimal and hence my present woes.

I have been using iso-accents-mode with default encoding iso-8859-1. As Carbon Emacs reminds me once in a while, iso-accents-mode is now deprecated, but I'm not sure what exactly has replaced it. In any case, on occasion I have used the Mac keyboard sequences (Option-E e for é, for example) and my problems persist.

Here are two problems that I'm experiencing:

Problem 1
---------

Suppose I type accented characters (like é,ü,î,...) either using their Mac keyboard sequences or the ones in iso-accents-mode into a Carbon Emacs buffer. If I then copy and paste them from the Carbon Emacs buffer into, e.g., Mail.app then I get this

[Attachment scrubbed: Picture 1.png
 http://lists.sourceforge.jp/mailman/archives/macemacsjp-english/attachments/20050913/9b65a55d/attachment.png]

If I now copy this from Mail.app and paste it back into a Carbon Emacs buffer I get this:

[Attachment scrubbed: Picture 2.png
 http://lists.sourceforge.jp/mailman/archives/macemacsjp-english/attachments/20050913/9b65a55d/attachment-0001.png]

which is not what I started with. It is not just Mail.app this happens with, but also with my university's webmail using browsers like Camino or Safari.

Problem 2
---------

I often edit HTML files which reside in a remote host. These files are encoded using utf-8. When I used XEmacs under Windows XP, the files were saved by XEmacs in iso-8859-1 and then I used 'iconv' to convert them to utf-8. When I edit these files now using Carbon Emacs, I don't see the accented letters. For example, instead of

[Attachment scrubbed: Picture 4.png
 http://lists.sourceforge.jp/mailman/archives/macemacsjp-english/attachments/20050913/9b65a55d/attachment-0002.png]

I see this.

[Attachment scrubbed: Picture 3.png
 http://lists.sourceforge.jp/mailman/archives/macemacsjp-english/attachments/20050913/9b65a55d/attachment-0003.png]

It seems to me that Emacs is assuming that this is an 8-bit encoding and finding a 16-bit character breaks it up into 2 characters. Shouldn't Emacs recognise the encoding of the file and act accordingly? Or is this happening because I'm somehow preventing it from doing so, which brings me to my last question: what should I have in the .emacs file?
Right now, the relevant lines in my .emacs file seem to be the following:

    ;; coding system nightmare
    (custom-set-variables
     '(enable-multibyte-characters t)
     '(keyboard-coding-system (quote mac-roman))
     '(utf-translate-cjk-mode nil)
     )
    (setq unibyte-display-via-language-environment t)
    (set-language-environment 'latin-1)
    (set-buffer-file-coding-system 'iso-8859-1)
    (setq default-buffer-file-coding-system 'iso-8859-1)
    (set-default-coding-systems 'iso-8859-1)
    (modify-coding-system-alist 'file "\\.html\\'" 'utf-8)
    (modify-coding-system-alist 'file "\\.tex\\'" 'iso-8859-1)
    (modify-coding-system-alist 'file "\\.xml\\'" 'iso-8859-1)

I have played with many such settings and the only reason I have chosen iso-8859-1 as default is that this seems to work, more or less. I forget exactly where I got many of these lines. (Although I've been using some Emacs or other since 1980, I have to admit that I'm very much an Emacs consumer, to quote a recent post to this list.)

In principle I would like to understand encoding in Emacs (or in general) since I suspect I have many misconceptions, but at this point, with term starting in less than a week, I would settle for a fix :-)

Many thanks in advance for your attention,

José

-- 
Prof José M Figueroa-O'Farrill  | Phone:  +44 (0) 131 6505066
School of Mathematics           | Fax:    +44 (0) 131 6506553
University of Edinburgh         | Mobile: +44 (0) 7870 239186
Edinburgh EH9 3JZ, Scotland, UK | URL:    http://www.maths.ed.ac.uk/~jmf
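
A note on the symptom described under Problem 2: seeing two characters where one accented letter is expected is what one would get if the UTF-8 bytes of the file were decoded as ISO-8859-1. In UTF-8, é is stored as two 8-bit bytes rather than as one 16-bit unit, and an 8-bit decoder shows each byte as a separate character. The sketch below illustrates only this general mechanism, in Python rather than Emacs Lisp. The function names `misdecode` and `repair` are hypothetical, and since the screenshots were scrubbed from the archive it is an assumption that this is what is happening in the Carbon Emacs setup above.

```python
def misdecode(text: str, actual: str = "utf-8", assumed: str = "iso-8859-1") -> str:
    """Encode text with its real encoding, then decode with the wrong one."""
    raw = text.encode(actual)      # the bytes as stored in the file
    return raw.decode(assumed)     # what a viewer shows if it assumes `assumed`


def repair(garbled: str, actual: str = "utf-8", assumed: str = "iso-8859-1") -> str:
    """Undo misdecode: recover the original bytes and decode them correctly."""
    return garbled.encode(assumed).decode(actual)


if __name__ == "__main__":
    original = "é"
    raw = original.encode("utf-8")
    print(len(raw))                # number of bytes used for one character
    garbled = misdecode(original)
    print(garbled, len(garbled))   # one character shown as two
    print(repair(garbled) == original)
```

Because ISO-8859-1 maps every byte value to exactly one character, the mix-up is reversible: re-encoding the garbled text as ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8.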