MySQL charset/collation of databases created by Virtualmin and script installers

This one is one I'm stumped upon: ending up thinking it could be a virtualmin bug, unless once more i didn't see a setting...

In Site Templates -> MySql we have set UTF-8 in Default, and inherit from default template for other templates.

I would expect datasbases created by Virtualmin and by Virtualmin installers (at least the Joomla 1.5 one as Joomla 1.5 requires UTF-8, maybe scripts requiring something else could use something else, but for Joomla 1.5 it's needed) to create databases with utf-8 charset and utf-8-general-ci collation by default, so that new tables created would be utf-8 by default too (and not latin1-swedish-ci like now).

Interestingly the script installer of Joomla 1.5 results in utf8 Mysql tables. But any extension installed after that has latin1-swedish, which gives a mixture of collation, breaking MySQL queries and international text stored, which is less fun.

(Btw: needed phpmyadmin to see that as the virtualmin mysql browser doesn't show that)

I don't see where we can set that, as it's not recommended to change general MySQLserver global default collation from the default latin1-swedish.

I also only see a setting for charset but not for exact collation in mysql settings of site templates.

... OR....:

Do we need to set following ? : Configuration
For module MySQL Database Server: Configurable options: Character set for MySQL data: now Default (default what btw?)

But that seems only for displaying data ?

To reproduce:

create sub-server, install joomla 1.5 into new database, then install any component with table, then take a look at tables with PhpMyAdmin: joomla core tables are utf-8 and the tables of the installed component are latin1-sweedish (and joomla installer DOES NOT specify the collation, so it's really the database default collation which is default latin-1 (as displayed by PhpMyAdmin at bottom of collations column of tables display table).

As those databases are created by Virtualmin, my conclusion is that the Site Template setting of Mysql of UTF-8 is NOT respected at database creation.

As usual, I might be wrong, so I'm not marking this as bug but as Support Request. Feel free to change to bug, in which case it's a major one, and fix should include fixing databases's default collation and possibly tables and rows' collation (ouch!).

Status: 
Active

Comments

Ok .. so it sounds like you need a way to set the "collate =" option in the MySQL "create database" command Virtualmin issues. Is that correct?

If so, it would be pretty easy to add..

Such a statement is needed and I was supposing it was there.

There is ALREADY in create new database:

Additional options for MySQL Character set for new database:

and that setting defaults to UTF-8 correctly according to virtualmin settings. But then now it seems that it is not applied ???

But here the database got created automatically as Joomla installation was INTO NEW DATABASE. And there the collation is also not set (possibly too, when creating a database).

BTW: charset and collation are 2 different things, as collation additionally does set the searches equivalencies, e.g.: e = é = è = ê = E if it is -ci and e != E if it's not -ci...

With collation, you can determine charset, but not the other way around....But you could at least suppose the default collation for each charset e.g. utf-8 charset = utf-8-general-ci collation by default.

So probably 2 fixes:

  • 1) Set database collation on creation in create database:

  • a) in create database command (with explicit selector visible and not hidden as now, as it's important

  • b) when creating new virtual server (if template says auto-create db, using site template's default), maybe even when installing into existing but empty database !

  • c) when auto-creating database on script install (using template default, OR a setting of installer script: e.g. Joomla 1.5 MUST have utf8 wit case-insensitive (ci), most commonly used: utf8-general-ci)

  • d) wherever creating database for a virtual server, ideally showing the collation with correct default collation (not charset, but collation).

  • 2) Replace all mysql charset settings by existing charset (defaulting to default collation for charset) + list of all available collations.

  • And a per virtual server and per database feature welcome to help fix the current mess in virtual servers: to change the collation of existing database+tables+columns

I'm changing ticket to "bug".

Thanks, Beat

What I have implemented for the next Virtualmin release is support for setting the collation order when creating a new database, and for setting a default at the server template level - just like the character set can be chosen now.

Does that work OK for you?

Also, does Joomla really need the utf-8 collation order? I've installed it without setting this explicitly..

Sorry, not sure to understand what do you mean by "setting the collation order" ??

Would that mean that you separate the MySql collation into charset and ordering ? what about the searching-case-sensitivity part (-ci) ?

While that could be interesting approach, I guess it could look confusing to mysql-users used to collation as a whole. You can extract charset from collation.

"Also, does Joomla really need the utf-8 collation order? I've installed it without setting this explicitly.."

Yes, Joomla can run in English with the wrong collation, and sometimes storing utf8 into 2-3 chars of latin1 charsets works too, but then searches don't work correctly, and sometimes, latin1 won't store some of the utf8 bytes too, so it doesn't work well, although looking to work at first glance. Moreover, the database dump that gets installed by virtualmin contains the utf-8 collation, so even if your database is in non-utf-8 collation, it still installs utf8 data for core joomla. However for extensions installed by joomla they will use the database's default, and that's where the things start going bad: e.g. doing joins between a utf-8 joomla column and an extension column in latin 1.

(most do not contain the collation, which is correct as the collation ordering and case-sensitivity could depend on installation particulars).

"What I have implemented for the next Virtualmin release is support for setting the collation order when creating a new database, and for setting a default at the server template level - just like the character set can be chosen now.

Does that work OK for you?"

As not sure to understand, re-precising my thoughts of report and of previous coment I guess if an install script like Joomla can also specify the collation for database creation (in joomla's case utf-8-general-ci) or if the script doesn't specify anything that the server template's default is also taken it could work too.

Then just replace the existing charsets settings everywhere by the longer list of collations (keeping of course the compatiblity of existing charsets mapped to the corresponding default collation language ordering and by default case-sensitivity (-ci)

Am I saying same than you ? :-)

Also a way to change the database default collation and if possible a way to convert a database data from one collation to another would be cool in Virtualmin.

Converting needs 2 different methods: 1) converting inside Mysql by letting Mysql translate character codes, to fix just the storing method of correctly stored characters

2) converting outside Mysql by reading the characters using "interpretation" e.g. 1-3 latin-1 chars interpreted a a utf-8 character, to fix existing databases having wrongly stored characters.

Perhaps if I were to send you a Virtualmin minor revision that includes this new feature, so you can see how it works?

Sure, you have my email. Just send exact instructions how to update an existing installation without breaking the aptitude-controlled update to next version ;-)

Finally tried testing the package in the mail, but the package seems to be the wrong one: no changes in the MySQL settings. Emailed you back.

Ok, this time got the right package, and installed on a almost not used server (with its unlimited VirtualminPro license...).

Regarding UI, seems ok, although I'm not sure if you should propose all collations with all charsets. Not sure you can have utf-8 charset and a latin1 collation e.g.... A single drop-down would be simpler imho, and avoid user errors. You can find charset from collation, and just map the existing charset settings to their default collation for displaying existing charsets.

Regarding function: seems to work fine creating a new database, but still not work for Joomla installation when the install script installs into a new database that he creates, then the database default collation is NOT taken from the site template.

I both tried changing the server template and plan for an existing domain, and also creating a new virtual server, in both cases, when I install a new joomla with a new database, the default database does not take the utf8-general-ci setting from Default server template, but is still latin-swedish.

You can easily reproduce as follows:
- set default server template as utf-8 + utf8-general-ci
- create new server at top level
- install script Joomla 1.5.15 (and phpMyAdmin if needed) INTO a new database
- display the new database in phpmyadmin: at bottom of the list of tables you will see "latin1_swedish_ci" below all tables listed, and that's the default collation for the database with which all new tables will be created if the SQL query doesn't precise utf-8-genera-ci or any other collation...

And by contrast, this works:
- create new database
- do same as above, but install into existing database just created.
- in that case phpmyadmin displays utf8-general-ci at bottom, and new tables get that collation by default too.

Also how to change the default collation for existing users, and for existing databases isn't clear to me yet ?

Best Regards,
Beat

Oh, you are right .. I didn't cover that case where a DB is created as part of a script install. I'll fix that in the next version.

You should also give ability to script to force a given charset and default collation (IF THE DEFAULT SERVER COLLATION's charset doesn't match the charset needed by the script), and leave collation to the choice of the user, depending on version chosen.

E.g.: - Joomla 1.0 ha by default installation latin1-charset, but can be used in other collations too, depending on language file.

  • Joomla 1.5 must have UTF-8 charset, and should be case-independant (-ci collation). By default, utf8-general-ci makes lots of sense in most cases and for international sites. But if we want to go into advanced details, collation could be specific to the main language of the site.

If I may suggest before it gets released:

Generally, I would really recommend NOT having TWO separate drop-downs for CHARSET and for COLLATION, as it's confusing, prone to errors, redundant, and never seen yet, so not usability-wise not familiar.

It's really easy to have only ONE drop-down for collation, then know the charset from the first part before '-' of the collation chosen.

You could map the existing charset setting to default collation of that charset. That would perfectly match existing installations, and just add correct and full MySQL collation support.

In a second step, It would then be useful to be able to view inside Virtualmin the collations of columns, and default collations of tables and database, and possibly to change them, as is already the case with PhpMyAdmin, but hard to find and to do COLUMN by COLUMN in PhpMyAdmin.

We really need a way to correct existing sites which are now mixed collations due to this issue of Virtualmin. ;-) Thanks for reading and listening :-)

Ideally, a "Change Database collation" function would really ROCK:

  • displays existing collation of columns (can be more than one in mixed databases)
  • drop-down with target collation
  • 2 modes: A) "change collation in MySQL and transcode inside MySQL: MySQL will translate character's storage" B) "change collation in MySQL but transcode outside MySQL: characters's bit-wise storage will stay same"
  • if mode "B" is selected, a collation-drop-down for the MySQL CONNECTION collation to use to do the charset translation in perl.

Mode A) is trivial: go trough each column of each table and if has collation, change it with Mysql query to change collation

Mode B) is not very hard either: export the database, then change collation in database, then sed the file to change collations in the file, then import the database.

Mode A is for cases like we had here: wrong collation for storage, but character encoding correct in storage, just not able to store out-of-collation chars.

Mode B is for cases where the character encoding is correct in the application, but due to wrong CONNECTION collation and collations in database, the characters storage is wrong.

Hope that text hepls to understand, and feel free to use it for the help texts, I'm releasing it public domain herewith. ;-)

Are you sure Joomla needs utf-8 ? I wasn't able to find a mention of this in their docs ..

Actually, I will make new DBs created for Joomla use utf-8 as the charset, as per your recommendation.

Does the collation order matter though? Or can it just be the default?

I'm 100% certain on that one: :-)

Joomla 1.5 uses UTF-8 only, you can see in its installer that it creates a database with utf8 character-set:

installer/helper.php: $sql = "CREATE DATABASE `$DBname` CHARACTER SET `utf8`";

It's documented e.g. here: http://docs.joomla.org/Migrating_from_1.0.x_to_1.5_Stable Joomla 1.0 uses latin1 as default, but is also able to use utf-8 by uncommenting a line: http://forum.joomla.org/viewtopic.php?f=11&t=55065

So I would say reasonably that you can use utf8-general-ci as default collation (and utf8 charset) for joomla 1.5 (but not for 1.0 !).

EDIT: added:

and it does same for its tables too:

sql/mysql/joomla.sql (that your installer uses too):

CREATE TABLE `#__banner` (
....
) TYPE=MyISAM CHARACTER SET `utf8`;

edit2: the < c o d e > in the query isn't in my comment but seems a bug of the tracker here...

Ok, great .. Virtualmin 3.77 will create a utf-8 character set DB for Joomla installs.

One little suggestion while at it:

Today I needed to "Create a new database" for a domain, and in charset was desperately looking for "latin1" , or maybe "ISO-8859-1" but none of those 2 were there. I had to go to the test-install with the new collations to find out that it's named in virtualmin "cp1252 West European"...

Maybe a charset naming which is same as Mysql would be easier to understand ? (at least as main name, and in brackets () the "virtualmin name equivalent" ?)

Webmin/Virtualmin just use the character set names displayed by MySQL's show character set command.

Output from SQL command show character set ..

Charset   Description   Default collation   Maxlen  
...
latin1 cp1252 West European latin1_swedish_ci 1
latin2 ISO 8859-2 Central European latin2_general_ci 1
...
latin7 ISO 8859-13 Baltic latin7_general_ci 1
cp1251 Windows Cyrillic cp1251_general_ci 1
cp1256 Windows Arabic cp1256_general_ci 1
cp1257 Windows Baltic cp1257_general_ci 1
...

But only description is outputed, not the charset. Just trying to make the UI as easy as possible to our users... Format "Charset (Description)" would help quite a bit ;-)

Cool for the new joomla 1.5 installs with utf8. Many thanks.

  • Coppermine: too?

Btw, installed today a new Coppermine gallery: by default they also seem to use utf8 now, and had also to change database charset to utf8 for it to work properly with a valid imported utf8 tables, otherwise rendering is wrong.

I also had to convert the previous wrong storage of utf8 bytes as latin1 characters...

  • btw on Coppermine there is cpg1.4.26 Security release - upgrade mandatory! since Feb 1st, not yet in auto-upgrades...

  • Slightly Unrelated:

Oh, while fighting for collations, we use the API to automatically create a virtual server, and today one user with Frédéric in his name got created, but it looks like (not guaranteed) but most probably, that virtualmin's url API does interpret the name as latin1 instead of utf8 like it's on urls ? in Virtualmin it appears as "FrA©dA©ric" instead of "Frédéric" as in our ordering system.

Sure, I will change the UI to show the character set code as well as the description ..

Thank you very much Jamie, really appreciated.

I'm suddenly seeing Mysql usernames being created for domains in user@domain format (often domain is truncated due to limit somewhere). That seems new, and I have even seem MySQL users created in old and in new formats for same databse. Mysql seems to accept @ but i'm not sure if it's a good idea, as @ is sometimes also used to specify on which mysql server you want to be. Just thought to mention it, not sure if bug/wanted/unwanted, so just appending here in case it's not wanted...

Are your mailboxes also in user@domain format? I think by default Virtualmin just uses the same name for MySQL and email ..

Yes, they are: in Default Site Template we have:

Format for usernames that include domain: username@domain

But we set the admin username using the provisioning system, and that one does not have the @domain. So it doesn't happen with the @ in those.

But, that's the case with domains created within Virtualmin, like sub-domains, or top-domains created by resellers. Sometimes it looks really odd, like a mysql username "thisusername@" which is pretty confusing, as the username lenght is really not very long for mysql.

It's strange that I didn't see it earlier, but maybe didn't look until yesterday, when i had to re-create a user "fred" while we had fred@domain1 and fred@domain2 but fred alone disappeared suddenly in mysql users and was used in a site of a reseller that he didn't touch. He didn't recall exactly when it happened and what he did, but it was amongst: renaming domain, moving to top-level, creating one with same username, then deleting it). So something made an untouched domain user "fred" disappear from mysql usernames. Anyways, happened only once so far, but was high-prio ticket, as that site was obviously down...

Do you have full file and SQL logging enabled (at Webmin -> Webmin Configuration -> Logging -> Log file changes) ? If so, you can go to Webmin -> Webmin Actions Log, and see what SQL was run for each Virtualmin action .. that would let you track down how the fred user got deleted.

  • Logging active? Enable logging
  • Log changes made to files by each action? : No
  • Record all modified files before actions, for rollbacks? No

I now turned these 2 to On, looks really nice, in the hope it doesn't consume large disk space.

In logs for last week for that reseller user in most recent to least recent order i see (of course with no file changes logs):

Action   Module   User   Client Address   Date   Time  
Deleted virtual server xn--...........ch Virtualmin Virtual Servers
Created virtual server xn--...........ch Virtualmin Virtual Servers
Renamed virtual server ......p.org Virtualmin Virtual Servers
Created MySQL database p...... in server ......p.org Virtualmin Virtual Servers
Renamed virtual server ......g.org Virtualmin Virtual Servers

Guess that won't help, but now I activated that option so we will see if it comes again...

I guess I'll never finish get impressed by Webmin depth and width... cool stuff all around.

(btw, shouldn't those 2 settings be enabled by default in virtualminPro installs ;) ?)

Ok, let us know what you find in the logs if this happens again..

Disk space consumption depends on the type and rate of actions performed. I didn't want to enable it by default, out of concern that it may consume too much disk space ..

Read this entire post. I am searching for a way to change collation of databases at creation time when virtual servers are set up. I searched everywhere but i can't seem to find that feature. Currently all databases are created with the latin1_swedish_ci but i need them to be utf8_unicode_ci. How and where do i switch that?

Thanks.