MariaDB troubles: only runs after a reboot, times out when trying to restart, not stable while running.

I really wish we got a choice during a Virtualmin install of which databases we wanted to be installed.

I'm looking at some software to run on my server and all of the things I'm looking at urge the use of MariaDB to avoid problems encountered in MySQL 5.7.

Having done a MariaDB changeover about a year ago under different circumstances, I knew I was going to lose all my data, so I ran MySQL dump backups first. I initially tried to install from a shell with sudo apt-get install mariadb-server mariadb-client, but that threw an error message, so I checked which versions Ubuntu 18.04 supports and ran sudo apt-get install mariadb-server-10.1 mariadb-client-10.1. That seemed to work, and it did delete the data directory because it was "binary", but MariaDB didn't actually work at this point. I noticed that Webmin > System > Software Packages > "Search Apt.." listed both mariadb-server and mariadb-client, so I figured installing those would overwrite the non-working packages. After a reboot, I installed them and still had no working MariaDB.

At this point, I'm ready to "piss on a spark-plug if I thought it would help" (WarGames).

Since I'm at a #FTW point, I decided to use Webmin > System > Software Packages > [Search For Package] to find and uninstall both MariaDB and MySQL, with "Purge configuration files" and "Removed unused dependencies as well" checked. After clearing all those packages, I rebooted and installed only mariadb-server using Webmin > System > Software Packages > Package from APT, and that seemed to go very well. It pulled in the client, core, and common packages automatically. It was running after a reboot, and I was able to restore the databases after some struggles with utf8mb4 vs. utf8.

I tested two of the sites I run that used MySQL and neither worked, so I rebooted again but quit for the day before the server finished coming back online. After going to bed, I realized I had forgotten to put the users back in. Today I saw MariaDB wasn't running again, and nothing I tried seemed to work. One change I made was to set Webmin > Servers > MySQL Database Server > {System configuration} "Path to MySQL shared libraries directory" to None and "MySQL configuration file" to /etc/mysql/mariadb.conf.d/50-server.cnf. The main page for Webmin > Servers > MySQL Database Server now just says this:

MySQL is not running on your system - database list could not be retrieved.

The full MySQL error message was : DBI connect failed : Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

I rebooted and it looked like the server wasn't running (see attached photo), but when I clicked the "MySQL Database Server" link on that page, it went into the fully running module of Webmin > Servers > MySQL Database Server. I went to set up the missing users, and when I clicked the link to add a new user, MariaDB stopped running again. Here's the error message after trying to add a user:

Failed to save user : DBI connect failed : Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

I looked around on the web a lot to find answers, but I got nothing. What should I try next?

update on Jan 21, 2019

Not sure if this helps, but there's about a one-minute window where MariaDB is running before it stops. I don't know much about sockets, but could something be killing MariaDB's socket within a few minutes of rebooting?

Status: 
Closed (fixed)

Comments

Howdy -- thanks for contacting us!

Unfortunately, we only test with the default databases on a given distro/version.

While other databases should work, that's not really a process we're familiar with performing.

You may want to try reviewing any database logs though to see if any errors are listed there.

Also, how much RAM does your system have? You could simply be running out of RAM.

After the issue occurs next, you may want to try running "dmesg | tail -30", and see if there's any RAM/OOM messages listed there.
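To make that concrete, here's the kind of filter you could pipe the output through (the sample log lines below are fabricated for illustration; real OOM wording varies by kernel version):

```shell
# Fabricated sample lines (NOT real output from this server), piped through
# the same kind of filter you would apply to real dmesg output
printf '%s\n' \
  '[  961.2] Out of memory: Kill process 2414 (mysqld) score 900 or sacrifice child' \
  '[  961.3] Killed process 2414 (mysqld) total-vm:1500000kB' |
  grep -icE 'out of memory|killed process'
# 2
```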

WNYmathGuy's picture
Submitted by WNYmathGuy on Mon, 01/21/2019 - 19:33 Pro Licensee

@andreychek

RAM: 31.40 GB total / 29.71 GB free / 1.36 GB cached
Swap space: 54 GB total

Want to guess the path to my database logs? I've looked into a lot of logs recently and they aren't helping me out. I've also run journalctl -xe and systemctl status mariadb.service numerous times, then googled the errors and gotten nowhere.

Your dmesg | tail -30 has this possibly related message:

[ 61.295614] audit: type=1400 audit(1548092812.745:21): apparmor="DENIED" operation="sendmsg" info="Failed name lookup - disconnected path" error=-13 profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=2414 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=113 ouid=0

That error repeated 11 times.

Following a reboot, right before those 11 nearly identical AppArmor errors, was this:

[   28.212731] audit: type=1400 audit(1548120125.668:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-mounting" pid=1034 comm="apparmor_parser"
[   28.212733] audit: type=1400 audit(1548120125.668:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-nesting" pid=1034 comm="apparmor_parser"
[   28.222187] audit: type=1400 audit(1548120125.676:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=1038 comm="apparmor_parser"
[   28.222191] audit: type=1400 audit(1548120125.676:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=1038 comm="apparmor_parser"
[   28.222194] audit: type=1400 audit(1548120125.676:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=1038 comm="apparmor_parser"
[   28.239010] audit: type=1400 audit(1548120125.692:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/snapd/snap-confine" pid=1039 comm="apparmor_parser"
[   38.444003] new mount options do not match the existing superblock, will be ignored
[   43.999087] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   44.100012] Ebtables v2.0 registered
[   44.409808] bnx2 0000:02:00.0 enp2s0f0: using MSIX
[   44.410119] IPv6: ADDRCONF(NETDEV_UP): enp2s0f0: link is not ready
[   44.600648] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[   45.280596] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   45.493559] Netfilter messages via NETLINK v0.30.
[   45.501174] ip_set: protocol 6
[   48.228661] bnx2 0000:02:00.0 enp2s0f0: NIC Copper Link is Up, 1000 Mbps full duplex
[   48.228679] , receive & transmit flow control ON
[   48.228774] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f0: link becomes ready
[   64.210462] kauditd_printk_skb: 9 callbacks suppressed

I unfortunately don't know where the logs would be, except that they'd likely be somewhere under /var/log.

I don't see anything unusual in that "dmesg" output, though note that I'd only expect to see something there immediately after the issue occurred.

That said, it does seem like you have plenty of RAM, so I doubt that's the issue.

WNYmathGuy's picture
Submitted by WNYmathGuy on Mon, 01/28/2019 - 08:50 Pro Licensee

I may have solved it in a responsible way.

The people at AppArmor need to get their shizzle straight. I disabled AppArmor on boot, rebooted my server, and MariaDB worked perfectly. I looked for better ways to configure the usr.sbin.mysqld profile in /etc/apparmor.d/ to make AppArmor stop killing the /var/run/mysqld/mysqld.sock socket, but found none. There were some that looked good, but they were dated so far back in distro versions that I kept the default usr.sbin.mysqld. If you look at that one, you'll see it has no directives of any kind, and it explains that this was done so that MariaDB would be a seamless drop-in replacement. Well, it wasn't. AppArmor apparently has some odd living memory of the uninstalled MySQL that gets a second life after reboot: it sees an irregularity and kills the socket. I can only assume that disabling and stopping AppArmor prior to installing MariaDB is the trick that made me whole again.
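A less drastic alternative worth testing (my assumption, not something verified in this thread): Ubuntu's stock AppArmor tooling can detach just the mysqld profile rather than disabling AppArmor entirely. The profile path below is the Ubuntu 18.04 default.

```shell
# Sketch only: disable just the mysqld AppArmor profile (assumed profile path)
# Symlinking into disable/ keeps it from being loaded at boot...
sudo ln -s /etc/apparmor.d/usr.sbin.mysqld /etc/apparmor.d/disable/
# ...and apparmor_parser -R unloads the already-compiled profile right now
sudo apparmor_parser -R /etc/apparmor.d/usr.sbin.mysqld
```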

As of this writing, I have reinstated AppArmor's start-on-boot status and rebooted the server various times over a course of days, and MariaDB is running without interruption.

It seems the best way to switch from MySQL to MariaDB on an Ubuntu 18.04 LTS server is to turn off AppArmor first, then install MariaDB:

sudo systemctl stop apparmor.service
sudo update-rc.d -f apparmor remove

sudo apt-get remove --purge mysql-server mysql-client mysql-common
sudo apt-get autoremove
sudo apt-get autoclean
sudo apt-get install mariadb-server

sudo systemctl start apparmor.service
sudo update-rc.d apparmor defaults
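Either way, after re-enabling AppArmor it's worth a sanity check that the two are coexisting (a sketch; aa-status needs root, and the output differs per system):

```shell
# Is MariaDB up after the reboot?
sudo systemctl status mariadb --no-pager
# Is any mysqld profile loaded? No grep output means none is loaded.
sudo aa-status | grep -i mysqld
```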

I'd love to hear other opinions on the validity of my claim. Just because it worked doesn't mean there isn't something wrong with it.

That's some good detective work you have there!

While we hadn't run into that before, if you're seeing improvement after working around AppArmor, it certainly seems like that could be the culprit.

Now, we're not sure if it's an issue with AppArmor itself, or with the MySQL/MariaDB pre- or post-install scripts, as it's their responsibility to interact with AppArmor to set up and remove those rules.

But either way, we're glad to hear things are working better for you now!

WNYmathGuy's picture
Submitted by WNYmathGuy on Mon, 01/28/2019 - 14:37 Pro Licensee

I did post similarly on MariaDB's community forum, so they have been notified of something.

WNYmathGuy's picture
Submitted by WNYmathGuy on Mon, 01/28/2019 - 14:38 Pro Licensee

Status:
Active
»
Closed (fixed)

Calling this one done.

WNYmathGuy's picture
Submitted by WNYmathGuy on Thu, 01/31/2019 - 13:31 Pro Licensee

This may help me or others who get here via DuckDuckGo or Google.
It has to do with data-conversion problems going from MySQL 5.7 to MariaDB 10.1 and how to mitigate them.
In the initial post I mentioned:

I was able to restore the databases after some struggles with [utf8mb4] vs. [utf8].

I discovered later that I had solved those problems the wrong way; I don't know if the following is perfect, but it's reasonable. My wrong solution was to run a find-and-replace on a database dump text file, replacing the utf8mb4 collations with utf8. Later, after some apps on my Nextcloud server malfunctioned, I had to convert tables and columns back to utf8mb4 where possible. The error I was initially searching on, when I first failed to restore my non-Virtualmin-managed databases on that server, was:
#1071 - Specified key was too long; max key length is 767 bytes
What I wasn't understanding had to do with PRIMARY or UNIQUE indexes on CHAR(#) fields. Apparently, MySQL 5.7 has no problem building a primary key or unique index on a CHAR(255) field, but older versions of MySQL were not happy with that: they had a limit of 767 bytes per index entry. That was fine for the utf8 encoding (up to 3 bytes per character), but not for utf8mb4 (up to 4 bytes per character). While we wait for MariaDB to be 100% compatible with MySQL, I found that altering my CHAR(255) fields to CHAR(191) worked, because 767/4, rounded down, equals 191. For all I know the limit could be bigger than 191, but I don't care: those fields were only 255 because I didn't specify their size at creation, so they defaulted to the maximum, and I barely exceed 30 characters in the fields I'm having problems with.
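As a concrete sketch of that fix (the table and column names here are invented for illustration):

```sql
-- Hypothetical names; 191 * 4 = 764 bytes, which fits under the 767-byte
-- index-entry limit for a utf8mb4 column
ALTER TABLE users
  MODIFY email CHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
```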

The other problems I had are completely unrelated to Virtualmin. I developed a fairly large database for eventual commercial use; as of now its dump file is 26.35 GB, and I'm not done yet. This is a handcrafted database, not one made automagically by a CMS like Drupal. I use a lot of referential integrity, and I have a number of calculated stored fields. Those were problematic when restoring into MariaDB from a MySQL-generated dump file. Because of the size of my dump file, I had to go all Unix sysadmin on the fixes.

Table entries that had STORED fields in MySQL output two keywords in the dump file, GENERATED ALWAYS and STORED, as in this line:
field_name char(15) COLLATE utf8mb4_bin GENERATED ALWAYS AS (replace(another_field_name,'abc','ABC')) STORED COMMENT 'Reason for the field'
The MariaDB equivalent should be:
field_name char(15) COLLATE utf8mb4_bin AS (replace(another_field_name,'abc','ABC')) PERSISTENT COMMENT 'Reason for the field'

Another problem seemed to come from collation specifics. Columns in MySQL 5.7 that were declared only as CHARACTER SET utf8mb4 were not okay in MariaDB. Here's an example:
a_field_name char(255) CHARACTER SET utf8mb4 NOT NULL
The MariaDB equivalent should be:
a_field_name char(191) COLLATE utf8mb4_bin NOT NULL
I'm good with that, because the SET keyword reads oddly in a column definition anyway.

The last problem I had was a trick column. I made a virtual column, for some odd statistical reason, that just stores the value 1. I don't know why a normal field with a default of 1 didn't work, but it didn't. Here's what it changed to, because MariaDB doesn't permit a VIRTUAL or PERSISTENT field with a constant as its entire definition:
freqCount float UNSIGNED GENERATED ALWAYS AS (1) STORED COMMENT 'This is for AVG(SUM(1)) calculations.',
The MariaDB equivalent should be:
freqCount float UNSIGNED AS (1+0*any_numerical_field) PERSISTENT COMMENT 'This is for AVG(SUM(1)) calculations.',

To sum this all up and use sed, the hyper-efficient text-replacement tool on Linux/Unix (https://linux.die.net/man/1/sed), I made a small SQL_Dump_Fixes.sed file with all my text replacements in the appropriate order:

s/ char(255) / char(191) /
s/ GENERATED ALWAYS / /
s/STORED/PERSISTENT/
s/AS (1) PERSISTENT/AS (1+0*`any_numerical_field`) PERSISTENT/
s/ CHARACTER SET utf8mb4 / COLLATE utf8mb4_bin /

Then, in a shell, run that set of fixes on the database dump file:
sed -i -f SQL_Dump_Fixes.sed dump.sql
or, more generally:
sed -i -f /path_to_script/SQL_Dump_Fixes.sed /path_to_database_dump_file/dump.sql
When I ran mine it took quite a few minutes to finish, perhaps 15 or less. If you don't have a massive database with many tables holding millions of records, you could open the dump in a text editor and do the conversion to the MariaDB style with Ctrl+H.
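Here's a self-contained demo of the fix-up approach on a two-line sample (the column names are invented, and the sample script only includes the replacements those lines exercise):

```shell
# Write a miniature version of the sed script (note: GNU sed rejects trailing
# whitespace after the closing delimiter of an s/// command)
cat > SQL_Dump_Fixes.sed <<'EOF'
s/ char(255) / char(191) /
s/ GENERATED ALWAYS / /
s/STORED/PERSISTENT/
s/ CHARACTER SET utf8mb4 / COLLATE utf8mb4_bin /
EOF

# Two fabricated dump lines standing in for a real mysqldump file
cat > dump.sql <<'EOF'
a_field char(255) CHARACTER SET utf8mb4 NOT NULL
note_uc char(15) COLLATE utf8mb4_bin GENERATED ALWAYS AS (upper(note)) STORED
EOF

sed -i -f SQL_Dump_Fixes.sed dump.sql
cat dump.sql
# a_field char(191) COLLATE utf8mb4_bin NOT NULL
# note_uc char(15) COLLATE utf8mb4_bin AS (upper(note)) PERSISTENT
```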

One thing a responsible person should do before executing that sed command is to make sure it won't accidentally replace matches you didn't intend. grep (https://linux.die.net/man/1/grep) is your go-to if the file is unmanageably large like mine; see https://unix.stackexchange.com/a/206910 for one good Q&A example.

I forgot something important that I discovered in hindsight, so I edited this post: I should have had a sed line in that file to replace the old username with the new one, e.g. s/oldusername/newusername/
This also might not affect most people, but when you write stored PROCEDUREs or FUNCTIONs, the rights defining who can execute them are stored too, in the DEFINER field. I couldn't run my own functions. A fix I stumbled into was to create the oldusername account in the new database as it was, then edit each function so the system cleared out the old rights stored in DEFINER. After it started letting me CALL functions, I was able to delete that oldusername account.
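A way to audit this before touching anything (a sketch; 'mydb' is a placeholder schema name) is to query information_schema for each routine's DEFINER:

```sql
-- Placeholder schema name; lists stored routines and their DEFINERs so you
-- can spot any still owned by the old account
SELECT ROUTINE_SCHEMA, ROUTINE_NAME, DEFINER
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = 'mydb';
```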

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.