Distracted Engineer


A view on stuff from an easily distracted engineer

Troubleshooting Proxmox LXC Container Backup Errors

Today, I encountered a perplexing issue while attempting to back up two LXC containers in Proxmox. These containers threw errors during the backup process, and after a bit of trial and error, I managed to resolve the issue. I hope that by sharing my experience, others may avoid the same pitfalls.

The Problem: Backup Errors and Permission Denied Messages

My journey began when I noticed repeated failures in my Proxmox backup logs for container 206. The error message was as follows:

INFO: Starting Backup of VM 206 (lxc)
INFO: Backup started at 2024-12-22 18:00:20
INFO: tar: ./home/ansible/.ansible: Cannot open: Permission denied
INFO: tar: ./home/ansible/.ssh: Cannot open: Permission denied
ERROR: Backup of VM 206 failed - command [...] failed: exit code 2

The logs showed that tar could not open the .ansible and .ssh directories in the ansible user's home directory because permission was denied. None of the articles I searched through explained why, but a post on the Proxmox forum eventually pointed me in the right direction.

The Root Cause: Altered User ID Mapping

Upon further investigation, I discovered that the issue stemmed from user ID mappings I had changed after creating the containers. I had adjusted the mappings so that user 1000 inside each container was mapped to user 1000 on the Proxmox host, rather than to the default unprivileged range (where container UID 1000 becomes host UID 101000). The change caused no immediate problems, but files created before it kept their original host ownership, so they ended up owned by an ID that the new mapping no longer covers.

For container 206, the user mapping configuration in /etc/pve/lxc/206.conf looked like this:

lxc.idmap: u 0 100000 1000
lxc.idmap: g 0 100000 1000
lxc.idmap: u 1000 1000 1
lxc.idmap: g 1000 1000 1
lxc.idmap: u 1001 101001 64535
lxc.idmap: g 1001 101001 64535
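To make the mapping concrete, here is a small Python sketch (my own illustration, not a Proxmox tool; the helper names are hypothetical) that reads lxc.idmap lines and translates IDs in both directions. It assumes the default unprivileged mapping is `u 0 100000 65536` / `g 0 100000 65536`, which is consistent with the 101000 ownership shown in the listing further down.

```python
from typing import List, Optional, Tuple

# One entry per "lxc.idmap: <u|g> <container_start> <host_start> <count>" line
IdMap = List[Tuple[str, int, int, int]]


def parse_idmap(lines: List[str]) -> IdMap:
    """Parse lxc.idmap lines, ignoring any other config keys."""
    entries: IdMap = []
    for line in lines:
        key, _, value = line.partition(":")
        if key.strip() != "lxc.idmap":
            continue
        kind, ct_start, host_start, count = value.split()
        entries.append((kind, int(ct_start), int(host_start), int(count)))
    return entries


def container_to_host(idmap: IdMap, kind: str, ct_id: int) -> Optional[int]:
    """Host ID that a container UID ('u') or GID ('g') maps to, or None."""
    for k, ct_start, host_start, count in idmap:
        if k == kind and ct_start <= ct_id < ct_start + count:
            return host_start + (ct_id - ct_start)
    return None


def host_to_container(idmap: IdMap, kind: str, host_id: int) -> Optional[int]:
    """Container ID that a host UID/GID appears as, or None if unmapped."""
    for k, ct_start, host_start, count in idmap:
        if k == kind and host_start <= host_id < host_start + count:
            return ct_start + (host_id - host_start)
    return None


custom = parse_idmap([
    "lxc.idmap: u 0 100000 1000",
    "lxc.idmap: g 0 100000 1000",
    "lxc.idmap: u 1000 1000 1",
    "lxc.idmap: g 1000 1000 1",
    "lxc.idmap: u 1001 101001 64535",
    "lxc.idmap: g 1001 101001 64535",
])
# Assumed default mapping for an unprivileged container
default = parse_idmap(["lxc.idmap: u 0 100000 65536", "lxc.idmap: g 0 100000 65536"])

print(container_to_host(default, "u", 1000))   # 101000 (before the change)
print(container_to_host(custom, "u", 1000))    # 1000 (after the change)
print(host_to_container(custom, "u", 101000))  # None: the old owner is unmapped
```

The last line is the crux: host UID 101000 sits in the gap between the 100000-100999 and 101001-165535 ranges, so nothing inside the container corresponds to it any more.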

The Initial State of Directory Permissions

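Under the new mapping, host UID/GID 101000 falls outside every mapped range, so it no longer corresponds to any user inside the container. The backup of an unprivileged container runs tar with the container's ID mapping, so it was denied access to the drwx------ directories (.ansible and .ssh) owned by 101000. Before the fix, the files in the container's home directory had the following ownership and permissions: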

root@real4:~# ls -la /tank/subvol-206-disk-0/home/ansible
total 31
drwxr-xr-x 4 101000 101000    7 May  5  2024 .
drwxr-xr-x 3 100000 100000    3 May  5  2024 ..
drwx------ 3 101000 101000    3 May  5  2024 .ansible
-rw-r--r-- 1 101000 101000  220 Mar 28  2022 .bash_logout
-rw-r--r-- 1 101000 101000 3526 Mar 28  2022 .bashrc
-rw-r--r-- 1 101000 101000  807 Mar 28  2022 .profile
drwx------ 2 101000 101000    3 May  5  2024 .ssh

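The files were owned by UID/GID 101000 (container user 1000 under the old default mapping) instead of 1000, which is what the new mapping expects for that user. This mismatch is what caused the permission errors during the backup.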
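The chown below targets only the home directory that showed up in the backup log. If you want to check whether anything else in the container's filesystem is still owned by the old ID, a dry-run search along these lines can help. This is a hypothetical helper of my own, not part of Proxmox, and it only reads metadata; you would run it on the host as root against the subvolume path.

```python
import os
from typing import List, Optional


def find_owned_by(root: str, uid: int, gid: Optional[int] = None) -> List[str]:
    """List paths under root (root included) owned by uid, and by gid if given.

    os.lstat is used so symlinks are checked themselves rather than their
    targets, and os.walk does not descend into symlinked directories by default.
    Nothing is modified.
    """
    matches: List[str] = []

    def check(path: str) -> None:
        st = os.lstat(path)
        if st.st_uid == uid and (gid is None or st.st_gid == gid):
            matches.append(path)

    check(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            check(os.path.join(dirpath, name))
    return sorted(matches)


# Example on the Proxmox host, using the old owner from the listing above:
#   for path in find_owned_by("/tank/subvol-206-disk-0", 101000, 101000):
#       print(path)
```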

The Solution: Correcting File Ownership

To resolve the permission issues, I had to correct the ownership of the files in question. Here’s what I did:

# chown -R 1000:1000 /tank/subvol-206-disk-0/home/ansible

This command recursively changed the ownership of the files in the directory to the correct UID and GID (1000:1000). Post-adjustment, my containers backed up successfully, and the permissions were as they should be:

root@real4:~# ls -lan /tank/subvol-206-disk-0/home/ansible
total 31
drwxr-xr-x 4   1000   1000    7 May  5  2024 .
drwxr-xr-x 3 100000 100000    3 May  5  2024 ..
drwx------ 3   1000   1000    3 May  5  2024 .ansible
-rw-r--r-- 1   1000   1000  220 Mar 28  2022 .bash_logout
-rw-r--r-- 1   1000   1000 3526 Mar 28  2022 .bashrc
-rw-r--r-- 1   1000   1000  807 Mar 28  2022 .profile
drwx------ 2   1000   1000    3 May  5  2024 .ssh

Conclusion

If you encounter backup errors in Proxmox with similar permission denied messages, consider checking your user ID mappings, particularly if you changed them after creating the container. Correcting file ownership to match the adjusted UID mappings may well save your day. I hope this writeup helps someone else dealing with a similar conundrum. Have questions or tips to share? Feel free to leave a comment below!


Creating calendar events on the nth business day each month

Problem

We have reports that are due each month, and I wanted to create automated reminders for them in Slack, using Google Calendar and Zapier.

Unfortunately, the reports are due on schedules that Google Calendar's repeating events don't support:

  • The 10th business day of the month. E.g. in February 2022 the 10th business day is Monday 14th
  • The first business day that is on or after the 10th day of the month. E.g. 10th October 2021 is a Sunday, so the report would be due the next day on Monday 11th October

Solution

Luckily, while Google Calendar doesn't let you create these types of events in the web app, it does let you import a custom event (in iCalendar format) that has been crafted somewhere else.

I found a great article that explained the process well, including how these custom events need to be written. Using it, I was able to create the calendar events for the two scenarios I was after.

10th Business Day of the month

BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
RRULE:FREQ=MONTHLY;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYSETPOS=10
SUMMARY:Brisbane Report Due
LOCATION:msa-reporting
DTSTART:20210701T090000
SEQUENCE:0
DESCRIPTION:@here The Brisbane report is due today! 
END:VEVENT
END:VCALENDAR

The RRULE is what defines the recurrence behaviour. Breaking that line down to see how it works:

  • FREQ=MONTHLY - repeat monthly
  • INTERVAL=1 - repeat each month
  • BYDAY=MO,TU,WE,TH,FR - only on weekdays
  • BYSETPOS=10 - the 10th day that matches the above rule
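If you want to sanity-check which date that rule should produce in a given month, the same selection is easy to reproduce in Python. This is my own sketch of the RRULE logic (collect the month's Monday to Friday dates in order and take position n), not how Google Calendar evaluates it internally:

```python
import calendar
import datetime


def nth_weekday_of_month(year: int, month: int, n: int) -> datetime.date:
    """Return the nth Monday-Friday date of the month (1-based).

    Mirrors FREQ=MONTHLY;BYDAY=MO,TU,WE,TH,FR;BYSETPOS=n: build the ordered
    set of weekdays in the month, then pick position n. Raises IndexError if
    the month has fewer than n weekdays.
    """
    _, days_in_month = calendar.monthrange(year, month)
    weekdays = [
        datetime.date(year, month, day)
        for day in range(1, days_in_month + 1)
        if datetime.date(year, month, day).weekday() < 5  # 0 = Mon ... 4 = Fri
    ]
    return weekdays[n - 1]


print(nth_weekday_of_month(2022, 2, 10))  # 2022-02-14
```

This matches the February 2022 example above (Monday 14th).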

First Business Day on or after the 10th day of the month

BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
RRULE:FREQ=MONTHLY;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYMONTHDAY=10,11,12,13;BYSETPOS=1
SUMMARY:Melbourne Report Due
LOCATION:msa-reporting
DTSTART:20210701T090000
SEQUENCE:0
DESCRIPTION:@here The Melbourne report is due today! 
END:VEVENT
END:VCALENDAR

This one is a bit more complicated. Breaking that line down to see how it works:

  • FREQ=MONTHLY - repeat monthly
  • INTERVAL=1 - repeat each month
  • BYDAY=MO,TU,WE,TH,FR - only on weekdays
  • BYMONTHDAY=10,11,12,13 - AND only on the 10th - 13th day of the month (handling when the 10th lands on a weekend)
  • BYSETPOS=1 - the first day that matches the previous two constraints (weekday && 10th-13th day of the month)
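The 10th to 13th window is enough because at most two weekend days can follow the 10th before a Monday, so the 12th is the latest possible match and the 13th is just a safety margin. A quick Python sketch (my own, with hypothetical function names) checks that taking the first weekday from that window agrees with simply stepping forward from the 10th:

```python
import datetime


def first_weekday_on_or_after(year: int, month: int, day: int = 10) -> datetime.date:
    """Step forward from the given day until it falls on Monday-Friday."""
    d = datetime.date(year, month, day)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += datetime.timedelta(days=1)
    return d


def rrule_window(year: int, month: int) -> datetime.date:
    """Mimic BYDAY=MO,TU,WE,TH,FR;BYMONTHDAY=10,11,12,13;BYSETPOS=1."""
    candidates = [datetime.date(year, month, d) for d in (10, 11, 12, 13)]
    return [d for d in candidates if d.weekday() < 5][0]


# Both approaches agree for every month from 2021 to 2030
assert all(
    first_weekday_on_or_after(y, m) == rrule_window(y, m)
    for y in range(2021, 2031)
    for m in range(1, 13)
)
print(first_weekday_on_or_after(2021, 10))  # 2021-10-11
```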

Constraints

Unfortunately, neither of these rules accounts for weekdays that aren't business days, such as public holidays. For example, in January 2022 the custom event for the 10th business day shows on Friday 14th; however, Monday 3rd is a public holiday, so the 10th business day is actually Monday 17th.
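Handling holidays needs a list of them, which an RRULE has no way to express. One possible workaround, not something I set up here, is to calculate the dates outside the calendar from a holiday list you maintain yourself and create individual events from the results. A minimal Python sketch of the calculation, where the holiday list is a hypothetical input containing only the 3 January 2022 holiday from the example above:

```python
import calendar
import datetime
from typing import Iterable


def nth_business_day(year: int, month: int, n: int,
                     holidays: Iterable[datetime.date] = ()) -> datetime.date:
    """Return the nth Monday-Friday date of the month that is not a holiday."""
    skip = set(holidays)
    _, days_in_month = calendar.monthrange(year, month)
    count = 0
    for day in range(1, days_in_month + 1):
        d = datetime.date(year, month, day)
        if d.weekday() < 5 and d not in skip:
            count += 1
            if count == n:
                return d
    raise ValueError(f"{year}-{month:02d} has fewer than {n} business days")


holidays = [datetime.date(2022, 1, 3)]  # hypothetical, caller-maintained list
print(nth_business_day(2022, 1, 10))            # 2022-01-14 (what the RRULE gives)
print(nth_business_day(2022, 1, 10, holidays))  # 2022-01-17 (holiday skipped)
```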



Handling syslog clients sending blank fields

Problem

I recently started centrally collecting logs on my home network using Promtail, Loki and Grafana. This lets you easily search and analyse logs for all sorts of things.

One of the services I wanted to collect logs for was my TP-Link wireless access points, managed by TP-Link Omada. However, it turns out that the APs were leaving the app-name field blank when sending logs to the remote server.

This caused parsing errors in promtail, which frequently reset the connection from rsyslogd (acting as a syslog relay).

These were the errors I could see in the logs:

promtail:

level=warn ts=2022-01-04T04:08:21.013648461Z caller=syslogtarget.go:216 msg="error parsing syslog stream" err="expecting an app-name (from 1 to max 48 US-ASCII characters) or a nil value [col 50]

rsyslog:

rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to promtail:514 [v8.36.0 try http://www.rsyslog.com/e/2027 ]
rsyslogd-2027: omfwd: TCPSendBuf error -2027, destruct TCP Connection to promtail:514 [v8.36.0 try http://www.rsyslog.com/e/2027 ]
rsyslogd-2007: action 'action 2' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.36.0 try http://www.rsyslog.com/e/2007 ]
rsyslogd-2359: action 'action 2' resumed (module 'builtin:omfwd') [v8.36.0 try http://www.rsyslog.com/e/2359 ]
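To see why promtail complains, consider the RFC 5424 header: `<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID`, separated by single spaces, where APP-NAME must be either the nil value `-` or 1 to 48 printable US-ASCII characters (the limit quoted in the promtail error). rsyslog's RSYSLOG_SyslogProtocol23Format template writes the app-name as-is, so a blank one presumably leaves two consecutive spaces after the hostname. The sketch below is a simplified checker for illustration only (not promtail's actual parser); the hostname `eap620` and app name `example-app` are made up.

```python
def app_name_ok(app_name: str) -> bool:
    """RFC 5424 APP-NAME: NILVALUE ('-') or 1 to 48 PRINTUSASCII (codes 33-126)."""
    if app_name == "-":
        return True
    return 1 <= len(app_name) <= 48 and all(33 <= ord(c) <= 126 for c in app_name)


def header_app_name(line: str) -> str:
    """Fourth space-separated header token of an RFC 5424 line.

    Simplified: assumes single-space separators, so a blank field comes back
    as '' rather than being skipped the way str.split() with no args would.
    """
    # Tokens: "<PRI>VERSION", TIMESTAMP, HOSTNAME, APP-NAME, rest of line
    return line.split(" ", 4)[3]


good = "<134>1 2022-01-04T04:08:21+00:00 eap620 example-app - - - ok"
bad = "<134>1 2022-01-04T04:08:21+00:00 eap620  - - - blank app-name"

print(app_name_ok(header_app_name(good)))  # True
print(app_name_ok(header_app_name(bad)))   # False
```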

Environment

Syslog Clients:

  • Omada Controller 4.4.8
  • TP-Link EAP 620 HD (x2)

rsyslogd configured as a forwarder to promtail. docker-compose extract below:

version: '3'

services:
#    -- snip --
  loki:
    image: grafana/loki:2.4.1
    restart: unless-stopped
    volumes:
      - ./loki-config.yaml:/mnt/config/loki-config.yaml
    command:
      - --config.file=/mnt/config/loki-config.yaml
  promtail:
    image: grafana/promtail:2.4.1
    restart: unless-stopped
    volumes:
      - /var/log:/var/log
      - ./promtail-config.yaml:/mnt/config/promtail-config.yaml
    command:
      - --config.file=/mnt/config/promtail-config.yaml
  rsyslog:
    image: rsyslog/syslog_appliance_alpine
    restart: unless-stopped
    ports:
      - 514:514/udp
      - 514:514/tcp
    environment:
      - RSYSLOG_CONF=/config/rsyslog.conf
    volumes:
      - ./rsyslog.conf:/config/rsyslog.conf

promtail configuration:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /var/log/*log
  pipeline_stages:
    - regex:
        expression: "^[\\w]+\\s+[\\d]+\\s+[\\d|:]+ (?P<host>[^\\s]+) (?P<tag>[^:\\[]+)"
    - labels:
        host:
        tag:
- job_name: syslog
  syslog:
    listen_address: "0.0.0.0:514"
    idle_timeout: 10m
    label_structured_data: yes
    labels:
      job: "syslog"
  relabel_configs:
    - source_labels: ["__syslog_connection_ip_address"]
      target_label: "ip_address"
    - source_labels: ["__syslog_message_severity"]
      target_label: "severity"
    - source_labels: ["__syslog_message_facility"]
      target_label: "facility"
    - source_labels: ["__syslog_message_app_name"]
      target_label: "appname"
    - source_labels: ["__syslog_message_hostname"]
      target_label: "host"

rsyslog.conf configured according to: https://grafana.com/docs/loki/latest/clients/promtail/scraping/#rsyslog-output-configuration

Solution

I solved this problem by configuring rsyslog to use a modified template that sets the app-name to - (the syslog nil value) when the app-name field is blank. The resulting configuration is shown below. Note that the same approach should work for any field (e.g. hostname) sent blank by misbehaving syslog clients.

...

:app-name, !isequal, "" {
    action(type="omfwd" protocol="tcp" target= "promtail" port="514" Template="RSYSLOG_SyslogProtocol23Format" TCP_Framing="octet-counted" KeepAlive="on")
}

# RSYSLOG_SyslogProtocol23Format but with app-name hard-coded to '-'
template(name="missingAppName" type="string" string="<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% - %PROCID% %MSGID% %STRUCTURED-DATA% %msg%\n")

:app-name, isequal, "" {
    action(type="omfwd" protocol="tcp" target= "promtail" port="514" Template="missingAppName" TCP_Framing="octet-counted" KeepAlive="on")
}

...
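The two action branches above boil down to "use - whenever the field is empty". As a rough model of that logic (an illustration only; the real work is done by rsyslog, and these function names are hypothetical), here it is in Python, applying the nil value to every header field as the note about hostname suggests. Unlike the config, which only switches templates based on app-name, this model checks each field independently:

```python
def nil_if_blank(value: str) -> str:
    """Return RFC 5424's NILVALUE ('-') for an empty or whitespace-only field."""
    return value if value.strip() else "-"


def format_rfc5424(pri: int, timestamp: str, hostname: str, app_name: str,
                   procid: str, msgid: str, structured_data: str, msg: str) -> str:
    """Build a line in the same field order as RSYSLOG_SyslogProtocol23Format,
    emitting '-' for blank header fields instead of an empty string."""
    header = [timestamp, hostname, app_name, procid, msgid, structured_data]
    return f"<{pri}>1 " + " ".join(nil_if_blank(f) for f in header) + f" {msg}\n"


# Hypothetical message from an AP that left app-name blank
line = format_rfc5424(134, "2022-01-04T04:08:21+00:00", "eap620", "",
                      "-", "-", "-", "example message")
print(repr(line))
# '<134>1 2022-01-04T04:08:21+00:00 eap620 - - - - example message\n'
```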
