Bug 820448 - systemd-notify doesn't work, since it lives too short
Summary: systemd-notify doesn't work, since it lives too short
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: openstack-keystone
Version: 19
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Alan Pevec (Fedora)
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 812219
Reported: 2012-05-10 00:03 UTC by Derek Higgins
Modified: 2020-09-10 09:16 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 982376 (view as bug list)
Environment:
Last Closed: 2014-02-11 20:14:16 UTC
Type: Bug
Embargoed:



Description Derek Higgins 2012-05-10 00:03:04 UTC
Using a test systemd service unit with 
Type=notify

and using a test script that contains
systemd-notify --ready
the notification sometimes succeeds and sometimes fails, leaving the following message in /var/log/messages:
May 10 00:37:31 laptop systemd[1]: Cannot find unit for notify message of PID 3495.

When the command is successful,
> sudo systemctl start test1.service

returns almost immediately. When it fails, it times out after 90 seconds.

Below are the files I am using, along with the package versions.

$ rpm -qa | grep -i -e systemd
systemd-units-37-19.fc16.x86_64
systemd-37-19.fc16.x86_64
systemd-sysv-37-19.fc16.x86_64

$ getenforce 
Permissive

$ cat /lib/systemd/system/test1.service 
[Unit]
Description=Tests

[Service]
User=derekh
Type=notify
ExecStart=/tmp/testservice
NotifyAccess=all

$ cat /tmp/testservice
#!/bin/bash

sleep 1

systemd-notify --ready

echo Sleeping
sleep 300
echo Done

Comment 1 Michal Schmidt 2012-05-10 08:36:44 UTC
(In reply to comment #0)
> May 10 00:37:31 laptop systemd[1]: Cannot find unit for notify message of PID
> 3495.

systemd-notify sends a message to $NOTIFY_SOCKET and then exits.

When systemd receives the notification, the systemd-notify process may have already exited and been reaped by bash.

Ideally the cgroups membership information would be delivered with the message over the socket.

From http://0pointer.de/blog/projects/plumbers-wishlist-3.html:

AF_UNIX:

* An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or something like that), i.e. a way to attach sender cgroup membership to messages sent via AF_UNIX. This is useful in case services such as syslog shall be shared among various containers (or service cgroups), and the syslog implementation needs to be able to distinguish the sending cgroup in order to separate the logs on disk. Of course stm SCM_CREDENTIALS can be used to look up the PID of the sender followed by a check in /proc/$PID/cgroup, but that is necessarily racy, and actually a very real race in real life.


As an alternative fix, the notification could be made synchronous.
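
The racy lookup described in the quoted wishlist item can be sketched from the receiver's side. This is a minimal illustration, not systemd's actual code: it assumes Linux, and the helper names receive_with_sender_pid and cgroup_of are hypothetical. The receiver reads the sender PID from the SCM_CREDENTIALS ancillary message and then looks up /proc/$PID/cgroup, which is gone if the sender has already exited and been reaped.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid, uid_t uid, gid_t gid
UCRED = struct.Struct("iII")


def receive_with_sender_pid(sock):
    """Receive one datagram and return (data, sender_pid).

    Assumes SO_PASSCRED was enabled on `sock` before the message was sent;
    otherwise no credentials are attached and sender_pid is None.
    """
    data, ancdata, _flags, _addr = sock.recvmsg(4096, socket.CMSG_SPACE(UCRED.size))
    pid = None
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            pid, _uid, _gid = UCRED.unpack(cdata[:UCRED.size])
    return data, pid


def cgroup_of(pid):
    """Return the contents of /proc/<pid>/cgroup, or None if the process is gone.

    This second step is the racy part: nothing keeps the sender alive
    between sending the message and this lookup.
    """
    try:
        with open("/proc/%d/cgroup" % pid) as f:
            return f.read()
    except OSError:
        return None
```

For a short-lived sender such as systemd-notify, cgroup_of() may already return None by the time the receiver looks, which is consistent with the "Cannot find unit for notify message" log line quoted above.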

Comment 2 Alan Pevec 2012-06-27 17:41:54 UTC
(In reply to comment #1)
> systemd-notify sends a message to $NOTIFY_SOCKET and then exits.
> 
> When systemd receives the notification, the systemd-notify process may have
> already exited and been reaped by bash.

So, as a workaround, the daemon process itself should send READY=1 to $NOTIFY_SOCKET
instead of forking the "systemd-notify" command.
We need this for OpenStack daemons; here's _untested_ code in Python
(modeled on the sd_notify implementation in sd-daemon.c):

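import os
import socket

# Send READY=1 from the daemon process itself, not from a short-lived child.
s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
e = os.getenv('NOTIFY_SOCKET')
s.connect(e)
s.sendall(b"READY=1")  # bytes literal works on Python 2.6+ and Python 3
s.close()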

Comment 3 Alan Pevec 2012-09-18 14:43:15 UTC
The script in comment 2 doesn't work with systemd-188-3.fc18, where NOTIFY_SOCKET is an abstract namespace socket:
http://cgit.freedesktop.org/systemd/systemd/commit/?id=29252e9e5bad3b0bcfc45d9bc761aee4b0ece1da

It needs special handling if the notification socket name starts with @: convert it to bytes and replace the leading '@' with a NUL byte (0).

Comment 4 Alan Pevec 2012-09-18 22:36:01 UTC
Patch for example script in comment 2:

 e = os.getenv('NOTIFY_SOCKET')
+if e.startswith('@'):
+    # abstract namespace socket
+    e = '\0%s' % e[1:]
 s.connect(e)
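
Putting comment 2 and the patch above together, a self-contained version might look like the sketch below. It is only an illustration under stated assumptions: the function name sd_notify and its optional env parameter are chosen here for convenience (they are not the API of any particular library), and returning False when NOTIFY_SOCKET is unset is a design choice of this sketch.

```python
import os
import socket


def sd_notify(state, env=None):
    """Send `state` (e.g. "READY=1") to the socket named by NOTIFY_SOCKET.

    Returns False if NOTIFY_SOCKET is not set, True after sending.
    `env` defaults to os.environ; it is a parameter only to ease testing.
    """
    env = os.environ if env is None else env
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # Abstract namespace socket: the leading '@' stands for a NUL byte.
        addr = b"\0" + os.fsencode(path[1:])
    else:
        addr = path
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        s.connect(addr)
        s.sendall(state.encode("utf-8"))
    finally:
        s.close()
    return True
```

The call should be made from the long-running daemon process itself, for example sd_notify("READY=1") once startup has finished, so that the sender is still alive when systemd processes the message.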

Comment 5 Fedora End Of Life 2013-04-03 17:44:40 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 6 Alan Pevec 2014-02-11 20:14:16 UTC
This has been merged upstream and has been included in Fedora openstack-keystone since the 2012.2 release.

Comment 7 Theodore Cowan 2016-08-17 16:54:35 UTC
A workaround with Python:

```
python -c "import systemd.daemon, time; systemd.daemon.notify('READY=1'); time.sleep(5)"
```

Used in a Node.js application:

```
import { exec } from 'child_process'
exec('python -c "import systemd.daemon, time; systemd.daemon.notify(\'READY=1\'); time.sleep(5)"')
```

