FreeIPA Identity Management planet - technical blogs

February 18, 2019

Fraser Tweedale

IP address SAN support in FreeIPA

IP address SAN support in FreeIPA

The X.509 Subject Alternative Name (SAN) certificate extension carries subject names that cannot (or cannot easily) be expressed in the Subject Distinguished Name field. The extension supports various name types, including DNS names (the most common), IP addresses, email addresses (for users) and Kerberos principal names, among others.

When issuing a certificate, FreeIPA has to validate that requested SAN name values match the principal to whom the certificate is being issued. There has long been support for DNS names, Kerberos and Microsoft principal names, and email addresses. Over the years we have received many requests to support IP address SAN names. And now we are finally adding support!

In this post I will explain the context and history of this feature, and demonstrate how to use it. At time of writing the work is not yet merged, but substantive changes are not expected.

Acknowledgement

First and foremost, I must thank Ian Pilcher who drove this work. DNS name validation is tricky, but Ian proposed a regime that was acceptable to the FreeIPA team from a philosophical and security standpoint. Then he cut the initial patch for the feature. The work was of a high quality; my subsequent changes and enhancements were minor. Above all, Ian and others had great patience as the pull request sat in limbo for nearly a year! Thank you Ian.

IP address validation

There is a reason we kicked the SAN IP address support can down the road for so long. Unlike some name types, validating IP addresses is far from straightforward.

Let’s first consider the already-supported name types. FreeIPA is an identity management system. It knows the various identities (principal name, email address, hostname) of the subjects/principals it knows about. Validation of these name types reduces to the question “does this name belong to the subject principal object?”

For IP addresses it is not so simple. There are several complicating factors:

  • FreeIPA can manage DNS entries, but it doesn’t have to. If FreeIPA is not a source of authoritative DNS information, should it trust information from external resolvers? Only with DNSSEC?
  • There may be multiple, conflicting sources of DNS records. The DNS view presented to FreeIPA clients may differ from that seen by other clients. The FreeIPA DNS may “shadow” public (or other) DNS records.
  • For validation, what should be the treatment of forward (A / AAAA) and reverse (PTR) records pertaining to the names involved?
  • Should CNAME records be followed? How many times?
  • The issued certificate may be used in or presented to clients in environments with a different DNS view from the environment in which validation was performed.
  • Does the request have to come from, or does the requesting entity have to prove control of, the IP address(es) requested for inclusion in the certificate?
  • IP addresses often change and are reassigned much more often than the typical lifetime of a certificate.
  • If you query external DNS systems, how do you handle failures or slowness?
  • The need to mitigate DNS or BGP poisoning attacks

Taking these factors into account, it is plain to see why we put this feature off for so long. It is just hard to determine what the correct behaviour should be. Nevertheless use cases exist so the feature request is legitimate. The difference with Ian's RFE was that he proposed a strict validation regime that only uses data defined in FreeIPA. It is a fair assumption that the data managed by a FreeIPA instance is trustworthy. That assumption, combined with some sanity checks, gives the validation requirements:

  1. Only FreeIPA-managed DNS records are considered. There is no communication with external DNS resolvers.
  2. For each IP address in the SAN, there is a DNS name in the SAN that resolves to it. (As an implementation decision, we permit one level of CNAME indirection).
  3. For each IP address in the SAN, there is a valid PTR (reverse DNS) record.
  4. SAN IP addresses are only supported for host and service principals.

Requirement 1 avoids dealing with any conflicts or communication issues with external resolvers. Requirements 2 and 3 together enforce a tight association between the subject principal (every DNS name is verified to belong to it) and the IP address (through forward and reverse resolution to the DNS name(s)).
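
To make requirements 2 and 3 concrete, the following sketch shows the shape of the checks in Python. This is illustrative only, not the FreeIPA implementation; the lookup and reverse parameters stand in for hypothetical helpers that consult only FreeIPA-managed DNS data.

def validate_san_ips(san_dns_names, san_ips, lookup, reverse):
    # lookup(name, rrtype) -> list of record values from FreeIPA-managed zones
    # reverse(ip) -> list of PTR target names from FreeIPA-managed zones
    # Requirement 2 permits one level of CNAME indirection.
    names = set(san_dns_names)
    for name in san_dns_names:
        names.update(lookup(name, "CNAME"))
    for ip in san_ips:
        # Requirement 2: some SAN DNS name (or its canonical name) resolves to the IP.
        if not any(ip in lookup(name, "A") + lookup(name, "AAAA") for name in names):
            raise ValueError(f"IP address {ip} unreachable from DNS names")
        # Requirement 3: the IP has a PTR record pointing back at one of those names.
        ptrs = reverse(ip)
        if not ptrs:
            raise ValueError(f"IP address {ip} does not have PTR record")
        if not any(ptr in names for ptr in ptrs):
            raise ValueError(f"IP address {ip} does not match A/AAAA records")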

Caveats and limitations

FreeIPA’s SAN IP address validation regime leads to the following caveats and limitations:

  • The FreeIPA DNS component must be used. (It can be enabled during installation, or at any time after installation.)
  • Forward and reverse records of addresses to be included in certificates must be added and maintained.
  • SAN IP addresses must be accompanied by at least one DNS name. Requests with only IP addresses will be rejected.

SAN IP address names in general have some limitations, too:

  • The addresses in the certificate were correct at validation time, but might have changed. The only mitigations are to use short-lived certificates, or revoke certificates if DNS changes render them invalid. There is no detection or automation to assist with that.
  • The certificate could be misused by services in other networks with the same IP address. A well-behaved client would still have to trust the FreeIPA CA in order for this impersonation attack to work.

Comparison with the public PKI

SAN IP address names are supported by browsers. The CA/Browser Forum’s Baseline Requirements permit publicly-trusted CAs to issue end-entity certificates with SAN IP address values. CAs have to verify that the applicant controls (or has been granted the right to use) the IP address. There are several acceptable verification methods:

  1. The applicant makes some agreed-upon change to a network resource at the IP address in question;
  2. Consulting IANA or regional NIC assignment information;
  3. Performing reverse lookup then verifying control over the DNS name.

The IETF Automated Certificate Management Environment (ACME) working group has an Internet-Draft for automated IP address validation in the ACME protocol. It defines an automated approach to method 1 above. SAN IP addresses are not yet supported by the most popular ACME CA, Let’s Encrypt (and might never be).

Depending on an organisation’s security goals, the verification methods mentioned above may or may not be appropriate for enterprise use (i.e. behind the firewall). Likewise, the decision about whether a particular kind of validation could or should be automated might have different answers for different organisations. It is not really a question of technical constraints; rather, one of philosophy and security doctrine. When it comes to certificate request validation, the public PKI and FreeIPA are asking different questions:

  • FreeIPA asks: does the indicated subject principal own the requested names?
  • The public PKI asks: does the (potentially anonymous) applicant control the names they’re requesting?

In a few words, it’s ownership versus control. In the future it might be possible for a FreeIPA CA to ask the latter question and issue certificates (or not) accordingly. But that isn’t the focus right now.

Demonstration

Preliminaries

The scene is set. Let’s see this feature in action! The domain of my FreeIPA deployment is ipa.local. I will add a host called iptest.example.com, with the IP address 192.168.2.1. The first step is to add the reverse zone for this IP address:

% ipa dnszone-add --name-from-ip 192.168.2.1
Zone name [2.168.192.in-addr.arpa.]:
  Zone name: 2.168.192.in-addr.arpa.
  Active zone: TRUE
  Authoritative nameserver: f29-0.ipa.local.
  Administrator e-mail address: hostmaster
  SOA serial: 1550454790
  SOA refresh: 3600
  SOA retry: 900
  SOA expire: 1209600
  SOA minimum: 3600
  BIND update policy: grant IPA.LOCAL krb5-subdomain 2.168.192.in-addr.arpa. PTR;
  Dynamic update: FALSE
  Allow query: any;
  Allow transfer: none;

If the reverse zone for the IP address already exists, there is no need to perform this first step.

Next I add the host entry. Supplying --ip-address causes forward and reverse records to be added for the supplied address (assuming the relevant zones are managed by FreeIPA):

% ipa host-add iptest.ipa.local \
      --ip-address 192.168.2.1
-----------------------------
Added host "iptest.ipa.local"
-----------------------------
  Host name: iptest.ipa.local
  Principal name: host/iptest.ipa.local@IPA.LOCAL
  Principal alias: host/iptest.ipa.local@IPA.LOCAL
  Password: False
  Keytab: False
  Managed by: iptest.ipa.local

CSR generation

There are several options for creating a certificate signing request (CSR) with IP addresses in the SAN extension.

  • Lots of devices (routers, middleboxes, etc) generate CSRs containing their IP address. This is the significant driving use case for this feature, but there’s no point going into details because every device is different.
  • The Certmonger utility makes it easy to add DNS names and IP addresses to a CSR, via command line arguments. Several other name types are also supported. See getcert-request(1) for details.
  • OpenSSL requires a config file to specify SAN values for inclusion in CSRs and certificates. See req(1) and x509v3_config(5) for details.
  • The NSS certutil(1) command provides the --extSAN option for specifying SAN names, including DNS names and IP addresses.

For this demonstration I use NSS and certutil. First I initialise a new certificate database:

% mkdir nssdb ; cd nssdb ; certutil -d . -N
Enter a password which will be used to encrypt your keys.
The password should be at least 8 characters long,
and should contain at least one non-alphabetic character.

Enter new password:
Re-enter password:

Next, I generate a key and create a CSR with the desired names in the SAN extension. Because we do not specify a key type or size, we get the default (2048-bit RSA).

% certutil -d . -R -a -o ip.csr \
      -s CN=iptest.ipa.local \
      --extSAN dns:iptest.ipa.local,ip:192.168.2.1
Enter Password or Pin for "NSS Certificate DB":

A random seed must be generated that will be used in the
creation of your key.  One of the easiest ways to create a
random seed is to use the timing of keystrokes on a keyboard.

To begin, type keys on the keyboard until this progress meter
is full.  DO NOT USE THE AUTOREPEAT FUNCTION ON YOUR KEYBOARD!


Continue typing until the progress meter is full:

|************************************************************|

Finished.  Press enter to continue:


Generating key.  This may take a few moments...

The output file ip.csr contains the generated CSR. Let’s use OpenSSL to pretty-print it:

% openssl req -text < ip.csr
Certificate Request:
    Data:
        Version: 1 (0x0)
        Subject: CN = iptest.ipa.local
        Subject Public Key Info:
            < elided >
        Attributes:
        Requested Extensions:
            X509v3 Subject Alternative Name:
                DNS:iptest.ipa.local, IP Address:192.168.2.1
    Signature Algorithm: sha256WithRSAEncryption
         < elided >

It all looks correct.

Issuing the certificate

I use the ipa cert-request command to request a certificate. The host iptest.ipa.local is the subject principal. The default profile is appropriate.

% ipa cert-request ip.csr \
      --principal host/iptest.ipa.local \
      --certificate-out ip.pem
  Issuing CA: ipa
  Certificate: < elided >
  Subject: CN=iptest.ipa.local,O=IPA.LOCAL 201902181108
  Subject DNS name: iptest.ipa.local
  Issuer: CN=Certificate Authority,O=IPA.LOCAL 201902181108
  Not Before: Mon Feb 18 03:24:48 2019 UTC
  Not After: Thu Feb 18 03:24:48 2021 UTC
  Serial number: 10
  Serial number (hex): 0xA

The command succeeded. As requested, the issued certificate has been written to ip.pem. Again we’ll use OpenSSL to inspect it:

% openssl x509 -text < ip.pem
Certificate:                                                                                                                                                                                               [42/694]
    Data:
        Version: 3 (0x2)
        Serial Number: 10 (0xa)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: O = IPA.LOCAL 201902181108, CN = Certificate Authority
        Validity
            Not Before: Feb 18 03:24:48 2019 GMT
            Not After : Feb 18 03:24:48 2021 GMT
        Subject: O = IPA.LOCAL 201902181108, CN = iptest.ipa.local
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    < elided >
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Authority Key Identifier:
                keyid:70:C0:D3:02:EA:88:4A:4D:34:4C:84:CD:45:5F:64:8A:0B:59:54:71

            Authority Information Access:
                OCSP - URI:http://ipa-ca.ipa.local/ca/ocsp

            X509v3 Key Usage: critical
                Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 CRL Distribution Points:

                Full Name:
                  URI:http://ipa-ca.ipa.local/ipa/crl/MasterCRL.bin
                CRL Issuer:
                  DirName:O = ipaca, CN = Certificate Authority

            X509v3 Subject Key Identifier:
                3D:A9:7E:E3:05:D6:03:6A:9E:85:BB:72:69:E1:E7:11:92:6F:29:08
            X509v3 Subject Alternative Name:
                DNS:iptest.ipa.local, IP Address:192.168.2.1
    Signature Algorithm: sha256WithRSAEncryption
         < elided >

We can see that the Subject Alternative Name extension is present, and includes the expected values.

Error scenarios

It’s nice to see that we can get a certificate with IP address names. But it’s more important to know that we cannot get an IP address certificate when the validation requirements are not satisfied. I’ll run through a number of scenarios and show the results (without showing the whole procedure, which would repeat a lot of information).

If we omit the DNS name from the SAN extension, there is nothing linking the IP address to the subject principal and the request will be rejected. Note that the Subject DN Common Name (CN) attribute is ignored for the purposes of SAN IP address validation. The CSR was generated using --extSAN ip:192.168.2.1.

% ipa cert-request ip-bad.csr --principal host/iptest.ipa.local
ipa: ERROR: invalid 'csr': IP address in
  subjectAltName (192.168.2.1) unreachable from DNS names

If we reinstate the DNS name but add an extra IP address that does not relate to the hostname, the request gets rejected. The CSR was generated using --extSAN dns:iptest.ipa.local,ip:192.168.2.1,ip:192.168.2.2.

% ipa cert-request ip-bad.csr --principal host/iptest.ipa.local
ipa: ERROR: invalid 'csr': IP address in
  subjectAltName (192.168.2.2) unreachable from DNS names

Requesting a certificate for a user principal fails. The CSR has Subject DN CN=alice and the SAN extension contains an IP address. The user principal alice does exist.

% ipa cert-request ip-bad.csr --principal alice
ipa: ERROR: invalid 'csr': subject alt name type
  IPAddress is forbidden for user principals

Let’s return to our original, working CSR. If we alter the relevant PTR record so that it no longer points to a DNS name in the SAN (or the canonical name thereof), the request will fail:

% ipa dnsrecord-mod 2.168.192.in-addr.arpa. 1 \
      --ptr-rec f29-0.ipa.local.
  Record name: 1
  PTR record: f29-0.ipa.local.

% ipa cert-request ip.csr --principal host/iptest.ipa.local
ipa: ERROR: invalid 'csr': IP address in
  subjectAltName (192.168.2.1) does not match A/AAAA records

Similarly if we delete the PTR record, the request fails (with a different message):

% ipa dnsrecord-del 2.168.192.in-addr.arpa. 1 \
      --ptr-rec f29-0.ipa.local.
------------------
Deleted record "1"
------------------

% ipa cert-request ip.csr --principal host/iptest.ipa.local
ipa: ERROR: invalid 'csr': IP address in
  subjectAltName (192.168.2.1) does not have PTR record

IPv6

Assuming the relevant reverse zone is managed by FreeIPA and contains the correct records, FreeIPA can issue certificates with IPv6 names. First I have to add the relevant zones and records. I’m using the machine’s link-local address but the commands will be similar for other IPv6 addresses.

% ipa dnsrecord-mod ipa.local. iptest \
      --a-rec=192.168.2.1 \
      --aaaa-rec=fe80::8f18:bdab:4299:95fa
  Record name: iptest
  A record: 192.168.2.1
  AAAA record: fe80::8f18:bdab:4299:95fa

% ipa dnszone-add \
      --name-from-ip fe80::8f18:bdab:4299:95fa
Zone name [0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa.]:
  Zone name: 0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa.
  Active zone: TRUE
  Authoritative nameserver: f29-0.ipa.local.
  Administrator e-mail address: hostmaster
  SOA serial: 1550468242
  SOA refresh: 3600
  SOA retry: 900
  SOA expire: 1209600
  SOA minimum: 3600
  BIND update policy: grant IPA.LOCAL krb5-subdomain 0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa. PTR;
  Dynamic update: FALSE
  Allow query: any;
  Allow transfer: none;

% ipa dnsrecord-add \
      0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa. \
      a.f.5.9.9.9.2.4.b.a.d.b.8.1.f.8 \
      --ptr-rec iptest.ipa.local.
  Record name: a.f.5.9.9.9.2.4.b.a.d.b.8.1.f.8
  PTR record: iptest.ipa.local.

With these in place I’ll generate the CSR and issue the certificate. (This time I’ve used the -f and -z options to reduce user interaction.)

% certutil -d . -f pwdfile.txt \
    -z <(dd if=/dev/random bs=2048 count=1 status=none) \
    -R -a -o ip.csr -s CN=iptest.ipa.local \
    --extSAN dns:iptest.ipa.local,ip:fe80::8f18:bdab:4299:95fa


Generating key.  This may take a few moments...

% ipa cert-request ip.csr \
      --principal host/iptest.ipa.local \
      --certificate-out ip.pem
  Issuing CA: ipa
  Certificate: < elided >
  Subject: CN=iptest.ipa.local,O=IPA.LOCAL 201902181108
  Subject DNS name: iptest.ipa.local
  Issuer: CN=Certificate Authority,O=IPA.LOCAL 201902181108
  Not Before: Mon Feb 18 05:49:01 2019 UTC
  Not After: Thu Feb 18 05:49:01 2021 UTC
  Serial number: 12
  Serial number (hex): 0xC

The issuance succeeded. Observe that the IPv6 address is present in the certificate:

% openssl x509 -text < ip.pem | grep -A 1 "Subject Alt"
    X509v3 Subject Alternative Name:
      DNS:iptest.ipa.local, IP Address:FE80:0:0:0:8F18:BDAB:4299:95FA

Of course, it is possible to issue certificates with multiple IP addresses, including a mix of IPv4 and IPv6. Assuming all the necessary DNS records exist, a CSR generated with

--extSAN ip:fe80::8f18:bdab:4299:95fa,ip:192.168.2.1,dns:iptest.ipa.local

will result in a certificate with the SAN:

IP Address:FE80:0:0:0:8F18:BDAB:4299:95FA, IP Address:192.168.2.1, DNS:iptest.ipa.local

Conclusion

In this post I discussed the challenges of verifying IP addresses for inclusion in X.509 certificates. I discussed the approach we are taking in FreeIPA to finally support this, including its caveats and limitations. For comparison, I outlined how IP address verification is done by CAs on the open internet.

I then demonstrated how the feature will work in FreeIPA. Importantly, I showed (though not exhaustively), that FreeIPA refuses to issue the certificate if the verification requirements are not met. It is a bit hard to demonstrate, from a user perspective, that we only consult FreeIPA’s own DNS records and never consult another DNS server. But hey, the code is open source so you can satisfy yourself that the behaviour fulfils the requirements (or leave a review / file an issue if you find that it does not!)

When will the feature land in master? Before the feature can be merged, I still need to write acceptance tests and have the feature reviewed by another FreeIPA developer. I am hoping to finish the work this week.

As a final remark, I must again acknowledge Ian Pilcher’s significant contribution. Were it not for him, it is likely that this longstanding RFE would still be in our “too hard” basket. Ian, thank you for your patience and I hope that your efforts are rewarded very soon with the feature finally being merged.

February 18, 2019 12:00 AM

February 11, 2019

William Brown

Meaningful 2fa on modern linux

Meaningful 2fa on modern linux

Recently I heard of someone asking the question:

“I have an AD environment connected with <product> IDM. I want to have 2fa/mfa to my linux machines for ssh, that works when the central servers are offline. What’s the best way to achieve this?”

Today I’m going to break this down - but the conclusion for the lazy is:

This is not realistically possible today: use ssh keys with ldap distribution, and mfa on the workstations, with full disk encryption.

Background

So there are a few parts here. AD is, for all intents and purposes, an LDAP server. The <product> is also an LDAP server that syncs to AD. We don’t care if that’s 389-ds, FreeIPA or a vendor solution. The results are basically the same.

Now the linux auth stack uses, and will always use, pam for authentication and nsswitch for user id lookups. Today, we assume that most people run sssd, but pam modules for different options are possible.

There are a stack of possible options, and they all have various flaws.

  • FreeIPA + 2fa
  • PAM TOTP modules
  • PAM radius to a TOTP server
  • Smartcards

FreeIPA + 2fa

Now this is the one most IDM people would throw out. The issue here is the person already has AD and a vendor product. They don’t need a third solution.

Next is the fact that FreeIPA stores the TOTP in the LDAP, which means FreeIPA has to be online for it to work. So this is eliminated by the “central servers offline” requirement.

PAM radius to TOTP server

Same as above: An extra product, and you have a source of truth that can go down.

PAM TOTP module on hosts

Okay, even if you can get this to scale, you need to send the private seed material of every TOTP device that could login to the machine, to every machine. That means any compromise, compromises every TOTP token on your network. Bad place to be in.

Smartcards

These are notoriously difficult to get working, let alone with SSH. Don’t bother. (This is the case where the smartcard does TLS auth to the SSH server.)

Come on William, why are you so doom and gloom!

Let’s back up for a second and think about what we are trying to prevent by having mfa at all. We want to prevent single factor compromise from having a large impact and we want to prevent brute force attacks. (There are probably more reasons, but these are the ones I’ll focus on).

So the best answer: Use mfa on the workstation (password + totp), then use ssh keys to the hosts.

This means the target of the attack is small, and the workstation can be protected by things like full disk encryption and group policy. To sudo on the host you still need the password. This makes sudo to root MFA, as you need something you know and something you have.

If you are extra security-conscious you can put your ssh keys on smartcards. This works on linux and osx workstations with yubikeys, as far as I am aware. Apparently you can have ssh keys in TPM, which would give you tighter hardware binding, but I don’t know how to achieve this (yet).

To make all this better, you can distribute your ssh public keys in ldap, which means you gain the benefits of LDAP account locking/revocation, you can remove the keys instantly if they are breached, and you have very little admin overhead to configure this service on the linux server side. Think about how easy onboarding is if you only need to put your ssh key in one place and it works on every server! Let alone shutting down a compromised account: lock it in one place, and they are denied access to every server.

SSSD as the LDAP client on the server can also cache the passwords (hashed) and the ssh public keys, which means a disconnected client will still be able to authenticate users.

At this point, because you have ssh key auth working, you could even deny password auth as an option in ssh altogether, eliminating an entire class of bruteforce vectors.

For bonus marks: You can use AD as the generic LDAP server that stores your SSH keys. No additional vendor products needed, you already have everything required today, for free. Everyone loves free.

Conclusion

If you want strong, offline capable, distributed mfa on linux servers, the only choice today is LDAP with SSH key distribution.

Want to know more? This blog contains how-tos on SSH key distribution for AD, SSH keys on smartcards, and how to configure SSSD to use SSH keys from LDAP.

February 11, 2019 02:00 PM

February 08, 2019

Adam Young

Ansible and FreeIPA Part 2

After some discussion with Bill Nottingham I got a little further along with what it would take to integrate Ansible Tower and FreeIPA. Here are the notes from that talk.

FreeIPA works best when you can use SSSD to manage the users and groups of the application. Since Ansible Tower is a Django application running behind Nginx, this means using REMOTE_USER configuration. However, Ansible Tower already provides integration with SAML and OpenIDC using Python Social Auth. If an administrator wants to enable SAML, they do so in the database layer, and that provides replication to all of the Ansible Tower instances in a cluster.

The Social integration provides the means to map from the SAML/OpenIDC assertion to the local user and groups. An alternative based on the REMOTE_USER section would have the same set of mappings, but from variables exposed by the SSSD layer. The variables available would be any exposed from an Nginx module, such as those documented here.

Some configuration of the Base OS would be required beyond enrolling the system as an IPA client. Specifically, any variables that the user wishes to expose would be specified in /etc/sssd/sssd.conf.

This mirrors how I set up SSSD Federation in OpenStack Keystone. The configuration of SSSD is the same.

by Adam Young at February 08, 2019 01:09 AM

February 07, 2019

Adam Young

Ansible and FreeIPA Part-1

Ansible is a workflow engine. I use it to do work on my behalf.

FreeIPA is an identity management system. It allows me to manage the identities of users in my organization.

How do I get the two things to work together? The short answer is that it is trivial to do using Ansible Engine. It is harder to do using Ansible Tower.

Edit: Second part is here. Third part is coming.

Engine


Let’s start with Engine. Let’s say that I want to execute a playbook on a remote system. Both my local and remote systems are FreeIPA clients. Thus, I can use Kerberos to authenticate when I ssh in to the remote system. This same mechanism is reused by Ansible when I connect to the system. The following two commands are roughly comparable:

scp myfile.txt ayoung@hostname:
ansible --user ayoung hostname -m copy -a \
    "src=myfile.txt dest=/home/ayoung"

This ignores all the extra work that the copy module does, such as checking hashes.

Under the covers, the ssh layer checks the various authentication mechanism available to communicate with the remote machine. If I have run kinit (successfully) prior to executing the scp command, it will try the Kerberos credentials (via GSSAPI, don’t get me started on the acronym soup) to authenticate to the remote system.

This is all well and good if I am running the playbook interactively. But, what if I want to kick off the playbook from an automated system, like cron?

Keys

The most common way that people use ssh is using asymmetric keys with no certificates. On a Linux system, these keys are kept in ~/.ssh. If I am using rsa, then the private key is kept in ~/.ssh/id_rsa. I can use a passphrase to protect this file. If I want to script using that key, I need to remove the passphrase, or I need to store the passphrase in a file that automates submitting it. While there are numerous ways to handle this, a very common pattern is to have a second set of credentials, stored in a second file, and a configuration option that says to use them. For example, I have a directory ~/keys that contains an id_rsa file. I can use it with ssh like this:

ssh cloud-user@128.31.24.146 -i ~/keys/id_rsa

And with Ansible:

 ansible -i inventory.py ayoung_resources --key-file ~/keys/id_rsa  -u cloud-user   -m ping

Ansible lacks knowledge of Kerberos. There is no way to say “kinit blah” prior to the playbook. While you can add this to a script, you are now providing a wrapper around Ansible.

Automating via Kerberos

Kerberos has a different way to automate credentials: You can use a keytab (a file with symmetric keys stored in it) to get a Ticket Granting Ticket (TGT) and you can place that TGT in a special directory: /var/kerberos/krb5/user/<uid>

I wrote this up a few years back: https://adam.younglogic.com/2015/05/auto-kerberos-authn/

Let’s take this a little bit further. Let’s say that I don’t want to perform the operation as me. Specifically, I don’t want to create a TGT for my user that has all of my authority in an automated fashion. I want to create some other, limited scope principal (the Kerberos term for users and things that are like users that can do things) and use that.

Service Principals

I’d prefer to create a service principal from my machine. If my machine is testing.demo1.freeipa.org and I create on it a service called ansible, I’ll end up with a principal of:

ansible/testing.demo1.freeipa.org@DEMO1.FREEIPA.ORG

A user can allocate to this principal a Keytab, an X509 Certificate, or both. These credentials can be used to authenticate with a remote machine.

If I want to allow this service credential to get access to a host that I set up as some specified user, I can put an entry in the file ~/.k5login that will specify what principals are allowed to login. So I add the above principal line and now that principal can log in.

Let’s assume, however, that we want to limit what that user can do. Say we want to restrict it only to be able to perform git operations. Instead of ~/.k5login, we would use ~/.k5users. This allows us to put a list of commands on the line. It would look like this:

ansible/testing.demo1.freeipa.org@DEMO1.FREEIPA.ORG /usr/bin/git

Ansible Tower

Now that we can set up delegations for the playbooks to use, we can turn our eyes to Ansible Tower. Today, when a user kicks off a playbook from Tower, they have to reuse a set of credentials stored in Ansible Tower. However, that means that any external identity management must be duplicated inside Tower.

What if we need to pass through the user that logs in to Tower in order to use that initial user’s identity for operations? We have a few tools available.

Let’s start with the case where the user logs in to the Tower instance using Kerberos. We can make use of a mechanism that goes by the unwieldy name of Service-for-User-to-Proxy, usually reduced to S4U2Proxy. This provides a constrained delegation.

What if a user is capable of logging in via some mechanism that is not Kerberos? There is a second mechanism called Service-for-User-to-Self. This allows a system to convert from, say, a password based mechanism, to a Kerberos ticket.

Simo Sorce wrote these up a few years back.

https://ssimo.org/blog/id_011.html

And the Microsoft specification that describes the mechanisms in detail:

https://msdn.microsoft.com/en-us/library/cc246071.aspx

In the case of Ansible Tower, we’d have to specify at the playbook level what user to use when executing the template: The AWX account that runs tower, or the TGT fetched via the S4U* mechanism.

What would it take to extend Tower to use S4U? Tower can already use Kerberos from the original user:

https://docs.ansible.com/ansible-tower/latest/html/administration/kerberos_auth.html.

The Tower web application would then need to be able to perform the S4U transforms. Fortunately, it is Python code. The FreeIPA server has to perform these transforms itself, and the Tower transforms would be comparable.

Configuring the S4U mechanisms in FreeIPA is a fairly manual process, as documented by https://vda.li/en/posts/2013/07/29/Setting-up-S4U2Proxy-with-FreeIPA/ I would suggest using Ansible to automate it.

Wrap Up

Kerberos provides a distributed authentication scheme with validation that the user is still active. This is a powerful combination. Ansible should be able to take advantage of the Kerberos support in ssh to greatly streamline the authorization decisions in provisioning and orchestration.

by Adam Young at February 07, 2019 08:25 PM

Fraser Tweedale

staticmethod considered beneficial

staticmethod considered beneficial

Some Python programmers hold that the staticmethod decorator, and to a lesser extent classmethod, are to be avoided where possible. This view is not correct, and in this post I will explain why.

This post will be useful to programmers in any language, but especially Python.

The constructions

I must begin with a brief overview of the classmethod and staticmethod constructions and their uses.

classmethod is a function that transforms a method into a class method. The class method receives the class object as its first argument, rather than an instance of the class. It is typically used as a method decorator:
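
class C:
    @classmethod
    def f(cls):
        # cls is bound to the class object (C), even when called via an instance
        # (a minimal illustrative example)
        return cls.__name__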

By idiom, the class object argument is bound to the name cls. You can invoke a class method via an instance (C().f()) or via the class object itself (C.f()). In return for this flexibility you give up the ability to access instance methods or attributes from the method body, even when it was called via an instance.

staticmethod is nearly identical to classmethod. The only difference is that instead of receiving the class object as the first argument, it does not receive any implicit argument:
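
class C:
    @staticmethod
    def f():
        # no implicit cls or self argument is passed
        # (a minimal illustrative example)
        return 42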

How are the classmethod and staticmethod constructions used? Consider the following (contrived) class:
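
# A sketch of such a class; the method bodies are illustrative.
class Foo:
    def __init__(self, delta):
        self.delta = delta

    def forty_two(self):
        return 42

    def answer(self):
        return self.forty_two()

    def modified_answer(self):
        return self.answer() + self.delta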

There are some places we could use staticmethod and classmethod. Should we? Let’s just do it and discuss the impact of the changes:
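
# The same sketch, with staticmethod and classmethod applied where possible:
class Foo:
    def __init__(self, delta):
        self.delta = delta

    @staticmethod
    def forty_two():
        return 42

    @classmethod
    def answer(cls):
        return cls.forty_two()

    def modified_answer(self):
        return self.answer() + self.delta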

forty_two became a static method, and it no longer takes any argument. answer became a class method, and its self argument became cls. It cannot become a static method, because it references cls.forty_two. modified_answer can’t change at all, because it references an instance attribute (self.delta). forty_two could have been made a class method, but just as it had no need of self, it has no need of cls either.

There is an alternative refactoring for forty_two. Because it doesn’t reference anything in the class, we could have extracted it as a top-level function (i.e. defined not in the class but directly in a module). Conceptually, staticmethod and top-level functions are equivalent modulo namespacing.

Was the change I made a good one? Well, you already know my answer will be yes. Before I justify my position, let’s discuss some counter-arguments.

Why not staticmethod or classmethod?

Most Python programmers accept that alternative constructors, factories and the like are legitimate applications of staticmethod and classmethod. Apart from these applications, opinions vary.

  • For some folks, the above are the only acceptable uses.
  • Some accept staticmethod for grouping utility functions closely related to some class, into that class; others regard this kind of staticmethod proliferation as a code smell.
  • Some feel that anything likely to only ever be called on an instance should use instance methods, i.e. having self as the first argument, even when not needed.
  • The decorator syntax “noise” seems to bother some people.

Guido van Rossum, author and BDFL of Python, wrote that static methods were an accident. History is interesting, sure, but not all accidents are automatically bad.

I am sympathetic to some of these arguments. A class with a lot of static methods might just be better off as a module with top-level functions. It is true that staticmethod is not required for anything whatsoever and could be dispensed with (this is not true of classmethod). And clean code is better than noisy code. Surely if you’re going to clutter your class with decorators, you want something in return right? Well, you do get something in return.

Deny thy self

Let us put to the side the side-argument of staticmethod versus top-level functions. The real debate is instance methods versus not instance methods. This is the crux. Why avoid instance methods (where possible)? Because doing so is a win for readability.

Forget the contrived Foo class from above and imagine you are in a non-trivial codebase. You are hunting a bug, or maybe trying to understand what some function does. You come across an interesting function. It is 50 lines long. What does it do?

If you are reading an instance method, in addition to its arguments, the module namespace, imports and builtins, it has access to self, the instance object. If you want to know what the function does or doesn’t do, you’ll have to read it.

But if that function is a classmethod, you now have more information about this function—namely that it cannot access any instance methods, even if it was invoked on an instance (including from within a sibling instance method). staticmethod (or a top-level function) gives you a bit more than this: not even class methods can be accessed (unless directly referencing the class, which is easily detected and definitely a code smell). By using these constructions when possible, the programmer has less to think about as they read or modify the function.

You can flip this scenario around, too. Say you know a program is failing in some instance method, but you’re not sure how the problematic code is reached. Well, you can rule out the class methods and static methods straight away.

These results are similar to the result of parametricity in programming language theory. The profound and actionable observation in both settings is this: knowing less about something gives the programmer more information about its behaviour.

These might not seem like big wins, because most of the time it’s only a small win. But it’s never a loss, and over the life of a codebase or the career of a programmer, the small readability wins add up. To me, this is a far more important goal than avoiding extra lines of code (decorator syntax), or spurning a feature because its author considers it an accident or it transgresses the Zen of Python or whatever.

But speaking of the Zen of Python…

Readability counts.

So use classmethod or staticmethod wherever you can.

February 07, 2019 12:00 AM

February 04, 2019

Fraser Tweedale

How does Dogtag PKI spawn?

How does Dogtag PKI spawn?

Dogtag PKI is a complex program. Anyone who has performed a standalone installation of Dogtag can attest to this (to say nothing of actually using it). The program you invoke to install Dogtag is called pkispawn(8). When installing standalone, you invoke pkispawn directly. When FreeIPA installs a Dogtag instance, it invokes pkispawn behind the scenes.

So what does pkispawn actually do? In this post I’ll explain how pkispawn actually spawns a Dogtag instance. This post is not intended to be a guide to the many configuration options pkispawn knows about (although we’ll cover several). Rather, I’ll explain the actions pkispawn performs (or causes to be performed) to go from a fresh system to a working Dogtag CA instance.

This post is aimed at developers and support associates, and to a lesser extent, people who are trying to diagnose issues themselves or understand how to accomplish something fancy in their Dogtag installation. By explaining the steps involved in spawning a Dogtag instance, I hope to make it easier for readers to diagnose issues or implement fixes or enhancements.

pkispawn overview

pkispawn(8) is provided by the pki-server RPM (which is required by the pki-ca RPM that provides the CA subsystem).

You can invoke pkispawn without arguments, and it will prompt for the minimal data it needs to continue. These data include the subsystem to install (e.g. CA or KRA), and LDAP database connection details. For a fresh installation, most defaults are acceptable.

There are many ways to configure or customise an installation. A few important scenarios are:

  • installing a KRA, OCSP, TKS or TPS subsystem associated with the existing CA subsystem (typically on the same machine as the CA subsystem).
  • installing a clone of a subsystem (typically on a different machine)
  • installing a CA subsystem with an externally-signed CA certificate
  • non-interactive installation

For the above scenarios, and for many other possible variations, it is necessary to give pkispawn a configuration file. The pki_default.cfg(5) man page describes the format and available options. Some options are relevant to all subsystems, and others are subsystem-specific (i.e. only for CA, or KRA, etc.) Here is a basic configuration:

[DEFAULT]
pki_server_database_password=Secret.123

[CA]
pki_admin_email=caadmin@example.com
pki_admin_name=caadmin
pki_admin_nickname=caadmin
pki_admin_password=Secret.123
pki_admin_uid=caadmin

pki_client_database_password=Secret.123
pki_client_database_purge=False
pki_client_pkcs12_password=Secret.123

pki_ds_base_dn=dc=ca,dc=pki,dc=example,dc=com
pki_ds_database=ca
pki_ds_password=Secret.123

pki_security_domain_name=EXAMPLE

pki_ca_signing_nickname=ca_signing
pki_ocsp_signing_nickname=ca_ocsp_signing
pki_audit_signing_nickname=ca_audit_signing
pki_sslserver_nickname=sslserver
pki_subsystem_nickname=subsystem

The -f option tells pkispawn the configuration file to use. -s CA tells it to install the CA subsystem.

$ pkispawn -f ca.cfg -s CA

For many more examples of how to install Dogtag subsystems for particular scenarios, see the PKI 10 Installation guide on the Dogtag wiki.

Terminology

It is worthwhile to clarify the meaning of some terms:

instance or installation

An installation of Dogtag on a particular machine. An instance may contain one or more subsystems. There may be more than one Dogtag instance on a single machine, although this is uncommon (and each instance must use a disjoint set of network ports). The default instance name is pki-tomcat.

subsystem

Each main function in Dogtag is provided by a subsystem. The subsystems are: CA, KRA, OCSP, TKS and TPS. Every Dogtag instance must have a CA subsystem (hence, the first subsystem installed must be the CA subsystem).

clone

For redundancy, a subsystem may be cloned to a different instance (usually on a different machine; this is not a technical requirement but it does not make sense to do otherwise). Different subsystems may have different numbers of clones in a topology.

topology or deployment

All of the clones of all subsystems derived from some original CA subsystem form a deployment or topology. Typically, each instance in the topology would have a replicated copy of the LDAP database.

pkispawn implementation

Two main phases

pkispawn has two main phases:

  1. set up the Tomcat server and Dogtag application
  2. send configuration requests to the Dogtag application, which performs further configuration steps.

(This is not to be confused with a two step externally-signed CA installation.)

Of course there are many more steps than this. But there is an important reason I am making such a high-level distinction: debugging. In the first phase pkispawn does everything. Any errors will show up in the pkispawn log file (/var/log/pki/pki-<subsystem>-<timestamp>.log). It is usually straightforward to work out what failed. Why it failed is sometimes easy to work out, and sometimes not so easy.

But in the second phase, pkispawn is handing over control to Dogtag to finish configuring itself. pkispawn sends a series of requests to the pki-tomcatd web application. These requests tell Dogtag to configure things like the database, security domain, and so on. If something goes wrong during these steps, you might see something useful in the pkispawn log, but you will probably also need to look at the Dogtag debug log, or even the Tomcat or Dogtag logs of another subsystem or clone. I detailed this (in the context of debugging clone installation failures) in a previous post.

Scriptlets

pkispawn is implemented in Python. The various steps of installation are implemented as scriptlets: small subroutines that take care of one part of the installation. These are:

  1. initialization: sanity check and normalise installer configuration, and sanity check the system environment.
  2. infrastructure_layout: create PKI instance directories and configuration files.
  3. instance_layout: lay out the Tomcat instance and configuration files (skipped when spawning a second subsystem on an existing instance).
  4. subsystem_layout: lay out subsystem-specific files and directories.
  5. webapp_deployment: deploy the Tomcat web application.
  6. security_databases: set up the main Dogtag NSS database, and a client database where the administrator key and certificate will be created.
  7. selinux_setup: establish correct SELinux contexts on instance and subsystem files.
  8. keygen: generate keys and CSRs for the subsystem (for the CA subsystem, this includes the CA signing key and CSR for external signing).
  9. configuration: For external CA installation, import the externally-signed CA certificate and chain. (Re)start the pki-tomcatd instance and send configuration requests to the Java application. The whole second phase discussed in the previous section occurs here. It will be discussed in more detail in the next section.
  10. finalization: enable PKI to start on boot (by default) and optionally purge client NSS databases that were set up during installation.

For a two-step externally-signed CA installation, the configuration and finalization scriptlets are skipped during step 1, and in step 2 the scriptlets up to and including keygen are skipped. (A bit of hand-waving here; they are not really skipped but return early).

In the codebase, scriptlets are located under base/server/python/pki/server/deployment/scriptlets/<name>.py. The list of scriptlets and the order in which they’re run is given by the spawn_scriplets variable in base/server/etc/default.cfg. Note that scriplet there is not a typo. Or maybe it is, but it’s not my typo. In some parts of the codebase, we say scriplet, and in others it’s scriptlet. This is mildly annoying, but you just have to be careful to use the correct class or variable name.

Some other Python files contain a lot of code used during deployment. It’s not reasonable to make an exhaustive list, but pki.server.deployment.pkihelper and pki.server.deployment.pkiparser in particular include a lot of configuration processing code. If you are implementing or changing pkispawn configuration options, you’ll be defining them and following changes around in these files (and possibly others), as well as in base/server/etc/default.cfg.

Scriptlets and uninstallation

The installation scriptlets also implement corresponding uninstallation behaviours. When uninstalling a Dogtag instance or subsystem via the pkidestroy command, each scriptlet’s uninstallation behaviour is invoked. The order in which they’re invoked is different from installation, and is given by the destroy_scriplets variable in base/server/etc/default.cfg.
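
The rough shape of a scriptlet is sketched below. The class and method names here are assumptions for illustration only; consult the real modules under base/server/python/pki/server/deployment/scriptlets/ for the actual interface.

import abc

class AbstractBasePkiScriptlet(abc.ABC):
    # Sketch of the scriptlet pattern described above; names are assumptions.
    @abc.abstractmethod
    def spawn(self, deployer):
        """Perform this scriptlet's part of installation (pkispawn)."""

    @abc.abstractmethod
    def destroy(self, deployer):
        """Undo this scriptlet's work during uninstallation (pkidestroy)."""

class ExampleLayoutScriptlet(AbstractBasePkiScriptlet):
    def spawn(self, deployer):
        # e.g. create directories and copy configuration files,
        # using paths and settings carried by the deployer object
        pass

    def destroy(self, deployer):
        # remove whatever spawn() created
        pass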

Configuration requests

The configuration scriptlet sends a series of configuration requests to the Dogtag web API. Each request causes Dogtag to perform specific configuration behaviour(s). Depending on the subsystem being installed and whether it is a clone, these steps may include communication with other subsystems or instances, and/or the LDAP database.

The requests performed, in order, are:

  1. /rest/installer/configure: configure (but don’t yet create) the security domain. Import and verify certificates. If creating a clone, request number range allocations from the master.
  2. /rest/installer/setupDatabase: add database connection configuration to CS.cfg. Enable required DS plugins. Populate the database. If creating a clone, initialise replication (this can be suppressed if replication is managed externally, as is the case for FreeIPA in Domain Level 1). Populate VLV indices.
  3. /rest/installer/configureCerts: configure system certificates, generating keys and issuing certificates where necessary.
  4. /rest/installer/setupAdmin (skipped for clones): create admin user and issue certificate.
  5. /rest/installer/backupKeys (optional): back up system certificates and keys to a PKCS #12 file.
  6. /rest/installer/setupSecurityDomain: create the security domain data in LDAP (non-clone) or add the new clone to the security domain.
  7. /rest/installer/setupDatabaseUser: set up the LDAP database user, including certificate (if configured). This is the user that Dogtag uses to bind to LDAP.
  8. /rest/installer/finalizeConfiguration: remove preop configuration entries (which are only used during installation) and perform other finalisation in CS.cfg.

For all of these requests, the configuration scriptlet builds the request data according to the pkispawn configuration. Then it sends the request to the current hostname. Communications between pkispawn and Tomcat are unlikely to fail (connection failure would suggest a major network configuration problem).

If something goes wrong during processing of the request, errors should appear in the subsystem debug log (/var/log/pki/pki-tomcat/ca/debug.YYYY-MM-DD.log; /var/log/pki/pki-tomcat/ca/debug on older versions), or the system journal. If the local system had to contact other subsystems or instances on other hosts, it may be necessary to look at the debug logs, system journal or Tomcat / Apache httpd logs of the relevant host / subsystem. I wrote about this at length in a previous post so I won’t say more about it here.

In terms of the code, the resource paths and servlet interface are defined in com.netscape.certsrv.system.SystemConfigResource. The implementation is in com.netscape.certsrv.system.SystemConfigService, with a considerable amount of behaviour residing as helper methods in com.netscape.cms.servlet.csadmin.ConfigurationUtils. If you are investigating or fixing configuration request failures, you will spend a fair bit of time grubbing around in these classes.

Conclusion

As I have shown in this post, spawning a Dogtag PKI instance involves a lot of steps. There are many, many ways to customise the installation and I have glossed over many details. But my aim in this post was not to be a comprehensive reference guide or how-to. Rather the intent was to give a high-level view of what happens during installation, and how those behaviours are implemented. Hopefully I have achieved that, and as a result you are now able to more easily diagnose issues or implement changes or features in the Dogtag installer.

February 04, 2019 12:00 AM

January 29, 2019

William Brown

Using the latest 389-ds on OpenSUSE

Using the latest 389-ds on OpenSUSE

Thanks to some help from my friend who works on OBS, I’ve finally got a good package in review for submission to tumbleweed. However, if you are impatient and want to use the “latest” and greatest 389-ds version on OpenSUSE (docker anyone?).

WARNING: This is NOT PRODUCTION READY, so comes with all warnings about backups, and due care with your data and uses cases.

docker run -i -t opensuse/tumbleweed:latest
zypper ar obs://home:firstyear:branches:network:ldap firstyear_ldap
zypper in 389-ds

Now, we still have an issue with “starting” from dsctl (we don’t really expect you to do it like this ….) so you have to make a tweak to defaults.inf:

vim /usr/share/dirsrv/inf/defaults.inf
# change the following to match:
with_systemd = 0

After this, you should now be able to follow our new quickstart guide on the 389-ds website.

I’ll try to keep this repo up to date as much as possible, which is great for testing and early feedback to changes!

January 29, 2019 02:00 PM

Fraser Tweedale

X.509 Name Constraints and FreeIPA

X.509 Name Constraints and FreeIPA

The X.509 Name Constraints extension is a mechanism for constraining the name space(s) in which a certificate authority (CA) may (or may not) issue end-entity certificates. For example, a CA could issue to Bob’s Widgets, Inc a constrained CA certificate that only allows the CA to issue server certificates for bobswidgets.com, or subdomains thereof. In a similar way, an enterprise root CA could issue constrained certificates to different departments in a company.

What is the advantage? Efficiency can be improved without sacrificing security by enabling scoped delegation of certificate issuance capability to subordinate CAs controlled by different organisations. The name constraints extension is essential for the security of such a mechanism. The Bob’s Widgets, Inc CA must not be allowed to issue valid certificates for google.com (and vice versa!)

FreeIPA supports installation with an externally signed CA. It is possible that such a CA certificate could have a name constraints extension, defined and imposed by the external issuer. Does FreeIPA support this? What are the caveats? In this blog post I will describe in detail how Name Constraints work and the state of FreeIPA support. Along the way I will dive into the state of Name Constraints verification in the NSS security library. And I will conclude with a discussion of limitations, alternatives and complementary controls.

Name Constraints

The Name Constraints extension is defined in RFC 5280. Just as the Subject Alternative Name (SAN) is a list of GeneralName values with various possible types (DNS name, IP address, DN, etc), the Name Constraints extension also contains a list of GeneralName values. The difference is in interpretation. In the Name Constraints extension:

  • A DNS name means that the CA may issue certificates with DNS names in the given domain, or a subdomain of arbitrary depth.
  • An IP address is interpreted as a CIDR address range.
  • A directory name is interpreted as a base DN.
  • An RFC822 name can be a single mailbox, all mailboxes at a particular host, or all mailboxes at a particular domain (including subdomains).
  • The SRVName name type, and corresponding Name Constraints matching rules, are defined in RFC 4985.

There are other rules for other name types, but I won’t elaborate them here.

In X.509 terminology, these name spaces are called subtrees. The Name Constraints extension can define permitted subtrees and/or excluded subtrees. Permitted subtrees is more often used because it defines what is allowed, and anything not explicitly allowed is prohibited. It is possible for a single Name Constraints extension to define both permitted and excluded subtrees. But I have never seen this in the wild, and I will not bother explaining the rules.

When validating a certificate, the Name Constraints subtrees of all CA certificates in the certification path are merged, and the certificate is checked against the merged results. Name values in the SAN extension are compared to Name Constraint subtrees of the same type (the comparison rules differ for each name type.)

In addition to comparing SAN names against Name Constraints, there are a couple of additional requirements:

  • directoryName constraints are checked against the whole Subject DN, in addition to directoryName SAN values.
  • rfc822Name constraints are checked against the emailAddress Subject DN attribute (if present) in addition to rfc822Name SAN values. (Use of the emailAddress attribute is deprecated in favour of rfc822Name SAN values.)

Beyond this, because of the legacy de facto use of the Subject DN CN attribute to carry DNS names, several implementations check the CN attribute against dnsName constraints. This behaviour is not defined (let alone required) by RFC 5280. It is reasonable behaviour when dealing with server certificates. But we will see that this behaviour can lead to problems in other scenarios.

It is important to mention that nothing prevents a constrained CA from issuing a certificate that violates its Name Constraints (either direct or transitive). Validation must be performed by a client. If a client does not validate Name Constraints, then even a (trusted) issuing CA with a permittedSubtrees dnsName constraint of bobswidgets.com could issue a certificate for google.com and the client will accept it. Fortunately, modern web browsers strictly enforce DNS name constraints. For other clients, or other name types, Name Constraint enforcement support is less consistent. I haven’t done a thorough survey yet but you should make your own investigations into the state of Name Constraint validation support in libraries or programs relevant to your use case.
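
Regardless of client enforcement, it is useful to be able to see exactly what constraints a CA certificate carries. Here is a minimal sketch using the python-cryptography library (assuming a reasonably recent version, and a CA certificate in a PEM file named ca.pem):

from cryptography import x509

with open("ca.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

try:
    nc = cert.extensions.get_extension_for_class(x509.NameConstraints).value
except x509.ExtensionNotFound:
    print("no Name Constraints extension")
else:
    # each subtree is a GeneralName (DNSName, IPAddress, DirectoryName, ...)
    print("permitted:", nc.permitted_subtrees)
    print("excluded:", nc.excluded_subtrees)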

FreeIPA support for constrained CA certificates

It is common to deploy FreeIPA with a subordinate CA certificate signed by an external CA (e.g. the organisation’s Active Directory CA). If the FreeIPA deployment controls the ipa.bobswidgets.com subdomain, then it is reasonable for the CA administrator to issue the FreeIPA CA certificate with a Name Constraints permittedSubtree of ipa.bobswidgets.com. Will this work?

The most important thing to consider is that all names in all certificates issued by the FreeIPA CA must conform to whatever Name Constraints are imposed by the external CA. Above all else, the constraints must permit all DNS names used by the IPA servers across the whole topology. Support for DNS name constraint enforcement is widespread, so if this condition is not met, nothing will work. Most likely not even installation will succeed. So if the permitted dnsName constraint is ipa.bobswidgets.com, then every server hostname must be in that subtree. Likewise for SRV names, RFC822 names and so on.

In a typical deployment scenario this is not a burdensome requirement. And if the requirements change (e.g. needing to add a FreeIPA replica with a hostname excluded by Name Constraints) then the CA certificate could be re-issued with an updated Name Constraints extension to allow it. In some use cases (e.g. FreeIPA issuing certificates for cloud services), Name Constraints in the CA certificate may be untenable.
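
If you are not sure what constraints the external issuer actually imposed, you can inspect the CA certificate directly (the file name here is an assumption; use wherever your externally signed CA certificate lives):

# print the Name Constraints extension of the externally signed CA certificate
openssl x509 -in ipa-ca.pem -noout -text | grep -A 8 'Name Constraints'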

If the external issuer imposes a directoryName constraint, more care must be taken, because as mentioned above, these constraints apply to the Subject DN of issued certificates. The deployment’s subject base (an installation parameter that defines the base subject DN used in all default certificate profiles) must correspond to the directoryName constraint. Also, the Subject DN configuration for custom certificate profiles must correspond to the constraint.

If all of these conditions are met, then there should be no problem having a constrained FreeIPA CA.

A wild Name Constraint validation bug appears!

You didn’t think the story would end there, did you? As is often the case, my study of some less commonly used feature of X.509 was inspired by a customer issue. The customer’s external CA issued a CA certificate with dnsName and directoryName constraints. The permittedSubtree values were reasonable. Everything looked fine, but nothing worked (not even installation). Dogtag would not start up, and the debug log showed that the startup self-test was complaining about the OCSP signing certificate:

The Certifying Authority for this certificate is not
permitted to issue a certificate with this name.

Adding to the mystery, when the certutil(1) program was used to validate the certificate, the result was success:

# certutil -V -e -u O \
  -d /etc/pki/pki-tomcat/alias \
  -f /etc/pki/pki-tomcat/alias/pwdfile.txt \
  -n "ocspSigningCert cert-pki-ca"
certutil: certificate is valid

Furthermore, the customer was experiencing (and I was also able to reproduce) the issue on RHEL 7, but I could not reproduce the issue on recent versions of Fedora or the RHEL 8 beta.

directoryName constraints are uncommon (relative to dnsName constraints). And having in my past encountered many issues caused by DN string encoding mismatches (a valid scenario, but some libraries do not handle it correctly), my initial theory was that this was the cause. Dogtag uses the NSS security library (via the JSS binding for Java), and a search of the NSS commit log uncovered an interesting change that supported my theory:

Author: David Keeler <dkeeler@mozilla.com>
Date:   Wed Apr 8 16:17:39 2015 -0700

  bug 1150114 - allow PrintableString to match UTF8String
                in name constraints checking r=briansmith

On closer examination however, this change affected code in the mozpkix library (part of NSS), which is not invoked by the certificate validation routines used by Dogtag and the certutil program. But if the mozpkix Name Constraint validation code was not being used, where was the relevant code?

Finding the source of the problem

Some more reading of NSS code showed that the error originated in libpkix (also part of NSS).

To work out why certutil was succeeding where Dogtag was failing, I launched certutil in a debugger to see what was going on. Eventually I reached the following routine:

SECStatus
cert_VerifyCertChain(CERTCertDBHandle *handle, CERTCertificate *cert,
                     PRBool checkSig, PRBool *sigerror,
                     SECCertUsage certUsage, PRTime t, void *wincx,
                     CERTVerifyLog *log, PRBool *revoked)
{
  if (CERT_GetUsePKIXForValidation()) {
    return cert_VerifyCertChainPkix(cert, checkSig, certUsage, t,
                                    wincx, log, sigerror, revoked);
  }
  return cert_VerifyCertChainOld(handle, cert, checkSig, sigerror,
                                 certUsage, t, wincx, log, revoked);
}

OK, now I was getting somewhere. It turns out that during library initialisation, NSS reads the NSS_ENABLE_PKIX_VERIFY environment variable and sets a global variable, the value of which determines the return value of CERT_GetUsePKIXForValidation(). The behaviour can also be controlled explicitly via CERT_SetUsePKIXForValidation(PRBool enable).

When invoking certutil ourselves, this environment variable was not set, so the “old” validation subroutine was invoked. Both routines perform cryptographic validation of a certification path to a trusted CA, and several other important checks. But it seems that the libpkix routine is more thorough, performing Name Constraints checks, as well as OCSP and perhaps other checks that are not performed by the “old” subroutine.
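
In principle the discrepancy can be reproduced by forcing libpkix validation for the same certutil invocation; I have not verified this exact command, but it should exercise the libpkix code path:

NSS_ENABLE_PKIX_VERIFY=1 certutil -V -e -u O \
  -d /etc/pki/pki-tomcat/alias \
  -f /etc/pki/pki-tomcat/alias/pwdfile.txt \
  -n "ocspSigningCert cert-pki-ca"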

If an environment variable or explicit library call is required to enable libpkix validation, why was the error occurring in Dogtag? The answer is simple: as part of ipa-server-install, we update /etc/sysconfig/pki-tomcat to set NSS_ENABLE_PKIX_VERIFY=1 in Dogtag’s process environment. This was implemented a few years ago to support OCSP validation of server certificates in connections made by Dogtag (e.g. to the LDAP server).

The bug

Stepping through the code revealed the true nature of the bug. libpkix Name Constraints validation treats the Common Name (CN) attribute of the Subject DN as a DNS name for the purposes of name constraints validation. I already mentioned that this is reasonable behaviour for server certificates. But libpkix has this behaviour for all end-entity certificates. For an OCSP signing certificate, whose CN attribute carries no special meaning (formally or conventionally), this behaviour is wrong. And it is the bug at the root of this problem. I filed a bug in the Mozilla tracker along with a patch (my attempt at fixing the issue). Hopefully a fix can be merged soon.

Why no failure on newer releases?

The issue does not occur on Fedora >= 28 (or maybe earlier, but I haven’t tested), nor the RHEL 8 beta. So was there already a fix for the issue in NSS, or did something change in Dogtag, FreeIPA or elsewhere?

In fact, the change was in Dogtag. In recent versions we switched to a less comprehensive certificate validation routine, one that does not use libpkix. This is just the default behaviour; the old behaviour can still be enabled. We made this change because in some scenarios the OCSP checking performed by libpkix causes Dogtag startup to hang: the OCSP server it is trying to reach to validate certificates during the startup self-test is the same Dogtag instance that is starting up! Because of the change to the self-test validation behaviour, FreeIPA deployments on Fedora >= 28 and RHEL 8 beta do not experience this issue.

Workaround?

If you were experiencing this issue in an existing release (e.g. because you renewed the CA certificate on your existing FreeIPA deployment, and the Name Constraints appeared on the new certificate), an obvious workaround would be to remove the environment variable from /etc/sysconfig/pki-tomcat. That would work, and the change would persist even after an ipa-server-upgrade. But that assumes you already had a working installation, which the customer did not have, because installation itself was failing. So apart from modifying the FreeIPA code to avoid setting this environment variable in the first place, I don’t yet know of a reliable workaround.
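
For the record, on an affected but already-installed server the workaround amounts to something like the following (untested sketch):

# remove the variable and restart Dogtag
sed -i '/NSS_ENABLE_PKIX_VERIFY/d' /etc/sysconfig/pki-tomcat
systemctl restart pki-tomcatd@pki-tomcat.service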

This concludes the discussion of constrained CA certificate support in FreeIPA.

Name Constraints only constrains names. There are other ways you might want to constrain a CA. For example: can only issue certificates with validity period <= δ, or can only issue certificates with Extended Key Usages ∈ S. But there exists no mechanism for constraining CAs in such ways.

Not all defined GeneralName types have Name Constraints syntax and semantics defined for them. Documents that define otherName types may define corresponding Name Constraints matching rules, but are not required to. For example RFC 4985, which defines the SRVName type, also defines Name Constraints rules for it. But RFC 4556, which specifies the Kerberos PKINIT protocol, defines the KRB5PrincipalName otherName type but no Name Constraints semantics.

For applications where the set of domains (or other names) is volatile, a constrained CA certificate is likely to be more of a problem than a solution. An example might be a cloud or Platform-as-a-Service provider wanting to issue certificates on behalf of customers, who bring their own domains. For this use case it would be better to use an existing CA that supports automated domain validation and issuance, such as Let’s Encrypt.

Name Constraints say which names a CA is or is not allowed to issue certificates for. But this restriction is controlled by the superior CA(s), not the end-entity. Interestingly, there is a way for a domain owner to indicate which CAs are authorised to issue certificates for names in the domain. The DNS CAA record (RFC 6844) can anoint one or more CAs, implicitly prohibiting other CAs from issuing certificates for that domain. The CA itself can check for these records, as a control against mis-issuance. For publicly-trusted CAs, the CA/Browser Forum Baseline Requirements require CAs to check and obey CAA records. DNSSEC is recommended but not required.
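
As a hypothetical example, a zone record like the following would authorise a single CA (here a made-up operator) to issue for bobswidgets.com:

; only the named CA may issue certificates for this domain
bobswidgets.com.  IN  CAA  0 issue "ca.example.net"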

CAA is an authorisation control—relying parties do not consult or care about CAA records when verifying certificates. The verification counterpart of CAA is DANE—DNS-based Authentication of Named Entities, defined in RFC 6698. Like CAA, DANE uses DNS (the TLSA record type), but DNSSEC is required. TLSA records can be used to indicate the authorised CA(s) for a certificate. Or they can specify the exact certificate(s) for the domain, a kind of certificate pinning. So DANE can work hand-in-hand with the existing public PKI infrastructure, or it can do an end-run around it. Depending on who you talk to, the reliance on DNSSEC makes it a non-starter, or humanity’s last hope! In any case, support is not yet widespread. Today DANE can be used in some browsers via add-ons, and the OpenSSL and GnuTLS libraries have some support.
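
A DANE-EE style TLSA record for an HTTPS service might look something like this (hypothetical name; the digest is left as a placeholder):

; usage 3 (DANE-EE), selector 1 (SPKI), matching type 1 (SHA-256)
_443._tcp.www.bobswidgets.com.  IN  TLSA  3 1 1 <hex SHA-256 digest of the server public key>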

Nowadays all publicly-trusted CAs, and some private PKIs, log all issued certificates to Certificate Transparency (CT) logs. These logs are auditable (publicly if the log is public), cryptographically verifiable logs of CA activity. CT was imposed after the detection of many serious misissuances by several publicly-trusted CAs (most of whom are no longer trusted by anyone). Now, even failure to log a certificate to a CT log is reason enough to revoke trust (because what else might they have failed to log? Certificates for google.com or yourbank.ch?) What does CT have to do with Name Constraints? When you consider that client Name Constraints validation support is patchy at best, a CT-based logging and audit solution is a credible alternative to Name Constraints, or at least a valuable complementary control.

Conclusion

So, we have looked at what the Name Constraints extension does, and why it can be useful. We have discussed its limitations and some alternative or related mechanisms. We looked at the state of FreeIPA support, and did a deep dive into NSS to investigate the one bug that seems to be getting in the way.

Name Constraints is one of the many complex features that makes X.509 both so versatile yet so painful to work with. It’s a necessary feature, but support is not consistent and where it exists, there are usually bugs. Although I did discuss some “alternatives”, a big reason you might look for an alternative is because the support is not great in the first place. In my opinion, the best way forward is to ensure Name Constraints validation is performed more often, and more correctly, while (separately) preparing the way for comprehensive CT logging in enterprise CAs. A combination of monitoring (CT) and validation controls (browsers correctly validating names, Name Constraints and requiring evidence of CT logging) seems to be improving security in the public PKI. If we fix the client libraries and make CT logging and monitoring easy, it could work well for enterprise PKIs too.

January 29, 2019 12:00 AM

January 18, 2019

William Brown

Structuring Rust Transactions

Structuring Rust Transactions

I’ve been working on a database-related project in Rust recently, which takes advantage of my concurrently readable datastructures. However, I ran into the problem of how to structure Read/Write transaction structures that share the reader code and contain multiple inner read/write types.

Some Constraints

To be clear, there are some constraints. A “parent” write will only ever contain write transaction guards, and a read will only ever contain read transaction guards. This means we aren’t going to hit any deadlocks in the code. Rust can’t protect us from mis-ordering locks. An additional requirement is that readers and a single writer must be able to proceed simultaneously, though an rwlock-style “writer or readers” behaviour would still work here.

Some Background

To simplify this, imagine we have two concurrently readable datastructures. We’ll call them db_a and db_b.

struct db_a { ... }

struct db_b { ... }

Now, each of db_a and db_b has its own way to protect its inner content, but they’ll return a DBReadGuard or DBWriteGuard when we call read() or write() respectively.

impl db_a {
    pub fn read(&self) -> DBReadGuard {
        ...
    }

    pub fn write(&self) -> DBWriteGuard {
        ...
    }
}

Now we make a “parent” wrapper transaction such as:

struct server {
    a: db_a,
    b: db_b,
}

struct server_read {
    a: DBReadGuard,
    b: DBReadGuard,
}

struct server_write {
    a: DBWriteGuard,
    b: DBWriteGuard,
}

impl server {
    pub fn read(&self) -> server_read {
        server_read {
            a: self.a.read(),
            b: self.b.read(),
        }
    }

    pub fn write(&self) -> server_write {
        server_write {
            a: self.a.write(),
            b: self.b.write(),
        }
    }
}

The Problem

Now the problem is that on my server_read and server_write I want to implement a function for “search” that uses the same code. Search on a read or a write should behave identically! I also wanted to avoid the use of macros, as they can hide issues while stepping through a debugger like LLDB/GDB.

Often the answer with Rust is “traits”: create an interface that types adhere to. Rust also allows default trait implementations, which sounds like it could be a solution here.

pub trait server_read_trait {
    fn search(&self) -> SomeResult {
        let result_a = self.a.search(...);
        let result_b = self.b.search(...);
        SomeResult(result_a, result_b)
    }
}

In this case, the issue is that &self in a trait is not aware of the fields in the struct - traits don’t define that fields must exist, so the compiler can’t assume they exist at all.

Second, the type of self.a/b is unknown to the trait - because in a read it’s a “a: DBReadGuard”, and for a write it’s “a: DBWriteGuard”.

The first problem can be solved by adding get_field style accessor functions to the trait. Rust will also inline these trivial accessors, so the correct thing for the type system is also the optimal thing at run time. So we’ll update this to:

pub trait server_read_trait {
    fn get_a(&self) -> ???;

    fn get_b(&self) -> ???;

    fn search(&self) -> SomeResult {
        let result_a = self.get_a().search(...); // note the change from self.a to self.get_a()
        let result_b = self.get_b().search(...);
        SomeResult(result_a, result_b)
    }
}

impl server_read_trait for server_read {
    fn get_a(&self) -> &DBReadGuard {
        &self.a
    }
    // get_b is similar, so omitted
}

impl server_read_trait for server_write {
    fn get_a(&self) -> &DBWriteGuard {
        &self.a
    }
    // get_b is similar, so omitted
}

So now we have the second problem remaining: for the server_write we have a DBWriteGuard, and for the read we have a DBReadGuard. There was a much longer experimentation process, but eventually the answer was simpler than I was expecting. Rust allows a trait to declare an associated type that is constrained by a trait bound, rather than fixed to a concrete type.

So provided that DBReadGuard and DBWriteGuard both implement “DBReadTrait”, then we can give server_read_trait an associated type that enforces this. It looks something like:

pub trait DBReadTrait {
    fn search(&self) -> ...;
}

impl DBReadTrait for DBReadGuard {
    fn search(&self) -> ... { ... }
}

impl DBReadTrait for DBWriteGuard {
    fn search(&self) -> ... { ... }
}

pub trait server_read_trait {
    type GuardType: DBReadTrait; // Say that GuardType must implement DBReadTrait

    fn get_a(&self) -> &Self::GuardType; // implementors must return that type implementing the trait.

    fn get_b(&self) -> &Self::GuardType;

    fn search(&self) -> SomeResult {
        let result_a = self.get_a().search(...);
        let result_b = self.get_b().search(...);
        SomeResult(result_a, result_b)
    }
}

impl server_read_trait for server_read {
    type GuardType = DBReadGuard;

    fn get_a(&self) -> &DBReadGuard {
        &self.a
    }
    // get_b is similar, so omitted
}

impl server_read_trait for server_write {
    type GuardType = DBWriteGuard;

    fn get_a(&self) -> &DBWriteGuard {
        &self.a
    }
    // get_b is similar, so omitted
}

This works! We now have a way to write a single “search” type for our server read and write types. In my case, the DBReadTrait also uses a similar technique to define a search type shared between the DBReadGuard and DBWriteGuard.
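
As a rough usage sketch (db_a::new() and db_b::new() are assumed constructors, not part of the code above, and the trait must be in scope), call sites now look identical for readers and writers:

// hypothetical usage of the types defined above
let s = server { a: db_a::new(), b: db_b::new() };
let r = s.read();    // server_read
let w = s.write();   // server_write
let _ = r.search();  // both calls resolve to the same default search()
let _ = w.search();  // implementation from server_read_trait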

January 18, 2019 02:00 PM

SUSE Open Build Service cheat sheet

SUSE Open Build Service cheat sheet

Part of starting at SUSE has meant that I get to learn about Open Build Service. I’ve known that the project existed for a long time but I have never had a chance to use it. So far I’m thoroughly impressed by how it works and the features it offers.

As A Consumer

The best part of OBS is that it’s trivial on OpenSUSE to consume content from it. Zypper can add projects with the command:

zypper ar obs://<project name> <repo nickname>
zypper ar obs://network:ldap network:ldap

I like to make the repo nickname (your choice) the same as the project name so I know what I have enabled. Once you run this you can easily consume content from OBS.
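
For example, after adding network:ldap as above, you can refresh and install a package that the project publishes (389-ds is used here as an assumed example):

zypper ref
zypper in 389-ds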

Package Management

As someone who has started to contribute to the SUSE 389-ds package, I’ve been slowly learning how the workflow works. OBS, similar to GitHub/GitLab, allows a branching and request model.

On OpenSUSE you will want to use the osc tool for your workflow:

zypper in osc
# If you plan to use the "service" command
zypper in obs-service-tar obs-service-obs_scm obs-service-recompress obs-service-set_version obs-service-download_files

You can branch from an existing project to make changes with:

osc branch <project> <package>
osc branch network:ldap 389-ds

This will branch the project to my home namespace. For me this will land in “home:firstyear:branches:network:ldap”. Now I can checkout the content on to my machine to work on it.

osc co <project>
osc co home:firstyear:branches:network:ldap

This will create the folder “home:…:ldap” in the current working directory.

From here you can now work on the project. Some useful commands are:

Add new files to the project (patches, new source tarballs etc).

osc add <path to file>
osc add feature.patch
osc add new-source.tar.xz

Edit the change log of the project (I think this is used in release notes?)

osc vc

To amend your changes, use:

osc vc -e

Build your changes locally, matching the system you are on. Packages normally build on all/most OpenSUSE versions and architectures; this will build just for your local system and arch.

osc build

Make sure you clean up files you aren’t using any more with:

osc rm <filename>
# This commands removes anything untracked by osc.
osc clean

Commit your changes to the OBS server, where a complete build will be triggered:

osc commit

View the results of the last commit:

osc results

Enable people to use your branch/project as a repository. You edit the project metadata and enable repo publishing:

osc meta prj -e <name of project>
osc meta prj -e home:firstyear:branches:network:ldap

# When your editor opens, change this section to enabled (disabled by default):
<publish>
  <enabled />
</publish>

NOTE: In some cases if you have the package already installed, and you add the repo or update it, it won’t install from your repo. This is because SUSE packages have a notion of “vendoring”: they continue to update from the same repo as they were originally installed from. So if you want to change this you use:

zypper [d]up --from <repo name>

You can then create a “request” to merge your branch changes back to the project origin. This is:

osc sr

A helpful maintainer will then review your changes. You can see this with:

osc rq show <your request id>

If you change your request, to submit again, use:

osc sr

And it will ask if you want to replace (supersede) the previous request.

I was also helped by a friend to provide a “service” configuration that allows generation of tarballs from git. It’s not always appropriate to use this, but if the repo has a “_service” file, you can regenerate the tar with:

osc service ra

So far this is as far as I have gotten with OBS, but I already appreciate how great this work flow is for package maintainers, reviewers and consumers. It’s a pleasure to work with software this well built.

As an additional piece of information, it’s a good idea to read the OBS Packaging Guidelines
to be sure that you are doing the right thing!

January 18, 2019 02:00 PM

January 01, 2019

William Brown

Useful USG pro 4 commands and hints

Useful USG pro 4 commands and hints

I’ve recently changed from a FreeBSD VM as my router to a Ubiquiti PRO USG4. It’s a solid device, with many great features, and I’m really impressed at how it “just works” in many cases. So far my only disappointment is the lack of documentation about the CLI, especially for debugging and auditing what is occurring in the system, and for troubleshooting steps. This post will aggregate some of my knowledge about the topic.

Current config

Show the current config with:

mca-ctrl -t dump-cfg

You can show system status with the “show” command. Pressing ? will cause the current completion options to be displayed. For example:

# show <?>
arp              date             dhcpv6-pd        hardware

DNS

The following commands show the DNS statistics, show the DNS configuration, and allow changing the cache-size. The cache-size is measured in number of records cached, rather than KB/MB. To make this permanent, you need to apply the change to config.json in your controller’s sites folder.

show dns forwarding statistics
show system name-server
set service dns forwarding cache-size 10000
clear dns forwarding cache
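
A sketch of the matching config.json fragment, mirroring the CLI set path above (untested; merge it into your site’s existing configuration):

"service": {
        "dns": {
                "forwarding": {
                        "cache-size": "10000"
                }
        }
},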

Logging

You can see an aggregate of system logs with:

show log

Note that when you set firewall rules to “log on block” they go to dmesg, not syslog, so as a result you need to check dmesg for these.

It’s a great idea to forward your logs in the controller to a syslog server, as this allows you to aggregate and see all the events occurring in a single time series (great when I was diagnosing an issue recently).

Interfaces

To show the system interfaces

show interfaces

To restart your pppoe dhcp6c:

release dhcpv6-pd interface pppoe0
renew dhcpv6-pd interface pppoe0

There is a current issue where the firmware will start dhcp6c on eth2 and pppoe0, but the session on eth2 blocks the pppoe0 client. As a result, you need to release on eth2, then renew on pppoe0.

If you are using a dynamic prefix rather than static, you may need to reset your dhcp6c duid.

delete dhcpv6-pd duid

To restart an interface with the vyatta tools:

disconnect interface pppoe
connect interface pppoe

OpenVPN

I have setup customised OpenVPN tunnels. To show these:

show interfaces openvpn detail

These are configured in config.json with:

# Section: config.json - interfaces - openvpn
    "vtun0": {
            "encryption": "aes256",
            # This assigns the interface to the firewall zone relevant.
            "firewall": {
                    "in": {
                            "ipv6-name": "LANv6_IN",
                            "name": "LAN_IN"
                    },
                    "local": {
                            "ipv6-name": "LANv6_LOCAL",
                            "name": "LAN_LOCAL"
                    },
                    "out": {
                            "ipv6-name": "LANv6_OUT",
                            "name": "LAN_OUT"
                    }
            },
            "mode": "server",
            # By default, ubnt adds a number of parameters to the CLI, which
            # you can see with ps | grep openvpn
            "openvpn-option": [
                    # If you are making site to site tunnels, you need the ccd
                    # directory, with hostname for the file name and
                    # definitions such as:
                    # iroute 172.20.0.0 255.255.0.0
                    "--client-config-dir /config/auth/openvpn/ccd",
                    "--keepalive 10 60",
                    "--user nobody",
                    "--group nogroup",
                    "--proto udp",
                    "--port 1195"
            ],
            "server": {
                    "push-route": [
                            "172.24.0.0/17"
                    ],
                    "subnet": "172.24.251.0/24"
            },
            "tls": {
                    "ca-cert-file": "/config/auth/openvpn/vps/vps-ca.crt",
                    "cert-file": "/config/auth/openvpn/vps/vps-server.crt",
                    "dh-file": "/config/auth/openvpn/dh2048.pem",
                    "key-file": "/config/auth/openvpn/vps/vps-server.key"
            }
    },

Netflow

Netflows allow a set of connection tracking data to be sent to a remote host for aggregation and analysis. Sadly this process was mostly undocumented, bar some useful forum commenters. Here is the process that I came up with. This is how you configure it live:

set system flow-accounting interface eth3.11
set system flow-accounting netflow server 172.24.10.22 port 6500
set system flow-accounting netflow version 5
set system flow-accounting netflow sampling-rate 1
set system flow-accounting netflow timeout max-active-life 1
commit

To make this persistent:

"system": {
            "flow-accounting": {
                    "interface": [
                            "eth3.11",
                            "eth3.12"
                    ],
                    "netflow": {
                            "sampling-rate": "1",
                            "version": "5",
                            "server": {
                                    "172.24.10.22": {
                                            "port": "6500"
                                    }
                            },
                            "timeout": {
                                    "max-active-life": "1"
                            }
                    }
            }
    },

To show the current state of your flows:

show flow-accounting

January 01, 2019 02:00 PM

The idea of CI and Engineering

The idea of CI and Engineering

In software development I see an interesting trend and push towards continuous integration, continually testing, and testing in production. These techniques are designed to allow faster feedback on errors, use real data for application testing, and deliver features and changes faster.

But is that really how people use software on devices? When we consider an operation like Google or Amazon, this always-online technique may work, but what happens when we apply a continuous integration and “we’ll patch it later” mindset to devices like phones or the internet of things?

What happens in other disciplines?

In real engineering disciplines like aviation or construction, techniques like this don’t really work. We don’t continually build bridges, then fix them when they break or collapse. There are people who provide formal analysis of materials and their characteristics. Engineers consider careful designs, constraints, loads and situations that may occur. The structure is planned, reviewed and verified mathematically. Procedures and oversight are applied to ensure correct building of the structure. Lessons are learnt from past failures and incidents and are applied to every layer of the design and construction process. Communication between engineers and many other people is critical to the process. Concerns are always addressed and managed.

The first thing to note is that if we just built lots of scale-model bridges and continually broke them until we found their limits, we would waste many resources. Bridges are carefully planned and proven.

So whats the point with software?

Today we still have a mindset that continually breaking and building is a reasonable path to follow. It’s not! It means that the only way to achieve quality is to have a large test suite (which requires people and time to write), which has to be further derived from failures (and those failures can negatively affect real people), and then we have to apply large amounts of electrical energy to continually run the tests. The test suites can’t even guarantee complete coverage of all situations and occurrences!

This puts CI techniques out of reach of many application developers due to time and energy (translated to dollars) limits. Services like Travis on GitHub certainly help to lower the energy requirement, but they don’t remove the time and test-writing requirements.

No matter how many tests we have for a program, if that program is written in C or something else, we continually see faults and security/stability issues in that software.

What if we CI on … a phone?

Today we even have hardware devices that are approached as though “test in production” is a reasonable thing. It’s not! People don’t patch, telcos don’t allow updates out to users, and those that are aware have to do custom ROM deployment. This creates an odd dichotomy of “haves” and “have nots”: those with the technical know-how who have a better experience, and the “have nots” who have to suffer potentially insecure devices. This is especially terrifying given how deeply personal phones are.

This is a reality of our world. People do not patch. They do not patch phones, laptops, network devices and more. Even enterprises will avoid patching if possible. Rather than trying to shift the entire culture of humans to “update always”, we need to write software that can cope in harsh conditions, for long term. We only need to look to software in aviation to see we can absolutely achieve this!

What should we do?

I believe that for software developers to properly become software engineers we should look to engineers in civil and aviation industries. We need to apply:

  • Regulation and ethics (Safety of people is always first)
  • Formal verification
  • Consider all software will run long term (5+ years)
  • Improve team work and collaboration on designs and development

The reality of our world is people are deploying devices (routers, networks, phones, lights, laptops more …) where they may never be updated or patched in their service life. Even I’m guilty (I have a modem that’s been unpatched for about 6 years but it’s pretty locked down …). As a result we need to rely on proof that the device can not fail at build time, rather than patch it later which may never occur! Putting formal verification first, and always considering user safety and rights first, shifts a large burden to us in terms of time. But many tools (Coq, fstar, rust …) all make formal verification more accessible to use in our industry. Verifying our software is a far stronger assertion of quality than “throw tests at it and hope it works”.

You’re crazy William, and also wrong

Am I? Looking at “critical” systems like iPhone encryption hardware, they are running the formally verified seL4. We also heard at Kiwicon in 2018 that Microsoft and Xbox are using formal verification to design the low levels of their systems to prevent exploits from occurring in the first place.

Over time our industry will evolve, and it will become easier and more cost effective to formally verify than to operate and deploy CI. This doesn’t mean we don’t need tests - it means that the first line of quality should be in verification of correctness using formal techniques rather than using tests and CI to prove correct behaviour. Tests are certainly still required to assert further behavioural elements of software.

Today, if you want to do this, you should be looking at Coq and program extraction, fstar and the kremlin (project everest, a formally verified https stack), Rust (which has a subset of the safe language formally proven). I’m sure there are more, but these are the ones I know off the top of my head.

Conclusion

Over time our industry must evolve to put the safety of humans first. To achieve this we must look to other safety-driven cultures such as aviation and civil engineering. Only by learning from their strict disciplines and behaviours can we start to provide software that matches the behavioural and quality expectations humans have for software.

January 01, 2019 02:00 PM

December 30, 2018

William Brown

Nextcloud and badrequest filesize incorrect

Nextcloud and badrequest filesize incorrect

My friend came to my house and was trying to share some large files with my nextcloud instance. Part way through the upload an error occurred.

"Exception":"Sabre\\DAV\\Exception\\BadRequest","Message":"expected filesize 1768906752 got 1768554496"

It turns out this error can be caused by many sources. It could be timeouts, bad requests, network packet loss, incorrect nextcloud configuration or more.

We tried uploading larger files (by a factor of 10 times) and they worked. This eliminated timeouts as a cause, and probably network loss. Being on ethernet directly to the server also helps to eliminate packet loss as a cause, compared to, say, the internet.

We also knew that the server must not have been misconfigured because a larger file did upload, so no file or resource limits were being hit.

This also indicated that the client was likely doing the right thing because larger and smaller files would upload correctly. The symptom now only affected a single file.

At this point I realised, what if the client and server were both victims to a lower level issue? I asked my friend to ls the file and read me the number of bytes long. It was 1768906752, as expected in nextcloud.

Then I asked him to cat that file into a new file, and to tell me the length of the new file. Cat encountered an error, but ls on the new file indeed showed a size of 1768554496. That means filesystem corruption! What could have led to this?

HFS+

Apple’s legacy filesystem (and the reason I stopped using Macs) is well known for silently eating files and corrupting content. Here we had yet another case of that damage occurring and triggering errors elsewhere.

Bisecting these issues and eliminating possibilities through a scientific method is always the best way to resolve the cause, and it may come from surprising places!
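
In this case the decisive check was nothing more than the following (the file name is hypothetical; the sizes are the ones from the error above):

# compare the size nextcloud expected with the size a fresh copy ends up with
ls -l big-upload.bin                       # 1768906752 bytes, as expected
cat big-upload.bin > big-upload-copy.bin   # cat reported an error part way through
ls -l big-upload-copy.bin                  # 1768554496 bytes: the data is damaged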

December 30, 2018 02:00 PM

December 20, 2018

William Brown

Identity ideas …

Identity ideas …

I’ve been meaning to write this post for a long time. Taking half a year away from the 389-ds team, and exploring a lot of ideas from other projects has led me to come up with some really interesting ideas about what we do well, and what we don’t. I feel like this blog could be divisive, as I really think that for our services to stay relevant we need to make changes that really change our own identity - so that we can better represent yours.

So strap in, this is going to be long …

What’s currently on the market

Right now the market for identity has two extremes. At one end we have the legacy “create your own” systems, that are built on technologies like LDAP and Kerberos. I’m thinking about things like 389 Directory Server, OpenLDAP, Active Directory, FreeIPA and more. These all happen to be constrained heavily by complexity, fragility, and administrative workload. You need to spend months to learn these and even still, you will make mistakes and there will be problems.

At the other end we have hosted “Identity as a Service” options like Azure AD and Auth0. These have, very intelligently, unbound themselves from legacy, and tend to offer HTTP APIs, 2FA and other features that “just work”. But they are all in the cloud, and outside your control.

But there is nothing in the middle. There is no option that “just works”, supports modern standards, and is unhindered by legacy that you can self deploy with minimal administrative fuss - or years of experience.

What do I like from 389?

  • Replication

The replication system is extremely robust, and has passed many complex tests for cases of eventual consistency correctness. It’s very rare to hear of any kind of data corruption or loss within our replication system, and that’s testament to the great work of people who spent years looking at the topic.

  • Performance

We aren’t as fast as OpenLDAP in a 1 vs 1 server comparison, but our replication scalability is much higher: in any size of MMR or read-only replica topology, we have higher horizontal scaling, nearly linear with server additions. If you want to run a cloud-scale replicated database, we scale to it (and people already do this!).

  • Stability

Our server stability is well known with administrators, and honestly is a huge selling point. We see servers that only go down when administrators are performing upgrades. Our work with sanitising tools and the careful eyes of the team has ensured our code base is reliable and solid. Having extensive tests and amazing dedicated quality engineers also goes a long way.

  • Feature rich

There are a lot of features I really like, and are really useful as an admin deploying this service. Things like memberof (which is actually a group resolution cache when you think about it …), automember, online backup, unique attribute enforcement, dereferencing, and more.

  • The team

We have a wonderful team of really smart people, all of whom are caring and want to advance the state of identity management. Not only do they want to keep up with technical changes and excellence, they are listening to and want to improve our social awareness of identity management.

Pain Points

  • C

Because DS is written in C, it’s risky and difficult to make changes. People constantly make mistakes that introduce unsafety (even myself), and worse. No amount of tooling or intelligence can take away the fact that C is just hard to use, and people need to be perfect (people are not perfect!) and today we have better tools. We can not spend our time chasing our tails on pointless issues that C creates, when we should be doing better things.

  • Everything about dynamic admin, config, and plugins is hard and can’t scale

Because we need to maintain consistency through operations from start to end, but we also allow changing config, plugins, and more during the server’s operation, the current locking design just doesn’t scale. It’s also not 100% safe, because the values are changed by atomics, not managed by transactions. We could use copy-on-write for this, but why? Config should be managed by tools like ansible, but today our dynamic config and plugins are both a performance overhead and an admin overhead, because we exclude best-practice tools and have to spend a large amount of time to maintain consistent data when we shouldn’t need to. Fewer features means less support overhead on us, and makes it simpler to test and assert quality and correct behaviour.

  • Plugins to address shortfalls, but a bit odd.

We have all these features to address issues, but they all do it … kind of the odd way. Managed Entries creates user private groups on object creation. But the problem is “unix requires a private group” and “ldap schema doesn’t allow a user to be a group and user at the same time”. So the answer is actually to create a new objectClass that lets a user ALSO be its own UPG, not “create an object that links to the user”. (Or have a client generate the group from user attributes, but we shouldn’t shift responsibility to the client.)

Distributed Numeric Assignment is based on the AD RID model, but it’s all about “how can we assign a value to a user that’s unique?”. We already have a way to do this, in the UUID, so why not derive the UID/GID from the UUID? This means there is no complex inter-server communication or pooling, just simple isolated functionality.

We have lots of features that just are a bit complex, and could have been made simpler, that now we have to support, and can’t change to make them better. If we rolled a new “fixed” version, we would then have to support both because projects like FreeIPA aren’t going to just change over.

  • client tools are controlled by others and complex (sssd, openldap)

Every tool for dealing with ldap is really confusing and arcane. They all have wild (unhelpful) defaults, and generally this scares people off. It took me months of work to get a working ldap server in the past. Why? It’s 2018, things need to “just work”. Our tools should “just work”. Why should I need to hand edit pam? Why do I need to set weird options in SSSD.conf? All of this makes the whole experience poor.

We are making client tools that can help (to an extent), but they are really limited to system administration and they aren’t “generic” tools for every possible configuration that exists. So at some point people will still find a limit where they have to touch ldap commands. A common request is a simple to use web portal for password resets, which today only really exists in FreeIPA, and that limits its application already.

  • hard to change legacy

It’s really hard to make code changes because our surface area is so broad and the many use cases means that we risk breakage every time we do. I have even broken customer deployments like this. It’s almost impossible to get away from, and that holds us back because it means we are scared to make changes because we have to support the 1 million existing work flows. To add another is more support risk.

Many deployments use legacy schema elements that hold us back, ranging from the inet types, to schema that enforces a first/last name, to schema that won’t express users + groups in a simple way. It’s hard to ask people to just up and migrate their data, and even if we wanted to, ldap allows too much freedom, so we are more likely to break data than migrate it correctly if we tried.

This holds us back from technical changes, and social representation changes. People are more likely to engage with a large migrational change, than an incremental change that disturbs their current workflow (IE moving from on prem to cloud, rather than invest in smaller iterative changes to make their local solutions better).

  • ACI’s are really complex

389’s access controls are good because they are in the tree and replicated, but bad because the syntax is awful, complex, and has lots of traps and complexity. Even I need to look up how to write them when I have to. This is not good for a project that has such deep security concerns, where your ACI’s can look correct but actually expose all your data to risks.

  • LDAP as a protocol is like an 90’s drug experience

LDAP may be the lingua franca of authentication, but it’s complex, hard to use and hard to write implementations for. That’s why in opensource we have a monoculture of using the openldap client libraries because no one can work out how to write a standalone library. Layer on top the complexity of the object and naming model, and we have a situation where no one wants to interact with LDAP and rather keeps it at arm length.

It’s going to be extremely hard to move forward here, because the community is so fragmented and small, and the working groups dispersed that the idea of LDAPv4 is a dream that no one should pursue, even though it’s desperately needed.

  • TLS

TLS is great. NSS databases and tools are not.

  • GSSAPI + SSO

GSSAPI and Kerberos are a piece of legacy that we just can’t escape from. They are a curse almost, and one we need to break away from, as it’s completely unusable (even if what it promises is amazing). We need to do better.

That and SSO allows loads of attacks to proceed, where we actually want isolated token auth with limited access scopes …

What could we offer

  • Web application as a first class consumer.

People want web portals for their clients, and they want to be able to use web applications as the consumer of authentication. The HTTP protocols must be the first class integration point for anything in identity management today. This means using things like OAUTH/OIDC.

  • Systems security as a first class consumer.

Administrators still need to SSH to machines, and people still need their systems to have identities running on them. Having pam/nsswitch modules is a very major requirement, where those modules have to be fast, simple, and work correctly. Users should “imply” a private group, and UID/GID should by dynamic from UUID (or admins can override it).

  • 2FA/u2f/TOTP.

Multi-factor auth is here (not coming, here), and we are behind the game. We already have Apple and MS pushing for webauthn in their devices. We need to be there for these standards to work, and to support the next authentication tool after that.

  • Good RADIUS integration.

RADIUS is not going away, and is important in education providers and business networks, so RADIUS must “just work”. Importantly, this means mschapv2 which is the universal default for all clients to operate with, which means nthash.

However, we can make the nthash unlinked from your normal password, so you can then have a wifi password and a separate login password. We could even generate an NTHash containing the TOTP token for more high-security environments.

  • better data structure (flat, defined by object types).

The tree structure of LDAP is confusing, but a flatter structure is easier to manage and understand. We can use ideas from kubernetes like tags/labels which can be used to provide certain controls and filtering capabilities for searches and access profiles to apply to.

  • structured logging, with in built performance profiling.

Being able to diagnose why an operation is slow is critical and having structured logs with profiling information is key to allowing admins and developers to resolve performance issues at scale. It’s also critical to have auditing of every single change made in the system, including internal changes that occur during operations.

  • access profiles with auditing capability.

Access profiles that express what you can access, and how. Easier to audit, generate, and should be tightly linked to group membership for real RBAC style capabilities.

  • transactions by allowing batch operations.

LDAP wants to provide a transaction system over a set of operations, but that may cause performance issues on write paths. Instead, why not allow submission of batches of changes that all must occur “at the same time” or “none”. This is faster network wise, protocol wise, and simpler for a server to implement.

What’s next then …

Instead of fixing what we have, why not take the best of what we have, and offer something new in parallel? Start a new front end that speaks in an accessible way, that has modern structures, and has learnt from the lessons of the past? We can build it to standalone, or proxy from the robust core of 389 Directory Server allowing migration paths, but eschew the pain of trying to bring people to the modern world. We can offer something unique, an open source identity system that’s easy to use, fast, secure, that you can run on your terms, or in the cloud.

This parallel project seems like a good idea … I wonder what to name it …

December 20, 2018 02:00 PM

December 08, 2018

William Brown

Work around docker exec bug

Work around docker exec bug

There is currently a docker exec bug in Centos/RHEL 7 that causes errors such as:

# docker exec -i -t instance /bin/sh
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

As a work around you can use nsenter instead:

PID=$(docker inspect --format '{{.State.Pid}}' <name of container>)
nsenter --target $PID --mount --uts --ipc --net --pid /bin/sh

For more information, you can see the bugreport here.

December 08, 2018 02:00 PM

November 30, 2018

Fraser Tweedale

Diagnosing Dogtag cloning failures

Diagnosing Dogtag cloning failures

Sometimes, creating a Dogtag clone or a FreeIPA CA replica fails. I worked with Dogtag and FreeIPA for nearly five years. Over these years I’ve analysed a lot of these clone/replica installation failures and internalised a lot of knowledge about how cloning works, and how it can break. Often when I read a problem report and inspect the logs I quickly get a “gut feeling” about the cause. The purpose of this post is to externalise my internal intuition so that others can benefit. Whether you are an engineer or not, this post explains what you can do to get to the bottom of Dogtag cloning failures faster.

How Dogtag clones are created

Some notes about terminology: in FreeIPA we talk about replicas, but in Dogtag we say clones. These terms mean the same thing. When you create a FreeIPA CA replica, FreeIPA creates a clone of the Dogtag CA instance behind the scenes. I will use the term master to refer to the server from which the clone/replica is being created.

The pkispawn(8) program, depending on its configuration, can be used to create a new Dogtag subsystem or a clone. pkispawn, a Python program, manages the whole clone creation process, with the possible exception of setting up LDAP database and replication. But some stages of the configuration are handled by the Dogtag server itself (thus implemented in Java). Furthermore, the Dogtag server on the master must service some requests to allow the new clone to integrate into the topology.

The high level procedure of CA cloning is roughly:

  1. (ipa-replica-install) Create temporary Dogtag admin user account and add to relevant groups
  2. (ipa-replica-install or pkispawn) Establish LDAP replication of the Dogtag database
  3. (pkispawn) Extract private keys and system certificates into Dogtag’s NSSDB
  4. (pkispawn) Lay out the Dogtag instance on the filesystem
  5. (pkispawn) Start the pki-tomcatd instance
  6. (pkispawn) Send a configuration request to the new Dogtag instance
    1. (pki-tomcatd on clone) Send security domain login request to master (using temporary admin user credentials)
    2. (pki-tomcatd on master) Authenticate user, return cookie.
    3. (pki-tomcatd on clone) Send number range requests to master
    4. (pki-tomcatd on master) Service number range requests for clone
  7. (ipa-replica-install) remove temporary admin user account

There are several places a problem could occur: in pkispawn, pki-tomcatd on the clone, or pki-tomcatd on the master. Therefore, depending on what failed, the best data about the failure could be in pkispawn output/logs, the Dogtag debug log on the replica, or the master, or even the system journal on either of the hosts. Recommendation: when analysing Dogtag cloning or FreeIPA CA replica installation failures, inspect all of these logs. It is often not obvious where the error is occurring, or what caused it. Having all these log files helps a lot.

Case studies

Failure to set up replication

Description: ipa-replica-install or pkispawn fail with errors related to replication (failure to establish). I don’t know how common this is in production environments. I’ve encountered it in my development environments. I think it is usually caused by stale replication agreements or something of that nature.

Workaround: A “folk remedy”: uninstall and clean up the instance, then try again. Most often the error does not recur.

Replication races

Description: pkispawn fails; the replica debug log indicates a security domain login failure; the master debug log indicates that the user is unknown, or that the token/session is unknown.

During cloning, the clone adds LDAP objects in its own database. It then performs requests against the master, assuming that those objects (or effects of other LDAP operations) have been replicated to the master. Due to replication lag, the data have not been replicated and as a consequence, a request fails.

In the past couple of years several replication races were discovered and fixed (or mitigated) in Dogtag or FreeIPA:

updateNumberRange failure due to missing session object

Ticket: https://pagure.io/dogtagpki/issue/2557

Description: After security domain login (locally on the replica) the session object gets replicated to the master. The cookie/token conveyed in the updateNumberRange request referred to a session that the master did not yet know about.

Resolution: the replica sleeps (duration configurable; default 5s) after security domain login, giving time for replication. This is not guaranteed to avoid the problem: the complete solution (yet to be implemented) will be to use a signed/MACed token.

Security domain login failure due to missing user or group membership

Ticket: https://pagure.io/freeipa/issue/7593

Description: This bug was actually in FreeIPA, but manifested in pki-tomcatd on master as a failure to log into the security domain. This could occur for one of two reasons: either the user was unknown, or the user was not a member of a required group. FreeIPA performs the relevant LDAP operations on the replica, but they have not replicated to master yet. The pkispawn/ipa-replica-install error message looks something like:

com.netscape.certsrv.base.PKIException: Failed to obtain
installation token from security domain:
com.netscape.certsrv.base.UnauthorizedException: User
admin-replica1.ipa.example is not a member of Enterprise CA
Administrators group.

Workaround: no supported workaround. (You could hack in a sleep though).

Resolution: The user creation routine was already waiting for replication but the wait routine had a timeout bug causing false positives, and the group memberships were not being waited on. The timeout bug was fixed. The wait routine was enhanced to support waiting for particular attribute values; this feature was used to ensure group memberships have been replicated before continuing.

Other updateNumberRange failures

Ticket: https://pagure.io/dogtagpki/issue/3055

Description: When creating a clone from a master that was itself a clone, an updateNumberRange request fails at master with status 500. A NullPointerException backtrace appears in the journal for the pki-tomcatd@pki-tomcat unit (on master). The problem arises because the initial number range assignment for the second clone is equal to the range size of the first clone (range transfer size is a fixed number). This scenario was not handled correctly, leading to the exception.

Workaround: Ensure that each clone services one of each kind of number (e.g. one full certificate request and issuance operation). This ensures that the clone’s range is smaller than the range transfer size, so that a subsequent updateNumberRange request will be satisfied from the master’s “standby” range.

Resolution: detect range depletion due to updateNumberRange requests and eagerly switch to the standby range. A better fix (yet to be implemented) will be to allocate each clone a full-sized range from the unallocated numbers.

Discussion

Dogtag subsystem cloning is a complex procedure. Even more so in the FreeIPA context. There are lots of places failure can occur.

The case studies above are a few examples of difficult-to-debug failures where the cause was non-obvious. Often the failure occurs on a different host (the master) from the one where the error was observed. And the important data about the true cause may reside in ipareplica-install.log, the pkispawn log output, the Dogtag CA debug log (on replica or master) or the system journal (again on replica or master). Sometimes the 389DS logs can be helpful too.
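As a rough guide, these are the places I usually look on a default install. The exact paths vary a little between versions, so treat them as starting points rather than gospel:

/var/log/ipareplica-install.log          (replica installer log, on the replica)
/var/log/pki/pki-ca-spawn.*.log          (pkispawn log, on the replica)
/var/log/pki/pki-tomcat/ca/debug         (Dogtag CA debug log; check both hosts)
journalctl -u pki-tomcatd@pki-tomcat     (Dogtag/Tomcat journal; check both hosts)
/var/log/dirsrv/slapd-<REALM>/errors     (389DS error log)
/var/log/dirsrv/slapd-<REALM>/access     (389DS access log)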

Normally the fastest way to understand a problem is to gather all these sources of data and look at them all around the time the error occurred. When you see one failure, don’t assume that that is the failure. Cross-reference the log files. If you can’t see anything about an error, you probably need to look in a different file…

…or a different part of the file! It is important to note that Dogtag timestamps are in local time, whereas most other logs use UTC. Different machines in the topology can be in different timezones, so you could be dealing with up to three timezones across the log files. Check carefully which timezone the timestamps are in when you are “lining up” the logfiles. Many times I have seen others conclude incorrectly (and I have often erred this way myself) that “there is no error in the debug log” because of this trap.
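If you have GNU date available it can do the timezone arithmetic for you. For example, to express a local timestamp in UTC (the timestamp and +1000 offset here are made up for illustration):

$ date -u -d '2018-11-30 14:05:31 +1000'
Fri Nov 30 04:05:31 UTC 2018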

In my experience, the most common causes of Dogtag cloning failure have involved Security Domain authentication issues and number range management. Over time I and others have fixed several bugs in these areas, but I am not confident that all potential problems have been fixed. The good news is that checking all the relevant logs usually leads to a good theory about the root cause.

What if you are not an engineer or not able to make sense of the Dogtag codebase? (This is fine by the way—Dogtag is a huge, gnarly beast!) The best thing you can do to help us analyse and resolve the issue is to collect all the logs (from the master and replica) and prune them to the relevant timeframe (minding the timezones) before passing them to an engineer for analysis.

In this post I only looked at Dogtag cloning failures. I have lots of other Dogtag “gut knowledge” that I plan to get out in upcoming posts.

November 30, 2018 12:00 AM

November 27, 2018

Rob Crittenden

Setting up a Mac (OSX) as an IPA client

I periodically see people trying to set up a Mac running OSX as an IPA client. I don’t have one myself so I can’t really provide assistance.

There is this guide which seems to be pretty thorough, https://linuxguideandhints.com/centos/freeipa.html#mac-clients

This upstream ticket also has some information on setting up a client, though it isn’t always directly related to simply configuring a client, https://pagure.io/freeipa/issue/4813

So I record this here so I know where to look later 🙂

by rcritten at November 27, 2018 10:10 PM

November 26, 2018

Rob Crittenden

How do I get a certificate for my web site with IPA?

That’s a bit of a loaded question that raises additional questions:

  1. Is the web server enrolled as an IPA client?
  2. What format does the private key and certificate need to be in? (OpenSSL-style PEM, NSS, other?)

If the answer to question 1 is YES then you can do this on that client machine (to be clear, this first step can be done anywhere or in the UI):

$ kinit admin
$ ipa service-add HTTP/webserver.example.com

You can use certmonger to request and manage the certificate which includes renewals (bonus!).

If you are using NSS with, say, mod_nss, then after creating the database and/or putting its password into /etc/httpd/alias/pwdfile.txt you’d do something like:

# ipa-getcert request -K HTTP/webserver.example.com -d /etc/httpd/alias -n MyWebCert -p /etc/httpd/alias/pwdfile.txt -D webserver.example.com

Let’s break down what these options mean:

  • -K is the Kerberos principal to store the certificate with. You do NOT need to get a keytab for this service
  • -d the NSS database directory. You can use whatever you want, but be sure the service has read access and the appropriate SELinux context/permissions on it.
  • -n the NSS nickname. This is just a shortcut name to your cert, use what you want.
  • -p the path to the pin/password for the NSS database
  • -D creates a DNS SAN for the hostname webserver.example.com. This is current best practice.

You’ll also need to add the IPA cert chain to the NSS database using certutil.
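For example, assuming the IPA CA certificate is in its default location on an enrolled client (/etc/ipa/ca.crt), something like this should do it:

# certutil -A -d /etc/httpd/alias -n 'IPA CA' -t 'CT,,' -a -i /etc/ipa/ca.crt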

If you are using OpenSSL and, say, mod_ssl, you’d do something like:

# ipa-getcert request -K HTTP/webserver.example.com -k /etc/pki/tls/private/httpd.key -f /etc/pki/tls/certs/httpd.pem -D webserver.example.com

The options are similar to above, but instead of -d and -n we use -k and -f:

  • -k path to the key file
  • -f path to the certificate file
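Once the certificate is issued you can point mod_ssl at those files; a minimal sketch of the relevant directives, assuming the paths used above:

SSLEngine on
SSLCertificateFile /etc/pki/tls/certs/httpd.pem
SSLCertificateKeyFile /etc/pki/tls/private/httpd.key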

To check on the status of your new request you can run:

# ipa-getcert list -i <request id that was printed earlier>

It should be in status MONITORING.

If the answer to #1 is NO then you have two options: use certmonger on a different machine to generate the key and certificate and then transfer them to the target, or generate a CSR manually.

For the first case, using certmonger on a different machine, the steps are similar to the YES case.

Create a host and service for the web server:

$ kinit admin
$ ipa host-add webserver.example.com
$ ipa service-add HTTP/webserver.example.com

Now we need to grant the current machine the right to get certificates for the HTTP service of webserver.example.com:

$ ipa service-add-host --hosts <your current machine FQDN> HTTP/webserver.example.com

Now run the appropriate ipa-getcert command above to match the format you need and check the status in a similar way.

Once it’s done you need to transfer the cert and key to the webserver.
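For example, if you used the OpenSSL-style paths above, the transfer could look like this (adjust paths and user as needed, and keep the key readable only by root):

# scp /etc/pki/tls/certs/httpd.pem root@webserver.example.com:/etc/pki/tls/certs/
# scp /etc/pki/tls/private/httpd.key root@webserver.example.com:/etc/pki/tls/private/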

Finally, if you want to get certificates on an un-enrolled system the basic steps are:

  1. Create a host entry and service as above
  2. Generate a CSR, see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/linux_domain_identity_authentication_and_policy_guide/certificates#requesting-cert-certutil (or the next section); a minimal example follows after this list
  3. Submit that CSR per the above docs
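For step 2, here is a minimal sketch using OpenSSL to generate a key and CSR, and the ipa CLI to submit it (the -addext flag needs OpenSSL 1.1.1 or later; the file names are just examples):

$ openssl req -new -newkey rsa:2048 -nodes \
    -keyout webserver.key -out webserver.csr \
    -subj '/CN=webserver.example.com' \
    -addext 'subjectAltName=DNS:webserver.example.com'
$ ipa cert-request webserver.csr --principal=HTTP/webserver.example.com

The issued certificate appears in the command output, and it can also be retrieved later with ipa cert-show.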

If your webserver is not registered in DNS then you can use the --force option to host-add and service-add to force their creation.

This should pretty generically apply to all versions of IPA v4+, and probably to v3 as well.

 

by rcritten at November 26, 2018 09:48 PM
