# AWS Authentication Implementation Notes
## AWS Account
Per [its documentation](https://docs.aws.amazon.com/STS/latest/APIReference/API_GetCallerIdentity.html),
the GetCallerIdentity API call that the server makes to STS to authenticate
the user using the MONGODB-AWS auth mechanism requires no privileges. This means
that to test authentication using non-temporary credentials (i.e., an
AWS access key id and secret access key only), it is sufficient to create an
IAM user that has no permissions but does have programmatic access enabled
(i.e., has an access key id and secret access key).
## AWS Signature V4
The driver implements the AWS signature v4 internally rather than relying on
a third-party library (such as the
[AWS SDK for Ruby](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/index.html))
to provide the signature implementation. The implementation is quite compact
but getting it working took some effort due to:
1. [The server not logging AWS responses when authentication fails
](https://jira.mongodb.org/browse/SERVER-46909)
2. Some of the messages from STS being quite cryptic (I could not figure out
what the problem was for either "Request is missing Authentication Token" or
"Request must contain a signature that conforms to AWS standards", and
ultimately resolved these problems by comparing my requests to those produced
by the AWS SDK).
3. Amazon's own documentation not providing an example signature calculation
that could be followed to verify correctness, especially since this is a
multi-step process and all kinds of subtle errors are possible in many of the
steps, such as using a date instead of a time, hex-encoding a MAC in an
intermediate step, or not separating header values from the list of signed
headers by two newlines.
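
To make two of the pitfalls above concrete, here is a minimal sketch of the signing key derivation as described in Amazon's SigV4 documentation. It is not the driver's actual code; the method names and the secret value are hypothetical. The intermediate MACs are passed along as raw bytes, and only the short date (not the full timestamp) enters the key derivation.

```ruby
require 'openssl'

# Derives the SigV4 signing key. Each intermediate HMAC result is passed on
# as raw bytes; hex-encoding any of them produces a different (wrong) key.
def sigv4_signing_key(secret_access_key, date_stamp, region, service)
  # date_stamp is the short date (YYYYMMDD), not the full timestamp.
  k_date    = OpenSSL::HMAC.digest('SHA256', "AWS4#{secret_access_key}", date_stamp)
  k_region  = OpenSSL::HMAC.digest('SHA256', k_date, region)
  k_service = OpenSSL::HMAC.digest('SHA256', k_region, service)
  OpenSSL::HMAC.digest('SHA256', k_service, 'aws4_request')
end

# Only the final signature is hex-encoded.
def sigv4_signature(signing_key, string_to_sign)
  OpenSSL::HMAC.hexdigest('SHA256', signing_key, string_to_sign)
end

now = Time.now.utc
timestamp  = now.strftime('%Y%m%dT%H%M%SZ') # x-amz-date and string to sign
date_stamp = now.strftime('%Y%m%d')         # credential scope and key derivation

# 'hypothetical-secret' stands in for a real secret access key.
signing_key = sigv4_signing_key('hypothetical-secret', date_stamp, 'us-east-1', 'sts')
puts sigv4_signature(signing_key, "placeholder string to sign for #{timestamp}")
```

Keeping the timestamp and the date stamp as two separately named values makes it harder to pass one where the other is expected.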
### Reference Implementation - AWS SDK
To see actual working STS requests I used Amazon's
[AWS SDK for Ruby](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/index.html)
([API docs for STS client](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/STS/Client.html),
[configuration documentation](https://docs.aws.amazon.com/sdk-for-ruby/v3/developer-guide/setup-config.html))
as follows:
1. Set the credentials in the environment (note that the region must be
explicitly provided):

        export AWS_ACCESS_KEY_ID=AKIAREALKEY
        export AWS_SECRET_ACCESS_KEY=Sweee/realsecret
        export AWS_REGION=us-east-1
2. Install the correct gem and launch IRb:

        gem install aws-sdk-core
        irb -raws-sdk-core -Iaws/sts
3. Send a GetCallerIdentity request, as used by the MongoDB server:

        Aws::STS::Client.new(
          logger: Logger.new(STDERR, level: :debug),
          http_wire_trace: true,
        ).get_caller_identity
This call enables HTTP request and response logging and produces output
similar to the following:

    opening connection to sts.amazonaws.com:443...
    opened
    starting SSL for sts.amazonaws.com:443...
    SSL established, protocol: TLSv1.2, cipher: ECDHE-RSA-AES128-SHA
    <- "POST / HTTP/1.1\r\nContent-Type: application/x-www-form-urlencoded; charset=utf-8\r\nAccept-Encoding: \r\nUser-Agent: aws-sdk-ruby3/3.91.1 ruby/2.7.0 x86_64-linux aws-sdk-core/3.91.1\r\nHost: sts.amazonaws.com\r\nX-Amz-Date: 20200317T194745Z\r\nX-Amz-Content-Sha256: ab821ae955788b0e33ebd34c208442ccfc2d406e2edc5e7a39bd6458fbb4f843\r\nAuthorization: AWS4-HMAC-SHA256 Credential=AKIAREALKEY/20200317/us-east-1/sts/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=6cd3a60a2d7dfba0dcd17f9c4c42d0186de5830cf99545332253a327bba14131\r\nContent-Length: 43\r\nAccept: */*\r\n\r\n"
    -> "HTTP/1.1 200 OK\r\n"
    -> "x-amzn-RequestId: c56f5d68-8763-4032-a835-fd95efd83fa6\r\n"
    -> "Content-Type: text/xml\r\n"
    -> "Content-Length: 401\r\n"
    -> "Date: Tue, 17 Mar 2020 19:47:44 GMT\r\n"
    -> "\r\n"
    reading 401 bytes...
    -> ""
    -> "\n \n arn:aws:iam::5851234356:user/test\n AIDAREALUSERID\n 5851234356\n \n \n c56f5d68-8763-4032-a835-fd95efd83fa6\n \n\n"
    read 401 bytes
    Conn keep-alive
    I, [2020-03-17T15:47:45.275421 #9815] INFO -- : [Aws::STS::Client 200 0.091573 0 retries] get_caller_identity()
    => #

Note that:
1. The set of headers sent by the AWS SDK differs from the set
of headers that the MONGODB-AWS auth mechanism specification mentions.
I used the AWS SDK implementation as a guide to determine the correct shape
of the request to STS and in particular the `Authorization` header.
The source code of Amazon's implementation is
[here](https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sigv4/lib/aws-sigv4/signer.rb)
and it generates, in particular, the `x-amz-content-sha256` header
which the MONGODB-AWS auth mechanism specification does not mention.
2. This is a working request which can be replayed, making it possible
to send this request that was created by the AWS SDK repeatedly with minor
alterations to study STS error reporting behavior. STS as of this writing
allows a 15 minute window during which a request may be replayed.
3. The printed request only shows the headers and not the request body.
In case of the GetCallerIdentity, the payload is fixed and is the same as
what the MONGODB-AWS auth mechanism specification requires
(`Action=GetCallerIdentity&Version=2011-06-15`).
Because the AWS SDK includes a different set of headers in its requests,
it is not feasible to compare the canonical requests generated by the AWS SDK
verbatim to the canonical requests generated by the driver.
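
Although the canonical requests cannot be compared directly, the body-derived values can. The following sketch (variable names are illustrative) computes the `Content-Length` and the payload hash for the fixed GetCallerIdentity body. Assuming the SDK hashes the request body to produce `X-Amz-Content-Sha256`, the printed hash should match the value in the trace above.

```ruby
require 'digest'

# The fixed request body required by the MONGODB-AWS specification.
sts_body = 'Action=GetCallerIdentity&Version=2011-06-15'

# Byte length of the body, as sent in the Content-Length header.
content_length = sts_body.bytesize

# Lowercase hex SHA-256 of the body, as used in the canonical request
# (and in the X-Amz-Content-Sha256 header when that header is sent).
content_sha256 = Digest::SHA256.hexdigest(sts_body)

puts "Content-Length: #{content_length}"
puts "X-Amz-Content-Sha256: #{content_sha256}"
```

The first line reads `Content-Length: 43`, which agrees with the header in the SDK trace.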
### Manual Requests
It is possible to manually send requests to STS using the OpenSSL `s_client`
tool in combination with the [printf](https://linux.die.net/man/3/printf)
utility to transform the newline escapes. A sample command replaying the
request printed above is as follows:

    (printf "POST / HTTP/1.1\r\nContent-Type: application/x-www-form-urlencoded; charset=utf-8\r\nAccept-Encoding: \r\nUser-Agent: aws-sdk-ruby3/3.91.1 ruby/2.7.0 x86_64-linux aws-sdk-core/3.91.1\r\nHost: sts.amazonaws.com\r\nX-Amz-Date: 20200317T194745Z\r\nX-Amz-Content-Sha256: ab821ae955788b0e33ebd34c208442ccfc2d406e2edc5e7a39bd6458fbb4f843\r\nAuthorization: AWS4-HMAC-SHA256 Credential=AKIAREALKEY/20200317/us-east-1/sts/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=6cd3a60a2d7dfba0dcd17f9c4c42d0186de5830cf99545332253a327bba14131\r\nContent-Length: 43\r\nAccept: */*\r\n\r\n" &&
    echo "Action=GetCallerIdentity&Version=2011-06-15" &&
    sleep 5) | openssl s_client -connect sts.amazonaws.com:443

Note the sleep call - `s_client` does not wait for the remote end to provide
a response before exiting, thus the sleep on the input side allows 5 seconds
for STS to process the request and respond.
For reference, Amazon provides [GetCallerIdentity API documentation
](https://docs.aws.amazon.com/STS/latest/APIReference/API_GetCallerIdentity.html).
### Integration Test - Signature Generation
The Ruby driver includes an integration test for signature generation, in
which the driver calls the STS `GetCallerIdentity` endpoint using the
provided AWS credentials. This test is in
`spec/integration/aws_auth_request_spec.rb`.
### STS Error Responses
The error responses produced by STS sometimes do not clearly indicate the
problem. Below are some of the puzzling responses I encountered:
- *Request is missing Authentication Token*: request is missing the
`Authorization` header, or the value of the header does not begin with
`AWS4-`. For example, this error is produced if the signature algorithm
is erroneously given as `AWS-HMAC-SHA256` instead of `AWS4-HMAC-SHA256`
with the remainder of the header value being correctly constructed.
This error is also produced if the value of the header erroneously includes
the name of the header (i.e. the header name is specified twice in the header
line) but the value is otherwise completely valid. This error has no relation
to the "session token" or "security token" as used with temporary AWS
credentials.
- *The security token included in the request is invalid*: this error can be
produced in several circumstances:
  - When the AWS access key id, as specified in the scope part of the
    `Authorization` header, is not a valid access key id. In the case of
    non-temporary credentials being used for authentication, the error refers to
    a "security token" but the authentication process does not actually use a
    security token as this term is used in the AWS documentation describing
    temporary credentials.
  - When using temporary credentials and the security token is not provided
    in the STS request at all (`x-amz-security-token` header).
- *Signature expired: 20200317T000000Z is now earlier than 20200317T222541Z
(20200317T224041Z - 15 min.)*: this error happens when the `x-amz-date` header
value is the formatted date (`YYYYMMDD`) rather than the ISO8601 formatted
time (`YYYYMMDDTHHMMSSZ`). Note that the string `20200317T000000Z` is never
explicitly provided in the request - it is derived by AWS from the provided
header `x-amz-date: 20200317`.
- *The request signature we calculated does not match the signature
you provided. Check your AWS Secret Access Key and signing method. Consult
the service documentation for details*: this is the error produced when
the signature is not calculated correctly but everything else in the
request is valid. If a different error is produced, most likely the problem
is in something other than signature calculation.
- *The security token included in the request is expired*: this error is
produced when temporary credentials are used and the credentials have
expired.
See also [AWS documentation for STS error messages](https://docs.aws.amazon.com/STS/latest/APIReference/CommonErrors.html).
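
The observations in the list above can be condensed into a small lookup helper for use while debugging. This is only a sketch: the method names are hypothetical, the table records just the causes described in these notes, and STS may produce the same messages for other reasons.

```ruby
# Maps a fragment of an STS error message to the causes noted above.
# Illustrative only; the table is not exhaustive.
def sts_error_hints
  {
    'Request is missing Authentication Token' =>
      'Authorization header is absent, does not begin with "AWS4-", ' \
      'or repeats the header name inside its value',
    'The security token included in the request is invalid' =>
      'access key id is not valid, or x-amz-security-token is missing ' \
      'while using temporary credentials',
    'Signature expired' =>
      'x-amz-date may hold a date (YYYYMMDD) instead of a time ' \
      '(YYYYMMDDTHHMMSSZ), or the request is older than the allowed window',
    'The request signature we calculated does not match' =>
      'signature calculation is wrong; the rest of the request is likely valid',
    'The security token included in the request is expired' =>
      'temporary credentials have expired',
  }
end

# Returns the hint for the first matching fragment, or a fallback string.
def sts_error_hint(message)
  _, hint = sts_error_hints.find { |fragment, _| message.include?(fragment) }
  hint || 'no hint recorded for this message'
end

# True when the value has the time shape expected in x-amz-date.
def amz_timestamp?(value)
  value.match?(/\A\d{8}T\d{6}Z\z/)
end

puts sts_error_hint('Request is missing Authentication Token')
puts amz_timestamp?('20200317')         # date only
puts amz_timestamp?('20200317T194745Z') # full time
```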
### Resources
Generally I found Amazon's own documentation to be the best for implementing
the signature calculation. The following documents should be read in order:
- [Signing AWS requests overview](https://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html)
- [Creating canonical request](https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html)
- [Creating string to sign](https://docs.aws.amazon.com/general/latest/gr/sigv4-create-string-to-sign.html)
- [Calculating signature](https://docs.aws.amazon.com/general/latest/gr/sigv4-calculate-signature.html)
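
As a companion to these documents, the following is a minimal sketch of the canonical request and string to sign for a `POST /` request with no query string, which is the shape of the GetCallerIdentity call. It follows my reading of the Amazon documents listed above and is not the driver's code; the method names and the header set used in the example are illustrative.

```ruby
require 'digest'

# Builds the canonical request for "POST /" with an empty query string.
# `headers` is a Hash of header name => value. Returns the canonical
# request and the signed headers list.
def canonical_request(headers, body)
  normalized = headers
    .map { |name, value| [name.downcase, value.to_s.strip.gsub(/\s+/, ' ')] }
    .sort_by(&:first)
  # Every header line ends with a newline, including the last one.
  canonical_headers = normalized.map { |name, value| "#{name}:#{value}\n" }.join
  signed_headers = normalized.map(&:first).join(';')
  request = [
    'POST',
    '/',
    '',                # empty canonical query string
    canonical_headers, # trailing "\n" plus the join yields two newlines
    signed_headers,
    Digest::SHA256.hexdigest(body),
  ].join("\n")
  [request, signed_headers]
end

# Builds the string to sign; `timestamp` is YYYYMMDDTHHMMSSZ and the
# credential scope uses only its date portion.
def string_to_sign(timestamp, region, service, canonical_request)
  scope = "#{timestamp[0, 8]}/#{region}/#{service}/aws4_request"
  [
    'AWS4-HMAC-SHA256',
    timestamp,
    scope,
    Digest::SHA256.hexdigest(canonical_request),
  ].join("\n")
end

headers = {
  'Host' => 'sts.amazonaws.com',
  'X-Amz-Date' => '20200317T194745Z',
  'Content-Type' => 'application/x-www-form-urlencoded',
}
request, signed = canonical_request(headers, 'Action=GetCallerIdentity&Version=2011-06-15')
puts request
puts '---'
puts string_to_sign('20200317T194745Z', 'us-east-1', 'sts', request)
```

Printing the canonical request makes the blank line between the header values and the signed headers list visible, which is one of the details that is easy to get wrong.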
### Signature Debugger
The most excellent [awssignature.com](http://www.awssignature.com/) was
indispensable in debugging the actual signature calculation process.
### MongoDB Server
The MongoDB server internally defines the set of headers that it is prepared
to handle when it is processing AWS authentication. Headers that are not part
of that set cause the server to reject the driver's payloads.
The error reporting when additional headers are provided and when the
correct set of headers is provided but the headers are not ordered
lexicographically [can be misleading](https://jira.mongodb.org/browse/SERVER-47488).
## Direct AWS Requests
[STS GetCallerIdentity API docs](https://docs.aws.amazon.com/STS/latest/APIReference/API_GetCallerIdentity.html)

When making direct requests to AWS, adding the `Accept: application/json`
header makes AWS return the results, including errors, in JSON format.
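
A minimal sketch of such a request using Ruby's standard `Net::HTTP` follows. The method names are hypothetical, and the request is deliberately left unsigned, so sending it should yield an STS error rather than an identity. That is still enough to inspect the JSON error format. No particular JSON structure is assumed here.

```ruby
require 'net/http'
require 'json'

# Builds an unsigned GetCallerIdentity request that asks for JSON output.
# A real request would also carry x-amz-date and Authorization headers.
def build_sts_json_request
  req = Net::HTTP::Post.new('/')
  req['Content-Type'] = 'application/x-www-form-urlencoded'
  req['Accept'] = 'application/json'
  req.body = 'Action=GetCallerIdentity&Version=2011-06-15'
  req
end

# Sends the request to the global STS endpoint and parses the JSON body.
# Not invoked here because it needs network access.
def send_sts_json_request(req)
  response = Net::HTTP.start('sts.amazonaws.com', 443, use_ssl: true) do |http|
    http.request(req)
  end
  [response.code, JSON.parse(response.body)]
end

req = build_sts_json_request
puts req['Accept']
```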
## AWS CLI
[Configuration reference](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html)
Note that the AWS CLI uses the `AWS_DEFAULT_REGION` environment variable to
configure the region used for operations.
## AWS Ruby SDK
[Configuration reference](https://docs.aws.amazon.com/sdk-for-ruby/v3/developer-guide/setup-config.html)
Note that the AWS Ruby SDK uses the `AWS_REGION` environment variable to
configure the region used for operations.
[STS::Client#assume_role documentation](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/STS/Client.html#assume_role-instance_method)
## IMDSv2
`X-aws-ec2-metadata-token-ttl-seconds` is a required header on the PUT
request that obtains a session token for IMDSv2 EC2 instance metadata
requests. This header is used in the examples on [Amazon's page describing
IMDSv2](https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/),
but is not explicitly stated as being required.
PUT requests that omit this header fail with HTTP status code 400.
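
The token exchange can be sketched with `Net::HTTP` as follows. The method names are hypothetical. The endpoint address `169.254.169.254`, the paths and the `X-aws-ec2-metadata-token` header come from Amazon's IMDSv2 documentation rather than from these notes, and the 30 second TTL is an arbitrary choice.

```ruby
require 'net/http'

# Builds the PUT request that obtains a session token. The TTL header is
# the one whose absence produces an HTTP 400 response.
def imds_token_request(ttl_seconds = 30)
  req = Net::HTTP::Put.new('/latest/api/token')
  req['X-aws-ec2-metadata-token-ttl-seconds'] = ttl_seconds.to_s
  req
end

# Builds a metadata GET request that presents the session token.
def imds_get_request(path, token)
  req = Net::HTTP::Get.new(path)
  req['X-aws-ec2-metadata-token'] = token
  req
end

# Fetches the name of the role attached to the instance. Only works on an
# EC2 instance, so it is defined but not invoked here.
def fetch_imds_role_name(host = '169.254.169.254')
  Net::HTTP.start(host, 80, open_timeout: 2, read_timeout: 2) do |http|
    token = http.request(imds_token_request).body
    path = '/latest/meta-data/iam/security-credentials/'
    http.request(imds_get_request(path, token)).body
  end
end

puts imds_token_request['X-aws-ec2-metadata-token-ttl-seconds']
```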
## IAM Roles For EC2 Instances
### Metadata Rate Limit
[Amazon documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#instancedata-throttling)
states that the EC2 instance metadata endpoint is rate limited. Since the
driver accesses it to obtain credentials whenever a connection is established,
rate limits may adversely affect the driver's ability to establish connections.
### Instance Profile Assignment
It can take over 5 seconds for an instance to see its instance profile change
reflected in the instance metadata. Evergreen test runs seem to experience
this delay to a significantly larger extent than testing in a standalone
AWS account.
## IAM Roles For ECS Tasks
### ECS Task Roles
When an ECS task (or more precisely, the task definition) is created,
it is possible to specify an *execution role* and a *task role*. The two are
completely separate; an execution role is required to, for example, be
able to send container logs to CloudWatch if the container is running in
Fargate, and a task role is required for AWS authentication purposes.
The ECS task role is also separate from the EC2 instance role and from the
IAM role that a user assumes - these roles all require different configuration.
### `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` Scope
As stated in [this Amazon support document](https://aws.amazon.com/premiumsupport/knowledge-center/ecs-iam-task-roles-config-errors/),
the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` environment variable is only
available to the PID 1 process in the container. Other processes need to
extract it from PID 1's environment:

    strings /proc/1/environ

### Other ECS Metadata
`strings /proc/1/environ` also shows a number of other environment
variables available in the container with metadata. For example, a test
container yields:

    HOSTNAME=f893c90ec4bd
    ECS_CONTAINER_METADATA_URI=http://169.254.170.2/v3/5fb0b11b-c4c8-4cdb-b68b-edf70b3f4937
    AWS_DEFAULT_REGION=us-east-2
    AWS_EXECUTION_ENV=AWS_ECS_FARGATE
    AWS_REGION=us-east-2
    AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/f17b5770-9a0d-498c-8d26-eea69f8d0924

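
The same extraction can be done from Ruby. The sketch below reads the NUL-separated environment of PID 1 from procfs and builds the credentials URL. The method names are hypothetical, the process needs permission to read PID 1's environment, and the `169.254.170.2` host is taken from Amazon's ECS documentation (it also appears in the metadata URI above).

```ruby
# Parses a procfs environment file (NUL-separated NAME=value entries)
# into a Hash. Defaults to the environment of PID 1.
def read_process_environment(path = '/proc/1/environ')
  File.binread(path).split("\0").each_with_object({}) do |entry, env|
    name, value = entry.split('=', 2)
    env[name] = value unless name.nil? || name.empty?
  end
end

# Builds the ECS credentials URL from the relative URI, or returns nil
# when the variable is not present.
def ecs_credentials_url(env)
  relative = env['AWS_CONTAINER_CREDENTIALS_RELATIVE_URI']
  relative && "http://169.254.170.2#{relative}"
end

if File.readable?('/proc/1/environ')
  puts ecs_credentials_url(read_process_environment).inspect
end
```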
### Metadata Rate Limit
[Amazon documentation](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/troubleshoot-task-iam-roles.html)
states that the ECS task metadata endpoint is subject to rate limiting,
which is configured via the [ECS_TASK_METADATA_RPS_LIMIT container agent
parameter](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html).
When the rate limit is reached, requests fail with the `429 Too Many Requests`
HTTP status code.
Since the driver accesses this endpoint to obtain credentials whenever
a connection is established, rate limits may adversely affect the driver's
ability to establish connections.
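
One possible mitigation, shown here only as a sketch and not as something the driver is known to do, is to retry throttled requests with exponential backoff. The helper name and its parameters are hypothetical. The block is expected to return an object that responds to `code` with a string status, as `Net::HTTPResponse` does.

```ruby
# Calls the block until it returns a response whose status is not 429 or
# the attempts run out, doubling the delay after each throttled attempt.
# The last response is returned either way, so callers still see a 429
# if every attempt was throttled.
def with_backoff(attempts: 5, base_delay: 0.1, sleeper: Kernel.method(:sleep))
  attempts.times do |i|
    response = yield
    last_attempt = (i == attempts - 1)
    return response if response.code != '429' || last_attempt
    sleeper.call(base_delay * (2**i))
  end
end

# Example with a stand-in response type instead of a real HTTP call.
fake_response = Struct.new(:code)
codes = %w[429 429 200]
result = with_backoff(sleeper: ->(_delay) {}) { fake_response.new(codes.shift) }
puts result.code
```

Passing the sleep function as a parameter keeps the helper testable without real delays.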