GitLabPage YML Breakdown

Andrew Schwabe

2021-01-27

In this post, I am going to skip establishing an understanding of CI/CD (continuous integration / continuous deployment) and talk about one specific example instead. There is a benefit in understanding the why; however, sometimes we just need to get something done, regardless of our understanding.

First of all, the full configuration file:

image: node:10.15.3

cache:
  paths:
    - node_modules/

before_script:
  - npm install hexo-cli -g
  - test -e package.json && npm install
  - hexo generate

pages:
  script:
    - hexo generate
  artifacts:
    paths:
      - public
  only:
    - master

`image` Breakdown

image: node:10.15.3

To build a project on a build agent, specifically one we do not own, we must first define the environment and dependencies needed to build the project. Remember, this is not our local development environment anymore, so we have to define everything. The image property lets us specify which Docker image the build agent runs the job in. Luckily, in this example the build agent configuration is quite simple: node:10.15.3 is readily available on Docker Hub.
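As a quick sanity check that the job really runs inside that image, the toolchain versions can be printed before anything else happens. This is only a sketch and not part of the original file; the two version commands are my addition, and before_script itself is covered further down.

```yaml
image: node:10.15.3

before_script:
  # Added for illustration: print the versions the image provides
  - node --version
  - npm --version
```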

`cache` Breakdown

cache:
  paths:
    - node_modules/

When building a project, the project is likely to reference similar, if not the same, items between builds. To save on repeat download time and speed up the build, we want to store these repeat files somewhere off to the side for the next build. For this we use a cache. In the configuration, we define the path we want to cache as node_modules/ and everything under it. The first build that is executed will prime the cache with all the required node modules. As the node modules change over time, the cache will be updated, so later builds only need to fetch the newly required or changed node modules.
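The configuration above does not set a cache key. As a hedged sketch, a key could be added so that each branch keeps its own cache. The key line is my assumption and not part of the original file; CI_COMMIT_REF_SLUG is one of GitLab's predefined CI variables.

```yaml
cache:
  # Assumption: scope the cache per branch instead of sharing one cache
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
```

Whether a per-branch cache is worth it depends on how often dependencies differ between branches; for a single-branch site like this one, the original version is probably enough.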

`before_script` Breakdown

before_script:
  - npm install hexo-cli -g
  - test -e package.json && npm install
  - hexo generate

Now we have a brand new build agent, and we have told it there is a cache to use. Next, we need to prepare the build agent to build. This is where before_script comes in. We pass in an array of commands to be executed in the order they appear. First, it installs the Hexo CLI tools needed to generate the static files. Then, the build agent tests for the existence of a package.json file. If the file exists, the build agent installs all the necessary node modules. Finally, the build agent generates the static site from the repository. The output of the static site is referred to as a deployment artifact. This is what will be deployed to GitLab Pages in the final step.

`pages` Breakdown

pages:
  script:
    - hexo generate
  artifacts:
    paths:
      - public
  only:
    - master

The final step in the process is deployment to the web server. pages is a special job name in the yml file, unique to GitLab Pages. It gives the build agent the details needed to build and deploy to the GitLab Pages web server(s). The first key is script, which defines the command to execute to build the deployment artifact. Wait, didn't we just do this in the previous step? Funny story, yeah, we did. I will get to this later with a clearer explanation, so as not to take away from the deployment configuration. Now that we have an artifact to deploy, we need to define which directory should be deployed onto the web server. That is done by defining artifacts > paths. If you have ever executed the hexo generate command locally, you will have seen that the output is placed in a directory named public. Finally, only limits which branch or branches will trigger the execution. In my case, master is the only branch. However, if we had other branches for experimentation, I might add additional steps here to deploy the experiments to some other location.
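As a rough sketch of a first step in that direction, a second job could build every branch except master without publishing anything. The job name test and the except rule are assumptions for illustration; nothing like this exists in the original file.

```yaml
# Hypothetical extra job: build on experiment branches, but do not deploy
test:
  script:
    - hexo generate
  except:
    - master

# Unchanged: only master is published to GitLab Pages
pages:
  script:
    - hexo generate
  artifacts:
    paths:
      - public
  only:
    - master
```

This only verifies that an experiment branch still builds. Actually deploying it to some other location would need more than what is shown here.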

Why is `hexo generate` executed twice

As mentioned above, hexo generate is executed twice: first as part of before_script, then as part of pages > script. But why? At first, this was not clear to me. Remember, this is all pretty new, so this seems like a learn-by-experiment opportunity.
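To make the duplication concrete, here is the original configuration again with the order of execution for the pages job written out as comments. The numbering is my reading of the file, added for illustration only.

```yaml
before_script:
  - npm install hexo-cli -g               # 1. runs before the pages script
  - test -e package.json && npm install   # 2. install dependencies if present
  - hexo generate                         # 3. first site generation

pages:
  script:
    - hexo generate                       # 4. second site generation
  artifacts:
    paths:
      - public
  only:
    - master
```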

Experiment #1 - Remove the command from `before_script`

before_script:
  - npm install hexo-cli -g
  - test -e package.json && npm install

pages:
  script:
    - hexo generate
  artifacts:
    paths:
      - public
  only:
    - master

Removing the execution of hexo generate from before_script is a logical separation. Now, before_script has the sole responsibility of preparing the build agent for the build and deployment. After committing this change, everything worked as expected. It even took ~5 seconds off the total pipeline time. Not bad, but is this the only solution?
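Put together, the whole file after this experiment would look roughly like the following. It is simply the original configuration with the one line removed; the comments are mine.

```yaml
image: node:10.15.3

cache:
  paths:
    - node_modules/

# Prepare the build agent only; no site generation here
before_script:
  - npm install hexo-cli -g
  - test -e package.json && npm install

# Generate the site once and publish the public directory
pages:
  script:
    - hexo generate
  artifacts:
    paths:
      - public
  only:
    - master
```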

Experiment #2 - Remove the command from `pages` > `script`

before_script:
  - npm install hexo-cli -g
  - test -e package.json && npm install
  - hexo generate

pages:
  artifacts:
    paths:
      - public
  only:
    - master

The above was an immediate failure. It turns out script is required under the pages configuration. Time to roll the yml configuration back to experiment #1.

This is great: in detailing the yml, I found an unnecessary step in the configuration. I guess this is a prime opportunity to contribute to the open-source community. I am opening a pull request to the example project; hopefully, it will be accepted.

For additional resources, check out this reference.
