Puppet Plan: patching

Defined in:
plans/init.pp

Summary

Our generic and semi-opinionated workflow.

Overview

It serves as a showcase of how all of the building blocks in this module can be tied together to create a full-blown patching workflow. It is a good starting point for patching servers, and we fully expect others to take this workflow as a building block and customize it to meet their needs.

Examples:

CLI - Basic usage

bolt plan run patching --targets linux_patching,windows_patching

CLI - Disable snapshot creation, because an old patching run failed and we have an old snapshot to rely on

bolt plan run patching --targets linux_patching,windows_patching snapshot_create=false

CLI - Disable snapshot deletion, because we want to wait for app teams to test.

bolt plan run patching --targets linux_patching,windows_patching snapshot_delete=false

# sometime in the future, delete the snapshots
bolt plan run patching::snapshot_vmware --targets linux_patching,windows_patching action='delete'

CLI - Customize the pre/post update plans to use your own module’s version

bolt plan run patching --targets linux_patching pre_update_plan='mymodule::pre_update' post_update_plan='mymodule::post_update'

CLI - Wait 10 minutes for systems to become available as some systems take longer to reboot.

bolt plan run patching --targets linux_patching,windows_patching reboot_wait=600
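
The linux_patching and windows_patching names used above are ordinary Bolt inventory groups. The plan-level and group-level settings referenced by the parameters below (patching_order, patching_reboot_strategy, etc.) are read from vars on those groups. A minimal, illustrative inventory sketch (hostnames, ordering, and values are examples only):

# inventory.yaml
groups:
  - name: linux_patching
    targets:
      - web01.example.com
      - web02.example.com
    vars:
      patching_order: 1
      patching_reboot_strategy: 'only_required'
  - name: windows_patching
    targets:
      - app01.example.com
    config:
      transport: winrm
    vars:
      patching_order: 2
      patching_update_provider: 'all'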

Parameters:

  • targets (TargetSpec)

    Set of targets to run against.

  • filter_offline_targets (Boolean) (defaults to: false)

    Flag to determine whether offline targets should be filtered out of the list of targets returned by this plan. If true, any targets that return an error when running the puppet_agent::version check will be filtered out and ignored; those targets will not appear in any of the data structures in the result of this plan. If false, any offline target will cause this plan to error immediately during the online check, halting the patching process. See the example after this parameter list.

  • monitoring_enabled (Optional[Boolean]) (defaults to: undef)

    Flag to enable/disable the execution of the monitoring_plan. This is useful if you don't want to call out to a monitoring system during provisioning. To configure this globally, use the 'patching_monitoring_enabled' var. See the example after this parameter list.

  • monitoring_plan (Optional[String]) (defaults to: undef)

    Name of the plan to use for the disabling/enabling monitoring steps of the workflow. To configure this globally, use the 'patching_monitoring_plan' var.

  • update_provider (Optional[String]) (defaults to: undef)

    What update provider to use. For Linux (RHEL, Debian, SUSE, etc.) this parameter is not used. For Windows the available values are: 'windows', 'chocolatey', and 'all' (both 'windows' and 'chocolatey'). The default value for Windows is 'all'. If 'all' is passed and Chocolatey isn't installed, then Chocolatey is simply skipped. If 'chocolatey' is passed and Chocolatey isn't installed, then this will error. See the example after this parameter list.

  • pre_update_plan (Optional[String]) (defaults to: undef)

    Name of the plan to use for executing the pre-update step of the workflow. A minimal sketch of a custom plan appears after this parameter list.

  • post_update_plan (Optional[String]) (defaults to: undef)

    Name of the plan to use for executing the post-update step of the workflow.

  • reboot_message (Optional[String]) (defaults to: undef)

    Message displayed to the user prior to the system rebooting.

  • reboot_strategy (Optional[Enum['only_required', 'never', 'always']]) (defaults to: undef)

    Determines the reboot strategy for the run. See the example after this parameter list.

    - 'only_required' only reboots hosts that require it based on info reported from the OS
    - 'never' never reboots the hosts
    - 'always' will reboot the host no matter what
    
  • reboot_wait (Optional[Integer]) (defaults to: undef)

    Time in seconds that the plan waits before continuing after a reboot. This is necessary in case one of the groups affects the availability of a previous group. Two use cases here:

    1. A later group is a hypervisor. In this instance the hypervisor will reboot causing the
       VMs to go offline and we need to wait for those child VMs to come back up before
       collecting history metrics.
    2. A later group is a linux router. In this instance maybe the patching of the linux router
       affects the reachability of previous hosts.
    
  • disconnect_wait (Optional[Integer]) (defaults to: undef)

    How long (in seconds) to wait before checking whether the server has rebooted. Defaults to 10.

  • snapshot_plan (Optional[String]) (defaults to: undef)

    Name of the plan to use for executing the snapshot creation and deletion steps of the workflow. You can also pass 'disabled' or undef as an easy way to disable both creation and deletion. See the example after this parameter list.

  • snapshot_create (Optional[Boolean]) (defaults to: undef)

    Flag to enable/disable creating snapshots before patching groups. A common use case for disabling snapshot creation: a previous run with 'snapshot_create' enabled failed partway through, so the snapshot still exists and you want to retry patching without creating another snapshot on top of the one you already have. In that case, pass 'snapshot_create=false' on the second run.

  • snapshot_delete (Optional[Boolean]) (defaults to: undef)

    Flag to enable/disable deleting snapshots after patching groups. A common use case for disabling snapshot deletion: you want to patch your hosts, then wait a few hours for application teams to test before cleaning up. In that case, run with 'snapshot_delete=false' and then, once testing is complete, run 'patching::snapshot_vmware action=delete' to remove the snapshots.

  • report_format (Optional[Enum['none', 'pretty', 'csv']]) (defaults to: undef)

    The method of formatting the report data.

  • report_file (Optional[String]) (defaults to: undef)

    Path of the file where the report should be written. Default: 'patching_report.csv'. If you would like to disable writing the report file, specify a value of 'disabled'. NOTE: If you're running PE, you'll need to disable writing reports because it will fail when running from the console. See the example after this parameter list.

  • noop (Boolean) (defaults to: false)

    Flag to enable noop mode for the underlying plans and tasks.
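
For example (illustrative values; group names as in the earlier examples), to skip offline targets instead of failing the whole run:

bolt plan run patching --targets linux_patching,windows_patching filter_offline_targets=true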
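
To skip the monitoring disable/enable steps for a single run, or to point the workflow at your own monitoring plan (the 'mymodule::monitoring' name below is hypothetical):

bolt plan run patching --targets linux_patching,windows_patching monitoring_enabled=false
bolt plan run patching --targets linux_patching,windows_patching monitoring_plan='mymodule::monitoring'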
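
To restrict Windows targets to a single update provider, for example Chocolatey only:

bolt plan run patching --targets windows_patching update_provider='chocolatey'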
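
A custom pre/post update plan only needs to accept the targets and the noop flag that this plan passes through (see the run_plan calls in the source below). A minimal sketch, assuming you want to stop a service before updates; the module, plan, and service names are hypothetical, and 'service' is Bolt's bundled service task:

plan mymodule::pre_update (
  TargetSpec $targets,
  Boolean    $noop = false,
) {
  # Stop the application before packages are updated.
  # Skipped entirely in noop mode so nothing is changed on the targets.
  unless $noop {
    run_task('service', $targets, action => 'stop', name => 'myapp')
  }
}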
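
To control rebooting for the whole run, for example to patch now but defer all reboots to a later maintenance window:

bolt plan run patching --targets linux_patching,windows_patching reboot_strategy='never'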
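
To disable both snapshot creation and deletion, for example for targets that are not VMware VMs:

bolt plan run patching --targets linux_patching snapshot_plan='disabled'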
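
To disable writing the report file, which is necessary when running from the PE console:

bolt plan run patching --targets linux_patching,windows_patching report_file='disabled'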



# File 'plans/init.pp', line 114

plan patching (
  TargetSpec        $targets,
  Boolean           $filter_offline_targets = false,
  Optional[Boolean] $monitoring_enabled   = undef,
  Optional[String]  $monitoring_plan      = undef,
  Optional[String]  $pre_update_plan      = undef,
  Optional[String]  $update_provider      = undef,
  Optional[String]  $post_update_plan     = undef,
  Optional[Enum['only_required', 'never', 'always']] $reboot_strategy = undef,
  Optional[String]  $reboot_message       = undef,
  Optional[Integer] $reboot_wait          = undef,
  Optional[Integer] $disconnect_wait      = undef,
  Optional[String]  $snapshot_plan        = undef,
  Optional[Boolean] $snapshot_create      = undef,
  Optional[Boolean] $snapshot_delete      = undef,
  Optional[Enum['none', 'pretty', 'csv']] $report_format = undef,
  Optional[String]  $report_file          = undef,
  Boolean           $noop                 = false,
) {
  ## Filter offline targets
  $check_puppet_result = run_plan('patching::check_puppet', $targets,
                                  filter_offline_targets => $filter_offline_targets)
  # use all targets, both with and without puppet
  $_targets = $check_puppet_result['all']

  # read variables for plan-level settings
  $plan_vars = $_targets[0].vars
  $reboot_wait_plan = pick($reboot_wait,
                            $plan_vars['patching_reboot_wait'],
                            300)
  $report_format_plan = pick($report_format,
                              $plan_vars['patching_report_format'],
                              'pretty')
  $report_file_plan = pick($report_file,
                            $plan_vars['patching_report_file'],
                            'patching_report.csv')

  ## Group all of the targets based on their 'patching_order' var
  $ordered_groups = run_plan('patching::ordered_groups', $_targets)

  # iterate over each group of targets in patching order
  $ordered_groups.each |$group_hash| {
    $ordered_targets = $group_hash['targets']
    if $ordered_targets.empty {
      fail_plan("Targets not assigned the var: 'patching_order'")
    }

    # override configurable parameters on a per-group basis
    # if there is no customization for this group, it defaults to the global setting
    # set at the plan level above
    $group_vars = $ordered_targets[0].vars
    # Precedence: CLI > Config > Default
    $monitoring_plan_group = pick($monitoring_plan,
                                  $group_vars['patching_monitoring_plan'],
                                  'patching::monitoring_solarwinds')
    $monitoring_enabled_group = pick($monitoring_enabled,
                                      $group_vars['patching_monitoring_enabled'],
                                      true)
    $reboot_strategy_group = pick($reboot_strategy,
                                  $group_vars['patching_reboot_strategy'],
                                  'only_required')
    $reboot_message_group = pick($reboot_message,
                                  $group_vars['patching_reboot_message'],
                                  'NOTICE: This system is currently being updated.')
    $update_provider_group = pick_default($update_provider,
                                          $group_vars['patching_update_provider'],
                                          undef)
    $reboot_wait_group = pick($reboot_wait,
                              $group_vars['patching_reboot_wait'],
                              300)
    $pre_update_plan_group = pick($pre_update_plan,
                                  $group_vars['patching_pre_update_plan'],
                                  'patching::pre_update')
    $post_update_plan_group = pick($post_update_plan,
                                    $group_vars['patching_post_update_plan'],
                                    'patching::post_update')
    $snapshot_plan_group = pick($snapshot_plan,
                                $group_vars['patching_snapshot_plan'],
                                'patching::snapshot_vmware')
    $snapshot_create_group = pick($snapshot_create,
                                  $group_vars['patching_snapshot_create'],
                                  true)
    $snapshot_delete_group = pick($snapshot_delete,
                                  $group_vars['patching_snapshot_delete'],
                                  true)
    $disconnect_wait_group = pick($disconnect_wait,
                                  $group_vars['patching_disconnect_wait'],
                                  10)

    # do normal patching

    ## Update patching cache (yum update, apt-get update, etc)
    run_task('patching::cache_update', $ordered_targets,
              _noop => $noop)

    ## Check for updates on hosts
    $available_results = run_plan('patching::available_updates', $ordered_targets,
                                  provider => $update_provider_group,
                                  format   => 'pretty',
                                  noop     => $noop)
    $update_targets = $available_results['has_updates']
    if $update_targets.empty {
      next()
    }

    ## Disable monitoring
    if $monitoring_enabled_group and $monitoring_plan_group and $monitoring_plan_group != 'disabled' {
      run_plan($monitoring_plan_group, $update_targets,
                action => 'disable',
                noop   => $noop)
    }

    ## Create VM snapshots
    if $snapshot_create_group and $snapshot_plan_group and $snapshot_plan_group != 'disabled' {
      run_plan($snapshot_plan_group, $update_targets,
                action => 'create',
                noop   => $noop)
    }

    ## Run pre-patching script.
    run_plan($pre_update_plan_group, $update_targets,
              noop => $noop)

    ## Run package update.
    $update_result = run_task('patching::update', $update_targets,
                              provider       => $update_provider_group,
                              _catch_errors  => true,
                              _noop          => $noop)

    ## Collect list of successful updates
    $update_ok_targets = $update_result.ok_set.targets
    ## Collect list of failed updates
    $update_errors = $update_result.error_set

    ## Check if any hosts with failed updates.
    if $update_errors.empty {
      $status = 'OK: No errors detected.'
    } else {
      # TODO print out the full error message for each of these
      alert('The following hosts failed during update:')
      alert($update_errors)
      $status = 'WARNING: Errors detected during update.'
    }

    if !$update_ok_targets.empty {
      ## Run post-patching script.
      run_plan($post_update_plan_group, $update_ok_targets,
                noop => $noop)

      ## Check if reboot required and reboot if true.
      run_plan('patching::reboot_required', $update_ok_targets,
                strategy        => $reboot_strategy_group,
                message         => $reboot_message_group,
                wait            => $reboot_wait_group,
                disconnect_wait => $disconnect_wait_group,
                noop            => $noop)

      ## Remove VM snapshots
      if $snapshot_delete_group and $snapshot_plan_group and $snapshot_plan_group != 'disabled' {
        run_plan($snapshot_plan_group, $update_ok_targets,
                  action => 'delete',
                  noop   => $noop)
      }
    }
    # else {
    #   # TODO should we break here?
    # }

    ## enable monitoring
    if $monitoring_enabled_group and $monitoring_plan_group and $monitoring_plan_group != 'disabled' {
      run_plan($monitoring_plan_group, $update_targets,
                action => 'enable',
                noop   => $noop)
    }
  }

  ## Re-establish all targets availability after reboot
  # This is necessary in case one of the groups affects the availability of a previous group.
  # Two use cases here:
  #  1. A later group is a hypervisor. In this instance the hypervisor will reboot causing the
  #     VMs to go offline and we need to wait for those child VMs to come back up before
  #     collecting history metrics.
  #  2. A later group is a linux router. In this instance maybe the patching of the linux router
  #     affects the reachability of previous hosts.
  wait_until_available($_targets, wait_time => $reboot_wait_plan)

  ## Collect summary report
  run_plan('patching::update_history', $_targets,
            format      => $report_format_plan,
            report_file => $report_file_plan)

  ## Display final status
  return()
}