Friday, July 22, 2022

AWS Java SDK DynamoDBv2 Scan

The AWS Java SDK DynamoDBv2 document API (com.amazonaws.services.dynamodbv2.document.Table) has a terrible API for performing scan operations. Against common sense, table.scan(scanSpec) does not run the scan. It returns a lazy ItemCollection object, and the developer has to call ItemCollection.iterator() and iterate over the result to trigger the actual scan. Until that happens, the itemCollection.lastLowLevelResult field is null.

This doesn't work and throws a NullPointerException:

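ItemCollection<ScanOutcome> itemCollection = table.scan(scanSpec);

// Nothing has been fetched yet, so getLastLowLevelResult() returns null here.
System.out.println(itemCollection.getLastLowLevelResult().getItems().size());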

This works: iterating through iterator() is what populates the itemCollection.lastLowLevelResult field.

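ItemCollection<ScanOutcome> itemCollection = table.scan(scanSpec);

// Draining the iterator is what actually sends the scan requests.
List<Item> items = new ArrayList<>();

CollectionUtils.addAll(items, itemCollection.iterator());

// lastLowLevelResult is now populated (it holds the most recently fetched page).
System.out.println(itemCollection.getLastLowLevelResult().getItems().size());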
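To make the lazy behaviour easier to see without a DynamoDB table, here is a minimal sketch in plain Java. LazyPages and LazyScanSketch are hypothetical names made up for this illustration; this is not the SDK's ItemCollection, only a stand-in that mimics the behaviour described above, with in-memory lists standing in for server responses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a lazy, page-based collection (NOT the SDK class).
class LazyPages implements Iterable<String> {
    private final List<List<String>> pages;   // each inner list plays the role of one response
    private List<String> lastLowLevelResult;  // stays null until a page is fetched

    LazyPages(List<List<String>> pages) {
        this.pages = pages;
    }

    List<String> getLastLowLevelResult() {
        return lastLowLevelResult;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int nextPage = 0;
            private Iterator<String> current = null;

            @Override
            public boolean hasNext() {
                // A page is "fetched" only when the caller asks for an element.
                while ((current == null || !current.hasNext()) && nextPage < pages.size()) {
                    lastLowLevelResult = pages.get(nextPage++);
                    current = lastLowLevelResult.iterator();
                }
                return current != null && current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}

class LazyScanSketch {
    public static void main(String[] args) {
        LazyPages scan = new LazyPages(Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c")));

        System.out.println(scan.getLastLowLevelResult()); // prints "null": nothing fetched yet

        List<String> items = new ArrayList<>();
        for (String item : scan) {
            items.add(item);
        }

        System.out.println(items.size());                        // prints 3
        System.out.println(scan.getLastLowLevelResult().size()); // prints 1 (last page only)
    }
}
```

In this sketch the last result only covers the final page, so counting all items means collecting them while iterating, not reading the size of the last page.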

Monday, July 11, 2022

XGBoost Parameter

This is a quick note documenting my understanding of the XGBoost parameters.

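  • max_depth: the maximum depth a single tree is allowed to grow to

  • num_rounds: the number of boosting rounds, i.e. how many trees end up in the prediction model

  • learning_rate: the shrinkage factor applied to each new tree's output before it is added to the model; the next tree is then fit on the remaining residual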

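  • alpha: L1 regularization term on the leaf weights (larger values make the model more conservative)

  • lambda: L2 regularization term on the leaf weights (larger values make the model more conservative)

  • gamma: minimum loss reduction required to split a leaf node further (related to pruning and limiting the depth of a tree)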

  • Reference:
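As a rough illustration of how learning_rate and num_rounds interact in the notes above, here is a toy boosting loop in plain Java. It is not XGBoost: each "tree" is replaced by a stub that simply predicts the mean residual, and ToyBoosting and fit are hypothetical names made up for this sketch.

```java
// Toy illustration only: each "tree" is a stub that predicts the mean residual.
class ToyBoosting {
    // Returns the shared prediction after numRounds boosting rounds.
    static double fit(double[] targets, int numRounds, double learningRate) {
        double prediction = 0.0; // every sample shares one prediction in this toy
        for (int round = 0; round < numRounds; round++) {
            double residualSum = 0.0;
            for (double y : targets) {
                residualSum += y - prediction;
            }
            double treeOutput = residualSum / targets.length; // the stub "tree"
            prediction += learningRate * treeOutput;          // shrink this round's contribution
        }
        return prediction;
    }

    public static void main(String[] args) {
        double[] targets = {8.0, 10.0, 12.0}; // mean is 10

        System.out.println(fit(targets, 1, 0.5)); // prints 5.0
        System.out.println(fit(targets, 3, 0.5)); // prints 8.75
        System.out.println(fit(targets, 1, 1.0)); // prints 10.0
    }
}
```

In this toy, the gap to the target mean shrinks by a factor of (1 - learning_rate) per round, which is why a smaller learning_rate generally needs more rounds to reach the same fit.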