intltechventures.blogspot.com: August 2018

2018-08-31

2018-08-31 Friday - The Greater Foundation, Opportunities for Community Service

In looking for a community service program to support - I happened upon The Greater Foundation, founded by Russell Okung - they are looking for mentors in the LA and Seattle area.

"Feed someone a fish, and you feed them for a day. Teach them to fish, and you feed them for a lifetime. Teach them to start a fishing company, and they can feed the entire world." -The Greater Foundation

https://begreater.org/get-involved/#be-a-mentor

2018-08-28

2018-08-28 Tuesday - Published Updated High Level Design (HLD) Template

https://github.com/intltechventures/Consulting.Project.Tools/blob/master/templates/HighLevelDesign.md

Rationale: Numerous times, in my travels as a consultant, I've encountered organizations in which there is no established standard/template for what should be included in an HLD. Quite often (even in organizations that ostensibly have a template), the artifacts that architects and engineers produce devolve into a Wild Wild West of anything-goes. Consistency engenders repeatability - which can help reduce variability in quality - and thus, a worthy goal.

Therefore, the intended purpose in sharing this...at least as a starting point for further customization by a client organization.

STATUS: DRAFT
The goal of an HLD is to facilitate communication and coordination, both within internal teams and organizations - as well as serving as a tool for communication and coordination with external partners.

The HLD provides a consistent format for teams to assemble details - which supports a goal of being able to achieve some level of reuse of design artifacts - and is intended to provide sufficient information to Program/Project Management to be able to plan, estimate, and coordinate large-scale development efforts. Additionally, an HLD servces as an effective mechanism to support Design Review and Architecture Governance efforts.

The ideal place for an HLD to live is in a Wiki or in a Markdown file (i.e. in the git repository for the project)

The target level of detail is to scope the effort, provide sufficient input to the estimation process, and to clearly articulate the How of an approach.

The What should be ideally defined in a collection of User Stories, Use Cases, Feature Requests, etc. - or, minimally, in a Business Requirements Document (BRD).

Investing effort in the creation of the HLD is intended for larger-scoped projects/epics - that have some non-trivial number of unknowns, technical complexity, a high number of coordination points, or extensive external integration/coordination requirements.

2018-08-15

2018-08-15 Wednesday - U.S. Data Privacy Laws

I recently became aware of new changes that are occurring in the United States, at the State level, with regards to new data privacy laws that are mirroring (or exceeding?) the European GDPR laws.

This posting is a placeholder for me to gather links to resources and articles related to these concerns - that may be relevant to some of the consulting work I do - and may be of interest to others.

California
California Consumer Privacy Act (CCPA)

https://www.caprivacy.org/

Effective Date: 2020-01-01

Highlights:

Right to know all data collected by a business on you.
Right to say NO to the sale of your information.
Right to DELETE your data.
Right to be informed of what categories of data will be collected about you prior to its collection, and to be informed of any changes to this collection.
Mandated opt-in before sale of children’s information (under the age of 16).
Right to know the categories of third parties with whom your data is shared.
Right to know the categories of sources of information from whom your data was acquired.
Right to know the business or commercial purpose of collecting your information.
Enforcement by the Attorney General of the State of California.
Private right of action when companies breach your data, to make sure these companies keep your information safe.
On 2020-01-01, companies must also comply with being able to verify (or provide) twelve month's of history (going back to 2019-01-01)

Assumed Reporting/Governance/Compliance Implications:

Businesses will be required to track and report, at the category > field/data element level (e.g. of a database, log file, blob storage) - data collected about customers and visitors to their site
Compliance departments will need to be able to produce reports (e.g. for audit/legal purposes) - identifying what field/data elements are involved in storing such data; when requests are received for such information (or requests for deletion); and when such data is deleted/purged (either by specific request - or by normal operational data management policies)
Compliance (and Security teams) will need to be able to produce reports identifying what customer-related tables/fields/data elements are stored (identifying with or without encryption ?)

Trigger Criteria [see page-10, 798.106. Definitions, (b), (1) and (2)]

Illustrative, not exhaustive:

For-Profit legal entity that does business in California and meets one of the following thresholds:

$50M+ in annual gross revenue;
Or, sells information, annually, for 100K consumers or devices (combined or separately);
Or, derives 50% or more of its annual revenues from selling consumers personal information.

GDPR

https://datagrail.io/blog/is-it-time-for-an-american-gdpr

Massachusetts
{To Be Research}

New York

NYDFS

https://www.dfs.ny.gov/about/cybersecurity.htm
https://www.dfs.ny.gov/docs/legal/regulations/adoptions/dfsrf500txt.pdf

23 NYCRR 500: CYBERSECURITY REQUIREMENTS FOR FINANCIAL SERVICES COMPANIES

https://www.dataprivacymonitor.com/tag/nydfs/
https://digitalguardian.com/blog/what-nydfs-cybersecurity-regulation-new-cybersecurity-compliance-requirement-financial
http://f.datasrvr.com/fr1/017/12004/Cybersecurity_Alert_2.23.17.pdf
https://www.hldataprotection.com/2017/08/articles/cybersecurity-data-breaches/a-guide-to-nydfs-cybersecurity-regulations-august-28-implementation-deadline/

"...your policy or policies must apply specifically to your entity and cover the following topics, as relevant to your organization:"

Information security
Data governance and classification
Asset inventory and device management
Access controls and identity management
Business continuity and disaster recovery planning and resources
Systems operations and availability concerns
Systems and network security and monitoring
Physical security and environmental controls
Customer data privacy
Incident response

https://www.hldataprotection.com/2018/10/articles/cybersecurity-data-breaches/new-obligations-under-the-nydfs-cybersecurity-regulation-came-online-in-september/

As of Tuesday, September 4, 2018, covered entities are required to be in compliance with additional requirements relating to:

Audit Trail (Section 500.06);
Application Security (Section 500.08);
Limitations on Data Retention (Section 500.13);
Monitoring of Authorized Users (Section 500.14(a)); and
Encryption of Non-public Information (Section 500.15).

Washington

https://www.geekwire.com/2019/washington-state-considers-new-privacy-law-regulate-data-collection-facial-recognition-tech

2018-08-11

2018-08-11 Saturday - Julia 1.0 released yesterday.

https://julialang.org/blog/2018/08/one-point-zero

https://insidehpc.com/2018/08/julia-1-0-release-opens-doors-connected-world/

"Julia powers the Federal Aviation Administration’s NextGen Aircraft Collision Avoidance System (ACAS-X), BlackRock’s trademark Aladdin analytics platform and the New York Federal Reserve Bank’s Dynamic Stochastic General Equilibrium (DSGE) macroeconomic model"

https://juliacomputing.com/case-studies/aviva.html

"Aviva is one of Europe’s largest insurance companies. Their legacy risk modeling system was built using IBM Algorithmics in 2012. But by 2016, that system wasn’t fast enough or sophisticated enough for Solvency II compliance."
"The Julia models ran about 1,000 times faster than IBM Algorithmics"
"Aviva managed to reduce the number of lines of code from ~14,000 lines in IBM Algorithmics to just ~1,000 lines in Julia - a 93% reduction"
"The server cluster size required to run Aviva’s risk model simulations fell 95% from 100 servers to 5 servers."
"These factors plus reduced licensing fees moving from a proprietary program (IBM Algorithmics) to an open-source language (Julia) resulted in overall savings of millions of pounds per year"

https://juliacomputing.com/case-studies/celeste.html

"Parallel Supercomputing for Astronomy: Loaded an aggregate of ~178 terabytes of image data and produced parameter estimates for 188 million stars and galaxies in 14.6 minutes"

If you have an interest in learning Julia, this book, Think Julia, by Ben Lauwens - might be of interest:

https://benlauwens.github.io/ThinkJulia.jl/latest/book.html

Personal Note:

I met the co-creators of Julia in 2012, when they traveled to St. Louis to present Julia at The Strange Loop 2012 conference.

2018-08-11 Saturday - Git Push Error to Bitbucket

Today I created a Bitbucket account - and while experimenting with it - noted that I got the following error whenever I performed a "git push" (note: "git clone" worked fine)

fatal: HttpRequestException encountered.
An error occurred while sending the request.

NOTE: (I'm already running Git v2.18.0)

https://github.com/git-for-windows/git/releases
Comes with Git v2.18.0.
Comes with Git Credential Manager v1.16.2.

You may need to update Microsoft Git Credential Manager for Windows

"Allow Bitbucket access tokens to be cast as credentials, and properly handle personal access tokens used as authentication in network requests. Now that Bitbucket access tokens are allowed to be case to credentials, start returning them from OAuthAuthenticator"

See the v1.17.1 changelog for Bitbucket related fixes...

NOTE: This fixed the problem for me

Additional reference links:

https://developercommunity.visualstudio.com/content/problem/201457/unable-to-connect-to-github-due-to-tls-12-only-cha.html

2018-08-08

2018-08-08 Wednesday - Google's The Site Reliability Workbook

Useful Tip: Download for free until August 23rd

The Site Reliability Workbook Edited by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara and Stephen Thorne

"The Site Reliability Workbook is the hands-on companion to the bestselling Site Reliability Engineering book and uses concrete examples to show how to put SRE principles and practices to work. This book contains practical examples from Google’s experiences and case studies from Google’s Cloud Platform customers. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t."

2018-08-08 Wednesday - Good Cryptography Books

I'm doing an initial quick read of Bruce Schneier (@schneierblog) 20th Anniversary Edition, Applied Cryptography (Protocols, Algorithms, and Source Code in C) during my evenings this week - and finding it to still be an excellent text for a foundational level of knowledge-building.

However, for folks interested in perhaps a more recent treatment, they may find his collaboration with Niels Ferguson, and Tadayoshi Kohno, in their book - Cryptography Engineering: Design Principles and Practical Applications - to be of interest.

Side Note:

I had the pleasure of meeting Mr. Schneier at a conference some years ago (QCon in San Francisco, I believe - but it may have been at Strange Loop in Saint Louis) - in which I asked him a question about his estimation of the prevalence of polymorphic viruses/trojans - his answer was essentially, too paraphrase: "not much". I have long pondered his answer...

2018-08-08 Wednesday - Jepsen.io Benchmark Report on MongoDB 3.4.0-rc3 (Feb 7, 2017)

If you are using MongoDB somewhere in your solution stack, you may be interested in the Feb 7th 2017 report prepared by Jepsen.io

https://jepsen.io/analyses/mongodb-3-4-0-rc3

"In April 2015, we discussed stale and dirty reads in MongoDB 2.6.7. However, writes appeared to be safe; update-only workloads with majority write concern were linearizable. This conclusion was not entirely correct. In this Jepsen analysis, we develop new tests which show the MongoDB v0 replication protocol is intrinsically unsafe, allowing the loss of majority-committed documents. In addition, we show that the new v1 replication protocol has multiple bugs, allowing data loss in all versions up to MongoDB 3.2.11 and 3.4.0-rc4. While the v0 protocol remains broken, fixes for v1 are available in MongoDB 3.2.12 and 3.4.0, and now pass the expanded Jepsen test suite. This work was funded by MongoDB, and conducted in accordance with the Jepsen ethics policy."

You may also be interested in the MITRE CVE (Critical Vulnerabilities database entries)

https://www.cvedetails.com/vulnerability-list/vendor_id-12752/product_id-25450/Mongodb-Mongodb.html

And finally, I would highly recommend reviewing the CIS Benchmarks (tm) report published by CIS Center for Internet Security

Securing MongoDB An objective, consensus-driven security guideline for the MongoDB Server Software.

https://www.cisecurity.org/benchmark/mongodb/

2018-12-04 Update

My friend and colleague Bob Harwood, mentioned that there was an update posted on 2018-10-23

https://jepsen.io/analyses/mongodb-3-6-4

"In February 2017, we discussed data loss and fixes in MongoDB 3.4.0-rc3’s v0 and v1 replication protocols. In this Jepsen report, we will verify that MongoDB 3.6.4’s sharded clusters offer comparable safety to non-sharded deployments. We’ll also discuss MongoDB’s new support for causal consistency (CC) in version 3.6.4 and 4.0.0-rc1, and show that sessions prevent anomalies so long as user stick to majority reads and writes. However, with MongoDB’s default consistency levels, CC sessions fail to provide the claimed invariants"

2018-08-06

2018-08-06 Monday - GDPR and Blockchain Privacy Implications

Some preliminary reading I'm doing...

GDPR

Blockchain Technology and the GDPR - How to Reconcile Privacy and Distributed Ledgers

https://heinonline.org/HOL/LandingPage?handle=hein.journals%2Fedpl2&div=71&id&page

Blockchains and Data Protection in the European Union

Max Planck Institute for Innovation & Competition Research Paper No. 18-01

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3080322
"This paper examines data protection on blockchains and other forms of distributed ledger technology (‘DLT’). Transactional data stored on a blockchain, whether in plain text, encrypted form or after having undergone a hashing process, constitutes personal data for the purposes of the GDPR. Public keys equally qualify as personal data as a matter of EU data protection law. We examine the consequences flowing from that state of affairs and suggest that in interpreting the GDPR with respect to blockchains, fundamental rights protection and the promotion of innovation, two normative objectives of the European legal order, must be reconciled. This is even more so given that, where designed appropriately, distributed ledgers have the potential to further the GDPR’s objective of data sovereignty. "

2018-08-02

2018-08-02 Thursday - Web Browser HTTP Caching

A former colleague asked me a few questions recently regarding the why of web browser caching, due to some perceived unexpected behavior in their SaaS solution offering - and I prepared this short list of resources for them to read. This is a topic that is a bit more complex than what might appear at first glance - due to the caching that can occur on the browser, or on the server side, or via some intermediary - such as a Content Distribution Network, or an in-memory data caching layer (such as Redis, MongoDB, or MemcacheDB, or Cassandra)

"Here be dragons..."

HTTP caching occurs when the browser stores local copies of web resources for faster retrieval the next time the resource is required. As your application serves resources it can attach cache headers to the response specifying the desired cache behavior.

A good overview of caching
https://medium.com/@codebyamir/a-web-developers-guide-to-browser-caching-cc41f3b73e7c

https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching

Caching is a technique that stores a copy of a given resource and serves it back when requested. When a web cache has a requested resource in its store, it intercepts the request and returns its copy instead of re-downloading from the originating server.

NOTE: See the discussion on "Freshness"
https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#Freshness

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control

The Cache-Control general-header field is used to specify directives for caching mechanisms in both requests and responses. Caching directives are unidirectional, meaning that a given directive in a request is not implying that the same directive is to be given in the response.

A bit older article, but might be a good read too...
https://www.mnot.net/cache_docs/

some good technical articles - with good technical discussion

https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching

https://www.incapsula.com/cdn-guide/glossary/cache-control.html

https://f5.com/resources/white-papers/caching-behavior-of-web-browsers

https://www.keycdn.com/blog/http-cache-headers/

https://stackoverflow.com/questions/49547/how-to-control-web-page-caching-across-all-browsers

RFC-7234 Hypertext Transfer Protocol (HTTP/1.1): Caching
https://tools.ietf.org/html/rfc7234

RFC-2616, Section 14 Header Field Defintiions
https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Some additional articles to read/review: