Registration Open for Research Computing Summer School

The Center for Integrated Research Computing (CIRC) will host a 6-week Summer School to help students, postdocs, research staff, and faculty learn programming languages and sharpen their data analysis skills. The classes are designed for beginners and cover basic topics, giving participants enough direction to move on to self-paced tutorials or more advanced coursework.

Each topic is taught as an individual module of 1.5-hour lectures held over 3 or 6 days. Modules are independent, and participants may sign up for one or more.

The classes will be taught in the VISTA Collaboratory, located on the first floor of Carlson Library. Classes start on July 19 and continue through August 25. Space is limited, so early registration is encouraged.

Register for CIRC Summer School

Schedule and Modules

Prerequisite: Connectivity Clinic

The workshops require attendees to bring their own laptops and to have a computing account on BlueHive. If registrants have not connected to BlueHive from their laptop before, they should attend one of the four Connectivity Clinics, where CIRC staff will assist them with connecting their laptops to BlueHive using X2Go graphical sessions. Registration is not required for Connectivity Clinics; workshop registrants may stop by at any time during clinic hours.

  • July 6, 2016 — 10:00am – noon and 1:00 – 3:00pm (VISTA Collaboratory)
  • July 7, 2016 — 10:00am – noon and 1:00 – 3:00pm (VISTA Collaboratory)
  • July 12, 2016 — 10:00am – noon and 1:00 – 3:00pm (VISTA Collaboratory)
  • July 13, 2016 — 10:00am – noon and 1:00 – 3:00pm (VISTA Collaboratory)

Module 1: SAS — 10:00 - 11:30am on July 19-21, 2016
  • Introduction to SAS and its usage
  • SAS program editor line commands
  • SAS data sets and data types
  • Importing and exporting data
  • Summarizing and creating statistical reports for data analysis
  • Exploring and visualizing data
  • Conditional and iterative processing
  • SAS and Excel
  • Using SAS on BlueHive

Module 2: Stata — 1:00 - 2:30pm on July 19-21, 2016
  • Stata language and common use cases
  • Variables and data types
  • Control syntax
  • Basic calculations
  • Importing and merging data
  • Exporting, graphing, and exploring data
  • Statistical analysis of data sets (including t-test, ANOVA, regression, etc.)
  • Hierarchical Linear Modeling (HLM) and Structural Equation Modeling (SEM)
  • Project management and limitations of Stata
  • Using Stata on BlueHive and with large memory requirements

Module 3: MySQL — 10:00 - 11:30am on July 26-28, 2016
  • General overview of structured data
  • General overview of relational databases
  • Creating a MySQL database on BlueHive
  • Creating schema, entity relationships, models, and indexes
  • Querying a database
  • Importing and exporting data
  • Using Oracle MySQL Workbench

Module 4: Linux — 1:00 - 2:30pm on July 26-28, 2016
  • History
  • Basic file system layout
  • Commands for manipulating files and directories
  • Creating and editing files
  • Monitoring and controlling processes
  • Input/output redirection and pipes
  • Wildcards and conventions
  • Survey of useful commands (e.g., find, tar)
  • Scheduling jobs on BlueHive

Module 5: MATLAB — 10:00 - 11:30am on August 2-4, 2016
  • History
  • MATLAB desktop environment
  • Data types (operators, arrays, etc.) and intrinsic functions
  • Basic plotting
  • Saving and loading sessions
  • MATLAB scripting
  • Control structures and loops
  • Functions
  • Reading/writing data to files
  • Symmetric multi-processing and the Parallel Computing Toolbox
  • Accelerated MATLAB with GPUs
  • Turning MATLAB routines into C code

Module 6: Bash Shell — 1:00 - 2:30pm on August 2-4, 2016
  • Review of basic Linux commands
  • Overview of shell scripting
  • File permissions and execution of scripts
  • Variables and expressions
  • Environment variables
  • Conditional expressions
  • If and while statements
  • Loops and control
  • Case and select
  • Strings, parsing, and text processing

Modules 7 and 9: R — 10:00 - 11:30am on August 9-11 and 16-18, 2016

  • Scalars, vectors, lists, and data frames
  • Basic calculations and syntax
  • Importing and exporting data
  • Graphing and visualizing data
  • Summary statistics of data sets
  • Probability distributions and statistical tests
  • Clustering and supervised learning
  • R packages
  • Functions
  • R programming
  • Running R on BlueHive

Modules 8 and 10: Python — 1:00 - 2:30pm on August 9-11 and 16-18, 2016
  • Basic elements of Python scripts (comments, variables, strings, numbers, quotes)
  • Variables
  • Lists and tuples
  • Dictionaries
  • Conditionals and loops
  • Functions
  • Input and output
  • Numerical expressions
  • String expressions
  • Regular expressions and text
  • Classes and objects

Module 11: Visualization — 10:00 - 11:30am on August 23-25, 2016
  • Classifying scientific data and determining appropriate visualization techniques
  • Color space, channels, and maps
  • OpenGL-accelerated visualizations
  • Remote accelerated visualization on BlueHive
  • Visualizing time-varying flows (streamlines, streaklines, pathlines, and timelines)
  • Various techniques for 3D rendering (lighting sources, shading, ray casting, splatting, texturing, transfer functions)
  • Applications for scientific visualization (ParaView, VisIt, Fiji, MATLAB, Amira, IDL, VolView, 3D Slicer, PyMOL, VMD, R, VTK)
  • Using VISTA for visualization

Module 12: Big Data — 1:00 - 2:30pm on August 23-25, 2016

Prerequisite: Knowledge of Python, Java, or Scala

  • What is Big Data?
  • Types of Big Data analytics
  • MapReduce programming model
  • Hadoop ecosystem
  • MapReduce with Spark
  • Graph processing with Spark
  • Machine learning with Spark
  • A historical perspective on databases
  • Key-value and document databases
  • Column-family stores
  • Graph databases
  • How to choose when and when not to use NoSQL options