The Center for Integrated Research Computing (CIRC) will host a 6-week Summer School to help students, postdocs, research staff, and faculty learn programming languages and sharpen their data analysis skills. The classes are designed for beginners and cover enough of the basics to prepare participants for self-guided tutorials or more advanced coursework.
Each topic is taught as a separate module made up of 1.5-hour lectures held over 3 or 6 days. The modules are independent of one another, and participants may sign up for one or more.
The classes will be taught in the VISTA Collaboratory, located on the first floor of Carlson Library. Classes start on July 19 and continue through August 25. Space is limited, so early registration is encouraged.
Register for CIRC Summer School
Schedule and Modules
The workshops require attendees to bring their own laptops and to have a computing account on BlueHive. If registrants have not connected to BlueHive from their laptop before, they should attend one of the four Connectivity Clinics, where CIRC staff will assist them with connecting their laptops to BlueHive using X2Go graphical sessions. Registration is not required for Connectivity Clinics; workshop registrants may stop by at any time during clinic hours.
- July 6, 2016 — 10:00am – noon and 1:00 – 3:00pm (VISTA Collaboratory)
- July 7, 2016 — 10:00am – noon and 1:00 – 3:00pm (VISTA Collaboratory)
- July 12, 2016 — 10:00am – noon and 1:00 – 3:00pm (VISTA Collaboratory)
- July 13, 2016 — 10:00am – noon and 1:00 – 3:00pm (VISTA Collaboratory)
- Introduction to SAS and its usage
- SAS program editor line commands
- SAS data sets and data types
- Importing and exporting data
- Summarizing and creating statistical reports for data analysis
- Exploring and visualizing data
- Conditional and iterative processing
- SAS and Excel
- Using SAS on BlueHive
- Stata language and common use cases
- Variables and data types
- Control syntax
- Basic calculations
- Importing and merging data
- Exporting, graphing, and exploring data
- Statistical analysis of data sets (e.g., t-test, ANOVA, and regression)
- Hierarchical Linear Modeling (HLM) and Structural Equation Modeling (SEM)
- Project management and limitations of Stata
- Using Stata on BlueHive and with large memory requirements
- General overview of structured data
- General overview of relational databases
- Creating a MySQL database on BlueHive
- Creating schema, entity relationships, models, and indexes
- Querying a database
- Importing and exporting data
- Using Oracle MySQL Workbench
- History
- Basic file system layout
- Commands for manipulating files and directories
- Creating and editing files
- Monitoring and controlling processes
- Input/output redirection and pipes
- Wildcards and conventions
- Survey of useful commands (e.g., find and tar)
- Scheduling jobs on BlueHive
- History
- MATLAB desktop environment
- Data types (operators, arrays, etc.) and intrinsic functions
- Basic plotting
- Saving and loading sessions
- MATLAB scripting
- Control structures and loops
- Functions
- Reading/writing data to files
- Symmetric multi-processing and the Parallel Computing Toolbox
- Accelerated MATLAB with GPUs
- Turning MATLAB routines into C code
- Review of basic Linux commands
- Overview of shell scripting
- File permissions and execution of scripts
- Variables and expressions
- Environment variables
- Conditional expressions
- If and while statements
- Loops and control
- Case and select
- Strings, parsing, and text processing
- Basic elements of Python scripts (comments, variables, strings, numbers, quotes)
- Variables
- Lists and tuples
- Dictionaries
- Conditionals and loops
- Functions
- Input and output
- Numerical expressions
- String expressions
- Regular expressions and text
- Classes and objects
- Classifying scientific data and determining appropriate visualization techniques
- Color space, channels, and maps
- OpenGL-accelerated visualizations
- Remote accelerated visualization on BlueHive
- Visualizing time-varying flows (streamlines, streaklines, pathlines, and timelines)
- Various techniques for 3D rendering (lighting sources, shading, ray casting, splatting, texturing, transfer functions)
- Applications for scientific visualization (ParaView, VisIt, Fiji, MATLAB, Amira, IDL, VolView, 3D Slicer, PyMOL, VMD, R, VTK)
- Using VISTA for visualization
Prerequisite: Knowledge of Python, Java, or Scala
- What is Big Data?
- Types of Big Data analytics
- MapReduce programming model
- Hadoop ecosystem
- MapReduce with Spark
- Graph processing with Spark
- Machine learning with Spark
- A historical perspective on databases
- Key-value and document databases
- Column-family stores
- Graph databases
- Deciding when to use, and when not to use, NoSQL options