This user guide walks a user through the process of uploading data to SeRP via the Secure File Transfer Protocol (SFTP).
It sets out the prerequisites required of users, projects and datasets. It assumes that the user is a valid SeRP user and that an approved administrator has the knowledge and rights to create projects, assign permissions to roles and assign users to roles.
Step-by-step guide
1. HDP Prerequisites
The prerequisites listed under 1.1 to 1.3 should be created within the S3 feature of SeRP, available at: https://ingest.ukserp.ac.uk
All of the prerequisites listed under section 1 should be completed at least 2 hours before the data upload begins.
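The 2 hour lead time can be checked with simple date arithmetic before an upload is scheduled. The following is a minimal sketch only; the function names are hypothetical and are not part of SeRP.

```python
from datetime import datetime, timedelta

# Minimum gap between completing the section 1 prerequisites and starting the upload.
MIN_LEAD_TIME = timedelta(hours=2)


def earliest_upload_time(prerequisites_completed_at: datetime) -> datetime:
    """Return the earliest time at which the upload should be started."""
    return prerequisites_completed_at + MIN_LEAD_TIME


def upload_allowed(prerequisites_completed_at: datetime, now: datetime) -> bool:
    """Return True once the minimum lead time has elapsed."""
    return now >= earliest_upload_time(prerequisites_completed_at)
```

For example, if the prerequisites were completed at 09:00, the upload should not start before 11:00 on the same day.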
1.1 Security > Projects
For any data upload, a corresponding project must exist within S3 with a status of current (its end date has not passed).
1.2 Security > Projects > Roles
Any user responsible for carrying out a data upload should have the permissions shown in Fig 1 below assigned to a role within the project that the data is being provided to.
Fig 1.
Recommendation: Create a Role titled “SFTP User”, provide these permissions to the role and add the required user to that role.
1.3 Projects & Datasets > Create New Dataset
Within the newly created project, create a new dataset to which the data uploaded via SFTP will be assigned.
“Create New Dataset” > “Provide a Dataset Name” > “Edit the Dataset”
Page 1. Populate the information related to the dataset and assign any information related to the project.
Page 5. “Share Settings” > “Enable”, then select the actions for the system to perform. See Fig 2 below.
Fig 2.
“Wait for Control File”: As an advanced method, it is possible to specify the schema to be used when ingesting the data. If no schema is available, leave this as “No”.
“Auto Publish”: Setting this to “Auto” is recommended if the defined schema does not need to be reviewed before the data is made available to end users. To validate first that the schema has been created in accordance with the data, select “Manual”.
2. FTP Client
It is recommended that an appropriate FTP client is installed on the machine used to perform the data upload to HDP. The FTP client must support SFTP.
Several FTP clients are available. The SeRP team recommends “Cyberduck” (https://cyberduck.io/) or “FileZilla” (https://filezilla-project.org/).
3. Performing a data transfer
Once the prerequisite tasks listed under section 1 have been carried out and an appropriate FTP client is installed, a data transfer can take place.
3.1 Configure the FTP Client
Within the FTP client, enter the following details in the corresponding fields.
Protocol: Select “SFTP”.
URL: https://ingest.ukserp.ac.uk
Username: Use your account username (e.g. XXXX@chi.swan.ac.uk)
Password: Use the password for your account.
You will then be able to connect to the SFTP end point.
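Some clients and scripts expect a bare host name rather than a full URL. The following minimal sketch shows one way to hold the details above in a script and derive the host name from the URL. The class and function names are hypothetical, and the default port of 22 is the conventional SFTP port, assumed here because this guide does not state a port.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SftpSettings:
    """Connection details as entered in section 3.1 (password deliberately not stored)."""
    host: str
    port: int
    username: str


def settings_from_guide(url: str, username: str, port: int = 22) -> SftpSettings:
    """Build settings from the URL and username; port 22 is an assumption."""
    # urlparse only fills in hostname when a scheme such as https:// is present,
    # so fall back to the raw value if a bare host name was supplied.
    host = urlparse(url).hostname or url
    return SftpSettings(host=host, port=port, username=username)


settings = settings_from_guide("https://ingest.ukserp.ac.uk", "XXXX@chi.swan.ac.uk")
```

The password is left out of the settings on purpose; a script would typically prompt for it at run time rather than store it.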
3.2 SFTP Destination
Provided that you have authenticated successfully, the FTP client will show a list of folders for the projects that you have access to as a data contributor.
Note: These folders are prefixed with the SeRP name, followed by the project name configured in 1.1. Each project folder contains a folder named after the dataset created under 1.3, plus a second folder called "SFTP Synched Files". A further folder is created for each additional dataset version, but there is only ever one "SFTP Synched Files" folder.
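The folder layout described in the note can be handled in a script by separating the dataset folders from the single "SFTP Synched Files" folder. This is a minimal sketch; the function name and the example folder names are hypothetical, and the exact naming of dataset version folders is not specified in this guide.

```python
# Name of the shared folder as given in this guide.
SYNCHED_FOLDER = "SFTP Synched Files"


def split_project_folders(folder_names):
    """Split a project folder listing into (dataset folders, synched folder present?)."""
    cleaned = [name.strip() for name in folder_names]
    dataset_folders = [name for name in cleaned if name != SYNCHED_FOLDER]
    has_synched_folder = SYNCHED_FOLDER in cleaned
    return dataset_folders, has_synched_folder


# Hypothetical listing of one project folder.
datasets, has_synched = split_project_folders(
    ["Example Dataset", "Example Dataset v2", "SFTP Synched Files"]
)
```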
3.3 Transferring Files
Within the FTP client, the user can “drag/drop” the required files/folders OR use the “upload” feature within the client to move the required files/folders.
The upload will then commence.
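For repeated uploads, the same transfer could be scripted instead of using drag and drop. The sketch below is an assumption-laden illustration, not a SeRP-provided tool: `upload_files` is a hypothetical helper that accepts any connected client object exposing a `put(localpath, remotepath)` method, such as the `SFTPClient` of the third-party paramiko library, and the remote folder path is supplied by the caller.

```python
import posixpath
from pathlib import Path


def upload_files(sftp, local_paths, remote_dir):
    """Upload each local file into remote_dir and return the remote paths used.

    `sftp` is any object with a put(localpath, remotepath) method.
    """
    uploaded = []
    for local in map(Path, local_paths):
        # Remote SFTP paths use forward slashes regardless of the local OS.
        remote_path = posixpath.join(remote_dir, local.name)
        sftp.put(str(local), remote_path)
        uploaded.append(remote_path)
    return uploaded
```

Passing the client in as a parameter keeps the helper independent of any one SFTP library and makes it easy to try out against a stand-in object before connecting to the real end point.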
Note: When using a control file, the files can either be placed together in a zip file or all be moved into the destination folder.
Note: To make an uploaded file available to both the database and the file storage, move the file to the dataset folder appended with “SFTP Synched Files”.
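The zip option in the control file note can be prepared with Python's standard `zipfile` module. This is a minimal sketch; the function name is hypothetical, and placing every file at the top level of the archive is an assumption, since this guide does not specify the expected archive layout or the control file format.

```python
import zipfile
from pathlib import Path


def bundle_for_upload(files, archive_path):
    """Write the given files into a single zip archive and return its path."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in map(Path, files):
            # Store each file at the top level of the archive (assumed layout).
            archive.write(file_path, arcname=file_path.name)
    return archive_path
```

The resulting archive can then be uploaded in the same way as any other file.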
3.4 Tracking Progress
To monitor the progress of the FTP upload and data ingest process, visit https://ingest.ukserp.ac.uk and navigate to “Project & Datasets”, then identify the project and dataset created in 1.1 and 1.3.
Next to the dataset created in 1.3, select the “Progress” button. The user interface will then show the data ingest progress as a percentage.