How are you accessing/updating your wind data?

Discussion of meteorological data.


Jennifer.Rinker
Organization: Duke University
Location: NC, USA

How are you accessing/updating your wind data?

Postby Jennifer.Rinker » Fri Jul 12, 2013 11:45 am

Hello,

My current method of using the M4 met tower data is to go to the URL provided by Andy Clifton, then download the .mat files manually to my hard drive. However, this is time-consuming, and if the data are re-analyzed at NREL and updated online, the copies on my hard drive become out of date and I have to re-download everything. It is not the ideal method, but I can't figure out how to tell MATLAB to read the .mat files directly from the URL; fread, urlwrite, and urlread have failed me.

How are other people accessing the data? And how do I know if the data I have on my hard drive are up-to-date? Is there a log so that I can know when the data on the website have been updated and re-uploaded?

Thanks!

Jenni
Jenni Rinker, Ph.D.
Mechanical Engineering & Materials Science
Duke University/NWTC

Andy.Clifton
Organization: NREL
Location: Boulder, CO

Re: How are you accessing/updating your wind data?

Postby Andy.Clifton » Fri Jul 12, 2013 1:07 pm

The problem is that we're serving the data through an HTTP directory listing, which I now realise will cause trouble for wget, curl and co. What you need to do is get the text of the directory listing and then convert it into file names. You can then download each of those files. I know this is tedious, but you should be able to figure it out.
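For readers who prefer Python, the listing-to-file-names step can be sketched with the standard library's `html.parser`. This is a minimal sketch that assumes each file appears in the listing page as an `<a href="...">` link ending in `.mat`; the sample listing and file names are hypothetical, built from the naming pattern in Everett's MATLAB script further down, so check them against a real listing page before relying on it.

```python
from html.parser import HTMLParser


class MatLinkParser(HTMLParser):
    """Collect link targets that end in .mat from an HTML directory listing."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        # Assumption: each listing entry is an <a href="..."> link.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.endswith(".mat"):
                self.files.append(value)


def list_mat_files(listing_html):
    """Return the .mat file names linked from one directory listing page."""
    parser = MatLinkParser()
    parser.feed(listing_html)
    parser.close()
    return parser.files


# Hypothetical listing fragment; real NWTC listings may be formatted differently.
sample = """
<a href="../">Parent Directory</a>
<a href="01243_00_00_00_003.mat">01243_00_00_00_003.mat</a>
<a href="01243_00_10_00_003.mat">01243_00_10_00_003.mat</a>
"""
print(list_mat_files(sample))
```

Parsing the `href` attributes avoids depending on an exact file-name length, which a fixed-width regular expression does.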

Here's an example of how you can get the file names and download them using R.


# script to demonstrate downloading NWTC met tower data using R

# START OF INPUTS
# get the location of the files online (only need the root directory)
NREL.URL.Base <- "http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat"
# and define the location we will write to locally (again, only the root)
Local.URL.Base <- "~/Documents/temp"

# define the dates to download (each becomes part of the year/month/day path)
my.years = c("2011","2012")
my.months = c("01","02","03","04","05","06","07","08","09","10","11","12")
my.days = c("01","02")

# END OF INPUTS

# load packages (library() stops with an error if RCurl is not installed)
library(RCurl)

# define the connection we will use to the NREL database
NREL.con = getCurlHandle(ftp.use.epsv = FALSE,
                         maxconnects=1,
                         fresh.connect=0,
                         timeout = 60,
                         useragent = "R")

# loop through the times and dates we defined
for (year in my.years){
  for (month in my.months){
    for (day in my.days){
      date.path <- paste(year,"/",
                         month,"/",
                         day,"/",
                         sep = "")
      # make the URL we want to check
      source.file.path <- paste(NREL.URL.Base,"/",
                                date.path,
                                sep = "")     
      if (url.exists(url = source.file.path)){     
        # get a file listing
        source.listing <- unlist(strsplit(getURL(source.file.path,
                                                 curl = NREL.con,
                                                 verbose=FALSE),
                                          "\n"))
       
        # scrape that listing into a list of files
        matches <- regexpr(pattern = "#?[0-9\\_]{18}\\.mat(?=<)",
                           text = source.listing,
                           perl = TRUE)
        mat.files <- NULL
        for (row in seq_along(matches)){
          if (matches[row] > 0){
            # keep only the matched part of the line, i.e. the file name
            mat.files <- c(mat.files, substr(source.listing[row],
                                             matches[row],
                                             matches[row]+attr(matches,"match.length")[row]-1))
          }
        }
        # make a directory to dump the files into
        dest.file.path = paste(Local.URL.Base,"/",
                               date.path,
                               sep = "")
        dir.create(dest.file.path,
                   recursive = TRUE,
                   showWarnings = FALSE)
        # seq_along() skips the loop when no .mat files matched (1:NROW() would loop over 1 and 0)
        for (row in seq_along(mat.files)){
          download.file(url = paste(NREL.URL.Base,"/",
                                    date.path,
                                    mat.files[row],
                                    sep = ""),
                        destfile = paste(Local.URL.Base,"/",
                                         date.path,
                                         mat.files[row],     
                                         sep = ""))
        }
      } # end of url.exists loop
    } # end of the day loop
  } # end of the month loop
} # end of the year loop
Andy Clifton, Ph.D.
Senior Engineer
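The R script builds each directory path from separate year, month and day strings. As a rough Python equivalent, the same YYYY/MM/DD paths can be generated from real calendar dates, which avoids asking the server for days that do not exist (such as 2012/02/31). This is a sketch; the base URL is copied from the R script, and other towers or sampling rates would need a different root.

```python
from datetime import date, timedelta

# Root directory copied from the R script; other towers/rates use other roots.
NREL_URL_BASE = "http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat"


def day_directories(start, end, base=NREL_URL_BASE):
    """Yield one YYYY/MM/DD directory URL per calendar day, start to end inclusive."""
    day = start
    while day <= end:
        yield f"{base}/{day:%Y/%m/%d}/"
        day += timedelta(days=1)


for url in day_directories(date(2012, 2, 28), date(2012, 3, 1)):
    print(url)
```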

Everett.Perry
Organization: Texas Tech University
Location: Oregon

Re: How are you accessing/updating your wind data?

Postby Everett.Perry » Fri Jul 12, 2013 1:36 pm

Hi Jenni,

I have some code that may help. It is a bit clunky but works pretty well (actually it is very clunky). I could not figure out how to get a directory listing for the URL. However, I noticed that the file names are generally very similar and predictable (except for the last few characters sometimes). This program will download the "predictable" names and then you will only have to download the few files that this program doesn't catch. I built a counter into the program to let you know if an expected file is not detected.

Currently, the code is set to download all files for Jan 24th 2013 for the M5 20Hz data.

The best way to use it is to download a single day at a time (there should be 144 ten-minute files per day). At least with this program, you won't have to "click" on every file on the site.
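The "predictable names" idea can be sketched in Python as a list of the 144 expected ten-minute file names for one day. The pattern (two-digit month, day, last digit of the year, then `_HH_MM_SS` and a trailing code such as `003`) is read off the MATLAB script below; zero-padding single-digit days is an assumption, and the trailing code varies between files and datasets, which is exactly why a few files still have to be fetched by hand.

```python
def expected_names(month, day, year_digit, suffix="003"):
    """Build the 144 ten-minute .mat file names expected for one day.

    Assumed pattern: MM + DD + last digit of the year, then _HH_MM_00 and a
    trailing code such as _003.
    """
    stem = f"{month:02d}{day:02d}{year_digit}"
    return [
        f"{stem}_{hour:02d}_{minute:02d}_00_{suffix}.mat"
        for hour in range(24)
        for minute in range(0, 60, 10)
    ]


names = expected_names(1, 24, 3)  # January 24, 2013
print(len(names), names[0], names[-1])
```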

As far as updated versions go, I have had to download the data more than once to make sure I have the latest version. I just keep tabs on the forums to see when updates come out (and whether an update even affects me).

Hope this helps. It looks like some of the long comments in the code have been wrapped, so they might look strange when pasted into MATLAB.
Everett

PS: Looks like Andy has already posted something that is much more clever!


%NWTC_20Hz_mat_file_saver.m
%This program will save files from the NREL HTTP site offered by Dr. Andrew
%Clifton. The program will take a little insight to run, but should be
%pretty straightforward. The user will have to modify the three variables
%listed below (and a few other locations marked in the program) to start at
%the right file on the server. I couldn't find a way to get a directory
%listing for the server, but most of the files end with "_020.mat" or
%something similar, so this program will get those files. Then I only have
%to get the few files that don't follow that pattern manually. Last update
%by Everett Perry: 07/12/2013 (comments only)
%
%###################### Set these three variables #################
startHour = 0;
startMin = 0; %Do not set to 50, will cause problems for minCntr loop. (0, 10, 20, 30, 40 only) Should just fix this.
the_year = '3'; %this will be either '2' or '3' (i.e. 2012 or 2013)
%#################################################################

sec = '00';
noFileCntr = 0;
for day_cntr = 24:24 %Change this to run consecutive days (currently set at 24 to run day 24 only)
   for hourCntr = startHour:23 %i.e 0 to 23 hours
      for minCntr = startMin:10:50 %i.e. minutes 0, 10, 20, 30, 40 and 50 when startMin = 0
         hourStr = sprintf('%02d', hourCntr); %zero-padded hour, e.g. '05'
         minStr = sprintf('%02d', minCntr); %zero-padded minute, e.g. '00' or '10'
         hour_min = [hourStr '_' minStr];
         
         %Some old notes here (outdated)
%          urlFileName = ['01183_' hour_min '_' sec '_016.mat'];
%          fullURL = ['http://wind.nrel.gov/MetData/135mData/M4Twr/20Hz/mat/2013/01/18/' urlFileName];
%          %http://wind.nrel.gov/MetData/135mData/M4Twr/20Hz/mat/2013/01/20/
%          %http://wind.nrel.gov/MetData/135mData/M4Twr/20Hz/mat/2013/01/17/
%            http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat/2013/01/
%          %filename = ['K:\X Research\NWTC Research\M4_Tower\M4_20Hz_Updated\mat\2013\01\20\' urlFileName];
%          filename = ['K:\X Research\NWTC Research\M4_Tower\M4_Ver1.27\Jan_2013\01-18-2013\' urlFileName];
         
         %###################################################################
         %###################################################################
         
         dayStr = sprintf('%02d', day_cntr); %zero-padded day, so day 5 becomes '05' in the name, URL and local path
         urlFileName = ['01' dayStr the_year '_' hour_min '_' sec '_003.mat']; %would have to change the '01' and the '_003.mat' (month and extension)
         fullURL = ['http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat/2013/01/' dayStr '/' urlFileName]; %would have to change this: M4 or M5, 2012 or 2013, 01 (month), etc.
         filename = ['K:\X Research\NWTC Research\M5_Tower\M5_Ver1.27\Jan_2013\01-' dayStr '-2013\' urlFileName]; %Your directory here
         
         %###################################################################
         %###################################################################

         [F,STATUS] = urlwrite(fullURL,filename);
         pause(2) %I included a small pause here so I don't swamp the server

         if STATUS==0
            disp(' '); %Create a blank line
            disp('An error has occurred!!!');
            disp('It does not look like the filename exists!!!');
            disp('Verify the following filename');
            disp(' '); %Create a blank line
            disp(urlFileName);
            disp(' '); %Create a blank line
            noFileCntr = noFileCntr+1; %Count the missing file and keep going so the rest of the day still downloads
         end %End if
      end %End minCntr
   end %End hourCntr
end %day_cntr

disp(' '); %Blank Line
disp('noFileCntr');
disp(noFileCntr);
disp('Done');
clear
Everett Perry
PhD Candidate, National Wind Institute
Texas Tech University
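The counter idea (tally each expected file that cannot be fetched, keep going, and pause between requests so the server is not swamped) can be sketched in Python as follows. `fetch` is a hypothetical placeholder for whatever function downloads one file, for example one built on `urllib.request.urlopen`; the fake fetcher below exists only to show the behaviour without touching the network.

```python
import time


def fetch_all(names, fetch, pause_s=2.0):
    """Call fetch(name) for every expected file and return the names that failed.

    The loop does not stop at the first missing file, so the stragglers can be
    downloaded by hand afterwards.
    """
    missing = []
    for name in names:
        try:
            fetch(name)
        except OSError:
            # urllib's HTTPError and URLError are both subclasses of OSError
            missing.append(name)
        time.sleep(pause_s)  # small pause between requests
    return missing


def fake_fetch(name):
    """Stand-in downloader: pretends the 00:10 file is missing on the server."""
    if "_00_10_" in name:
        raise OSError("not found")


print(fetch_all(["01243_00_00_00_003.mat", "01243_00_10_00_003.mat"], fake_fetch, pause_s=0))
```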

Jennifer.Rinker

Re: How are you accessing/updating your wind data?

Postby Jennifer.Rinker » Tue Aug 13, 2013 10:28 am

Thanks for the responses! I ended up using Andy's code because it can download the data from all of the days with minimal fuss, even given my inexperience with R. In case someone else is interested in implementing Andy's script, here's a brief idiot's guide to getting the script to work on Windows 7:

1. Download and install the latest version of R. Add the folder containing Rscript.exe to your system PATH variable, then restart your computer (this may not be necessary, but just in case).
2. Open the R GUI from your Start menu and go to Packages -> Install Packages. Pick a mirror site (I'm not sure the choice matters; I used USA (CA1)), then choose the RCurl package. If prompted, choose to use a personal library. Once the package is installed, close the GUI.
3. At the command prompt, change your directory to wherever you have saved the script. Enter the command "Rscript <name>.r". Things should start downloading to your specified folder.

This may not be the most correct/elegant method, but it worked on my computer so I thought I'd post it just in case.

Jenni

UPDATE: I didn't check the .mat files before posting yesterday, but because Windows distinguishes between text and binary file modes, I needed to modify Andy's script slightly so that the files would open properly in MATLAB. Specifically, in the "download.file" command, I had to specify that the download mode was binary by adding the argument 'mode = "wb"'. See the code excerpt below.


          download.file(url = paste(NREL.URL.Base,"/",
                                    date.path,
                                    mat.files[row],
                                    sep = ""),
                        destfile = paste(Local.URL.Base,"/",
                                         date.path,
                                         mat.files[row],     
                                         sep = ""),
                        mode = "wb")
Jenni Rinker, Ph.D.
Mechanical Engineering & Materials Science
Duke University/NWTC
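The same binary-mode point applies outside R. As a minimal Python sketch (standard library only, not part of Andy's or Jenni's scripts), opening the destination with `"wb"` plays the role of `mode = "wb"`: the bytes from the server are written unchanged, so no newline translation can corrupt the .mat file. The usage lines copy a local file through a `file://` URL only so the example runs without network access.

```python
import os
import shutil
import tempfile
from pathlib import Path
from urllib.request import urlopen


def download_binary(url, dest_path, timeout=60):
    """Save url to dest_path byte-for-byte, creating the folder if needed."""
    folder = os.path.dirname(dest_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # "wb" writes raw bytes, so no text-mode newline translation happens
    with urlopen(url, timeout=timeout) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src.mat")
    src.write_bytes(b"MATLAB\r\n\x00\x1a\n")  # bytes that text mode could mangle
    dst = download_binary(src.as_uri(), os.path.join(tmp, "copy", "dst.mat"))
    print(Path(dst).read_bytes() == src.read_bytes())
```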

Jennifer.Rinker

Re: How are you accessing/updating your wind data?

Postby Jennifer.Rinker » Mon Mar 02, 2015 5:01 pm

Hey all,

I've extended the work here in a Python script that will download new data or update old local data if the online modification date is newer than the local creation date. If you're interested, details are here: viewtopic.php?f=31&t=1256.
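The update check described above can be sketched in Python with the standard library: ask the server for a file's `Last-Modified` header with a HEAD request and compare it with the local copy's timestamp. This is a sketch, not the linked script; it assumes the server sends `Last-Modified`, and it uses the local modification time rather than the creation time because creation times are not portable across operating systems.

```python
import os
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def remote_last_modified(url, timeout=60):
    """Return the server's Last-Modified time for url, or None if it is not sent."""
    request = Request(url, method="HEAD")  # headers only, no file body
    with urlopen(request, timeout=timeout) as response:
        header = response.headers.get("Last-Modified")
    return parsedate_to_datetime(header) if header else None


def needs_update(local_path, remote_modified):
    """True if there is no local copy or the online file is newer than it."""
    if not os.path.exists(local_path):
        return True
    if remote_modified is None:
        return True  # the server gave no date, so re-download to be safe
    if remote_modified.tzinfo is None:
        remote_modified = remote_modified.replace(tzinfo=timezone.utc)  # assume UTC
    local_modified = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    return remote_modified > local_modified
```

A typical loop would call `needs_update(local_path, remote_last_modified(url))` for each file and download only those that return True.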

Cheers,
Jenni
Jenni Rinker, Ph.D.
Mechanical Engineering & Materials Science
Duke University/NWTC

