Saturday, April 28, 2012

LoTW Processing

Some time ago I saw some news about LoTW (Logbook of the World) having problems processing incoming QSO logs. It seems their system was receiving a lot of logs and had trouble keeping up.

I had been thinking about this for a while, and yesterday I decided to run some tests on log processing to see and learn what the pitfalls of such a system could be. Not that I want to build one, but I had burned my brain out on a rather boring algorithm and needed to cool my ideas :)


Let's see how long it takes to create 500K (500000) QSOs in a file, then to find the unique callsigns, and finally to search for the positions of a specific callsign's QSOs in the same file.


 Not bad... 52 seconds on my slow machine...
The QSOs and callsigns are randomly created in this format:
CALLSIGN1:CALLSIGN2:YEARmonthDAYhourMinute:RST:qth,op,
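
To make the record layout concrete, here is a minimal sketch of how one such line could be split back into fields. The helper name `parse_qso_line` and the sample record are made up for illustration; they are not part of the scripts used for the tests.

```php
<?php
// Hypothetical helper (not part of the original scripts): split one record
// of the form CALLSIGN1:CALLSIGN2:YEARmonthDAYhourMinute:RST:comment
// into named fields. Returns null for blank or malformed lines.
function parse_qso_line(string $line): ?array
{
    // Limit to 5 parts so a ':' inside the free-text comment is preserved.
    $parts = explode(':', rtrim($line, "\r\n"), 5);
    if (count($parts) < 5 || $parts[0] === '' || $parts[1] === '') {
        return null;
    }
    return [
        'call1'   => $parts[0],
        'call2'   => $parts[1],
        'utc'     => $parts[2],
        'rst'     => $parts[3],
        'comment' => $parts[4],
    ];
}

// Made-up sample record in the same format.
$qso = parse_qso_line("ET3QPV:CT2GQV:201204281530:599:qth,op,\n");
echo $qso['call1'], ' worked ', $qso['call2'], ' with RST ', $qso['rst'], "\n";
```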

Now let's find the unique callsigns in the QSO list. Since they were randomly generated and are fairly long (2 letters, 1 number and 3 letters), almost none (in percentage terms) were duplicates: of 1 million (500K * 2) callsigns, 993864 were unique.
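
As a rough sanity check on that number, the expected count of distinct callsigns can be estimated with a few lines of arithmetic. This is only a sketch: it assumes every callsign is drawn uniformly and independently, and it takes the 24-letter alphabet (no I or O) from the generator script listed further down. Under those assumptions the estimate should land close to the measured figure.

```php
<?php
// Assumption: callsigns are uniform and independent random draws.
// 24 possible letters in 5 positions, 10 possible digits in 1 position.
$possible = (24 ** 5) * 10;   // number of distinct callsigns that can be generated
$drawn    = 2 * 500000;       // two callsigns per QSO

// Expected number of distinct values after n uniform draws from N possibilities:
//   N * (1 - (1 - 1/N)^n)
$expected_unique = $possible * (1 - (1 - 1 / $possible) ** $drawn);

printf("Possible callsigns: %d\n", $possible);
printf("Expected unique:    %d\n", round($expected_unique));
```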

OK, now we start to see that searching is fast and file creation is slow, and that alone starts to explain the slow log processing (or not, since I have no clue what type of system is used), especially if the logs arrive over the network, although multiple concurrent connections and processes can speed it up... I'm sure DOS is not used :)

Searching for the file positions of one callsign's QSOs is also fast:



...especially after buffering (the file read, in this case done by the OS)... see the difference between the first iteration of the program and the subsequent ones... I am sure that looking up the QSO positions for all callsigns in the file after buffering would take less than 20 hours...

I didn't try different algorithms to optimize the system, nor did I use a database, something I hope LoTW uses. Also, the language chosen is not the fastest for this type of operation.
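
One obvious alternative, which I did not benchmark, is to read the file once and build an index from callsign to file offsets, so that each later lookup is an array lookup instead of a full scan. The sketch below is only an illustration: `build_call_index` is a made-up name, the sample data is invented, and the offsets recorded are the start of each line rather than the position of the callsign itself. Keeping about a million callsigns in a PHP array also costs a fair amount of memory.

```php
<?php
// Sketch only: build an in-memory index callsign => list of byte offsets
// (start of each line) in a single pass over the QSO file.
function build_call_index(string $path): array
{
    $index = [];
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("can't open $path");
    }
    while (true) {
        $offset = ftell($fh);          // byte position where this line starts
        $line = fgets($fh);
        if ($line === false) {
            break;                     // end of file
        }
        $parts = explode(':', $line, 3);
        if (count($parts) < 3) {
            continue;                  // skip blank or malformed lines
        }
        $index[$parts[0]][] = $offset; // first callsign
        $index[$parts[1]][] = $offset; // second callsign
    }
    fclose($fh);
    return $index;
}

// Tiny demonstration with a made-up temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'qso');
file_put_contents(
    $tmp,
    "AB1CDE:FG2HJK:201204281530:599:qth,op,\n" .
    "FG2HJK:LM3NPQ:201204281531:579:qth,op,\n"
);
$index = build_call_index($tmp);
echo 'FG2HJK appears in lines starting at: ', implode(', ', $index['FG2HJK']), "\n";
unlink($tmp);
```

With such an index, finding all QSOs of one callsign is a single lookup followed by an `fseek()` to each stored offset, at the price of one full pass up front.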

Here's the code used in case you need it for something...





------------//---------
Create 500k random QSOs
------//-----------
<?php
 // create random QSO contacts between random callsigns and write them to a file... just for testing QSO matching
 // by CT2GQV 2012
 // Licence: use and abuse, it's free
 // if you don't change the settings it will create 500K records...
 // settings
 // 2 minutes instead of the default 30s... may not be possible on all systems (only with safe mode disabled), otherwise change php.ini
 set_time_limit(120);
 $contacts_file = './contacts.qsl';
 $create_how_many=500000;
 $start_time=microtime(true);
 // one stupid way of generating random chars...
 $characters = array("A","B","C","D","E","F","G","H","J","K","L","M","N","P","Q","R","S","T","U","V","W","X","Y","Z");
 // let's create
 $rst_count=0;
 $a=0;
 // let's open the file before...
 $fh = fopen($contacts_file, 'a') or die("ERROR: can't open contacts");
 while ($a<$create_how_many) {
  // 2 letters, 1 number, 3 letters... for simplicity
  $call1=$characters[rand(0,23)].$characters[rand(0,23)].rand(0,9).$characters[rand(0,23)].$characters[rand(0,23)].$characters[rand(0,23)];
  $call2=$characters[rand(0,23)].$characters[rand(0,23)].rand(0,9).$characters[rand(0,23)].$characters[rand(0,23)].$characters[rand(0,23)];
  // minimum signal is 233 :)
  $rst=rand(2,5).rand(3,9).rand(3,9);
  // just the creation date... "H" (not "G") keeps the hour at two digits so the field has a fixed width
  $utc=date("YmdHi");
  // just for fun...
  if($rst=="599"){$rst_count++;};
  // uncomment the next line to echo each record
  //  echo "$call1:$call2:$utc:$rst:Just a comment\n";  

///// create the file contacts.qsl beforehand and chmod it to writable...
// opening and closing the file for every record (commented out below) is much slower
//    $fh = fopen($contacts_file, 'a') or die("ERROR: can't open contacts");
    $data_to_append="$call1:$call2:$utc:$rst:qth,op,\n";
    fwrite($fh, $data_to_append);
//    fclose($fh);
  // add the counter...
  $a++;
 };
// closed only after the loop to save some time...
fclose($fh);
$end_time=microtime(true);
$time = $end_time - $start_time;
echo "\n\nDone $create_how_many contacts in $time seconds and $rst_count QSOs were 599...\n\n";
?>





--------//-----------
Find unique callsigns
-------//------------ 
<?php
 // some settings
 set_time_limit(120);
 // the file with the QSOs (the generator above writes ./contacts.qsl, so rename it or adjust this path)
 $qsl_file = './500kcontacts.qsl';
 $uniq_call_file =  './uniq-call.qsl';
 $uniq_call_array=array();
 $temp=array();
 $start_time=microtime(true);
 // let's loop over the QSO file
 $file_handle = fopen($qsl_file, "r") or die("ERROR: can't open the QSO's file");
 // where we are going to store the unique callsigns
 $file_handle2 = fopen($uniq_call_file, 'a') or die("ERROR: can't open uniq callsign file");
 $count=0;
 // fgets() returns false at end of file, which avoids processing a bogus last line
 while (($line = fgets($file_handle)) !== false) {
  $pieces=explode(":", $line);
  // save only when the line has both callsigns and neither is empty...
  // the random generator never creates an empty callsign, but a real log might :)
  if(count($pieces)>=2 && $pieces[0]!="" && $pieces[1]!=""){
     $temp[]=$pieces[0]; $temp[]=$pieces[1];
  };
 }; // end loop reading the QSO file

   fclose($file_handle);
   $uniq_call_array = array_unique($temp);
   // it's good to free memory that is no longer needed...
   unset($temp);

   foreach ($uniq_call_array as $value) {
     // echo "$value\n";
     $add_to_file="$value\n";
     fwrite($file_handle2, $add_to_file);
     $count++;
   }
  fclose($file_handle2);
  $end_time=microtime(true);
  $time = $end_time - $start_time;
  echo "\n$count unique callsigns found\n";
  // uncomment to print the whole list (it is long!)
  // print_r($uniq_call_array);
  echo "In $time seconds\n";
?>
-------//---------
Find the contacts of a callsign in the file
-------//---------
<?php
$start_time=microtime(true);
 $search_call="ET3QPV";
 // read the whole file into memory (fine for a test file, not for huge logs)
 $file = file_get_contents("./500kcontacts.qsl");
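 $offset = 0;
 $counter = 0;

    // strpos() returns false when nothing is found, so compare with !== false;
    // "== 0" would count a missing callsign as a hit at position 0
    while(($offset = strpos($file, $search_call, $offset)) !== false){
        $counter++;
        echo "\nQSO #$counter at pos: $offset";
        $offset++; // continue searching just after this match
    }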

$end_time=microtime(true);
$time = $end_time - $start_time;
echo "\nFound $counter QSOs in $time seconds";
// extrapolate: one search like this per unique callsign
$time=$time*993864;
$hour=$time/3600;
echo "\nFor 993864 callsigns that should be more or less: $time seconds... or $hour hours";
?>

Simple, huh?

For now I will continue to use paper and a pen for log processing...
