[ Tutorial ] Make Salesforce REST API Calls With Postman

Posted on August 31, 2018 by ihong5

In this tutorial, we will learn how to make simple Salesforce REST API calls with Postman. To get started, I would highly recommend creating a free Salesforce org for your own personal use by signing up here.

OAuth2 Authentication with Salesforce

To make any Salesforce REST API calls, you’ll need a bearer token. Think of it as a REST API access token, and without it, you’ll get a response from Salesforce telling you that the resource you’re trying to access is protected and thus, you need to authenticate.

Here, you need to pass in the following 5 parameters:

grant_type
client_id
client_secret
username
password

grant_type is always the hard-coded string “password”. The username and password parameters are the credentials you use to log in to your Salesforce org. To retrieve client_id and client_secret, follow these steps:

1. Click the Setup link at the top right of the page.
2. In the leftmost panel, under the Build section, expand Create and click the Apps link.
3. In the Connected Apps section, there may already be an app created for you called “CPQ Integration User Connected App”. If so, click the app name and skip to step 7.
4. If CPQ Integration User Connected App does not exist, click New in the Connected Apps section.
5. On the following page, fill out these parameters:
   1. Connected App Name: REST API test (you can name this whatever you like)
   2. API Name: will be auto-populated
   3. Contact Email: test@email.com (can be a fake email)
   4. Check Enable OAuth Settings in the API (Enable OAuth Settings) section
   5. Callback URL: http://localhost
   6. Selected OAuth Scopes: select and add Access and manage your data (api), Full access (full), and Perform requests on your behalf at any time (refresh_token, offline_access)
6. Click Save, and then click Continue on the following page.
7. Here you’ll see the client_id and client_secret, shown as Consumer Key and Consumer Secret respectively. You’ll need these values when retrieving the access token from Postman.

The last thing you’ll need is the security token, which is appended to your password in the OAuth request. You can skip this paragraph if you already have one. First, go to the My Settings page. In the left panel, click Personal to expand its submenus. Open the Reset My Security Token submenu, then click the Reset My Security Token button. Salesforce will email you the security token, which is a 24-character alphanumeric string.

From Postman, create a new request. Set the HTTP method to POST, and the request URL should be https://login.salesforce.com/services/oauth2/token.

In the body, make sure to have form-data checked, and fill in these 5 parameters as follows:

grant_type = password
client_id = <client-id-from-step-7>
client_secret = <client-secret-from-step-7>
username = <your-username>
password = <your-password><security-token>

The password parameter should be your login password followed by the security token you retrieved above. For example, if your password is “mypass” and your security token is “abcdEFGHijklmn1234567890”, the password parameter value should be “mypassabcdEFGHijklmn1234567890”.
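For readers who want to see the same five parameters outside of Postman, here is a minimal Java sketch that only assembles the request body. It assumes the token endpoint also accepts a URL-encoded form body (Postman's x-www-form-urlencoded option rather than form-data). The class name, method name, and sample credentials are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class TokenRequestSketch {

    /** Builds the URL-encoded body for the username-password token request. */
    static String buildBody(String clientId, String clientSecret,
                            String username, String password, String securityToken) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("grant_type", "password");              // always the literal string "password"
        params.put("client_id", clientId);                 // Consumer Key
        params.put("client_secret", clientSecret);         // Consumer Secret
        params.put("username", username);
        params.put("password", password + securityToken);  // password followed by the security token

        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
        }
        return body.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder credentials, not real ones.
        System.out.println(buildBody("my-consumer-key", "my-consumer-secret",
                "user@example.com", "mypass", "abcdEFGHijklmn1234567890"));
    }
}
```

The resulting string would be sent as the POST body to https://login.salesforce.com/services/oauth2/token.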


Click Send, and you’ll get a response as follows:

{
    "access_token": "****",
    "instance_url": "https://na57.salesforce.com",
    "id": "https://login.salesforce.com/id/****/****",
    "token_type": "Bearer",
    "issued_at": "1535739081652",
    "signature": "****"
}

The main thing you need from this response is the access_token. The instance_url is the host to use in the requests that follow.

Sending REST API Request

In the next request, we are going to pass in a simple SOQL query to retrieve all accounts from our Salesforce org.

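Create a new request in Postman, and set the HTTP method to GET. Set the request URL to https://<your-instance>.salesforce.com/services/data/v43.0/query?q=SELECT+Id,Name+FROM+Account, where the host is the instance_url returned with your access token (https://na57.salesforce.com in the response above).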

In the Authorization tab Type field, select Bearer Token. You’ll see a text field to paste in your bearer token. Copy and paste the access_token from the previous request, and click Send.
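To make it clearer what Postman sends here, the sketch below builds the same query URL and Authorization header value in Java. The class and method names are hypothetical, and the instance URL is simply the one from the sample response. URLEncoder writes the comma as %2C, which is equivalent to the literal comma in the URL above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class QueryRequestSketch {

    /** Builds the REST query URL for a SOQL statement on a given instance. */
    static String buildQueryUrl(String instanceUrl, String apiVersion, String soql) {
        return instanceUrl + "/services/data/v" + apiVersion + "/query?q=" + encode(soql);
    }

    /** Value of the Authorization header that Postman's Bearer Token type produces. */
    static String authorizationHeader(String accessToken) {
        return "Bearer " + accessToken;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8"); // spaces become '+', commas become %2C
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl("https://na57.salesforce.com", "43.0",
                "SELECT Id,Name FROM Account"));
        System.out.println("Authorization: " + authorizationHeader("<access_token>"));
    }
}
```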


You should now get a response like below.

{
    "totalSize": 12,
    "done": true,
    "records": [
        { .... },
        { .... },
        { .... },
        ...
    ]
}

Modified Consistent Hashing Rings in OpenStack Swift

Posted on August 22, 2014 by ihong5 • Tagged consistent hashing algorithm, consistent hashing rings, openstack, openstack swift

Now let’s talk about how Swift takes a slightly different approach to the consistent hashing algorithm, and about the importance of rings in Swift.

Please note: after reading Swift articles a few times, it is my belief that the terms drive and device are used interchangeably, so I’ll be doing the same here.

In the last entry, I mentioned how some objects have to move to another drive when I add or remove a drive from the cluster. What happens to an object while it is being moved? It would not be available for use, and we don’t want to wait for it to reach another drive; we want it to be available at all times. Swift builds the ring (and rebuilds it by executing the rebalance command when necessary) to ensure the data is distributed evenly throughout the cluster.

As far as I have read so far, there are 3 types of rings: account rings, container rings, and object rings. To read more about the relations among accounts, containers, and objects, see Basic Architecture and Data Model of OpenStack Swift.

1. Partitions and Partition Power

Swift divides the hash space into partitions, each with a fixed width. Those partitions are then assigned to drives by a placement algorithm.

When a cluster’s ring is created, an integer called the partition power is chosen. The value of the partition power determines the total number of partitions to assign to the drives in the cluster. Let N = total number of partitions, and p = partition power:

N = 2^p

Let’s say that the partition power is 7. In that case, the cluster will have 2⁷ = 128 partitions, which will then be mapped to the available drives in the cluster. What’s good about this is that the number of partitions stays the same at all times, even though the number of drives may change as drives are added or removed.
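The arithmetic can be sketched in a few lines of Java. The partition count is just 2 raised to the partition power. Mapping an object to a partition by taking the top bits of its MD5 hash is my assumption for illustration; it is not stated above. The class name, method names, and sample object path are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PartitionSketch {

    /** N = 2^p, where p is the partition power. */
    static long partitionCount(int partitionPower) {
        return 1L << partitionPower;
    }

    /**
     * Assumption: the partition is the top `partitionPower` bits of the
     * first 32 bits of the MD5 hash of the object's path.
     */
    static int partitionFor(String objectPath, int partitionPower) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(objectPath.getBytes(StandardCharsets.UTF_8));
            long top32 = ByteBuffer.wrap(digest).getInt() & 0xFFFFFFFFL; // unsigned 32-bit value
            return (int) (top32 >>> (32 - partitionPower));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(partitionCount(7)); // prints 128
        System.out.println(partitionFor("/account/container/object", 7));
    }
}
```

Because the partition depends only on the hash and the partition power, adding or removing drives never changes which partition an object belongs to; only the partition-to-drive assignment changes.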

But that’s not all.

2. Replica Count and Replica Locks

Replica count is the number of copies of each partition. This is what makes Swift redundant: it keeps multiple copies of each partition and places them across the cluster. Say that I have a replica count of three: each partition will have 3 copies, and each copy will be placed on a different device in the cluster.

A higher replica count keeps us better protected against losing data or data being unavailable. If a device is added or removed and partitions are moving to different devices, we still have other replicas available.

Let’s say that I’m moving a partition, call it partition_1, to another device. While one copy of partition_1 is being moved, the other replicas of partition_1 should not be moved, so Swift uses replica locks to keep those replicas where they are, which ensures they stay available.

3. Data Distribution

Swift uses two other mechanisms to evenly distribute the data across the cluster.

3.1. Weight

Remember that I mentioned data needs to be evenly distributed to help with load balancing when I talked about the consistent hashing algorithm? (See the Multiple Markers in Consistent Hashing Algorithm section.) The first mechanism, weight, appears to determine how many partitions are assigned to each drive. It is a user-defined value set during cluster creation, and it is also used when rebalancing the rings: that is, when Swift rebuilds the rings to evenly distribute data among the drives. The total number of partitions does not change; a higher weight means a larger share of the partitions is assigned to that drive.

3.2. Unique-as-possible

The second mechanism that Swift uses is a placement algorithm called unique-as-possible. Simply put, this algorithm looks for the region ⇒ zone ⇒ server ( <ip-address>:<port-number> ) that is used the least compared to other regions, zones, and servers, in that order. If necessary, it will also find the drive that is used the least. Once found, Swift places the partitions there.

4. Ring Data Structures

Every time the ring is built (and rebuilt), it seems that two important data structures are also created: the device list and the device look-up table. Knowing that the proxy server handles the REST calls from clients, it is my belief that the proxy server relies on these two data structures to route incoming and outgoing objects accordingly.

4.1. Device Look-up Table

The device look-up table contains the information that the proxy server process looks up to find which device a certain piece of data is located on. Say the client sends a GET request to download an object. The process calculates the hash value of the object named in the GET request and maps it to a specific partition. Also remember that partitions are replicated and then mapped to different drives, so the process is directed to the devices that hold the object.

Each row in the device look-up table represents a replica (replica 0 being the first, 1 being the second, and so on). Each column represents a partition. Each entry in the table is a drive ID. Given that, the process looks at the device where the first replica is located, and then at the next n – 1 rows, n being the number of replicas in the cluster.

Example of device look-up table:

In the table above, we can see that the data was found in partition 2. Replicas 0, 1, and 2 of partition 2 are mapped to drives 5, 2, and 15 respectively.
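The look-up itself is just reading down one column of a two-dimensional array. Here is a small Java sketch of that idea. Only the column for partition 2 (drives 5, 2, and 15) comes from the example; every other value in the array, along with the class and method names, is made up for illustration.

```java
import java.util.Arrays;

final class DeviceLookupSketch {

    // Rows are replicas, columns are partitions, and each cell is a drive ID.
    // Only column 2 (drives 5, 2, 15) comes from the example; the rest is hypothetical.
    static final int[][] REPLICA_TO_PART_TO_DEVICE = {
        {1, 3, 5, 7},    // replica 0
        {4, 6, 2, 0},    // replica 1
        {9, 12, 15, 8},  // replica 2
    };

    /** Returns the drive ID of every replica of the given partition. */
    static int[] devicesFor(int[][] table, int partition) {
        int[] devices = new int[table.length];
        for (int replica = 0; replica < table.length; replica++) {
            devices[replica] = table[replica][partition];
        }
        return devices;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(devicesFor(REPLICA_TO_PART_TO_DEVICE, 2))); // prints [5, 2, 15]
    }
}
```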

4.2. Device List

The device list contains a list of active devices in the cluster. If we look more into the Swift architecture and its data model, this will probably make a lot more sense. Each device (to which partitions are mapped) belongs to a storage node, which in turn belongs to a zone. Each zone belongs to a region. A region is typically a geographical location, defined by the user when prompted for values such as country and city while creating a Swift cluster. Above all, those regions all fall into one Swift cluster (see Basic Architecture and Data Model of OpenStack Swift for more details).

So the point of the device list is to hold information about each device: which region and zone it falls in, its weight, its device ID, etc. I believe the proxy server uses this information to handle the REST calls and to send objects to, or fetch them from, the correct devices.

Example of device list:

	0	1	2	3	…
Devices	device id=0 region=2 zone=1 weight=100 …	device id=1 region=3 zone=1 weight=100 …	device id=2 region=1 zone=3 weight=100 …	device id=3 region=2 zone=2 weight=100 …	…

Going back to the earlier example with the device look-up table, the proxy server process found the data in partition 2 and found that one replica of partition 2 lives on drive 2. It will then refer to the device list, look at device 2, and see that it is located in region 1, zone 3, and so forth.

Consistent Hashing Algorithm

Posted on August 19, 2014 by ihong5 • Tagged consistent hashing algorithm

Today, I’ll talk about how the Consistent Hashing Algorithm works, which will be followed by how it is utilized in OpenStack Swift Rings in my next blog entry. To start off, let’s talk about the Basic Hash Function.

1. Basic Hash Function

A basic hash function maps objects to different drives based on the hash of the object, so the object can be fetched later. It is probably best to think of the basic hashing algorithm as a typical encyclopedia; it doesn’t matter how many times I look up information on computers, I will always be checking the CAL – EDU volume (as long as I am looking through the same edition). Think of the encyclopedia volume as a drive, and the encyclopedia volume number as a mapping value.

Let’s say I have a series of objects I want to store, and I have 4 drives (or partitions) in my storage server, labeled Drive 0, 1, 2, and 3. The basic hash function takes the object data and hashes it using the MD5 hashing algorithm to produce a shorter, fixed-length value: a hexadecimal string of length 32, which can be converted into a decimal value. Finally, it divides that hash value by the number of drives; in our example, that is 4. It then stores the object based on the remainder of the division, which will be a value from 0 to 3. Let’s call this value the mapping value.

Here are the example objects I want to store and their hash values:

Table 1.1. Mapping of Objects to Different Drives using Basic Hash Function

Object	Hash Value (Hexadecimal)	Mapping Value	Drive Mapped To
Image 1	b5e7d988cfdb78bc3be1a9c221a8f744	hash(Image 1) % 4 = 0	Drive 0
Image 2	943359f44dc87f6a16973c79827a038c	hash(Image 2) % 4 = 0	Drive 0
Image 3	1213f717f7f754f050d0246fb7d6c43b	hash(Image 3) % 4 = 3	Drive 3
Music 1	4b46f1381a53605fc0f93a93d55bf8be	hash(Music 1) % 4 = 2	Drive 2
Music 2	ecb27b466c32a56730298e55bcace257	hash(Music 2) % 4 = 3	Drive 3
Music 3	508259dfec6b1544f4ad6e4d52964f59	hash(Music 3) % 4 = 1	Drive 1
Movie 1	69db47ace5f026310ab170b02ac8bc58	hash(Movie 1) % 4 = 0	Drive 0
Movie 2	c4abbd49974ba44c169c220dadbdac71	hash(Movie 2) % 4 = 1	Drive 1

But what if we have to add/remove drives? The hash values of all objects will stay the same, but we need to re-compute the mapping value for all objects, then re-map them to the different drives.

That’s too much work for our servers.
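To see how much work that is, here is a small Java sketch of the basic hash function. It treats the 32-character hexadecimal hash as one big integer and takes the remainder after dividing by the number of drives, then counts how many objects land on a different drive when the drive count changes. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class BasicHashSketch {

    /** MD5 of the data as a 32-character hexadecimal string. */
    static String md5Hex(String data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(data.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Mapping value: the hash, read as an integer, modulo the number of drives. */
    static int driveFor(String hexHash, int driveCount) {
        return new BigInteger(hexHash, 16).mod(BigInteger.valueOf(driveCount)).intValue();
    }

    /** Counts the objects whose drive changes when the drive count changes. */
    static int countRemapped(String[] hexHashes, int oldDriveCount, int newDriveCount) {
        int moved = 0;
        for (String hex : hexHashes) {
            if (driveFor(hex, oldDriveCount) != driveFor(hex, newDriveCount)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        String[] hashes = { md5Hex("Image 1"), md5Hex("Music 1"), md5Hex("Movie 1") };
        for (String hash : hashes) {
            System.out.println(hash + " -> Drive " + driveFor(hash, 4));
        }
        System.out.println("Remapped when going from 4 to 5 drives: " + countRemapped(hashes, 4, 5));
    }
}
```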

2. Consistent Hashing Algorithm

The consistent hashing algorithm achieves a similar goal but does things differently. It still hashes the object data, but instead of computing a mapping value for each object, each drive is assigned a range of hash values whose objects it stores. Again, think of an encyclopedia: each volume is a drive, and the range of first three letters that a volume covers plays the role of the range of hash values assigned to that drive.

Table 2.1. Range of Hash Values for Each Drive

Drive	Range of Hash Values
Drive 0	0000… ~ 3fff…
Drive 1	3fff… ~ 7ffe…
Drive 2	7fff… ~ bffd…
Drive 3	bffd… ~ ffff…

Note: This is just an example. Hash values are much longer.

Table 2.2. Mapping of Objects to Different Drives using Consistent Hashing Algorithm

Object	Hash Value (Hexadecimal)	Drive Mapped To
Image 1	b5e7d988cfdb78bc3be1a9c221a8f744	Drive 2
Image 2	943359f44dc87f6a16973c79827a038c	Drive 2
Image 3	1213f717f7f754f050d0246fb7d6c43b	Drive 0
Music 1	4b46f1381a53605fc0f93a93d55bf8be	Drive 1
Music 2	ecb27b466c32a56730298e55bcace257	Drive 3
Music 3	508259dfec6b1544f4ad6e4d52964f59	Drive 1
Movie 1	69db47ace5f026310ab170b02ac8bc58	Drive 1
Movie 2	c4abbd49974ba44c169c220dadbdac71	Drive 3
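The range look-up in the two tables above can be sketched with a sorted map in Java. Each drive is registered under the first hash prefix of its range, and an object goes to the drive with the greatest starting prefix that is not above the object's own prefix. Using only the first four hexadecimal digits is a simplification for the example, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class RangeRingSketch {

    // Key: first 16 bits of the range start; value: the drive that owns the range.
    private final TreeMap<Integer, String> rangeStarts = new TreeMap<>();

    void assign(int rangeStartPrefix, String drive) {
        rangeStarts.put(rangeStartPrefix, drive);
    }

    /** Finds the drive whose range contains the given hexadecimal hash. */
    String driveFor(String hexHash) {
        int prefix = Integer.parseInt(hexHash.substring(0, 4), 16);
        Map.Entry<Integer, String> owner = rangeStarts.floorEntry(prefix);
        if (owner == null) {
            throw new IllegalStateException("no range covers prefix " + hexHash.substring(0, 4));
        }
        return owner.getValue();
    }

    /** The four ranges from Table 2.1, keyed by their starting prefix. */
    static RangeRingSketch fourDrives() {
        RangeRingSketch ring = new RangeRingSketch();
        ring.assign(0x0000, "Drive 0");
        ring.assign(0x3fff, "Drive 1");
        ring.assign(0x7fff, "Drive 2");
        ring.assign(0xbffd, "Drive 3");
        return ring;
    }

    public static void main(String[] args) {
        RangeRingSketch ring = fourDrives();
        System.out.println(ring.driveFor("b5e7d988cfdb78bc3be1a9c221a8f744")); // prints Drive 2
        System.out.println(ring.driveFor("1213f717f7f754f050d0246fb7d6c43b")); // prints Drive 0
    }
}
```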

Now if I add drives, the only thing that changes is that each drive gets a new range of hash values to store. Each object’s hash value remains the same. Any object whose hash value is still within the range of its current drive stays put. Any object whose hash value is no longer within the range of its current drive is mapped to another drive, but with the consistent hashing algorithm the number of such objects is small compared to the basic hash function.

I’ll add another drive and re-illustrate my point on the picture below:

Notice how only Movie 2 and Music 2 objects were mapped to my new drive (drive 4), and Image 1 had to be mapped to drive 3. If we used basic hash function, we would most likely have to re-calculate the mapping values for all objects, and re-map them accordingly. Imagine how much workload that is for thousands, or even millions of objects.

But there’s more to it once it’s modified.

3. Multiple Markers in Consistent Hashing Algorithm

First, let’s look at what the multiple markers do for us.

Remember that in the consistent hashing algorithm, each drive has one big range of hash values to map objects to. Multiple markers help to distribute the objects evenly across drives, thus helping with load balancing, but how?

Instead of having one big hash range for each drive, multiple markers split those large hash ranges into smaller chunks, and those smaller hash ranges are assigned to different drives in the server. How does that help?

Let’s say I have 20 objects I want to store, and I still have 4 drives, each with a different range of hash values of equal length. But what if, out of those 20 objects, 14 are mapped to drive 0, and the rest are equally distributed to drives 1, 2, and 3? This causes the ring to be unbalanced in weight, because drive 0 holds many more objects than the rest of the drives. This is where the smaller hash ranges can help a lot with load balancing.

As mentioned earlier, the consistent hashing algorithm uses multiple markers so that each drive maps several smaller ranges of hash values instead of one big range. This has two positive effects. First, if a new drive is added, it gains objects from all the other existing drives in the server instead of just a few objects from a neighboring drive, which results in more and smaller hash ranges. Likewise, if one of the existing drives is removed, all the objects that drive was holding are evenly distributed to the other existing drives, which results in fewer and larger hash ranges. Second, the overall distribution of objects becomes fairly even, meaning the load on the different drives is close to evenly distributed, which helps with load balancing.

The picture above shows that several objects close to each other in terms of hash value are distributed among different segments of different drives. Multiple markers split the 4 big hash ranges into several smaller hash ranges and spread them across all the drives.

In my next entry, I will talk about how Swift utilizes this algorithm and how it takes a different approach.
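Here is a minimal Java sketch of multiple markers, under a few assumptions of mine: each marker is placed at the MD5 hash of the drive name plus a marker number, and an object belongs to the first marker at or after its own hash, wrapping around at the end of the ring. The class name, method names, and the "#" naming scheme are all hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class MultiMarkerRingSketch {

    private final TreeMap<BigInteger, String> ring = new TreeMap<>();
    private final int markersPerDrive;

    MultiMarkerRingSketch(int markersPerDrive) {
        this.markersPerDrive = markersPerDrive;
    }

    /** Places several markers for the drive at pseudo-random points on the ring. */
    void addDrive(String drive) {
        for (int i = 0; i < markersPerDrive; i++) {
            ring.put(hash(drive + "#" + i), drive);
        }
    }

    void removeDrive(String drive) {
        for (int i = 0; i < markersPerDrive; i++) {
            ring.remove(hash(drive + "#" + i));
        }
    }

    /** The object belongs to the first marker at or after its hash, wrapping around. */
    String driveFor(String objectName) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("the ring has no drives");
        }
        Map.Entry<BigInteger, String> marker = ring.ceilingEntry(hash(objectName));
        return (marker != null ? marker : ring.firstEntry()).getValue();
    }

    static BigInteger hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // non-negative 128-bit value
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        MultiMarkerRingSketch ring = new MultiMarkerRingSketch(100);
        for (int d = 0; d < 4; d++) {
            ring.addDrive("drive" + d);
        }
        System.out.println("Image 1 -> " + ring.driveFor("Image 1"));
        ring.addDrive("drive4");
        System.out.println("Image 1 -> " + ring.driveFor("Image 1") + " (after adding drive4)");
    }
}
```

With this layout, adding a drive can only move an object onto the new drive; an object never moves between two existing drives.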

[ Tutorial ] Apache ZooKeeper – Setting ACL in ZooKeeper Client

Posted on July 24, 2014 by ihong5 • Tagged ZooKeeper, zookeeper acl

Now let’s talk about setting the ACL of a znode in ZooKeeper. Before getting into the details, let’s talk more about the scheme and ID.

1. Scheme and ID

The ID, as the name suggests, is an identifier built from a username and password. When a znode has an ACL restricted to a specific group of users or an individual, the <username>:<password> string is first hashed using the SHA-1 hashing algorithm, and the resulting digest is then Base64 encoded.

As mentioned in earlier blog entries, scheme is like a group of users that are authorized to access a certain znode with a scheme-and-id-specific ACL set.

1.1. World Scheme

World scheme has one ID (anyone). This represents any user in the world.

For example, we type in the following command to set the znode accessible by anyone.

setAcl /newznode world:anyone:crdwa

If done correctly, you should get something like this in return:

1.2. Auth Scheme

The auth scheme represents a "manually" set group of authenticated users. According to the ZooKeeper documentation (http://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html), auth does not use any ID. Unless I am mistaken, this seems not to be the case: if you try to set the ACL on a znode using the auth scheme without providing any ID, the client tells you that it is not a valid ID, or that some form of ID is needed. Below is a (bad) example:

setAcl /newznode auth:crdwa

As seen above, I did not provide any form of ID. This is what I get:

A correct way to use this scheme would be as follows:

setAcl /newznode auth:username:password:crdwa

Using the auth scheme allows multiple authorized users to access a single znode, each with a different username and password combination. Say we have 3 users:

username : password
user_123 : pwd_123
user_456 : pwd_456
user_789 : pwd_789

We can use the same syntax above by replacing username with user_123, user_456, or user_789 and password with pwd_123, pwd_456, or pwd_789 respectively.

1.2.1 addauth Command

One important thing to note is you must use the addauth command before proceeding to set the ACL of a znode using the auth scheme. If you try to set the ACL before executing the addauth command, you will get an error as below:

The correct way is to execute the addauth command first, and then execute the setAcl command. Below is the syntax of the addauth command (it takes the scheme and the credentials, not a znode path):

addauth digest <username>:<password>

By adding the authenticator and setting ACL accordingly, you can ensure that you set the ACL correctly.

Repeat the steps for each additional username and password combination, and the ACL for newznode looks like this:

1.3. Digest Scheme

The digest scheme represents an individual user with authentication. It uses the username:password string hashed with the SHA-1 hashing algorithm, and that hash is in turn Base64 encoded. The ZooKeeper website states that the MD5 hash of <username>:<password> is used as the ACL ID. Unless I am mistaken, that seems not to be the case. Instead, what I found was that <username>:<Base64 encoded SHA-1 hash of username:password> is used as the ACL ID (please see the pictures above under the Auth section).
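The format I observed can be reproduced with the JDK alone. The sketch below hashes <username>:<password> with SHA-1 and Base64 encodes the raw digest bytes. Encoding the raw bytes, rather than a hexadecimal string, is my assumption about the exact steps. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class DigestIdSketch {

    /** Returns <username>:<Base64 of SHA-1 of "username:password">. */
    static String digestId(String username, String password) {
        String idPassword = username + ":" + password;
        try {
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest(idPassword.getBytes(StandardCharsets.UTF_8));
            return username + ":" + Base64.getEncoder().encodeToString(sha1);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Sample credentials from the auth scheme example above.
        System.out.println(digestId("user_123", "pwd_123"));
    }
}
```

A SHA-1 digest is 20 bytes long, so the encoded part is always 28 characters and ends with a single = padding character.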

What’s really funny is that if I set an ACL for an individual user on a znode using the digest scheme in the ZooKeeper client, instead of storing the username and the encoded hash of <username:password> like I expected, it stores the clear, human-readable <username:password> text as the ID. Executing the addauth command before setting the ACL with the digest scheme does not change this. Below is the picture that illustrates my point:

The client stores the string after digest: exactly as typed, so passing a plain-text password this way is pointless: the password sits in the ACL in readable form. As far as I can tell, the client expects that part to already be the Base64-encoded SHA-1 hash.

The good thing is that if you setAcl a znode using the digest scheme via the client, you can still delete it.

1.4. Host Scheme

Host scheme represents anyone within the same hosting server. I have not done enough with the host scheme yet, but I will come back to this with more details.

1.5. IP Scheme

The IP scheme represents any user connecting from the given IP address. The easiest example is 127.0.0.1, which represents a user on the local machine, since 127.0.0.1 always points to localhost. Below is the syntax of setAcl using the IP scheme:

setAcl /<node-name> ip:<IPv4-address>:<permission-set>

Using the syntax above, below is an example using the 127.0.0.1 IP address:

setAcl /newnode ip:127.0.0.1:crdwa

If done correctly, you should get the znode stat like the picture below:

That is it for now. On my next blog post, I will briefly talk about how to access them in Java; furthermore, I will talk more in detail about how username and password are stored. Thanks for reading as usual, and happy zookeeping!

[ Tutorial ] Apache ZooKeeper ACL (Access Control List) Getting Permission Sets

Posted on July 10, 2014 by ihong5 • Tagged ZooKeeper, zookeeper acl

Today, I will talk about the basics of ACL in ZooKeeper and getting the permission sets of ACL.

1. What is ACL?

ACL (Access Control List) is basically an access control mechanism implemented in ZooKeeper. It determines which users can access a znode, depending on how it is set. For example, if its scheme is set to world and its ID is set to anyone, then the znode is accessible by anyone in the world, hence the world scheme and the anyone ID. However, if the scheme is anything other than world, then it’s a different story. Let’s talk about the basics and its attributes first.

A typical permission set of a znode looks something like this: crdwa. This is actually an acronym (can be in any order) that stands for: Create, Read, Delete, Write, and Admin.

1.1. Getting the ACL of a Znode

To get the ACL of a particular znode, we execute the getAcl command in the ZooKeeper client.

It will return that znode’s ACL in this format:

'[ scheme ],'[ id ]
: [ permission-set ]

Syntax of the getAcl command is: getAcl Path

Example: getAcl /getmyacl

Think of the scheme as a specific group of users. The world scheme represents everyone in the world, literally. The other schemes in ZooKeeper are digest (an individual user with a unique username and password), ip (an individual or group of users at the same IP address), and host (a group of users within the same host).

The ID, I believe, is self-explanatory; if the scheme is world, then the ID always has to be anyone. There is no point in restricting specific users if the znode is meant to be accessible by anyone.

Here, I have an example of getting the ACL of the getmyacl znode. By typing in the command getAcl /getmyacl, you will get something like this:

1.2. More About Permission Set

Notice how the permission set says crdwa. If you were trying to get the permission set of a znode in Java, you would get an integer value in return.

First off, you would call getPerms method to get the permission set of a znode in Java. As mentioned earlier, it returns an integer value. In this case, with this znode having a permission set of crdwa, in Java it returns 31, meaning that the user is authorized to create a child znode, read data of that znode, delete that znode, overwrite (or set data) the znode, and has administrative rights of that znode.

Each permission (create, read, delete, write, admin) is actually a bit, either 0 or 1, where 0 represents not allowed, and 1 represents allowed. So, if you convert that 31 into a binary number, you would get 11111. Refer to the following bullet points:

Read – 2^0
Write – 2^1
Create – 2^2
Delete – 2^3
Admin – 2^4

Say we have a getmyacl znode. Create, read, and admin are allowed, but delete and write are not. According to my little bullet points above, in Java it would return 21 for the permission set. Converted to binary, that is 10101: (2^4 = 16) + (2^2 = 4) + (2^0 = 1) = 21

Let’s try to change its permission set to cwa (create, write, admin) and see what integer value is returned in Java.

This time it returned 22, or 10110: (2^4 = 16) + (2^2 = 4) + (2^1 = 2) = 22
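The bit arithmetic above can be checked with a short Java sketch. It uses the bit positions from the bullet points (Read = 2^0 through Admin = 2^4); the class and method names are hypothetical and are not part of the ZooKeeper API.

```java
final class PermsSketch {

    static final int READ = 1 << 0;   // 1
    static final int WRITE = 1 << 1;  // 2
    static final int CREATE = 1 << 2; // 4
    static final int DELETE = 1 << 3; // 8
    static final int ADMIN = 1 << 4;  // 16

    /** Converts letters such as "crdwa" (in any order) into the integer bit mask. */
    static int toInt(String letters) {
        int perms = 0;
        for (char c : letters.toCharArray()) {
            switch (c) {
                case 'r': perms |= READ; break;
                case 'w': perms |= WRITE; break;
                case 'c': perms |= CREATE; break;
                case 'd': perms |= DELETE; break;
                case 'a': perms |= ADMIN; break;
                default: throw new IllegalArgumentException("unknown permission letter: " + c);
            }
        }
        return perms;
    }

    /** Converts the integer bit mask back into letters, in c-r-d-w-a order. */
    static String toLetters(int perms) {
        StringBuilder letters = new StringBuilder();
        if ((perms & CREATE) != 0) letters.append('c');
        if ((perms & READ) != 0) letters.append('r');
        if ((perms & DELETE) != 0) letters.append('d');
        if ((perms & WRITE) != 0) letters.append('w');
        if ((perms & ADMIN) != 0) letters.append('a');
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(toInt("crdwa"));              // prints 31
        System.out.println(toInt("cra"));                // prints 21
        System.out.println(Integer.toBinaryString(22));  // prints 10110
        System.out.println(toLetters(22));               // prints cwa
    }
}
```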

To get the permission set of a znode, we need to import the ACL class (from the ZooKeeper package) and List. The getACL method returns a list of ACL objects, one per ACL entry on the znode, and we read the first element. What’s interesting is that the list contains only one element here, because this znode has a single ACL entry. Following is the code snippet on how to get the permission set of a znode:

// Needs: java.util.List, org.apache.zookeeper.data.ACL, org.apache.zookeeper.data.Stat
Stat stat = new Stat();                        // receives the znode's stat as a side effect
List<ACL> acl = zk.getACL("/getmyacl", stat);  // zk is an already-connected ZooKeeper instance
ACL aclElement = acl.get(0);                   // this znode has a single ACL entry
System.out.println(aclElement.getPerms());     // prints the permission set as an integer;
                                               // this is how I got 21 and 22 earlier

When creating a znode the simplest way, any user is granted the full crdwa permission set. I will talk more about setting the permission set in 2 separate blog entries. The first one will be the easy case, where any user can access the znode. The second one will be trickier (also to talk about and explain), as it involves an individual’s username and password, or a group of users within the same host or IP address.

This sums up how to get the permission set of a znode. As usual, thanks for reading, and happy zookeeping!

[ Tutorial ] How to Install And Setup Apache ZooKeeper Standalone (Windows)

Posted on July 7, 2014 by ihong5 • Tagged ZooKeeper

Today, I will talk about how to install Apache ZooKeeper and run the instance of it.

Prerequisites: JRE 1.6 or higher is required (for development purposes, I would install JDK 1.6 or higher instead). At the time of writing, the current stable version of Java is 1.8, and that should work perfectly fine (I have 1.7.0_51).

NOTE: I noticed that some of my peers tend to forget to set the environment variables, so please remember to set them before proceeding.

1. Installing Apache ZooKeeper

1. Download Apache ZooKeeper. You can choose from any given mirror – http://www.apache.org/dyn/closer.cgi/zookeeper/
2. Extract it to where you want to install ZooKeeper. I prefer to save it in the C:\dev\tools directory. Unless you prefer this way as well, you will have to create that directory yourself.
3. Set up the environment variable.

To do this, first go to Computer, then click on the System Properties button.

Click on the Advanced System Settings link to the left.

On a new window pop-up, click on the Environment Variables... button.

Under the System Variables section, click New...
For the Variable Name, type in ZOOKEEPER_HOME. The Variable Value is the directory where you installed ZooKeeper. Taking mine for example, it would be C:\dev\tools\zookeeper-3.x.x.

Now we have to edit the PATH variable. Select Path from the list and click Edit...
It is VERY important that you DO NOT erase the pre-existing value of the Path variable. At the very end of the variable value, add the following: %ZOOKEEPER_HOME%\bin; Note that each value in Path must be separated by a semicolon.

Once that’s done, click OK and exit out of them all.

That takes care of the ZooKeeper installation part. Now we have to configure it so the instance of ZooKeeper will run properly.

2. Configuring ZooKeeper Server

If you look in the <zookeeper-install-directory>, there should be a conf folder. Open that up, and you’ll see a zoo_sample.cfg file. Copy and paste it in the same directory, which should produce a zoo_sample - Copy.cfg file. Open that with your favorite text editor (Microsoft Notepad works as well).

Edit the file as follows:

tickTime=2000
initLimit=5
syncLimit=5
dataDir=/usr/zookeeper/data
clientPort=2181
server.1=localhost:2888:3888

NOTE: you really don’t need lines 2 (initLimit=5), 3 (syncLimit=5), and 6 (server.1=localhost:2888:3888). They are there for good practice, and they matter mainly when setting up a multi-server cluster, which we are not going to do here.

Save it as zoo.cfg. You can also go ahead and delete the original zoo_sample.cfg file, as it is not needed.

The next step is to create a myid file. If you noticed earlier in the zoo.cfg file, we wrote dataDir=/usr/zookeeper/data. This is a directory you’re going to have to create on the C drive. Simply put, this is the directory where ZooKeeper looks for the myid file that identifies this instance of ZooKeeper. We’re going to write 1 in that file.

So go ahead and create that usr/zookeeper/data directory, and then open up your favorite text editor.

Just type in 1, and save it as myid with the file type set to All files. This may seem insignificant, but by convention the file must not have any file extension.

Don’t worry about the version-2 directory from the picture. That is automatically generated once you start the instance of ZooKeeper server.

At this point, you should be done configuring ZooKeeper. Now close out of everything, click the Start button, and open up a command prompt.

3. Test an Instance of Running ZooKeeper Server

Type in the following command: zkServer.cmd and hit enter. You should get some junk like this that don’t mean much to us.

Now open up another command prompt in a new window. Type in the following command: zkCli.cmd and hit enter. Assuming you did everything correctly, you should get [zk: localhost:2181(CONNECTED) 0] on the very last line. See picture below:

If you are getting the same result, then you set up the ZooKeeper server correctly. Thanks for reading, and happy zookeeping!

[ Tutorial ] ZNode Types and How to Create, Read, Delete, and Write in ZooKeeper (Java)

Posted on July 7, 2014 by ihong5 • Tagged ZooKeeper

Please click here to read about Create, Read, Delete, and Write znodes in ZooKeeper Client.

This is more like a continuation of my blog entry about creating, deleting, reading, and writing a znode in ZooKeeper Client. It will be mostly a few snippets of code. Although there are multiple ways, I will only present the simplest and easiest ways to create, read, delete and write znodes.

1. Creating a Znode

Java’s ZooKeeper create method takes 4 parameters: Path [String], Data [byte array], Access Control List [List<ACL>], and Mode [CreateMode]. The square brackets show the type of each parameter.

As mentioned a few times in my earlier blog entries, the path always needs to start with the root znode ( / ). Remember, the path consists of the root znode and the name of the znode you want to create. For example, to create a znode called new_znode, you would write /new_znode for its path.

The data of a znode is stored in a byte array. For learning purposes, we will use getBytes(), a String method that converts a String into a byte array. Each znode can hold up to 1 MB of data by default.
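As a small illustration of the String-to-byte-array step, here is a sketch that converts text to bytes and back. One assumption of mine: it names UTF-8 explicitly, because getBytes() with no argument depends on the JVM’s default charset. The class and method names are hypothetical helpers, not ZooKeeper API.

```java
import java.nio.charset.StandardCharsets;

class ZnodeDataBytes {
    // Turns text into the byte array form that znode data is stored in.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Turns znode data back into text, using the same charset as encode.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = encode("new znode");
        System.out.println(data.length + " bytes");
        System.out.println(decode(data));
    }
}
```

Using the same charset on both sides keeps the data readable no matter which machine the client runs on.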

The Access Control List (ACL) is passed in as a List of ACL entries. We will use the OPEN_ACL_UNSAFE constant from the Ids class (org.apache.zookeeper.ZooDefs.Ids), which we will need to import. I will talk about the ACL in more detail later, but for now we will just stick with the Ids class.

Mode refers to the types of znodes I mentioned in my last blog entry: persistent, ephemeral, and sequential. We will import the CreateMode class (org.apache.zookeeper.CreateMode) for this. There are four modes: Persistent (the default in the command-line client), Persistent-Sequential, Ephemeral, and Ephemeral-Sequential. For now, we will use Persistent.

Example create method:

public void createZnode(String path, byte[] data) throws KeeperException, InterruptedException {
    zk.create(path, data, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
}

The create method throws KeeperException and InterruptedException. To handle either exception, surround the call in a try...catch block.

Another example create method:

try {
    zk.create("/new_znode", "new znode".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
} catch(InterruptedException intrEx) {
    // the thread was interrupted while waiting on the server; restore the flag
    Thread.currentThread().interrupt();
} catch(KeeperException kpEx) {
    // KeeperException covers many failures; an existing znode is only one of them
    if (kpEx.code() == KeeperException.Code.NODEEXISTS) {
        System.out.println("\"new_znode\" already exists!");
    } else {
        System.out.println("Could not create \"new_znode\": " + kpEx.getMessage());
    }
}

2. Reading a Znode

To read a znode, we call the getData method. It takes 3 parameters: Path [String], watch [boolean], stat [Stat].

Again, it takes the path to specify which znode you want to read. The watch flag is a relatively simple concept, which I explain at the end of this section. Stat holds more in-depth information about the znode, such as the number of children, when it was created, etc. Like the create method, getData throws InterruptedException and KeeperException.

Example getData method:

public byte[] readZnode(String path) throws KeeperException, InterruptedException {
    return zk.getData(path, true, zk.exists(path, true));
}

Another example getData method using try...catch block:

try {
    zk.getData("/new_znode", true, zk.exists("/new_znode", true));
} catch(InterruptedException intrEx) {
    // the thread was interrupted while waiting on the server; restore the flag
    Thread.currentThread().interrupt();
} catch(KeeperException kpEx) {
    // KeeperException covers many failures; a missing znode is only one of them
    if (kpEx.code() == KeeperException.Code.NONODE) {
        System.out.println("\"new_znode\" does not exist!");
    } else {
        System.out.println("Could not read \"new_znode\": " + kpEx.getMessage());
    }
}

As you may have noticed, getData takes a watch flag as one of its parameters. A Watcher, in a nutshell, is a mechanism in ZooKeeper that keeps watch over the znode you specify and notifies the client when that znode is deleted or its data changes. One thing to note is that a watch fires only once: after the Watcher has notified the client about a watch_me znode (for example), a new watch needs to be set on that same znode, or else the client will not be notified a second time.
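The fire-once behavior is easier to see in a toy model. The sketch below is not ZooKeeper code and does not use the ZooKeeper library; it is a small in-memory stand-in I made up (class and method names included) that only mimics the rule that a watch is removed once it has fired.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OneShotWatchModel {
    // Pending watches, keyed by znode path.
    private final Map<String, List<Runnable>> watches = new HashMap<>();

    // Registers a callback that fires on the next change to the path.
    void watch(String path, Runnable callback) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(callback);
    }

    // Simulates a change: fires and removes every watch on the path.
    // Returns how many callbacks were notified.
    int change(String path) {
        List<Runnable> fired = watches.remove(path);
        if (fired == null) {
            return 0;
        }
        for (Runnable callback : fired) {
            callback.run();
        }
        return fired.size();
    }

    public static void main(String[] args) {
        OneShotWatchModel model = new OneShotWatchModel();
        model.watch("/watch_me", () -> System.out.println("watch_me changed"));
        System.out.println("notified: " + model.change("/watch_me"));
        // No watch is left, so this second change notifies nobody.
        System.out.println("notified: " + model.change("/watch_me"));
    }
}
```

To keep receiving notifications in this model, the callback (or the caller) has to call watch again after each change, which is the same habit you need with a real client.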

3. Writing (or Re-Writing) to a Znode

We use setData for writing to a znode. This method takes in 3 parameters: Path [String] as always, data [byte Array] that will overwrite the pre-existing data, and the version [int].

We have a new (but fairly self-explanatory) parameter, which is version. Every time the znode’s data gets updated, ZooKeeper increments its data version. Because of this, if you pass in an integer that is not the current version, setData will throw a BadVersionException. Here is an example:

I have a znode named newznode and its dataVersion is 5. It would be a major hassle to go back to the code to update its version manually every time the client tries to update its data. Instead, we utilize the getVersion method (from Stat class) and pass that as an argument as follows:

public void writeZnode(String path, byte[] data) throws KeeperException, InterruptedException {
    Stat stat = zk.exists(path, true);
    zk.setData(path, data, stat.getVersion());
}

As usual, another example of setData method using the try...catch block:

try {
    Stat stat = zk.exists("/new_znode", true);
    if (stat == null) {
        // exists returns null when the znode is missing
        System.out.println("\"new_znode\" does not exist!");
    } else {
        zk.setData("/new_znode", "new data".getBytes(), stat.getVersion());
    }
} catch(InterruptedException intrEx) {
    // the thread was interrupted while waiting on the server; restore the flag
    Thread.currentThread().interrupt();
} catch(KeeperException kpEx) {
    // for example a BadVersionException if another client updated the znode first
    System.out.println("Could not write to \"new_znode\": " + kpEx.getMessage());
}
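The version check that setData performs can be sketched with a small in-memory model. This is not ZooKeeper’s implementation: the class is hypothetical, and IllegalStateException stands in for BadVersionException so that the example runs without the ZooKeeper library. The one detail borrowed from the real API is that a version of -1 matches any version.

```java
class VersionedZnodeModel {
    private byte[] data;
    private int version = 0;

    VersionedZnodeModel(byte[] initialData) {
        this.data = initialData.clone();
    }

    int getVersion() {
        return version;
    }

    byte[] getData() {
        return data.clone();
    }

    // Overwrites the data only if expectedVersion matches the current version
    // (or is -1, meaning "any version"), then bumps the version by one.
    void setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            throw new IllegalStateException(
                    "bad version: expected " + expectedVersion + " but znode is at " + version);
        }
        data = newData.clone();
        version++;
    }

    public static void main(String[] args) {
        VersionedZnodeModel znode = new VersionedZnodeModel("old data".getBytes());
        znode.setData("new data".getBytes(), znode.getVersion());
        System.out.println("version is now " + znode.getVersion());
        try {
            znode.setData("stale write".getBytes(), 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Passing the version you just read means the write succeeds only if nobody else changed the znode in between, which is why the examples above fetch the Stat right before calling setData.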

Now, let’s try to pass in a random integer other than its version. Same scenario (except its version is now 6).

Instead of passing in stat.getVersion() like I should, I’ll pass in a non-6 value, and setData throws a BadVersionException.

4. Deleting a Znode

To delete a znode, we use the delete method (I’m sure you could have guessed that much), which takes 2 parameters: Path [String] and Version [int]. Because I have already gone over both parameters, I will move straight into the code. The version parameter here works exactly like the version in the setData method.

public void deleteZnode(String path) throws KeeperException, InterruptedException {
    Stat stat = zk.exists(path, true);
    zk.delete(path, stat.getVersion());
}

And finally, another (and the last) example using the try...catch block:

try {
    Stat stat = zk.exists("/new_znode", true);
    if (stat == null) {
        // exists returns null when the znode is missing
        System.out.println("\"new_znode\" does not exist!");
    } else {
        zk.delete("/new_znode", stat.getVersion());
    }
} catch(InterruptedException intrEx) {
    // the thread was interrupted while waiting on the server; restore the flag
    Thread.currentThread().interrupt();
} catch(KeeperException kpEx) {
    // for example a version mismatch, or the znode still has children
    System.out.println("Could not delete \"new_znode\": " + kpEx.getMessage());
}

Thanks for reading. Happy zookeeping

OpenStack Swift Introduction

Posted on July 2, 2014 by ihong5 • Tagged object storage, openstack, swift

I heard about OpenStack while working as an intern. My former supervisor, Kevin Stoll, introduced the other interns and me to OpenStack; he is a huge fan of cloud architecture, automation, and other web-service-related technologies (he is also big on RESTful).

OpenStack consists of multiple open-source software projects, most of which can be found in their own GitHub repositories. Out of them, I will talk about Swift today.

What is OpenStack Swift?

OpenStack Swift (or just Swift) is a powerful object storage system designed to store objects such as media, documents, data, backups, projects, etc. Swift is highly scalable, so much so that it is claimed to be able to “scale to thousands of machines providing hundreds of Petabytes of storage distributed in geographically distant regions.” (from https://swiftstack.com/openstack-swift/architecture) A claim like this suggests that Swift is designed to scale horizontally and to keep working when individual machines fail.

Swift utilizes the RESTful API (Representational State Transfer Application Programming Interface), which enables users to create, write (or overwrite), delete and read objects, as well as metadata, primarily for indexing and searching objects.

Now let’s talk about some of the characteristics that make Swift attractive to big companies:

Swift is open-source software and freely available (https://github.com/openstack/swift)
Runs on Linux and standard x86 servers
Comparable to Amazon S3 (Amazon Simple Storage Service) in terms of scalability and eventual consistency
Every object has its own URL for access via a browser
All objects are accessed and administered via a RESTful HTTP API
Swift stores multiple copies of each object (to prevent loss should a server crash)

How is/What makes Swift Strongly Consistent?

One cannot stress enough how important it is for object storage to maintain strong consistency. Swift is a prime example; it aims to prevent the loss of objects and data and to keep them from being corrupted for as long as possible. Otherwise, it would defeat the purpose of Swift being object storage software.

Swift protects its objects by keeping multiple copies of each object across multiple nodes. This allows clients to access objects even if a node, or even multiple nodes, fail. Swift is designed to bring all copies of a stored object up to date, so that if nodes fail, users still have access to a good copy. As the number and volume of objects grow, they are distributed across more regions, which in turn allows Swift to maintain its already strong consistency.

What Are My Plans Regarding Swift

First things first: learn more about Swift, study the code, and probably build my own object storage server. As a developer, I would like to make some contributions, no matter how small they may be. Even before working as an intern and being introduced to these concepts, I had always wanted to create my own home cloud server. Back then, I wanted to make my own tutorial videos and host them on my own server for visitors to stream. It may be a small, centralized start, but I believe Swift is an optimal solution for it, and I plan to use it in my own home server in the future. In addition, I will write more blog entries about Swift.

IBM InfoSphere DataStage Introduction

Posted on July 1, 2014 by ihong5 • Tagged data integration, IBM InfoSphere DataStage

I recently started my new job and learned a little about the IBM InfoSphere DataStage software, and I have also been doing some research on it.

IBM InfoSphere DataStage (I’ll refer to it as DataStage from now on) is an ETL (Extract, Transform, Load) tool used for data integration, regardless of the type of data. It uses a high-performance parallel framework for simultaneous processing. Thanks to its highly scalable platform, it can provide flexible data integration, including big data both at rest and in motion on distributed file systems.

DataStage uses a client-server design, where the client constantly communicates with the server to create jobs (up to this point, I have been creating a lot of Parallel Jobs, which I will talk about in more detail later), and the jobs are administered against the central repository located on the server.

Repository In a Nutshell

Every time a client accesses the server, it is presented with the repository panel, on the left side by default. It is basically a set of directories, much like a file system (very similar to that of ZooKeeper), where clients administer batches of jobs. In addition to the batches of jobs, this is also where the data schemas are stored, as well as the actual data the jobs read from. I believe there is a lot more to it, but that is as much as I have been able to gather from my experience so far.

Key Strengths, Benefits and Features

DataStage is apparently an ideal tool for data integration, whether for so-called big data or for system migration. For any job the client creates, it can import, export, or create the metadata. Somewhat like an operating system, it can administer jobs: scheduling individual jobs, monitoring the progress of executed jobs through logs, and executing (running) the jobs. Also, thanks to its graphical notation for data integration, development and job execution can be administered in a single environment.

Additional benefits and features of DataStage include, but are not limited to:

Its high scalability allows the integration and transformation of large volumes of data.
It can directly access big data on a distributed file system. JSON support and a new JDBC connector are also provided.
It speeds up building, deploying, updating, and managing data integration, making it more flexible and effective.

Relationship With Apache Hadoop and ZooKeeper?

DataStage’s relationship to Apache Hadoop (to which ZooKeeper is also linked, having started out as a sub-project of Hadoop) caught my attention. After further research, I learned that DataStage’s ability to integrate big data using parallelism is based on Hadoop. In addition, ZooKeeper is well known for making distributed systems easier to build. DataStage is able to integrate big data at rest or in motion on distributed and mainframe platforms.

[ Tutorial ] ZNode Types and How to Create, Read, Delete, and Write in ZooKeeper (via zkClient)

Posted on June 24, 2014 by ihong5 • Tagged ZooKeeper

Please click here to read about Create, Read, Delete, and Write znodes in Java.

About a month ago, I wrote a blog entry about how to connect to ZooKeeper. Now, I will talk about how to create, read, delete, and write znodes; these operations are also what I will refer to as the permission sets later on. The first way is to do it via the command prompt (in Windows) as a client, and in my next blog entry, I will talk about how to do the same in Java.

Each znode has its own permission sets. They are: Create, Read, Delete, Write, and Admin (abbreviated CRDWA). I will talk more in detail about the permission sets, how to set them, as well as the access control list (ACL) in the later blog posts.
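A permission set is easy to picture as a group of on/off flags. The sketch below models CRDWA as bit flags in plain Java. The numeric values follow my understanding of the constants in ZooKeeper’s ZooDefs.Perms class, so treat them as an assumption; the class and helper methods are my own and not part of ZooKeeper.

```java
class PermissionSetModel {
    // Assumed to mirror ZooDefs.Perms: one bit per permission.
    static final int READ = 1 << 0;
    static final int WRITE = 1 << 1;
    static final int CREATE = 1 << 2;
    static final int DELETE = 1 << 3;
    static final int ADMIN = 1 << 4;

    // True if the permission set has the given permission switched on.
    static boolean allows(int permissionSet, int permission) {
        return (permissionSet & permission) != 0;
    }

    // Spells a permission set in the CRDWA letter order used in this post.
    static String letters(int permissionSet) {
        StringBuilder sb = new StringBuilder();
        if (allows(permissionSet, CREATE)) sb.append('C');
        if (allows(permissionSet, READ)) sb.append('R');
        if (allows(permissionSet, DELETE)) sb.append('D');
        if (allows(permissionSet, WRITE)) sb.append('W');
        if (allows(permissionSet, ADMIN)) sb.append('A');
        return sb.toString();
    }

    public static void main(String[] args) {
        int rdwa = READ | DELETE | WRITE | ADMIN;
        System.out.println(letters(rdwa));
        System.out.println("create allowed: " + allows(rdwa, CREATE));
    }
}
```

A permission set without the create flag, like the one in main, describes a znode that is not allowed to have children created under it.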

1. Types of Znodes

Before I get into creating them, let’s briefly talk about the types of znodes: persistent, ephemeral, and sequential.

1.1. Persistent Znodes

These are the default znodes in ZooKeeper. They stay in the ZooKeeper server permanently, until a client (the creator or any other) deletes them.

1.2. Ephemeral Znodes

Ephemeral znodes (also referred to as session znodes) are temporary znodes. Unlike persistent znodes, they are destroyed as soon as the creator client’s session ends, that is, when it logs out of the ZooKeeper server. For example, let’s say client1 created eznode1. Once client1 logs out of the ZooKeeper server, eznode1 gets destroyed.

1.3. Sequential Znodes

A sequential znode is given a 10-digit number, in numerical order, at the end of its name. Let’s say client1 created a sequential znode called sznode. In the ZooKeeper server, it will be named like this:

sznode0000000001

If client1 creates another sequential znode, it would bear the next number in a sequence. So the next sequential znode will be called <znode name>0000000002.
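The naming pattern above is just the requested name followed by a zero-padded, 10-digit counter. Here is a minimal sketch of that formatting in plain Java. The helper is hypothetical; in a real server the counter is chosen by ZooKeeper, not by the client.

```java
import java.util.Locale;

class SequentialNameModel {
    // Appends the counter to the name, padded with zeros to 10 digits.
    static String sequentialName(String name, int counter) {
        return String.format(Locale.ROOT, "%s%010d", name, counter);
    }

    public static void main(String[] args) {
        System.out.println(sequentialName("sznode", 1));
        System.out.println(sequentialName("sznode", 2));
    }
}
```

Because the padding keeps every name the same length, sorting the names alphabetically also sorts them in creation order.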

2. Znode “Anatomy”

Each znode has a variety of stats, such as its path, name, the data it stores, when it was created, its own permission sets, etc. For the sake of simplicity and demonstration, I will only talk about the path, name, and data.

The path of every znode will ALWAYS start with the root, or a slash symbol. In this example, the path of znode1 would be /znode1, and its name is znode1. Every znode holds data (which can also be blank). Keep in mind that the data is stored in a byte array, because this will be important to know by the time we deal with znode data in Java.

In order to do anything with a znode, you must specify its path. If you only specify its name, it will tell you about the syntax error, so keep that in mind.
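The difference between a path and a name can be sketched in a few lines of Java. The checks below are a simplified assumption of mine (start with a slash, no trailing slash, no empty segment) and do not cover every rule the real server enforces; the class and methods are hypothetical.

```java
class ZnodePathCheck {
    // Simplified check: a path starts at the root and has no empty segments.
    static boolean looksLikeValidPath(String path) {
        if (path == null || !path.startsWith("/")) {
            return false;
        }
        if (path.equals("/")) {
            return true;
        }
        if (path.endsWith("/")) {
            return false;
        }
        return !path.contains("//");
    }

    // The name is whatever follows the last slash in the path.
    static String nameOf(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeValidPath("/znode1"));
        System.out.println(looksLikeValidPath("znode1"));
        System.out.println(nameOf("/znode1"));
    }
}
```

Running a check like this before sending a command is one way to catch the "name instead of path" mistake early.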

3. Creating a Znode

As mentioned earlier, znodes are persistent by default. In a ZooKeeper client, we type in commands to perform different actions with the znode, such as create, delete, update its data, etc.

To create a znode, we need to specify its path (see above). Now remember, a path of any znode ALWAYS starts with the root znode. The command syntax for creating a znode is as follows:

create -<options> /<znode-name> <znode-data>

With that in mind, following are the examples for creating different types of znodes:

Persistent (Default): create /znode mydata

Ephemeral: create -e /eznode mydata

Sequential: create -s /sznode mydata

Each znode can also have child znodes; this depends on the permission set of that particular znode. For instance, if a znode called nochild has an RDWA permission set, where create is not allowed, then nochild cannot have any child znodes. Please note that in the following syntax:

create /<parent-znode>/<child-znode> <child-znode-data>

the <parent-znode> MUST exist in order to create the <child-znode>; otherwise, it will not work.

** NOTE: ephemeral znodes CANNOT contain any child znodes! Because ephemeral znodes are temporary, they are destroyed when the creator client logs out of the ZooKeeper server, which would mean that all child znodes under that ephemeral znode would be destroyed automatically as well. To prevent that, ephemeral znodes cannot have any child znodes.

4. Deleting a Znode

There are 2 ways to delete a znode. The first way is a typical deletion, and the second way is a recursive deletion.

delete /<znode>

We have a deleteme znode, and as you may have guessed, following command deletes the deleteme znode:

delete /deleteme

If we wanted to delete a child znode (i_am_bug in this example) instead, we just need to specify the path of that child znode, like so:

delete /i_have_bug/i_am_bug

Second way to delete a znode is recursive deletion. This method is necessary to delete a znode with child(ren) znode(s), because using the delete command will not work.

Recursion in general means solving a big problem by breaking it into smaller instances of the same problem. It is similar in spirit to the “divide and conquer” approach to problem solving. In this case, it deletes the znodes at the lowermost level first, one by one, then works its way up.

rmr /<znode-with-child>

I really like this command, because it works on znodes without child znodes as well. In that case, you might as well use the rmr command for any znode; just be careful not to delete a child znode you need to keep. (In ZooKeeper 3.5 and later, rmr is deprecated in favor of deleteall.)
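The "lowest level first" order is easier to follow in code. The sketch below is an in-memory model and not ZooKeeper’s own code: it keeps a set of paths, refuses a plain delete when children exist, and deletes recursively by removing the children before the parent. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RecursiveDeleteModel {
    // Every znode in the model is represented by its full path.
    private final Set<String> paths = new TreeSet<>();

    void create(String path) {
        paths.add(path);
    }

    boolean exists(String path) {
        return paths.contains(path);
    }

    // Direct children only: paths one level below the given path.
    List<String> childrenOf(String path) {
        String prefix = path + "/";
        List<String> children = new ArrayList<>();
        for (String candidate : paths) {
            if (candidate.startsWith(prefix) && candidate.indexOf('/', prefix.length()) < 0) {
                children.add(candidate);
            }
        }
        return children;
    }

    // Plain delete: refuses to remove a znode that still has children.
    boolean delete(String path) {
        if (!childrenOf(path).isEmpty()) {
            return false;
        }
        return paths.remove(path);
    }

    // Recursive delete: removes all descendants first, then the znode itself.
    // Returns the paths in the order they were deleted.
    List<String> deleteRecursively(String path) {
        List<String> order = new ArrayList<>();
        for (String child : childrenOf(path)) {
            order.addAll(deleteRecursively(child));
        }
        if (delete(path)) {
            order.add(path);
        }
        return order;
    }

    public static void main(String[] args) {
        RecursiveDeleteModel model = new RecursiveDeleteModel();
        model.create("/i_have_bug");
        model.create("/i_have_bug/i_am_bug");
        System.out.println("plain delete worked: " + model.delete("/i_have_bug"));
        System.out.println("deleted in order: " + model.deleteRecursively("/i_have_bug"));
    }
}
```

The returned list shows the children being removed before their parent, which is the same bottom-up order described above.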

5. Reading a Znode Data

Here, we use the get command to fetch the data of that particular znode. As always, specify the path of the znode you want to get the data off of.

get /<znode-name>

This command also returns what is known as the stat. The data is on the top line. The stat provides detailed information about the znode, such as when it was created or last modified, its version, the number of children it contains, etc.

The only time the get command will not work on a znode is if read permission is not allowed in its permission set, or if the znode’s ACL (Access Control List) is set to digest, hosts (depends on the hosts), or IP (also depends on the IP). I will talk more about the ACL in my later blog posts.

6. (Re) Writing a Znode Data

We use the set command to overwrite the znode data. As always, specify the path of the znode you want to overwrite.

set /<znode-name> <new-data>

Once executed, it will return the stat of the znode, excluding the new data you have set. Like the get command, the set command will not work if the write permission is not allowed in the permission set, or if its ACL has been configured accordingly.

By now, you should be able to execute basic commands for the znodes. In the next blog post, I will talk about how to do them in Java.